AI Workbench can expose a workspace as a Model Context Protocol server, so external agents — Claude Code, Cursor, Continue, hosted MCP gateways — can use the workspace as a context backend. The agent sees the workspace's read surface (KB search, documents, chats) as MCP tools and resources; it never sees the raw HTTP API or has to implement client code beyond the standard MCP SDK.
The façade is on by default. It shares the /api/v1/* auth
middleware and the workspace-scoped authz wrapper, so enabling it
does not widen the security boundary — it just exposes the existing
read surface over a second protocol. Disable explicitly with
mcp.enabled: false if you want a narrower surface than the REST API.
- Make sure
mcp.enabledis not set tofalseinworkbench.yaml(the default istrue). Optionally surface thechat_sendtool:mcp: # Optional: also expose `chat_send`, which routes a message # through the runtime's global chat service. Inherits the # `chat:` block; the tool is silently skipped when chat is # unconfigured. exposeChat: true
- Point an MCP client at
http://<your-runtime>/api/v1/workspaces/{workspaceId}/mcp.
The endpoint speaks Streamable HTTP — the modern MCP transport. Each request is stateless: no session id, no per-client state survives between requests.
mcp:
enabled: true | false # default: true
exposeChat: true | false # default: false; ignored when chat is unset| Field | Default | Notes |
|---|---|---|
enabled |
true |
When false the MCP route returns 404 not_found so the surface isn't probeable. |
exposeChat |
false |
Adds the chat_send tool. Requires the chat: block; without it the tool is silently skipped. |
The MCP route is mounted under /api/v1/workspaces/{w}/mcp, which
means the regular /api/v1/* auth middleware applies.
The shared workspace-route authorization wrapper is enforced on every
request: a scoped API key for workspace A cannot call MCP tools
against workspace B, even with the URL.
The default auth.mode: disabled (single-tenant dev runtime) lets
anonymous callers in. For any deployment exposing MCP to external
agents, set auth.mode: apiKey (or stricter) and mint a workspace
API key per agent.
Workspace API keys carry a scopes field (see
API-keys card in the workspace UI).
The MCP server enforces it on the write tools today:
| Tool | Required scope |
|---|---|
ingest_text, delete_document, create_knowledge_base, delete_knowledge_base |
write |
search_kb, list_*, get_agent, chat_send, run_agent |
read (passes any authenticated caller) |
A key minted with scopes: ["read"] will see read tools work
normally; a write-tool call returns isError: true with a JSON body
{ outcome: "denied", code: "scope_required", required: "write",
subjectScopes, message } — same shape MCP clients already handle
for tool failures, so LangGraph / CrewAI / ADK / MAF / watsonx surface
the denial as a regular tool error, not a transport-level 403.
OIDC + bootstrap operator credentials and anonymous (dev-mode) callers
have scopes: null and pass every gate — the scope check only fires
for concrete API-key subjects.
Denials are also recorded as mcp.invoke audit events with
outcome: "denied" (distinct from generic failure) so SIEM rules can
alert on bursts of scope rejections without parsing tool-specific
reason strings.
| Name | Args | Returns |
|---|---|---|
list_knowledge_bases |
none | JSON array of { knowledgeBaseId, name, description, status, language } |
list_agents |
none | JSON array of { agentId, name, description, knowledgeBaseIds, llmServiceId, rerankEnabled } |
get_agent |
{ agentId } |
Full agent configuration: prompts, tool ids, KB bindings, reranking overrides. |
list_documents |
{ knowledgeBaseId, limit? } |
JSON array of document metadata (documentId, sourceFilename, status, chunkTotal, contentHash, ingestedAt) |
search_kb |
{ knowledgeBaseId, text? | vector?, topK?, hybrid?, rerank? } |
JSON array of search hits (chunkId, score, documentId, content) |
list_chats |
{ agentId } |
JSON array of chat summaries (chatId, agentId, title, knowledgeBaseIds, createdAt) |
list_chat_messages |
{ chatId } |
Oldest-first message log (messageId, role, content, messageTs, metadata) |
ingest_text |
{ knowledgeBaseId, text, sourceFilename?, sourceDocId?, metadata?, overwriteOnNameConflict? } |
JSON envelope with one of three outcome values: completed (new document — documentId, sourceFilename, contentHash, chunks), duplicate (content-hash match — pipeline did not run; returns the existing documentId), or name_conflict (isError: true — filename matched but bytes differ; retry with overwriteOnNameConflict: true or pick a new name). Runs the same dedup + chunk + embed + upsert pipeline as the REST POST /ingest. Always synchronous from the MCP caller's POV. Requires the write scope on the calling key — read-only keys see isError: true + outcome: "denied" instead. |
delete_document |
{ knowledgeBaseId, documentId } |
JSON object with outcome: deleted (documentId, chunksDropped) or not_found (no row matched the id — returned without isError so speculative cleanup doesn't need to branch). Wraps the same cascade helper the REST DELETE /documents/{id} route uses; vector chunks come down first, then the control-plane row. Requires the write scope on the calling key. |
create_knowledge_base |
{ name, chunkingServiceId, embeddingServiceId, description?, rerankingServiceId?, language?, attach?, vectorCollection? } |
JSON envelope with outcome: "created" plus the new knowledgeBaseId, resolved vectorCollection, and owned flag. Wraps the same KnowledgeBaseService.create the REST POST /knowledge-bases route uses — so the collection-provision + rollback dance runs identically across front doors. Validation failures (kb_name_taken, collection_name_taken, embedding/dimension mismatch) return isError: true with a recognizable code. Requires the write scope on the calling key. |
delete_knowledge_base |
{ knowledgeBaseId } |
JSON object with outcome: deleted or not_found (idempotent — re-deleting a missing KB returns not_found without isError). For owned KBs, drops the underlying vector collection first; attached KBs are detached without touching the collection. Requires the write scope on the calling key. |
chat_send (opt-in) |
{ agentId, chatId, content } |
The assistant's reply as a single text block. Persists both turns through the runtime's global chat service; the system prompt falls back to DEFAULT_AGENT_SYSTEM_PROMPT when chat.systemPrompt is unset. Use run_agent when you want the tool to resolve or create the conversation for you. |
run_agent (opt-in) |
{ agentId, content, conversationId?, title? } |
JSON envelope { outcome, conversationId, agentId, content, finishReason, tokenCount, contextChunkIds }. One-call agent invocation — resolves (or creates) a conversation bound to the agent's KB set, then drives the same retrieval → prompt → complete → persist pipeline as chat_send. Honors the agent's stored systemPrompt. Returns outcome: "agent_not_found" / "chat_not_found" / "completion_error" for failure shapes; "completed" on success. |
All tool results are returned as a single MCP text content item
containing JSON; clients parse it into native objects. This keeps
the wire format predictable across providers that handle structured
content differently.
The façade is mostly retrieval-shaped (search_kb, list_*) so
external agents can ground their reasoning in the workspace. Two
write tools — ingest_text and delete_document — were added once
the LangGraph / CrewAI / ADK story made it clear that recording
what an agent gathers (and cleaning up afterward) is half the value
of the integration. Larger mutations (KB CRUD, workspace mutation,
service CRUD) stay off the surface. Reasons:
- Blast radius. A misbehaving agent that can
search_kbis a performance / cost concern; one that candelete_kbis a data-loss concern.ingest_textfalls in the middle — its only observable effect is more KB content, which is reversible bydelete_document.delete_documentitself is scoped to a single document at a time (no "delete by filter" surface) so the radius stays predictable. - Auth has a coarse two-tier scope today. Workspace API keys
carry
["read"]or["read", "write"]; the write tools (ingest_text,delete_document) refuse anything without"write". See the Per-tool scopes subsection in Auth above for the deny envelope. The split is intentionally narrow — addingwrite:ingest/write:adminis an additive change when a use case demands it. - Most useful surface first. Retrieval is the killer feature for an MCP integration; ingestion is the most-asked-for write tool; delete pairs naturally with ingest for agents that maintain their own KB; everything else is incremental.
chat_send is exposed under a separate flag because it's the only
tool that costs HuggingFace tokens. ingest_text and
delete_document are unflagged: their cost is bounded by the
chunker + embedder on the workspace, which the operator already
controls through the regular ingest config.
Streamable HTTP supports SSE-formatted responses for long-running
tool calls; the SDK uses them automatically when the server
chooses. Today our tool implementations are synchronous (the only
long-running one is chat_send, and we return its full reply at
once rather than streaming progress notifications), but the
transport is ready when we add a streaming variant.
For the chat UI's own streaming, see
agents.md — it uses the
POST /agents/{a}/conversations/{c}/messages/stream endpoint that
emits structured SSE events tailored to the UI rather than going
through MCP.
The MCP endpoint uses SSE (Server-Sent Events) to stream JSON-RPC responses. Most reverse proxies and local-tunnel tools work fine, but there are a few gotchas:
Quick tunnels (cloudflare tunnel --url ...) buffer SSE aggressively.
The client often sees an empty body or a stalled connection because
Cloudflare holds chunks until a flush threshold is reached or the
connection closes — the opposite of what SSE needs.
Recommended alternatives for public dev access:
| Option | Notes |
|---|---|
| Cloudflare Tunnel (named) | cloudflare tunnel create <name> + cloudflare tunnel route dns — persistent, named tunnels flush SSE correctly. |
| ngrok | ngrok http 8080 — SSE works reliably out of the box. |
| Real reverse proxy | nginx / Caddy with proxy_buffering off (nginx) or default Caddy config both pass SSE through without buffering. |
Add to the location block that proxies the runtime:
location /api/v1/ {
proxy_pass http://localhost:8080;
proxy_buffering off;
proxy_cache off;
proxy_read_timeout 3600s;
proxy_set_header Connection '';
chunked_transfer_encoding on;
}Without proxy_buffering off, nginx accumulates the SSE stream and
delivers it in one shot when the connection closes — which looks like a
hanging request from the MCP client's perspective.
Most MCP clients require the endpoint URL to use https://. For local
development this means either:
- a named tunnel / ngrok (both provide HTTPS automatically), or
- a local TLS terminator (Caddy's
localhostcert,mkcert+ nginx).
http://localhost:8080/... works fine if your MCP client explicitly
allows plain HTTP local addresses.
| Symptom | Why | Fix |
|---|---|---|
404 not_found from /.../mcp |
mcp.enabled: false was set explicitly (the default is true). |
Remove mcp.enabled: false from workbench.yaml (or flip it to true). |
404 workspace_not_found |
Path workspace id doesn't exist. | Check the workspace id. |
401 / 403 |
Caller lacks access. | Verify the API key scope (workspace match). |
chat_send tool isn't registered |
exposeChat: false, OR chat: is unset. |
Set exposeChat: true AND wire the chat: block. |
- Specification — the MCP wire protocol.
docs/configuration.md— fullworkbench.yamlschema.docs/auth.md— the auth surface MCP inherits.docs/agents.md— the agent surface that the chat UI uses; thechat_sendMCP tool wraps the runtime's global chat service.