The AI Workbench HTTP contract. Every green box — the default TypeScript runtime and any future language-native runtime — serves this surface. Conformance is enforced by cross-runtime fixtures.
The machine-readable OpenAPI document is served at
/api/v1/openapi.json, and a Scalar-rendered reference UI is served
at /docs. This document exists to explain the shape narratively and
to flag what's coming.
- Functional routes live under `/api/v1/…`.
- Operational routes (`/`, `/healthz`, `/readyz`, `/version`, `/features`, `/metrics`, `/astra-cli`, `/astra-cli/profiles`, `/docs`, `/api/v1/openapi.json`) are unversioned.
- Breaking changes bump the prefix to `/api/v2/…`; `/api/v1/…` stays until deprecated.
- Request and response bodies are JSON (`application/json`).
- Streaming endpoints use `text/event-stream`. Today: async-ingest job progress at `GET /jobs/{jobId}/events`.
- All IDs are RFC 4122 v4 UUIDs rendered as lowercase hyphenated strings.
- Timestamps are ISO-8601 in UTC with millisecond precision (`2026-04-22T10:11:12.345Z`).
- Secrets never appear by value. Fields like `credentials` or `embedding.secretRef` hold pointers of the form `<provider>:<path>` (e.g. `env:ASTRA_DB_APPLICATION_TOKEN`).
Every nested resource carries its parent IDs in the path:
/api/v1/workspaces/{workspaceId}
/api/v1/workspaces/{workspaceId}/knowledge-bases/{knowledgeBaseId}
/api/v1/workspaces/{workspaceId}/knowledge-bases/{kb}/documents/{documentId}
/api/v1/workspaces/{workspaceId}/{chunking,embedding,reranking}-services/{serviceId}
A request whose path references a non-existent workspace returns
404 workspace_not_found before the nested resource is ever
consulted.
Control-plane list endpoints accept:
- `limit` — number of items to return, 1–200, default 50.
- `cursor` — opaque value from the previous page's `nextCursor`.
Paginated responses use:
```json
{
  "items": [],
  "nextCursor": null
}
```

When `nextCursor` is non-null, pass it back as `?cursor=...` to read
the next page. Malformed cursors return `400 invalid_cursor`.
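The cursor loop can be sketched as a small client helper. This is illustrative, not part of the runtime: `fetchPage` is a hypothetical stand-in for an HTTP call such as `GET /api/v1/workspaces?limit=…&cursor=…`.

```typescript
// Drain a paginated list endpoint by following nextCursor until the
// server returns null. `fetchPage` abstracts the actual HTTP request.
type Page<T> = { items: T[]; nextCursor: string | null };

async function listAll<T>(
  fetchPage: (cursor: string | null) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.nextCursor; // null on the last page
  } while (cursor !== null);
  return all;
}
```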
All error responses share one envelope:
```json
{
  "error": {
    "code": "workspace_not_found",
    "message": "workspace '<workspaceId>' not found",
    "requestId": "b48e…"
  }
}
```

Codes are stable, lowercase, snake_case. Messages are
human-readable and may change. Currently emitted:
| Status | Code | When |
|---|---|---|
| 400 | `validation_error` | Request body / params / query failed Zod validation. `message` carries the first failing field path and its reason (`name: Name is required`, `credentials.token: expected '<provider>:<path>', e.g. 'env:FOO'`). |
| 401 | `unauthorized` | Missing / malformed / invalid bearer token. `WWW-Authenticate: Bearer` set. See auth.md. |
| 403 | `forbidden` | Token is valid but not authorized for the requested action — either the subject's `workspaceScopes` doesn't include the target workspace, or it's a scoped subject attempting a platform-level action (e.g. `POST /workspaces`). Also reserved for role-based checks in the upcoming RBAC phase. |
| 413 | `payload_too_large` | `/api/v1/workspaces/*` request body exceeded the runtime's 10 MB default JSON body limit, or an ingest request exceeded the 50 MB ingest-only limit. |
| 404 | `not_found` | Unknown route |
| 404 | `workspace_not_found` | Workspace ID doesn't exist |
| 404 | `knowledge_base_not_found` | Knowledge-base ID doesn't exist in workspace |
| 404 | `document_not_found` | Document ID doesn't exist in the knowledge base |
| 404 | `chunking_service_not_found` / `embedding_service_not_found` / `reranking_service_not_found` | Service ID doesn't exist in workspace |
| 404 | `job_not_found` | Job ID doesn't exist in the workspace |
| 409 | `conflict` | Create with an already-taken ID, or service deletion refused while a KB still references it |
| 501 | `hybrid_not_supported` | Caller asked for hybrid search on a workspace kind whose driver doesn't implement `searchHybrid` |
| 501 | `rerank_not_supported` | Caller asked for rerank on a workspace kind whose driver doesn't implement `rerank` |
| 400 | `dimension_mismatch` | Supplied vector length doesn't match the KB's bound embedding service |
| 400 | `embedding_unavailable` | Text search/upsert fallback could not build an embedder for the KB's bound embedding service |
| 400 | `embedding_dimension_mismatch` | Embedder output dimension doesn't match the bound embedding service |
| 422 | `workspace_misconfigured` | Workspace is missing `url`, `token`, `keyspace`, or similar driver-required config |
| 500 | `internal_error` | Unhandled exception |
| 503 | `control_plane_unavailable` | Backing store is unreachable |
| 503 | `collection_unavailable` | Underlying vector collection is unreachable or missing |
| 503 | `driver_unavailable` | Workspace kind has no registered vector-store driver |
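Because every error shares the one envelope above, a client can centralize error handling. A minimal sketch of a type guard for that envelope — the type and function names here are illustrative, not part of the runtime:

```typescript
// Narrow an unknown response body to the shared error envelope shape:
// { error: { code, message, requestId } }.
interface ApiError {
  code: string;
  message: string;
  requestId: string;
}

function parseErrorEnvelope(body: unknown): ApiError | null {
  if (typeof body !== "object" || body === null) return null;
  const err = (body as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return null;
  const { code, message, requestId } = err as Record<string, unknown>;
  if (typeof code !== "string" || typeof message !== "string") return null;
  return {
    code,
    message,
    requestId: typeof requestId === "string" ? requestId : "",
  };
}
```

Clients should branch on `code` (stable) rather than `message` (free to change).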
/api/v1/* runs through a configurable auth middleware. The
default posture (auth.mode: disabled) tags every request
anonymous and lets it through — same behavior as before the
middleware existed. Flip auth.mode to turn enforcement on. See
auth.md for the full contract, config, and rollout
plan.
Header format is Authorization: Bearer <token> (RFC 6750). On
failure the response carries WWW-Authenticate: Bearer and the
canonical error envelope:
```json
{ "error": { "code": "unauthorized", "message": "…", "requestId": "…" } }
```

Operational routes (`/`, `/healthz`, `/readyz`, `/version`,
`/features`, `/metrics`, `/astra-cli`, `/astra-cli/profiles`,
`/docs`, `/api/v1/openapi.json`) bypass the middleware so
load balancers and ops tooling can always reach them.
API-key issuance, OIDC bearer verification, browser OIDC login, and
silent token refresh are all implemented. All verifier modes flow
through the same middleware — routes don't need to care which
verifier accepted the token. Browser-only /auth/* routes
(/auth/config, /auth/login, /auth/callback, /auth/me,
/auth/refresh, /auth/logout) are documented in
auth.md rather than here.
Every response carries X-Request-Id. If the client supplies one,
the runtime echoes it; otherwise the runtime generates a UUID-hex
string. Error responses include the same value in error.requestId.
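The echo-or-generate rule can be expressed in a few lines. A sketch only — the helper name is hypothetical and the runtime's actual implementation may differ:

```typescript
import { randomUUID } from "node:crypto";

// Echo the client-supplied X-Request-Id if present, otherwise mint one.
function resolveRequestId(incoming: string | undefined): string {
  return incoming && incoming.trim() !== "" ? incoming : randomUUID();
}
```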
Service banner.
Response 200

```json
{
  "name": "ai-workbench",
  "version": "0.0.0",
  "commit": "abc1234",
  "docs": "/docs"
}
```

Liveness. Returns 200 as long as the process is running.

```json
{ "status": "ok" }
```

Readiness. 200 once the control-plane store is reachable and
workspaces can be listed. The payload carries a workspace count
rather than a list — avoids O(N) responses when the store grows.

```json
{ "status": "ready", "workspaces": 3 }
```

Returns `503 draining` during graceful shutdown (SIGINT /
SIGTERM). Kubernetes-style readiness probes will stop routing
traffic while the runtime finishes in-flight requests. See
configuration.md for the drain sequence. `/healthz` stays 200
throughout so `livenessProbe` doesn't restart a healthy, draining
process.
Build metadata.
```json
{
  "version": "0.0.0",
  "commit": "abc1234",
  "buildTime": "2026-04-21T10:30:00Z",
  "node": "v22.11.0"
}
```

Runtime feature flags the bundled web UI reads to decide which surfaces to render. Reflects the active config (chat enabled, MCP enabled, auth posture, astra-cli inventory available, etc.). Never echoes secrets.
Prometheus exposition (text/plain; version=0.0.4). HTTP request
counter + duration histogram labeled by method, matched route
pattern, and status family (2xx/4xx/5xx); ingest semaphore
gauges (workbench_ingest_workers_{active,queued}); rate-limit
rejections by key type. No auth — same precedent as
/healthz / /readyz.
Auto-detected astra CLI defaults that the runtime resolved at boot
(active profile, default org, default DB id + name + endpoint, etc.).
The web UI reads this to pre-fill the workspace onboarding form.
Returns an empty payload when no CLI / profile is configured.
Live shellout: lists every configured astra CLI profile and the
databases visible to each. Drives the profile picker in the
onboarding wizard. May take seconds depending on Astra API latency;
not part of the hot path.
Scalar-rendered OpenAPI reference UI. Human-facing.
Machine-readable OpenAPI 3.1 document. Generated from the route definitions — always in sync with the running runtime.
List all workspaces, sorted by createdAt ascending with workspaceId as
tie-breaker. Every backend (memory / file / astra) produces the same
ordering so UI renders are deterministic.
Response 200 — paginated Workspace records:
```json
{
  "items": [
    {
      "workspaceId": "…",
      "name": "prod",
      "url": "env:ASTRA_DB_API_ENDPOINT",
      "kind": "astra",
      "credentials": { "token": "env:ASTRA_DB_APPLICATION_TOKEN" },
      "keyspace": "default_keyspace",
      "createdAt": "2026-04-22T10:11:12.345Z",
      "updatedAt": "2026-04-22T10:11:12.345Z"
    }
  ],
  "nextCursor": null
}
```

Create a workspace. `workspaceId` is optional — the runtime generates one if
omitted.
Request
```json
{
  "name": "prod",
  "kind": "astra",
  "url": "env:ASTRA_DB_API_ENDPOINT",
  "credentials": { "token": "env:ASTRA_DB_APPLICATION_TOKEN" },
  "keyspace": "default_keyspace"
}
```

`kind` is one of `astra | hcd | openrag | mock`. (`mock` stays a
first-class option for CI and offline work.) Once set, kind is
immutable — changing it would orphan any already-provisioned
KB collections.
url is the workspace's data-plane URL (for astra / hcd,
the Astra Data API endpoint). Accepts either a literal URL or a
SecretRef — the driver resolves refs at dial time so the same
record works across dev and prod without code changes.
Each value in credentials must be a SecretRef
(<provider>:<path>, e.g. env:ASTRA_DB_APPLICATION_TOKEN or
file:/etc/workbench/secrets/astra-token). Raw secret values are
rejected with 400.
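The SecretRef shape lends itself to a one-regex check. An illustrative sketch: the runtime's real validator is authoritative and may accept more providers, but `env` and `file` are the two this document names.

```typescript
// Accept only <provider>:<path> references, never raw secret values.
// Assumed provider set: env | file (per the examples above).
const SECRET_REF = /^(env|file):.+$/;

function isSecretRef(value: string): boolean {
  return SECRET_REF.test(value);
}
```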
Response 201 — the created Workspace.
Fetch a single workspace.
- 200 — `Workspace`
- 404 `workspace_not_found`
Patch one or more of name, url, credentials,
keyspace. Every field is optional; omitted fields are preserved.
kind and workspaceId are immutable after creation and are rejected with
400. Unknown fields are likewise rejected (strict body).
- 200 — updated `Workspace`
- 400 — body contains `kind` or an unknown field
- 404 `workspace_not_found`
Cascades to the workspace's knowledge bases, execution services, RAG documents, and API keys. Before removing the control-plane rows, the runtime drops each KB's underlying Astra collection through the workspace's driver.
- 204 — deleted
- 404 `workspace_not_found`
- 503 `driver_unavailable` — workspace has knowledge bases but no registered driver to drop their collections
Run a live workspace connection check. For mock workspaces, this
always returns ok: true. Remote backends resolve their configured
connection details and ask the driver to make a data-plane call.
Response 200 — always 200 regardless of check outcome; the
ok field distinguishes success from failure:
```json
{
  "ok": true,
  "kind": "astra",
  "details": "Astra Data API responded to listCollections."
}
```

```json
{
  "ok": false,
  "kind": "astra",
  "details": "credential 'token' could not be resolved: env var 'ASTRA_DB_APPLICATION_TOKEN' is not set"
}
```

- 200 — probe executed; inspect `ok` for pass/fail
- 404 `workspace_not_found`
Workspace-scoped bearer tokens. Documented in auth.md;
recapped here for the route contract.
List every key ever issued for the workspace, including revoked
ones. Never exposes the hash column.
An ApiKey:
```json
{
  "workspaceId": "…",
  "keyId": "…",
  "prefix": "abc123xyz789",
  "label": "ci",
  "createdAt": "…",
  "lastUsedAt": null,
  "revokedAt": null,
  "expiresAt": null
}
```

- 200 — paginated `ApiKey` records
- 404 `workspace_not_found`
Issue a new key. The plaintext is returned exactly once — the runtime stores only a scrypt digest.
Request
```json
{ "label": "ci", "expiresAt": null }
```

Response 201

```json
{
  "plaintext": "wb_live_abc123xyz789_…",
  "key": { "...ApiKey..." }
}
```

- 201 — created; `plaintext` is the only time you'll see the token
- 400 — missing / empty label
- 404 `workspace_not_found`
Soft-revoke: stamps revokedAt, leaves the row visible so audit
tools still see the history. The next request bearing this token
gets 401 unauthorized. Re-revoking an already-revoked key is a
no-op that still returns 204.
- 204 — revoked (or was already revoked)
- 404 `workspace_not_found` / `api_key_not_found`
Workspace-scoped execution services. Knowledge bases compose one chunking + one embedding + (optionally) one reranking service at create time. The three surfaces share an identical CRUD shape; only the body fields differ.
List services in the workspace.
- 200 — paginated `ChunkingService` / `EmbeddingService` / `RerankingService` records (sorted by `createdAt` ascending, `*ServiceId` as tie-breaker)
- 404 `workspace_not_found`
Create a service. The runtime generates the service ID if omitted. Required fields by kind:
| Kind | Required |
|---|---|
| chunking | name, engine |
| embedding | name, provider, modelName, embeddingDimension |
| reranking | name, provider, modelName |
Optional fields cover endpoint config (endpointBaseUrl,
endpointPath, requestTimeoutMs, authType, credentialRef),
provider/engine tuning, and supported language/content tags. See
the OpenAPI spec for the full per-kind shape.
```json
{
  "name": "openai-3-small",
  "provider": "openai",
  "modelName": "text-embedding-3-small",
  "embeddingDimension": 1536,
  "distanceMetric": "cosine",
  "endpointBaseUrl": "https://api.openai.com/v1",
  "credentialRef": "env:OPENAI_API_KEY",
  "supportedLanguages": ["en", "fr"],
  "supportedContent": ["text"]
}
```

`supportedLanguages` and `supportedContent` arrive as arrays and are
returned deduplicated + sorted on the wire. (The Astra-row layer keeps
them as `SET<TEXT>`; the converter normalises at the boundary.)
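The boundary normalisation amounts to dedupe-then-sort. A minimal sketch (the function name is illustrative):

```typescript
// Tags arrive in any order with possible duplicates; the wire format
// is deduplicated and sorted, matching the SET<TEXT> storage semantics.
function normaliseTags(tags: string[]): string[] {
  return [...new Set(tags)].sort();
}
```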
- 201 — the created record (with the generated `*ServiceId`)
- 400 `validation_error` — schema failure
- 404 `workspace_not_found`
- 409 `conflict` — `*ServiceId` collision
Fetch / patch / delete. PATCH accepts every field from create
(all optional). Strict bodies — unknown keys return 400.
DELETE is refused with 409 conflict while any KB still
references the service. Drop or rebind the dependent KBs first.
The error message names the offending KB so operators can navigate
straight to it.
A knowledge base is the runtime's atomic retrieval unit: a logical
group of documents indexed by exactly one embedding service and one
chunking service, optionally re-ranked by one reranker. Creating a
KB through POST does four things in lockstep:
- Validate the requested collection shape. Owned KBs use the KB `name` as the underlying collection identifier. Attach-mode KBs (`attach: true`) must supply `vectorCollection`, and the supplied value must equal `name` so the KB row and data-plane collection cannot drift apart.
- Insert the control-plane row. The `KnowledgeBase` record is written before owned collection provisioning; if provisioning fails, the runtime rolls the row back so callers never observe a KB that points at a missing collection.
- Materialize the underlying vector collection on the workspace's driver. The driver (`mock` for tests, `astra` for production) creates a collection sized for the bound embedding service's `embeddingDimension` with the requested `vectorSimilarity`. For Astra workspaces with an `astra`-provider embedding service, the collection is provisioned with a `service:` block so embedding runs server-side (see Configuration § Vectorize-on-ingest). Attach mode skips this step and binds to the existing data-plane collection after validating compatibility.
- Seed any default knowledge filters declared on the workspace. Filters are mutable post-create via `POST /{kb}/filters`.
Collection naming. Owned KBs derive vectorCollection from
name, and the KB name must match Astra collection-name rules
(letters, digits, underscores; starts with a letter; max 48 chars).
To adopt a pre-existing collection, set attach: true and supply
that collection name as both name and vectorCollection; the
driver verifies its dimension and vectorize provider/model match the
bound embedding service before the row is accepted. Renaming after
create is not supported because the name is the collection
identifier.
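The naming rule quoted above (letters, digits, underscores; starts with a letter; max 48 chars) fits in one regex. Illustrative only — the runtime's validator is authoritative:

```typescript
// First char must be a letter; up to 47 more letters/digits/underscores,
// for a 48-character maximum.
const COLLECTION_NAME = /^[A-Za-z][A-Za-z0-9_]{0,47}$/;

function isValidCollectionName(name: string): boolean {
  return COLLECTION_NAME.test(name);
}
```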
Idempotence. POST is not idempotent on its own — re-issuing
the same request creates a second KB with a fresh knowledgeBaseId.
To make creation safe to retry, supply an explicit knowledgeBaseId
in the body; if the row already exists with the same name and
service bindings, the route returns 409 conflict rather than
mutating the existing KB. Drop the KB explicitly before re-creating.
Dimension binding. The bound embedding service's
embeddingDimension is captured into the collection at create time
and is not re-checked on subsequent ingest / search calls — the
driver trusts the collection's dimension. Changing the embedding
service binding via PATCH is rejected (the field is immutable)
because the collection's stored vectors would no longer match the
new service's dimension.
Cascade on DELETE. The route drops the underlying collection
before the control-plane row so a partial failure leaves the KB
intact. Once the collection is gone, the row is removed and the
cascade clears RAG documents, knowledge filters, and any conversation
references in agent.knowledgeBaseIds /
conversation.knowledgeBaseIds.
List knowledge bases in the workspace.
- 200 — paginated `KnowledgeBase` records
- 404 `workspace_not_found`
A KnowledgeBase:
```json
{
  "workspaceId": "…",
  "knowledgeBaseId": "…",
  "name": "support-docs",
  "description": "customer support knowledge base",
  "status": "active",
  "embeddingServiceId": "…",
  "chunkingServiceId": "…",
  "rerankingServiceId": null,
  "language": "en",
  "vectorCollection": "support_docs",
  "lexical": { "enabled": false, "analyzer": null, "options": {} },
  "createdAt": "…",
  "updatedAt": "…"
}
```

Create a KB and auto-provision its underlying Astra collection. Transactional — if collection provisioning fails, the KB row is rolled back so the control plane and data plane never drift.
For owned KBs, omit vectorCollection; the runtime uses name as
the collection name. To adopt a pre-existing collection, set
attach: true and supply the same collection name in both name and
vectorCollection.
Request
```json
{
  "name": "support-docs",
  "description": "customer support",
  "embeddingServiceId": "…",
  "chunkingServiceId": "…",
  "rerankingServiceId": null,
  "language": "en"
}
```

`embeddingServiceId` and `chunkingServiceId` are required. Both
must reference services that exist in the same workspace.
- 201 — the created `KnowledgeBase` (collection now exists)
- 404 `workspace_not_found` / `embedding_service_not_found` / `chunking_service_not_found` / `reranking_service_not_found`
- 409 `conflict` — `knowledgeBaseId` collision
- 422 `workspace_misconfigured` — workspace is missing `url` or `credentials.token` required by its driver
- 503 `driver_unavailable` — no driver registered for the workspace's `kind`
GET reads the record. PATCH accepts a partial — description,
status, rerankingServiceId, language, and lexical are
mutable; name, embeddingServiceId, and chunkingServiceId are
immutable post-create and the schema is .strict(), so accidentally
including them in a body returns 400. DELETE drops the underlying
collection first for owned KBs, then the KB row, then cascades RAG
document rows. Attached KBs detach without dropping the external
collection.
Discover Astra collections in the workspace's keyspace that aren't already bound to a knowledge base. The web UI uses this to populate the "attach an existing collection" picker on the create-KB flow.
- 200 — `{ "items": [ { "name": string, "vectorDimension": number | null, "vectorMetric": string | null } ] }`
- 404 `workspace_not_found`
- 422 `workspace_misconfigured` — workspace driver missing required config
- 503 `driver_unavailable`
Workspace-scoped, KB-scoped saved retrieval filters. They are shallow-equal payload constraints applied at search time without requiring the caller to remember the exact JSON. Used by the playground's filter dropdown and by agents that want pre-defined narrowings.
| Method | Path | Purpose |
|---|---|---|
| GET | `/{kb}/filters` | List filters in the KB (paginated) |
| POST | `/{kb}/filters` | Create. Body: `{ knowledgeFilterId?, name, description?, filter }`. 409 on duplicate explicit ID. |
| GET | `/{kb}/filters/{filterId}` | Fetch one |
| PATCH | `/{kb}/filters/{filterId}` | Mutate `name`, `description`, or `filter` |
| DELETE | `/{kb}/filters/{filterId}` | 204 |
filter is the same shape as POST /search's filter body — a
shallow-equal map over payload keys. Filters are seeded from the
workspace's configured defaults at KB-create time.
Request — each record carries exactly one of vector or text:
```json
{
  "records": [
    { "id": "doc-1", "vector": [0.01, -0.02, ...], "payload": { "title": "…" } },
    { "id": "doc-2", "text": "winter sweater in blue" },
    { "id": "doc-3", "text": "summer shorts", "payload": { "tag": "apparel" } }
  ]
}
```

- `records` — 1..500 items per request.
- `id` is the application's identifier; re-upsert replaces the prior value.
- `vector.length` must equal the bound embedding service's `embeddingDimension`.
- Text dispatch mirrors search: the route tries `driver.upsertByText()` for all-text batches (Astra `$vectorize` inserts for collections with a service block). On `NotSupportedError` the runtime embeds each text record via the KB's bound embedding service and retries through plain `upsert`. Mixed batches always embed client-side so the whole batch stays in one transactional call.
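The dispatch decision above can be sketched as a pure classifier. The types and names here are illustrative, not the runtime's own — a sketch of the batch-shape logic only, under the assumption that each record carries exactly one of `vector` or `text`:

```typescript
type UpsertRecord = { id: string; vector?: number[]; text?: string };

// All-text batches try the driver's text path first (which may still
// fall back to client-side embedding on NotSupportedError); mixed
// batches embed client-side up front so one transactional upsert runs.
function upsertStrategy(
  records: UpsertRecord[],
): "upsertByText" | "embedClientSide" | "plainUpsert" {
  const allText = records.every((r) => r.text !== undefined && r.vector === undefined);
  if (allText) return "upsertByText";
  const anyText = records.some((r) => r.text !== undefined);
  return anyText ? "embedClientSide" : "plainUpsert";
}
```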
Response 200
```json
{ "upserted": 2 }
```

- 400 `validation_error` — record has neither/both of `vector` / `text`
- 400 `dimension_mismatch` — vector length doesn't match the bound embedding service's `embeddingDimension`
- 400 `embedding_unavailable` / `embedding_dimension_mismatch`
- 404 `workspace_not_found` / `knowledge_base_not_found`
Delete a single record. recordId is the application's id (any
non-empty string).
```json
{ "deleted": true }
```

Request — exactly one of `vector` or `text`, plus optional
`hybrid` / `lexicalWeight` / `rerank`:
```json
{
  "text": "how do refunds work?",
  "topK": 5,
  "filter": { "section": "billing" },
  "hybrid": true,
  "lexicalWeight": 0.3,
  "rerank": true
}
```

- `topK` defaults to 10, clamped to `[1, 1000]`.
- `filter` is shallow-equal on payload keys.
- `hybrid: true` runs the driver's vector + lexical lane (defaults to the KB's `lexical.enabled`). Requires `text`.
- `rerank: true` reorders hits through the KB's bound reranking service. Defaults to `true` when `rerankingServiceId` is non-null. Requires `text`.
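The defaulting rules above reduce to two small pure functions. A sketch under the stated rules; the names are illustrative, not the runtime's:

```typescript
// topK: default 10, clamped to [1, 1000].
function effectiveTopK(topK: number | undefined): number {
  if (topK === undefined) return 10;
  return Math.min(1000, Math.max(1, topK));
}

// rerank: explicit value wins; otherwise defaults to whether the KB
// has a bound reranking service.
function effectiveRerank(
  rerank: boolean | undefined,
  rerankingServiceId: string | null,
): boolean {
  return rerank ?? rerankingServiceId !== null;
}
```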
The route synthesises a driver-facing descriptor from the KB plus
its bound services (see kb-descriptor.ts) so the dispatch layer
stays unchanged.
Response 200 — array of hits, sorted by score descending:
```json
[
  { "id": "doc-1", "score": 0.94, "payload": { "title": "…" } },
  { "id": "doc-2", "score": 0.87, "payload": { "title": "…" } }
]
```

Score semantics match the bound embedding service's
`distanceMetric`:

| Metric | Score |
|---|---|
| cosine | Cosine similarity in [-1, 1]; 1 = exact match |
| dot | Raw dot product; unbounded |
| euclidean | 1 / (1 + distance) so higher = closer |
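The euclidean row maps raw distance into a similarity-style score so all three metrics sort descending. A sketch of that conversion:

```typescript
// 1 / (1 + distance): distance 0 → score 1; larger distance → smaller
// score, asymptotically approaching 0. Higher always means closer.
function euclideanScore(distance: number): number {
  return 1 / (1 + distance);
}
```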
- 400 `validation_error` — neither/both of `vector` / `text`, or `hybrid` / `rerank` without `text`
- 400 `dimension_mismatch` / `embedding_unavailable` / `embedding_dimension_mismatch`
- 404 `workspace_not_found` / `knowledge_base_not_found`
- 501 `hybrid_not_supported` / `rerank_not_supported`
List RAG documents in the KB.
- 200 — paginated `RagDocument` records
- 404 `workspace_not_found` / `knowledge_base_not_found`
A RagDocument:
```json
{
  "workspaceId": "…",
  "knowledgeBaseId": "…",
  "documentId": "…",
  "sourceDocId": null,
  "sourceFilename": "readme.md",
  "fileType": "text/markdown",
  "fileSize": 1024,
  "contentHash": "sha256:…",
  "chunkTotal": null,
  "ingestedAt": null,
  "updatedAt": "…",
  "status": "pending",
  "errorMessage": null,
  "metadata": { "source": "upload" }
}
```

`status` is one of `pending | chunking | embedding | writing | ready | failed`. The KB ingest pipeline is the canonical writer of
`status` / `errorMessage` / `chunkTotal` / `ingestedAt`. Clients
can also set these directly via PATCH if they own the lifecycle
externally.
Register a document in the KB without running the ingest pipeline.
```json
{
  "sourceFilename": "readme.md",
  "fileType": "text/markdown",
  "fileSize": 1024,
  "contentHash": "sha256:…",
  "metadata": { "source": "upload" }
}
```

- 201 — the created `RagDocument` (`status` defaults to `pending`, `metadata` defaults to `{}`)
- 404 `workspace_not_found` / `knowledge_base_not_found`
- 409 `conflict` — `documentId` collision within the same KB
Fetch / patch / delete. PATCH accepts every field from create (all
optional). DELETE cascades into the KB's collection: chunks
matched by payload.documentId are removed before the row is
dropped, so a successful delete leaves no traces in KB-scoped
search. Drivers exposing deleteRecords use a single bulk call;
older drivers fall back to a listRecords + per-row delete loop.
Lists the chunks the ingest pipeline extracted from this document.
Reads raw records out of the KB's collection filtered on
documentId, sorts by the chunkIndex payload key, and returns:
```json
[
  {
    "id": "<documentId>:0",
    "chunkIndex": 0,
    "text": "First paragraph about apples.",
    "payload": {
      "knowledgeBaseId": "…",
      "documentId": "…",
      "chunkIndex": 0,
      "chunkText": "First paragraph about apples.",
      "source": "seed"
    }
  }
]
```

Query params:

- `limit` (1–1000, default 1000) — caps the number of chunks returned.

- 200 — array of chunks, sorted by `chunkIndex` ascending
- 404 `workspace_not_found` / `knowledge_base_not_found` / `document_not_found`
- 501 `list_records_not_supported` — driver doesn't expose `listRecords`
Synchronous end-to-end ingest. Chunks the input text, embeds every
chunk through the KB's bound embedding service (server-side via
$vectorize where the driver supports it, otherwise client-side),
upserts the chunks into the KB's collection, and creates a
RagDocument row with status: ready + chunkTotal.
Request
```json
{
  "text": "Apples are red. Bananas are yellow.",
  "sourceFilename": "fruit.md",
  "metadata": { "source": "seed" },
  "chunker": { "maxChars": 1000, "minChars": 100, "overlapChars": 150 }
}
```

`chunker` overrides the runtime defaults for this call only.
`metadata` is merged onto every chunk's payload; the reserved keys
`knowledgeBaseId`, `documentId`, `chunkIndex`, and `chunkText` are
always set by the runtime and override any caller-supplied values.
`text` is capped at 200,000 characters.
Response 201
```json
{
  "document": { "status": "ready", "chunkTotal": 3, "...": "..." },
  "chunks": 3
}
```

Chunk payloads. Every chunk upserted carries:

- `knowledgeBaseId` — the KB's ID (used by `/search`)
- `documentId` — the ID of the `RagDocument` row this ingest created
- `chunkIndex` — 0-based position within the source document
- `chunkText` — the chunk's raw text (read back through `/chunks`)
- Plus every caller-supplied `metadata` key
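The merge rule — caller metadata first, reserved keys written last so they always win — can be sketched as a spread. The function name is illustrative:

```typescript
// Caller metadata is spread first; the runtime's reserved keys are
// spread second, so they override any caller-supplied values.
function chunkPayload(
  metadata: Record<string, unknown>,
  reserved: {
    knowledgeBaseId: string;
    documentId: string;
    chunkIndex: number;
    chunkText: string;
  },
): Record<string, unknown> {
  return { ...metadata, ...reserved };
}
```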
Failure semantics. When chunking or upsert throws, the
RagDocument row is marked status: failed with errorMessage
before the error is re-raised.
Same body. The pipeline runs in the background; the response returns immediately with a job pointer.
Response 202
```json
{
  "job": {
    "workspaceId": "…",
    "jobId": "…",
    "kind": "ingest",
    "knowledgeBaseId": "…",
    "documentId": "…",
    "status": "pending",
    "processed": 0,
    "total": null,
    "result": null,
    "errorMessage": null,
    "createdAt": "…",
    "updatedAt": "…"
  },
  "document": { "status": "writing", "…": "…" }
}
```

Errors are the same set as the sync path. A 4xx means the request was rejected outright; nothing was enqueued and no job row exists.
Once the job is running, failures are captured into the job record
(status: failed, errorMessage populated) and the document row.
The runKbIngestJob worker resolves the KB descriptor on every
call so renames or service swaps mid-flight don't drift.
Multipart counterpart to /ingest. Accepts a binary upload (PDF,
DOCX, XLSX, or text) plus optional metadata, dispatches an
extractor based on the file's MIME type / extension, then runs the
same chunk → embed → upsert pipeline.
Form fields:
| Field | Required | Notes |
|---|---|---|
| file | yes | The document bytes. Must be a File part in `multipart/form-data`. |
| metadata | no | JSON object string merged onto every chunk's payload (same semantics as the JSON `/ingest` metadata field). |
| chunker | no | JSON object string overriding the runtime's chunker defaults for this call only. |
| parser | no | `native` \| `docling` \| `auto` (default). When `DOCLING_URL` is unset, `native` is the only option. See Configuration § Document extraction. |
Query: ?async=true → 202 + job pointer (same response shape as
the JSON variant). Body cap is 50 MB.
- 201 — `{ document, chunks }`
- 202 — `{ job, document }` (when `async=true`)
- 400 `invalid_multipart` / `missing_file` — body wasn't multipart, or the `file` field was missing
- 400 `validation_error` — bad `metadata` / `chunker` JSON
- 400 `extractor_unsupported` — file type the runtime can't extract
- 413 `payload_too_large` — body exceeded 50 MB
- 503 `docling_unavailable` — `parser=docling` (or `auto`) couldn't reach the configured docling-serve
Job poll surface for anything that runs in the background. Today only async ingest creates jobs; future bulk ops (reindex, export, batch delete) plug in with the same record shape.
Point-in-time fetch, suitable for polling. Returns the Job
record described above.
- 200 — `Job`
- 404 `job_not_found`
Server-Sent Events stream. Emits event: job with the full record
as JSON on every update, plus a final event: done carrying
{ status } when the job hits a terminal state. The current record
is replayed as the first job event so clients don't race the
first update.
Headers: Content-Type: text/event-stream, Cache-Control: no-cache.
Same-replica updates fan out immediately through the in-process
subscription registry. With the Astra job store, subscribers on
other replicas poll the subscribed job records at
controlPlane.jobPollIntervalMs so an SSE client can see progress
even when the worker is running on a different pod. The memory and
file job stores remain single-replica deployment shapes.
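On the client side, the stream is standard SSE: blank-line-delimited blocks of `event:` / `data:` fields. A minimal sketch of a frame parser covering only the subset this stream emits (a real client would handle `id:`, comments, and partial chunks):

```typescript
// Split an SSE stream buffer into { event, data } frames. Defaults the
// event name to "message" per the SSE spec when no event: field is set.
function parseSseFrames(raw: string): { event: string; data: string }[] {
  return raw
    .split("\n\n")
    .filter((block) => block.trim() !== "")
    .map((block) => {
      let event = "message";
      const data: string[] = [];
      for (const line of block.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data.push(line.slice(5).trim());
      }
      return { event, data: data.join("\n") };
    });
}
```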
| Field | Type | Notes |
|---|---|---|
| workspaceId | uuid | Owning workspace |
| jobId | uuid | |
| kind | `"ingest"` | Discriminator — more kinds arrive with more async ops |
| knowledgeBaseId | uuid or null | Set for ingest jobs |
| documentId | uuid or null | Set for ingest jobs |
| status | `"pending"` \| `"running"` \| `"succeeded"` \| `"failed"` | Terminal: `succeeded`, `failed` |
| processed | int | Units completed |
| total | int or null | Units expected (null if unknown) |
| result | object or null | Kind-specific summary on success (ingest: `{ chunks: N }`) |
| errorMessage | string or null | Populated on `failed` |
| leasedBy | string or null | Replica currently driving the job |
| leasedAt | iso-8601 or null | Last heartbeat from the lease holder |
| ingestInput | object or null | Persisted ingest snapshot used for orphan replay |
| createdAt | iso-8601 | |
| updatedAt | iso-8601 | |
Persistence. The job store auto-matches the control-plane driver:
- `controlPlane.driver: memory` — jobs live in-process (lost on restart).
- `controlPlane.driver: file` — jobs serialize to `<controlPlane.root>/jobs.json` alongside `workspaces.json`, survive restart.
- `controlPlane.driver: astra` — jobs live in `wb_jobs_by_workspace`, reusing the existing Data API connection; durable across restart and across replicas. Subscriptions poll across replicas while local updates still fan out immediately.
Clustered Astra deployments can set
controlPlane.jobsResume.enabled: true. Running workers then stamp
leasedBy / leasedAt; the orphan sweeper claims stale leases and,
when ingestInput is present, replays the ingest pipeline. Chunk IDs
are deterministic, so replay is idempotent. Older jobs without an
input snapshot, or future job kinds that cannot replay yet, are
claimed and marked failed so clients still see a terminal state.
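The sweeper's decision rule reduces to a small predicate. A sketch under the description above; the staleness threshold, types, and names are illustrative, not the runtime's:

```typescript
type SweepAction = "leave" | "replay" | "markFailed";

// Only running jobs are considered. A lease is stale when the last
// heartbeat is older than staleMs (or missing). Stale jobs replay when
// an ingestInput snapshot exists, otherwise they are marked failed so
// clients still see a terminal state.
function sweepAction(
  job: { status: string; leasedAt: string | null; ingestInput: object | null },
  now: Date,
  staleMs: number,
): SweepAction {
  if (job.status !== "running") return "leave";
  const heartbeat = job.leasedAt ? Date.parse(job.leasedAt) : 0;
  if (now.getTime() - heartbeat < staleMs) return "leave"; // lease still fresh
  return job.ingestInput !== null ? "replay" : "markFailed";
}
```

Replay is safe because chunk IDs are deterministic, so re-running the pipeline is idempotent.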
Workspace-scoped LLM execution services — describe how to call a
chat-completion or generation model. Mirrors the
chunking / embedding / reranking service surface. An agent in the
same workspace may bind one of these via agent.llmServiceId; the
agent's send + streaming pipeline then instantiates a chat service
from the bound record.
Today provider: "huggingface" and provider: "openai" are wired
end-to-end; other providers can be created and stored, but agent send
returns 422 llm_provider_unsupported until their adapters land.
List services in the workspace, oldest-first. Paginated.
- 200 — paginated `LlmService` records
- 404 `workspace_not_found`
Create a service. Required: name, provider, modelName.
Optional fields cover endpoint config (endpointBaseUrl,
endpointPath, requestTimeoutMs, authType, credentialRef),
provider tuning (engine, modelVersion, contextWindowTokens,
maxOutputTokens, temperatureMin, temperatureMax,
supportsStreaming, supportsTools, maxBatchSize), and
language / content tags. See the OpenAPI spec for the full shape.
{
"name": "hf-mistral",
"provider": "huggingface",
"modelName": "mistralai/Mistral-7B-Instruct-v0.3",
"credentialRef": "env:HUGGINGFACE_API_KEY",
"maxOutputTokens": 1024
}

- 201 — the created `LlmService`
- 400 `validation_error`
- 404 `workspace_not_found`
- 409 `conflict` — duplicate explicit `llmServiceId`
Fetch / patch / delete. PATCH accepts every field from create
(all optional). DELETE is refused with 409 `conflict` while any
agent still references the service via `llmServiceId`. Reassign
or delete the dependent agents first.
User-defined agents — workspace-scoped personas backed by the
Stage-2 agentic tables. See agents.md for the full
walkthrough; the route shapes are summarised below.
Historical note. Earlier drafts of this document described a parallel
`/chats` route surface and a singleton "Bobbie" agent. Both were retired; the agent surface is the single way to chat against a workspace.
List agents in the workspace, oldest-first. Paginated.
- Body: `CreateAgentInput` (see `agents.md`).
- 201 — `Agent`
- 404 — workspace not found
- 409 — duplicate explicit `agentId`
- 200 — `Agent`
Patch any optional field except `agentId`. Send `null` to clear
nullable fields (including `llmServiceId`).
Returns 204; deletion cascades to the agent's conversations and their messages.
List the agent's conversations, newest-first. Paginated.
- Body: `CreateConversationInput` (`{ conversationId?, title?, knowledgeBaseIds? }`).
- 201 — `Conversation`
- 404 — workspace or agent not found
Single-conversation read / update (title + KB filter) / delete. Delete cascades messages. 404 when the conversation does not belong to the named agent.
Oldest-first message log, paginated.
- 200 — paginated `ChatMessage` records
- 404 when the workspace, agent, or conversation does not exist, or when the conversation does not belong to the named agent
Body: `{ content }`. Persists the user turn, retrieves grounding
context, calls the agent's LLM (per the resolution order below),
persists the assistant turn, and returns:

{ "user": <ChatMessage>, "assistant": <ChatMessage> }

LLM resolution. When `agent.llmServiceId` is set the runtime
instantiates a chat service from the bound LLM-service record.
When unset it falls back to the runtime's global `chat:` block.
- 201 — `{ user, assistant }`
- 404 when the conversation does not belong to the named agent
- 422 `llm_provider_unsupported` — `agent.llmServiceId` points at an LLM service whose `provider` is neither `huggingface` nor `openai`
- 422 `llm_credential_missing` — bound LLM service has no `credentialRef`
- 503 `chat_disabled` — runtime has no global `chat:` block configured and the agent has no `llmServiceId`
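The resolution order and its failure modes can be sketched as follows; the type names and the `resolveChat` helper are illustrative, not the runtime's internals:

```typescript
// Bound LLM-service record (subset of fields relevant here).
interface LlmServiceRecord {
  provider: string;
  credentialRef?: string | null;
}

// The runtime's global chat: block, when configured.
interface GlobalChatConfig {
  provider: string;
}

type Resolution =
  | { kind: "service"; service: LlmServiceRecord }
  | { kind: "global"; chat: GlobalChatConfig };

const SUPPORTED = new Set(["huggingface", "openai"]);

// Mirrors the documented order: bound service first, global chat: block
// second, otherwise the request cannot be served.
function resolveChat(
  boundService: LlmServiceRecord | null,
  globalChat: GlobalChatConfig | null,
): Resolution {
  if (boundService) {
    if (!SUPPORTED.has(boundService.provider)) {
      throw new Error("llm_provider_unsupported"); // surfaces as 422
    }
    if (!boundService.credentialRef) {
      throw new Error("llm_credential_missing"); // surfaces as 422
    }
    return { kind: "service", service: boundService };
  }
  if (globalChat) return { kind: "global", chat: globalChat };
  throw new Error("chat_disabled"); // surfaces as 503
}
```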
Same body. Returns `text/event-stream`:

| Event | Payload | When |
|---|---|---|
| `user-message` | The persisted user `ChatMessage` | Once, after the user turn is persisted |
| `token` | `{ delta: string }` | Per model emission |
| `token-reset` | `{}` | After a tool-call iteration, so the UI can clear pre-tool narration before iteration N+1 streams in |
| `tool-call` | `{ toolName, args, callId }` | The model requested a tool invocation (only on providers with native function calling — today OpenAI; HuggingFace skips this lane) |
| `tool-result` | `{ toolName, callId, result }` | Each tool result fed back into the next iteration |
| `done` | The persisted assistant `ChatMessage` (`metadata.finish_reason: "stop"` / `"length"`) | Terminal on success |
| `error` | The persisted assistant `ChatMessage` with `metadata.finish_reason: "error"` | Terminal on failure |
The stream emits exactly one of `done` / `error`. Tool-use loops
are capped at 6 iterations per turn. Client disconnect is treated
as a clean stop — whatever was already streamed gets persisted
with `finish_reason: "stop"`. Status codes are the same as the
synchronous variant (404 / 422 / 503 surface as `error` events when
they occur after the response has already started).
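A client consuming this stream has to split the response into SSE frames and dispatch on the event name. A minimal parser sketch, assuming the standard `event: <name>` / `data: <json>` framing (real streams may also interleave comments and multi-line `data:` fields that this sketch ignores):

```typescript
interface SseEvent {
  event: string;
  data: unknown;
}

// Split a raw text/event-stream buffer into (event, parsed-data) pairs.
function parseSse(raw: string): SseEvent[] {
  return raw
    .split("\n\n")
    .filter((frame) => frame.trim().length > 0)
    .map((frame) => {
      let event = "message"; // SSE default when no event: line is present
      let data = "";
      for (const line of frame.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data += line.slice(5).trim();
      }
      return { event, data: data ? JSON.parse(data) : null };
    });
}
```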
Catalog of one-click agent templates the UI offers in the agent
gallery. Workspace-scoped for authz, but the body is
workspace-independent and ships with the binary. Returns the four
entries (Bobby, Maven, Quill, Sage) with their `templateId`, `name`,
`description`, persona prompt, and `defaultOnNewWorkspace` flag.
See `agents.md` § Template catalog.
Instantiate a catalog template as a new agent in the workspace.
{ "templateId": "bobby" }The new agent's name, description, and systemPrompt are
copied from the template; other fields default to the same values
as POST /agents. Audit event agent.create carries the
templateId slug.
- 201 — `Agent`
- 400 — unknown `templateId`
- 404 — workspace not found
| Field | Type | Notes |
|---|---|---|
| `workspaceId` | uuid | |
| `agentId` | uuid | Server-assigned unless caller supplied. |
| `name` | string | |
| `description` | string \| null | |
| `systemPrompt` | string \| null | |
| `userPrompt` | string \| null | |
| `llmServiceId` | uuid \| null | When set, points at an LLM service in the same workspace; the agent's chat service is instantiated from that record. When null, the runtime's global `chat:` block is used. Mutable. |
| `knowledgeBaseIds` | uuid[] | Default RAG-grounding set. |
| `ragEnabled` | bool | |
| `ragMaxResults` | int \| null | |
| `ragMinScore` | number \| null | |
| `rerankEnabled` | bool | |
| `rerankingServiceId` | uuid \| null | Agent-level override of the KB-level reranker. |
| `rerankMaxResults` | int \| null | |
| `createdAt` | iso-8601 | |
| `updatedAt` | iso-8601 | |
| Field | Type | Notes |
|---|---|---|
| `workspaceId` | uuid | |
| `agentId` | uuid | |
| `conversationId` | uuid | |
| `title` | string \| null | |
| `knowledgeBaseIds` | uuid[] | Per-conversation override of the agent's default KB set. |
| `createdAt` | iso-8601 | |
| Field | Type | Notes |
|---|---|---|
| `workspaceId` | uuid | |
| `conversationId` | uuid | |
| `messageId` | uuid | |
| `messageTs` | iso-8601 | Cluster key. Strictly increasing within a conversation. |
| `role` | `"user" \| "agent" \| "system" \| "tool"` | `agent` is the assistant turn. |
| `content` | string \| null | |
| `tokenCount` | int \| null | If the provider reports it. |
| `metadata` | `Record<string, string>` | RAG provenance (`context_document_ids`, `context_chunks`), `model`, `finish_reason` (`stop`/`length`/`error`), `error_message`. |
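For client code, the table above translates into a record type plus a small terminal-state check. A sketch matching the documented fields, not generated from the runtime's actual Zod schemas; the `isTerminal` helper is illustrative:

```typescript
type Role = "user" | "agent" | "system" | "tool";

// Shape of a ChatMessage record per the field table.
interface ChatMessage {
  workspaceId: string;
  conversationId: string;
  messageId: string;
  messageTs: string; // ISO-8601; strictly increasing per conversation
  role: Role;
  content: string | null;
  tokenCount: number | null;
  metadata: Record<string, string>;
}

// An assistant turn is terminal once metadata carries a finish_reason.
function isTerminal(msg: ChatMessage): boolean {
  return (
    msg.role === "agent" &&
    ["stop", "length", "error"].includes(msg.metadata.finish_reason ?? "")
  );
}
```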
Optional Model Context Protocol
façade. Speaks Streamable HTTP (the modern MCP transport) with
JSON-RPC payloads. Off by default; enable via
`mcp.enabled: true` in `workbench.yaml`. See `mcp.md` for
the full walkthrough.
| Method | Status | Body |
|---|---|---|
| GET / POST / DELETE / OPTIONS | 200 | JSON-RPC response (or SSE stream for long-running tool calls). The four methods map to the Streamable-HTTP spec — POST for client→server messages, GET for the long-lived event stream, DELETE for session teardown, OPTIONS for CORS preflight. |
| any | 404 `not_found` | When `mcp.enabled` is false |
| any | 404 `workspace_not_found` | When the path workspace doesn't exist |
Tools surfaced (read-mostly, ground external agents in workspace context):

- `list_knowledge_bases`
- `list_documents`
- `search_kb` (vector / hybrid / rerank)
- `list_chats`
- `list_chat_messages`
- `chat_send` (only when `mcp.exposeChat: true` and `chat:` is configured)
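An external client invokes one of these tools by POSTing a JSON-RPC 2.0 `tools/call` request over the Streamable HTTP transport. A sketch of the payload construction; the `toolCallPayload` helper is illustrative, while the method and parameter names follow the MCP specification:

```typescript
// Build a JSON-RPC 2.0 request invoking an MCP tool by name.
function toolCallPayload(
  id: number,
  name: string,
  args: Record<string, unknown>,
) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}
```

This body would be POSTed to the workspace's MCP endpoint with `Content-Type: application/json`; long-running calls come back as an SSE stream per the table above.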
Auth flows through the regular /api/v1/* middleware plus the
shared workspace-route authorization wrapper, so workspace scoping is
enforced before any MCP tool is invoked.
These do not exist yet. Shapes may shift before they land.
`huggingface` and `openai` are wired end-to-end today; `openai` is
the only provider with native function calling, so the agent
tool-use loop only fires for OpenAI-bound agents (HuggingFace
agents still answer, just without tool dispatch). Other providers
(Cohere, Anthropic, Bedrock, …) can be created and stored, but
agent send returns 422 `llm_provider_unsupported` until the
provider is wired into the chat-service factory. Adding a provider
is mostly a one-case addition to the dispatcher.
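The "one-case addition" shape can be sketched as a switch in the chat-service factory; `ChatService` and `createChatService` are illustrative stand-ins for the runtime's internals:

```typescript
// Minimal stand-in for a constructed provider adapter.
interface ChatService {
  provider: string;
  modelName: string;
}

// Adding a provider means adding one case; anything unmatched
// surfaces at the route layer as 422 llm_provider_unsupported.
function createChatService(provider: string, modelName: string): ChatService {
  switch (provider) {
    case "huggingface":
      return { provider, modelName }; // would construct the HF adapter
    case "openai":
      return { provider, modelName }; // would construct the OpenAI adapter
    default:
      throw new Error("llm_provider_unsupported");
  }
}
```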
`/api/v1/workspaces/{w}/mcp-tools` — CRUD over the
`wb_config_mcp_tools_by_workspace` rows, plus
`/api/v1/workspaces/{w}/agents/{a}/run` for an agent execution loop
with tool use. Now that the MCP server façade is in, the inverse —
letting an agent call MCP tools — is the next step.
See roadmap.md for the phase plan.
The generated document at /api/v1/openapi.json is always in sync
with the running runtime (routes register their Zod schemas directly).
Share it with downstream tooling (client generators, API gateway
configs, etc.).
To consume locally:
curl -s http://localhost:8080/api/v1/openapi.json > openapi.json