API Spec

The AI Workbench HTTP contract. Every runtime (the default TypeScript runtime and any future language-native runtime) serves this surface, and cross-runtime conformance fixtures enforce it.

The machine-readable OpenAPI document is served at /api/v1/openapi.json, and a Scalar-rendered reference UI is served at /docs. This document exists to explain the shape narratively and to flag what's coming.

Conventions

Base URL and versioning

  • Functional routes live under /api/v1/….
  • Operational routes (/, /healthz, /readyz, /version, /features, /metrics, /astra-cli, /astra-cli/profiles, /docs, /api/v1/openapi.json) are unversioned.
  • Breaking changes bump the prefix to /api/v2/…; /api/v1/… stays until deprecated.

Content type

  • Request and response bodies are JSON (application/json).
  • Streaming endpoints use text/event-stream. Today that means async-ingest job progress at GET /api/v1/workspaces/{workspaceId}/jobs/{jobId}/events.

Identifiers

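  • Resource IDs (workspaceId, knowledgeBaseId, documentId, jobId, and so on) are RFC 4122 v4 UUIDs rendered as lowercase hyphenated strings. Record IDs inside a KB collection are the exception: they are application-supplied strings.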
  • Timestamps are ISO-8601 in UTC with millisecond precision (2026-04-22T10:11:12.345Z).
  • Secrets never appear by value. Fields like credentials or embedding.secretRef hold pointers of the form <provider>:<path> (e.g. env:ASTRA_DB_APPLICATION_TOKEN).

Resource scoping

Every nested resource carries its parent IDs in the path:

/api/v1/workspaces/{workspaceId}
/api/v1/workspaces/{workspaceId}/knowledge-bases/{knowledgeBaseId}
/api/v1/workspaces/{workspaceId}/knowledge-bases/{knowledgeBaseId}/documents/{documentId}
/api/v1/workspaces/{workspaceId}/{chunking,embedding,reranking}-services/{serviceId}

A request whose path references a non-existent workspace returns 404 workspace_not_found before the nested resource is ever consulted.

Pagination

Control-plane list endpoints accept:

  • limit — number of items to return, 1–200, default 50.
  • cursor — opaque value from the previous page's nextCursor.

Paginated responses use:

{
  "items": [],
  "nextCursor": null
}

When nextCursor is non-null, pass it back as ?cursor=... to read the next page. Malformed cursors return 400 invalid_cursor.
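Draining a list therefore means feeding each nextCursor back until it comes back null. A minimal client-side sketch; fetchPage is a hypothetical caller-supplied function that performs one GET with the given cursor and limit, and pageQuery shows how those two values map onto the query string:

```typescript
// Shape of every control-plane list response.
interface Page<T> {
  items: T[];
  nextCursor: string | null;
}

// Hypothetical page loader: performs one GET with ?limit=...&cursor=...
type FetchPage<T> = (cursor: string | null, limit: number) => Promise<Page<T>>;

// Yields every item across all pages, following nextCursor until it is null.
async function* listAll<T>(fetchPage: FetchPage<T>, limit = 50): AsyncGenerator<T> {
  if (limit < 1 || limit > 200) throw new RangeError("limit must be in 1..200");
  let cursor: string | null = null;
  do {
    const page: Page<T> = await fetchPage(cursor, limit);
    yield* page.items;
    cursor = page.nextCursor;
  } while (cursor !== null);
}

// Builds the query string for one page; the cursor is opaque and passed back verbatim.
function pageQuery(cursor: string | null, limit: number): string {
  const params = new URLSearchParams({ limit: String(limit) });
  if (cursor !== null) params.set("cursor", cursor);
  return `?${params.toString()}`;
}
```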

Error envelope

All error responses share one envelope:

{
  "error": {
    "code": "workspace_not_found",
    "message": "workspace '<workspaceId>' not found",
    "requestId": "b48e…"
  }
}

Codes are stable, lowercase, snake_case. Messages are human-readable and may change. Currently emitted:

| Status | Code | When |
| --- | --- | --- |
| 400 | validation_error | Request body, params, or query failed Zod validation. message carries the first failing field path and its reason (e.g. name: Name is required, or credentials.token: expected '<provider>:<path>', e.g. 'env:FOO'). |
| 400 | invalid_cursor | Malformed pagination cursor |
| 400 | dimension_mismatch | Supplied vector length doesn't match the KB's bound embedding service |
| 400 | embedding_unavailable | Text search/upsert fallback could not build an embedder for the KB's bound embedding service |
| 400 | embedding_dimension_mismatch | Embedder output dimension doesn't match the bound embedding service |
| 400 | invalid_multipart / missing_file | File-ingest body wasn't multipart, or the file field was missing |
| 400 | extractor_unsupported | File-ingest upload has a type the runtime can't extract |
| 401 | unauthorized | Missing, malformed, or invalid bearer token. WWW-Authenticate: Bearer is set. See auth.md. |
| 403 | forbidden | Token is valid but not authorized for the requested action: either the subject's workspaceScopes doesn't include the target workspace, or a scoped subject is attempting a platform-level action (e.g. POST /workspaces). Also reserved for role-based checks in the upcoming RBAC phase. |
| 404 | not_found | Unknown route |
| 404 | workspace_not_found | Workspace ID doesn't exist |
| 404 | knowledge_base_not_found | Knowledge-base ID doesn't exist in the workspace |
| 404 | document_not_found | Document ID doesn't exist in the knowledge base |
| 404 | chunking_service_not_found / embedding_service_not_found / reranking_service_not_found | Service ID doesn't exist in the workspace |
| 404 | api_key_not_found | API key ID doesn't exist in the workspace |
| 404 | job_not_found | Job ID doesn't exist in the workspace |
| 409 | conflict | Create with an already-taken ID, or service deletion refused while a KB still references it |
| 413 | payload_too_large | A /api/v1/workspaces/* request body exceeded the runtime's 10 MB default JSON body limit, or an ingest request exceeded the 50 MB ingest-only limit |
| 422 | workspace_misconfigured | Workspace is missing url, token, keyspace, or similar driver-required config |
| 422 | llm_provider_unsupported | Agent send targets an LLM service whose provider has no adapter yet |
| 500 | internal_error | Unhandled exception |
| 501 | hybrid_not_supported | Caller asked for hybrid search on a workspace kind whose driver doesn't implement searchHybrid |
| 501 | rerank_not_supported | Caller asked for rerank on a workspace kind whose driver doesn't implement rerank |
| 501 | list_records_not_supported | Driver doesn't expose listRecords (needed to list a document's chunks) |
| 503 | control_plane_unavailable | Backing store is unreachable |
| 503 | collection_unavailable | Underlying vector collection is unreachable or missing |
| 503 | driver_unavailable | Workspace kind has no registered vector-store driver |
| 503 | docling_unavailable | File ingest with parser=docling (or auto) couldn't reach the configured docling-serve |
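Because codes are stable and messages are not, clients should branch on error.code. A minimal sketch of turning a non-2xx body into a typed error; WorkbenchError, toWorkbenchError, and the unknown_error fallback code are hypothetical client-side names, not part of the API:

```typescript
// Canonical error envelope returned by every non-2xx response.
interface ErrorEnvelope {
  error: { code: string; message: string; requestId: string };
}

// Carries the HTTP status plus the stable code; branch on code, never on message.
class WorkbenchError extends Error {
  readonly status: number;
  readonly code: string;
  readonly requestId: string;
  constructor(status: number, code: string, message: string, requestId: string) {
    super(message);
    this.name = "WorkbenchError";
    this.status = status;
    this.code = code;
    this.requestId = requestId;
  }
}

function isErrorEnvelope(body: unknown): body is ErrorEnvelope {
  if (typeof body !== "object" || body === null) return false;
  const err = (body as { error?: unknown }).error;
  return typeof err === "object" && err !== null && typeof (err as { code?: unknown }).code === "string";
}

// Falls back to a generic code when the body is not the canonical envelope
// (e.g. an HTML error page from a proxy in front of the runtime).
function toWorkbenchError(status: number, body: unknown, headerRequestId = ""): WorkbenchError {
  if (isErrorEnvelope(body)) {
    const { code, message, requestId } = body.error;
    return new WorkbenchError(status, code, message ?? "", requestId ?? headerRequestId);
  }
  return new WorkbenchError(status, "unknown_error", `HTTP ${status}`, headerRequestId);
}
```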

Authentication

/api/v1/* runs through a configurable auth middleware. The default posture (auth.mode: disabled) tags every request as anonymous and lets it through, the same behavior as before the middleware existed. Flip auth.mode to turn enforcement on. See auth.md for the full contract, config, and rollout plan.

Header format is Authorization: Bearer <token> (RFC 6750). On failure the response carries WWW-Authenticate: Bearer and the canonical error envelope:

{ "error": { "code": "unauthorized", "message": "", "requestId": "" } }

Operational routes (/, /healthz, /readyz, /version, /features, /metrics, /astra-cli, /astra-cli/profiles, /docs, /api/v1/openapi.json) bypass the middleware so load balancers and ops tooling can always reach them.

API-key issuance, OIDC bearer verification, browser OIDC login, and silent token refresh are all implemented. All verifier modes flow through the same middleware — routes don't need to care which verifier accepted the token. Browser-only /auth/* routes (/auth/config, /auth/login, /auth/callback, /auth/me, /auth/refresh, /auth/logout) are documented in auth.md rather than here.

Request ID

Every response carries X-Request-Id. If the client supplies one, the runtime echoes it; otherwise the runtime generates a UUID-hex string. Error responses include the same value in error.requestId.
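A client that wants to correlate its own logs with the runtime's can mint the request ID itself and send it with the bearer token. A minimal sketch; workbenchHeaders is a hypothetical helper, and the token is whatever the caller obtained through the flows in auth.md:

```typescript
import { randomUUID } from "node:crypto";

// Builds headers for /api/v1/* calls. Supplying X-Request-Id lets the caller
// log the same ID the runtime echoes back and places in error.requestId.
function workbenchHeaders(token: string | null, requestId: string = randomUUID()): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    "X-Request-Id": requestId,
  };
  // RFC 6750 bearer format; may be omitted while auth.mode is disabled.
  if (token !== null) headers["Authorization"] = `Bearer ${token}`;
  return headers;
}
```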


Operational routes

GET /

Service banner.

Response 200

{
  "name": "ai-workbench",
  "version": "0.0.0",
  "commit": "abc1234",
  "docs": "/docs"
}

GET /healthz

Liveness. Returns 200 as long as the process is running.

{ "status": "ok" }

GET /readyz

Readiness. 200 once the control-plane store is reachable and workspaces can be listed. The payload carries a workspace count rather than a list, so the response stays small as the store grows.

{ "status": "ready", "workspaces": 3 }

Returns 503 draining during graceful shutdown (SIGINT / SIGTERM), so Kubernetes-style readiness probes take the replica out of rotation while the runtime finishes in-flight requests. See configuration.md for the drain sequence. /healthz stays 200 throughout so the livenessProbe doesn't restart a healthy, draining process.

GET /version

Build metadata.

{
  "version": "0.0.0",
  "commit": "abc1234",
  "buildTime": "2026-04-21T10:30:00Z",
  "node": "v22.11.0"
}

GET /features

Runtime feature flags the bundled web UI reads to decide which surfaces to render. Reflects the active config (chat enabled, MCP enabled, auth posture, astra-cli inventory available, etc.). Never echoes secrets.

GET /metrics

Prometheus exposition (text/plain; version=0.0.4). HTTP request counter + duration histogram labeled by method, matched route pattern, and status family (2xx/4xx/5xx); ingest semaphore gauges (workbench_ingest_workers_{active,queued}); rate-limit rejections by key type. No auth — same precedent as /healthz / /readyz.

GET /astra-cli

Auto-detected astra CLI defaults the runtime resolved at boot (active profile, default org, default DB id + name + endpoint, etc.). The web UI reads this to pre-fill the workspace onboarding form. Returns an empty payload when no CLI / profile is configured.

GET /astra-cli/profiles

Live shellout: lists every configured astra CLI profile and the databases visible to each. Drives the profile picker in the onboarding wizard. May take seconds depending on Astra API latency; not part of the hot path.

GET /docs

Scalar-rendered OpenAPI reference UI. Human-facing.

GET /api/v1/openapi.json

Machine-readable OpenAPI 3.1 document. Generated from the route definitions — always in sync with the running runtime.


/api/v1/workspaces

GET /api/v1/workspaces

List all workspaces, sorted by createdAt ascending with workspaceId as tie-breaker. Every backend (memory / file / astra) produces the same ordering so UI renders are deterministic.

Response 200 — paginated Workspace records:

{
  "items": [
    {
      "workspaceId": "",
      "name": "prod",
      "url": "env:ASTRA_DB_API_ENDPOINT",
      "kind": "astra",
      "credentials": { "token": "env:ASTRA_DB_APPLICATION_TOKEN" },
      "keyspace": "default_keyspace",
      "createdAt": "2026-04-22T10:11:12.345Z",
      "updatedAt": "2026-04-22T10:11:12.345Z"
    }
  ],
  "nextCursor": null
}

POST /api/v1/workspaces

Create a workspace. workspaceId is optional — the runtime generates one if omitted.

Request

{
  "name": "prod",
  "kind": "astra",
  "url": "env:ASTRA_DB_API_ENDPOINT",
  "credentials": { "token": "env:ASTRA_DB_APPLICATION_TOKEN" },
  "keyspace": "default_keyspace"
}

kind is one of astra | hcd | openrag | mock. (mock stays a first-class option for CI and offline work.) Once set, kind is immutable — changing it would orphan any already-provisioned KB collections.

url is the workspace's data-plane URL (for astra / hcd, the Astra Data API endpoint). Accepts either a literal URL or a SecretRef — the driver resolves refs at dial time so the same record works across dev and prod without code changes.

Each value in credentials must be a SecretRef (<provider>:<path>, e.g. env:ASTRA_DB_APPLICATION_TOKEN or file:/etc/workbench/secrets/astra-token). Raw secret values are rejected with 400.
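A client can catch raw secrets before the runtime rejects them by checking the <provider>:<path> shape locally. A minimal sketch that recognises only the env and file providers shown in this document (the runtime may accept others); parseSecretRef and resolveSecretRef are hypothetical names, and trimming file contents is an assumption:

```typescript
import { readFileSync } from "node:fs";

type SecretRef = { provider: "env" | "file"; path: string };

// Splits "<provider>:<path>" at the first colon. Only env and file are
// recognised here, so a literal URL such as "https://..." is not treated as a ref.
function parseSecretRef(value: string): SecretRef | null {
  const idx = value.indexOf(":");
  if (idx <= 0 || idx === value.length - 1) return null;
  const provider = value.slice(0, idx);
  const path = value.slice(idx + 1);
  if (provider !== "env" && provider !== "file") return null;
  return { provider, path };
}

// Resolves a ref roughly the way a driver would at dial time (sketch only).
function resolveSecretRef(ref: SecretRef, env: Record<string, string | undefined> = process.env): string {
  if (ref.provider === "env") {
    const v = env[ref.path];
    if (v === undefined) throw new Error(`env var '${ref.path}' is not set`);
    return v;
  }
  // Assumption: a trailing newline in a mounted secret file is not part of the value.
  return readFileSync(ref.path, "utf8").trim();
}
```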

Response 201 — the created Workspace.

GET /api/v1/workspaces/{workspaceId}

Fetch a single workspace.

  • 200: Workspace
  • 404 workspace_not_found

PATCH /api/v1/workspaces/{workspaceId}

Patch one or more of name, url, credentials, keyspace. Every field is optional; omitted fields are preserved.

kind and workspaceId are immutable after creation and are rejected with 400. Unknown fields are likewise rejected (strict body).

  • 200 — updated Workspace
  • 400: body contains kind, workspaceId, or an unknown field
  • 404 workspace_not_found

DELETE /api/v1/workspaces/{workspaceId}

Cascades to the workspace's knowledge bases, execution services, RAG documents, and API keys. Before removing the control-plane rows, the runtime drops each owned KB's underlying collection through the workspace's driver; attached KBs are detached without dropping the external collection.

  • 204 — deleted
  • 404 workspace_not_found
  • 503 driver_unavailable — workspace has knowledge bases but no registered driver to drop their collections

POST /api/v1/workspaces/{workspaceId}/test-connection

Run a live workspace connection check. For mock workspaces, this always returns ok: true. Remote backends resolve their configured connection details and ask the driver to make a data-plane call.

Response 200 — always 200 regardless of check outcome; the ok field distinguishes success from failure:

{
  "ok": true,
  "kind": "astra",
  "details": "Astra Data API responded to listCollections."
}
{
  "ok": false,
  "kind": "astra",
  "details": "credential 'token' could not be resolved: env var 'ASTRA_DB_APPLICATION_TOKEN' is not set"
}
  • 200 — probe executed; inspect ok for pass/fail
  • 404 workspace_not_found

/api/v1/workspaces/{workspaceId}/api-keys

Workspace-scoped bearer tokens. Documented in auth.md; recapped here for the route contract.

GET

List every key ever issued for the workspace, including revoked ones. Never exposes the hash column.

An ApiKey:

{
  "workspaceId": "",
  "keyId": "",
  "prefix": "abc123xyz789",
  "label": "ci",
  "createdAt": "",
  "lastUsedAt": null,
  "revokedAt": null,
  "expiresAt": null
}
  • 200 — paginated ApiKey records
  • 404 workspace_not_found

POST

Issue a new key. The plaintext is returned exactly once — the runtime stores only a scrypt digest.

Request

{ "label": "ci", "expiresAt": null }

Response 201

{
  "plaintext": "wb_live_abc123xyz789_…",
  "key": { "...ApiKey..." }
}
  • 201 — created; plaintext is the only time you'll see the token
  • 400 — missing / empty label
  • 404 workspace_not_found
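Keeping only a digest means the runtime can verify a presented token but can never display it again. The sketch below shows that general technique with Node's built-in scrypt; the 16-byte salt, 32-byte digest, and saltHex:hashHex storage encoding are assumptions for illustration, not the runtime's actual storage format:

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

const KEY_LEN = 32; // assumed digest length in bytes

// Hashes a plaintext token with a fresh random salt; returns "saltHex:hashHex".
function hashApiKey(plaintext: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(plaintext, salt, KEY_LEN);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Re-derives the digest from the presented token and compares in constant time.
function verifyApiKey(plaintext: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  // Reject malformed stored values instead of comparing zero-length buffers.
  if (expected.length !== KEY_LEN) return false;
  const actual = scryptSync(plaintext, Buffer.from(saltHex, "hex"), KEY_LEN);
  return timingSafeEqual(actual, expected);
}
```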

DELETE /{keyId}

Soft-revoke: stamps revokedAt, leaves the row visible so audit tools still see the history. The next request bearing this token gets 401 unauthorized. Re-revoking an already-revoked key is a no-op that still returns 204.

  • 204 — revoked (or was already revoked)
  • 404 workspace_not_found / api_key_not_found

/api/v1/workspaces/{workspaceId}/{chunking,embedding,reranking}-services

Workspace-scoped execution services. Knowledge bases compose one chunking + one embedding + (optionally) one reranking service at create time. The three surfaces share an identical CRUD shape; only the body fields differ.

GET

List services in the workspace.

  • 200 — paginated ChunkingService / EmbeddingService / RerankingService records (sorted by createdAt ascending, *ServiceId as tie-breaker)
  • 404 workspace_not_found

POST

Create a service. The runtime generates the service ID if omitted. Required fields by kind:

| Kind | Required fields |
| --- | --- |
| chunking | name, engine |
| embedding | name, provider, modelName, embeddingDimension |
| reranking | name, provider, modelName |

Optional fields cover endpoint config (endpointBaseUrl, endpointPath, requestTimeoutMs, authType, credentialRef), provider/engine tuning, and supported language/content tags. See the OpenAPI spec for the full per-kind shape.

{
  "name": "openai-3-small",
  "provider": "openai",
  "modelName": "text-embedding-3-small",
  "embeddingDimension": 1536,
  "distanceMetric": "cosine",
  "endpointBaseUrl": "https://api.openai.com/v1",
  "credentialRef": "env:OPENAI_API_KEY",
  "supportedLanguages": ["en", "fr"],
  "supportedContent": ["text"]
}

supportedLanguages and supportedContent arrive as arrays and are returned deduplicated + sorted on the wire. (Astra-row layer keeps them as SET<TEXT>; the converter normalises at the boundary.)

  • 201 — the created record (with the generated *ServiceId)
  • 400 validation_error — schema failure
  • 404 workspace_not_found
  • 409 conflict: *ServiceId collision
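Because the server returns supportedLanguages and supportedContent as deduplicated, sorted arrays, a client comparing its local copy against the response should normalise the same way. A minimal sketch, assuming plain code-unit sort order (the runtime's exact collation isn't specified here); both helper names are hypothetical:

```typescript
// Mirrors the wire contract: tag arrays come back deduplicated and sorted,
// matching SET<TEXT> semantics in the Astra row layer.
function normalizeTags(tags: readonly string[] | undefined): string[] {
  return [...new Set(tags ?? [])].sort();
}

// Compares a locally held list with the server's, ignoring order and duplicates.
function sameTagSet(a: readonly string[], b: readonly string[]): boolean {
  const x = normalizeTags(a);
  const y = normalizeTags(b);
  return x.length === y.length && x.every((t, i) => t === y[i]);
}
```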

GET /{serviceId} / PATCH /{serviceId} / DELETE /{serviceId}

Fetch / patch / delete. PATCH accepts every field from create (all optional). Strict bodies — unknown keys return 400.

DELETE is refused with 409 conflict while any KB still references the service. Drop or rebind the dependent KBs first. The error message names the offending KB so operators can navigate straight to it.


/api/v1/workspaces/{workspaceId}/knowledge-bases

Knowledge base provisioning

A knowledge base is the runtime's atomic retrieval unit: a logical group of documents indexed by exactly one embedding service and one chunking service, optionally re-ranked by one reranker. Creating a KB through POST does four things in lockstep:

  1. Validate the requested collection shape. Owned KBs use the KB name as the underlying collection identifier. Attach-mode KBs (attach: true) must supply vectorCollection, and the supplied value must equal name so the KB row and data-plane collection cannot drift apart.
  2. Insert the control-plane row. The KnowledgeBase record is written before owned collection provisioning; if provisioning fails, the runtime rolls the row back so callers never observe a KB that points at a missing collection.
  3. Materialize the underlying vector collection on the workspace's driver. The driver (mock for tests, astra for production) creates a collection sized for the bound embedding service's embeddingDimension with the requested vectorSimilarity. For Astra workspaces with an astra-provider embedding service, the collection is provisioned with a service: block so embedding runs server-side (see Configuration §Vectorize-on-ingest). Attach mode skips this step and binds to the existing data-plane collection after validating compatibility.
  4. Seed any default knowledge filters declared on the workspace. Filters can be changed after creation through the /{kb}/filters routes (e.g. PATCH /{kb}/filters/{filterId}).

Collection naming. Owned KBs derive vectorCollection from name, and the KB name must match Astra collection-name rules (letters, digits, underscores; starts with a letter; max 48 chars). To adopt a pre-existing collection, set attach: true and supply that collection name as both name and vectorCollection; the driver verifies its dimension and vectorize provider/model match the bound embedding service before the row is accepted. Renaming after create is not supported because the name is the collection identifier.
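The naming rules above translate into a simple pattern check that a client can run before POST to avoid a round-trip. A sketch assuming ASCII letters; isValidKbName and attachBody are hypothetical helpers, and the driver still performs the dimension and vectorize compatibility checks server-side:

```typescript
// Letters, digits, underscores; must start with a letter; at most 48 characters.
const COLLECTION_NAME = /^[A-Za-z][A-Za-z0-9_]{0,47}$/;

function isValidKbName(name: string): boolean {
  return COLLECTION_NAME.test(name);
}

// Attach mode: name and vectorCollection must be the same existing collection.
function attachBody(collection: string, embeddingServiceId: string, chunkingServiceId: string) {
  if (!isValidKbName(collection)) throw new Error(`invalid collection name: ${collection}`);
  return { name: collection, vectorCollection: collection, attach: true, embeddingServiceId, chunkingServiceId };
}
```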

Idempotence. POST is not idempotent on its own — re-issuing the same request creates a second KB with a fresh knowledgeBaseId. To make creation safe to retry, supply an explicit knowledgeBaseId in the body; if the row already exists with the same name and service bindings, the route returns 409 conflict rather than mutating the existing KB. Drop the KB explicitly before re-creating.

Dimension binding. The bound embedding service's embeddingDimension is captured into the collection at create time and is not re-checked on subsequent ingest / search calls — the driver trusts the collection's dimension. Changing the embedding service binding via PATCH is rejected (the field is immutable) because the collection's stored vectors would no longer match the new service's dimension.

Cascade on DELETE. For owned KBs the route drops the underlying collection before the control-plane row, so a failed drop leaves the KB intact. Once the collection is gone, the row is removed and the cascade clears RAG documents, knowledge filters, and any references in agent.knowledgeBaseIds and conversation.knowledgeBaseIds.

GET

List knowledge bases in the workspace.

  • 200 — paginated KnowledgeBase records
  • 404 workspace_not_found

A KnowledgeBase:

{
  "workspaceId": "",
  "knowledgeBaseId": "",
  "name": "support_docs",
  "description": "customer support knowledge base",
  "status": "active",
  "embeddingServiceId": "",
  "chunkingServiceId": "",
  "rerankingServiceId": null,
  "language": "en",
  "vectorCollection": "support_docs",
  "lexical": { "enabled": false, "analyzer": null, "options": {} },
  "createdAt": "",
  "updatedAt": ""
}

POST

Create a KB and auto-provision its underlying Astra collection. Transactional — if collection provisioning fails, the KB row is rolled back so the control plane and data plane never drift.

For owned KBs, omit vectorCollection; the runtime uses name as the collection name. To adopt a pre-existing collection, set attach: true and supply the same collection name in both name and vectorCollection.

Request

{
  "name": "support_docs",
  "description": "customer support",
  "embeddingServiceId": "",
  "chunkingServiceId": "",
  "rerankingServiceId": null,
  "language": "en"
}

embeddingServiceId and chunkingServiceId are required. Both must reference services that exist in the same workspace.

  • 201 — the created KnowledgeBase (collection now exists)
  • 404 workspace_not_found / embedding_service_not_found / chunking_service_not_found / reranking_service_not_found
  • 409 conflict: knowledgeBaseId collision
  • 422 workspace_misconfigured — workspace is missing url or credentials.token required by its driver
  • 503 driver_unavailable — no driver registered for the workspace's kind

GET /{knowledgeBaseId} / PATCH /{knowledgeBaseId} / DELETE /{knowledgeBaseId}

GET reads the record. PATCH accepts a partial — description, status, rerankingServiceId, language, and lexical are mutable; name, embeddingServiceId, and chunkingServiceId are immutable post-create and the schema is .strict(), so accidentally including them in a body returns 400. DELETE drops the underlying collection first for owned KBs, then the KB row, then cascades RAG document rows. Attached KBs detach without dropping the external collection.

GET /api/v1/workspaces/{workspaceId}/adoptable-collections

Discover Astra collections in the workspace's keyspace that aren't already bound to a knowledge base. The web UI uses this to populate the "attach an existing collection" picker on the create-KB flow.

  • 200: { "items": [ { "name": string, "vectorDimension": number | null, "vectorMetric": string | null } ] }
  • 404 workspace_not_found
  • 422 workspace_misconfigured — workspace driver missing required config
  • 503 driver_unavailable

Knowledge filters — …/knowledge-bases/{kb}/filters

Workspace-scoped, KB-scoped saved retrieval filters. They are shallow-equal payload constraints applied at search time without requiring the caller to remember the exact JSON. Used by the playground's filter dropdown and by agents that want pre-defined narrowings.

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /{kb}/filters | List filters in the KB (paginated) |
| POST | /{kb}/filters | Create. Body: { knowledgeFilterId?, name, description?, filter }. 409 on a duplicate explicit ID. |
| GET | /{kb}/filters/{filterId} | Fetch one |
| PATCH | /{kb}/filters/{filterId} | Mutate name, description, or filter |
| DELETE | /{kb}/filters/{filterId} | Delete; returns 204 |

filter is the same shape as POST /search's filter body — a shallow-equal map over payload keys. Filters are seeded from the workspace's configured defaults at KB-create time.

POST /{knowledgeBaseId}/records — upsert records

Request — each record carries exactly one of vector or text:

{
  "records": [
    { "id": "doc-1", "vector": [0.01, -0.02, ...], "payload": { "title": "" } },
    { "id": "doc-2", "text": "winter sweater in blue" },
    { "id": "doc-3", "text": "summer shorts", "payload": { "tag": "apparel" } }
  ]
}
  • records — 1..500 items per request.
  • id is the application's identifier; re-upsert replaces the prior value.
  • vector.length must equal the bound embedding service's embeddingDimension.
  • Text dispatch mirrors search: the route tries driver.upsertByText() for all-text batches (Astra $vectorize inserts for collections with a service block). On NotSupportedError the runtime embeds each text record via the KB's bound embedding service and retries through plain upsert. Mixed batches always embed client-side so the whole batch stays in one transactional call.

Response 200

{ "upserted": 3 }
  • 400 validation_error — record has neither/both of vector/text
  • 400 dimension_mismatch — vector length doesn't match the bound embedding service's embeddingDimension
  • 400 embedding_unavailable / embedding_dimension_mismatch
  • 404 workspace_not_found / knowledge_base_not_found
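Large imports have to respect the 500-record cap and the per-record vector/text rules. A minimal client-side batching sketch; planUpserts is hypothetical, and keeping text-only records in their own batches is a client choice (so those batches stay eligible for the server-side upsertByText path), not something the API requires:

```typescript
type VectorRecord = { id: string; vector: number[]; payload?: Record<string, unknown> };
type TextRecord = { id: string; text: string; payload?: Record<string, unknown> };
type UpsertRecord = VectorRecord | TextRecord;

const MAX_RECORDS_PER_REQUEST = 500;

// Enforces exactly one of vector/text and the bound embeddingDimension, then
// splits text-only and vector-only records into separate request-sized batches.
function planUpserts(records: readonly UpsertRecord[], embeddingDimension: number): UpsertRecord[][] {
  const text: UpsertRecord[] = [];
  const vec: UpsertRecord[] = [];
  for (const r of records) {
    const hasVector = "vector" in r && r.vector !== undefined;
    const hasText = "text" in r && r.text !== undefined;
    if (hasVector === hasText) throw new Error(`record ${r.id}: needs exactly one of vector or text`);
    if (hasVector) {
      if ((r as VectorRecord).vector.length !== embeddingDimension) {
        throw new Error(`record ${r.id}: dimension_mismatch`);
      }
      vec.push(r);
    } else {
      text.push(r);
    }
  }
  const batches: UpsertRecord[][] = [];
  for (const group of [text, vec]) {
    for (let i = 0; i < group.length; i += MAX_RECORDS_PER_REQUEST) {
      batches.push(group.slice(i, i + MAX_RECORDS_PER_REQUEST));
    }
  }
  return batches;
}
```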

DELETE /{knowledgeBaseId}/records/{recordId}

Delete a single record. recordId is the application's id (any non-empty string).

{ "deleted": true }

POST /{knowledgeBaseId}/search — vector or text search

Request — exactly one of vector or text, plus optional hybrid / lexicalWeight / rerank:

{
  "text": "how do refunds work?",
  "topK": 5,
  "filter": { "section": "billing" },
  "hybrid": true,
  "lexicalWeight": 0.3,
  "rerank": true
}
  • topK defaults to 10, clamped to [1, 1000].
  • filter is shallow-equal on payload keys.
  • hybrid: true runs the driver's vector + lexical lane (defaults to the KB's lexical.enabled). Requires text.
  • rerank: true reorders hits through the KB's bound reranking service. Defaults to true when rerankingServiceId is non-null. Requires text.
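The request rules above can be mirrored client-side to fail fast. A sketch; resolveSearch and KbSearchDefaults are hypothetical, and it assumes (the spec doesn't say) that the KB-level hybrid/rerank defaults only apply to text queries, so a vector query without explicit flags is not rejected:

```typescript
interface SearchRequest {
  vector?: number[];
  text?: string;
  topK?: number;
  filter?: Record<string, unknown>;
  hybrid?: boolean;
  lexicalWeight?: number;
  rerank?: boolean;
}

interface KbSearchDefaults {
  lexicalEnabled: boolean; // the KB's lexical.enabled
  rerankingServiceId: string | null; // the KB's bound reranker, if any
}

// Exactly one of vector/text; topK defaults to 10 and is clamped to [1, 1000];
// explicit hybrid/rerank without text is a validation error.
function resolveSearch(req: SearchRequest, kb: KbSearchDefaults) {
  const hasVector = req.vector !== undefined;
  const hasText = req.text !== undefined;
  if (hasVector === hasText) throw new Error("validation_error: exactly one of vector or text");
  if (!hasText && (req.hybrid === true || req.rerank === true)) {
    throw new Error("validation_error: hybrid/rerank require text");
  }
  const topK = Math.min(1000, Math.max(1, req.topK ?? 10));
  return {
    ...req,
    topK,
    hybrid: hasText ? (req.hybrid ?? kb.lexicalEnabled) : false,
    rerank: hasText ? (req.rerank ?? kb.rerankingServiceId !== null) : false,
  };
}
```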

The route synthesises a driver-facing descriptor from the KB plus its bound services (see kb-descriptor.ts) so the dispatch layer stays unchanged.

Response 200 — array of hits, sorted by score descending:

[
  { "id": "doc-1", "score": 0.94, "payload": { "title": "" } },
  { "id": "doc-2", "score": 0.87, "payload": { "title": "" } }
]

Score semantics match the bound embedding service's distanceMetric:

| Metric | Score |
| --- | --- |
| cosine | Cosine similarity in [-1, 1]; 1 = exact match |
| dot | Raw dot product; unbounded |
| euclidean | 1 / (1 + distance), so higher = closer |

  • 400 validation_error — neither/both of vector/text, or hybrid/rerank without text
  • 400 dimension_mismatch / embedding_unavailable / embedding_dimension_mismatch
  • 404 workspace_not_found / knowledge_base_not_found
  • 501 hybrid_not_supported / rerank_not_supported
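The scoring formulas in the metric table can be reproduced locally, for example to sanity-check results from the mock driver. A sketch of the formulas as stated, not the driver's code; treating a zero-length vector's cosine as 0 is an assumption:

```typescript
type DistanceMetric = "cosine" | "dot" | "euclidean";

function dot(a: readonly number[], b: readonly number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Score for each metric as documented: cosine in [-1, 1], raw dot, 1 / (1 + distance).
function score(metric: DistanceMetric, a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  switch (metric) {
    case "dot":
      return dot(a, b);
    case "cosine": {
      const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
      return denom === 0 ? 0 : dot(a, b) / denom;
    }
    case "euclidean": {
      let sq = 0;
      for (let i = 0; i < a.length; i++) sq += (a[i] - b[i]) ** 2;
      return 1 / (1 + Math.sqrt(sq));
    }
    default:
      throw new Error(`unknown metric: ${metric as string}`);
  }
}
```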

GET /{knowledgeBaseId}/documents

List RAG documents in the KB.

  • 200 — paginated RagDocument records
  • 404 workspace_not_found / knowledge_base_not_found

A RagDocument:

{
  "workspaceId": "",
  "knowledgeBaseId": "",
  "documentId": "",
  "sourceDocId": null,
  "sourceFilename": "readme.md",
  "fileType": "text/markdown",
  "fileSize": 1024,
  "contentHash": "sha256:…",
  "chunkTotal": null,
  "ingestedAt": null,
  "updatedAt": "",
  "status": "pending",
  "errorMessage": null,
  "metadata": { "source": "upload" }
}

status is one of pending | chunking | embedding | writing | ready | failed. The KB ingest pipeline is the canonical writer of status / errorMessage / chunkTotal / ingestedAt. Clients can also set these directly via PATCH if they own the lifecycle externally.

POST /{knowledgeBaseId}/documents

Register a document in the KB without running the ingest pipeline.

{
  "sourceFilename": "readme.md",
  "fileType": "text/markdown",
  "fileSize": 1024,
  "contentHash": "sha256:…",
  "metadata": { "source": "upload" }
}
  • 201 — the created RagDocument (status defaults to pending, metadata defaults to {})
  • 404 workspace_not_found / knowledge_base_not_found
  • 409 conflict: documentId collision within the same KB

GET /{knowledgeBaseId}/documents/{documentId} / PATCH /{documentId} / DELETE /{documentId}

Fetch / patch / delete. PATCH accepts every field from create plus the lifecycle fields (status, errorMessage, chunkTotal, ingestedAt), all optional. DELETE cascades into the KB's collection: chunks matched by payload.documentId are removed before the row is dropped, so a successful delete leaves no traces in KB-scoped search. Drivers exposing deleteRecords use a single bulk call; older drivers fall back to a listRecords + per-row delete loop.

GET /{knowledgeBaseId}/documents/{documentId}/chunks

Lists the chunks the ingest pipeline extracted from this document. Reads raw records out of the KB's collection filtered on documentId, sorts by the chunkIndex payload key, and returns:

[
  {
    "id": "<documentId>:0",
    "chunkIndex": 0,
    "text": "First paragraph about apples.",
    "payload": {
      "knowledgeBaseId": "",
      "documentId": "",
      "chunkIndex": 0,
      "chunkText": "First paragraph about apples.",
      "source": "seed"
    }
  }
]

Query params:

  • limit (1–1000, default 1000): caps the number of chunks returned.

  • 200: array of chunks, sorted by chunkIndex ascending
  • 404 workspace_not_found / knowledge_base_not_found / document_not_found
  • 501 list_records_not_supported: driver doesn't expose listRecords

POST /{knowledgeBaseId}/ingest

Synchronous end-to-end ingest. Chunks the input text, embeds every chunk through the KB's bound embedding service (server-side via $vectorize where the driver supports it, otherwise client-side), upserts the chunks into the KB's collection, and creates a RagDocument row with status: ready + chunkTotal.

Request

{
  "text": "Apples are red. Bananas are yellow.",
  "sourceFilename": "fruit.md",
  "metadata": { "source": "seed" },
  "chunker": { "maxChars": 1000, "minChars": 100, "overlapChars": 150 }
}

chunker overrides the runtime defaults for this call only. metadata is merged onto every chunk's payload; the reserved keys knowledgeBaseId, documentId, chunkIndex, and chunkText are always set by the runtime and override any caller-supplied values. text is capped at 200,000 characters.

Response 201

{
  "document": { "status": "ready", "chunkTotal": 3, "...": "..." },
  "chunks": 3
}

Chunk payloads. Every chunk upserted carries:

  • knowledgeBaseId — the KB's ID (used by /search)
  • documentId — the ID of the RagDocument row this ingest created
  • chunkIndex — 0-based position within the source document
  • chunkText — the chunk's raw text (read back through /chunks)
  • Plus every caller-supplied metadata key
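The payload rule (caller metadata first, reserved keys always win) maps directly onto object-spread order. A minimal sketch; buildChunkRecords is hypothetical, and the <documentId>:<chunkIndex> ID form is taken from the /chunks example rather than a documented guarantee:

```typescript
interface ChunkRecord {
  id: string;
  text: string;
  payload: Record<string, unknown>;
}

// Caller metadata is spread first, so the reserved keys written after it
// (knowledgeBaseId, documentId, chunkIndex, chunkText) always override it.
function buildChunkRecords(
  knowledgeBaseId: string,
  documentId: string,
  chunks: readonly string[],
  metadata: Record<string, unknown> = {},
): ChunkRecord[] {
  return chunks.map((chunkText, chunkIndex) => ({
    // Deterministic ID, as in the /chunks example ("<documentId>:0").
    id: `${documentId}:${chunkIndex}`,
    text: chunkText,
    payload: { ...metadata, knowledgeBaseId, documentId, chunkIndex, chunkText },
  }));
}
```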

Failure semantics. When chunking or upsert throws, the RagDocument row is marked status: failed with errorMessage before the error is re-raised.

POST /{knowledgeBaseId}/ingest?async=true

Same body. The pipeline runs in the background; the response returns immediately with a job pointer.

Response 202

{
  "job": {
    "workspaceId": "",
    "jobId": "",
    "kind": "ingest",
    "knowledgeBaseId": "",
    "documentId": "",
    "status": "pending",
    "processed": 0,
    "total": null,
    "result": null,
    "errorMessage": null,
    "createdAt": "",
    "updatedAt": ""
  },
  "document": { "status": "writing", "…": "" }
}

Errors are the same set as the sync path. A 4xx means the request was rejected outright; nothing was enqueued and no job row exists.

Once the job is running, failures are captured into the job record (status: failed, errorMessage populated) and the document row. The runKbIngestJob worker resolves the KB descriptor on every call so renames or service swaps mid-flight don't drift.

POST /{knowledgeBaseId}/ingest/file

Multipart counterpart to /ingest. Accepts a binary upload (PDF, DOCX, XLSX, or text) plus optional metadata, dispatches an extractor based on the file's MIME type / extension, then runs the same chunk → embed → upsert pipeline.

Form fields:

| Field | Required | Notes |
| --- | --- | --- |
| file | yes | The document bytes. Must be a File part in multipart/form-data. |
| metadata | no | JSON object string merged onto every chunk's payload (same semantics as the JSON /ingest metadata field). |
| chunker | no | JSON object string overriding the runtime's chunker defaults for this call only. |
| parser | no | native, docling, or auto (default). When DOCLING_URL is unset, native is the only option. See Configuration § Document extraction. |

Query: ?async=true → 202 + job pointer (same response shape as the JSON variant). Body cap is 50 MB.

  • 201: { document, chunks }
  • 202: { job, document } (when async=true)
  • 400 invalid_multipart / missing_file — body wasn't multipart, or the file field was missing
  • 400 validation_error — bad metadata / chunker JSON
  • 400 extractor_unsupported — file type the runtime can't extract
  • 413 payload_too_large — body exceeded 50 MB
  • 503 docling_unavailable: parser=docling (or auto) couldn't reach the configured docling-serve
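A client builds the multipart body with the standard FormData API available in browsers and Node 18+. A sketch; buildIngestForm and ingestFileUrl are hypothetical helpers, and metadata / chunker are serialised to JSON strings as the form-field table requires:

```typescript
interface FileIngestOptions {
  metadata?: Record<string, unknown>;
  chunker?: { maxChars?: number; minChars?: number; overlapChars?: number };
  parser?: "native" | "docling" | "auto";
}

// metadata and chunker travel as JSON strings; file is a Blob part with a filename.
function buildIngestForm(bytes: Uint8Array, filename: string, mimeType: string, opts: FileIngestOptions = {}): FormData {
  const form = new FormData();
  form.append("file", new Blob([bytes], { type: mimeType }), filename);
  if (opts.metadata) form.append("metadata", JSON.stringify(opts.metadata));
  if (opts.chunker) form.append("chunker", JSON.stringify(opts.chunker));
  if (opts.parser) form.append("parser", opts.parser);
  return form;
}

function ingestFileUrl(baseUrl: string, workspaceId: string, knowledgeBaseId: string, runAsync = false): string {
  const path =
    `/api/v1/workspaces/${encodeURIComponent(workspaceId)}` +
    `/knowledge-bases/${encodeURIComponent(knowledgeBaseId)}/ingest/file`;
  return `${baseUrl}${path}${runAsync ? "?async=true" : ""}`;
}
```

Pass the form directly as the fetch body and leave Content-Type unset; fetch generates the multipart boundary itself.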

/api/v1/workspaces/{workspaceId}/jobs/{jobId}

Job poll surface for anything that runs in the background. Today only async ingest creates jobs; future bulk ops (reindex, export, batch delete) plug in with the same record shape.

GET /{jobId}

Point-in-time fetch, suitable for polling. Returns the Job record described above.

  • 200: Job
  • 404 job_not_found

GET /{jobId}/events

Server-Sent Events stream. Emits event: job with the full record as JSON on every update, plus a final event: done carrying { status } when the job hits a terminal state. The current record is replayed as the first job event so clients don't race the first update.

Headers: Content-Type: text/event-stream, Cache-Control: no-cache.

Same-replica updates fan out immediately through the in-process subscription registry. With the Astra job store, subscribers on other replicas poll the subscribed job records at controlPlane.jobPollIntervalMs so an SSE client can see progress even when the worker is running on a different pod. The memory and file job stores remain single-replica deployment shapes.
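Consuming the stream comes down to splitting it into blank-line-separated blocks and reading the event: and data: fields. A minimal parser sketch with hypothetical names, following the general SSE format (lines starting with ':' are comments, multiple data: lines join with newlines); it handles complete blocks only and returns the unfinished tail for the caller to prepend to the next read:

```typescript
interface SseEvent {
  event: string; // "message" when the block has no event: field
  data: string;
}

function parseSse(text: string): { events: SseEvent[]; rest: string } {
  const blocks = text.replace(/\r\n?/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // incomplete trailing block, if any
  const events: SseEvent[] = [];
  for (const block of blocks) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // comment / keep-alive
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}

// Applies the job-stream contract: "job" carries the full record, "done" carries { status }.
function handleJobEvents(events: SseEvent[], onJob: (job: unknown) => void): string | null {
  for (const e of events) {
    if (e.event === "job") onJob(JSON.parse(e.data));
    else if (e.event === "done") return (JSON.parse(e.data) as { status: string }).status;
  }
  return null;
}
```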

Job record

| Field | Type | Notes |
|---|---|---|
| workspaceId | uuid | Owning workspace |
| jobId | uuid | |
| kind | "ingest" | Discriminator; more kinds arrive with more async ops |
| knowledgeBaseId | uuid or null | Set for ingest jobs |
| documentId | uuid or null | Set for ingest jobs |
| status | "pending", "running", "succeeded", or "failed" | Terminal: succeeded, failed |
| processed | int | Units completed |
| total | int or null | Units expected (null if unknown) |
| result | object or null | Kind-specific summary on success (ingest: { chunks: N }) |
| errorMessage | string or null | Populated when status is failed |
| leasedBy | string or null | Replica currently driving the job |
| leasedAt | iso-8601 or null | Last heartbeat from the lease holder |
| ingestInput | object or null | Persisted ingest snapshot used for orphan replay |
| createdAt | iso-8601 | |
| updatedAt | iso-8601 | |
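Client code often mirrors this table as a type. The sketch below is hand-written from the fields above; the OpenAPI document remains the authoritative schema. The helper progressFraction is hypothetical, and treating a null or zero total as unknown progress is an assumption.

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed";

// Hand-written mirror of the Job record table.
interface Job {
  workspaceId: string;
  jobId: string;
  kind: "ingest";
  knowledgeBaseId: string | null;
  documentId: string | null;
  status: JobStatus;
  processed: number;
  total: number | null;
  result: Record<string, unknown> | null; // ingest: { chunks: N }
  errorMessage: string | null;
  leasedBy: string | null;
  leasedAt: string | null;
  ingestInput: Record<string, unknown> | null;
  createdAt: string;
  updatedAt: string;
}

// succeeded and failed are the only terminal states.
function isTerminal(status: JobStatus): boolean {
  return status === "succeeded" || status === "failed";
}

// Fraction complete in [0, 1], or null when the total is unknown.
function progressFraction(job: Pick<Job, "processed" | "total">): number | null {
  if (job.total === null || job.total <= 0) return null;
  return Math.min(1, job.processed / job.total);
}
```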

Persistence. The job store auto-matches the control-plane driver:

  • controlPlane.driver: memory → jobs live in-process (lost on restart).
  • controlPlane.driver: file → jobs serialize to <controlPlane.root>/jobs.json alongside workspaces.json and survive restarts.
  • controlPlane.driver: astra → jobs live in wb_jobs_by_workspace, reusing the existing Data API connection; durable across restart and across replicas. Subscriptions poll across replicas while local updates still fan out immediately.

Clustered Astra deployments can set controlPlane.jobsResume.enabled: true. Running workers then stamp leasedBy / leasedAt; the orphan sweeper claims stale leases and, when ingestInput is present, replays the ingest pipeline. Chunk IDs are deterministic, so replay is idempotent. Older jobs without an input snapshot, or future job kinds that cannot replay yet, are claimed and marked failed so clients still see a terminal state.
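The sweeper's decision for a single job can be written as a pure function. This is an illustration under stated assumptions, not the runtime's sweeper. The names sweepAction and staleAfterMs are hypothetical; the real staleness threshold and its config key are not documented here. Treating a job that was never leased as non-orphaned is also an assumption.

```typescript
interface LeaseView {
  status: "pending" | "running" | "succeeded" | "failed";
  leasedBy: string | null;
  leasedAt: string | null; // ISO-8601 time of the last heartbeat
  ingestInput: object | null;
}

type SweepAction = "leave" | "replay" | "fail";

// Hypothetical per-job decision; staleAfterMs is an assumed threshold, not a
// documented config key.
function sweepAction(job: LeaseView, nowMs: number, staleAfterMs: number): SweepAction {
  if (job.status === "succeeded" || job.status === "failed") return "leave"; // already terminal
  if (job.leasedBy === null || job.leasedAt === null) return "leave"; // assumption: never leased, not orphaned
  const heartbeatMs = Date.parse(job.leasedAt);
  if (Number.isNaN(heartbeatMs)) return "leave"; // unparseable timestamp: do nothing
  if (nowMs - heartbeatMs <= staleAfterMs) return "leave"; // lease holder still heartbeating
  // Stale lease: replay from the persisted snapshot (safe because chunk IDs are
  // deterministic), otherwise mark failed so clients still see a terminal state.
  return job.ingestInput !== null ? "replay" : "fail";
}
```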

/api/v1/workspaces/{workspaceId}/llm-services

Workspace-scoped LLM execution services that describe how to call a chat-completion or generation model. Mirrors the chunking / embedding / reranking service surface. An agent in the same workspace may bind one of these via agent.llmServiceId; the agent's send and streaming pipelines then instantiate a chat service from the bound record.

Today provider: "huggingface" and provider: "openai" are wired end-to-end; other providers can be created and stored, but agent send returns 422 llm_provider_unsupported until their adapters land.

GET /llm-services

List services in the workspace, oldest-first. Paginated.

  • 200 — paginated LlmService records
  • 404 workspace_not_found

POST /llm-services

Create a service. Required: name, provider, modelName. Optional fields cover endpoint config (endpointBaseUrl, endpointPath, requestTimeoutMs, authType, credentialRef), provider tuning (engine, modelVersion, contextWindowTokens, maxOutputTokens, temperatureMin, temperatureMax, supportsStreaming, supportsTools, maxBatchSize), and language / content tags. See the OpenAPI spec for the full shape.

{
  "name": "hf-mistral",
  "provider": "huggingface",
  "modelName": "mistralai/Mistral-7B-Instruct-v0.3",
  "credentialRef": "env:HUGGINGFACE_API_KEY",
  "maxOutputTokens": 1024
}
  • 201 — the created LlmService
  • 400 validation_error
  • 404 workspace_not_found
  • 409 conflict — duplicate explicit llmServiceId
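Before calling POST /llm-services, a client can catch the documented failure modes early. The hypothetical checkLlmServiceInput helper below checks only what this page states: the three required fields, the <provider>:<path> shape for secret references, and the fact that only huggingface and openai work at send time today. The credentialRef pattern is an assumption; the server's validation_error remains authoritative.

```typescript
interface CreateLlmServiceInput {
  name: string;
  provider: string;
  modelName: string;
  credentialRef?: string;
  maxOutputTokens?: number;
}

// Providers whose adapters are wired end-to-end today (per this page).
const WIRED_PROVIDERS = new Set(["huggingface", "openai"]);

// Assumed shape of a secret pointer: "<provider>:<path>", e.g. "env:HUGGINGFACE_API_KEY".
const SECRET_REF = /^[A-Za-z][\w-]*:.+$/;

// Hypothetical client-side pre-check; the server's validation_error is authoritative.
function checkLlmServiceInput(input: Partial<CreateLlmServiceInput>): { errors: string[]; warnings: string[] } {
  const errors: string[] = [];
  const warnings: string[] = [];
  for (const field of ["name", "provider", "modelName"] as const) {
    const value = input[field];
    if (typeof value !== "string" || value.trim() === "") errors.push(`${field} is required`);
  }
  if (input.credentialRef === undefined) {
    warnings.push("no credentialRef: agent send will fail with 422 llm_credential_missing");
  } else if (!SECRET_REF.test(input.credentialRef)) {
    errors.push("credentialRef must be a <provider>:<path> pointer, not a raw secret");
  }
  if (typeof input.provider === "string" && input.provider !== "" && !WIRED_PROVIDERS.has(input.provider)) {
    warnings.push(`provider "${input.provider}" can be stored, but agent send returns 422 llm_provider_unsupported`);
  }
  return { errors, warnings };
}
```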

GET /llm-services/{llmServiceId} / PATCH /{id} / DELETE /{id}

Fetch / patch / delete. PATCH accepts every field from create (all optional). DELETE is refused with 409 conflict while any agent still references the service via llmServiceId. Reassign or delete the dependent agents first.
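A client can find the blocking agents before attempting the DELETE. Unbinding them with PATCH sends null for llmServiceId, after which those agents fall back to the global chat: block. This sketch assumes the agent list has already been fetched from GET /agents; blockingAgents and unbindPatches are hypothetical helpers.

```typescript
interface AgentRef {
  agentId: string;
  name: string;
  llmServiceId: string | null;
}

// Agents that would make DELETE /llm-services/{llmServiceId} return 409 conflict.
function blockingAgents(agents: AgentRef[], llmServiceId: string): AgentRef[] {
  return agents.filter((agent) => agent.llmServiceId === llmServiceId);
}

// PATCH /agents/{agentId} bodies that unbind each blocking agent; sending null
// clears llmServiceId so the agent falls back to the global chat: block.
function unbindPatches(
  agents: AgentRef[],
  llmServiceId: string,
): Array<{ agentId: string; body: { llmServiceId: null } }> {
  return blockingAgents(agents, llmServiceId).map((agent) => ({
    agentId: agent.agentId,
    body: { llmServiceId: null },
  }));
}
```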

/api/v1/workspaces/{workspaceId}/agents

User-defined agents — workspace-scoped personas backed by the Stage-2 agentic tables. See agents.md for the full walkthrough; the route shapes are summarised below.

Historical note. Earlier drafts of this document described a parallel /chats route surface and a singleton "Bobbie" agent. Both were retired; the agent surface is the single way to chat against a workspace.

GET /agents

List agents in the workspace, oldest-first. Paginated.

POST /agents

  • Body: CreateAgentInput (see agents.md).
  • 201 with the created Agent
  • 404 — workspace not found
  • 409 — duplicate explicit agentId

GET /agents/{agentId}

  • 200 with the Agent

PATCH /agents/{agentId}

Patch any optional field except agentId. Send null to clear a nullable field (including llmServiceId).
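These PATCH rules amount to a shallow merge: omitted keys stay untouched, null clears a field, and agentId is immutable. The sketch below applies that merge on the client, for example for optimistic UI updates. The helper applyAgentPatch is hypothetical, and rejecting any agentId key outright is an assumption about how strictly the server enforces the rule.

```typescript
// Shallow merge following the documented PATCH rules: keys absent from the
// patch are left alone, null clears a nullable field, agentId is immutable.
// (JSON.stringify drops undefined, so an undefined value behaves like an absent key.)
function applyAgentPatch<T extends { agentId: string }>(current: T, patch: Partial<T>): T {
  if ("agentId" in patch) {
    throw new Error("agentId cannot be patched"); // assumption: treat any agentId key as an error
  }
  const next: T = { ...current };
  for (const key of Object.keys(patch) as Array<keyof T>) {
    const value = patch[key];
    if (value !== undefined) next[key] = value as T[keyof T];
  }
  return next;
}
```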

DELETE /agents/{agentId}

Returns 204 and cascades the delete to the agent's conversations and their messages.

GET /agents/{agentId}/conversations

List the agent's conversations, newest-first. Paginated.

POST /agents/{agentId}/conversations

  • Body: CreateConversationInput ({ conversationId?, title?, knowledgeBaseIds? }).
  • 201 with the created Conversation
  • 404 — workspace or agent not found

GET|PATCH|DELETE /agents/{agentId}/conversations/{conversationId}

Single-conversation read / update (title + KB filter) / delete. Delete cascades messages. 404 when the conversation does not belong to the named agent.

GET /agents/{agentId}/conversations/{conversationId}/messages

Oldest-first message log, paginated.

  • 200 — paginated ChatMessage records
  • 404 when the workspace, agent, or conversation does not exist, or when the conversation does not belong to the named agent
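The paginated endpoints on this page share the cursor convention: limit between 1 and 200, and cursor taken from the previous page's nextCursor. The sketch below drains every page, for example to load a full message log. The Page shape, and in particular the items field name, are assumptions to check against the OpenAPI document. collectAll is a hypothetical helper.

```typescript
// Assumed page shape; check the field names against /api/v1/openapi.json.
interface Page<T> {
  items: T[];
  nextCursor: string | null;
}

// Any function that GETs a URL and returns the parsed JSON body.
type FetchJson = (url: string) => Promise<unknown>;

// Hypothetical helper: follow nextCursor until the last page.
async function collectAll<T>(fetchJson: FetchJson, url: string, limit = 200): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const params = new URLSearchParams({ limit: String(limit) }); // 1-200, server default 50
    if (cursor !== null) params.set("cursor", cursor);
    const page = (await fetchJson(`${url}?${params.toString()}`)) as Page<T>;
    all.push(...page.items);
    cursor = page.nextCursor ?? null;
  } while (cursor !== null);
  return all;
}
```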

POST /agents/{agentId}/conversations/{conversationId}/messages (synchronous)

Body: { content }. Persists the user turn, retrieves grounding context, calls the agent's LLM (per the resolution order below), persists the assistant turn, and returns:

{ "user": <ChatMessage>, "assistant": <ChatMessage> }

LLM resolution. When agent.llmServiceId is set the runtime instantiates a chat service from the bound LLM-service record. When unset it falls back to the runtime's global chat: block.

  • 201 with { user, assistant }
  • 404 when the conversation does not belong to the named agent
  • 422 llm_provider_unsupported when agent.llmServiceId points at an LLM service whose provider is neither huggingface nor openai
  • 422 llm_credential_missing — bound LLM service has no credentialRef
  • 503 chat_disabled — runtime has no global chat: block configured and the agent has no llmServiceId
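The resolution order and its error mapping fit in a single pure function. This is a sketch only. resolveLlm and its input shapes are hypothetical, and checking the provider before the credential is an assumed ordering.

```typescript
interface LlmServiceRecord {
  llmServiceId: string;
  provider: string;
  credentialRef?: string | null;
}

type Resolution =
  | { kind: "bound"; service: LlmServiceRecord }
  | { kind: "global" }
  | { kind: "error"; status: 422 | 503; code: string };

const SEND_CAPABLE_PROVIDERS = new Set(["huggingface", "openai"]);

function resolveLlm(
  agent: { llmServiceId: string | null },
  services: Map<string, LlmServiceRecord>,
  globalChatConfigured: boolean,
): Resolution {
  if (agent.llmServiceId !== null) {
    const service = services.get(agent.llmServiceId);
    // DELETE refuses to remove a referenced service, so this should not happen.
    if (!service) throw new Error("bound LLM service not found");
    if (!SEND_CAPABLE_PROVIDERS.has(service.provider)) {
      return { kind: "error", status: 422, code: "llm_provider_unsupported" };
    }
    if (!service.credentialRef) return { kind: "error", status: 422, code: "llm_credential_missing" };
    return { kind: "bound", service };
  }
  // No binding: fall back to the runtime's global chat: block, if any.
  return globalChatConfigured ? { kind: "global" } : { kind: "error", status: 503, code: "chat_disabled" };
}
```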

POST /agents/{agentId}/conversations/{conversationId}/messages/stream (SSE)

Same body. Returns text/event-stream:

| Event | Payload | When |
|---|---|---|
| user-message | The persisted user ChatMessage | Once, after the user turn is persisted |
| token | { delta: string } | Per model emission |
| token-reset | {} | After a tool-call iteration, so the UI can clear pre-tool narration before iteration N+1 streams in |
| tool-call | { toolName, args, callId } | The model requested a tool invocation (only on providers with native function calling: today OpenAI; HuggingFace skips this lane) |
| tool-result | { toolName, callId, result } | Each tool result fed back into the next iteration |
| done | The persisted assistant ChatMessage (metadata.finish_reason: "stop" or "length") | Terminal on success |
| error | The persisted assistant ChatMessage with metadata.finish_reason: "error" | Terminal on failure |

The stream emits exactly one of done / error. Tool-use loops are capped at 6 iterations per turn. Client disconnect is treated as a clean stop — whatever was already streamed gets persisted with finish_reason: "stop". Status codes are the same as the synchronous variant (404 / 422 / 503 surface as error events when they occur after the response has already started).
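A UI consuming this stream mostly needs to append token deltas, clear on token-reset, and stop on done or error. The minimal reducer below assumes events have already been parsed off the SSE wire into { event, data } objects. reduceStream and StreamView are hypothetical names, and the tool name in the test data is illustrative.

```typescript
type StreamEvent =
  | { event: "user-message"; data: unknown }
  | { event: "token"; data: { delta: string } }
  | { event: "token-reset"; data: Record<string, never> }
  | { event: "tool-call"; data: { toolName: string; args: unknown; callId: string } }
  | { event: "tool-result"; data: { toolName: string; callId: string; result: unknown } }
  | { event: "done" | "error"; data: { content: string | null; metadata: Record<string, string> } };

interface StreamView {
  text: string; // visible assistant text for the current iteration
  toolCalls: string[]; // callIds requested so far
  finished: boolean;
  finishReason?: string;
}

function reduceStream(view: StreamView, e: StreamEvent): StreamView {
  switch (e.event) {
    case "token":
      return { ...view, text: view.text + e.data.delta };
    case "token-reset":
      return { ...view, text: "" }; // drop pre-tool narration before the next iteration
    case "tool-call":
      return { ...view, toolCalls: [...view.toolCalls, e.data.callId] };
    case "done":
    case "error":
      // The terminal payload is the persisted assistant ChatMessage.
      return { ...view, text: e.data.content ?? view.text, finished: true, finishReason: e.data.metadata.finish_reason };
    default:
      return view; // user-message and tool-result leave the visible text unchanged here
  }
}
```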

GET /agent-templates

Catalog of one-click agent templates the UI offers in the agent gallery. Workspace-scoped for authz, but the body is workspace-independent and ships with the binary. Returns the four entries (Bobby, Maven, Quill, Sage) with their templateId, name, description, persona prompt, and defaultOnNewWorkspace flag. See agents.md § Template catalog.

POST /agents/from-template

Instantiate a catalog template as a new agent in the workspace.

{ "templateId": "bobby" }

The new agent's name, description, and systemPrompt are copied from the template; other fields default to the same values as POST /agents. Audit event agent.create carries the templateId slug.

  • 201 with the created Agent
  • 400 — unknown templateId
  • 404 — workspace not found

Agent record

| Field | Type | Notes |
|---|---|---|
| workspaceId | uuid | |
| agentId | uuid | Server-assigned unless the caller supplies one. |
| name | string | |
| description | string or null | |
| systemPrompt | string or null | |
| userPrompt | string or null | |
| llmServiceId | uuid or null | When set, points at an LLM service in the same workspace; the agent's chat service is instantiated from that record. When null, the runtime's global chat: block is used. Mutable. |
| knowledgeBaseIds | uuid[] | Default RAG-grounding set. |
| ragEnabled | bool | |
| ragMaxResults | int or null | |
| ragMinScore | number or null | |
| rerankEnabled | bool | |
| rerankingServiceId | uuid or null | Agent-level override of the KB-level reranker. |
| rerankMaxResults | int or null | |
| createdAt | iso-8601 | |
| updatedAt | iso-8601 | |

Conversation record

| Field | Type | Notes |
|---|---|---|
| workspaceId | uuid | |
| agentId | uuid | |
| conversationId | uuid | |
| title | string or null | |
| knowledgeBaseIds | uuid[] | Per-conversation override of the agent's default KB set. |
| createdAt | iso-8601 | |
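The sketch below shows one way the conversation and agent KB lists could combine. It is an assumption-laden illustration, not documented behavior. It treats an empty conversation list as "no override" and ragEnabled: false as "no retrieval". effectiveKnowledgeBases is a hypothetical name.

```typescript
// Hypothetical resolution of the KB set used for grounding a turn.
function effectiveKnowledgeBases(
  agent: { ragEnabled: boolean; knowledgeBaseIds: string[] },
  conversation: { knowledgeBaseIds: string[] },
): string[] {
  if (!agent.ragEnabled) return []; // assumption: RAG off means no grounding
  // assumption: an empty per-conversation list means "use the agent default"
  return conversation.knowledgeBaseIds.length > 0 ? conversation.knowledgeBaseIds : agent.knowledgeBaseIds;
}
```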

ChatMessage record

| Field | Type | Notes |
|---|---|---|
| workspaceId | uuid | |
| conversationId | uuid | |
| messageId | uuid | |
| messageTs | iso-8601 | Cluster key. Strictly increasing within a conversation. |
| role | "user", "agent", "system", or "tool" | agent is the assistant turn. |
| content | string or null | |
| tokenCount | int or null | If the provider reports it. |
| metadata | Record<string, string> | RAG provenance (context_document_ids, context_chunks), model, finish_reason (stop, length, or error), error_message. |

/api/v1/workspaces/{workspaceId}/mcp

Optional Model Context Protocol façade. Speaks Streamable HTTP (the modern MCP transport) with JSON-RPC payloads. Off by default; enable via mcp.enabled: true in workbench.yaml. See mcp.md for the full walkthrough.

| Method | Status | Body |
|---|---|---|
| GET / POST / DELETE / OPTIONS | 200 | JSON-RPC response (or an SSE stream for long-running tool calls). The four methods map to the Streamable HTTP spec: POST for client→server messages, GET for the long-lived event stream, DELETE for session teardown, OPTIONS for CORS preflight. |
| any | 404 not_found | When mcp.enabled is false |
| any | 404 workspace_not_found | When the path workspace doesn't exist |

Exposed tools (read-mostly; they ground external agents in workspace context):

  • list_knowledge_bases
  • list_documents
  • search_kb (vector / hybrid / rerank)
  • list_chats
  • list_chat_messages
  • chat_send (only when mcp.exposeChat: true and chat: is configured)

Auth flows through the regular /api/v1/* middleware plus the shared workspace-route authorization wrapper, so workspace scoping is enforced before any MCP tool is invoked.
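Over Streamable HTTP, a tool invocation is a JSON-RPC tools/call request POSTed to the workspace endpoint. The sketch below only builds that request. It assumes the MCP initialize handshake has already produced a session ID. The search_kb argument names are hypothetical; list the real schema with tools/list. Any credentials the /api/v1/* middleware requires still need to be added. buildToolCall is a hypothetical helper.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Build a POST for the workspace MCP endpoint. Assumes the session was already
// initialized (MCP initialize handshake) and its Mcp-Session-Id is known.
function buildToolCall(
  baseUrl: string,
  workspaceId: string,
  sessionId: string,
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): { url: string; init: { method: "POST"; headers: Record<string, string>; body: string } } {
  const request: JsonRpcRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
  return {
    url: `${baseUrl}/api/v1/workspaces/${workspaceId}/mcp`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Streamable HTTP: the server may answer with JSON or an SSE stream.
        Accept: "application/json, text/event-stream",
        "Mcp-Session-Id": sessionId,
      },
      body: JSON.stringify(request),
    },
  };
}
```

The returned url and init can be passed straight to fetch(url, init).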

Planned routes

These do not exist yet. Shapes may shift before they land.

Multi-provider LLM execution

huggingface and openai are wired end-to-end today; openai is the only provider with native function calling, so the agent tool-use loop only fires for OpenAI-bound agents (HuggingFace agents still answer, just without tool dispatch). Other providers (Cohere, Anthropic, Bedrock, …) can be created and stored, but agent send returns 422 llm_provider_unsupported until the provider is wired into the chat-service factory. Adding a provider is mostly a one-case addition to the dispatcher.

MCP tool execution

/api/v1/workspaces/{w}/mcp-tools: CRUD over the wb_config_mcp_tools_by_workspace rows, plus /api/v1/workspaces/{w}/agents/{a}/run for an agent execution loop with tool use. Now that the MCP server façade has shipped, the inverse (letting an agent call MCP tools) is the next step.

See roadmap.md for the phase plan.


OpenAPI

The generated document at /api/v1/openapi.json is always in sync with the running runtime (routes register their Zod schemas directly). Share it with downstream tooling (client generators, API gateway configs, etc.).

To consume locally:

curl -s http://localhost:8080/api/v1/openapi.json > openapi.json
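The downloaded file is standard OpenAPI JSON, so ordinary tooling applies to it. As a small illustration, the hypothetical listOperations helper below enumerates the METHOD path pairs the document declares.

```typescript
interface OpenApiDoc {
  paths?: Record<string, Record<string, unknown>>;
}

// HTTP methods allowed as keys of an OpenAPI Path Item object.
const HTTP_METHODS = ["get", "put", "post", "delete", "options", "head", "patch", "trace"];

// List "METHOD path" pairs; non-method keys such as "parameters" or "summary" are skipped.
function listOperations(doc: OpenApiDoc): string[] {
  const ops: string[] = [];
  for (const [path, item] of Object.entries(doc.paths ?? {})) {
    for (const key of Object.keys(item)) {
      if (HTTP_METHODS.includes(key)) ops.push(`${key.toUpperCase()} ${path}`);
    }
  }
  return ops.sort();
}
```

In practice the input would come from fetch("http://localhost:8080/api/v1/openapi.json").then((r) => r.json()) or from the saved openapi.json.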