AI Workbench is a polyglot HTTP runtime sitting in front of Astra DB.
It exposes a stable /api/v1/* contract for workspaces, knowledge
bases, execution services (chunking / embedding / reranking),
documents, ingestion, and search. Each language-native
implementation of the runtime is a "green box"; the default
TypeScript green box is embedded with the UI, and alternatives live
under runtimes/.
- One HTTP contract, N runtimes. Workspaces, knowledge bases, execution services, and RAG documents are defined by the HTTP API — not by any one runtime's internals. Every language green box honors the same contract, enforced by fixture-based conformance tests.
- Thin, boring runtime core. The runtime is an HTTP server + a pluggable control-plane store. Complexity lives in pluggable services bound to a knowledge base (chunking, embedding, reranking).
- Workspaces are runtime data, not config. `workbench.yaml` picks which control-plane backend to use; workspaces themselves are mutable records managed via the HTTP API.
- A KB owns its collection end-to-end. Creating a knowledge base auto-provisions the underlying Astra collection named by the KB's `vectorCollection` (owned KBs derive it from `name`), sized to the bound embedding service's dimension; deleting an owned KB drops the collection. The control plane and data plane never diverge.
- Driver-based control plane. `memory` for CI and demos, `file` for single-node self-hosted, `astra` for production. Same contract.
- Astra-native where real. The `astra` backend uses `@datastax/astra-db-ts` directly. The Python runtime uses `astrapy`. No wrapper libraries in between.
- Secrets by reference. Credentials live behind `SecretRef` pointers (`env:FOO` / `file:/path`) resolved at use time by a pluggable provider. No raw secrets in config, records, or logs.
- Immutable records. Every update returns a new object. The in-memory backend holds `Map<workspaceId, Record>`; the file backend rewrites atomically; the astra backend does `$set` updates through the Data API.
- Contract-first for new surfaces. The HTTP API is versioned (`/api/v1/…`) and documented in `api-spec.md` and the generated OpenAPI at `/api/v1/openapi.json`.
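The "immutable records" principle can be illustrated with a minimal sketch of an in-memory backend. The `WorkspaceRecord` shape and the `MemoryWorkspaces` class are hypothetical stand-ins, not the runtime's actual types; the only point is that an update stores and returns a fresh object instead of mutating the old one.

```typescript
// Hypothetical, trimmed-down record; the real record has more fields.
interface WorkspaceRecord {
  readonly workspaceId: string;
  readonly name: string;
  readonly updatedAt: string;
}

class MemoryWorkspaces {
  private readonly rows = new Map<string, WorkspaceRecord>();

  create(workspaceId: string, name: string, now: string): WorkspaceRecord {
    const record: WorkspaceRecord = Object.freeze({ workspaceId, name, updatedAt: now });
    this.rows.set(workspaceId, record);
    return record;
  }

  get(workspaceId: string): WorkspaceRecord | undefined {
    return this.rows.get(workspaceId);
  }

  // Builds a new frozen object; the previously returned record is untouched.
  update(workspaceId: string, patch: { name?: string }, now: string): WorkspaceRecord {
    const current = this.rows.get(workspaceId);
    if (!current) throw new Error(`workspace_not_found: ${workspaceId}`);
    const next: WorkspaceRecord = Object.freeze({
      workspaceId: current.workspaceId,
      name: patch.name ?? current.name,
      updatedAt: now,
    });
    this.rows.set(workspaceId, next);
    return next;
  }
}
```

Callers that kept a reference to the old record still see the old values, which keeps snapshots safe to hand to concurrent request handlers.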
The architecture is one green box per language-native runtime. Every green box:
- Serves the same `/api/v1/*` surface.
- Speaks Astra via its own language-native SDK internally.
- Runs as a standalone HTTP server (a Docker container in production).
The UI picks which green box to target via the BACKEND_URL
environment variable at deploy time. The shipping path is UI +
TypeScript runtime in one container, so BACKEND_URL is
same-origin. The Python (FastAPI) and Java (Spring Boot) green boxes
are preview scaffolds: they boot, serve operational endpoints,
and answer every /api/v1/* route with HTTP 501 until handlers are
implemented. The cross-runtime contract and conformance harness exist
specifically so those handlers can land incrementally without
breaking parity guarantees.
See green-boxes.md for the full model, and
runtimes/README.md for the per-runtime
status table.
The default process lives at runtimes/typescript/ and
boots from runtimes/typescript/src/root.ts. Responsibilities:
- Load and validate `workbench.yaml`.
- Build a `SecretResolver` from the configured secret providers.
- Build a `ControlPlaneStore` from the configured backend.
- Create and serve the Hono app (routes + middleware).
- Emit structured logs with request IDs and (soon) OpenTelemetry traces.
Backend-agnostic interface in
runtimes/typescript/src/control-plane/store.ts,
composed from per-aggregate repo interfaces under
runtimes/typescript/src/control-plane/repos/ (WorkspaceRepo,
KnowledgeBaseRepo, ChatMessageRepo, etc.). Three backend implementations:
| Backend | File | When to use |
|---|---|---|
| `memory` | `memory/store.ts` | CI, tests, `docker run` demos. Not durable. |
| `file` | `file/store.ts` | Single-node self-hosted. Per-table mutex + atomic rename. |
| `astra` | `astra/store.ts` | Production. Data API Tables via `astra-db-ts`. |
PR #156 split the interface layer into per-aggregate repos; PR #199
followed up by extracting the memory backend's implementation into
per-aggregate slices under
runtimes/typescript/src/control-plane/memory/
(workspaces.ts, agents.ts, knowledge-bases.ts, …) that close
over a shared MemoryStoreState. The file and astra backends
remain single-class implementations today — splitting them is the
queued follow-through. All three pass the same shared contract suite
in
runtimes/typescript/tests/control-plane/contract.ts
(grows as routes ship).
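As a rough illustration of the composition described above, the sketch below builds a store type from two per-aggregate repo interfaces and implements the memory backend as slices that close over one shared state object. Apart from `createWorkspace`, `ControlPlaneStore`, `WorkspaceRepo`, `KnowledgeBaseRepo`, and `MemoryStoreState`, every name and signature here is an assumption made for the example.

```typescript
// Hypothetical, trimmed-down record shapes.
interface Workspace { id: string; name: string }
interface KnowledgeBase { id: string; workspaceId: string; name: string }

// Per-aggregate repo interfaces (method names are illustrative).
interface WorkspaceRepo {
  createWorkspace(input: Workspace): Workspace;
  getWorkspace(id: string): Workspace | undefined;
}
interface KnowledgeBaseRepo {
  createKnowledgeBase(input: KnowledgeBase): KnowledgeBase;
  listKnowledgeBases(workspaceId: string): KnowledgeBase[];
}

// The store is the composition of the repo interfaces.
type ControlPlaneStore = WorkspaceRepo & KnowledgeBaseRepo;

// State shared by all memory slices.
interface MemoryStoreState {
  workspaces: Map<string, Workspace>;
  knowledgeBases: Map<string, KnowledgeBase>;
}

function workspaceSlice(state: MemoryStoreState): WorkspaceRepo {
  return {
    createWorkspace(input) {
      state.workspaces.set(input.id, input);
      return input;
    },
    getWorkspace(id) {
      return state.workspaces.get(id);
    },
  };
}

function knowledgeBaseSlice(state: MemoryStoreState): KnowledgeBaseRepo {
  return {
    createKnowledgeBase(input) {
      // Slices see each other's data through the shared state.
      if (!state.workspaces.has(input.workspaceId)) {
        throw new Error("workspace_not_found");
      }
      state.knowledgeBases.set(input.id, input);
      return input;
    },
    listKnowledgeBases(workspaceId) {
      const out: KnowledgeBase[] = [];
      state.knowledgeBases.forEach((kb) => {
        if (kb.workspaceId === workspaceId) out.push(kb);
      });
      return out;
    },
  };
}

function createMemoryStore(): ControlPlaneStore {
  const state: MemoryStoreState = { workspaces: new Map(), knowledgeBases: new Map() };
  return { ...workspaceSlice(state), ...knowledgeBaseSlice(state) };
}
```

Because each slice only depends on the state object, a single-class backend can be split one aggregate at a time without changing the store's public type.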
Data-plane counterparts to the control-plane store. Where
ControlPlaneStore owns records (workspaces, KBs, services,
RAG documents), the VectorStoreDriver owns actual vectors in
the per-KB Astra collection.
| File | Purpose |
|---|---|
| `vector-store.ts` | Driver interface: `createCollection`, `dropCollection`, `upsert`, `deleteRecord`, `search`, plus optional `searchByText`, `upsertByText`, `searchHybrid`, `rerank`, `listRecords` (chunks under a document), `deleteRecords` (delete-document cascade) |
| `mock/store.ts` | In-memory driver; used by workspaces with `kind: "mock"` and by the conformance suite |
| `astra/store.ts` | Data API Collections via `astra-db-ts`; per-workspace `DataAPIClient` cache, lazy init |
| `registry.ts` | Dispatches based on `workspace.kind`; unknown kinds surface as `503 driver_unavailable` |
| `factory.ts` | Wires the registry at startup from the `SecretResolver` |
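The registry's dispatch rule can be sketched as a small lookup keyed by `workspace.kind`. This is an illustration under assumptions: `DriverRegistry`, `driverFor`, and `DriverUnavailableError` are hypothetical names, and the driver interface is reduced to a label so the example stays short.

```typescript
type WorkspaceKind = "astra" | "hcd" | "openrag" | "mock";

// Stand-in for the real driver interface; its methods are elided here.
interface VectorStoreDriver {
  readonly label: string;
}

// Carries the HTTP status and error code the route layer would surface.
class DriverUnavailableError extends Error {
  readonly status = 503;
  readonly code = "driver_unavailable";
  constructor(kind: string) {
    super(`no vector-store driver registered for workspace kind "${kind}"`);
  }
}

class DriverRegistry {
  private readonly drivers = new Map<string, VectorStoreDriver>();

  register(kind: WorkspaceKind, driver: VectorStoreDriver): this {
    this.drivers.set(kind, driver);
    return this;
  }

  // Dispatch on workspace.kind; anything unregistered counts as unavailable.
  driverFor(workspace: { kind: string }): VectorStoreDriver {
    const driver = this.drivers.get(workspace.kind);
    if (!driver) throw new DriverUnavailableError(workspace.kind);
    return driver;
  }
}
```

A deployment that only registers `mock` would answer requests for an `hcd` workspace with the 503 error instead of failing at startup.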
The route layer in
api-v1/kb-descriptor.ts
materialises a driver-facing descriptor on the fly from a KB plus
its bound embedding/reranking services. Drivers and the search /
upsert dispatch surfaces consume this synthesised shape unchanged —
they don't need to know KBs exist.
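A minimal sketch of that synthesis step follows. The descriptor shape, the function name `resolveKbSketch`, and the lookup-by-map signature are assumptions for illustration; the field names mirror the `embedding_dimension`, `distance_metric`, and `vector_collection` columns in camelCase, and the actual `resolveKb()` may differ.

```typescript
interface KnowledgeBase {
  knowledgeBaseId: string;
  vectorCollection: string;
  embeddingServiceId: string;
  rerankingServiceId?: string;
}
interface EmbeddingService {
  provider: string;
  modelName: string;
  embeddingDimension: number;
  distanceMetric: string;
}
interface RerankingService {
  provider: string;
  modelName: string;
}

// Hypothetical descriptor: only what a driver needs to do its job.
interface CollectionDescriptor {
  name: string;
  dimension: number;
  metric: string;
  embedding: { provider: string; model: string };
  reranking?: { provider: string; model: string };
}

function resolveKbSketch(
  kb: KnowledgeBase,
  embeddingServices: Map<string, EmbeddingService>,
  rerankingServices: Map<string, RerankingService>,
): CollectionDescriptor {
  const embedding = embeddingServices.get(kb.embeddingServiceId);
  if (!embedding) throw new Error(`embedding service ${kb.embeddingServiceId} not found`);

  const descriptor: CollectionDescriptor = {
    name: kb.vectorCollection,
    dimension: embedding.embeddingDimension,
    metric: embedding.distanceMetric,
    embedding: { provider: embedding.provider, model: embedding.modelName },
  };

  // The reranker is optional on a KB, so it is only attached when bound.
  if (kb.rerankingServiceId !== undefined) {
    const reranker = rerankingServices.get(kb.rerankingServiceId);
    if (!reranker) throw new Error(`reranking service ${kb.rerankingServiceId} not found`);
    descriptor.reranking = { provider: reranker.provider, model: reranker.modelName };
  }
  return descriptor;
}
```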
POST /api/v1/workspaces/{w}/knowledge-bases is the transactional
entry point: it writes the KB row, calls the driver to create the
collection, and rolls back the row on failure so the control plane
and data plane never diverge. DELETE reverses this — drop the
collection first, then the row.
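The create and delete ordering can be sketched as two small functions. The interfaces `KbRowStore` and `CollectionDriver` are hypothetical, and the calls are synchronous only to keep the example short; the real store and driver calls would be asynchronous.

```typescript
interface KbRow {
  workspaceId: string;
  knowledgeBaseId: string;
  vectorCollection: string;
}

// Hypothetical control-plane and data-plane seams for the sketch.
interface KbRowStore {
  insert(row: KbRow): void;
  remove(workspaceId: string, knowledgeBaseId: string): void;
}
interface CollectionDriver {
  createCollection(name: string, dimension: number): void;
  dropCollection(name: string): void;
}

function createKnowledgeBase(
  rows: KbRowStore,
  driver: CollectionDriver,
  row: KbRow,
  dimension: number,
): KbRow {
  rows.insert(row);
  try {
    driver.createCollection(row.vectorCollection, dimension);
  } catch (err) {
    // Roll the row back so no KB exists without its collection.
    rows.remove(row.workspaceId, row.knowledgeBaseId);
    throw err;
  }
  return row;
}

function deleteKnowledgeBase(rows: KbRowStore, driver: CollectionDriver, row: KbRow): void {
  // Drop the collection first; the row is removed only once that succeeded.
  driver.dropCollection(row.vectorCollection);
  rows.remove(row.workspaceId, row.knowledgeBaseId);
}
```

With this ordering, a failure on either path leaves a KB row behind at worst, which an operator can see and retry, and never an untracked collection.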
Both drivers pass the same 8-assertion
driver contract suite. The Astra
driver runs it against an in-memory fake Db that mimics
$vector sort semantics faithfully; real-Astra integration is
gated on ASTRA_DB_* env vars and lives in a follow-up.
Thin layer over astra-db-ts scoped to the wb_* tables:
- `table-definitions.ts`: Data API Table DDL.
- `row-types.ts`: snake_case JSON row shapes.
- `converters.ts`: pure record ↔ row conversion.
- `tables.ts`: `TablesBundle`, a narrow structural interface used by the astra store (lets tests inject fakes).
- `client.ts`: `openAstraClient()` creates the tables idempotently at init and returns a `TablesBundle`.
The Python runtime has a symmetric internal layer that wraps
astrapy for the same tables — no shared library, just a shared
schema.
- `SecretResolver`: dispatches a `SecretRef` to the matching provider based on its prefix.
- `EnvSecretProvider`: resolves `env:VAR` → `process.env.VAR`.
- `FileSecretProvider`: resolves `file:/path` → trimmed file contents.
Used at startup to resolve controlPlane.astra.tokenRef. Future
uses include per-workspace credentialsRef when the runtime starts
talking to workspace-scoped backends.
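Prefix-based dispatch of a `SecretRef` might look like the sketch below. The class names match the ones listed above, but the constructor signatures are assumptions: the environment and the file reader are injected so the example has no Node-specific imports, and resolution is synchronous for brevity.

```typescript
interface SecretProvider {
  resolve(locator: string): string;
}

class EnvSecretProvider implements SecretProvider {
  private readonly env: Record<string, string | undefined>;
  constructor(env: Record<string, string | undefined>) {
    this.env = env;
  }
  resolve(name: string): string {
    const value = this.env[name];
    if (value === undefined) throw new Error(`environment variable ${name} is not set`);
    return value;
  }
}

class FileSecretProvider implements SecretProvider {
  private readonly readFile: (path: string) => string;
  constructor(readFile: (path: string) => string) {
    this.readFile = readFile;
  }
  resolve(path: string): string {
    // Trailing newlines from editors or `echo` are trimmed away.
    return this.readFile(path).trim();
  }
}

class SecretResolver {
  private readonly providers: Map<string, SecretProvider>;
  constructor(providers: Map<string, SecretProvider>) {
    this.providers = providers;
  }
  // Splits "<prefix>:<locator>" at the first colon and delegates.
  resolve(ref: string): string {
    const sep = ref.indexOf(":");
    if (sep <= 0) throw new Error("malformed SecretRef: expected <prefix>:<locator>");
    const prefix = ref.slice(0, sep);
    const provider = this.providers.get(prefix);
    if (!provider) throw new Error(`no secret provider for prefix "${prefix}"`);
    return provider.resolve(ref.slice(sep + 1));
  }
}
```

In a Node process the two providers would typically be constructed with `process.env` and a wrapper around `fs.readFileSync(path, "utf8")`. Error messages mention only the reference, never the resolved value, in keeping with the no-secrets-in-logs rule.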
| Module | Prefix | Contents |
|---|---|---|
| `operational.ts` | (unversioned) | `/`, `/healthz`, `/readyz`, `/version`, `/features`, `/metrics`, `/astra-cli`, `/astra-cli/profiles` |
| `auth.ts` | `/auth` | OIDC login + silent refresh: `/auth/{config,login,callback,me,refresh,logout}` (mounted only when `auth.oidc.client` is configured) |
| `api-v1/workspaces.ts` | `/api/v1/workspaces` | Workspace CRUD + `POST {w}/test-connection` |
| `api-v1/knowledge-bases.ts` | `/api/v1/workspaces/{w}/knowledge-bases` | KB CRUD (POST auto-provisions collection); also serves `GET {w}/adoptable-collections` |
| `api-v1/kb-data-plane.ts` | `…/knowledge-bases/{kb}/{records,search}` | Upsert / delete record / search |
| `api-v1/kb-documents.ts` | `…/knowledge-bases/{kb}/{documents,ingest,ingest/file}` | Document metadata, sync + async JSON ingest, multipart `/ingest/file` (PDF / DOCX / XLSX / text), chunk listing |
| `api-v1/kb-descriptor.ts` | (none) | `resolveKb()`: synthesises a driver-facing descriptor from a KB + bound services |
| `api-v1/knowledge-filters.ts` | `…/knowledge-bases/{kb}/filters` | KB-scoped saved retrieval filters |
| `api-v1/{chunking,embedding,reranking}-services.ts` | `…/{chunking,embedding,reranking}-services` | Service CRUD (incl. PATCH) |
| `api-v1/llm-services.ts` | `…/llm-services` | Workspace-scoped chat-completion executor CRUD |
| `api-v1/agents.ts` | `/api/v1/workspaces/{w}/agents` | Agent + conversation CRUD, chat send + SSE stream, agent-templates catalog, `agents/from-template` |
| `api-v1/jobs.ts` | `/api/v1/workspaces/{w}/jobs` | Job poll + SSE stream |
| `api-v1/api-keys.ts` | `/api/v1/workspaces/{w}/api-keys` | Per-workspace API-key management (list / issue / revoke) |
| `api-v1/mcp.ts` | `/api/v1/workspaces/{w}/mcp` | Model Context Protocol Streamable-HTTP façade: `app.all` over GET / POST / DELETE / OPTIONS, gated by `mcp.enabled` |
| `api-v1/helpers.ts` | (none) | Error mapping (invoked from app-level `onError`) |
Workspace-scoped routes are mounted via the in-tree
route-plugin registry — every
module above except operational.ts and auth.ts is wrapped as a
RoutePlugin and iterated in app.ts. See
route-plugins.md.
Route handlers validate with Zod (via @hono/zod-openapi) and
delegate to the ControlPlaneStore. Typed errors (ControlPlaneNotFoundError, …ConflictError, …UnavailableError) bubble to the
top-level onError handler which maps them to the canonical HTTP
envelope.
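The mapping from typed errors to an HTTP response can be sketched without any framework. The error class names follow the ones mentioned above, but their constructors, the shared `ControlPlaneError` base class, the `toErrorResponse` helper, and the `{ error: { code, message } }` envelope are assumptions for illustration; the canonical envelope is defined by the API spec.

```typescript
class ControlPlaneError extends Error {
  readonly status: number;
  readonly code: string;
  constructor(status: number, code: string, message: string) {
    super(message);
    this.status = status;
    this.code = code;
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ControlPlaneNotFoundError extends ControlPlaneError {
  constructor(resource: string, id: string) {
    super(404, `${resource}_not_found`, `${resource} "${id}" does not exist`);
  }
}
class ControlPlaneConflictError extends ControlPlaneError {
  constructor(code: string, message: string) {
    super(409, code, message);
  }
}
class ControlPlaneUnavailableError extends ControlPlaneError {
  constructor(code: string, message: string) {
    super(503, code, message);
  }
}

// Hypothetical envelope shape.
interface ErrorResponse {
  status: number;
  body: { error: { code: string; message: string } };
}

// What a top-level onError handler would delegate to.
function toErrorResponse(err: unknown): ErrorResponse {
  if (err instanceof ControlPlaneError) {
    return { status: err.status, body: { error: { code: err.code, message: err.message } } };
  }
  // Unknown failures get a generic envelope so internals do not leak.
  return { status: 500, body: { error: { code: "internal_error", message: "unexpected error" } } };
}
```

Keeping the status and code on the error itself means route handlers only throw; they never build error responses by hand.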
Data API tables backed by CQL-style schemas. The exact DDL lives in
runtimes/typescript/src/astra-client/table-definitions.ts;
here's the logical shape:
wb_workspaces PK (workspaceId)
workspaceId, name, endpoint, kind, credentials_ref, keyspace,
created_at, updated_at
wb_config_knowledge_bases_by_workspace PK ((workspace_id), knowledge_base_id)
name, description, status,
embedding_service_id, chunking_service_id, reranking_service_id,
language, vector_collection,
lexical_{enabled,analyzer,options},
created_at, updated_at
wb_config_chunking_service_by_workspace PK ((workspace_id), chunking_service_id)
name, description, status,
engine, engine_version, strategy,
{min,max}_chunk_size, chunk_unit,
overlap_size, overlap_unit, preserve_structure,
language, max_payload_size_kb,
enable_ocr, extract_tables, extract_figures, reading_order,
endpoint_*, request_timeout_ms, auth_type, credential_ref,
created_at, updated_at
wb_config_embedding_service_by_workspace PK ((workspace_id), embedding_service_id)
name, description, status,
provider, model_name, embedding_dimension, distance_metric,
max_batch_size, max_input_tokens,
supported_languages SET<TEXT>, supported_content SET<TEXT>,
endpoint_*, request_timeout_ms, auth_type, credential_ref,
created_at, updated_at
wb_config_reranking_service_by_workspace PK ((workspace_id), reranking_service_id)
name, description, status,
provider, engine, model_name, model_version,
max_candidates, scoring_strategy,
score_normalized, return_scores, max_batch_size,
supported_languages SET<TEXT>, supported_content SET<TEXT>,
endpoint_*, request_timeout_ms, auth_type, credential_ref,
created_at, updated_at
wb_rag_documents_by_knowledge_base PK ((workspace_id, knowledge_base_id), document_id)
source_*, file_*, content_hash, chunk_total,
ingested_at, updated_at,
status, error_message, metadata
wb_rag_documents_by_knowledge_base_and_status (secondary index, by status)
wb_rag_documents_by_content_hash (dedup lookup)
wb_jobs_by_workspace PK ((workspace), job_id)
kind, knowledge_base_id, document_id, status,
processed, total, result_json, error_message,
leased_by, leased_at, ingest_input_json,
created_at, updated_at
wb_api_key_by_workspace, wb_api_key_lookup (per-workspace tokens)
kind on workspaces is one of astra | hcd | openrag | mock. It
describes the backend that the workspace itself targets (useful
later, when a single runtime routes requests to different
data-plane backends per workspace). The runtime's own control plane
is separate — chosen via workbench.yaml.
Knowledge bases own their collection. vector_collection on
the KB row is the Astra collection name. Owned KBs derive it from
the KB name; attached KBs bind to an existing collection with the
same name. The actual vector data lives in that Data API Collection,
provisioned transactionally when an owned KB is created and dropped
when that owned KB is deleted.
Reserved chunk-payload keys. The KB-scoped ingest pipeline
stamps knowledgeBaseId, documentId, chunkIndex, and
chunkText onto every chunk's payload so KB-scoped search and the
chunk listing endpoint can filter / display them without a
secondary lookup.
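A minimal sketch of the stamping step, and of the filter it enables, is shown below. The function names and the rule that reserved keys overwrite same-named caller metadata are assumptions for illustration; only the four key names come from the text above.

```typescript
interface ChunkPayload {
  knowledgeBaseId: string;
  documentId: string;
  chunkIndex: number;
  chunkText: string;
  [extra: string]: unknown;
}

function stampChunkPayloads(
  knowledgeBaseId: string,
  documentId: string,
  chunks: string[],
  metadata: Record<string, unknown> = {},
): ChunkPayload[] {
  return chunks.map((chunkText, chunkIndex) => ({
    ...metadata,
    // Reserved keys are written last so caller metadata cannot shadow them
    // (an assumption of this sketch).
    knowledgeBaseId,
    documentId,
    chunkIndex,
    chunkText,
  }));
}

// Chunk listing for one document needs nothing beyond the payload itself.
function chunksForDocument(payloads: ChunkPayload[], documentId: string): ChunkPayload[] {
  return payloads
    .filter((p) => p.documentId === documentId)
    .sort((a, b) => a.chunkIndex - b.chunkIndex);
}
```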
Stage 2 schema. Five additional control-plane tables back the
agent surface:
wb_config_llm_service_by_workspace (LLM executors;
/api/v1/workspaces/{w}/llm-services CRUD),
wb_config_mcp_tools_by_workspace (provisioned, not yet wired —
lands with agent tool-use),
wb_agentic_agents_by_workspace,
wb_agentic_conversations_by_agent, and
wb_agentic_messages_by_conversation. The agent surface — CRUD
plus send + streaming — runs against the last three tables; an
agent's optional llmServiceId points at a row in the LLM-service
table for per-agent provider selection. ChatMessageRepo tracks
agent-role messages per conversation, with cascade delete wired via
the AGENT_CASCADE_STEPS contract.
The agent message send/stream flow lives in
runtimes/typescript/src/chat/
and is layered:
- `agent-dispatch.ts`: Outer dispatcher loop. Owns the iteration-cap loop (`MAX_TOOL_ITERATIONS = 6`) and the SSE stream variant. Coordinates per-turn resolution, RAG retrieval, persistence, and tool dispatch.
- `agent-resolution.ts`: Per-turn effective config resolution. Binds agent + conversation + workspace, resolves the LLM service.
- `agent-persistence.ts`: Persistence + tool execution helpers. Persists user/assistant turns, wires tool outputs back into the prompt.
- `tools/dispatcher.ts`: Per-call workspace tool dispatch. Executes tool calls, tracks results.
- `prompt.ts`: Prompt assembly. Builds the full request to the LLM from conversation history, system prompt, and RAG context.
- `retrieval.ts`: RAG retrieval. Fetches context chunks if enabled.
The SSE wire emits events in sequence: user-message → token+ → token-reset? → tool-call → tool-result → done|error. The token-reset event fires after
each tool-call iteration so the SPA can clear pre-tool-call narration from the
live preview before iteration N+1 streams in.
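The event ordering and the iteration cap can be sketched as a plain loop that collects events instead of writing them to an SSE stream. Everything except `MAX_TOOL_ITERATIONS` and the event names is hypothetical: the scripted `model` callback stands in for the LLM, the placement of `token-reset` (before the tool events of the iteration that requested tools) follows the sequence listed above, and emitting `error` when the cap is reached is an assumption.

```typescript
const MAX_TOOL_ITERATIONS = 6;

type AgentEvent =
  | { type: "user-message"; text: string }
  | { type: "token"; text: string }
  | { type: "token-reset" }
  | { type: "tool-call"; name: string }
  | { type: "tool-result"; name: string; output: string }
  | { type: "done" }
  | { type: "error"; message: string };

// One model turn: streamed tokens plus any tool calls it requested.
interface ModelTurn {
  tokens: string[];
  toolCalls: string[];
}

function runAgentTurn(
  userText: string,
  model: (iteration: number) => ModelTurn,
  runTool: (name: string) => string,
): AgentEvent[] {
  const events: AgentEvent[] = [{ type: "user-message", text: userText }];

  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const turn = model(i);
    for (const text of turn.tokens) events.push({ type: "token", text });

    // No tool calls means the streamed tokens are the final answer.
    if (turn.toolCalls.length === 0) {
      events.push({ type: "done" });
      return events;
    }

    // Tell the client to discard pre-tool narration before the next iteration.
    events.push({ type: "token-reset" });
    for (const name of turn.toolCalls) {
      events.push({ type: "tool-call", name });
      events.push({ type: "tool-result", name, output: runTool(name) });
    }
  }

  events.push({ type: "error", message: "tool iteration cap reached" });
  return events;
}
```

A model that requests one tool and then answers produces `user-message`, `token`, `token-reset`, `tool-call`, `tool-result`, one or more `token` events, and finally `done`.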
- Every request targeting a specific resource carries the workspace ID in the path: `/api/v1/workspaces/{workspaceId}/...`.
- The control-plane store asserts the workspace exists before returning nested resources. Requests against a non-existent workspace return `404 workspace_not_found`.
- Cascade delete:
  - `DELETE /api/v1/workspaces/{w}` → drops the workspace, all knowledge bases (and their underlying collections), all execution services, all RAG documents, all agents, all API keys. The exact order is enumerated in `cascade.ts`.
  - `DELETE /api/v1/workspaces/{w}/knowledge-bases/{kb}` → drops the underlying Astra collection first, then the KB row, then cascades RAG document rows.
  - `DELETE /api/v1/workspaces/{w}/agents/{a}` → cascades conversations and messages owned by that agent.
- Service → KB binding is N:1. A KB binds exactly one embedding service, one chunking service, and (optionally) one reranking service. Multiple KBs can share the same service. A service deletion is refused (409) while any KB still references it (error code specialized per `IN_USE_CODES`).
- Service references are immutable post-create. The `embeddingServiceId` and `chunkingServiceId` on a KB are pinned at creation time because vectors and chunks on disk are bound to the models that produced them. Re-embedding requires a new KB; the PATCH schema is `.strict()`, so accidentally including those keys in an update body returns 400.
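The "refuse deletion while referenced" rule from the list above can be sketched as a guard that runs before a service row is removed. The code values in `IN_USE_CODES_SKETCH` are invented placeholders (the real values live in `IN_USE_CODES`), and `ServiceInUseError` and `assertServiceDeletable` are hypothetical names.

```typescript
type ServiceKind = "chunking" | "embedding" | "reranking";

interface KnowledgeBaseRefs {
  knowledgeBaseId: string;
  chunkingServiceId: string;
  embeddingServiceId: string;
  rerankingServiceId?: string;
}

// Placeholder codes for the sketch; the real mapping is IN_USE_CODES.
const IN_USE_CODES_SKETCH: Record<ServiceKind, string> = {
  chunking: "chunking_service_in_use",
  embedding: "embedding_service_in_use",
  reranking: "reranking_service_in_use",
};

class ServiceInUseError extends Error {
  readonly status = 409;
  readonly code: string;
  readonly referencedBy: string[];
  constructor(kind: ServiceKind, serviceId: string, referencedBy: string[]) {
    super(`${kind} service "${serviceId}" is still bound to ${referencedBy.length} knowledge base(s)`);
    this.code = IN_USE_CODES_SKETCH[kind];
    this.referencedBy = referencedBy;
  }
}

function referencedServiceId(kb: KnowledgeBaseRefs, kind: ServiceKind): string | undefined {
  switch (kind) {
    case "chunking":
      return kb.chunkingServiceId;
    case "embedding":
      return kb.embeddingServiceId;
    case "reranking":
      return kb.rerankingServiceId;
  }
}

// Throws a 409-style error if any KB still points at the service.
function assertServiceDeletable(kind: ServiceKind, serviceId: string, kbs: KnowledgeBaseRefs[]): void {
  const referencedBy = kbs
    .filter((kb) => referencedServiceId(kb, kind) === serviceId)
    .map((kb) => kb.knowledgeBaseId);
  if (referencedBy.length > 0) throw new ServiceInUseError(kind, serviceId, referencedBy);
}
```

Because several KBs may share one service, the error lists every referencing KB so the caller knows what has to be deleted or rebuilt first.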
Workspace creation today:
Client ──► POST /api/v1/workspaces body={name, kind}
│
▼
Hono middleware (request ID, JSON body parse)
│
▼
Zod validation via @hono/zod-openapi
│
▼
workspaceRoutes.createWorkspace handler
│
▼
ControlPlaneStore.createWorkspace(input) ◄── one of memory / file / astra
│
▼ (astra only)
TablesBundle.workspaces.insertOne(row) ────► @datastax/astra-db-ts
│
▼
Astra Data API Table insert
│
▼
c.json(record, 201)
The KB ingest pipeline extends the same shape with calls to a
Chunker, an Embedder, and the KB's auto-provisioned vector
collection (resolved through resolveKb), plus a RagDocument
row that tracks ingest status. Synchronous and async
(?async=true) variants live at
POST /knowledge-bases/{kb}/ingest; the async path returns 202
with a job pointer and updates progress through the JobStore
until terminal.
Every language green box must produce byte-identical /api/v1/*
responses for the shared scenarios in
conformance/scenarios.json.
Fixtures in
conformance/fixtures/
are the source of truth; they're materialized from the canonical
TypeScript runtime via npm run conformance:regenerate.
See conformance.md for details.
Async ingest jobs are backed by the Astra
wb_jobs_by_workspace
table and visible via /api/v1/workspaces/{w}/jobs. Every replica can poll
or SSE-stream job progress.
The orphan-reclaim mechanism in
jobs/sweeper.ts
periodically scans for status: "running" jobs whose lease (leasedAt) is
stale. On a successful CAS-claim of the orphan:
- If the job carries an `ingestInput` snapshot and a `resume` callback is configured, the sweeper hands off to the async-ingest worker, which replays the pipeline (chunk IDs are deterministic, so re-upserting is idempotent: the embedding cost is wasted, but the final state is correct).
- Otherwise, the sweeper marks it `failed` with an actionable error.
The snapshot is taken at ingest time and captured in ingest_input_json on
the job row. It enables resumption even if the originating replica has died.
The sweeper is opt-in via controlPlane.jobsResume config.
See cross-replica-jobs.md for details.
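The reclaim logic can be sketched against an in-memory table. This is an illustration under assumptions: `JobTable`, `claim`, `markFailed`, `sweepOrphans`, and the `staleAfterMs` option are hypothetical names, the compare-and-set is simulated by comparing the observed `leasedAt` with the stored one, and the calls are synchronous for brevity.

```typescript
type JobStatus = "pending" | "running" | "succeeded" | "failed";

interface JobRow {
  jobId: string;
  status: JobStatus;
  leasedBy: string | null;
  leasedAt: number; // epoch milliseconds
  ingestInputJson?: string;
  errorMessage?: string;
}

class JobTable {
  private readonly rows = new Map<string, JobRow>();

  put(row: JobRow): void {
    this.rows.set(row.jobId, { ...row });
  }
  get(jobId: string): JobRow | undefined {
    const row = this.rows.get(jobId);
    return row ? { ...row } : undefined;
  }
  list(): JobRow[] {
    const out: JobRow[] = [];
    this.rows.forEach((row) => out.push({ ...row }));
    return out;
  }
  // Compare-and-set: succeeds only if the lease is unchanged since it was read.
  claim(jobId: string, observedLeasedAt: number, replica: string, now: number): boolean {
    const row = this.rows.get(jobId);
    if (!row || row.status !== "running" || row.leasedAt !== observedLeasedAt) return false;
    this.rows.set(jobId, { ...row, leasedBy: replica, leasedAt: now });
    return true;
  }
  markFailed(jobId: string, message: string): void {
    const row = this.rows.get(jobId);
    if (row) this.rows.set(jobId, { ...row, status: "failed", errorMessage: message });
  }
}

interface SweepOptions {
  replica: string;
  now: number;
  staleAfterMs: number;
  resume?: (job: JobRow) => void;
}

function sweepOrphans(table: JobTable, opts: SweepOptions): string[] {
  const reclaimed: string[] = [];
  for (const job of table.list()) {
    const stale = job.status === "running" && opts.now - job.leasedAt > opts.staleAfterMs;
    if (!stale) continue;
    // Another replica may have claimed the job since it was listed.
    if (!table.claim(job.jobId, job.leasedAt, opts.replica, opts.now)) continue;

    if (job.ingestInputJson !== undefined && opts.resume) {
      opts.resume(job); // replay the pipeline from the stored snapshot
    } else {
      table.markFailed(job.jobId, "worker lease expired and no ingest snapshot is available; re-submit the ingest");
    }
    reclaimed.push(job.jobId);
  }
  return reclaimed;
}
```

The claim step is what makes it safe to run the sweeper on every replica: two sweepers can see the same stale job, but only one of them wins the compare-and-set.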
- Multi-tenant SaaS concerns (quotas, billing, per-tenant encryption keys).
- General cluster coordination. Horizontal scale comes from running multiple containers behind a load balancer with an `astra` (or future `hcd`) control plane as the shared source of truth. Broader distributed concerns such as quotas, global rate limits, and external event streams remain deployment-level responsibilities.
- Direct database migrations. Astra manages its own.
Tracked in roadmap.md so the architecture doc stays
focused.