Runtime behavior is driven by a single YAML file, conventionally named
workbench.yaml. The runtime loads it at startup and validates it
against a strict schema.
Workspaces, knowledge bases, and execution services are not in
config. They're runtime data, mutable via the HTTP API.
workbench.yaml decides two things:
- Where that data is persisted (the control-plane backend).
- Optionally, which seed workspaces to load into the memory backend at startup.
The runtime looks for the config file in this order and takes the first match:
1. `--config <file>` CLI flag.
2. `WORKBENCH_CONFIG` environment variable.
3. `./workbench.yaml` in the process working directory.
4. `./examples/workbench.yaml` — the sample config this runtime ships with. Lets `npm run dev` work out-of-the-box when run from the runtime directory.
5. `/etc/workbench/workbench.yaml` (the Docker image default).
No cross-source merging — config is a single declarative document.
`--config` and `WORKBENCH_CONFIG` are taken verbatim; they fail
loudly if the target doesn't exist rather than silently falling
through to the next step.
Any string value may reference an environment variable with ${VAR}
or ${VAR:-default} syntax. Interpolation happens before schema
validation.
```yaml
controlPlane:
  driver: astra
  endpoint: ${ASTRA_DB_API_ENDPOINT}
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
```

References to unset variables without a default fail loudly at startup.
Note: tokenRef above is a SecretRef string, not an
interpolation. Secret refs are resolved at use time by the runtime's
SecretResolver, which is separate from YAML interpolation. See
§ Secrets below.
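The `${VAR:-default}` form covers the common case of an environment-specific value with a sane local fallback. A minimal sketch; `WORKBENCH_DATA_DIR` is an illustrative variable name, not one the runtime defines:

```yaml
controlPlane:
  driver: file
  # WORKBENCH_DATA_DIR is an illustrative variable name, not one the runtime defines
  root: ${WORKBENCH_DATA_DIR:-/var/lib/workbench}
```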
```yaml
version: 1                         # required
runtime: { port, logLevel, ... }   # optional, with defaults
controlPlane: { driver, ... }      # optional, default: memory
seedWorkspaces: [ ... ]            # optional, memory-only
```

Schema version. Currently `1`. The runtime refuses to start on an
unknown version.
| Field | Type | Default | Notes |
|---|---|---|---|
| `environment` | enum | `development` | `development` \| `production`. Production mode enforces durable persistence, enabled auth, rejected anonymous traffic, HTTPS `publicOrigin`, and a persistent OIDC session secret when browser login is configured. |
| `port` | int | `8080` | HTTP listen port. |
| `logLevel` | enum | `info` | `trace` \| `debug` \| `info` \| `warn` \| `error`. The `LOG_LEVEL` env var overrides this when set. |
| `requestIdHeader` | string | `X-Request-Id` | Name of the request-ID header. |
| `uiDir` | string \| null | `null` | Directory of pre-built UI assets to serve from `/` (with SPA fallback). `null` auto-detects `/app/public` → `${cwd}/public` → `${cwd}/apps/web/dist`. The `UI_DIR` env var also works as an override. The official Docker image sets this up automatically. |
| `replicaId` | string \| null | `null` | Identifier this replica writes into job leases (used by the cross-replica orphan sweeper to tell whose lease is whose). `null` auto-generates `${HOSTNAME or "wb"}-<short-uuid>` at boot — fine for single-replica deployments and tests; set explicitly for clustered runs if you want the lease holder to be deterministic. |
| `publicOrigin` | URL \| null | `null` | Externally visible browser origin, e.g. `https://workbench.example.com`. Used for OIDC redirect URI construction and secure-cookie decisions. Required for production OIDC browser login. |
| `trustProxyHeaders` | boolean | `false` | Trust `X-Forwarded-Proto` / `X-Forwarded-Host` when `publicOrigin` is not set. Also extends to the rate limiter (`X-Forwarded-For` / `X-Real-IP`). Enable only behind a trusted proxy that overwrites those headers. |
| `csrfOriginCheck` | boolean | `true` | CSRF Origin/Referer check on cookie-protected routes (`/api/v1/workspaces/*` state-changing methods, plus `/auth/refresh` and `/auth/logout`). Bearer-token requests bypass the check. Disable only for non-browser clients that authenticate with cookies but cannot send `Origin` — prefer Bearer auth instead. |
| `rateLimit` | object | (defaults below) | In-process per-IP rate limiter. See § Rate limiting. |
| `blockPrivateNetworkEndpoints` | boolean | `false` | Layered SSRF defense: when `true`, operator-supplied `endpointBaseUrl` values on chunking / embedding / reranking / LLM services are rejected if they resolve to RFC 1918 (10/8, 172.16/12, 192.168/16), loopback, or IPv6 unique-local hosts. Auto-flipped to `true` when `runtime.environment: production`. Default `false` so the local-Ollama / local-vLLM dev workflow keeps working; production deployments should still pair this with VPC-level egress controls. |
| `maxConcurrentIngestJobs` | int (≥1) | `4` | Per-replica cap on in-flight ingest workers. Beyond the cap, queued jobs wait in-process for a slot rather than slamming the embedding provider's quota. Persisted job state is unaffected; raise for dedicated provisioned-throughput deployments. Surfaced as `workbench_ingest_workers_{active,queued}` on `/metrics`. |
| `tracing` | object | (off) | OpenTelemetry tracing knobs. See § Tracing. |
Production deployments should start from
runtimes/typescript/examples/workbench.production.yaml.
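As a sketch of how these fields compose, a hardened `runtime:` block might look like the following (all values illustrative; the production example file remains the canonical preset):

```yaml
runtime:
  environment: production       # forces durable persistence + strict auth at the schema layer
  port: 8080
  logLevel: info
  publicOrigin: https://workbench.example.com   # needed for production OIDC browser login
  trustProxyHeaders: true       # only behind a proxy that overwrites X-Forwarded-*
  replicaId: wb-1               # deterministic lease holder for clustered runs
  maxConcurrentIngestJobs: 8    # raise only with provisioned embedding throughput
```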
Defense-in-depth limiter applied to /api/v1/* (capacity from
config) and /auth/* (a tighter fixed cap of 30 req/window — login
flows shouldn't burst). Per-IP, per-replica fixed window. Distributed
deployments should still front the runtime with an upstream WAF /
API gateway for accurate aggregate ceilings; this layer protects
against runaway clients and naive scanners.
```yaml
runtime:
  rateLimit:
    enabled: true    # default
    capacity: 600    # max requests per window per IP for /api/v1/*
    windowMs: 60000  # window length, ms
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `true` | Set `false` to skip the limiter entirely. |
| `capacity` | int (1–1_000_000) | `600` | Per-IP requests per window for `/api/v1/*`. The auth surface uses a fixed 30. |
| `windowMs` | int (1000–3_600_000) | `60000` | Window length in milliseconds. |
Rejected requests get 429 Too Many Requests with the canonical
error envelope, a Retry-After header (seconds), and
X-RateLimit-{Limit,Remaining,Reset} headers on every response.
Client IP is derived from the socket; set
runtime.trustProxyHeaders: true to honor X-Forwarded-For /
X-Real-IP instead.
OpenTelemetry tracing knobs. Off by default — flipping
enabled: true starts a NodeSDK with the OTLP HTTP trace exporter
and the standard auto-instrumentations bundle. When disabled, the
runtime still creates manual server spans through
@opentelemetry/api so flipping tracing on later does not require
code changes — the spans are just no-ops without a registered SDK.
```yaml
runtime:
  tracing:
    enabled: false
    serviceName: null   # null → "ai-workbench-runtime"
    exporterUrl: null   # null → OTEL_EXPORTER_OTLP_ENDPOINT / SDK default
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `false` | Start the NodeSDK + auto-instrumentations bundle. |
| `serviceName` | string \| null | `null` | Override the `service.name` resource attribute. `null` keeps the default `ai-workbench-runtime`. |
| `exporterUrl` | URL \| null | `null` | OTLP HTTP traces endpoint, e.g. `https://otel-collector.example.com/v1/traces`. `null` falls back to `OTEL_EXPORTER_OTLP_ENDPOINT` and the SDK default. |
For full HTTP / fetch / pino auto-instrumentation, preload the SDK
at process launch (`node --import ./dist/lib/tracing-preload.js dist/root.js`).
Without `--import`, manual server spans cover every request but
outbound HTTP / fetch / DB clients won't emit child spans. See
production.md for the deploy-side walkthrough.
Picks where workspaces, knowledge bases, execution services, and RAG
documents are persisted. Discriminated on driver.
When controlPlane: is omitted entirely, the runtime infers a
default: if both ASTRA_DB_API_ENDPOINT and
ASTRA_DB_APPLICATION_TOKEN are populated (the
astra-cli auto-detection on boot fills these for any
developer with a working profile), the runtime selects the astra
driver against ASTRA_DB_KEYSPACE (or default_keyspace).
Otherwise it falls back to a file backend rooted at
./.workbench-data. Set controlPlane.driver: memory explicitly if
you want pure in-process state without the on-disk fallback.
```yaml
controlPlane:
  driver: memory
```

In-process Maps. State is lost when the runtime exits. Best for CI,
tests, and ephemeral demos. Note that omitting `controlPlane`
entirely no longer falls through to memory — the runtime's default
prefers Astra (when env vars are present) or a file backend. Set
`driver: memory` explicitly to opt in.
```yaml
controlPlane:
  driver: file
  root: /var/lib/workbench
```

JSON-on-disk. One file per table, per-table mutex, atomic rename on
writes. Single-node self-hosted. Not safe for multiple writers — if
you run two containers pointing at the same directory, they'll
clobber each other.

| Field | Type | Required | Notes |
|---|---|---|---|
| `root` | string | yes | Directory that will hold `workspaces.json` et al. Created if absent. |
```yaml
controlPlane:
  driver: astra
  endpoint: https://<db-id>-<region>.apps.astra.datastax.com
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
  keyspace: default_keyspace
```

Astra Data API Tables via `@datastax/astra-db-ts`. Production-grade,
multi-writer-safe.

Tip — astra-cli auto-config. If you have the `astra` CLI installed
and a profile configured, you can leave `ASTRA_DB_APPLICATION_TOKEN`
and `ASTRA_DB_API_ENDPOINT` unset locally — the runtime will pick
them up from the CLI at startup. Production deployments inject them
from a secret manager and the CLI integration is automatically
inert. See astra-cli.md.

| Field | Type | Required / Default | Notes |
|---|---|---|---|
| `endpoint` | URL | yes | Astra Data API endpoint. |
| `tokenRef` | SecretRef | yes | Pointer to the application token (`env:…` / `file:…`). |
| `keyspace` | string | no (default `default_keyspace`) | Keyspace hosting the `wb_*` control-plane tables. Defaults to `default_keyspace` — the keyspace Astra DB auto-creates on every new database — so out-of-the-box deployments boot without pre-creating one. |
| `jobPollIntervalMs` | int (50–60000) | no (default `500`) | Cross-replica job-subscriber poll interval in ms. Each subscribed (workspace, jobId) pair is re-read at this cadence so SSE clients on a different replica from the worker still see updates. Same-replica updates fan out instantly; the poller is a no-op when no one is subscribed. Raise for cost-sensitive deployments where second-scale staleness is fine; lower for hot SSE paths. Astra-only — memory and file are single-replica by definition. |
| `jobsResume` | object | no (default off) | Cross-replica orphan-sweeper config. See below. |
The runtime creates the wb_* tables at startup if they don't exist
(using createTable(..., { ifNotExists: true })). The keyspace
itself must already exist.
Off by default — only useful for clustered deployments where one
replica can crash mid-ingest while another stays up. Single-replica
operators don't need it (their pipelines always fail-fast on the
same process). When enabled, every replica scans the durable job
store on an interval for running jobs whose lease is older than
the grace window and CAS-claims them. Jobs with a persisted ingest
snapshot replay the pipeline idempotently; older rows without a
snapshot still become terminal failed records so SSE clients do
not hang forever. See
cross-replica-jobs.md.
jobsResume.enabled: true with controlPlane.driver: memory is
rejected at validation time — there is no shared store for sibling
replicas to scan when the leases live in another replica's process
memory.
```yaml
controlPlane:
  driver: astra
  endpoint: https://...
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
  jobsResume:
    enabled: true
    graceMs: 60000     # how stale a lease must be before reclaim
    intervalMs: 60000  # how often each replica scans
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `false` | Set to `true` to start the sweeper. Off by default; clustered deployments opt in. |
| `graceMs` | int (1000–600000) | `60000` | Maximum age of a lease (relative to last heartbeat) before the job is considered orphaned. |
| `intervalMs` | int (1000–600000) | `60000` | How often each replica scans for stale leases. |
Heartbeats are stamped on every progress update (processed
ticking, status flipping), so any active worker keeps its lease
fresh. Each replica writes its own replicaId (see
runtime.replicaId) into leasedBy so the sweeper can
tell what claim belongs to whom.
Optional list of workspace records loaded into the memory backend at
startup. Lets developers skip the POST /api/v1/workspaces dance
when running locally.
```yaml
seedWorkspaces:
  - name: demo
    kind: mock
  - name: prod-astra
    kind: astra
    url: env:ASTRA_DB_API_ENDPOINT
    credentials:
      token: env:ASTRA_DB_APPLICATION_TOKEN
    keyspace: workbench
```

| Field | Type | Required | Notes |
|---|---|---|---|
| `name` | string | yes | Workspace name. |
| `kind` | enum | yes | `astra` \| `hcd` \| `openrag` \| `mock`. |
| `uid` | UUID | no (auto-generated) | Only useful if other seeds reference it. |
| `url` | URL or SecretRef | no | Workspace-specific data-plane URL. |
| `credentials` | map<string, SecretRef> | no | Per-key secret pointers. |
| `keyspace` | string | no | Workspace-specific keyspace. |
Using seedWorkspaces with any driver other than memory is a
validation error — workspaces already persist in the backend, so
there's nothing to seed.
Embedding services are workspace-scoped runtime data — created via
POST /api/v1/workspaces/{w}/embedding-services, not in
workbench.yaml. The yaml only seeds workspaces; everything past
that flows through the API. Embedding services control how the
runtime turns text into vectors during ingest and search.
The runtime supports two execution paths:

- Client-side embedding (the default). The runtime resolves `endpointBaseUrl` + `endpointPath` + `credentialRef`, calls the provider's HTTP API at ingest / search time, and writes the resulting vectors to Astra. Works with any embedding provider (OpenAI, Cohere, NVIDIA NIM, self-hosted, etc.) — the runtime carries the bytes.
- Vectorize-on-ingest (server-side, Astra-only). When the embedding service has `provider: "astra"` and the bound vector collection was created with a matching Astra `service` block (e.g. `nvidia:nvidia/nv-embedqa-e5-v5`), the runtime delegates embedding to Astra's `$vectorize` column type. Documents are upserted with raw text; Astra runs the embedding model in its own infrastructure and stores both the text and the vector. Search likewise sends the query as text and Astra embeds it server-side.
The vectorize path is faster on hot paths (no extra round-trip to a provider), simpler to operate (no provider credential lives on the runtime), and avoids the dimension-mismatch class of errors. Two constraints to know:
- The embedding service and the collection must agree. Creating a knowledge base with `provider: "astra"` materializes the Astra collection with the service block baked in. Changing the embedding service binding on an existing KB is rejected if the dimension or provider doesn't match what the collection was provisioned with.
- `credentialRef` is irrelevant on the runtime side. Astra itself holds whatever provider credentials the vectorize service needs; the runtime never sees them. Setting `credentialRef` on a `provider: "astra"` embedding service is a no-op.
Mixed-batch upserts (some records carry a precomputed vector,
others carry text) always fall back to client-side embedding so
the entire batch lands in one transactional call. See
api-spec.md § `POST /{knowledgeBaseId}/records` for
the exact dispatch rules.
Wires up the runtime-wide default chat-completion executor used by
agents that have no llmServiceId of their own. When unset and
the agent also has no LLM service bound, agent send + streaming
return 503 chat_disabled. See agents.md for the
agent surface and the per-agent LLM-service binding.
```yaml
chat:
  tokenRef: env:HUGGINGFACE_API_KEY
  model: mistralai/Mistral-7B-Instruct-v0.3
  maxOutputTokens: 1024
  retrievalK: 6
  systemPrompt: null
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `tokenRef` | SecretRef | required | Resolved once at boot. `env:VAR` or `file:/path`. |
| `model` | string | `mistralai/Mistral-7B-Instruct-v0.3` | Any chat-completion-compatible HuggingFace Inference API model. |
| `maxOutputTokens` | int (1–8192) | `1024` | Per-turn cap on the assistant's reply length. |
| `retrievalK` | int (1–64) | `6` | Top-K KB chunks per knowledge base. The total injected into the prompt is `retrievalK * ceil(sqrt(numKbs))` so multi-KB conversations don't blow up the prompt. |
| `systemPrompt` | string \| null | `null` | Default system prompt when neither the agent nor the agent's LLM service supplies one. `null` falls back to the runtime's persona-agnostic `DEFAULT_AGENT_SYSTEM_PROMPT`. |
Per-agent override. When an agent has llmServiceId set, the
agent's bound LLM service overrides this block — the runtime
instantiates a chat service from the LLM-service record instead of
using the global block. The agent's own systemPrompt likewise
wins over chat.systemPrompt when present. Multi-provider support
is incremental; today provider: "huggingface" and
provider: "openai" LLM services are wired end-to-end. Other
providers can be stored but return 422 llm_provider_unsupported
until their adapters land.
The runtime exposes a multipart ingest route at
POST /api/v1/workspaces/{w}/knowledge-bases/{kb}/ingest/file
that accepts PDF, DOCX, XLSX, and text uploads. Extraction is
dispatched based on the upload's MIME type / extension;
configuration is via environment variables, not workbench.yaml.
| Variable | Default | Notes |
|---|---|---|
| `DOCLING_URL` | unset | Base URL of a docling-serve instance. When set, the dispatcher prefers docling over the native pipeline for non-text files (PDF, DOCX, XLSX) and falls back to native if docling is unreachable. The route also accepts an explicit per-upload `parser=native\|docling\|auto` form field. |
| `DOCLING_TIMEOUT_MS` | `60000` | Per-request budget for docling-serve calls. Scanned/OCR'd PDFs can run long; raise this if you see `docling_unavailable` with `timed out after …` messages. |
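As an illustration, a containerized deployment might wire these variables up like so (compose-style sketch; the docling-serve address is an assumption, not a documented default):

```yaml
services:
  workbench:
    environment:
      DOCLING_URL: http://docling:5001   # illustrative address for a docling-serve sidecar
      DOCLING_TIMEOUT_MS: "120000"       # double the default budget for OCR-heavy PDFs
```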
When DOCLING_URL is unset (the default), the runtime uses
pdfjs-dist for PDFs, mammoth for DOCX, and exceljs for XLSX
(rendered as one markdown table per worksheet). Native extraction
is fast and zero-ops but flattens layout-specific structure and
skips OCR; docling preserves layout and does OCR for scanned
documents.
Toggles the Model Context Protocol façade at
/api/v1/workspaces/{w}/mcp. Off by default — when disabled the
route returns 404 so the surface isn't probeable. See
mcp.md for the full feature walkthrough.
```yaml
mcp:
  enabled: true
  exposeChat: false
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `false` | When `false`, MCP is not exposed at all. |
| `exposeChat` | bool | `false` | Adds the `chat_send` MCP tool. Requires the `chat:` block; silently skipped when chat is unset. |
Configures the /api/v1/* auth middleware. See
auth.md for the full contract and rollout plan.
```yaml
auth:
  mode: disabled          # disabled | apiKey | oidc | any
  anonymousPolicy: allow  # allow | reject
  # oidc: …               # required when mode is `oidc` or `any`
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `mode` | enum | `disabled` | Which verifiers are active. |
| `anonymousPolicy` | enum | `allow` | `allow` lets tokenless requests through as `anonymous`; `reject` returns `401 unauthorized`. |
| `bootstrapTokenRef` | SecretRef \| null | `null` | Optional 32+ character break-glass bearer token. Accepted as an unscoped operator subject when mode is `apiKey`, `oidc`, or `any`; invalid with `mode: disabled`. |
| `acknowledgeOpenAccess` | boolean | `true` | Controls how the deployment guard reacts when a durable control plane (file / astra) is paired with open auth (`mode: disabled` or `anonymousPolicy: allow`). Default `true` keeps that pairing as a loud startup warning so the dev loop (file CP + open auth) keeps booting. Flip to `false` in CI / shared environments to convert the warning into a hard fatal. Production deployments should set `runtime.environment: production` instead — that forces `apiKey`/`oidc`/`any` + `anonymousPolicy: reject` at the schema layer regardless of this flag. |
| `oidc` | object | — | Required when mode is `oidc` or `any`. See table below. |
The default (disabled + allow) matches pre-auth behavior: the
middleware runs, tags every request anonymous, and lets it
through. Set anonymousPolicy: reject in CI to assert the
middleware is mounted.
| Field | Type | Default | Notes |
|---|---|---|---|
| `issuer` | url | required | Must equal the JWT `iss` claim exactly. Discovery URL is derived from this. |
| `audience` | string \| string[] | required | At least one value must match the JWT `aud` claim. |
| `jwksUri` | url \| null | `null` | When `null`, the runtime fetches `${issuer}/.well-known/openid-configuration` at startup and uses `jwks_uri` from the response. |
| `clockToleranceSeconds` | int | `30` | Skew allowance for `exp` / `nbf`. |
| `claims.subject` | string | `sub` | JWT claim → `AuthSubject.id`. |
| `claims.label` | string | `email` | JWT claim → `AuthSubject.label` (nullable). |
| `claims.workspaceScopes` | string | `wb_workspace_scopes` | Array-valued claim → `AuthSubject.workspaceScopes`. A JSON `null` marks the subject unscoped (admin). |
| `client` | object | — | Optional. When present, the runtime hosts `/auth/{login,callback,me,logout}` so the bundled web UI can drive a browser PKCE login. See table below. |
| Field | Type | Default | Notes |
|---|---|---|---|
| `clientId` | string | required | OAuth client identifier registered at the IdP. |
| `clientSecretRef` | SecretRef \| null | `null` | Client secret. Omit for public (SPA-style) clients. |
| `redirectPath` | string | `/auth/callback` | Path the IdP redirects to after authorization. Must be in the IdP's allow-list. |
| `postLogoutPath` | string | `/` | Where `/auth/logout` sends the user. |
| `scopes` | string[] | `[openid, profile, email]` | OAuth scopes requested at login. |
| `sessionCookieName` | string | `wb_session` | Cookie that carries the encrypted session. |
| `sessionSecretRef` | SecretRef \| null | `null` | Key material for encrypting session cookies. Must resolve to ≥32 bytes. When `null`, runtime auto-generates an ephemeral key at boot (dev only). |
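Putting the two tables together, a browser-login setup might look like this (issuer, audience, and client ID are illustrative):

```yaml
auth:
  mode: oidc
  anonymousPolicy: reject
  oidc:
    issuer: https://idp.example.com/realms/workbench
    audience: workbench-api
    client:
      clientId: workbench-web
      scopes: [openid, profile, email]                           # the default, shown explicitly
      sessionSecretRef: file:/etc/workbench/secrets/session-key  # ≥32 bytes, persistent in production
```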
Secrets reach the runtime through two disjoint paths.

YAML interpolation (`${VAR}`). Applies before schema validation. Good for non-secret runtime settings like endpoints, and for pulling secrets that need to be literal strings in the config document.

SecretRefs. The preferred path for anything credential-shaped. A SecretRef is a
string like env:ASTRA_DB_APPLICATION_TOKEN or
file:/etc/workbench/secrets/astra-token. The runtime resolves it
when it actually needs the secret (at control-plane init, for
example), so the value never lives in memory longer than necessary
and never crosses process logs.
Providers available today:
| Provider | Ref shape | Behavior |
|---|---|---|
| `env` | `env:VAR_NAME` | Reads `process.env.VAR_NAME`. Errors if unset or empty. |
| `file` | `file:/abs/path` | Reads the file and trims trailing whitespace. |
| `astra-cli` | `astra-cli:<profile>:<dbId>:<token\|endpoint>` | Sources the token / Data API endpoint from a specific `astra` CLI profile + database. Lets different workspaces target different Astra databases without restarting. Cached for the process lifetime; errors are not cached. |
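Wherever a SecretRef is accepted, the three shapes are interchangeable. For example, on the control-plane token (paths and the profile name are illustrative):

```yaml
controlPlane:
  driver: astra
  endpoint: ${ASTRA_DB_API_ENDPOINT}
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
  # equivalent alternatives:
  # tokenRef: file:/etc/workbench/secrets/astra-token
  # tokenRef: astra-cli:default:<dbId>:token
```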
Future providers (Vault, AWS SM, etc.) plug into the same
SecretProvider interface. See
runtimes/typescript/src/secrets/provider.ts.
At startup the runtime enforces:
- Every `${VAR}` reference resolves or has a default.
- `controlPlane.driver` is one of `memory | file | astra`.
- Driver-specific required fields are present (e.g. `root` for file, `endpoint` + `tokenRef` for astra).
- Every `tokenRef` / `credentials` value matches the `<prefix>:<path>` shape.
- `seedWorkspaces` is only non-empty when `controlPlane.driver == memory`.
- No duplicate names within `seedWorkspaces`.
Validation failures abort startup with a non-zero exit code and a human-readable error message.
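For example, a config like the following aborts before the server ever binds a port, because `seedWorkspaces` is memory-only:

```yaml
# Rejected at startup: seedWorkspaces requires controlPlane.driver: memory
version: 1
controlPlane:
  driver: file
  root: /var/lib/workbench
seedWorkspaces:
  - name: demo
    kind: mock
```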
Hot-reloading the config is not supported. The current model is "restart the process to pick up changes." Since only the control-plane backend is configured here (workspaces themselves are runtime data), most day-to-day operations don't require a config change anyway.
`SIGINT` and `SIGTERM` trigger a graceful-shutdown sequence:

- `/readyz` starts returning `503 draining`. Kubernetes-style readiness probes will stop routing traffic here.
- `server.close()` stops accepting new connections. In-flight requests keep going.
- When every connection finishes (or after 15 seconds, whichever comes first), the control-plane store closes and the process exits `0`. A timeout exits `1` so the supervisor knows we didn't drain cleanly.
- A second `SIGINT`/`SIGTERM` while the first is still draining short-circuits straight to exit — the operator can force-kill a stuck process without waiting for the timeout.
/healthz stays 200 throughout the drain (the process is still
alive, just closed to new traffic). That's the split that k8s
expects — livenessProbe hits /healthz, readinessProbe hits
/readyz.
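A minimal Kubernetes sketch of that probe split (port and grace period are illustrative; the grace period just needs to exceed the runtime's 15-second drain cap):

```yaml
spec:
  terminationGracePeriodSeconds: 30   # comfortably above the runtime's 15s drain timeout
  containers:
    - name: workbench
      livenessProbe:
        httpGet: { path: /healthz, port: 8080 }   # stays 200 while draining
      readinessProbe:
        httpGet: { path: /readyz, port: 8080 }    # flips to 503 draining on SIGTERM
        periodSeconds: 5
```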
The runtime auto-loads a .env file at startup so local dev doesn't
need you to export secrets by hand. Uses Node 21.7+'s built-in
process.loadEnvFile — no dotenv dependency.
Location. Put it at the repo root. The runtime walks up from
the process's current working directory looking for .env, stopping
at the repo root (.git sentinel). That means the same file works
whether you run npm run dev from the repo root or from
runtimes/typescript/.
Precedence. Values already present in process.env win —
.env never overwrites shell exports or container env vars. Matches
every other dotenv loader.
Override the path. Set WORKBENCH_ENV_FILE=/abs/path/to/.env to
skip the walk and load an explicit file (missing files fail loudly).
Useful for production container boots where the token lives on a
mounted secret.
Template. .env.example at the repo root is
a committed starting point — copy to .env and fill in the secrets
you need. .env itself is gitignored.
Production. The runtime ships the same loader in production, but
standard container practice (Docker -e / K8s Secrets → env vars)
usually means no .env is present and the loader silently skips.
All canonical examples live under `runtimes/typescript/examples/`:

- `workbench.yaml` — the default dev config the Docker image ships with, with annotated comments covering all three backends, `seedWorkspaces`, and auth stanzas.
- `workbench.production.yaml` — hardened production preset (astra backend, OIDC, security headers).
- `workbench.memory.yaml` — CI / smoke-test preset (in-memory only, no persistence).
workbench.yaml— the default dev config the Docker image ships with, with annotated comments covering all three backends,seedWorkspaces, and auth stanzas.workbench.production.yaml— hardened production preset (astra backend, OIDC, security headers).workbench.memory.yaml— CI / smoke-test preset (in-memory only, no persistence).