Runtime behavior is driven by a single YAML file, conventionally named
workbench.yaml. The runtime loads it at startup and validates it
against a strict schema.
Workspaces, knowledge bases, and execution services are not in
config. They're runtime data, mutable via the HTTP API.
workbench.yaml decides two things:
- Where that data is persisted (the control-plane backend).
- Optionally, which seed workspaces to load into the memory backend at startup.
The runtime looks for the config file in this order and takes the first match:
1. `--config <file>` CLI flag.
2. `WORKBENCH_CONFIG` environment variable.
3. `./workbench.yaml` in the process working directory.
4. `./examples/workbench.yaml` — the sample config this runtime ships with. Lets `npm run dev` work out-of-the-box when run from the runtime directory.
5. `/etc/workbench/workbench.yaml` (the Docker image default).
No cross-source merging — config is a single declarative document.
`--config` and `WORKBENCH_CONFIG` are taken verbatim; they fail
loudly if the target doesn't exist rather than silently falling
through to the next step.
Any string value may reference an environment variable with ${VAR}
or ${VAR:-default} syntax. Interpolation happens before schema
validation.
```yaml
controlPlane:
  driver: astra
  endpoint: ${ASTRA_DB_API_ENDPOINT}
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
```

References to unset variables without a default fail loudly at startup.
Note: tokenRef above is a SecretRef string, not an
interpolation. Secret refs are resolved at use time by the runtime's
SecretResolver, which is separate from YAML interpolation. See
§ Secrets below.
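The `${VAR:-default}` form covers the common case of an environment-specific value with a sane local fallback. A minimal sketch; `WORKBENCH_DATA_DIR` is an illustrative variable name, not one the runtime defines:

```yaml
controlPlane:
  driver: file
  # WORKBENCH_DATA_DIR is an illustrative variable name, not one the runtime defines
  root: ${WORKBENCH_DATA_DIR:-/var/lib/workbench}
```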
```yaml
version: 1                         # required
runtime: { port, logLevel, ... }   # optional, with defaults
controlPlane: { driver, ... }      # optional, default: memory
seedWorkspaces: [ ... ]            # optional, memory-only
```

Schema version. Currently `1`. The runtime refuses to start on an
unknown version.
| Field | Type | Default | Notes |
|---|---|---|---|
| `environment` | enum | `development` | `development` \| `production`. Production mode enforces durable persistence, enabled auth, rejected anonymous traffic, HTTPS `publicOrigin`, and a persistent OIDC session secret when browser login is configured. |
| `port` | int | `8080` | HTTP listen port. |
| `logLevel` | enum | `info` | `trace` \| `debug` \| `info` \| `warn` \| `error`. The `LOG_LEVEL` env var overrides this when set. |
| `requestIdHeader` | string | `X-Request-Id` | Name of the request-ID header. |
| `uiDir` | string \| null | `null` | Directory of pre-built UI assets to serve from `/` (with SPA fallback). `null` auto-detects `/app/public` → `${cwd}/public` → `${cwd}/apps/web/dist`. The `UI_DIR` env var also works as an override. The official Docker image sets this up automatically. |
| `replicaId` | string \| null | `null` | Identifier this replica writes into job leases (used by the cross-replica orphan sweeper to tell whose lease is whose). `null` auto-generates `${HOSTNAME or "wb"}-<short-uuid>` at boot — fine for single-replica deployments and tests; set explicitly for clustered runs if you want the lease holder to be deterministic. |
| `publicOrigin` | URL \| null | `null` | Externally visible browser origin, e.g. `https://workbench.example.com`. Used for OIDC redirect URI construction and secure-cookie decisions. Required for production OIDC browser login. |
| `trustProxyHeaders` | boolean | `false` | Trust `X-Forwarded-Proto` / `X-Forwarded-Host` when `publicOrigin` is not set. Also extends to the rate limiter (`X-Forwarded-For` / `X-Real-IP`). Enable only behind a trusted proxy that overwrites those headers. |
| `csrfOriginCheck` | boolean | `true` | CSRF Origin/Referer check on cookie-protected routes (`/api/v1/workspaces/*` state-changing methods, plus `/auth/refresh` and `/auth/logout`). Bearer-token requests bypass the check. Disable only for non-browser clients that authenticate with cookies but cannot send `Origin` — prefer Bearer auth instead. |
| `rateLimit` | object | (defaults below) | In-process per-IP rate limiter. See § Rate limiting. |
| `blockPrivateNetworkEndpoints` | boolean | `false` | Layered SSRF defense: when `true`, operator-supplied `endpointBaseUrl` values on chunking / embedding / reranking / LLM services are rejected if they resolve to RFC 1918 (10/8, 172.16/12, 192.168/16), loopback, or IPv6 unique-local hosts. Auto-flipped to `true` when `runtime.environment: production`. Default `false` so the local-Ollama / local-vLLM dev workflow keeps working; production deployments should still pair this with VPC-level egress controls. |
| `maxConcurrentIngestJobs` | int (≥1) | `4` | Per-replica cap on in-flight ingest workers. Beyond the cap, queued jobs wait in-process for a slot rather than slamming the embedding provider's quota. Persisted job state is unaffected; raise for dedicated provisioned-throughput deployments. Surfaced as `workbench_ingest_workers_{active,queued}` on `/metrics`. |
| `tracing` | object | (off) | OpenTelemetry tracing knobs. See § Tracing. |
Production deployments should start from
runtimes/typescript/examples/workbench.production.yaml.
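As a sketch of how these fields compose, a hardened `runtime:` block might look like the following (all values illustrative; the production example file remains the canonical preset):

```yaml
runtime:
  environment: production       # forces durable persistence + strict auth at the schema layer
  port: 8080
  logLevel: info
  publicOrigin: https://workbench.example.com   # needed for production OIDC browser login
  trustProxyHeaders: true       # only behind a proxy that overwrites X-Forwarded-*
  replicaId: wb-1               # deterministic lease holder for clustered runs
  maxConcurrentIngestJobs: 8    # raise only with provisioned embedding throughput
```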
Defense-in-depth limiter applied to /api/v1/* (capacity from
config) and /auth/* (a tighter fixed cap of 30 req/window — login
flows shouldn't burst). Per-IP, per-replica fixed window. Distributed
deployments should still front the runtime with an upstream WAF /
API gateway for accurate aggregate ceilings; this layer protects
against runaway clients and naive scanners.
```yaml
runtime:
  rateLimit:
    enabled: true    # default
    capacity: 600    # max requests per window per IP for /api/v1/*
    windowMs: 60000  # window length, ms
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `true` | Set `false` to skip the limiter entirely. |
| `capacity` | int (1–1_000_000) | `600` | Per-IP requests per window for `/api/v1/*`. The auth surface uses a fixed 30. |
| `windowMs` | int (1000–3_600_000) | `60000` | Window length in milliseconds. |
Rejected requests get 429 Too Many Requests with the canonical
error envelope, a Retry-After header (seconds), and
X-RateLimit-{Limit,Remaining,Reset} headers on every response.
Client IP is derived from the socket; set
runtime.trustProxyHeaders: true to honor X-Forwarded-For /
X-Real-IP instead.
OpenTelemetry tracing knobs. Off by default — flipping
enabled: true starts a NodeSDK with the OTLP HTTP trace exporter
and the standard auto-instrumentations bundle. When disabled, the
runtime still creates manual server spans through
@opentelemetry/api so flipping tracing on later does not require
code changes — the spans are just no-ops without a registered SDK.
```yaml
runtime:
  tracing:
    enabled: false
    serviceName: null   # null → "ai-workbench-runtime"
    exporterUrl: null   # null → OTEL_EXPORTER_OTLP_ENDPOINT / SDK default
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `false` | Start the NodeSDK + auto-instrumentations bundle. |
| `serviceName` | string \| null | `null` | Override the `service.name` resource attribute. `null` keeps the default `ai-workbench-runtime`. |
| `exporterUrl` | URL \| null | `null` | OTLP HTTP traces endpoint, e.g. `https://otel-collector.example.com/v1/traces`. `null` falls back to `OTEL_EXPORTER_OTLP_ENDPOINT` and the SDK default. |
For full HTTP / fetch / pino auto-instrumentation, preload the SDK
at process launch (`node --import ./dist/lib/tracing-preload.js dist/root.js`).
Without `--import`, manual server spans cover every request but
outbound HTTP / fetch / DB clients won't emit child spans. See
production.md for the deploy-side walkthrough.
Picks where workspaces, knowledge bases, execution services, and RAG
documents are persisted. Discriminated on driver.
When controlPlane: is omitted entirely, the runtime infers a
default: if both ASTRA_DB_API_ENDPOINT and
ASTRA_DB_APPLICATION_TOKEN are populated (the
astra-cli auto-detection on boot fills these for any
developer with a working profile), the runtime selects the astra
driver against ASTRA_DB_KEYSPACE (or default_keyspace).
Otherwise it falls back to a file backend rooted at
./.workbench-data. Set controlPlane.driver: memory explicitly if
you want pure in-process state without the on-disk fallback.
```yaml
controlPlane:
  driver: memory
```

In-process Maps. State is lost when the runtime exits. Best for CI,
tests, and ephemeral demos. Note that omitting `controlPlane`
entirely no longer falls through to memory — the runtime's default
prefers Astra (when env vars are present) or a file backend. Set
`driver: memory` explicitly to opt in.
```yaml
controlPlane:
  driver: file
  root: /var/lib/workbench
```

JSON-on-disk. One file per table, per-table mutex, atomic rename on
writes. Single-node self-hosted. Not safe for multiple writers — if
you run two containers pointing at the same directory, they'll
clobber each other.

| Field | Type | Required | Notes |
|---|---|---|---|
| `root` | string | yes | Directory that will hold `workspaces.json` et al. Created if absent. |
```yaml
controlPlane:
  driver: astra
  endpoint: https://<db-id>-<region>.apps.astra.datastax.com
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
  keyspace: default_keyspace
```

Astra Data API Tables via `@datastax/astra-db-ts`. Production-grade,
multi-writer-safe.

Tip — astra-cli auto-config. If you have the `astra` CLI installed
and a profile configured, you can leave `ASTRA_DB_APPLICATION_TOKEN`
and `ASTRA_DB_API_ENDPOINT` unset locally — the runtime will pick
them up from the CLI at startup. Production deployments inject them
from a secret manager and the CLI integration is automatically
inert. See astra-cli.md.

| Field | Type | Required / Default | Notes |
|---|---|---|---|
| `endpoint` | URL | yes | Astra Data API endpoint. |
| `tokenRef` | SecretRef | yes | Pointer to the application token (`env:…` / `file:…`). |
| `keyspace` | string | no (default `default_keyspace`) | Keyspace hosting the `wb_*` control-plane tables. Defaults to `default_keyspace` — the keyspace Astra DB auto-creates on every new database — so out-of-the-box deployments boot without pre-creating one. |
| `jobPollIntervalMs` | int (50–60000) | no (default `500`) | Cross-replica job-subscriber poll interval in ms. Each subscribed (workspace, jobId) pair is re-read at this cadence so SSE clients on a different replica from the worker still see updates. Same-replica updates fan out instantly; the poller is a no-op when no one is subscribed. Raise for cost-sensitive deployments where second-scale staleness is fine; lower for hot SSE paths. Astra-only — memory and file are single-replica by definition. |
| `jobsResume` | object | no (default off) | Cross-replica orphan-sweeper config. See below. |
The runtime creates the wb_* tables at startup if they don't exist
(using createTable(..., { ifNotExists: true })). The keyspace
itself must already exist.
Off by default — only useful for clustered deployments where one
replica can crash mid-ingest while another stays up. Single-replica
operators don't need it (their pipelines always fail-fast on the
same process). When enabled, every replica scans the durable job
store on an interval for running jobs whose lease is older than
the grace window and CAS-claims them. Jobs with a persisted ingest
snapshot replay the pipeline idempotently; older rows without a
snapshot still become terminal failed records so SSE clients do
not hang forever. See
cross-replica-jobs.md.
jobsResume.enabled: true with controlPlane.driver: memory is
rejected at validation time — there is no shared store for sibling
replicas to scan when the leases live in another replica's process
memory.
```yaml
controlPlane:
  driver: astra
  endpoint: https://...
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
  jobsResume:
    enabled: true
    graceMs: 60000     # how stale a lease must be before reclaim
    intervalMs: 60000  # how often each replica scans
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `false` | Set to `true` to start the sweeper. Off by default; clustered deployments opt in. |
| `graceMs` | int (1000–600000) | `60000` | Maximum age of a lease (relative to last heartbeat) before the job is considered orphaned. |
| `intervalMs` | int (1000–600000) | `60000` | How often each replica scans for stale leases. |
Heartbeats are stamped on every progress update (processed
ticking, status flipping), so any active worker keeps its lease
fresh. Each replica writes its own replicaId (see
runtime.replicaId) into leasedBy so the sweeper can
tell what claim belongs to whom.
Optional list of workspace records loaded into the memory backend at
startup. Lets developers skip the POST /api/v1/workspaces dance
when running locally.
```yaml
seedWorkspaces:
  - name: demo
    kind: mock
  - name: prod-astra
    kind: astra
    url: env:ASTRA_DB_API_ENDPOINT
    credentials:
      token: env:ASTRA_DB_APPLICATION_TOKEN
    keyspace: workbench
```

| Field | Type | Required | Notes |
|---|---|---|---|
| `name` | string | yes | Workspace name. |
| `kind` | enum | yes | `astra` \| `hcd` \| `openrag` \| `mock`. |
| `uid` | UUID | no (auto-generated) | Only useful if other seeds reference it. |
| `url` | URL or SecretRef | no | Workspace-specific data-plane URL. |
| `credentials` | map<string, SecretRef> | no | Per-key secret pointers. |
| `keyspace` | string | no | Workspace-specific keyspace. |
Using seedWorkspaces with any driver other than memory is a
validation error — workspaces already persist in the backend, so
there's nothing to seed.
Embedding services are workspace-scoped runtime data — created via
POST /api/v1/workspaces/{w}/embedding-services, not in
workbench.yaml. The yaml only seeds workspaces; everything past
that flows through the API. Embedding services control how the
runtime turns text into vectors during ingest and search.
The runtime supports two execution paths:

- Client-side embedding (the default). The runtime resolves `endpointBaseUrl` + `endpointPath` + `credentialRef`, calls the provider's HTTP API at ingest / search time, and writes the resulting vectors to Astra. Works with any embedding provider (OpenAI, Cohere, NVIDIA NIM, self-hosted, etc.) — the runtime carries the bytes.
- Vectorize-on-ingest (server-side, Astra-only). When the embedding service has `provider: "astra"` and the bound vector collection was created with a matching Astra `service` block (e.g. `nvidia:nvidia/nv-embedqa-e5-v5`), the runtime delegates embedding to Astra's `$vectorize` column type. Documents are upserted with raw text; Astra runs the embedding model in its own infrastructure and stores both the text and the vector. Search likewise sends the query as text and Astra embeds it server-side.
The vectorize path is faster on hot paths (no extra round-trip to a provider), simpler to operate (no provider credential lives on the runtime), and avoids the dimension-mismatch class of errors. Two constraints to know:
- The embedding service and the collection must agree. Creating a knowledge base with `provider: "astra"` materializes the Astra collection with the service block baked in. Changing the embedding service binding on an existing KB is rejected if the dimension or provider doesn't match what the collection was provisioned with.
- `credentialRef` is irrelevant on the runtime side. Astra itself holds whatever provider credentials the vectorize service needs; the runtime never sees them. Setting `credentialRef` on a `provider: "astra"` embedding service is a no-op.
Mixed-batch upserts (some records carry a precomputed vector,
others carry text) always fall back to client-side embedding so
the entire batch lands in one transactional call. See
api-spec.md § `POST /{knowledgeBaseId}/records` for
the exact dispatch rules.
Wires up the runtime-wide default chat-completion executor used by
agents that have no llmServiceId of their own. When unset and
the agent also has no LLM service bound, agent send + streaming
return 503 chat_disabled. See agents.md for the
agent surface and the per-agent LLM-service binding.
```yaml
chat:
  tokenRef: env:HUGGINGFACE_API_KEY
  model: mistralai/Mistral-7B-Instruct-v0.3
  maxOutputTokens: 1024
  retrievalK: 6
  systemPrompt: null
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `tokenRef` | SecretRef | required | Resolved once at boot. `env:VAR` or `file:/path`. |
| `model` | string | `mistralai/Mistral-7B-Instruct-v0.3` | Any chat-completion-compatible HuggingFace Inference API model. |
| `maxOutputTokens` | int (1–8192) | `1024` | Per-turn cap on the assistant's reply length. |
| `retrievalK` | int (1–64) | `6` | Top-K KB chunks per knowledge base. The total injected into the prompt is `retrievalK * ceil(sqrt(numKbs))` so multi-KB conversations don't blow up the prompt. |
| `systemPrompt` | string \| null | `null` | Default system prompt when neither the agent nor the agent's LLM service supplies one. `null` falls back to the runtime's persona-agnostic `DEFAULT_AGENT_SYSTEM_PROMPT`. |
Per-agent override. When an agent has llmServiceId set, the
agent's bound LLM service overrides this block — the runtime
instantiates a chat service from the LLM-service record instead of
using the global block. The agent's own systemPrompt likewise
wins over chat.systemPrompt when present. Multi-provider support
is incremental; today provider: "huggingface" and
provider: "openai" LLM services are wired end-to-end. Other
providers can be stored but return 422 llm_provider_unsupported
until their adapters land.
The runtime exposes a multipart ingest route at
POST /api/v1/workspaces/{w}/knowledge-bases/{kb}/ingest/file
that accepts PDF, DOCX, XLSX, and text uploads. Extraction is
dispatched based on the upload's MIME type / extension;
configuration is via environment variables, not workbench.yaml.
| Variable | Default | Notes |
|---|---|---|
| `DOCLING_URL` | unset | Base URL of a docling-serve instance. When set, the dispatcher prefers docling over the native pipeline for non-text files (PDF, DOCX, XLSX) and falls back to native if docling is unreachable. The route also accepts an explicit per-upload `parser=native\|docling\|auto` form field. |
| `DOCLING_TIMEOUT_MS` | `60000` | Per-request budget for docling-serve calls. Scanned/OCR'd PDFs can run long; raise this if you see `docling_unavailable` with `timed out after …` messages. |
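As an illustration, a containerized deployment might wire these variables up like so (compose-style sketch; the docling-serve address is an assumption, not a documented default):

```yaml
services:
  workbench:
    environment:
      DOCLING_URL: http://docling:5001   # illustrative address for a docling-serve sidecar
      DOCLING_TIMEOUT_MS: "120000"       # double the default budget for OCR-heavy PDFs
```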
When DOCLING_URL is unset (the default), the runtime uses
pdfjs-dist for PDFs, mammoth for DOCX, and exceljs for XLSX
(rendered as one markdown table per worksheet). Native extraction
is fast and zero-ops but flattens layout-specific structure and
skips OCR; docling preserves layout and does OCR for scanned
documents.
Toggles the Model Context Protocol façade at
/api/v1/workspaces/{w}/mcp. Off by default — when disabled the
route returns 404 so the surface isn't probeable. See
mcp.md for the full feature walkthrough.
```yaml
mcp:
  enabled: true
  exposeChat: false
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `false` | When `false`, MCP is not exposed at all. |
| `exposeChat` | bool | `false` | Adds the `chat_send` MCP tool. Requires the `chat:` block; silently skipped when chat is unset. |
Configures the /api/v1/* auth middleware. See
auth.md for the full contract and rollout plan.
```yaml
auth:
  mode: disabled          # disabled | apiKey | oidc | any
  anonymousPolicy: allow  # allow | reject
  # oidc: …               # required when mode is `oidc` or `any`
```

| Field | Type | Default | Notes |
|---|---|---|---|
| `mode` | enum | `disabled` | Which verifiers are active. |
| `anonymousPolicy` | enum | `allow` | `allow` lets tokenless requests through as `anonymous`; `reject` returns `401 unauthorized`. |
| `bootstrapTokenRef` | SecretRef \| null | `null` | Optional 32+ character break-glass bearer token. Accepted as an unscoped operator subject when mode is `apiKey`, `oidc`, or `any`; invalid with `mode: disabled`. |
| `acknowledgeOpenAccess` | boolean | `true` | Controls how the deployment guard reacts when a durable control plane (file / astra) is paired with open auth (`mode: disabled` or `anonymousPolicy: allow`). Default `true` keeps that pairing as a loud startup warning so the dev loop (file CP + open auth) keeps booting. Flip to `false` in CI / shared environments to convert the warning into a hard fatal. Production deployments should set `runtime.environment: production` instead — that forces `apiKey`/`oidc`/`any` + `anonymousPolicy: reject` at the schema layer regardless of this flag. |
| `oidc` | object | — | Required when mode is `oidc` or `any`. See table below. |
The default (disabled + allow) matches pre-auth behavior: the
middleware runs, tags every request anonymous, and lets it
through. Set anonymousPolicy: reject in CI to assert the
middleware is mounted.
| Field | Type | Default | Notes |
|---|---|---|---|
| `issuer` | url | required | Must equal the JWT `iss` claim exactly. Discovery URL is derived from this. |
| `audience` | string \| string[] | required | At least one value must match the JWT `aud` claim. |
| `jwksUri` | url \| null | `null` | When `null`, the runtime fetches `${issuer}/.well-known/openid-configuration` at startup and uses `jwks_uri` from the response. |
| `clockToleranceSeconds` | int | `30` | Skew allowance for `exp` / `nbf`. |
| `claims.subject` | string | `sub` | JWT claim → `AuthSubject.id`. |
| `claims.label` | string | `email` | JWT claim → `AuthSubject.label` (nullable). |
| `claims.workspaceScopes` | string | `wb_workspace_scopes` | Array-valued claim → `AuthSubject.workspaceScopes`. A JSON `null` marks the subject unscoped (admin). |
| `client` | object | — | Optional. When present, the runtime hosts `/auth/{login,callback,me,logout}` so the bundled web UI can drive a browser PKCE login. See table below. |
| Field | Type | Default | Notes |
|---|---|---|---|
| `clientId` | string | required | OAuth client identifier registered at the IdP. |
| `clientSecretRef` | SecretRef \| null | `null` | Client secret. Omit for public (SPA-style) clients. |
| `redirectPath` | string | `/auth/callback` | Path the IdP redirects to after authorization. Must be in the IdP's allow-list. |
| `postLogoutPath` | string | `/` | Where `/auth/logout` sends the user. |
| `scopes` | string[] | `[openid, profile, email]` | OAuth scopes requested at login. |
| `sessionCookieName` | string | `wb_session` | Cookie that carries the encrypted session. |
| `sessionSecretRef` | SecretRef \| null | `null` | Key material for encrypting session cookies. Must resolve to ≥32 bytes. When `null`, runtime auto-generates an ephemeral key at boot (dev only). |
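Putting the two tables together, a browser-login setup might look like this (issuer, audience, and client ID are illustrative):

```yaml
auth:
  mode: oidc
  anonymousPolicy: reject
  oidc:
    issuer: https://idp.example.com/realms/workbench
    audience: workbench-api
    client:
      clientId: workbench-web
      scopes: [openid, profile, email]                           # the default, shown explicitly
      sessionSecretRef: file:/etc/workbench/secrets/session-key  # ≥32 bytes, persistent in production
```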
Secrets reach the runtime through two disjoint paths.

YAML interpolation (`${VAR}`). Applies before schema validation. Good for non-secret runtime settings like endpoints, and for pulling secrets that need to be literal strings in the config document.

SecretRefs. The preferred path for anything credential-shaped. A SecretRef is a
string like env:ASTRA_DB_APPLICATION_TOKEN or
file:/etc/workbench/secrets/astra-token. The runtime resolves it
when it actually needs the secret (at control-plane init, for
example), so the value never lives in memory longer than necessary
and never crosses process logs.
Providers available today:
| Provider | Ref shape | Behavior |
|---|---|---|
| `env` | `env:VAR_NAME` | Reads `process.env.VAR_NAME`. Errors if unset or empty. |
| `file` | `file:/abs/path` | Reads the file and trims trailing whitespace. |
| `astra-cli` | `astra-cli:<profile>:<dbId>:<token\|endpoint>` | Sources the token / Data API endpoint from a specific `astra` CLI profile + database. Lets different workspaces target different Astra databases without restarting. Cached for the process lifetime; errors are not cached. |
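Wherever a SecretRef is accepted, the three shapes are interchangeable. For example, on the control-plane token (paths and the profile name are illustrative):

```yaml
controlPlane:
  driver: astra
  endpoint: ${ASTRA_DB_API_ENDPOINT}
  tokenRef: env:ASTRA_DB_APPLICATION_TOKEN
  # equivalent alternatives:
  # tokenRef: file:/etc/workbench/secrets/astra-token
  # tokenRef: astra-cli:default:<dbId>:token
```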
Future providers (Vault, AWS SM, etc.) plug into the same
SecretProvider interface. See
runtimes/typescript/src/secrets/provider.ts.
At startup the runtime enforces:
- Every `${VAR}` reference resolves or has a default.
- `controlPlane.driver` is one of `memory | file | astra`.
- Driver-specific required fields are present (e.g. `root` for file, `endpoint` + `tokenRef` for astra).
- Every `tokenRef` / `credentials` value matches the `<prefix>:<path>` shape.
- `seedWorkspaces` is only non-empty when `controlPlane.driver == memory`.
- No duplicate names within `seedWorkspaces`.
Validation failures abort startup with a non-zero exit code and a human-readable error message.
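For example, a config like the following aborts before the server ever binds a port, because `seedWorkspaces` is memory-only:

```yaml
# Rejected at startup: seedWorkspaces requires controlPlane.driver: memory
version: 1
controlPlane:
  driver: file
  root: /var/lib/workbench
seedWorkspaces:
  - name: demo
    kind: mock
```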
Hot-reloading the config is not supported. The current model is "restart the process to pick up changes." Since only the control-plane backend is configured here (workspaces themselves are runtime data), most day-to-day operations don't require a config change anyway.
`SIGINT` and `SIGTERM` trigger a graceful-shutdown sequence:

- `/readyz` starts returning `503 draining`. Kubernetes-style readiness probes will stop routing traffic here.
- `server.close()` stops accepting new connections. In-flight requests keep going.
- When every connection finishes (or after 15 seconds, whichever comes first), the control-plane store closes and the process exits `0`. A timeout exits `1` so the supervisor knows we didn't drain cleanly.
- A second `SIGINT`/`SIGTERM` while the first is still draining short-circuits straight to exit — the operator can force-kill a stuck process without waiting for the timeout.
/healthz stays 200 throughout the drain (the process is still
alive, just closed to new traffic). That's the split that k8s
expects — livenessProbe hits /healthz, readinessProbe hits
/readyz.
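A minimal Kubernetes sketch of that probe split (port and grace period are illustrative; the grace period just needs to exceed the runtime's 15-second drain cap):

```yaml
spec:
  terminationGracePeriodSeconds: 30   # comfortably above the runtime's 15s drain timeout
  containers:
    - name: workbench
      livenessProbe:
        httpGet: { path: /healthz, port: 8080 }   # stays 200 while draining
      readinessProbe:
        httpGet: { path: /readyz, port: 8080 }    # flips to 503 draining on SIGTERM
        periodSeconds: 5
```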
The runtime auto-loads a .env file at startup so local dev doesn't
need you to export secrets by hand. Uses Node 21.7+'s built-in
process.loadEnvFile — no dotenv dependency.
Location. Put it at the repo root. The runtime walks up from
the process's current working directory looking for .env, stopping
at the repo root (.git sentinel). That means the same file works
whether you run npm run dev from the repo root or from
runtimes/typescript/.
Precedence. Values already present in process.env win —
.env never overwrites shell exports or container env vars. Matches
every other dotenv loader.
Override the path. Set WORKBENCH_ENV_FILE=/abs/path/to/.env to
skip the walk and load an explicit file (missing files fail loudly).
Useful for production container boots where the token lives on a
mounted secret.
Template. .env.example at the repo root is
a committed starting point — copy to .env and fill in the secrets
you need. .env itself is gitignored.
Production. The runtime ships the same loader in production, but
standard container practice (Docker -e / K8s Secrets → env vars)
usually means no .env is present and the loader silently skips.
All canonical examples live under `runtimes/typescript/examples/`:

- `workbench.yaml` — the default dev config the Docker image ships with, with annotated comments covering all three backends, `seedWorkspaces`, and auth stanzas.
- `workbench.production.yaml` — hardened production preset (astra backend, OIDC, security headers).
- `workbench.memory.yaml` — CI / smoke-test preset (in-memory only, no persistence).
workbench.yaml— the default dev config the Docker image ships with, with annotated comments covering all three backends,seedWorkspaces, and auth stanzas.workbench.production.yaml— hardened production preset (astra backend, OIDC, security headers).workbench.memory.yaml— CI / smoke-test preset (in-memory only, no persistence).