Production checklist

AI Workbench can boot with zero credentials for local development. Production deployments should tighten a few knobs before exposing the runtime beyond a trusted loopback or private admin network.

Required before exposure

Use the production environment guard. Set runtime.environment: production in your deploy config. The TypeScript runtime then refuses configs that leave auth disabled, allow anonymous API traffic, use the in-memory control plane, or enable browser OIDC login without a persistent session secret. Start from runtimes/typescript/examples/workbench.production.yaml.
Turn on auth. Set auth.mode to apiKey, oidc, or any, together with anonymousPolicy: reject. The runtime logs a warning when a non-memory control plane accepts anonymous API traffic.
Use a bootstrap token instead of opening anonymous access. Set auth.bootstrapTokenRef to a 32+ character secret ref, create the first workspace/API key with Authorization: Bearer <bootstrap>, then remove or rotate the bootstrap secret once operator access is established.
Use a persistent OIDC session key. Browser login deployments should set auth.oidc.client.sessionSecretRef to a 32+ byte secret. Without it, every session is invalidated on restart, and replicas in a multi-replica deployment cannot decrypt each other's sessions.
Serve over HTTPS. Put the runtime behind a TLS-terminating reverse proxy or ingress and set runtime.publicOrigin to the externally visible https://... origin. Only set runtime.trustProxyHeaders: true when a trusted proxy overwrites incoming X-Forwarded-* headers.
Keep local files out of build contexts. The repo ships a .dockerignore that excludes .env*, local state, build output, and dependency folders. Keep deployment-specific secrets outside the Docker build context as well.
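
Taken together, the settings above might look like the fragment below. This is a minimal sketch, not a copy of runtimes/typescript/examples/workbench.production.yaml: the placement of anonymousPolicy directly under auth, the `env:NAME` secret-ref syntax, the environment variable names, and the origin are assumptions made for illustration. OIDC provider settings and the control-plane section are omitted, so check the shipped example before relying on it.

```yaml
runtime:
  environment: production                        # turns on the production guard
  publicOrigin: https://workbench.example.com    # hypothetical externally visible origin
  trustProxyHeaders: true                        # only behind a proxy that overwrites X-Forwarded-*

auth:
  mode: oidc                                     # or apiKey / any
  anonymousPolicy: reject                        # assumed to sit directly under auth
  bootstrapTokenRef: env:WORKBENCH_BOOTSTRAP_TOKEN     # hypothetical variable name; 32+ characters
  oidc:
    client:
      sessionSecretRef: env:WORKBENCH_SESSION_SECRET   # hypothetical variable name; 32+ bytes
```

Leaving trustProxyHeaders unset is the safer choice whenever the proxy in front of the runtime does not strip client-supplied X-Forwarded-* headers.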

Persistence

Use astra for multi-replica production. The file backend is single-node only. Do not point two containers at the same file root.
Back up file-backed state. If using controlPlane.driver: file, back up the JSON root and mount it on durable storage. The backend uses atomic rename for writes, but it is not a database and does not provide multi-process locking or point-in-time restore.
Enable job resume in clustered deployments. For astra, set controlPlane.jobsResume.enabled: true so another replica can claim stale async-ingest leases after a crash.
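
For a clustered deployment, the persistence settings named above could be sketched as follows. The value `astra` for controlPlane.driver is an assumption by analogy with `controlPlane.driver: file`, and the Astra connection settings are left out because this section does not name them.

```yaml
controlPlane:
  driver: astra        # assumed value, by analogy with `driver: file`
  jobsResume:
    enabled: true      # lets another replica claim stale async-ingest leases after a crash
```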

Operational hardening

Pin and rotate secrets. Prefer env: or file: secret refs for Astra, OIDC, session, provider, and bootstrap credentials. Every credential lives behind a SecretRef, so the value stays in your secret source and never lands in the control plane. To rotate refs that the runtime caches server-side (Astra, OIDC client + session, bootstrap, provider API keys), update the secret source and restart the runtime so in-process driver and provider clients reconnect with fresh credentials. External MCP-server credentials are resolved per turn and take effect on the next agent turn without a restart. Rotate workspace API keys (wb_live_*) by revoking the key and minting a new one with the desired role/scopes; scopes are fixed at issue time. GET /setup-status reports configuredKeys (the names of configured credentials, never their values), so you can confirm a key is present without reading it back. Full per-credential procedures live in docs/auth.md → Secret rotation.
Forward audit events to a durable sink. The runtime emits structured audit events for API-key issuance/revocation, workspace + KB + agent + document create/delete, MCP tool invocations, job claim, and OIDC login/refresh/logout (see docs/audit.md for the full catalog and envelope shape). Events are pino log lines at info level with audit: true; route them to a SIEM or file via your container log pipeline. RBAC enforcement remains on the roadmap.

Scrape metrics + propagate trace context. The runtime exposes a Prometheus exposition at GET /metrics (text format, no auth — same precedent as /healthz / /readyz). HTTP request counter + duration histogram are labeled by method, matched route pattern, and status family (2xx/4xx/5xx) to keep cardinality bounded. Rate-limit rejections are counted by key type, and the ingest semaphore exposes workbench_ingest_workers_{active,queued} gauges. Domain counters added in 0.2:

| Metric | Labels |
| --- | --- |
| `workbench_chat_requests_total` | `provider`, `outcome` (stop / length / tool_calls / error / stream_ok) |
| `workbench_ingest_documents_total` | `outcome` (ok / failed) |
| `workbench_search_requests_total` | `mode` (vector / hybrid / *_rerank), `outcome` (ok / error) |
| `workbench_search_duration_seconds` | `mode` |

A starter Grafana dashboard with rows for HTTP, chat, ingest, and search lives at docs/observability/grafana-workbench.json. Import it via Dashboards → Import → Upload JSON and pick your Prometheus datasource at the prompt. The dashboard targets the metric names verbatim, so no recording rules are required.
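
To get these metrics into Prometheus in the first place, a scrape job along the following lines should be enough. This is a sketch of a standard prometheus.yml fragment: the job name, scrape interval, and target host and port are hypothetical and need to match your deployment. No credentials are configured because GET /metrics is unauthenticated.

```yaml
scrape_configs:
  - job_name: ai-workbench-runtime      # hypothetical job name
    metrics_path: /metrics              # Prometheus text exposition served by the runtime
    scheme: http                        # scrape over the private network, behind the proxy
    scrape_interval: 30s                # illustrative; pick what suits your retention
    static_configs:
      - targets:
          - workbench.internal:8080     # hypothetical host:port of the runtime
```

Because the endpoint carries no auth, scraping it over a private network rather than through the public origin keeps it off the exposed surface.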

Two deeper-health endpoints land alongside the metrics: GET /health/details returns probe results for the control plane and chat provider ({status, detail, durationMs}), and GET /health/recent-errors returns the last 100 error envelopes (code, status, method, route pattern, request id, timestamp — no PII). Both are unauthenticated.

Inbound traceparent headers are honored when valid (the trace id becomes the request id) and a fresh W3C traceparent is emitted on every response so service meshes can correlate.

Opt-in anonymous telemetry. Off by default. Enable via WORKBENCH_TELEMETRY=1 (env wins over runtime.telemetry.enabled in YAML) and point at a sink with WORKBENCH_TELEMETRY_URL; without a URL the emitter runs in dark mode (events are constructed but never sent). The CLI has the same posture under AIW_TELEMETRY / AIW_TELEMETRY_URL. Strictly categorical fields only — see the full event catalog + no-PII guarantee in docs/telemetry.md.
Enable OpenTelemetry tracing when the deployment has a collector. The runtime always creates manual SERVER spans through @opentelemetry/api (no-op when no SDK is registered), so flipping tracing on is a config change, not a code change:
```
runtime:
  tracing:
    enabled: true
    serviceName: ai-workbench-runtime  # optional override
    exporterUrl: https://otel-collector.example.com/v1/traces  # or use OTEL_EXPORTER_OTLP_ENDPOINT
```
Standard OTEL_* env vars (OTEL_EXPORTER_OTLP_HEADERS, OTEL_TRACES_SAMPLER, …) work as documented by the SDK. For full HTTP / fetch / pino auto-instrumentation, preload the SDK at process launch:
```
node --import ./dist/lib/tracing-preload.js dist/root.js
```
Without --import, manual server spans cover every request but outbound HTTP / fetch / DB clients won't emit child spans.
Apply rate limiting in front of the runtime. The in-process limiter defaults to 600 req/min/IP for /api/v1/* and 30 req/min/IP for /auth/*; tune via runtime.rateLimit or set runtime.rateLimit.enabled: false and front the runtime with a WAF / API gateway. See docs/configuration.md.
Keep dependency automation on. CI runs lint/typecheck/test/build, coverage, secret scanning, Docker smoke, Playwright, Python/Java scaffold tests, and Dependabot updates. Code scanning runs via .github/workflows/codeql.yml (CodeQL advanced setup) across the TypeScript, Python, and Java runtimes.
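
The two runtime toggles mentioned in this list, telemetry and the in-process rate limiter, can be sketched as one fragment. Only the `enabled` keys are shown because they are the ones named here; the tuning keys under runtime.rateLimit are documented in docs/configuration.md and are deliberately not guessed at.

```yaml
runtime:
  telemetry:
    enabled: false     # the default; WORKBENCH_TELEMETRY=1 in the environment overrides this
  rateLimit:
    enabled: false     # only when a WAF / API gateway enforces limits in front of the runtime
```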

Browser posture

The bundled runtime emits security headers for the SPA and API docs: Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy, and Cross-Origin-Opener-Policy. If a reverse proxy injects its own headers, make sure it preserves or tightens those defaults rather than stripping them.
