AI Workbench can boot with zero credentials for local development. Production deployments should tighten a few knobs before exposing the runtime beyond a trusted loopback or private admin network.
- Use the production environment guard. Set `runtime.environment: production` in your deploy config. The TypeScript runtime then refuses configs that leave auth disabled, allow anonymous API traffic, use the in-memory control plane, or enable browser OIDC login without a persistent session secret. Start from `runtimes/typescript/examples/workbench.production.yaml`.
- Turn on auth. Use `auth.mode: apiKey`, `oidc`, or `any` with `anonymousPolicy: reject`. The runtime logs a warning when a non-memory control plane accepts anonymous API traffic.
- Use a bootstrap token instead of opening anonymous access. Set `auth.bootstrapTokenRef` to a 32+ character secret ref, create the first workspace/API key with `Authorization: Bearer <bootstrap>`, then remove or rotate the bootstrap secret once operator access is established.
- Use a persistent OIDC session key. Browser login deployments should set `auth.oidc.client.sessionSecretRef` to a 32+ byte secret. Without it, all sessions invalidate on restart and multi-replica sessions cannot decrypt consistently.
- Serve over HTTPS. Put the runtime behind a TLS-terminating reverse proxy or ingress and set `runtime.publicOrigin` to the externally visible `https://...` origin. Only set `runtime.trustProxyHeaders: true` when a trusted proxy overwrites incoming `X-Forwarded-*` headers.
- Keep local files out of build contexts. The repo ships a `.dockerignore` that excludes `.env*`, local state, build output, and dependency folders. Keep deployment-specific secrets outside the Docker build context as well.
- Use `astra` for multi-replica production. The `file` backend is single-node only. Do not point two containers at the same file root.
- Back up file-backed state. If using `controlPlane.driver: file`, back up the JSON root and mount it on durable storage. The backend uses atomic rename for writes, but it is not a database and does not provide multi-process locking or point-in-time restore.
- Enable job resume in clustered deployments. For `astra`, set `controlPlane.jobsResume.enabled: true` so another replica can claim stale async-ingest leases after a crash.
- Pin and rotate secrets. Prefer `env:` or `file:` secret refs for Astra, OIDC, session, provider, and bootstrap credentials. Every credential lives behind a `SecretRef`, so the value stays in your secret source and never lands in the control plane. Rotate credentials that are cached server-side (Astra, OIDC client + session, bootstrap, provider API keys) by updating the secret source and restarting the runtime so in-process driver and provider clients reconnect with fresh credentials; external MCP-server credentials are resolved per turn and take effect on the next agent turn without a restart. Workspace API keys (`wb_live_*`) rotate by revoke + re-mint with the desired role/scopes (scopes are fixed at issue time). `GET /setup-status` reports `configuredKeys` (the names of configured credentials, never their values), so you can confirm a key is present without reading it back. Full per-credential procedures live in `docs/auth.md` → Secret rotation.

- Forward audit events to a durable sink. The runtime emits structured audit events for API-key issuance/revocation, workspace + KB + agent + document create/delete, MCP tool invocations, job claim, and OIDC login/refresh/logout (see `docs/audit.md` for the full catalog and envelope shape). Events are pino lines at `info` with `audit: true`; route them to a SIEM/file via your container log pipeline. RBAC enforcement remains on the roadmap.

- Scrape metrics + propagate trace context. The runtime exposes a Prometheus exposition at `GET /metrics` (text format, no auth, same precedent as `/healthz` / `/readyz`). The HTTP request counter and duration histogram are labeled by method, matched route pattern, and status family (2xx/4xx/5xx) to keep cardinality bounded. Rate-limit rejections are counted by key type, and the ingest semaphore exposes `workbench_ingest_workers_{active,queued}` gauges. Domain counters added in 0.2:

  | Metric | Labels |
  | --- | --- |
  | `workbench_chat_requests_total` | `provider`, `outcome` (stop / length / tool_calls / error / stream_ok) |
  | `workbench_ingest_documents_total` | `outcome` (ok / failed) |
  | `workbench_search_requests_total` | `mode` (vector / hybrid / `*_rerank`), `outcome` (ok / error) |
  | `workbench_search_duration_seconds` | `mode` |

  A starter Grafana dashboard with rows for HTTP, chat, ingest, and search lives at `docs/observability/grafana-workbench.json`. Import it via Dashboards → Import → Upload JSON and pick your Prometheus datasource at the prompt. The dashboard targets the metric names verbatim; no recording rules are required.

  Two deeper-health endpoints land alongside the metrics: `GET /health/details` returns probe results for the control plane and chat provider (`{status, detail, durationMs}`), and `GET /health/recent-errors` returns the last 100 error envelopes (code, status, method, route pattern, request id, timestamp; no PII). Both are unauthenticated.

  Inbound `traceparent` headers are honored when valid (the trace id becomes the request id), and a fresh W3C `traceparent` is emitted on every response so service meshes can correlate requests.

- Opt-in anonymous telemetry. Off by default. Enable via `WORKBENCH_TELEMETRY=1` (env wins over `runtime.telemetry.enabled` in YAML) and point at a sink with `WORKBENCH_TELEMETRY_URL`; without a URL the emitter runs in dark mode (events are constructed but never sent). The CLI has the same posture under `AIW_TELEMETRY` / `AIW_TELEMETRY_URL`. Strictly categorical fields only; see the full event catalog + no-PII guarantee in `docs/telemetry.md`.

- Enable OpenTelemetry tracing when the deployment has a collector. The runtime always creates manual SERVER spans through `@opentelemetry/api` (no-op when no SDK is registered), so flipping tracing on is a config change, not a code change:

  ```yaml
  runtime:
    tracing:
      enabled: true
      serviceName: ai-workbench-runtime # optional override
      exporterUrl: https://otel-collector.example.com/v1/traces # or use OTEL_EXPORTER_OTLP_ENDPOINT
  ```

  Standard `OTEL_*` env vars (`OTEL_EXPORTER_OTLP_HEADERS`, `OTEL_TRACES_SAMPLER`, …) work as documented by the SDK. For full HTTP / fetch / pino auto-instrumentation, preload the SDK at process launch:

  ```bash
  node --import ./dist/lib/tracing-preload.js dist/root.js
  ```

  Without `--import`, manual server spans cover every request but outbound HTTP / fetch / DB clients won't emit child spans.

- Apply rate limiting in front of the runtime. The in-process limiter defaults to 600 req/min/IP for `/api/v1/*` and 30 req/min/IP for `/auth/*`; tune via `runtime.rateLimit` or set `runtime.rateLimit.enabled: false` and front the runtime with a WAF / API gateway. See `docs/configuration.md`.

- Keep dependency automation on. CI runs lint/typecheck/test/build, coverage, secret scanning, Docker smoke, Playwright, Python/Java scaffold tests, and Dependabot updates. Code scanning runs via `.github/workflows/codeql.yml` (CodeQL advanced setup) across the TypeScript, Python, and Java runtimes.
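
The production environment guard from the first item can be pictured as a pure validation pass over the loaded config. The sketch below is illustrative only and is not the runtime's implementation: the `WorkbenchConfig` type, the `productionGuardViolations` function, the literal values `disabled`, `allow`, and `memory`, and the choice to treat the presence of `auth.oidc.client` as "browser OIDC login enabled" are all assumptions made for the example.

```typescript
// Hypothetical config shape. Key names follow the settings named in this
// checklist; the values "disabled", "allow", and "memory" are assumptions.
interface WorkbenchConfig {
  runtime: { environment: "development" | "production" };
  auth: {
    mode: "disabled" | "apiKey" | "oidc" | "any";
    anonymousPolicy: "allow" | "reject";
    oidc?: { client?: { sessionSecretRef?: string } };
  };
  controlPlane: { driver: "memory" | "file" | "astra" };
}

// Returns human-readable problems. An empty array means the config passes
// this sketch of the guard. Non-production configs are never rejected.
function productionGuardViolations(config: WorkbenchConfig): string[] {
  if (config.runtime.environment !== "production") {
    return [];
  }
  const problems: string[] = [];
  if (config.auth.mode === "disabled") {
    problems.push("auth is disabled");
  }
  if (config.auth.anonymousPolicy !== "reject") {
    problems.push("anonymous API traffic is allowed");
  }
  if (config.controlPlane.driver === "memory") {
    problems.push("control plane is in-memory");
  }
  // Assumption: a configured OIDC client block means browser login is enabled.
  const oidcClient = config.auth.oidc?.client;
  if (oidcClient !== undefined && !oidcClient.sessionSecretRef) {
    problems.push("browser OIDC login has no persistent session secret");
  }
  return problems;
}
```

A deploy pipeline could run a check like this against the rendered config before rollout, so a rejected config fails the pipeline instead of a starting container.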
The bundled runtime emits security headers for the SPA and API docs:
Content-Security-Policy, X-Frame-Options, X-Content-Type-Options,
Referrer-Policy, Permissions-Policy, and Cross-Origin-Opener-Policy.
If a reverse proxy injects its own headers, make sure it preserves or
tightens those defaults rather than stripping them.
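
One way to confirm that a proxy has not stripped these headers is to inspect a response fetched through it. The following is a minimal sketch: the `missingSecurityHeaders` helper and the `EXPECTED_SECURITY_HEADERS` constant are hypothetical names introduced here, and the check only verifies presence, not whether the proxy weakened a value.

```typescript
// Header names listed above, lowercased because HTTP header names are
// case-insensitive.
const EXPECTED_SECURITY_HEADERS: readonly string[] = [
  "content-security-policy",
  "x-frame-options",
  "x-content-type-options",
  "referrer-policy",
  "permissions-policy",
  "cross-origin-opener-policy",
];

// Accepts any iterable of [name, value] pairs and returns the expected
// header names that are absent from it.
function missingSecurityHeaders(headers: Iterable<[string, string]>): string[] {
  const present = new Set<string>();
  for (const [name] of headers) {
    present.add(name.toLowerCase());
  }
  return EXPECTED_SECURITY_HEADERS.filter((name) => !present.has(name));
}
```

With a runtime that provides `fetch`, the pairs from `response.headers.entries()` can be passed in directly for a request made through the proxy's public origin; which path to probe depends on the deployment.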