Production checklist

AI Workbench can boot with zero credentials for local development. Production deployments should tighten a few knobs before exposing the runtime beyond a trusted loopback or private admin network.

Required before exposure

  • Use the production environment guard. Set runtime.environment: production in your deploy config. The TypeScript runtime then refuses configs that leave auth disabled, allow anonymous API traffic, use the in-memory control plane, or enable browser OIDC login without a persistent session secret. Start from runtimes/typescript/examples/workbench.production.yaml.
  • Turn on auth. Use auth.mode: apiKey, oidc, or any with anonymousPolicy: reject. The runtime logs a warning when a non-memory control plane accepts anonymous API traffic.
  • Use a bootstrap token instead of opening anonymous access. Set auth.bootstrapTokenRef to a 32+ character secret ref, create the first workspace/API key with Authorization: Bearer <bootstrap>, then remove or rotate the bootstrap secret once operator access is established.
  • Use a persistent OIDC session key. Browser login deployments should set auth.oidc.client.sessionSecretRef to a 32+ byte secret. Without it, all sessions invalidate on restart and multi-replica sessions cannot decrypt consistently.
  • Serve over HTTPS. Put the runtime behind a TLS-terminating reverse proxy or ingress and set runtime.publicOrigin to the externally visible https://... origin. Only set runtime.trustProxyHeaders: true when a trusted proxy overwrites incoming X-Forwarded-* headers.
  • Keep local files out of build contexts. The repo ships a .dockerignore that excludes .env*, local state, build output, and dependency folders. Keep deployment-specific secrets outside the Docker build context as well.
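
The four refusal conditions of the production guard can be mirrored in a pre-deploy check, for example in CI. This is a minimal sketch, not the runtime's implementation. The key paths come from the checklist above; the `DeployConfig` shape, the `disabled` auth mode value, and the `memory` driver name are assumptions made for illustration.

```typescript
// Hypothetical, simplified view of the deploy config. Only the key paths
// (runtime.environment, auth.mode, ...) are taken from the checklist.
interface DeployConfig {
  runtime: { environment?: string };
  auth: {
    mode?: string; // "apiKey" | "oidc" | "any"; "disabled" is an assumed value
    anonymousPolicy?: string; // "reject" is the only value the checklist names
    oidc?: { client?: { sessionSecretRef?: string } };
  };
  controlPlane: { driver?: string }; // "memory" is an assumed name for the in-memory driver
}

// Returns one message per violated rule; an empty array means the config passes.
function productionViolations(cfg: DeployConfig): string[] {
  if (cfg.runtime.environment !== "production") return [];
  const problems: string[] = [];
  if (!cfg.auth.mode || cfg.auth.mode === "disabled") {
    problems.push("auth is disabled");
  }
  if (cfg.auth.anonymousPolicy !== "reject") {
    problems.push("anonymous API traffic is allowed");
  }
  if (!cfg.controlPlane.driver || cfg.controlPlane.driver === "memory") {
    problems.push("control plane is in-memory");
  }
  const oidcClient = cfg.auth.oidc?.client;
  if (oidcClient && !oidcClient.sessionSecretRef) {
    problems.push("browser OIDC login has no persistent session secret");
  }
  return problems;
}

// Example: OIDC login without a session secret, no anonymous policy, in-memory control plane.
const risky: DeployConfig = {
  runtime: { environment: "production" },
  auth: { mode: "oidc", oidc: { client: {} } },
  controlPlane: { driver: "memory" },
};
const guardProblems = productionViolations(risky);
console.log(guardProblems);
```

Running a check like this before rollout surfaces the same classes of misconfiguration earlier, but the runtime's own guard remains the authority on what it accepts.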

Persistence

  • Use astra for multi-replica production. The file backend is single-node only. Do not point two containers at the same file root.
  • Back up file-backed state. If using controlPlane.driver: file, back up the JSON root and mount it on durable storage. The backend uses atomic rename for writes, but it is not a database and does not provide multi-process locking or point-in-time restore.
  • Enable job resume in clustered deployments. For astra, set controlPlane.jobsResume.enabled: true so another replica can claim stale async-ingest leases after a crash.

Operational hardening

  • Pin and rotate secrets. Prefer env: or file: secret refs for Astra, OIDC, session, provider, and bootstrap credentials. Every credential lives behind a SecretRef, so the value stays in your secret source and never lands in the control plane. To rotate refs that the server caches (Astra, OIDC client + session, bootstrap, provider API keys), update the secret source and restart the runtime so in-process driver and provider clients reconnect with fresh credentials. External MCP-server credentials are resolved per turn and take effect on the next agent turn without a restart. Workspace API keys (wb_live_*) rotate by revoke + re-mint with the desired role/scopes (scopes are fixed at issue time). GET /setup-status reports configuredKeys (the names of configured credentials, never their values), so you can confirm a key is present without reading it back. Full per-credential procedures live in docs/auth.md → Secret rotation.
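
    After a rotation, the configuredKeys list can be compared against the credentials a deployment expects. The sketch below assumes the response body has the shape `{ configuredKeys: string[] }`; the checklist names only the field, so the exact shape and the key names used here are hypothetical.

    ```typescript
    // Assumed response shape: the checklist only says /setup-status reports
    // `configuredKeys`, the names of configured credentials.
    interface SetupStatus {
      configuredKeys: string[];
    }

    // Returns the required credential names the runtime does not report as configured.
    function missingCredentials(status: SetupStatus, required: string[]): string[] {
      const configured = new Set(status.configuredKeys);
      return required.filter((name) => !configured.has(name));
    }

    // Hypothetical key names, for illustration only.
    const sampleStatus: SetupStatus = { configuredKeys: ["astra", "oidcClient"] };
    const missingAfterRotation = missingCredentials(sampleStatus, [
      "astra",
      "oidcClient",
      "sessionSecret",
    ]);
    console.log(missingAfterRotation); // prints [ 'sessionSecret' ]
    ```

    In practice the `SetupStatus` object would come from parsing the body of GET /setup-status after the restart.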

  • Forward audit events to a durable sink. The runtime emits structured audit events for API-key issuance/revocation, workspace + KB + agent + document create/delete, MCP tool invocations, job claim, and OIDC login/refresh/logout (see docs/audit.md for the full catalog and envelope shape). Events are pino log lines at info level with audit: true; route them to a SIEM or file via your container log pipeline. RBAC enforcement remains on the roadmap.
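
    A log pipeline can pick out audit events by the audit: true marker. This is a minimal sketch of such a filter over raw log lines; the `action` field in the sample data is hypothetical, and the real envelope shape is documented in docs/audit.md.

    ```typescript
    // Keeps only JSON log lines flagged as audit events (audit: true).
    // Non-JSON lines (stack traces, startup banners) are skipped, not thrown on.
    function selectAuditEvents(logLines: string[]): Record<string, unknown>[] {
      const events: Record<string, unknown>[] = [];
      for (const line of logLines) {
        let parsed: unknown;
        try {
          parsed = JSON.parse(line);
        } catch {
          continue;
        }
        if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
          const record = parsed as Record<string, unknown>;
          if (record.audit === true) events.push(record);
        }
      }
      return events;
    }

    // Sample lines; pino encodes the info level as 30. The "action" field is made up.
    const sampleLogLines = [
      '{"level":30,"time":1700000000000,"msg":"request completed"}',
      '{"level":30,"time":1700000000001,"audit":true,"action":"apiKey.issue"}',
      "not json at all",
    ];
    const auditEvents = selectAuditEvents(sampleLogLines);
    console.log(auditEvents.length); // prints 1
    ```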

  • Scrape metrics + propagate trace context. The runtime exposes a Prometheus exposition at GET /metrics (text format, no auth — same precedent as /healthz / /readyz). HTTP request counter + duration histogram are labeled by method, matched route pattern, and status family (2xx/4xx/5xx) to keep cardinality bounded. Rate-limit rejections are counted by key type, and the ingest semaphore exposes workbench_ingest_workers_{active,queued} gauges. Domain counters added in 0.2:

    Metric                              Labels
    workbench_chat_requests_total       provider, outcome (stop / length / tool_calls / error / stream_ok)
    workbench_ingest_documents_total    outcome (ok / failed)
    workbench_search_requests_total     mode (vector / hybrid / *_rerank), outcome (ok / error)
    workbench_search_duration_seconds   mode

    A starter Grafana dashboard with rows for HTTP, chat, ingest, and search lives at docs/observability/grafana-workbench.json. Import it via Dashboards → Import → Upload JSON and pick your Prometheus datasource at the prompt. The dashboard targets the metric names verbatim; no recording rules are required.

    Two deeper-health endpoints land alongside the metrics: GET /health/details returns probe results for the control plane and chat provider ({status, detail, durationMs}), and GET /health/recent-errors returns the last 100 error envelopes (code, status, method, route pattern, request id, timestamp — no PII). Both are unauthenticated.

    Inbound traceparent headers are honored when valid (the trace id becomes the request id) and a fresh W3C traceparent is emitted on every response so service meshes can correlate.
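
    For clients or proxies that need to check what counts as a valid header, the following sketch validates a version 00 traceparent value as laid out in the W3C Trace Context specification. It illustrates the header format only and is not the runtime's parser; the function and type names are hypothetical.

    ```typescript
    interface TraceContext {
      version: string;
      traceId: string; // per the text above, this becomes the request id when valid
      parentId: string;
      flags: string;
    }

    // version "-" 32-hex trace id "-" 16-hex parent id "-" 2-hex flags, lowercase hex only.
    const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

    function parseTraceparent(header: string): TraceContext | null {
      const match = TRACEPARENT_RE.exec(header.trim());
      if (!match) return null;
      const [, version, traceId, parentId, flags] = match;
      // Version ff is forbidden, and all-zero ids are invalid.
      if (version === "ff") return null;
      if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
      return { version, traceId, parentId, flags };
    }

    // Example value taken from the W3C Trace Context specification.
    const parsedTrace = parseTraceparent(
      "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    );
    console.log(parsedTrace?.traceId); // prints 4bf92f3577b34da6a3ce929d0e0e4736
    ```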

  • Opt in to anonymous telemetry only if you want it. It is off by default. Enable via WORKBENCH_TELEMETRY=1 (env wins over runtime.telemetry.enabled in YAML) and point at a sink with WORKBENCH_TELEMETRY_URL; without a URL the emitter runs in dark mode (events are constructed but never sent). The CLI has the same posture under AIW_TELEMETRY / AIW_TELEMETRY_URL. Events carry strictly categorical fields only; see the full event catalog and no-PII guarantee in docs/telemetry.md.
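
    The precedence rules above can be written out as a small decision function. This is a sketch under stated assumptions, not the runtime's code: the function name and return shape are hypothetical, and treating any value other than "1" as "off" when the variable is set is an assumption the text does not confirm.

    ```typescript
    type TelemetryMode = "off" | "dark" | "sending";

    function resolveTelemetry(
      env: Record<string, string | undefined>,
      yamlEnabled: boolean | undefined,
    ): { mode: TelemetryMode; url?: string } {
      const envFlag = env["WORKBENCH_TELEMETRY"];
      // Env wins over YAML when set. Only "1" is treated as "on" (assumption).
      const enabled = envFlag !== undefined ? envFlag === "1" : yamlEnabled === true;
      if (!enabled) return { mode: "off" };
      const url = env["WORKBENCH_TELEMETRY_URL"];
      // Enabled without a URL: events are constructed but never sent.
      return url ? { mode: "sending", url } : { mode: "dark" };
    }

    console.log(resolveTelemetry({ WORKBENCH_TELEMETRY: "1" }, false).mode); // prints dark
    ```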

  • Enable OpenTelemetry tracing when the deployment has a collector. The runtime always creates manual SERVER spans through @opentelemetry/api (no-op when no SDK is registered), so flipping tracing on is a config change, not a code change:

    runtime:
      tracing:
        enabled: true
        serviceName: ai-workbench-runtime  # optional override
        exporterUrl: https://otel-collector.example.com/v1/traces  # or use OTEL_EXPORTER_OTLP_ENDPOINT

    Standard OTEL_* env vars (OTEL_EXPORTER_OTLP_HEADERS, OTEL_TRACES_SAMPLER, …) work as documented by the SDK. For full HTTP / fetch / pino auto-instrumentation, preload the SDK at process launch:

    node --import ./dist/lib/tracing-preload.js dist/root.js

    Without --import, manual server spans cover every request but outbound HTTP / fetch / DB clients won't emit child spans.

  • Apply rate limiting in front of the runtime. The in-process limiter defaults to 600 req/min/IP for /api/v1/* and 30 req/min/IP for /auth/*; tune via runtime.rateLimit or set runtime.rateLimit.enabled: false and front the runtime with a WAF / API gateway. See docs/configuration.md.
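
    To make the per-IP budgets concrete, here is a minimal fixed-window counter using the /auth/* default of 30 requests per minute. The checklist does not say which algorithm the in-process limiter uses, so the fixed window here is a stand-in chosen for brevity, and the class name is hypothetical.

    ```typescript
    class FixedWindowLimiter {
      private readonly limit: number;
      private readonly windowMs: number;
      private readonly windows = new Map<string, { start: number; count: number }>();

      constructor(limit: number, windowMs: number = 60000) {
        this.limit = limit;
        this.windowMs = windowMs;
      }

      // Returns true if the request identified by `key` (for example a client IP) is within budget.
      allow(key: string, nowMs: number): boolean {
        const current = this.windows.get(key);
        if (!current || nowMs - current.start >= this.windowMs) {
          // First request from this key, or the previous window has expired.
          this.windows.set(key, { start: nowMs, count: 1 });
          return true;
        }
        if (current.count >= this.limit) return false;
        current.count += 1;
        return true;
      }
    }

    // 31 requests from one IP inside the same minute: the 31st is rejected.
    const authLimiter = new FixedWindowLimiter(30);
    const decisions = Array.from({ length: 31 }, (_, i) => authLimiter.allow("203.0.113.7", i));
    console.log(decisions.filter(Boolean).length); // prints 30
    ```

    A fixed window admits short bursts at window boundaries, which is one reason to prefer a gateway or WAF limiter when traffic is spiky.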

  • Keep dependency automation on. CI runs lint/typecheck/test/build, coverage, secret scanning, Docker smoke, Playwright, Python/Java scaffold tests, and Dependabot updates. Code scanning runs via .github/workflows/codeql.yml (CodeQL advanced setup) across the TypeScript, Python, and Java runtimes.

Browser posture

The bundled runtime emits security headers for the SPA and API docs: Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy, and Cross-Origin-Opener-Policy. If a reverse proxy injects its own headers, make sure it preserves or tightens those defaults rather than stripping them.
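
One way to confirm that a proxy is not stripping these defaults is to compare the headers of a proxied response against the list above. The sketch below takes an already-fetched header map; the function and constant names are hypothetical, and it checks presence only, not whether the values are as strict as the runtime's.

```typescript
// The six headers the bundled runtime emits, lowercased for comparison.
const EXPECTED_SECURITY_HEADERS = [
  "content-security-policy",
  "x-frame-options",
  "x-content-type-options",
  "referrer-policy",
  "permissions-policy",
  "cross-origin-opener-policy",
];

// Returns the expected headers that are absent from a response's header map.
function missingSecurityHeaders(headers: Record<string, string>): string[] {
  const present = new Set(Object.keys(headers).map((name) => name.toLowerCase()));
  return EXPECTED_SECURITY_HEADERS.filter((name) => !present.has(name));
}

// Example: a proxy that kept two headers and dropped the rest.
const proxiedHeaders = {
  "Content-Security-Policy": "default-src 'self'",
  "X-Frame-Options": "DENY",
};
const strippedHeaders = missingSecurityHeaders(proxiedHeaders);
console.log(strippedHeaders.length); // prints 4
```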