Audit logging

The TypeScript runtime emits structured audit events for the sensitive operations listed below. Events are pino log lines at info level with a stable discriminator field audit: true, so deployments can route them to a dedicated sink (file, syslog, SIEM) by filter.

{
  "level": 30,
  "time": 1735603200000,
  "audit": true,
  "action": "api_key.create",
  "outcome": "success",
  "requestId": "01JFE5...",
  "subject": {
    "type": "oidc",
    "id": "sub-123",
    "label": "alice@example.com"
  },
  "workspaceId": "ws-1",
  "details": { "keyId": "...", "label": "ci-deployer" },
  "msg": "audit api_key.create success"
}

subject.type is one of "apiKey", "oidc", "bootstrap", or "anonymous".
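On the consuming side of the filter, the envelope above can be described as a TypeScript type plus a runtime guard that keeps only well-formed audit events. This is a minimal sketch assuming the field names shown in the example; AuditEnvelope and parseAuditLine are hypothetical names, not exports of src/lib/audit.ts.

```typescript
// Consumer-side shape of an audit envelope, mirroring the example above (hypothetical type).
type SubjectType = "apiKey" | "oidc" | "bootstrap" | "anonymous";
type Outcome = "success" | "failure" | "denied";

interface AuditEnvelope {
  level: number;
  time: number;
  audit: true;
  action: string;
  outcome: Outcome;
  requestId?: string;
  subject: { type: SubjectType; id: string | null; label: string | null } | null;
  workspaceId?: string;
  details?: Record<string, unknown>;
  msg: string;
}

const SUBJECT_TYPES: readonly string[] = ["apiKey", "oidc", "bootstrap", "anonymous"];
const OUTCOMES: readonly string[] = ["success", "failure", "denied"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isSubject(v: unknown): boolean {
  if (!isRecord(v)) return false;
  const t = v.type;
  return typeof t === "string" && SUBJECT_TYPES.includes(t);
}

function parseJson(line: string): unknown {
  try {
    return JSON.parse(line);
  } catch {
    return undefined; // not JSON, e.g. a stray stack-trace line
  }
}

// Returns the envelope when the line is an audit event with action, outcome and subject;
// otherwise null. Seed-failure lines (audit: true but no action) are rejected as well.
function parseAuditLine(line: string): AuditEnvelope | null {
  const value = parseJson(line);
  if (!isRecord(value) || value.audit !== true) return null;
  const { action, outcome, subject } = value;
  if (typeof action !== "string" || typeof outcome !== "string") return null;
  if (!OUTCOMES.includes(outcome)) return null;
  // subject is null for unauthenticated 401s (auth.api_denied).
  if (subject !== null && !isSubject(subject)) return null;
  return value as unknown as AuditEnvelope;
}
```

Rejecting lines that lack action keeps downstream SIEM rules simple; the seed-failure lines described later can be matched separately on audit: true plus level 50.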

What gets logged

| Action | Trigger | Notes |
|---|---|---|
| api_key.create | POST /api/v1/workspaces/{w}/api-keys | Plaintext is never logged; only keyId and the caller-supplied label. |
| api_key.revoke | DELETE /api/v1/workspaces/{w}/api-keys/{keyId} | Soft revoke; emitted on the first revoke only. |
| workspace.create | POST /api/v1/workspaces | Includes the workspace label (the human-friendly name). |
| workspace.delete | DELETE /api/v1/workspaces/{w} | Emitted after the cascade completes. |
| kb.create | POST /api/v1/workspaces/{w}/knowledge-bases | Provisions an Astra collection, which is destructive on rollback. Includes knowledgeBaseId and label. |
| kb.delete | DELETE /api/v1/workspaces/{w}/knowledge-bases/{kb} | Cascades the underlying collection drop and all rag-document rows. Emitted after the cascade. |
| document.delete | DELETE /api/v1/workspaces/{w}/knowledge-bases/{kb}/documents/{d} | Wipes the document's chunks before dropping the row. Includes knowledgeBaseId and documentId. |
| agent.create | POST /api/v1/workspaces/{w}/agents or POST /api/v1/workspaces/{w}/agents/from-template | Includes agentId and label. The from-template variant also includes templateId (the catalog slug). |
| agent.delete | DELETE /api/v1/workspaces/{w}/agents/{a} | Cascades the conversations and chat messages owned by the agent. Emitted after the cascade. |
| job.claim | Cross-replica orphan reclaim in jobs/sweeper.ts | Emitted when a replica successfully claims an orphaned job via compare-and-swap (CAS). Includes jobId and jobKind. The subject is the (synthetic) replica id, not a user. |
| mcp.invoke | Any tool call into /api/v1/workspaces/{w}/mcp | Includes the toolName. Argument payloads are not logged. |
| auth.api_denied | 401/403 auth decisions on /api/v1/* | outcome: "denied" with a reason. Unauthenticated 401s have subject: null; scoped 403s include the resolved subject when available. |
| auth.bootstrap_use | Any request authenticated with the bootstrap operator token | Includes scheme: "bootstrap". The plaintext bootstrap token is never logged. |
| auth.csrf_rejected | A state-changing request to a cookie-protected route fails the Origin/Referer check | outcome: "failure" with reason set to one of "no allowed origin available", "missing Origin and Referer on state-changing request", or "origin mismatch (got <claimed>)". The HTTP response is 403 forbidden_origin. Bearer-token requests bypass the check and never emit this event. |
| auth.login | OIDC /auth/callback | outcome: "success" once the access token passes the runtime's own verifier; outcome: "failure" with a reason on token-validation errors. |
| auth.refresh | OIDC /auth/refresh | outcome: "success" on a clean rotation. outcome: "failure" with reason set to "no_refresh_token" (missing cookie), "idp_rejected" (the IdP refused the refresh_token), or "token_validation_failed" (the freshly issued access token failed self-verification). |
| auth.logout | OIDC /auth/logout | Emitted on every cookie clear, even when no session was present. |
| principal.create | POST /api/v1/workspaces/{w}/principals | RLAC prototype. Includes the principalId. |
| principal.update | PATCH /api/v1/workspaces/{w}/principals/{principalId} | RLAC prototype. Includes the principalId. |
| principal.delete | DELETE /api/v1/workspaces/{w}/principals/{principalId} | RLAC prototype. Includes the principalId. |
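The Origin/Referer decision behind auth.csrf_rejected can be pictured as a pure function that returns one of the three documented reasons, or null to let the request through. This is a sketch under assumptions: the set of state-changing methods, the order of the checks, the CsrfInput shape, and checkCsrf itself are hypothetical; only the reason strings, the bearer bypass, and the 403 forbidden_origin response come from the table.

```typescript
// Methods treated as state-changing in this sketch (an assumption).
const STATE_CHANGING = new Set(["POST", "PUT", "PATCH", "DELETE"]);

interface CsrfInput {
  method: string;
  origin?: string; // Origin request header, if sent
  referer?: string; // Referer request header, if sent
  hasBearerToken: boolean;
  allowedOrigins: readonly string[];
}

// Returns null to allow, or the rejection reason; the caller would answer 403 forbidden_origin
// and emit auth.csrf_rejected with outcome "failure".
function checkCsrf(req: CsrfInput): string | null {
  if (req.hasBearerToken) return null; // bearer requests are not cookie-authenticated
  if (!STATE_CHANGING.has(req.method.toUpperCase())) return null;
  if (req.allowedOrigins.length === 0) return "no allowed origin available";

  let claimed = req.origin;
  if (claimed === undefined && req.referer !== undefined) {
    try {
      claimed = new URL(req.referer).origin; // Referer is a full URL; compare only its origin
    } catch {
      claimed = req.referer; // unparsable Referer: compare it verbatim, which will mismatch
    }
  }
  if (claimed === undefined) return "missing Origin and Referer on state-changing request";
  return req.allowedOrigins.includes(claimed) ? null : `origin mismatch (got ${claimed})`;
}
```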

The set is intentionally small. Adding a new event is a one-line call from a route handler — see src/lib/audit.ts.
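The real signature lives in src/lib/audit.ts and is not reproduced here. As a rough illustration of why a new event is a one-liner, the sketch below assumes a pino-like logger with an info(obj, msg) method and builds the envelope from the fields shown earlier; AuditLogger, AuditInput, and this audit() are hypothetical stand-ins.

```typescript
// Minimal stand-in for the subset of a pino logger this sketch needs.
interface AuditLogger {
  info(obj: Record<string, unknown>, msg: string): void;
}

type Outcome = "success" | "failure" | "denied";

interface AuditInput {
  action: string; // <resource>.<verb>, e.g. "workspace.delete"
  outcome: Outcome;
  requestId?: string;
  subject: { type: string; id: string | null; label: string | null } | null;
  workspaceId?: string;
  details?: Record<string, string>;
}

// Hypothetical helper: one audit line per call, never throws into the caller.
function audit(logger: AuditLogger, input: AuditInput): void {
  try {
    logger.info(
      {
        audit: true,
        action: input.action,
        outcome: input.outcome,
        requestId: input.requestId,
        subject: input.subject,
        workspaceId: input.workspaceId,
        details: input.details,
      },
      `audit ${input.action} ${input.outcome}`,
    );
  } catch {
    // Best-effort: a failing transport must not fail the request.
  }
}

// In a route handler, after the cascade completes:
const lines: Array<{ obj: Record<string, unknown>; msg: string }> = [];
const memoryLogger: AuditLogger = { info: (obj, msg) => { lines.push({ obj, msg }); } };

audit(memoryLogger, {
  action: "workspace.delete",
  outcome: "success",
  requestId: "req-1",
  subject: { type: "apiKey", id: "key-1", label: "ci-deployer" },
  workspaceId: "ws-1",
});
```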

Sample envelopes

Concrete payloads from a live runtime, lightly redacted. After pino's own level and time fields, the field order is audit, action, outcome, requestId, subject, workspaceId, details, msg. Pino writes the fields in the order they are declared, which keeps the envelope stable enough to grep or slice with cut/jq.

workspace.create — anonymous in dev mode

{
  "level": 30,
  "time": 1735603195123,
  "audit": true,
  "action": "workspace.create",
  "outcome": "success",
  "requestId": "01KQG3MCDGC3VWP07BNQWX7NPB",
  "subject": {
    "type": "anonymous",
    "id": null,
    "label": null
  },
  "workspaceId": "ab907991-dba4-4d9d-81f0-4756ec5ccf43",
  "details": { "label": "support-docs" },
  "msg": "audit workspace.create success"
}

subject.type: "anonymous" is normal in development (default auth.mode: disabled). In production, subject.type will be "apiKey" or "oidc" — the auth deployment guard refuses to start with anonymous access on a non-memory control plane.
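The guard's rule can be sketched as a startup check. This is a minimal sketch rather than the runtime's code: assertAuthDeploymentSafe and the DeploymentConfig fields are hypothetical, and only the "disabled" auth mode and the "memory" control plane are taken from this document.

```typescript
// Hypothetical config shape; "disabled" (dev default) and "memory" come from the docs.
interface DeploymentConfig {
  authMode: string; // e.g. "disabled"
  controlPlaneKind: string; // "memory" or a persistent backend
}

// Called once at startup; throwing stops the process before it serves traffic.
function assertAuthDeploymentSafe(cfg: DeploymentConfig): void {
  const anonymousAllowed = cfg.authMode === "disabled";
  if (anonymousAllowed && cfg.controlPlaneKind !== "memory") {
    throw new Error(
      `refusing to start: auth.mode "disabled" allows anonymous access, ` +
        `which is only permitted with the memory control plane (got "${cfg.controlPlaneKind}")`,
    );
  }
}
```

Failing at startup, rather than per request, means a misconfigured production deployment never emits a single anonymous audit event.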

auth.login — failed JWT validation

{
  "level": 30,
  "time": 1735603612877,
  "audit": true,
  "action": "auth.login",
  "outcome": "failure",
  "requestId": "01KQG3PV9ZH7T82R4KAE8WBN3X",
  "subject": {
    "type": "anonymous",
    "id": null,
    "label": null
  },
  "details": { "scheme": "oidc", "reason": "audience_mismatch" },
  "msg": "audit auth.login failure"
}

workspaceId is absent because the request never resolves a workspace before the auth middleware rejects it. details.reason is one of the verifier's terminal error codes (audience_mismatch, signature_invalid, token_expired, issuer_mismatch, malformed); see src/auth/oidc/verifier.ts.
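For orientation, the sketch below shows where three of those codes typically come from: the registered-claim checks on iss, aud, and exp defined in RFC 7519. It assumes the payload is already decoded and its signature already verified (so signature_invalid never appears here), applies no clock-skew leeway, and uses a check order chosen for illustration; checkClaims is a hypothetical name, not the function in src/auth/oidc/verifier.ts.

```typescript
type VerifierReason =
  | "audience_mismatch"
  | "signature_invalid"
  | "token_expired"
  | "issuer_mismatch"
  | "malformed";

interface ExpectedClaims {
  issuer: string;
  audience: string;
}

// Returns the first failing reason, or null when the registered claims are acceptable.
function checkClaims(payload: unknown, expected: ExpectedClaims, nowSeconds: number): VerifierReason | null {
  if (typeof payload !== "object" || payload === null) return "malformed";
  const { iss, aud, exp } = payload as Record<string, unknown>;
  if (typeof iss !== "string" || typeof exp !== "number") return "malformed";
  if (iss !== expected.issuer) return "issuer_mismatch";
  // RFC 7519: aud may be a single string or an array of strings.
  const audiences: unknown[] = typeof aud === "string" ? [aud] : Array.isArray(aud) ? aud : [];
  if (!audiences.includes(expected.audience)) return "audience_mismatch";
  // exp is in seconds; the token is expired once "now" reaches it (no leeway here).
  if (exp <= nowSeconds) return "token_expired";
  return null;
}
```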

api_key.create — authenticated by an OIDC subject

{
  "level": 30,
  "time": 1735603889104,
  "audit": true,
  "action": "api_key.create",
  "outcome": "success",
  "requestId": "01KQG3QRVF20YGD6MTFB8KKCN5",
  "subject": {
    "type": "oidc",
    "id": "auth0|7c2d4f12",
    "label": "alice@example.com"
  },
  "workspaceId": "ab907991-dba4-4d9d-81f0-4756ec5ccf43",
  "details": { "keyId": "3a4977c8-3e01-4fd0-9b02-2e082950bd40", "label": "ci-deployer" },
  "msg": "audit api_key.create success"
}

The plaintext token (wb_live_…) appears only in the HTTP response body, never in the audit log. details.keyId is the row id; details.label is the operator-supplied tag.
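The boundary between what the caller receives and what the audit line carries can be sketched with node:crypto. This assumes the key is stored as a SHA-256 hash and identified by a random UUID; only the wb_live_ prefix comes from this document, while the token length, the hashing scheme, and createApiKey are hypothetical.

```typescript
import { createHash, randomBytes, randomUUID } from "node:crypto";

interface CreatedKey {
  responseBody: { keyId: string; label: string; token: string }; // returned once to the caller
  storedRow: { keyId: string; label: string; tokenHash: string }; // persisted, never logged
  auditDetails: { keyId: string; label: string }; // the only part passed to audit()
}

// Hypothetical key-minting helper illustrating the logging boundary.
function createApiKey(label: string): CreatedKey {
  const keyId = randomUUID();
  const token = `wb_live_${randomBytes(32).toString("hex")}`;
  const tokenHash = createHash("sha256").update(token).digest("hex");
  return {
    responseBody: { keyId, label, token },
    storedRow: { keyId, label, tokenHash },
    auditDetails: { keyId, label },
  };
}
```

Building auditDetails from identifiers only, instead of stripping secrets from a larger object afterwards, means a new secret field added to the row cannot leak into the log by accident.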

Seed-failure events (non-route)

Workspace creation tries to seed default agents, LLM services, chunking services, and embedding services. Each per-row failure emits an audit: true line at error level (level 50). These lines are written directly rather than through audit(), because seeding does not cleanly fit the <resource>.<verb> naming:

{
  "level": 50,
  "time": 1735603195310,
  "audit": true,
  "workspaceId": "ab907991-dba4-4d9d-81f0-4756ec5ccf43",
  "serviceName": "openai-text-embedding-3-small",
  "err": { "type": "ControlPlaneConflictError", "message": "..." },
  "msg": "failed to seed default embedding service"
}

When every seed of a kind fails (a systemic problem such as a DB outage or broken config), an aggregate line with expected: <count> follows, so monitoring can alert on "workspace shipped with no embedders" instead of counting individual failures.
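The per-row and aggregate pattern can be sketched as a loop over one kind of default. It assumes a pino-like logger exposing error(obj, msg); seedKind, its parameters, the kind field, and the aggregate message text are hypothetical, and only the per-row message shape and expected: <count> come from this document.

```typescript
interface ErrorLogger {
  error(obj: Record<string, unknown>, msg: string): void;
}

// Seeds one kind of default (e.g. "embedding service"). Each failure gets its own
// audit line; if nothing of this kind was created, one aggregate line follows.
async function seedKind(
  logger: ErrorLogger,
  workspaceId: string,
  kind: string,
  names: readonly string[],
  create: (name: string) => Promise<void>,
): Promise<number> {
  let created = 0;
  for (const serviceName of names) {
    try {
      await create(serviceName);
      created += 1;
    } catch (err) {
      logger.error({ audit: true, workspaceId, serviceName, err }, `failed to seed default ${kind}`);
    }
  }
  if (names.length > 0 && created === 0) {
    // Systemic failure: alert on this line rather than on the per-row ones.
    logger.error({ audit: true, workspaceId, kind, expected: names.length }, `no default ${kind} could be seeded`);
  }
  return created;
}
```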

Design rules

The audit module enforces a few rules so events stay safe to ship to external systems:

  • No secret material. The details field is typed and accepts only a known set of identifier fields (for example keyId, knowledgeBaseId, documentId, agentId, scheme, reason, label). Plaintext tokens, refresh tokens, hashes, OAuth codes, and PII are not part of the contract and have no path into the envelope.
  • Stable action names. Actions are named <resource>.<verb> in snake_case. We never rename an action in place; the migration path is to add the new action and keep the old one for a release.
  • Outcome is always set. It is one of success, failure, or denied, so SIEM rules can alert on bursts of denied events without parsing status codes.
  • Best-effort. Audit logging must never break the request path. Logger errors are swallowed inside audit().
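The first two rules can be enforced at compile time with a keyed type and at runtime with an allow-list filter, roughly as sketched here. The key list is illustrative, assembled from the identifier fields this page mentions rather than copied from src/lib/audit.ts; ALLOWED_DETAIL_KEYS, ACTION_PATTERN, and sanitizeDetails are hypothetical names.

```typescript
// Identifier-only fields that may appear in details (illustrative list).
const ALLOWED_DETAIL_KEYS = [
  "keyId", "knowledgeBaseId", "documentId", "agentId", "templateId",
  "jobId", "jobKind", "toolName", "principalId", "scheme", "reason", "label",
] as const;

type DetailKey = (typeof ALLOWED_DETAIL_KEYS)[number];
type AuditDetails = Partial<Record<DetailKey, string>>; // compile-time contract

// <resource>.<verb>, both snake_case, e.g. "api_key.create" or "auth.api_denied".
const ACTION_PATTERN = /^[a-z][a-z0-9]*(?:_[a-z0-9]+)*\.[a-z][a-z0-9]*(?:_[a-z0-9]+)*$/;

function isValidActionName(action: string): boolean {
  return ACTION_PATTERN.test(action);
}

// Runtime backstop for untyped callers: keep only allow-listed string fields,
// so a stray token or hash on the input object is dropped, not logged.
function sanitizeDetails(input: Record<string, unknown>): AuditDetails {
  const out: AuditDetails = {};
  for (const key of ALLOWED_DETAIL_KEYS) {
    const value = input[key];
    if (typeof value === "string") out[key] = value;
  }
  return out;
}
```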

Operating it

  • Single-replica. The default pino transport writes to stdout. Pipe the container's stdout into your log pipeline and filter on audit: true.
  • Multi-replica. Each replica writes its own events; correlate by requestId (already echoed in every audit envelope) and by the Strict-Transport-Security / replicaId markers documented in production.md.
  • Retention. The runtime does not retain audit events itself. Choose a retention period that satisfies your compliance posture and configure it on the sink.
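In a multi-replica setup, correlating by requestId can be as simple as grouping audit lines once they reach the sink. This sketch assumes newline-delimited JSON from all replicas merged into one stream; lines without a requestId (such as the seed-failure lines) are kept under a separate key, and groupByRequestId is a hypothetical name.

```typescript
interface AuditLine {
  audit: true;
  time: number;
  requestId?: string;
  [field: string]: unknown;
}

function parseLine(raw: string): AuditLine | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // skip non-JSON noise
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const rec = parsed as Record<string, unknown>;
  if (rec.audit !== true || typeof rec.time !== "number") return null;
  return rec as AuditLine;
}

// Groups audit lines from any number of replicas by requestId, each group ordered by time.
function groupByRequestId(rawLines: Iterable<string>): Map<string, AuditLine[]> {
  const groups = new Map<string, AuditLine[]>();
  for (const raw of rawLines) {
    const line = parseLine(raw);
    if (line === null) continue;
    const key = typeof line.requestId === "string" ? line.requestId : "(no requestId)";
    const bucket = groups.get(key) ?? [];
    bucket.push(line);
    groups.set(key, bucket);
  }
  for (const bucket of groups.values()) bucket.sort((a, b) => a.time - b.time);
  return groups;
}
```

Sorting by time within a group assumes replica clocks are reasonably synchronized; across replicas, requestId is the reliable join key.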

What's not yet logged

These are tracked as gaps:

  • Rate-limit denials. They are visible in the limiter's existing log lines but are not yet audit events.
  • Document and chunk mutation. Volume-sensitive; needs a sampling / batching strategy first.

When a new event lands, add it to the What gets logged table and the AuditAction union in src/lib/audit.ts — the audit-doc-drift test will fail otherwise.
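The audit-doc-drift test itself is not reproduced here. Its idea can be sketched as a two-way set comparison between the action names declared in code and the first column of the What gets logged table. This assumes the doc source is a Markdown pipe table and that the actions are also available as a runtime array (a type-only union cannot be iterated); findDrift and the row regex are hypothetical.

```typescript
// First cell of a Markdown table row, optionally in backticks: | `api_key.create` | ...
const ROW_ACTION = /^\|\s*`?([a-z][a-z0-9_]*\.[a-z][a-z0-9_]*)`?\s*\|/gm;

function documentedActions(markdown: string): Set<string> {
  const re = new RegExp(ROW_ACTION.source, ROW_ACTION.flags); // fresh lastIndex per call
  const found = new Set<string>();
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) found.add(m[1]);
  return found;
}

interface Drift {
  undocumented: string[]; // declared in code, missing from the table
  undeclared: string[]; // listed in the table, missing from the code
}

function findDrift(markdown: string, declaredActions: readonly string[]): Drift {
  const documented = documentedActions(markdown);
  const declared = new Set(declaredActions);
  return {
    undocumented: declaredActions.filter((a) => !documented.has(a)),
    undeclared: Array.from(documented).filter((a) => !declared.has(a)),
  };
}
```

A test would then fail when either list is non-empty, which catches both a forgotten table row and a stale row for a removed action.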

Resolved findings

The following audit findings have been resolved in the commits listed:

| Finding | Status | Resolution |
|---|---|---|
| Unbounded agent systemPrompt / userPrompt fields | Resolved in #147 (8dbf741) | Capped at 128 KB; name at 200 chars, description at 2 KB. |
| Monolithic body-size cap (50 MB) on all /api/v1/workspaces/* | Resolved in #147 (8dbf741) | Split: 10 MB default, 50 MB only on explicit .../knowledge-bases/*/ingest. |
| Sequential Promise.all on multi-KB agent tools | Resolved in #147 (8dbf741) | Parallelized with Promise.allSettled in chat/tools/registry.ts. |
| SSRF surface on service endpoint URLs (RFC1918, loopback, cloud metadata) | Resolved in #147 (8dbf741) | Layered validation in openapi/schemas.ts + root.ts; runtime.blockPrivateNetworkEndpoints config flag. |

PRs welcome.