---
title: Audit Logging
description: Structured NDJSON audit logging for runtime security events.
order: 6
---
All runtime security events are emitted as structured NDJSON to stderr with correlation IDs for end-to-end tracing.
| Event | Description |
|---|---|
| session_start | New task session begins |
| session_end | Task session completes (with final state) |
| tool_exec | Tool execution start/end (with tool name) |
| egress_allowed | Outbound request allowed (with domain, mode) |
| egress_blocked | Outbound request blocked (with domain, mode) |
| llm_call | LLM API call completed (with input_tokens, output_tokens, model, provider, duration_ms, request_id). See Token usage and duration. |
| llm_call_cancelled | Streaming LLM call cancelled mid-flight; carries partial token counts captured up to cancellation. |
| invocation_complete | A2A invocation finished (auth → dispatch → engine → response). Carries duration_ms (wall-clock) plus aggregated input_tokens_total / output_tokens_total / llm_call_count / model / provider. |
| invocation_cancelled | A2A invocation cancelled mid-flight via tasks/cancel (or internal cancellation like parent ctx deadline). Carries fields.reason (one of workflow_failure / cost_limit_exceeded / timeout / external_signal), duration_ms up to cancellation, and any partial token totals consumed before the signal. See Cancellation. |
| guardrail_check | Guardrail mask / block / warn decision. Carries fields.gate (input / context / tool_call / output / stream, sourced from the library Result.Gate), fields.decision (masked / warned / blocked), fields.guardrail + fields.category from the triggering violation, and fields.violation_count. fields.tool is present on tool_call and on output events for tool return text. With FORGE_GUARDRAIL_CAPTURE_EVIDENCE=true, operators also opt into fields.evidence carrying the redacted + truncated triggering text. See Guardrails — Audit Events. |
| auth_verify | Inbound request authenticated successfully (with provider, user_id, org_id, token_kind) |
| auth_fail | Inbound request rejected (with reason, token_kind) |
| agent_card_published | Agent Card finalized at startup or hot-reload (with name, version, protocol_version, url, skill_count, capabilities, security_schemes, card_size_bytes, card_sha256). See Agent Card reference. |
| policy_loaded | One per non-empty policy layer at startup (system / user / workspace). Carries fields.layer, source (file path), deny-list size counts, and max bounds. See Platform Policy. |
| policy_violation_at_build_time | One per violation when forge.yaml conflicts with any policy layer. Agent refuses to start. Carries fields.violation_kind / offending_value / forge_yaml_field plus layer + source identifying the enforcing file. See Platform Policy. |
| channel_denied_by_policy | One per channel adapter skipped at startup because a policy layer's denied_channels list names it. Non-fatal; the agent runs with the remaining channels. Carries fields.channel, layer (system / user / workspace), and source (file path). See Platform Policy — Channels. |
| audit_export_status | One event every 60s when an export sink is configured. Carries fields.sinks[], one entry per registered sink with name, writes_ok, drops_timeout, drops_dial, connected. Operators tail the audit stream to confirm export health. See Audit Event Export (FWS-7). |
```json
{"ts":"2026-02-28T10:00:00Z","event":"session_start","correlation_id":"a1b2c3d4","task_id":"task-1"}
{"ts":"2026-02-28T10:00:01Z","event":"tool_exec","correlation_id":"a1b2c3d4","fields":{"tool":"tavily_research","phase":"start"}}
{"ts":"2026-02-28T10:00:01Z","event":"egress_allowed","correlation_id":"a1b2c3d4","fields":{"domain":"api.tavily.com","mode":"allowlist","source":"proxy"}}
{"ts":"2026-02-28T10:00:05Z","event":"tool_exec","correlation_id":"a1b2c3d4","fields":{"tool":"tavily_research","phase":"end"}}
{"ts":"2026-02-28T10:00:06Z","event":"session_end","correlation_id":"a1b2c3d4","fields":{"state":"completed"}}
```

The source field distinguishes in-process enforcer events from subprocess proxy events.
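To make the correlation_id grouping concrete, here is a minimal consumer-side sketch in Go that reads NDJSON lines like the ones above and collects event names per correlation_id. The auditEvent struct and groupByCorrelation function are hypothetical illustrations, not part of Forge, and they decode only the top-level keys shown in the example.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// auditEvent decodes only the top-level keys used in the example above.
// encoding/json silently ignores keys the struct does not declare.
type auditEvent struct {
	TS            string         `json:"ts"`
	Event         string         `json:"event"`
	CorrelationID string         `json:"correlation_id"`
	TaskID        string         `json:"task_id"`
	Fields        map[string]any `json:"fields"`
}

// groupByCorrelation reads NDJSON (one event per line) and returns the
// event names seen for each correlation_id, in stream order.
func groupByCorrelation(ndjson string) (map[string][]string, error) {
	groups := make(map[string][]string)
	sc := bufio.NewScanner(strings.NewReader(ndjson))
	// Rows with captured payloads can exceed bufio's 64 KiB default line limit.
	sc.Buffer(make([]byte, 0, 64<<10), 4<<20)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev auditEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, err
		}
		groups[ev.CorrelationID] = append(groups[ev.CorrelationID], ev.Event)
	}
	return groups, sc.Err()
}

const sample = `{"ts":"2026-02-28T10:00:00Z","event":"session_start","correlation_id":"a1b2c3d4","task_id":"task-1"}
{"ts":"2026-02-28T10:00:01Z","event":"tool_exec","correlation_id":"a1b2c3d4","fields":{"tool":"tavily_research","phase":"start"}}
{"ts":"2026-02-28T10:00:01Z","event":"egress_allowed","correlation_id":"a1b2c3d4","fields":{"domain":"api.tavily.com","mode":"allowlist","source":"proxy"}}`

func main() {
	groups, err := groupByCorrelation(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(groups["a1b2c3d4"]) // [session_start tool_exec egress_allowed]
}
```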
When the inbound A2A request carries the orchestrator's correlation headers (X-Workflow-ID, X-Workflow-Stage-ID, X-Workflow-Step-ID, X-Invocation-Caller), every audit event emitted during that invocation is tagged with the matching workflow_id / stage_id / step_id / invocation_caller fields. Header names are vendor-neutral so any A2A-compatible orchestrator can populate them. Direct A2A invocations (no orchestrator) omit the fields entirely — emitted JSON is byte-identical to the pre-correlation shape. See Workflow correlation IDs for the full reference, including outbound propagation for agent-to-agent flows.
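A minimal sketch of how the four headers could map onto omitempty JSON fields, showing why direct invocations stay byte-identical: empty strings are simply not encoded. The workflowTags type and the encode helper are hypothetical; only the header and field names come from this page.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// workflowTags mirrors the four correlation fields. omitempty keeps each
// key out of the JSON when the matching header was not sent.
type workflowTags struct {
	WorkflowID       string `json:"workflow_id,omitempty"`
	StageID          string `json:"stage_id,omitempty"`
	StepID           string `json:"step_id,omitempty"`
	InvocationCaller string `json:"invocation_caller,omitempty"`
}

// tagsFromHeaders copies the orchestrator correlation headers, if present.
func tagsFromHeaders(h http.Header) workflowTags {
	return workflowTags{
		WorkflowID:       h.Get("X-Workflow-ID"),
		StageID:          h.Get("X-Workflow-Stage-ID"),
		StepID:           h.Get("X-Workflow-Step-ID"),
		InvocationCaller: h.Get("X-Invocation-Caller"),
	}
}

// event embeds the tags so their keys are promoted to the top level.
type event struct {
	Event string `json:"event"`
	workflowTags
}

// encode builds request headers from a map and marshals one tagged event.
func encode(headers map[string]string) string {
	h := http.Header{}
	for k, v := range headers {
		h.Set(k, v)
	}
	b, err := json.Marshal(event{Event: "llm_call", workflowTags: tagsFromHeaders(h)})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(nil))                                         // {"event":"llm_call"}
	fmt.Println(encode(map[string]string{"X-Workflow-ID": "wf-7"})) // {"event":"llm_call","workflow_id":"wf-7"}
}
```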
For deployments where one or more agents serve multiple orgs or workspaces, every audit event can be stamped with org_id and workspace_id top-level fields so downstream consumers can filter by tenancy without joining against auth_verify. Two layers, highest precedence first:
| Layer | Source | When it wins |
|---|---|---|
| Per-request override | X-Forge-Org-ID / X-Forge-Workspace-ID request headers | Always: when present, the headers override the static stamp |
| Deployment-time stamp | FORGE_ORG_ID / FORGE_WORKSPACE_ID env vars | When the request carries no override headers |
The deployment-time stamp is read once at agent startup and applied via AuditLogger.WithTenancy(...). It covers every emitted event — startup banners (agent_card_published, policy_loaded, audit_export_status) AND per-invocation events (session_start, llm_call, guardrail_check, invocation_complete, etc.). The per-request override only kicks in inside the request scope; startup banners always reflect the env stamp.
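The precedence rule can be sketched as a small pure function. This is an illustration, not Forge's AuditLogger.WithTenancy implementation; the tenancy type and resolveTenancy are hypothetical, and the sketch assumes each header overrides its own env-derived field independently (this page does not say whether a lone X-Forge-Org-ID also clears the workspace stamp).

```go
package main

import "fmt"

// tenancy holds the two top-level stamps. Empty values would be omitted
// from the emitted JSON (omitempty).
type tenancy struct {
	OrgID       string
	WorkspaceID string
}

// resolveTenancy applies header-over-env precedence for one request.
// Startup banners have no request, so they always use envStamp directly.
func resolveTenancy(envStamp tenancy, orgHeader, wsHeader string) tenancy {
	t := envStamp
	if orgHeader != "" {
		t.OrgID = orgHeader
	}
	if wsHeader != "" {
		t.WorkspaceID = wsHeader
	}
	return t
}

func main() {
	// Read once at startup, e.g. from os.Getenv("FORGE_ORG_ID") / os.Getenv("FORGE_WORKSPACE_ID").
	stamp := tenancy{OrgID: "org_abc123", WorkspaceID: "ws_xyz789"}
	fmt.Println(resolveTenancy(stamp, "", ""))                     // {org_abc123 ws_xyz789}
	fmt.Println(resolveTenancy(stamp, "org_def456", "ws_pqr012")) // {org_def456 ws_pqr012}
}
```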
```yaml
# Initializ platform deployment manifest: static-tenancy case
env:
  - name: FORGE_ORG_ID
    value: "org_abc123"
  - name: FORGE_WORKSPACE_ID
    value: "ws_xyz789"
```

```bash
# Multi-tenant routing case: the orchestrator picks per request
curl -X POST https://agent.example.com/ \
  -H 'X-Forge-Org-ID: org_def456' \
  -H 'X-Forge-Workspace-ID: ws_pqr012' \
  ...
```

Both fields use omitempty. Deployments that set neither env nor header keep emitting the pre-tenancy JSON shape verbatim, with no schema version bump.
The top-level org_id is distinct from auth_verify.fields.org_id, which carries whatever the inbound auth token claimed (provider-derived). The top-level value is the operator's declared tenancy, trusted because the deployment / orchestrator set it. Both can be present on the same auth_verify event when they're different identifiers (e.g., the token came from a federated identity but the agent is deployed into a specific workspace).
Every audit event also carries the entity identifier the event came from:
| Layer | Source |
|---|---|
| Per-event explicit | AuditEvent.EntityID / AuditEvent.EntityType |
| Deployment-time stamp | FORGE_AGENT_ID env → forge.yaml agent_id → entity_id; entity_type hardcoded to "agent" |
```yaml
env:
  - name: FORGE_AGENT_ID
    value: "aibuilderdemo"  # or just set forge.yaml agent_id
```

Emitted events then carry the entity stamp:

```json
{
  "ts": "...",
  "event": "session_start",
  "entity_id": "aibuilderdemo",
  "entity_type": "agent",
  ...
}
```

1:1 join with the guardrails library's MongoDB audit. When FORGE_GUARDRAILS_DB is set, the library writes its own audit records into a GuardrailAuditEvent collection in MongoDB carrying the same entity_id + entity_type columns. The values are sourced from the same env vars / forge.yaml, so consumers reading both streams can join forge.entity_id == library.entity_id AND forge.entity_type == library.entity_type without translation. Forge only runs entity_type: "agent" today; the library supports agent / workflow / assistant as future-compatible values.
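A consumer-side sketch of the join: entity_id is resolved from FORGE_AGENT_ID with forge.yaml agent_id as the fallback (an assumption about how the two sources combine), entity_type is fixed to "agent", and the pair is compared as one composite key. entityKey and resolveEntity are hypothetical names.

```go
package main

import "fmt"

// entityKey is the composite join key both audit streams carry.
type entityKey struct {
	EntityID   string
	EntityType string
}

// resolveEntity: FORGE_AGENT_ID first, forge.yaml agent_id as fallback
// (assumed precedence). entity_type is always "agent" in Forge today.
func resolveEntity(envAgentID, yamlAgentID string) entityKey {
	id := envAgentID
	if id == "" {
		id = yamlAgentID
	}
	return entityKey{EntityID: id, EntityType: "agent"}
}

func main() {
	forgeKey := resolveEntity("", "aibuilderdemo") // only forge.yaml agent_id is set
	libraryKeys := []entityKey{
		{EntityID: "aibuilderdemo", EntityType: "agent"},
		{EntityID: "aibuilderdemo", EntityType: "workflow"}, // same id, different type: no match
	}
	matches := 0
	for _, k := range libraryKeys {
		if k == forgeKey { // struct equality compares both columns, like the AND join
			matches++
		}
	}
	fmt.Println(matches) // 1
}
```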
Entity identity has no per-request override — agent identity is fixed at process startup. The tenancy layer above (org_id / workspace_id) covers the multi-tenant routing case.
See Tenancy stamping reference for the precedence rules and the agent-to-agent propagation helper.
Every llm_call audit event carries the normalized token counts the provider returned in its response metadata, plus the wall-clock time spent in the provider call. Field naming aligns with OTel GenAI semantic conventions (gen_ai.usage.input_tokens / gen_ai.usage.output_tokens) so audit consumers can correlate Forge audit events with OTel traces without a translation table.
```json
{
  "ts": "2026-06-04T15:21:09Z",
  "event": "llm_call",
  "correlation_id": "9b3d…",
  "task_id": "task-42",
  "model": "claude-sonnet-4-6",
  "provider": "anthropic",
  "input_tokens": 1240,
  "output_tokens": 387,
  "duration_ms": 2150,
  "request_id": "msg_01H8…"
}
```

| Field | Source | Notes |
|---|---|---|
| input_tokens | Provider response usage | Maps to gen_ai.usage.input_tokens |
| output_tokens | Provider response usage | Maps to gen_ai.usage.output_tokens |
| tokens_unavailable | Audit emitter | true when both counts are zero. Some self-hosted Ollama setups don't return usage; billing consumers must distinguish "not measured" from "zero tokens used". |
| model | Runtime model config | The model identifier the executor was configured with |
| provider | Runtime model config | One of anthropic, openai, ollama, custom |
| duration_ms | Captured at call site | Wall-clock time spent in client.Chat, in milliseconds |
| request_id | Provider response | Opaque provider call ID (Anthropic id, OpenAI id); a debug-correlation handle only, never used for billing |
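The tokens_unavailable flag matters most to billing consumers. A minimal Go sketch, assuming a consumer that decodes llm_call rows with encoding/json; llmCallRow, tokensUnavailable, and billable are hypothetical names, and the emitter-side rule simply restates the table (both counts zero).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// llmCallRow decodes the usage-related keys of an llm_call event.
type llmCallRow struct {
	Model             string `json:"model"`
	Provider          string `json:"provider"`
	InputTokens       int    `json:"input_tokens"`
	OutputTokens      int    `json:"output_tokens"`
	TokensUnavailable bool   `json:"tokens_unavailable"`
}

// tokensUnavailable restates the emitter rule: both counts are zero.
func tokensUnavailable(in, out int) bool { return in == 0 && out == 0 }

// billable returns the counts and whether they were actually measured.
// A row flagged tokens_unavailable is "not measured", not "free".
func billable(row []byte) (in, out int, measured bool, err error) {
	var r llmCallRow
	if err = json.Unmarshal(row, &r); err != nil {
		return 0, 0, false, err
	}
	if r.TokensUnavailable {
		return 0, 0, false, nil
	}
	return r.InputTokens, r.OutputTokens, true, nil
}

func main() {
	i, o, ok, err := billable([]byte(`{"event":"llm_call","model":"claude-sonnet-4-6","provider":"anthropic","input_tokens":1240,"output_tokens":387}`))
	fmt.Println(i, o, ok, err) // 1240 387 true <nil>
	_, _, ok, _ = billable([]byte(`{"event":"llm_call","provider":"ollama","tokens_unavailable":true}`))
	fmt.Println(ok) // false
}
```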
Each tool_exec event (phase=end) carries duration_ms for the tool execution plus structured arg-shape metadata (args_size, result_size) — raw arg values are deliberately not included (payload stripping is FWS-8's concern). One invocation_complete event closes each A2A invocation with the total wall-clock duration and aggregated token totals across all LLM calls in the invocation.
Workflow correlation fields (workflow_id / stage_id / step_id / invocation_caller from FWS-2) also auto-tag every llm_call / tool_exec / invocation_complete event when the inbound request carried orchestrator headers — billing and audit consumers can attribute cost not just to a task but to a specific workflow run / stage / step.
A2A response headers carry the same per-invocation totals inline so an orchestrator can ceiling-check cost during parallel workflow execution without subscribing to the audit stream:
| Header | Value |
|---|---|
| X-Forge-Tokens-In | Sum of input_tokens across all LLM calls in the invocation |
| X-Forge-Tokens-Out | Sum of output_tokens across all LLM calls in the invocation |
| X-Forge-Duration-Ms | Wall-clock invocation duration (auth → dispatch → engine → response) |
| X-Forge-Model | Most-recently-used model |
| X-Forge-Provider | Most-recently-used provider |
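A sketch of the per-invocation accumulation behind invocation_complete and these headers, under the assumption that totals are summed per llm_call and model / provider are overwritten by each call. invocationTotals, addCall, and responseHeaders are hypothetical; only the header names come from the table.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// invocationTotals accumulates usage across the LLM calls of one invocation.
type invocationTotals struct {
	InputTokens, OutputTokens, LLMCalls int
	Model, Provider                     string // most recently used
}

// addCall folds one completed llm_call into the totals.
func (t *invocationTotals) addCall(model, provider string, in, out int) {
	t.InputTokens += in
	t.OutputTokens += out
	t.LLMCalls++
	t.Model, t.Provider = model, provider
}

// responseHeaders builds the X-Forge-* headers. A real handler would copy
// them onto the http.ResponseWriter before writing the status line.
func responseHeaders(t invocationTotals, durationMS int64) http.Header {
	h := http.Header{}
	h.Set("X-Forge-Tokens-In", strconv.Itoa(t.InputTokens))
	h.Set("X-Forge-Tokens-Out", strconv.Itoa(t.OutputTokens))
	h.Set("X-Forge-Duration-Ms", strconv.FormatInt(durationMS, 10))
	h.Set("X-Forge-Model", t.Model)
	h.Set("X-Forge-Provider", t.Provider)
	return h
}

func main() {
	var tot invocationTotals
	tot.addCall("claude-sonnet-4-6", "anthropic", 1240, 387)
	tot.addCall("claude-sonnet-4-6", "anthropic", 300, 50)
	h := responseHeaders(tot, 4000)
	fmt.Println(h.Get("X-Forge-Tokens-In"), h.Get("X-Forge-Tokens-Out"), tot.LLMCalls) // 1540 437 2
}
```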
Headers populate regardless of whether OTel tracing is enabled — they're the orchestration channel, not the observability channel.
Cost calculation is deliberately not in Forge. Forge emits token counts; the platform applies price tables to compute dollar amounts. Price tables change frequently and shouldn't require agent redeploys.
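On the platform side, applying a price table to Forge's counts can be as simple as the sketch below. The rate type, the priceTable entry, and its numbers are placeholders invented for illustration, not real prices; keeping the table outside the agent is what lets prices change without a redeploy.

```go
package main

import "fmt"

// rate is a platform-side price entry in USD per million tokens.
type rate struct{ InPerMTok, OutPerMTok float64 }

// priceTable is hypothetical: the key and numbers are placeholders.
var priceTable = map[string]rate{
	"example/example-model": {InPerMTok: 1.0, OutPerMTok: 4.0},
}

// costUSD prices one invocation from Forge's token counts
// (for example, the X-Forge-Tokens-In / X-Forge-Tokens-Out headers).
func costUSD(provider, model string, in, out int) (float64, bool) {
	r, ok := priceTable[provider+"/"+model]
	if !ok {
		return 0, false // unknown model: surface it rather than bill $0
	}
	return float64(in)/1e6*r.InPerMTok + float64(out)/1e6*r.OutPerMTok, true
}

func main() {
	usd, ok := costUSD("example", "example-model", 1_000_000, 500_000)
	fmt.Println(usd, ok) // 3 true
}
```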
Forge accepts mid-invocation cancellation via the A2A tasks/cancel JSON-RPC method. The handler looks up the in-flight invocation in a per-Runner cancellation registry, fires a typed cancel-cause through the executor's context.Context, and the loop honors it at the next iteration boundary or between tool calls (whichever comes first). The current LLM call honors cancellation natively — http.Client.Do aborts the request on ctx.Done().
Cancellation latency is bounded by the time for the current LLM call or tool call to finish (typically seconds, not minutes). Go's runtime does not support force-terminating a goroutine, so "hard-cancel" semantically means "honor the signal at the next safe checkpoint." The orchestrator-side cancel + give-up-wait-after-N-seconds pattern is an A2A-client concern, not a Forge concern.
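The checkpoint semantics can be illustrated with the standard library's context.WithCancelCause and context.Cause (Go 1.20+). In this sketch, runLoop, simulate, and the cancelReason type are hypothetical stand-ins for the executor loop and its typed cancel cause; a cancel that arrives while a step is running only takes effect at the next loop boundary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// cancelReason is a hypothetical typed cancel cause.
type cancelReason string

func (r cancelReason) Error() string { return "cancelled: " + string(r) }

// runLoop stands in for the executor loop. It checks ctx only between steps,
// so a step already in progress runs to completion before the loop stops.
func runLoop(ctx context.Context, steps []func()) (completed int, cause error) {
	for _, step := range steps {
		if ctx.Err() != nil { // safe checkpoint between LLM / tool calls
			return completed, context.Cause(ctx)
		}
		step()
		completed++
	}
	return completed, nil
}

// simulate runs three steps and fires cancel(reason) while step number
// cancelDuring (1-based) is executing; 0 means no cancellation.
func simulate(cancelDuring int, reason string) (int, string) {
	ctx, cancel := context.WithCancelCause(context.Background())
	defer cancel(nil)
	steps := make([]func(), 3)
	for i := range steps {
		i := i
		steps[i] = func() {
			if i+1 == cancelDuring {
				cancel(cancelReason(reason))
			}
		}
	}
	n, cause := runLoop(ctx, steps)
	var r cancelReason
	got := ""
	if errors.As(cause, &r) {
		got = string(r)
	}
	return n, got
}

func main() {
	fmt.Println(simulate(2, "cost_limit_exceeded")) // 2 cost_limit_exceeded
	n, reason := simulate(0, "")
	fmt.Println(n, reason == "") // 3 true
}
```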
```json
{
  "ts": "2026-06-04T15:23:47Z",
  "event": "invocation_cancelled",
  "correlation_id": "9b3d…",
  "task_id": "task-42",
  "duration_ms": 1820,
  "fields": {
    "reason": "cost_limit_exceeded",
    "state": "canceled",
    "input_tokens_total": 940,
    "output_tokens_total": 215,
    "llm_call_count": 2,
    "model": "claude-sonnet-4-6",
    "provider": "anthropic"
  }
}
```

| fields.reason | Set by | Meaning |
|---|---|---|
| workflow_failure | Orchestrator | Sibling step in a parallel stage failed under fail_workflow; abandon work. |
| cost_limit_exceeded | Orchestrator | Workflow cumulative cost ceiling hit (typically derived from the FWS-3 X-Forge-Tokens-* headers). |
| timeout | Orchestrator / Forge | Wall-clock budget exhausted. Parent ctx context.DeadlineExceeded auto-maps to this reason. |
| external_signal | Operator / fallback | Operator-initiated stop, debugging cancel, or any cancellation without a typed reason. |
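The mapping from a cancellation cause to fields.reason can be sketched as follows. reasonFor and cancelReason are hypothetical; the sketch assumes an explicit reason travels as a typed cause, forwards unrecognized strings verbatim, maps a parent deadline to timeout, and falls back to external_signal.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// cancelReason is a hypothetical typed cause set by the tasks/cancel handler.
type cancelReason string

func (r cancelReason) Error() string { return "cancelled: " + string(r) }

// reasonFor maps a context cancellation cause to fields.reason.
func reasonFor(cause error) string {
	var r cancelReason
	switch {
	case errors.As(cause, &r) && r != "":
		return string(r) // explicit reason, forwarded verbatim even if unrecognized
	case errors.Is(cause, context.DeadlineExceeded):
		return "timeout" // parent ctx deadline
	default:
		return "external_signal" // no typed reason (including an empty one)
	}
}

// expiredCause returns the cause of a context whose deadline already passed.
func expiredCause() error {
	ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(-time.Second))
	defer cancel()
	return context.Cause(ctx)
}

func main() {
	fmt.Println(reasonFor(cancelReason("workflow_failure"))) // workflow_failure
	fmt.Println(reasonFor(expiredCause()))                   // timeout
	fmt.Println(reasonFor(context.Canceled))                 // external_signal
}
```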
Cancel request shape:
```json
{
  "jsonrpc": "2.0",
  "method": "tasks/cancel",
  "params": { "id": "task-42", "reason": "cost_limit_exceeded" },
  "id": "1"
}
```

reason is optional. Unknown reason strings are accepted and forwarded to the audit event verbatim; the audit pipeline is the authority on classification.
Cancel after complete is idempotent. A cancel issued for a task that already finished (or was never started) returns the stored task state unchanged — no error. The handler refuses to flip a terminal-state task to canceled because that would corrupt audit and orchestrator state.
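A minimal sketch of the idempotency rule, assuming an in-memory map of task states alongside a registry of in-flight cancel functions. taskStore, Cancel, and the state constants are hypothetical; the sketch also assumes the state flips to canceled as soon as the signal is sent, which may differ from Forge's actual ordering.

```go
package main

import (
	"fmt"
	"sync"
)

type taskState string

const (
	stateWorking   taskState = "working"
	stateCompleted taskState = "completed"
	stateFailed    taskState = "failed"
	stateCanceled  taskState = "canceled"
)

func isTerminal(s taskState) bool {
	return s == stateCompleted || s == stateFailed || s == stateCanceled
}

// taskStore is a hypothetical stand-in for stored task state plus the
// cancellation registry of in-flight invocations.
type taskStore struct {
	mu       sync.Mutex
	states   map[string]taskState
	inflight map[string]func(reason string)
}

// Cancel never fails. Unknown or terminal tasks come back unchanged, so a
// late or repeated tasks/cancel cannot rewrite a finished task's state.
func (s *taskStore) Cancel(id, reason string) (state taskState, known bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	state, known = s.states[id]
	if !known || isTerminal(state) {
		return state, known
	}
	if signal, ok := s.inflight[id]; ok {
		signal(reason) // the executor stops at its next checkpoint
		delete(s.inflight, id)
	}
	s.states[id] = stateCanceled
	return stateCanceled, true
}

func main() {
	var signalled string
	s := &taskStore{
		states:   map[string]taskState{"task-41": stateCompleted, "task-42": stateWorking},
		inflight: map[string]func(string){"task-42": func(r string) { signalled = r }},
	}
	fmt.Println(s.Cancel("task-41", "timeout"))             // completed true
	fmt.Println(s.Cancel("task-42", "cost_limit_exceeded")) // canceled true
	fmt.Println(signalled)                                  // cost_limit_exceeded
	fmt.Println(s.Cancel("task-42", "timeout"))             // canceled true
}
```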
Partial usage is preserved. When LLM calls completed before the cancel signal, input_tokens_total / output_tokens_total / llm_call_count carry the accumulated counts so a downstream cost aggregator bills only for what was consumed. When no LLM call landed, the totals are absent and the event still carries duration_ms so wall-clock spend is visible.
Every inbound request to /tasks emits exactly one of auth_verify or auth_fail.
Successful authentication:
```json
{
  "ts":"2026-05-24T00:50:01Z",
  "event":"auth_verify",
  "fields":{
    "method":"POST",
    "path":"/tasks/send",
    "provider":"aws_sigv4",
    "user_id":"arn:aws:sts::412664885516:assumed-role/AWSReservedSSO_PowerUserAccess_.../Naveen",
    "org_id":"412664885516",
    "token_kind":"sigv4",
    "groups_count":0,
    "remote_addr":"[::1]:62297"
  }
}
```

user_id is the canonical identifier the verifier returned (ARN for AWS, JWT sub for OIDC/IAP/AAD). org_id is the AWS account, Entra tenant GUID, or OIDC tid/org_id-mapped claim, depending on the provider.
Failed authentication:
```json
{"ts":"...","event":"auth_fail","fields":{"reason":"rejected","token_kind":"sigv4","method":"POST","path":"/tasks/send","remote_addr":"[::1]:62200"}}
```

| Reason | What it means | Operator action |
|---|---|---|
| missing_token | No auth-shaped headers at all | Caller forgot to authenticate |
| not_for_me | Bearer present but no provider claimed it | Wrong token format for the configured providers |
| rejected | Provider recognized + denied (allowlist miss, expired, bad sig, scope mismatch) | Check allowed_principals / tenant_id / token freshness |
| invalid | Token malformed (bad base64, unsupported alg, missing required field) | Token construction bug on the caller side |
| provider_unavailable | Verifier endpoint down (STS / JWKS / Graph 5xx, network error) | Provider-side incident; not a token issue |
token_kind is a structural classification of what bytes were on the wire, so it is safe to log:

| Value | Shape |
|---|---|
| empty | No token / no auth-shaped headers |
| opaque | Bearer with non-JWT, non-sigv4 shape (channel adapter loopback, custom verifier tokens) |
| jwt | Bearer with three base64url segments (oidc, azure_ad) |
| sigv4 | Bearer with forge-aws-v1. prefix (aws_sigv4 pre-signed URL token) |
| iap_jwt | X-Goog-Iap-Jwt-Assertion header present (gcp_iap); also stamped on successful verify even if a Bearer was simultaneously present |
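The token_kind rules can be approximated from header shape alone, without decoding or verifying anything. This Go sketch is a hypothetical approximation of the table (tokenKind, isJWTShaped, and kindOf are illustrative names, and strings.CutPrefix needs Go 1.20+); it assumes an Authorization header without the Bearer scheme counts as empty, which the table does not specify.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// tokenKind approximates the token_kind table from header shape alone; it
// never decodes or verifies anything, which is what makes it safe to log.
func tokenKind(h http.Header) string {
	if h.Get("X-Goog-Iap-Jwt-Assertion") != "" {
		return "iap_jwt" // wins even when a Bearer is also present
	}
	tok, ok := strings.CutPrefix(h.Get("Authorization"), "Bearer ")
	switch {
	case !ok || tok == "":
		return "empty" // assumption: non-Bearer schemes are treated as empty too
	case strings.HasPrefix(tok, "forge-aws-v1."):
		return "sigv4" // checked before the JWT shape because it also contains dots
	case isJWTShaped(tok):
		return "jwt"
	default:
		return "opaque"
	}
}

// isJWTShaped checks for three non-empty base64url segments separated by dots.
func isJWTShaped(tok string) bool {
	parts := strings.Split(tok, ".")
	if len(parts) != 3 {
		return false
	}
	for _, p := range parts {
		if p == "" || strings.IndexFunc(p, notBase64URL) >= 0 {
			return false
		}
	}
	return true
}

func notBase64URL(r rune) bool {
	return !(r >= 'A' && r <= 'Z' || r >= 'a' && r <= 'z' || r >= '0' && r <= '9' || r == '-' || r == '_')
}

// kindOf builds request headers from a map and classifies them.
func kindOf(headers map[string]string) string {
	h := http.Header{}
	for k, v := range headers {
		h.Set(k, v)
	}
	return tokenKind(h)
}

func main() {
	fmt.Println(kindOf(map[string]string{"Authorization": "Bearer forge-aws-v1.abc"})) // sigv4
	fmt.Println(kindOf(map[string]string{"Authorization": "Bearer aaa.bbb.ccc"}))      // jwt
	fmt.Println(kindOf(nil))                                                           // empty
}
```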
The queries below assume forge.log holds the captured audit stream (stderr; see the stream split below).

Who called my agent in the last hour, by ARN/email?

```bash
jq -r 'select(.event=="auth_verify" and (.ts | fromdateiso8601) > (now - 3600)) | .fields.user_id' forge.log | sort | uniq -c
```

Why are requests failing?

```bash
jq -r 'select(.event=="auth_fail") | .fields.reason' forge.log | sort | uniq -c
```

Which agents called this one (in a mesh)?

```bash
jq -r 'select(.event=="auth_verify") | .fields.user_id' forge.log | sort -u
```

See Authentication for the full provider chain and how each provider populates these fields.
By default, audit events go to stderr only — the long-standing NDJSON-on-stderr safety net. FWS-7 (issue #95) adds a parallel export path so an in-pod sidecar can consume audit at low latency without parsing every container-log line.
The export sink does NOT replace stderr. Both paths emit byte-identical NDJSON; the export sink is purely additive. If the export sink is down, the operator can still grep audit out of the container logs.
| Flag | Env var | Purpose | Default |
|---|---|---|---|
| --audit-socket | FORGE_AUDIT_SOCKET | Unix Domain Socket path (preferred) | empty (no export sink) |
| --audit-http-endpoint | FORGE_AUDIT_HTTP_ENDPOINT | localhost HTTP POST endpoint (fallback when UDS unavailable) | empty |
| --audit-write-timeout | FORGE_AUDIT_WRITE_TIMEOUT | Per-event sink timeout (Go duration syntax: 50ms, 200ms) | 50ms |
Both forge run and forge serve start accept these flags; forge serve start forwards them to the daemon process. Env vars flow
through to the daemon via os.Environ() even without the flags. When
both --audit-socket and --audit-http-endpoint are set, the socket
wins.
- Lazy connect. The socket need not exist when the agent starts; the first emit triggers the dial. Sidecar deploys that come up after the agent will pick up future events without restarting the agent.
- Per-event timeout. Each emit at the sink gets up to --audit-write-timeout (default 50ms) before being dropped and counted as a drops_timeout. A slow sidecar can never back-pressure the agent.
- Exponential backoff between failed dials. 100ms → 200ms → 400ms → … → 5s cap. During the backoff window, writes drop without attempting a dial, so a permanently-down sidecar does not slow the emit path beyond a cheap clock check.
- No buffering on the sink. Buffering is the sidecar's job. The sink is fire-and-forget.
- No transformation. Events leaving the export sink are byte-identical to events leaving stderr.
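The sink behavior listed above can be sketched with the standard net package: a lazy dial, a per-event write deadline via SetWriteDeadline, doubling dial backoff from 100ms to a 5s cap, and counters named after writes_ok / drops_timeout / drops_dial. exportSink, dialFunc, and downDial are hypothetical; the sketch assumes a failed write closes the connection so a partially written NDJSON line is never followed by the next event, and it does not model the connected gauge.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"sync"
	"time"
)

const (
	minBackoff   = 100 * time.Millisecond
	maxBackoff   = 5 * time.Second
	writeTimeout = 50 * time.Millisecond // --audit-write-timeout default
)

type dialFunc func() (net.Conn, error)

// downDial simulates a sidecar whose socket does not exist (yet).
var downDial dialFunc = func() (net.Conn, error) { return nil, errors.New("sidecar down") }

// exportSink: lazy dial, per-event deadline, dial backoff, no buffering.
type exportSink struct {
	mu           sync.Mutex
	dial         dialFunc // e.g. func() (net.Conn, error) { return net.Dial("unix", path) }
	timeout      time.Duration
	conn         net.Conn
	backoff      time.Duration // 0 until the first failed dial
	nextDial     time.Time     // no dial attempts before this instant
	writesOK     int64
	dropsTimeout int64
	dropsDial    int64
}

// nextBackoff doubles the previous delay, starting at 100ms, capped at 5s.
func nextBackoff(prev time.Duration) time.Duration {
	if prev == 0 {
		return minBackoff
	}
	if prev*2 < maxBackoff {
		return prev * 2
	}
	return maxBackoff
}

// Write delivers one NDJSON line or drops it and bumps a counter.
func (s *exportSink) Write(line []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.conn == nil {
		if time.Now().Before(s.nextDial) {
			s.dropsDial++ // backoff window: a clock check, no dial
			return
		}
		c, err := s.dial()
		if err != nil {
			s.backoff = nextBackoff(s.backoff)
			s.nextDial = time.Now().Add(s.backoff)
			s.dropsDial++
			return
		}
		s.conn, s.backoff = c, 0
	}
	_ = s.conn.SetWriteDeadline(time.Now().Add(s.timeout))
	if _, err := s.conn.Write(line); err != nil {
		if errors.Is(err, os.ErrDeadlineExceeded) {
			s.dropsTimeout++
		} else {
			s.dropsDial++
		}
		s.conn.Close() // avoid appending to a torn line; redial lazily next time
		s.conn = nil
		return
	}
	s.writesOK++
}

func main() {
	client, server := net.Pipe() // nothing reads server: a stalled sidecar
	defer server.Close()
	s := &exportSink{dial: func() (net.Conn, error) { return client, nil }, timeout: writeTimeout}
	s.Write([]byte(`{"event":"session_start"}` + "\n"))
	fmt.Println(s.writesOK, s.dropsTimeout, s.dropsDial) // 0 1 0
}
```

In main, net.Pipe stands in for a stalled sidecar: nothing reads the other end, so the write hits its 50ms deadline and is counted as a timeout drop instead of blocking the emitter.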
Every 60 seconds the runtime emits one audit_export_status event
carrying per-sink counters. The event flows through the same fan-out
so operators tail the audit stream itself to confirm export health.
```json
{
  "ts": "2026-06-06T18:30:00Z",
  "event": "audit_export_status",
  "fields": {
    "sinks": [
      {"name": "stderr", "writes_ok": 4137, "drops_timeout": 0, "drops_dial": 0, "connected": 0},
      {"name": "unix-socket", "writes_ok": 4135, "drops_timeout": 0, "drops_dial": 2, "connected": 1}
    ]
  }
}
```

| Counter | Meaning |
|---|---|
| writes_ok | Events successfully delivered to this sink |
| drops_timeout | Events dropped because the per-event Write missed its deadline (slow / unresponsive peer) |
| drops_dial | Events dropped because the connection was down (sidecar offline or in backoff window) |
| connected | 1 when a working connection is held, 0 otherwise. Sticky 0 for fire-and-forget sinks (writerSink) |
Audit cannot be sampled (every policy decision and cost-relevant event must land). OTel traces can be sampled. Audit needs separate retention from observability. Failure-domain isolation: if OTel export breaks, audit must continue, and vice versa.
The two pipelines share signal sources in Forge — when something
interesting happens, instrumentation emits to OTel and to audit at
the same call site. They are deliberately not coupled at the export
level (one breaking does not break the other), but as of OTel v1
(#108 / Phase 4) every audit event emitted from a request-scoped
context carries the active span's trace_id + span_id. See
trace cross-link below for the
join-key semantics.
When OpenTelemetry tracing is enabled (see
Observability — Tracing),
EmitFromContext automatically stamps the active span's trace_id
and span_id on every audit event. Operators paste either value
directly into a trace backend's search box to pivot between the two
streams:
| Pivot direction | How |
|---|---|
| audit row → trace | Paste the row's trace_id into Tempo / Jaeger / Honeycomb to land on the matching trace. Paste the span_id to jump directly to the span (an llm_call row's span_id resolves to the llm.completion span carrying matching gen_ai.usage.* tokens). |
| trace → audit row | Copy trace_id from a trace browser; grep the audit log for the corresponding row to get the FWS-8 payload metadata the trace does not carry. |
Format: lowercase hex matching W3C traceparent semantics — 32-char
(128-bit) trace_id, 16-char (64-bit) span_id.
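A consumer that pivots from an audit row into a trace backend may want to sanity-check the join keys first. This sketch validates the documented shape (lowercase hex, 32 chars for trace_id, 16 for span_id) and also rejects all-zero IDs, which W3C Trace Context defines as invalid; validTraceID and validSpanID are hypothetical helpers, not Forge APIs.

```go
package main

import "fmt"

// isLowerHexID reports whether s is exactly n lowercase hex characters and
// not all zeros (W3C Trace Context treats all-zero IDs as invalid).
func isLowerHexID(s string, n int) bool {
	if len(s) != n {
		return false
	}
	allZero := true
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
		if c != '0' {
			allZero = false
		}
	}
	return !allZero
}

func validTraceID(s string) bool { return isLowerHexID(s, 32) } // 128-bit
func validSpanID(s string) bool  { return isLowerHexID(s, 16) } // 64-bit

func main() {
	// Example IDs from the W3C Trace Context specification.
	fmt.Println(validTraceID("4bf92f3577b34da6a3ce929d0e0e4736")) // true
	fmt.Println(validSpanID("00f067aa0ba902b7"))                   // true
	fmt.Println(validSpanID("00F067AA0BA902B7"))                   // false: uppercase
}
```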
Backward compatibility: both fields use omitempty. When tracing
is off (default), audit JSON is byte-identical to the pre-Phase-4
shape — no trace_id / span_id keys appear. The
AuditSchemaVersion is NOT bumped: adding optional fields is a
schema-compatible change per the versioning policy below.
When observability.tracing.capture_content: true is set, prompt /
completion / tool-args / tool-result content appears on both the
linked OTel span and the FWS-8 audit row for the same logical event.
The two pipelines run the captured content through the same redact-
then-truncate helper (runtime.PrepareSpanContent /
runtime.TruncateForAudit) so:
- The redaction marker is identical ([REDACTED]), so operators grepping either sink for vendor secret-token shapes see the same match.
- The truncation marker is byte-identical (…[truncated:N], where N is the original byte length of the input). Grepping [truncated: across audit rows and span attributes returns aligned, comparable results.
- The redact patterns mirror the runtime guardrails CustomRules defaults (Anthropic / OpenAI / GitHub / AWS / Slack / private key blocks / Telegram bot tokens). Adding a new vendor pattern to one pipeline implies adding it to the other.
The audit pipeline's byte cap (16 KiB per field, see
AuditPayloadCapture.Cap*Bytes) is intentionally larger than the
span cap (4 KiB — below the soft attribute-length limit most
observability backends apply). The two caps are independent: a single
event may be truncated on the span side and survive intact on the
audit side. The trailing marker shape is the same either way.
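The shared truncation marker can be illustrated with a small helper. This is not runtime.TruncateForAudit or runtime.PrepareSpanContent, and it omits redaction entirely; truncateWithMarker is hypothetical and assumes the marker is appended after the cap rather than counted against it, and that the cut backs off to a UTF-8 rune boundary. It shows how one payload can be truncated at the 4 KiB span cap yet survive the 16 KiB audit cap.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// truncateWithMarker keeps at most capBytes of s, backing off to a UTF-8
// rune boundary, and appends "…[truncated:N]" where N is len(s) in bytes.
func truncateWithMarker(s string, capBytes int) string {
	if len(s) <= capBytes {
		return s
	}
	cut := capBytes
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut] + fmt.Sprintf("…[truncated:%d]", len(s))
}

func main() {
	const spanCap, auditCap = 4 << 10, 16 << 10
	payload := strings.Repeat("x", 10<<10) // a 10 KiB tool result
	onSpan := truncateWithMarker(payload, spanCap)
	fmt.Println(len(onSpan) < len(payload), strings.HasSuffix(onSpan, "…[truncated:10240]")) // true true
	fmt.Println(truncateWithMarker(payload, auditCap) == payload)                             // true
}
```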
See Observability — Span content capture for the span-side attribute keys and opt-in switches.
forge run / forge serve use the OS streams as a stream-level
audit-vs-ops split, so container log collectors and SIEM pipelines
can route the two concerns separately without parsing any payload:
| Stream | Carries | Consumer |
|---|---|---|
| stdout | Ops logs: startup banner, request lines, runtime errors emitted via the structured JSONLogger (r.logger.Info/Warn/Error). | Container log collector / local debugging. |
| stderr | Audit NDJSON: every event constant defined in the table above. | SIEM pipeline today. After FWS-7, also lands on the dedicated UDS / HTTP sink in parallel (stderr stays as the safety-net fallback). |
| UDS / HTTP sink (FWS-7) | Audit NDJSON (primary, when configured). | initializ platform sidecar / customer SIEM. |
Migration note: pre-FWS-9, ops logs and audit both went to stderr —
SIEM rules had to filter by the presence of the event JSON field.
After FWS-9, the split is clean. Operators who used to redirect
forge run 2> ops.log for ops capture must switch to
forge run > ops.log (and 2> audit.log for audit). Container
deployments that capture both streams via the runtime's standard
log collector are unaffected.
Interactive CLI commands (forge init, forge build, forge channel)
keep writing warnings and errors to stderr — those are user-facing UX
messages, not server ops logs, and the stream-split policy doesn't
apply to them.
The audit event schema is a stable, versioned contract. Consumers (the initializ platform, custom SIEM pipelines, cost-attribution dashboards) depend on field names and types. Forge treats the schema as an external interface: backward-compatible additions do not bump the version; removals or semantic changes do.
Every emitted event carries:
| Field | Type | Always present? | Notes |
|---|---|---|---|
| ts | string (RFC3339) | yes | Emission timestamp in UTC |
| event | string | yes | Event-type constant (see the event table at the top of this page) |
| schema_version | string | yes | Current contract version. "1.0" as of FWS-8. |
| seq | int64 | per-invocation only | Monotonic per-invocation counter. Absent on startup events (policy_loaded, agent_card_published, audit_export_status). |
| correlation_id | string | request-scoped only | Per-invocation ID; groups all events for one A2A invocation |
| task_id | string | request-scoped only | A2A task identifier (params.id on tasks/send) |
| workflow_id / stage_id / step_id / invocation_caller | string | optional | Populated when the request carried X-Workflow-* headers (FWS-2) |
| model / provider | string | optional | LLM call attribution (FWS-3) |
| input_tokens / output_tokens / tokens_unavailable | int / bool | optional | LLM call usage (FWS-3) |
| duration_ms | int64 | optional | Wall-clock duration (FWS-3) |
| request_id | string | optional | Provider-specific call identifier (FWS-3) |
| trace_id / span_id | string | tracing-on only | W3C-format lowercase hex (32/16 chars) of the OTel span active at emit time. Pivots audit row ↔ trace tree. See trace cross-link. |
| fields | map | optional | Per-event structured metadata (see each event type) |
Every audit event emitted on behalf of an A2A invocation carries a
monotonically increasing seq field. Sequences start at 1 for the
first event of an invocation and advance by 1 per emit. Consumers
detect gaps (lost events) and reordering (export-side races) by
inspecting seq within a (correlation_id, task_id) group.
Sequences are scoped to a single invocation — different invocations
start their own counters. Events emitted outside any invocation scope
(policy_loaded, agent_card_published, audit_export_status) omit
seq entirely.
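Gap detection on the consumer side reduces to sorting each (correlation_id, task_id) group and walking upward from 1. findGaps and its types are hypothetical; note that events lost after the last one received cannot be detected from seq alone.

```go
package main

import (
	"fmt"
	"sort"
)

// seqKey is the (correlation_id, task_id) grouping.
type seqKey struct{ CorrelationID, TaskID string }

// seqEvent is a trimmed view of an audit row; Seq 0 means the key was absent.
type seqEvent struct {
	Key seqKey
	Seq int64
}

// findGaps returns, per invocation, the seq values that never arrived.
// Sorting first means export-side reordering on its own is not a gap.
func findGaps(events []seqEvent) map[seqKey][]int64 {
	grouped := make(map[seqKey][]int64)
	for _, e := range events {
		if e.Seq == 0 {
			continue // startup events carry no seq
		}
		grouped[e.Key] = append(grouped[e.Key], e.Seq)
	}
	gaps := make(map[seqKey][]int64)
	for k, seqs := range grouped {
		sort.Slice(seqs, func(i, j int) bool { return seqs[i] < seqs[j] })
		want := int64(1) // sequences start at 1
		for _, s := range seqs {
			for ; want < s; want++ {
				gaps[k] = append(gaps[k], want)
			}
			if s >= want { // duplicates leave want unchanged
				want = s + 1
			}
		}
	}
	return gaps
}

func main() {
	k := seqKey{CorrelationID: "corr-1", TaskID: "task-42"}
	events := []seqEvent{{k, 1}, {k, 2}, {k, 5}, {k, 4}, {seqKey{}, 0}} // 3 lost; 4 and 5 reordered
	fmt.Println(findGaps(events)[k])                                    // [3]
}
```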
The per-invocation SequenceCounter is installed on r.Context() by
installSequenceCounterMiddleware, which wraps the auth middleware so
the counter is already on context before the auth chain runs. This
puts auth_verify / auth_fail first in the sequence (seq=1) and
keeps the rest of the per-invocation events (session_start,
guardrail_check, llm_call, tool_exec, invocation_complete,
etc.) gap-free under the same (correlation_id, task_id) group. The
runner's request entry calls coreruntime.EnsureSequenceCounter —
which reuses the wrapper-installed counter when present and installs a
fresh one on the --no-auth path, so no embedder configuration loses
seq stamping. Pinned by TestAuthAudit_SeqStampedWhenCounterInstalled
and TestEnsureSequenceCounter_ReusesExisting (issue #174).
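The reuse-or-install behavior can be sketched with context.WithValue and sync/atomic (atomic.Int64 needs Go 1.19+). seqCounter, ensureSeq, and invocation are hypothetical names, not the coreruntime API; the sketch only shows why installing the counter ahead of the auth middleware puts auth_verify at seq=1 while the --no-auth path still starts at 1.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// seqCounter is a hypothetical per-invocation counter; Next returns 1 first.
type seqCounter struct{ n atomic.Int64 }

func (c *seqCounter) Next() int64 { return c.n.Add(1) }

type seqCtxKey struct{}

// ensureSeq reuses a counter already on ctx or installs a fresh one.
func ensureSeq(ctx context.Context) (context.Context, *seqCounter) {
	if c, ok := ctx.Value(seqCtxKey{}).(*seqCounter); ok {
		return ctx, c
	}
	c := &seqCounter{}
	return context.WithValue(ctx, seqCtxKey{}, c), c
}

// invocation simulates one request. With outerMiddleware, the counter is
// installed before auth so auth_verify takes seq=1 and the runner entry
// reuses it. Without it (the --no-auth shape), the runner installs its own.
func invocation(outerMiddleware bool) []int64 {
	ctx := context.Background()
	var seqs []int64
	if outerMiddleware {
		var c *seqCounter
		ctx, c = ensureSeq(ctx)
		seqs = append(seqs, c.Next()) // auth_verify
	}
	_, c := ensureSeq(ctx)                  // runner entry: reuse or install
	seqs = append(seqs, c.Next(), c.Next()) // session_start, llm_call
	return seqs
}

func main() {
	fmt.Println(invocation(true))  // [1 2 3]
	fmt.Println(invocation(false)) // [1 2]
}
```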
The seq counter is picked up by AuditLogger.EmitFromContext(ctx, ...)
(and the typed helpers built on top of it — EmitLLMCall,
EmitToolExec, EmitInvocationComplete, EmitInvocationCancelled,
the egress and guardrail emit paths). Plain AuditLogger.Emit skips
the counter and the trace cross-link — so every audit emission that
happens inside an invocation scope MUST go through EmitFromContext.
This was the regression behind issues #173 (three sites — the
BeforeToolExec / AfterToolExec hook callbacks and the
outbound-guardrail-failure session_end emit — had drifted to plain
Emit and lost seq on tool_exec + that branch's session_end) and
#174 (the auth callback couldn't use EmitFromContext until the
counter was installed upstream of the auth middleware). Pinned by
TestToolExecAudit_CarriesSequenceFromContext. Sites that still call
plain Emit are explicitly outside any invocation scope and are
documented inline:
| Site | Why plain Emit |
|---|---|
| Egress proxy OnAttempt with source=proxy | Subprocess HTTP CONNECT has no Go ctx tying back to the A2A request |
| MCP server startup events (mcp_server_started / _failed / _degraded) | Pre-invocation; no scope |
| Scheduler tick (schedule_fire / schedule_complete / schedule_skip / schedule_modify) | Runs on its own timer outside any A2A request |
| Startup banners (policy_loaded, agent_card_published, audit_export_status) | Pre-invocation; no scope |
Issue #175 tracks a follow-up vet/lint pass to catch future
Emit-instead-of-EmitFromContext drift on per-invocation events.
| Change | Bumps version? |
|---|---|
| Add a new optional field with omitempty | No |
| Add a new event type constant | No |
| Add a new fields[] key inside an existing event | No |
| Rename a field, drop a field, or change a field's type | Yes (major bump) |
| Change the semantic meaning of an existing field value | Yes (major bump) |
Consumers that don't recognize a schema_version should keep
processing — the schema is additive-by-default.
By default, audit events are metadata only — token counts, sizes, durations, tool names, provider attribution. No prompt text, no completion text, no raw tool arguments, no raw tool results. This is the baseline contract every operator can rely on regardless of configuration.
Customers who need raw payloads in audit (debugging incidents,
supervised-learning corpora, compliance replay) opt in field by field
via AuditPayloadCapture on the runner config:
```go
RunnerConfig{
	AuditPayloadCapture: coreruntime.AuditPayloadCapture{
		LLMMessages: true, // prompt messages in llm_call
		LLMResponse: true, // completion text in llm_call
		ToolArgs:    true, // raw tool input in tool_exec
		ToolResult:  true, // raw tool output in tool_exec
		// Per-field byte caps; 0 = use DefaultPayloadCaptureCapBytes (16 KiB)
		CapLLMMessagesBytes: 32 << 10, // 32 KiB
		CapToolResultBytes:  64 << 10, // 64 KiB
	},
}
```

Captured strings are truncated to a per-field byte cap (default 16 KiB)
with a …[truncated:N] marker so a runaway prompt or gigabyte tool
output can't bloat one audit event. The cap is enforced by
coreruntime.TruncateForAudit.
Security note. Once any capture flag is enabled, the audit transport (FWS-7 sink or stderr safety net) lands captured payloads verbatim — including any PII or secret that flowed through the prompt or completion. Operators are responsible for routing the audit stream to a store appropriate to the captured payloads' sensitivity. Forge does not redact captured content.
| Flag | Adds to event | Adds field |
|---|---|---|
| LLMMessages | llm_call / llm_call_cancelled | prompt_messages (JSON-encoded []ChatMessage), prompt_messages_count |
| LLMResponse | llm_call | completion_text (Response.Message.Content) |
| ToolArgs | tool_exec (start hook) | args (raw ToolInput) |
| ToolResult | tool_exec (end hook) | result (raw ToolOutput) |
The default size-only fields (args_size, result_size,
prompt_messages_count) always land regardless of capture
configuration so consumers can size-check even without raw bodies.
- Audit event signing. The issue's architectural recommendation was to defer signing until a customer specifically asks (complexity around key management, rotation, and customer-side verification). Sequence numbers cover gap detection in the meantime. Tracked as a follow-up.
- Per-agent capture flags in forge.yaml. Capture is set via RunnerConfig programmatically today. A YAML surface can be added if customers ask; the runtime semantics are already in place.