|
| 1 | +# ADR 0032: Run Event Taxonomy and Event Spine |
| 2 | + |
| 3 | +## Status |
| 4 | + |
| 5 | +Accepted — owner-approved 2026-06-13 (unblocks M1: AuditLogger as consumer) |
| 6 | + |
| 7 | +## Date |
| 8 | + |
| 9 | +2026-06-13 |
| 10 | + |
| 11 | +## Context |
| 12 | + |
| 13 | +Three parallel half-systems currently handle run-lifecycle events, making it difficult to reason about governance, audit, and receipts as a unified concern: |
| 14 | + |
| 15 | +1. **Audit strings** (`audit.record('run_started', ...)` etc.) — scattered call sites, implicit taxonomy of event names, consumed by receipts and evidence. |
| 16 | +2. **HookRegistry** (teaagent/hooks.py) — Claude-Code-compatible hook events (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PreCompact, Stop, SubagentStop, SessionEnd), wired only at the tool boundary, carries veto semantics via HookError. |
| 17 | +3. **ContextBus** (teaagent/context_bus.py) — separate event mechanism for deltas. |
| 18 | + |
| 19 | +Meanwhile, every governance gate (approval, budget, plan, tool policy) is inlined in AgentRunner (runner/_core.py), which turns the runner into a gravity well and makes governance hard to test independently. The control-loop ownership map (docs/architecture/control-loop-ownership-map-2026-06-11.md) identifies this as a core architectural pain point. |
| 20 | + |
| 21 | +## Decision |
| 22 | + |
| 23 | +Introduce a typed **run-lifecycle event spine** with explicit event taxonomy and two subscriber classes: |
| 24 | + |
| 25 | +### 1. RunEvent Type System |
| 26 | + |
| 27 | +Define a `RunEventType(str, Enum)` whose members are seeded from: |
| 28 | +- The union of existing audit event names (run_started, iteration_started, tool_call_completed, tool_call_failed, context_compacted, validation_started) |
| 29 | +- The run-lifecycle taxonomy from harness-first-direction §6.3 (plan_resolved, decision_received, tool_call_requested, budget_checkpoint, context_compacted, iteration_completed, final_validation, run_completed, run_failed, run_pending_approval, run_cancelled, receipt_emitted, session_start, session_end, etc.) |
| 30 | + |
| 31 | +Minimal M0 set for this spike: |
| 32 | +- `RUN_STARTED` — run begins |
| 33 | +- `ITERATION_STARTED` — iteration loop begins |
| 34 | +- `TOOL_CALL_REQUESTED` — tool call requested (before gates) |
| 35 | +- `TOOL_CALL_COMPLETED` — tool call succeeded |
| 36 | +- `TOOL_CALL_FAILED` — tool call errored |
| 37 | +- `RUN_COMPLETED` — run ends successfully |
| 38 | +- `RUN_FAILED` — run ends in failure |
| 39 | + |
| 40 | +(Extensible; the full taxonomy is listed at the end of this ADR and documented in code comments.) |
| 41 | + |
| 42 | +### 2. Event Spine Architecture |
| 43 | + |
| 44 | +**RunEvent dataclass** (frozen, immutable): |
| 45 | +``` |
| 46 | +type: RunEventType |
| 47 | +run_id: str |
| 48 | +payload: Mapping[str, Any] # typed payload; structure per event type |
| 49 | +seq: int # monotonic sequence number per spine instance |
| 50 | +``` |
| 51 | + |
| 52 | +**EventSpine class** (sync-first, in-process, deterministic): |
| 53 | +``` |
| 54 | +register_interceptor(fn, *, name: str) -> None |
| 55 | + # Callable[[RunEvent], None]; may raise to veto |
| 56 | + # Interceptors run in registration order before consumers |
| 57 | + # Exceptions propagate (veto semantics) |
| 58 | + |
| 59 | +register_consumer(fn, *, name: str) -> None |
| 60 | + # Callable[[RunEvent], None]; never veto |
| 61 | + # Consumers run after interceptors |
| 62 | + # Exceptions are caught, logged, and isolated (never affect run) |
| 63 | + |
| 64 | +emit(event: RunEvent) -> None |
| 65 | + # Fire an event: run interceptors in order, then consumers |
| 66 | + # If any interceptor raises, propagate immediately (no further subscribers run) |
| 67 | + # If any consumer raises, log and continue |
| 68 | + # Return normally on success or after isolated consumer failure |
| 69 | +``` |
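
The three methods above might be implemented along these lines. This is a hedged sketch of the described semantics only: `RunEvent.type` is simplified to a plain string, the logger name is invented, and the built-in `PermissionError` stands in for a governance exception such as `ToolPermissionError`.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, List, Mapping, Tuple

logger = logging.getLogger("event_spine_sketch")  # hypothetical logger name


@dataclass(frozen=True)
class RunEvent:
    type: str  # simplified stand-in for RunEventType
    run_id: str
    payload: Mapping[str, Any]
    seq: int


Subscriber = Callable[[RunEvent], None]


class EventSpine:
    def __init__(self) -> None:
        self._interceptors: List[Tuple[str, Subscriber]] = []
        self._consumers: List[Tuple[str, Subscriber]] = []

    def register_interceptor(self, fn: Subscriber, *, name: str) -> None:
        self._interceptors.append((name, fn))

    def register_consumer(self, fn: Subscriber, *, name: str) -> None:
        self._consumers.append((name, fn))

    def emit(self, event: RunEvent) -> None:
        # Interceptors run first, in registration order. An exception
        # propagates to the caller, so no later subscriber runs (veto).
        for _name, fn in self._interceptors:
            fn(event)
        # Consumers run afterwards; each failure is logged and isolated.
        for name, fn in self._consumers:
            try:
                fn(event)
            except Exception:
                logger.exception("consumer %r failed on %s", name, event.type)


seen: List[int] = []
spine = EventSpine()
spine.register_consumer(lambda e: 1 / 0, name="broken")  # always raises
spine.register_consumer(lambda e: seen.append(e.seq), name="recorder")
spine.emit(RunEvent("run_started", "run-1", {}, seq=0))  # recorder still runs


def deny_all(event: RunEvent) -> None:
    raise PermissionError("denied by policy")


spine.register_interceptor(deny_all, name="deny_all")
try:
    spine.emit(RunEvent("tool_call_requested", "run-1", {}, seq=1))
except PermissionError:
    vetoed = True  # the recorder never saw seq=1
else:
    vetoed = False
```

In this sketch the caller supplies `seq`; whether the spine itself should assign the monotonic sequence number is left open here.
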
| 70 | + |
| 71 | +### 3. Subscriber Semantics |
| 72 | + |
| 73 | +**Interceptors:** |
| 74 | +- Represent governance gates (plan validation, approval, budget, policy) |
| 75 | +- Run in declared order before any consumer sees the event |
| 76 | +- May raise any exception; a ToolPermissionError or similar is converted to a DenialReasonCode |
| 77 | +- Exception from interceptor halts the spine (veto) |
| 78 | +- Used to enforce hard constraints |
| 79 | + |
| 80 | +**Consumers:** |
| 81 | +- Represent audit, receipt building, evidence, ContextBus, webhook sinks |
| 82 | +- Run after all interceptors complete |
| 83 | +- Each wrapped in try/except (exception logged via logging module, never propagates) |
| 84 | +- Never affect the run (crash-safe) |
| 85 | +- Used for side effects and derived state |
| 86 | + |
| 87 | +### 4. HookRegistry Alignment |
| 88 | + |
| 89 | +Existing Claude-Code hook names (SessionStart, PreToolUse, PostToolUse, etc.) are preserved as **aliases** to RunEventType members where semantically equivalent (e.g., PRE_TOOL_USE ← PreToolUse). The public hook API (teaagent/hooks.py) will be re-homed onto the spine in a later migration step (M5). |
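
One way to picture the alias relationship is a lookup table from hook name to enum member. The sketch below is illustrative only: it covers the five hook names that have an alias in the taxonomy listed in this ADR, the string values are assumptions, and `HOOK_ALIASES` and `resolve_hook_name` are hypothetical names.

```python
from enum import Enum
from typing import Dict


class RunEventType(str, Enum):
    # Only the hook-aligned members; string values are assumed.
    SESSION_START = "session_start"
    PRE_TOOL_USE = "pre_tool_use"
    POST_TOOL_USE = "post_tool_use"
    PRE_COMPACT = "pre_compact"
    SESSION_END = "session_end"


# Hypothetical alias table: Claude-Code hook name -> spine event type.
HOOK_ALIASES: Dict[str, RunEventType] = {
    "SessionStart": RunEventType.SESSION_START,
    "PreToolUse": RunEventType.PRE_TOOL_USE,
    "PostToolUse": RunEventType.POST_TOOL_USE,
    "PreCompact": RunEventType.PRE_COMPACT,
    "SessionEnd": RunEventType.SESSION_END,
}


def resolve_hook_name(name: str) -> RunEventType:
    """Map a hook name to its event type, or fail loudly if none exists."""
    try:
        return HOOK_ALIASES[name]
    except KeyError:
        raise ValueError(f"no RunEventType alias for hook {name!r}") from None


print(resolve_hook_name("PreToolUse").name)  # prints: PRE_TOOL_USE
```

Hook names without a semantically equivalent member raise rather than map silently, which keeps any gap in the alias table visible during the M5 migration.
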
| 90 | + |
| 91 | +### 5. Compliance with ADR 0030 |
| 92 | + |
| 93 | +New code lives inside the existing `teaagent/runner/` package (teaagent/runner/_events.py) — no new root module. The module freeze is respected. |
| 94 | + |
| 95 | +## Rationale |
| 96 | + |
| 97 | +- **Single contract**: One typed enum replaces three implicit taxonomies; claim-testable, refactorable, and extensible. |
| 98 | +- **Determinism**: Sync-first, in-process, no threads — deterministic for tests, safe for receipts. |
| 99 | +- **Veto clarity**: Interceptor ordering and exception semantics are explicit, enabling governance gates to be extracted without rewriting the runner. |
| 100 | +- **Gradual migration**: Dual-write (M0) allows the old audit.record() paths to coexist with new events, so the migration is strangler-safe. |
| 101 | +- **Test leverage**: Lifecycle tests can assert event sequences instead of implementation internals, decoupling tests from runner refactors. |
| 102 | + |
| 103 | +## Implementation |
| 104 | + |
| 105 | +### Phase M0 (this ADR, this spike) |
| 106 | + |
| 107 | +1. Define `RunEventType(str, Enum)` and `RunEvent` dataclass in teaagent/runner/_events.py. |
| 108 | +2. Define `EventSpine` class with register_interceptor, register_consumer, emit semantics. |
| 109 | +3. Add optional `event_spine: EventSpine | None` parameter to AgentRunner (default: fresh spine, no subscribers). |
| 110 | +4. At existing audit.record call sites, **dual-write**: emit corresponding RunEvent (audit calls unchanged). |
| 111 | +5. Lifecycle tests assert the event sequence for the five-minute-proof scenario. |
| 112 | +6. Acceptance tier stays green. |
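
Steps 4 and 5 can be pictured with a toy run. Everything here is a stand-in: `FakeAudit`, `FakeSpine`, and `fake_run` are hypothetical, the `record(name, **fields)` signature is assumed, and the event sequence is illustrative rather than the actual five-minute-proof sequence.

```python
import itertools
from dataclasses import dataclass
from typing import Any, List, Mapping, Tuple


@dataclass(frozen=True)
class RunEvent:
    type: str  # simplified stand-in for RunEventType
    run_id: str
    payload: Mapping[str, Any]
    seq: int


class FakeAudit:
    """Stand-in for the existing audit logger (signature assumed)."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, dict]] = []

    def record(self, name: str, **fields: Any) -> None:
        self.records.append((name, fields))


class FakeSpine:
    """Stand-in for EventSpine that only remembers emitted events."""

    def __init__(self) -> None:
        self.events: List[RunEvent] = []

    def emit(self, event: RunEvent) -> None:
        self.events.append(event)


def fake_run(audit: FakeAudit, spine: FakeSpine, run_id: str) -> None:
    seq = itertools.count()

    def both(name: str, **payload: Any) -> None:
        audit.record(name, **payload)  # existing call, unchanged
        spine.emit(RunEvent(name, run_id, payload, next(seq)))  # M0 dual-write

    both("run_started", task="demo")
    both("iteration_started", iteration=1)
    both("tool_call_completed", tool="read_file")
    both("run_completed")


# Lifecycle-style assertion on the event sequence, not on runner internals.
audit, spine = FakeAudit(), FakeSpine()
fake_run(audit, spine, "run-1")
assert [e.type for e in spine.events] == [
    "run_started", "iteration_started", "tool_call_completed", "run_completed",
]
assert [e.seq for e in spine.events] == [0, 1, 2, 3]
```

The audit records and the spine events stay in lockstep by construction, which is the property the dual-write phase relies on before M1 removes the direct audit calls.
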
| 113 | + |
| 114 | +### Future Phases (M1–M6) |
| 115 | + |
| 116 | +| Step | Change | Invariant | |
| 117 | +| --- | --- | --- | |
| 118 | +| M1 | AuditLogger becomes a consumer (serializes RunEvents to JSONL) | Byte-equivalent audit on proof scenario | |
| 119 | +| M2 | Receipts/evidence fold over event stream | Receipt completeness guaranteed structurally | |
| 120 | +| M3 | Plan gate moves to interceptor | Same denials, same reason codes | |
| 121 | +| M4 | Approval and budget gates to interceptors | Same semantics, extracted from runner | |
| 122 | +| M5 | HookRegistry re-homed onto spine; public hook API documented | Existing hook tests pass via aliases | |
| 123 | +| M6 | ContextBus + webhook sinks consume spine; inline emission paths deleted | No orphaned eventing modules | |
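
As an illustration of M1 and M2, a consumer can serialize each event to one JSON line, and a receipt can be computed as a fold over the same events. This is a sketch under assumptions: the JSON field layout and the receipt keys are invented for the example, and the real audit format would have to be matched exactly to satisfy the byte-equivalence invariant.

```python
import io
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Mapping, TextIO


@dataclass(frozen=True)
class RunEvent:
    type: str  # simplified stand-in for RunEventType
    run_id: str
    payload: Mapping[str, Any]
    seq: int


def make_jsonl_consumer(stream: TextIO) -> Callable[[RunEvent], None]:
    """M1-style consumer: one JSON object per line (field layout assumed)."""

    def consume(event: RunEvent) -> None:
        record = {
            "seq": event.seq,
            "type": event.type,
            "run_id": event.run_id,
            "payload": dict(event.payload),
        }
        # sort_keys keeps the output stable across runs.
        stream.write(json.dumps(record, sort_keys=True) + "\n")

    return consume


def fold_receipt(events: Iterable[RunEvent]) -> Dict[str, Any]:
    """M2-style fold: derive a summary from the event stream alone."""
    receipt: Dict[str, Any] = {"completed": 0, "failed": 0, "outcome": None}
    for event in events:
        if event.type == "tool_call_completed":
            receipt["completed"] += 1
        elif event.type == "tool_call_failed":
            receipt["failed"] += 1
        elif event.type in ("run_completed", "run_failed"):
            receipt["outcome"] = event.type
    return receipt


events = [
    RunEvent("run_started", "run-1", {"task": "demo"}, 0),
    RunEvent("tool_call_completed", "run-1", {"tool": "read_file"}, 1),
    RunEvent("tool_call_failed", "run-1", {"tool": "write_file"}, 2),
    RunEvent("run_completed", "run-1", {}, 3),
]
buffer = io.StringIO()
consume = make_jsonl_consumer(buffer)
for event in events:
    consume(event)
receipt = fold_receipt(events)
print(receipt)  # prints: {'completed': 1, 'failed': 1, 'outcome': 'run_completed'}
```

Because the receipt is computed only from events, a missing event shows up as a missing field rather than as a silent divergence between audit and receipt.
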
| 124 | + |
| 125 | +## Consequences |
| 126 | + |
| 127 | +**Positive:** |
| 128 | +- Unified event contract enables incremental gate extraction without rewriting AgentRunner. |
| 129 | +- Governance gates become testable independently via lifecycle assertions. |
| 130 | +- Receipts/audit can be derived from a single immutable event stream (M2+), eliminating synthetic-vs-real gaps. |
| 131 | +- Hook ordering and error semantics are explicit and stable for the public API. |
| 132 | + |
| 133 | +**Negative:** |
| 134 | +- M0 dual-write adds ~5 lines per call site (acceptable; temporary until M1). |
| 135 | +- EventSpine is new infrastructure; must be proven correct before gates migrate to interceptors. |
| 136 | +- Full governance-gate extraction (M3–M4) is multi-phase and requires consecutive landing without behavioral changes (per stop-rule in strategy doc §6.4). |
| 137 | + |
| 138 | +## Alternatives Considered |
| 139 | + |
| 140 | +1. **Extend HookRegistry instead of creating EventSpine**: HookRegistry is Claude-Code-specific and tool-boundary-scoped; the spine covers the full run lifecycle and cannot be scoped to tools. Separate design avoids conflating concerns. |
| 141 | + |
| 142 | +2. **Async event sink**: Async sinks (queue-based consumers) would enable webhook delivery and distributed audit. Rejected at M0 for determinism: tests must not depend on timing. Async can be added at M2+ if friction evidence justifies it. |
| 143 | + |
| 144 | +3. **Fold events into context/observations**: Events would become observation slots instead of a separate spine. Rejected: observations are model-visible; governance events must be opaque to the model and ordered by the harness. |
| 145 | + |
| 146 | +## References |
| 147 | + |
| 148 | +- [Harness-First Direction §6](../strategy/harness-first-direction-2026-06-13.md#6-core-architecture-one-event-spine-gates-as-interceptors) |
| 149 | +- [Control-Loop Ownership Map §6.1](../architecture/control-loop-ownership-map-2026-06-11.md) |
| 150 | +- [ADR 0030: Root-Module Freeze](0030-root-module-freeze.md) |
| 151 | +- [ADR 0009: 5-Loop Governance System](0009-five-loop-governance.md) |
| 152 | + |
| 153 | +## Full Event Taxonomy (M0 + Planned) |
| 154 | + |
| 155 | +``` |
| 156 | +RUN_STARTED # Run begins; payload: run_id, task, model, etc. |
| 157 | +SESSION_START # Session begins (alias: SessionStart) |
| 158 | +PLAN_RESOLVED # Plan loaded/validated |
| 159 | +ITERATION_STARTED # Iteration loop begins |
| 160 | +DECISION_RECEIVED # Model returns a decision (tool call or final answer) |
| 161 | +TOOL_CALL_REQUESTED # Tool call identified (before gates) |
| 162 | +TOOL_CALL_APPROVED # Approval gate approved |
| 163 | +TOOL_CALL_DENIED # Approval gate denied |
| 164 | +TOOL_CALL_COMPLETED # Tool call succeeded |
| 165 | +TOOL_CALL_FAILED # Tool call errored |
| 166 | +CONTEXT_COMPACTED # Context compaction occurred |
| 167 | +BUDGET_CHECKPOINT # Budget check (not veto; informational) |
| 168 | +ITERATION_COMPLETED # Iteration loop ends |
| 169 | +FINAL_VALIDATION # Final answer validation |
| 170 | +RUN_COMPLETED # Run ends successfully |
| 171 | +RUN_FAILED # Run ends in failure |
| 172 | +RUN_PENDING_APPROVAL # Run paused for approval |
| 173 | +RUN_CANCELLED # Run cancelled by user |
| 174 | +RECEIPT_EMITTED # Receipt finalized |
| 175 | +SESSION_END # Session ends (alias: SessionEnd) |
| 176 | +SKILL_LOAD # Skill loaded |
| 177 | +MODEL_ROUTE # Model routed (provider selection) |
| 178 | +GIT_SANDBOX_STARTED # Sandbox workspace initialized |
| 179 | +GIT_SANDBOX_RESOLVED # Sandbox resolved/cleaned |
| 180 | +UNDO_PERFORMED # Undo action executed |
| 181 | +PRE_TOOL_USE # Hook: before tool execution (alias: PreToolUse) |
| 182 | +POST_TOOL_USE # Hook: after tool execution (alias: PostToolUse) |
| 183 | +PRE_COMPACT # Hook: before context compaction (alias: PreCompact) |
| 184 | +``` |
| 185 | + |
| 186 | +The M0 spike covers RUN_STARTED, ITERATION_STARTED, TOOL_CALL_REQUESTED, TOOL_CALL_COMPLETED, TOOL_CALL_FAILED, RUN_COMPLETED, RUN_FAILED. Extended events are added in later phases as gates migrate. |