feat: ADR 0032 M5 — hook observability onto spine (taxonomy); execution stays in dispatch

johnteee · claude · johnteee · commit be4209a02fb2 · 2026-06-14T00:46:26.000+08:00
Assessed "HookRegistry on spine" against the code first. Finding: hooks run in
the tool-dispatch layer (teaagent/tools.py::execute), not the runner the plan
named. PreToolUse/PostToolUse MUTATE in-flight arguments/result fed to the
handler — the spine has no channel to ferry a mutated payload back, so an
interceptor cannot replace them without losing mutation (same coupling that kept
approval inline, decision B). The 6 session-lifecycle hooks have no production
caller — nothing to strangle. So the enforcement bridge (M5-T002) is unsuitable;
M5 closes as observability-only.
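
Illustration of the coupling described above, as a minimal standalone sketch.
It is not the teaagent code: `HookVeto`, `dispatch_with_pre_hook`,
`dispatch_with_interceptor`, `prefix_sandbox`, and `echo_handler` are
hypothetical names, and the interceptor is assumed to be observe-or-veto only
(its return value is discarded), which is the limitation the assessment names.

```python
from typing import Any, Callable

Args = dict[str, Any]


class HookVeto(Exception):
    """Hypothetical stand-in for the error a hook raises to veto a call."""


def dispatch_with_pre_hook(
    handler: Callable[[Args], Any],
    pre_hook: Callable[[Args], Args],
    arguments: Args,
) -> Any:
    # Dispatch-layer shape: whatever the hook returns is what the handler sees.
    mutated = pre_hook(dict(arguments))
    return handler(mutated)


def dispatch_with_interceptor(
    handler: Callable[[Args], Any],
    interceptor: Callable[[Args], Any],
    arguments: Args,
) -> Any:
    # Assumed spine-style shape: the interceptor receives a copy and may raise
    # to veto, but nothing it returns or changes travels back to the handler.
    interceptor(dict(arguments))
    return handler(arguments)


def prefix_sandbox(arguments: Args) -> Args:
    # Example mutating hook: rewrites the path before the tool runs.
    arguments['path'] = '/sandbox' + arguments['path']
    return arguments


def echo_handler(arguments: Args) -> str:
    # Example tool handler: reports the path it was actually given.
    return arguments['path']
```

Run through `dispatch_with_pre_hook`, the handler receives the rewritten path;
run through `dispatch_with_interceptor`, the same hook can still raise to veto
but its rewrite never reaches the handler.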

Suitable, additive slice DONE (M2 pattern, zero behavior change):
- 5 hook audit events typed in RunEventType + bidirectional mapper:
  tool_hook_pre_mutation / _pre_mutation_blocked / _vetoed / _post_mutation /
  _post_failed. The M2-T001 reader now surfaces hook veto/mutation activity from
  the audit JSONL for the M6 fold.
- test_m5_hook_audit_events_are_typed_and_reader_surfaced; round-trip
  completeness now covers 31 members (7 M0 + 19 M2 + 5 M5).
- Plan §5 graph + §7 M5 row revised; M5-T001 DONE-as-taxonomy, T002 CLOSED
  (unsuitable), T003 DEFERRED. New work-log with full assessment.

Audit bytes unchanged; hook execution + mutation semantics unchanged.
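
Sketch of the "typed + bidirectional mapper" shape for the 5 hook events, for
readers who want to see the round-trip property in isolation. This is a
standalone approximation, not the code in teaagent/runner/_events.py: the enum
name `HookEventType`, the dicts `FORWARD`/`INVERSE`, and the helpers `to_audit`,
`from_audit`, and `round_trip_complete` are hypothetical; only the five member
names and their string values are taken from this commit.

```python
from enum import Enum
from typing import Optional


class HookEventType(str, Enum):
    # Member names and audit strings as listed in this commit.
    TOOL_HOOK_PRE_MUTATION = 'tool_hook_pre_mutation'
    TOOL_HOOK_PRE_MUTATION_BLOCKED = 'tool_hook_pre_mutation_blocked'
    TOOL_HOOK_VETOED = 'tool_hook_vetoed'
    TOOL_HOOK_POST_MUTATION = 'tool_hook_post_mutation'
    TOOL_HOOK_POST_FAILED = 'tool_hook_post_failed'


# Forward map: typed event -> audit JSONL string (hypothetical dict name).
FORWARD: dict[HookEventType, str] = {m: m.value for m in HookEventType}
# Inverse map: audit JSONL string -> typed event, as a reader would use it.
INVERSE: dict[str, HookEventType] = {audit: m for m, audit in FORWARD.items()}


def to_audit(event_type: HookEventType) -> str:
    return FORWARD[event_type]


def from_audit(audit_type: str) -> Optional[HookEventType]:
    # None means the reader would not surface this audit event as typed.
    return INVERSE.get(audit_type)


def round_trip_complete() -> bool:
    # Both maps cover every member, and each member survives a round trip.
    same_size = len(FORWARD) == len(INVERSE) == len(HookEventType)
    return same_size and all(from_audit(to_audit(m)) is m for m in HookEventType)
```

The size check mirrors the completeness assertion in the test file: an enum
member added without a mapper entry (or the reverse) fails the length
comparison even if every existing entry still round-trips.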

Constraint: mapping/reader only; hook execution stays in dispatch layer; no public hook semantics changed; mutating/dispatch-coupled mechanisms stay where they are (same logic as approval B / budget B-analog).
Tested: tests/lifecycle/test_run_event_spine.py 22 passed (incl. new hook-taxonomy test + round-trip completeness 31==31); docs inventory --check passes.
Not-tested: full suite not run on 3.12 (hypothesis missing in 3.14 sandbox); pre-commit smoke covers governance core.
Confidence: high
Roadmap-Status: unchanged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
diff --git a/docs/generated/docs-inventory.md b/docs/generated/docs-inventory.md
@@ -6,7 +6,7 @@
 Generated by `python3 scripts/generate_docs_inventory.py`.
 Do not edit this file manually — regenerate instead.
 
-**Markdown files:** 588
+**Markdown files:** 589
 
 | Path | Tier | Bytes | SHA256 (12) |
 | --- | --- | ---: | --- |
@@ -414,7 +414,7 @@ Do not edit this file manually — regenerate instead.
 | `ops/security-hardening.md` | working | 11733 | `0a385c7dab82` |
 | `ops/troubleshooting.md` | working | 9127 | `4921b6d50f5c` |
 | `permission-and-approval-playbook.md` | working | 6560 | `813bc74bb156` |
-| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 52091 | `ce92504ad57b` |
+| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 54690 | `4a6ca4a1b9b6` |
 | `plans/agent-ecosystem-acceptance-roadmap-2026-05-31.md` | archive | 29099 | `7c4a4972cfeb` |
 | `plans/community-pain-points-response-plan-2026-06-05.md` | archive | 7276 | `571d010133ad` |
 | `plans/competitive-positioning-plan-2026-05-31.md` | archive | 8726 | `d16dfd2bdd99` |
@@ -591,6 +591,7 @@ Do not edit this file manually — regenerate instead.
 | `work-log/documentation-optimization-work-items-2026-06-04.md` | archive | 11750 | `9233b40b0bce` |
 | `work-log/m4-approval-sliceB-blocked-2026-06-13.md` | archive | 7347 | `3981ed82bc08` |
 | `work-log/m4-budget-stays-inline-2026-06-13.md` | archive | 5727 | `0e7a6ee74954` |
+| `work-log/m5-hooks-observability-only-2026-06-13.md` | archive | 5000 | `8a87eaee4d15` |
 | `work-log/operator-friction-log.md` | working | 2560 | `fe79899db10f` |
 | `work-log/p0-p1-governance-implementation-ledger-2026-06-11.md` | archive | 5212 | `0b72cd69de32` |
 | `work-log/parallel-phase-0-implementation-report-2026-06-04.md` | archive | 13181 | `098186167459` |
diff --git a/docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md b/docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md
@@ -98,7 +98,10 @@ M0 accepted ADR + dual-write spine                                    [done]
            INLINE (owner decisions B + B-analog, 2026-06-13 — both are
            runtime-stateful; plan gate is the sole interceptor gate;
            see m4-approval-sliceB-blocked + m4-budget-stays-inline reports)
-          -> M5 HookRegistry on spine
+          -> M5 hook OBSERVABILITY onto spine (taxonomy typed + reader-
+             surfaced); hook EXECUTION stays in the tool-dispatch layer
+             (it mutates in-flight args/results; the 6 session-lifecycle
+             hooks are unwired in prod) — see m5-hooks-observability-only
             -> M6 evidence + receipt FOLD          (was M2; corrected
                scope A) — now genuinely event-typed because M2 completed
                the taxonomy and M3/M4 preserved the decision-event contract
@@ -174,7 +177,7 @@ consumers by M6.
 | ADR-0032-M2 (REDEFINED, taxonomy-only §16) | Every audit event the evidence bundle reads is typed in `RunEventType` and mapped both directions, so the M2-T001 reader surfaces it **from the audit JSONL** (mapper is sufficient; emit-site migration is NOT in M2 — it is deferred to the component milestones, §16). Covers routes, git-sandbox, skills, tests, undo, provenance, approval/tool-call decision events, cancelled/pending lifecycle. Pure additive; zero behavior change. (Old M2 "evidence/receipt fold" moved to M6 — §14.) |
 | ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
 | ADR-0032-M4 (CLOSED — owner decisions B + B-analog, 2026-06-13) | **No gate moves to an interceptor; approval AND budget enforcement both STAY INLINE.** Both proved runtime-stateful on assessment, a poor fit for the pure-interceptor model. **Approval** (decision B): live JIT/session state, tool handler, auto-mode-swappable policy — every coupling gap was invisible to a unit parity test (`docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). **Budget** (decision B-analog): it is three mechanisms — only the global cost cap (`_assert_cost_budget`) is stateless; the phase budget (live `phase_tracker`) and the warning ladder (`_budget_warning_levels_emitted` + `BudgetMonitor._emitted_levels`/`_prompted` dedup sets + an interactive `on_prompt` side-effect handler — the same `assert_allowed` shadow-coexistence trap that blocked approval) are stateful, and even the cost cap is enforced at two evolving-cost points per iteration that do not map 1:1 to events (`docs/work-log/m4-budget-stays-inline-2026-06-13.md`). Both gates' observability is already provided by M2 (their audit events — `tool_call_*`, `approval_*`, `budget_warning`, `budget_prompt`, `phase_budget_warning` — are typed + reader-surfaced); the M6 fold reads them without owning enforcement. Approval/budget behavior unchanged. **Net: plan gate (M3) is the sole governance gate moved to an interceptor.** |
-| ADR-0032-M5 | HookRegistry subscribes through the spine; Claude-Code-compatible hook names remain aliases; public hook API docs and tests pass. |
+| ADR-0032-M5 (REVISED — observability-only, 2026-06-13) | **Hook OBSERVABILITY folds onto the spine; hook EXECUTION stays in the tool-dispatch layer.** Assessment found the planned "HookRegistry on spine" unsuitable for the same runtime-coupling reason as approval/budget: PreToolUse/PostToolUse run in `teaagent/tools.py::execute` and **mutate in-flight `arguments`/`result`** (the spine has no channel to ferry mutated payloads back to the dispatch site), and the 6 session-lifecycle hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) have **no production caller** — nothing to strangle; wiring them is feature work. Done: the 5 dispatch-layer hook audit events (`tool_hook_pre_mutation`, `tool_hook_pre_mutation_blocked`, `tool_hook_vetoed`, `tool_hook_post_mutation`, `tool_hook_post_failed`) are typed in `RunEventType` + mapped both directions, so the M2-T001 reader surfaces hook veto/mutation activity from the audit JSONL for the M6 fold. Mapping/reader only; audit bytes unchanged; hook execution + mutation semantics unchanged. See `docs/work-log/m5-hooks-observability-only-2026-06-13.md`. |
 | ADR-0032-M6 (was M2 fold; corrected scope A) | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. |
 | ADR-0032-M7 (was M6) | ContextBus and webhook sinks consume the spine; inline emission paths are deleted; validator shows no orphaned eventing modules. |
 
@@ -552,7 +555,15 @@ commit once Slice A is green.
 - Parallelizable: no.
 - Human Review Required: no.
 
-### ADR32-M5-T001: Hook Alias Matrix
+> **M5 REVISED — observability-only (2026-06-13).** T001 landed as a *typed
+> taxonomy* for the 5 audit events the dispatch-layer HookRegistry actually
+> emits (not an 8-name public alias matrix — 6 lifecycle hooks are unwired in
+> prod). T002 (execution bridge) is CLOSED as unsuitable: hooks mutate in-flight
+> args/results and the spine cannot ferry mutated payloads back. T003 (public
+> hook API docs) is deferred — no public hook semantics changed. See the M5
+> row in §7 and `docs/work-log/m5-hooks-observability-only-2026-06-13.md`.
+
+### ADR32-M5-T001: Hook Alias Matrix [DONE as observability taxonomy]
 
 - Goal: define the stable mapping from public hook names to `RunEventType`.
 - Scope: SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PreCompact,
@@ -579,7 +590,14 @@ commit once Slice A is green.
 - Parallelizable: yes after M4.
 - Human Review Required: yes for public API wording.
 
-### ADR32-M5-T002: HookRegistry Consumer/Interceptor Bridge
+### ADR32-M5-T002: HookRegistry Consumer/Interceptor Bridge [CLOSED — unsuitable]
+
+> **CLOSED, not implemented (2026-06-13).** PreToolUse/PostToolUse mutate
+> in-flight `arguments`/`result` in `teaagent/tools.py::execute`; the spine has
+> no channel to carry a mutated payload back to that dispatch site, so an
+> interceptor cannot replace them without losing mutation (same coupling that
+> kept approval inline). Session-lifecycle hooks have no production caller —
+> nothing to migrate. Hook execution stays in the dispatch layer.
 
 - Goal: run HookRegistry through EventSpine while preserving hook veto and
   consumer semantics.
@@ -607,7 +625,12 @@ commit once Slice A is green.
 - Parallelizable: no.
 - Human Review Required: no unless hook public semantics change.
 
-### ADR32-M5-T003: Public Hook API Documentation
+### ADR32-M5-T003: Public Hook API Documentation [DEFERRED]
+
+> **DEFERRED (2026-06-13).** No public hook semantics changed (execution stays
+> in the dispatch layer; only observability typing was added), so there is no
+> new stability contract to document here. Revisit only if the session-lifecycle
+> hooks are ever wired (separate product decision).
 
 - Goal: document event-spine-backed hook lifecycle and stability contract.
 - Scope: public hook names, payload shapes, ordering, veto/isolation semantics.
diff --git a/docs/work-log/m5-hooks-observability-only-2026-06-13.md b/docs/work-log/m5-hooks-observability-only-2026-06-13.md
@@ -0,0 +1,88 @@
+# M5 — HookRegistry: Observability Onto the Spine, Execution Stays in Dispatch
+
+> **Status:** observability slice DONE (taxonomy typed + reader-surfaced); the
+> enforcement-bridge half of the planned M5 is assessed UNSUITABLE for the same
+> runtime-coupling reason as approval (B) and budget (B-analog). Recommendation:
+> close M5 as **observability-only**. Owner decision pending on the bridge.
+
+## What the plan assumed vs. what the code shows
+
+The work-plan's M5 (§7 row, T002) assumed hooks run in the runner and that the
+migration would move **PreToolUse → spine interceptor**, **PostToolUse + session
+lifecycle → spine consumers**, touching `teaagent/runner/_core.py`. The
+observable code contradicts every part of that:
+
+| Hook | Production caller | Nature |
+|---|---|---|
+| PreToolUse | `teaagent/tools.py::ToolRegistry.execute` (~line 222) | **Mutates in-flight `arguments`** fed to `tool.handler`; can veto via `HookError`; destructive-tool mutation guard |
+| PostToolUse | `teaagent/tools.py::ToolRegistry.execute` (~line 291) | **Mutates the tool `result`** returned upstream |
+| SessionStart / SessionEnd / UserPromptSubmit / PreCompact / Stop / SubagentStop | **none** — only tests call `run_session_*` etc. | defined + unit-tested, **not wired into any production path** |
+
+So hooks live at the **tool-dispatch layer**, plumbed via
+`tool_registry.hook_registry` ([chat_agent.py:502](../../teaagent/chat_agent.py),
+[run_contract.py:105](../../teaagent/integration/run_contract.py)) — not the
+runner the plan named.
+
+## Finding: the enforcement bridge is unsuitable (third consecutive case)
+
+1. **PreToolUse/PostToolUse are mutating, not pure decisions.** `run_pre_hooks`
+   rewrites `arguments`; the rewritten args feed `tool.handler`. `run_post_hooks`
+   rewrites `result`. An EventSpine interceptor can veto (raise) but the spine
+   has **no channel to carry mutated args/results back** to the dispatch site
+   that consumes them. This is the same shape that kept approval inline (a gate
+   that mutates in-flight state, not a pure event decision) — moving it onto the
+   spine would either lose the mutation capability or require the spine to ferry
+   mutable payloads back into `tools.py`, the coupling we rejected for approval.
+
+2. **The session-lifecycle hooks have no inline path to strangle.** They are not
+   invoked in production. "Move them to spine consumers" would move *nothing*;
+   to make them do anything you would have to **newly wire** them — that is
+   feature work, not a parity-preserving migration, and out of scope for the
+   strangler arc.
+
+3. Even setting (1)-(2) aside, PreToolUse runs *inside* `tool.handler` dispatch,
+   after the runner's plan interceptor and inline approval already allowed the
+   call — a different layer than the spine's `TOOL_CALL_REQUESTED` point.
+
+## What IS suitable, and was done
+
+The genuinely spine-shaped value is **observability** — identical to M2 and to
+the approval/budget observability that already reaches the M6 fold. The
+HookRegistry bridge already emits 5 audit events; they were untyped. Done:
+
+- Added 5 members to `RunEventType` (`teaagent/runner/_events.py`):
+  `TOOL_HOOK_PRE_MUTATION`, `TOOL_HOOK_PRE_MUTATION_BLOCKED`, `TOOL_HOOK_VETOED`,
+  `TOOL_HOOK_POST_MUTATION`, `TOOL_HOOK_POST_FAILED`.
+- Added the 5 bidirectional mapper entries; the M2-T001 reader now surfaces hook
+  veto/mutation activity **from the audit JSONL** for the M6 fold.
+- Test `test_m5_hook_audit_events_are_typed_and_reader_surfaced` +
+  round-trip completeness (`len(mapper) == len(RunEventType)` now 31).
+
+Mapping/reader only. **Hook execution is unchanged** — it stays in the dispatch
+layer, with its mutation semantics and audit emission exactly as before. Audit
+bytes are unchanged.
+
+## Recommendation
+
+**Close M5 as observability-only.** The hook taxonomy is typed and folds into
+M6; hook *enforcement/mutation* stays in `tools.py` by the same evidenced logic
+as approval (B) and budget (B-analog): the spine/interceptor model fits
+stateless, non-mutating decisions; mutating/dispatch-coupled mechanisms stay
+where they are.
+
+Separately (and out of the migration's scope): the 6 unwired session-lifecycle
+hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) are
+**defined but dead** in production. Wiring them is a product decision, not a
+spine migration — flag for the backlog, do not bundle here.
+
+## Pattern across M3-M5
+
+- **M3 plan gate** → moved to interceptor cleanly (stateless decision).
+- **M4 approval** → stays inline (runtime-stateful: JIT/handler/swappable policy).
+- **M4 budget** → stays inline (dedup state + interactive handler + multi-point).
+- **M5 hooks** → execution stays in dispatch (mutating + dead lifecycle hooks);
+  observability folds onto the spine.
+
+The spine's realized value is the **read side** (typed evidence → M6 fold → M7
+consumers), not wholesale relocation of enforcement. Plan gate is the one gate
+that genuinely belonged on the spine.
diff --git a/teaagent/runner/_events.py b/teaagent/runner/_events.py
@@ -71,6 +71,18 @@ class RunEventType(str, Enum):
     RUN_CANCELLED = 'run_cancelled'
     RUN_PENDING_APPROVAL = 'run_pending_approval'
 
+    # M5 hook-observability taxonomy (ADR 0032 M5, owner decision 2026-06-13):
+    # the audit events emitted by the HookRegistry bridge in the tool-dispatch
+    # layer (teaagent/tools.py). Typed + mapped so the M6 fold can surface hook
+    # veto/mutation activity FROM the audit JSONL. Mapping/reader only — hook
+    # *execution* stays in the dispatch layer (it mutates in-flight args/results,
+    # so it is not a pure spine interceptor; see the M5 assessment work-log).
+    TOOL_HOOK_PRE_MUTATION = 'tool_hook_pre_mutation'
+    TOOL_HOOK_PRE_MUTATION_BLOCKED = 'tool_hook_pre_mutation_blocked'
+    TOOL_HOOK_VETOED = 'tool_hook_vetoed'
+    TOOL_HOOK_POST_MUTATION = 'tool_hook_post_mutation'
+    TOOL_HOOK_POST_FAILED = 'tool_hook_post_failed'
+
     # Planned (later phases): PLAN_RESOLVED, DECISION_RECEIVED,
     # CONTEXT_COMPACTED, BUDGET_CHECKPOINT, ITERATION_COMPLETED,
     # FINAL_VALIDATION, RECEIPT_EMITTED, SESSION_START, SESSION_END,
@@ -127,6 +139,13 @@ class RunEvent:
     RunEventType.APPROVAL_DENIED: 'approval_denied',
     RunEventType.RUN_CANCELLED: 'run_cancelled',
     RunEventType.RUN_PENDING_APPROVAL: 'run_pending_approval',
+    # M5 hook-observability taxonomy — mapping only; hook execution stays in
+    # the tool-dispatch layer (teaagent/tools.py), not spine-emitted.
+    RunEventType.TOOL_HOOK_PRE_MUTATION: 'tool_hook_pre_mutation',
+    RunEventType.TOOL_HOOK_PRE_MUTATION_BLOCKED: 'tool_hook_pre_mutation_blocked',
+    RunEventType.TOOL_HOOK_VETOED: 'tool_hook_vetoed',
+    RunEventType.TOOL_HOOK_POST_MUTATION: 'tool_hook_post_mutation',
+    RunEventType.TOOL_HOOK_POST_FAILED: 'tool_hook_post_failed',
 }
 
 
diff --git a/tests/lifecycle/test_run_event_spine.py b/tests/lifecycle/test_run_event_spine.py
@@ -487,9 +487,10 @@ def decide(context: dict[str, Any]):
 def test_all_run_event_types_round_trip_through_mappers() -> None:
     """Every RunEventType member round-trips through both mapper directions.
 
-    Covers all 26 members (7 M0 + 19 M2 evidence-event taxonomy) — forward
-    through run_event_to_audit_event_type then back through
-    audit_event_to_run_event_type, and inverse through the dict mappers.
+    Covers all 31 members (7 M0 + 19 M2 evidence-event taxonomy + 5 M5
+    hook-observability taxonomy) — forward through run_event_to_audit_event_type
+    then back through audit_event_to_run_event_type, and inverse through the
+    dict mappers.
     """
     for event_type in RunEventType:
         aud_type = run_event_to_audit_event_type(event_type)
@@ -503,6 +504,30 @@ def test_all_run_event_types_round_trip_through_mappers() -> None:
     assert len(_AUDIT_EVENT_TO_RUN_EVENT_TYPE) == len(RunEventType)
 
 
+def test_m5_hook_audit_events_are_typed_and_reader_surfaced() -> None:
+    """M5: the 5 hook-observability audit events emitted by the dispatch-layer
+    HookRegistry bridge are typed in RunEventType and mapped both directions, so
+    the M2-T001 reader surfaces them from the audit JSONL for the M6 fold.
+
+    Hook *execution* stays in teaagent/tools.py (it mutates in-flight
+    args/results); only its observability is folded onto the spine — the same
+    taxonomy-only shape as M2 (and the approval/budget observability under the
+    decision-B/B-analog findings).
+    """
+    expected = {
+        'tool_hook_pre_mutation': RunEventType.TOOL_HOOK_PRE_MUTATION,
+        'tool_hook_pre_mutation_blocked': RunEventType.TOOL_HOOK_PRE_MUTATION_BLOCKED,
+        'tool_hook_vetoed': RunEventType.TOOL_HOOK_VETOED,
+        'tool_hook_post_mutation': RunEventType.TOOL_HOOK_POST_MUTATION,
+        'tool_hook_post_failed': RunEventType.TOOL_HOOK_POST_FAILED,
+    }
+    for audit_type, run_type in expected.items():
+        # Reader maps the audit JSONL string -> typed event (non-None => surfaced).
+        assert audit_event_to_run_event_type(audit_type) == run_type
+        # Forward mapping is exact and lossless.
+        assert run_event_to_audit_event_type(run_type) == audit_type
+
+
 # ---------------------------------------------------------------------------
 # M3-T001: PlanGateInterceptor unit tests
 # ---------------------------------------------------------------------------