
Commit 8d46213

johnteee and claude committed
feat: close all review residuals (F1 hooks folded, F2 auto-guard, F3 pub/sub detection)

"Fix all remaining" — closes the three documented residuals from the post-migration review end-to-end.

F1 (hooks now folded into evidence): added HookActivityRecord + RunEvidenceBundle.hook_activity + extract_hook_activity(), wired into _assemble_evidence_bundle. Hook veto/mutation activity now appears in the bundle/receipt and folds through the typed stream (the M5 typing was the prerequisite). Makes the original M5 claim true. Bundle gains a 'hook_activity' key; verified safe across all consumers (no strict key-set assertion; receipt formatting unaffected).

F2 (cutover guard can no longer go stale): replaced the hand-maintained extractor-type list with an AST auto-discovery in validate_event_spine_wiring.py (check_evidence_extractor_types_typed) — it finds the event_type literals run_evidence/proof_of_use compare against (==, inline `in {...}`, and `in NAME` for module-level frozenset/set constants incl. annotated ones) and asserts each is in RunEventType. Caught a real blind spot mid-implementation: annotated constants (_HOOK_AUDIT_TYPES) were initially missed.

F3 (pub/sub bus shape detected): the orphan-bus scan now also flags the subscribe+emit pair, so a RunEventStream-shaped bus is caught; RunEventStream added to the allowlist. Validator docstring + ADR narrowed the documented limitation.

Validator now runs three checks (A taxonomy closure, B no-orphan-bus, C evidence-type coverage); all in the check-event-spine-wiring pre-commit hook. Docs (plan §7 M5/M7 rows, FOLD/T003 tickets, ADR realized-architecture) updated to RESOLVED.

Constraint: F1 adds an additive bundle field (no behavior change to existing fields); F2/F3 are read-only static guards; remaining limitation (novel-naming buses, exotic dynamic event-type lookups) documented honestly.
Tested: tests/test_event_spine_wiring.py + test_run_evidence.py + lifecycle spine 53 passed; broad consumer regression 168 passed; validator exits 0, flags seeded subscribe+emit bus + seeded untyped evidence type.
Not-tested: full suite not run on 3.12 (hypothesis missing in 3.14 sandbox).
Confidence: high
Roadmap-Status: unchanged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent 57cc614 commit 8d46213
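
The F1 change described in the commit message can be pictured with a minimal sketch. The code files are not shown on this page, so everything below is an assumption made for illustration: the record fields (`tool_name`, `created_at`), the dict-shaped events, and the `payload` key are hypothetical. Only the names `HookActivityRecord`, `extract_hook_activity`, and the five `tool_hook_*` event types come from the commit.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping, Optional

# The five dispatch-layer hook audit event types named in the work plan.
HOOK_AUDIT_TYPES = frozenset({
    "tool_hook_pre_mutation",
    "tool_hook_pre_mutation_blocked",
    "tool_hook_vetoed",
    "tool_hook_post_mutation",
    "tool_hook_post_failed",
})


@dataclass(frozen=True)
class HookActivityRecord:
    """One hook veto/mutation observation (field set is hypothetical)."""
    event_type: str
    tool_name: str
    created_at: Optional[str]


def extract_hook_activity(events: Iterable[Mapping]) -> List[HookActivityRecord]:
    """Fold hook audit events out of an event stream, preserving order."""
    records = []
    for event in events:
        if event.get("event_type") not in HOOK_AUDIT_TYPES:
            continue  # not a hook event; other extractors handle it
        payload = event.get("payload") or {}
        records.append(HookActivityRecord(
            event_type=event["event_type"],
            tool_name=payload.get("tool_name", ""),
            created_at=event.get("created_at"),
        ))
    return records


# Hypothetical event stream: two hook events around one unrelated event.
SAMPLE_EVENTS = [
    {"event_type": "tool_hook_vetoed", "payload": {"tool_name": "shell"},
     "created_at": "2026-06-14T10:00:00Z"},
    {"event_type": "budget_warning", "payload": {}},
    {"event_type": "tool_hook_post_mutation", "payload": {"tool_name": "edit"}},
]

records = extract_hook_activity(SAMPLE_EVENTS)
```

In a bundle built this way, the list would sit under an additive `hook_activity` key, which matches the commit's constraint that existing fields keep their behavior.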

7 files changed

Lines changed: 406 additions & 77 deletions

docs/adr/0032-run-event-taxonomy.md

Lines changed: 17 additions & 11 deletions
@@ -149,10 +149,10 @@ recorded here as the durable contract (see the per-phase work-logs under
 - **M5**: **hook OBSERVABILITY** is typed onto the spine; **hook EXECUTION stays
 in the tool-dispatch layer** (`teaagent/tools.py`) because PreToolUse/PostToolUse
 mutate in-flight args/results, and the session-lifecycle hooks are unwired.
-*Scope note (review F1): the 5 hook audit events are typed and reader-visible
-but NOT folded into evidence — no extractor reads them and `RunEvidenceBundle`
-has no hooks field. Surfacing hook activity in receipts is backlog (needs a
-bundle field + extractor), not delivered by M5.*
+*Scope note (review F1, RESOLVED 2026-06-14): the 5 hook audit events are typed
+and now folded into evidence via `RunEvidenceBundle.hook_activity` +
+`extract_hook_activity`; hook veto/mutation appears in the bundle/receipt and
+folds through the typed stream.*
 - **M6**: **evidence/receipts fold over the typed stream** (`build_evidence_from_events`,
 now the production path inside `build_run_evidence_bundle`). Fixed a real
 lossiness gap (typed `RunEvent` now carries `created_at`).
@@ -179,13 +179,19 @@ EventSpine.emit ──(register_audit_consumer, M1)──▶ AuditLogger.record
 they are not lifecycle buses. The guard's allowlist names every sanctioned
 event-delivery surface. The taxonomy-closure check proves no `RunEventType` is
 orphaned from the audit record.
-- *Guard scope (review F3): the orphan-bus check is a **heuristic tripwire**, not
-a proof. It keys on specific high-signal method names (`register_consumer`,
-`register_interceptor`, `add_sink`, `on_event`, `publish_delta`,
-`subscribe_deltas`) and deliberately excludes generic `publish`/`emit` to avoid
-noise — so a bus shaped like `RunEventStream` (`subscribe`+`emit`) is not
-detected. It catches the common shapes and forces a conscious allowlist
-decision for them; it does not guarantee detection of every conceivable bus.*
+- A third check (review F2) AST-discovers the audit `event_type` literals the
+evidence extractors read (`run_evidence.py`, `proof_of_use.py`) and asserts
+each is in `RunEventType` — so the M6 FOLD-T002 cutover (which drops unmapped
+types) can never silently lose evidence as extractors evolve.
+- *Guard scope (review F3, narrowed 2026-06-14): the orphan-bus check keys on
+high-signal method names (`register_consumer`, `register_interceptor`,
+`add_sink`, `on_event`, `publish_delta`, `subscribe_deltas`) **plus the
+`subscribe`+`emit` pub/sub pair** (so the `RunEventStream` shape is now caught).
+It remains a heuristic — a bus using entirely novel naming could still evade
+it — but it catches every shape that occurs in-tree and forces a conscious
+allowlist decision. The F2 discovery similarly resolves `==`/`in` against
+string literals and module-level `frozenset`/`set` constants; exotic dynamic
+lookups are out of scope.*

 **Lesson:** the spine's realized value is the **typed read side** (evidence →
 receipts) and a single typed lifecycle path — not wholesale relocation of
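
The F2 discovery described in the hunk above (resolving `==` and `in` comparisons against string literals and module-level `frozenset`/`set` constants, including annotated ones) could be approximated as follows. This is a sketch under stated assumptions, not the code in `validate_event_spine_wiring.py`: the helper names, the embedded `SOURCE` sample, and the `TYPED_EVENT_TYPES` stand-in for `RunEventType` are all hypothetical.

```python
import ast

# Hypothetical stand-in for the set of values defined by RunEventType.
TYPED_EVENT_TYPES = {"test_run", "tool_hook_vetoed", "tool_hook_post_failed"}

# Hypothetical extractor source showing the three comparison shapes.
SOURCE = '''
_HOOK_AUDIT_TYPES: frozenset[str] = frozenset({"tool_hook_vetoed", "tool_hook_post_failed"})

def extract(events):
    for event in events:
        if event["event_type"] == "test_run":
            yield event
        elif event["event_type"] in _HOOK_AUDIT_TYPES:
            yield event
        elif event.get("event_type") in {"inline_untyped"}:
            yield event
'''


def _string_elements(node):
    """String literals inside a set/list/tuple display or a frozenset()/set() call."""
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in {"frozenset", "set"} and node.args):
        node = node.args[0]
    if isinstance(node, (ast.Set, ast.List, ast.Tuple)):
        return {e.value for e in node.elts
                if isinstance(e, ast.Constant) and isinstance(e.value, str)}
    return set()


def _module_constants(tree):
    """Map module-level names to their string elements (plain and annotated)."""
    consts = {}
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign):
            targets, value = stmt.targets, stmt.value
        elif isinstance(stmt, ast.AnnAssign) and stmt.value is not None:
            # Annotated constants are the shape the commit says was first missed.
            targets, value = [stmt.target], stmt.value
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name):
                consts[target.id] = _string_elements(value)
    return consts


def _mentions_event_type(node):
    """True if the expression refers to event_type as a name, attribute, or key."""
    for sub in ast.walk(node):
        if isinstance(sub, ast.Name) and sub.id == "event_type":
            return True
        if isinstance(sub, ast.Attribute) and sub.attr == "event_type":
            return True
        if isinstance(sub, ast.Constant) and sub.value == "event_type":
            return True
    return False


def discover_event_types(source):
    """Collect the event_type literals the source compares against."""
    tree = ast.parse(source)
    consts = _module_constants(tree)
    found = set()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Compare) or len(node.ops) != 1:
            continue
        if not _mentions_event_type(node.left):
            continue
        op, right = node.ops[0], node.comparators[0]
        if isinstance(op, ast.Eq):
            if isinstance(right, ast.Constant) and isinstance(right.value, str):
                found.add(right.value)
        elif isinstance(op, ast.In):
            if isinstance(right, ast.Name):
                found |= consts.get(right.id, set())
            else:
                found |= _string_elements(right)
    return found


discovered = discover_event_types(SOURCE)
untyped = sorted(discovered - TYPED_EVENT_TYPES)
```

A guard built on this would fail when `untyped` is non-empty. As the ADR notes, dynamically computed event-type lookups stay out of reach for this kind of static scan.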

docs/generated/docs-inventory.md

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ Do not edit this file manually — regenerate instead.
 | `adr/0029-consensus-validation-deferred.md` | working | 1587 | `8a2da40abc07` |
 | `adr/0030-root-module-freeze.md` | working | 1297 | `bee25422e85f` |
 | `adr/0031-shadow-mode-exit-criteria.md` | working | 3598 | `46a9a0d5eaac` |
-| `adr/0032-run-event-taxonomy.md` | working | 14751 | `259c84511705` |
+| `adr/0032-run-event-taxonomy.md` | working | 15143 | `dbeac8e89ac2` |
 | `adr/README.md` | working | 7109 | `713a782f5411` |
 | `agent-contribution-contract.md` | constitution | 5204 | `9c2dad1195d2` |
 | `agent-mode-operator-guide.md` | working | 2778 | `25b258ab7bfe` |
@@ -414,7 +414,7 @@ Do not edit this file manually — regenerate instead.
 | `ops/security-hardening.md` | working | 11733 | `0a385c7dab82` |
 | `ops/troubleshooting.md` | working | 9127 | `4921b6d50f5c` |
 | `permission-and-approval-playbook.md` | working | 6560 | `813bc74bb156` |
-| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 61230 | `54a436ad04b7` |
+| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 61722 | `c6144278d07d` |
 | `plans/agent-ecosystem-acceptance-roadmap-2026-05-31.md` | archive | 29099 | `7c4a4972cfeb` |
 | `plans/community-pain-points-response-plan-2026-06-05.md` | archive | 7276 | `571d010133ad` |
 | `plans/competitive-positioning-plan-2026-05-31.md` | archive | 8726 | `d16dfd2bdd99` |

docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md

Lines changed: 13 additions & 11 deletions
@@ -177,9 +177,9 @@ consumers by M6.
 | ADR-0032-M2 (REDEFINED, taxonomy-only §16) | Every audit event the evidence bundle reads is typed in `RunEventType` and mapped both directions, so the M2-T001 reader surfaces it **from the audit JSONL** (mapper is sufficient; emit-site migration is NOT in M2 — it is deferred to the component milestones, §16). Covers routes, git-sandbox, skills, tests, undo, provenance, approval/tool-call decision events, cancelled/pending lifecycle. Pure additive; zero behavior change. (Old M2 "evidence/receipt fold" moved to M6 — §14.) |
 | ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
 | ADR-0032-M4 (CLOSED — owner decisions B + B-analog, 2026-06-13) | **No gate moves to an interceptor; approval AND budget enforcement both STAY INLINE.** Both proved runtime-stateful on assessment, a poor fit for the pure-interceptor model. **Approval** (decision B): live JIT/session state, tool handler, auto-mode-swappable policy — every coupling gap was invisible to a unit parity test (`docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). **Budget** (decision B-analog): it is three mechanisms — only the global cost cap (`_assert_cost_budget`) is stateless; the phase budget (live `phase_tracker`) and the warning ladder (`_budget_warning_levels_emitted` + `BudgetMonitor._emitted_levels`/`_prompted` dedup sets + an interactive `on_prompt` side-effect handler — the same `assert_allowed` shadow-coexistence trap that blocked approval) are stateful, and even the cost cap is enforced at two evolving-cost points per iteration that do not map 1:1 to events (`docs/work-log/m4-budget-stays-inline-2026-06-13.md`). Both gates' observability is already provided by M2 (their audit events — `tool_call_*`, `approval_*`, `budget_warning`, `budget_prompt`, `phase_budget_warning` — are typed + reader-surfaced); the M6 fold reads them without owning enforcement. Approval/budget behavior unchanged. **Net: plan gate (M3) is the sole governance gate moved to an interceptor.** |
-| ADR-0032-M5 (REVISED — observability-only, 2026-06-13) | **Hook OBSERVABILITY folds onto the spine; hook EXECUTION stays in the tool-dispatch layer.** Assessment found the planned "HookRegistry on spine" unsuitable for the same runtime-coupling reason as approval/budget: PreToolUse/PostToolUse run in `teaagent/tools.py::execute` and **mutate in-flight `arguments`/`result`** (the spine has no channel to ferry mutated payloads back to the dispatch site), and the 6 session-lifecycle hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) have **no production caller** — nothing to strangle; wiring them is feature work. Done: the 5 dispatch-layer hook audit events (`tool_hook_pre_mutation`, `tool_hook_pre_mutation_blocked`, `tool_hook_vetoed`, `tool_hook_post_mutation`, `tool_hook_post_failed`) are typed in `RunEventType` + mapped both directions, so the M2-T001 reader can surface them as typed RunEvents. **Correction (post-migration review F1):** this is typing + reader-visibility ONLY — it is NOT yet folded into evidence/receipts. No evidence extractor reads `tool_hook_*` and `RunEvidenceBundle` has no hooks field, so hook veto/mutation activity does not currently appear in any bundle/receipt. Surfacing it would need a new `RunEvidenceBundle` hooks field + extractor (backlog). Mapping/reader only; audit bytes unchanged; hook execution + mutation semantics unchanged. See `docs/work-log/m5-hooks-observability-only-2026-06-13.md`. |
+| ADR-0032-M5 (REVISED — observability-only, 2026-06-13) | **Hook OBSERVABILITY folds onto the spine; hook EXECUTION stays in the tool-dispatch layer.** Assessment found the planned "HookRegistry on spine" unsuitable for the same runtime-coupling reason as approval/budget: PreToolUse/PostToolUse run in `teaagent/tools.py::execute` and **mutate in-flight `arguments`/`result`** (the spine has no channel to ferry mutated payloads back to the dispatch site), and the 6 session-lifecycle hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) have **no production caller** — nothing to strangle; wiring them is feature work. Done: the 5 dispatch-layer hook audit events (`tool_hook_pre_mutation`, `tool_hook_pre_mutation_blocked`, `tool_hook_vetoed`, `tool_hook_post_mutation`, `tool_hook_post_failed`) are typed in `RunEventType` + mapped both directions, so the M2-T001 reader can surface them as typed RunEvents. **Update (review F1 RESOLVED, 2026-06-14):** initially this was typing + reader-visibility only; the "fold" claim was hollow because no extractor read `tool_hook_*`. Now fixed end-to-end: added `HookActivityRecord` + `RunEvidenceBundle.hook_activity` + `extract_hook_activity()`, wired into `_assemble_evidence_bundle`, so hook veto/mutation activity now appears in the bundle (and folds through the typed stream — the M5 typing was the prerequisite). Audit bytes unchanged; hook execution + mutation semantics unchanged. See `docs/work-log/m5-hooks-observability-only-2026-06-13.md`. |
 | ADR-0032-M6 (was M2 fold; corrected scope A) — **COMPLETE (FOLD-T001 + T002)** | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1). **FOLD-T001**: `build_evidence_from_events()` parallel builder sharing `_assemble_evidence_bundle` with the legacy path (cannot drift; only the event *source* differs), parity-asserted (`tests/test_run_evidence.py::test_m6_fold_*`). Fixed a structural gap: the typed `RunEvent` was lossy — dropped top-level `created_at` (threaded into command/test/approval timestamps); added optional `RunEvent.created_at`, reader populates it. **FOLD-T002 (cutover DONE)**: `build_run_evidence_bundle` now routes production evidence THROUGH the typed reader + fold — the typed stream is the production path; the raw-dict assembly survives only as the shared helper (so the two cannot diverge). Suite-wide green (evidence/receipt/summary/5-min-proof/first-hour/adversarial + all bundle consumers, ~218 tests). **Finding: no synthetic receipt-only fixtures existed to retire** — the receipt/evidence path was already event-backed (`test_run_receipt.py` writes real RunStore events; `test_real_run_receipt_completeness_from_plan` validates a real run); direct `RunEvidenceBundle(...)` constructions are legitimate downstream-consumer/checker unit tests, not masking fixtures. The plan anticipated a gap that does not exist. Parity test re-anchored against `_assemble_evidence_bundle` (the raw-dict path) so it stays meaningful post-cutover. |
-| ADR-0032-M7 (was M6) — **COMPLETE as guard + document, 2026-06-13** | Original goal ("ContextBus + webhook consume the spine; delete inline eventing") **NOT done — it is a regression or vacuous.** Webhook is an `audit.add_sink` already fed transitively by the M1 spine→audit consumer; a *direct* spine consumer would see only the spine-emitted subset (coverage regression). ContextBus + integration `RunEventStream` are **unwired in production** (no callers) — nothing to migrate. The inline `audit.record` calls are the **complete event record** (read by evidence/receipts/webhook), not redundant eventing to delete. **Done instead (owner: guard + document):** `scripts/validate_event_spine_wiring.py` + `tests/test_event_spine_wiring.py` enforce the realized invariant — one typed lifecycle path (EventSpine→audit consumer), an allowlist of sanctioned event-delivery surfaces so a NEW competing lifecycle bus fails the gate, and taxonomy closure (no RunEventType orphaned from the audit record). Added as a pre-commit hook. ADR 0032 "Realized architecture (M1–M7)" section documents the outcome. **MIGRATION COMPLETE.** |
+| ADR-0032-M7 (was M6) — **COMPLETE as guard + document, 2026-06-13** | Original goal ("ContextBus + webhook consume the spine; delete inline eventing") **NOT done — it is a regression or vacuous.** Webhook is an `audit.add_sink` already fed transitively by the M1 spine→audit consumer; a *direct* spine consumer would see only the spine-emitted subset (coverage regression). ContextBus + integration `RunEventStream` are **unwired in production** (no callers) — nothing to migrate. The inline `audit.record` calls are the **complete event record** (read by evidence/receipts/webhook), not redundant eventing to delete. **Done instead (owner: guard + document):** `scripts/validate_event_spine_wiring.py` + `tests/test_event_spine_wiring.py` enforce the realized invariant with three checks — (A) taxonomy closure (no RunEventType orphaned from the audit record); (B) no orphaned event bus (allowlist of sanctioned surfaces; high-signal methods **plus** the subscribe+emit pub/sub pair so the RunEventStream shape is caught — review F3); (C) evidence-extractor type coverage (AST-discovers the event_type literals run_evidence/proof_of_use read and asserts each is typed, so the M6 cutover can't silently drop evidence — review F2). Added as a pre-commit hook. ADR 0032 "Realized architecture (M1–M7)" section documents the outcome. **MIGRATION COMPLETE.** |

 ## 8. Task Plan

@@ -725,15 +725,17 @@ commit once Slice A is green.

 ### ADR32-M6-T003: Orphaned Eventing Validator [DONE]

-> **DONE (2026-06-13).** `scripts/validate_event_spine_wiring.py` +
-> `tests/test_event_spine_wiring.py`. Two checks: (A) taxonomy closure — every
-> `RunEventType` maps losslessly to the audit record (no orphaned typed event);
-> (B) no orphaned event bus — an AST scan for high-signal lifecycle-event methods
-> (`register_consumer`/`register_interceptor`/`add_sink`/`on_event`/
-> `publish_delta`/`subscribe_deltas`; generic `publish`/`emit` excluded to avoid
-> noise) must match a curated allowlist of sanctioned surfaces, so a new
-> competing bus fails. Seeded-bad-fixture tests included. Added as the
-> `check-event-spine-wiring` pre-commit hook.
+> **DONE (2026-06-13; checks extended 2026-06-14 per review F2/F3).**
+> `scripts/validate_event_spine_wiring.py` + `tests/test_event_spine_wiring.py`.
+> Three checks: (A) taxonomy closure — every `RunEventType` maps losslessly to
+> the audit record; (B) no orphaned event bus — AST scan for high-signal
+> lifecycle-event methods (`register_consumer`/`register_interceptor`/`add_sink`/
+> `on_event`/`publish_delta`/`subscribe_deltas`) **plus the `subscribe`+`emit`
+> pub/sub pair** (F3) must match a curated allowlist; (C) evidence-extractor type
+> coverage — AST-discovers the `event_type` literals `run_evidence`/`proof_of_use`
+> read (incl. annotated module-level frozensets) and asserts each is typed, so
+> the M6 cutover can't silently drop evidence (F2). Seeded-bad-fixture tests
+> included. Added as the `check-event-spine-wiring` pre-commit hook.

 - Goal: prove there are no competing lifecycle event systems left after M6.
 - Scope: static validation over audit strings, HookRegistry emissions,
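
Check (B) in the ticket above, extended to catch the `subscribe`+`emit` pair, might look roughly like the sketch below. It is illustrative only. Scanning at class granularity, the `find_orphan_buses` name, the sample classes, and the `EventSpine` allowlist entry are assumptions. The six high-signal method names and the `RunEventStream` allowlist entry are taken from the diff.

```python
import ast

# High-signal method names listed in the ADR and the ticket.
HIGH_SIGNAL_METHODS = {
    "register_consumer", "register_interceptor", "add_sink",
    "on_event", "publish_delta", "subscribe_deltas",
}
# Neither name alone is flagged; only a class defining both is bus-shaped.
PUBSUB_PAIR = {"subscribe", "emit"}
# Sanctioned surfaces; membership beyond RunEventStream is a guess.
ALLOWLIST = {"EventSpine", "RunEventStream"}

# Hypothetical module: one sanctioned bus, one new bus, one plain logger.
SAMPLE = '''
class RunEventStream:
    def subscribe(self, callback): ...
    def emit(self, event): ...

class ShadowBus:
    def subscribe(self, callback): ...
    def emit(self, event): ...

class Logger:
    def emit(self, record): ...
'''


def find_orphan_buses(source, allowlist=ALLOWLIST):
    """Return names of bus-shaped classes that are not on the allowlist."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        methods = {
            item.name for item in node.body
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
        }
        bus_shaped = bool(methods & HIGH_SIGNAL_METHODS) or PUBSUB_PAIR <= methods
        if bus_shaped and node.name not in allowlist:
            flagged.append(node.name)
    return sorted(flagged)


orphans = find_orphan_buses(SAMPLE)
```

Requiring both halves of the pair keeps a lone `emit` (as on a logging handler) from tripping the check, which is the noise concern the earlier version of the ADR gave for excluding generic `publish`/`emit`.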
