Skip to content

Commit 57cc614

Browse files
johnteeeclaude
andcommitted
fix: address post-migration review findings (F1 hook-fold claim, F2 cutover guard, F3 guard scope)
Strict "review again" of M5/M6/M7 surfaced three claim-vs-evidence gaps; code was correct today but claims/guards were overstated or missing. F2 [Med-High] — the FOLD-T002 cutover (build_run_evidence_bundle routes through read_run_events_from_audit, which drops unmapped types) introduced an unguarded losslessness invariant. Added test_m6_every_evidence_extractor_type_is_typed: enumerates the audit types the evidence/proof-of-use extractors read and asserts each is in RunEventType, so a future untyped evidence type fails loudly instead of being silently dropped from production evidence. F1 [Med] — M5's "hook observability folds into M6" was hollow: no extractor reads tool_hook_*, and RunEvidenceBundle has no hooks field, so typed hook events never reach any bundle/receipt. Corrected the claim in the plan §7 M5 row, ADR realized- architecture M5 bullet, and memory: typing + reader-visibility ONLY; folding hook activity into receipts is backlog (needs a bundle field + extractor). F3 [Low-Med] — the M7 orphan-bus guard is a heuristic tripwire, not a proof: it excludes generic publish/emit, so a RunEventStream-shaped bus (subscribe+emit) is not detected. Documented the limitation in the validator docstring and the ADR. No production code behavior changed; this is a guard test + honest doc corrections. Constraint: docs + one additive guard test; no runtime behavior change; F2 guard is an enumerated-list coupling (loud failure on taxonomy drift), F3 limitation documented not closed. Tested: tests/test_run_evidence.py + tests/test_event_spine_wiring.py 23 passed (incl. new F2 guard); validate_event_spine_wiring exits 0. Not-tested: full suite not run on 3.12 (hypothesis missing in 3.14 sandbox). Confidence: high Roadmap-Status: unchanged Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent 14e03cf commit 57cc614

5 files changed

Lines changed: 77 additions & 8 deletions

File tree

docs/adr/0032-run-event-taxonomy.md

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -149,6 +149,10 @@ recorded here as the durable contract (see the per-phase work-logs under
149149
- **M5****hook OBSERVABILITY** is typed onto the spine; **hook EXECUTION stays
150150
in the tool-dispatch layer** (`teaagent/tools.py`) because PreToolUse/PostToolUse
151151
mutate in-flight args/results, and the session-lifecycle hooks are unwired.
152+
*Scope note (review F1): the 5 hook audit events are typed and reader-visible
153+
but NOT folded into evidence — no extractor reads them and `RunEvidenceBundle`
154+
has no hooks field. Surfacing hook activity in receipts is backlog (needs a
155+
bundle field + extractor), not delivered by M5.*
152156
- **M6****evidence/receipts fold over the typed stream** (`build_evidence_from_events`,
153157
now the production path inside `build_run_evidence_bundle`). Fixed a real
154158
lossiness gap (typed `RunEvent` now carries `created_at`).
@@ -173,9 +177,15 @@ EventSpine.emit ──(register_audit_consumer, M1)──▶ AuditLogger.record
173177
consumer would see only the spine-emitted subset (a coverage regression).
174178
- `ContextBus` and the integration `RunEventStream` are **unwired in production**;
175179
they are not lifecycle buses. The guard's allowlist names every sanctioned
176-
event-delivery surface, so a **new competing lifecycle-event bus fails the gate**
177-
and forces a conscious decision. The taxonomy-closure check proves no
178-
`RunEventType` is orphaned from the audit record.
180+
event-delivery surface. The taxonomy-closure check proves no `RunEventType` is
181+
orphaned from the audit record.
182+
- *Guard scope (review F3): the orphan-bus check is a **heuristic tripwire**, not
183+
a proof. It keys on specific high-signal method names (`register_consumer`,
184+
`register_interceptor`, `add_sink`, `on_event`, `publish_delta`,
185+
`subscribe_deltas`) and deliberately excludes generic `publish`/`emit` to avoid
186+
noise — so a bus shaped like `RunEventStream` (`subscribe`+`emit`) is not
187+
detected. It catches the common shapes and forces a conscious allowlist
188+
decision for them; it does not guarantee detection of every conceivable bus.*
179189

180190
**Lesson:** the spine's realized value is the **typed read side** (evidence →
181191
receipts) and a single typed lifecycle path — not wholesale relocation of

docs/generated/docs-inventory.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,7 @@ Do not edit this file manually — regenerate instead.
4242
| `adr/0029-consensus-validation-deferred.md` | working | 1587 | `8a2da40abc07` |
4343
| `adr/0030-root-module-freeze.md` | working | 1297 | `bee25422e85f` |
4444
| `adr/0031-shadow-mode-exit-criteria.md` | working | 3598 | `46a9a0d5eaac` |
45-
| `adr/0032-run-event-taxonomy.md` | working | 14011 | `50c291b14941` |
45+
| `adr/0032-run-event-taxonomy.md` | working | 14751 | `259c84511705` |
4646
| `adr/README.md` | working | 7109 | `713a782f5411` |
4747
| `agent-contribution-contract.md` | constitution | 5204 | `9c2dad1195d2` |
4848
| `agent-mode-operator-guide.md` | working | 2778 | `25b258ab7bfe` |
@@ -414,7 +414,7 @@ Do not edit this file manually — regenerate instead.
414414
| `ops/security-hardening.md` | working | 11733 | `0a385c7dab82` |
415415
| `ops/troubleshooting.md` | working | 9127 | `4921b6d50f5c` |
416416
| `permission-and-approval-playbook.md` | working | 6560 | `813bc74bb156` |
417-
| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 60885 | `70eeb1c1049b` |
417+
| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 61230 | `54a436ad04b7` |
418418
| `plans/agent-ecosystem-acceptance-roadmap-2026-05-31.md` | archive | 29099 | `7c4a4972cfeb` |
419419
| `plans/community-pain-points-response-plan-2026-06-05.md` | archive | 7276 | `571d010133ad` |
420420
| `plans/competitive-positioning-plan-2026-05-31.md` | archive | 8726 | `d16dfd2bdd99` |

docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -177,7 +177,7 @@ consumers by M6.
177177
| ADR-0032-M2 (REDEFINED, taxonomy-only §16) | Every audit event the evidence bundle reads is typed in `RunEventType` and mapped both directions, so the M2-T001 reader surfaces it **from the audit JSONL** (mapper is sufficient; emit-site migration is NOT in M2 — it is deferred to the component milestones, §16). Covers routes, git-sandbox, skills, tests, undo, provenance, approval/tool-call decision events, cancelled/pending lifecycle. Pure additive; zero behavior change. (Old M2 "evidence/receipt fold" moved to M6 — §14.) |
178178
| ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
179179
| ADR-0032-M4 (CLOSED — owner decisions B + B-analog, 2026-06-13) | **No gate moves to an interceptor; approval AND budget enforcement both STAY INLINE.** Both proved runtime-stateful on assessment, a poor fit for the pure-interceptor model. **Approval** (decision B): live JIT/session state, tool handler, auto-mode-swappable policy — every coupling gap was invisible to a unit parity test (`docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). **Budget** (decision B-analog): it is three mechanisms — only the global cost cap (`_assert_cost_budget`) is stateless; the phase budget (live `phase_tracker`) and the warning ladder (`_budget_warning_levels_emitted` + `BudgetMonitor._emitted_levels`/`_prompted` dedup sets + an interactive `on_prompt` side-effect handler — the same `assert_allowed` shadow-coexistence trap that blocked approval) are stateful, and even the cost cap is enforced at two evolving-cost points per iteration that do not map 1:1 to events (`docs/work-log/m4-budget-stays-inline-2026-06-13.md`). Both gates' observability is already provided by M2 (their audit events — `tool_call_*`, `approval_*`, `budget_warning`, `budget_prompt`, `phase_budget_warning` — are typed + reader-surfaced); the M6 fold reads them without owning enforcement. Approval/budget behavior unchanged. **Net: plan gate (M3) is the sole governance gate moved to an interceptor.** |
180-
| ADR-0032-M5 (REVISED — observability-only, 2026-06-13) | **Hook OBSERVABILITY folds onto the spine; hook EXECUTION stays in the tool-dispatch layer.** Assessment found the planned "HookRegistry on spine" unsuitable for the same runtime-coupling reason as approval/budget: PreToolUse/PostToolUse run in `teaagent/tools.py::execute` and **mutate in-flight `arguments`/`result`** (the spine has no channel to ferry mutated payloads back to the dispatch site), and the 6 session-lifecycle hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) have **no production caller** — nothing to strangle; wiring them is feature work. Done: the 5 dispatch-layer hook audit events (`tool_hook_pre_mutation`, `tool_hook_pre_mutation_blocked`, `tool_hook_vetoed`, `tool_hook_post_mutation`, `tool_hook_post_failed`) are typed in `RunEventType` + mapped both directions, so the M2-T001 reader surfaces hook veto/mutation activity from the audit JSONL for the M6 fold. Mapping/reader only; audit bytes unchanged; hook execution + mutation semantics unchanged. See `docs/work-log/m5-hooks-observability-only-2026-06-13.md`. |
180+
| ADR-0032-M5 (REVISED — observability-only, 2026-06-13) | **Hook OBSERVABILITY folds onto the spine; hook EXECUTION stays in the tool-dispatch layer.** Assessment found the planned "HookRegistry on spine" unsuitable for the same runtime-coupling reason as approval/budget: PreToolUse/PostToolUse run in `teaagent/tools.py::execute` and **mutate in-flight `arguments`/`result`** (the spine has no channel to ferry mutated payloads back to the dispatch site), and the 6 session-lifecycle hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) have **no production caller** — nothing to strangle; wiring them is feature work. Done: the 5 dispatch-layer hook audit events (`tool_hook_pre_mutation`, `tool_hook_pre_mutation_blocked`, `tool_hook_vetoed`, `tool_hook_post_mutation`, `tool_hook_post_failed`) are typed in `RunEventType` + mapped both directions, so the M2-T001 reader can surface them as typed RunEvents. **Correction (post-migration review F1):** this is typing + reader-visibility ONLY — it is NOT yet folded into evidence/receipts. No evidence extractor reads `tool_hook_*` and `RunEvidenceBundle` has no hooks field, so hook veto/mutation activity does not currently appear in any bundle/receipt. Surfacing it would need a new `RunEvidenceBundle` hooks field + extractor (backlog). Mapping/reader only; audit bytes unchanged; hook execution + mutation semantics unchanged. See `docs/work-log/m5-hooks-observability-only-2026-06-13.md`. |
181181
| ADR-0032-M6 (was M2 fold; corrected scope A) — **COMPLETE (FOLD-T001 + T002)** | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1). **FOLD-T001**: `build_evidence_from_events()` parallel builder sharing `_assemble_evidence_bundle` with the legacy path (cannot drift; only the event *source* differs), parity-asserted (`tests/test_run_evidence.py::test_m6_fold_*`). Fixed a structural gap: the typed `RunEvent` was lossy — dropped top-level `created_at` (threaded into command/test/approval timestamps); added optional `RunEvent.created_at`, reader populates it. **FOLD-T002 (cutover DONE)**: `build_run_evidence_bundle` now routes production evidence THROUGH the typed reader + fold — the typed stream is the production path; the raw-dict assembly survives only as the shared helper (so the two cannot diverge). Suite-wide green (evidence/receipt/summary/5-min-proof/first-hour/adversarial + all bundle consumers, ~218 tests). **Finding: no synthetic receipt-only fixtures existed to retire** — the receipt/evidence path was already event-backed (`test_run_receipt.py` writes real RunStore events; `test_real_run_receipt_completeness_from_plan` validates a real run); direct `RunEvidenceBundle(...)` constructions are legitimate downstream-consumer/checker unit tests, not masking fixtures. The plan anticipated a gap that does not exist. Parity test re-anchored against `_assemble_evidence_bundle` (the raw-dict path) so it stays meaningful post-cutover. |
182182
| ADR-0032-M7 (was M6) — **COMPLETE as guard + document, 2026-06-13** | Original goal ("ContextBus + webhook consume the spine; delete inline eventing") **NOT done — it is a regression or vacuous.** Webhook is an `audit.add_sink` already fed transitively by the M1 spine→audit consumer; a *direct* spine consumer would see only the spine-emitted subset (coverage regression). ContextBus + integration `RunEventStream` are **unwired in production** (no callers) — nothing to migrate. The inline `audit.record` calls are the **complete event record** (read by evidence/receipts/webhook), not redundant eventing to delete. **Done instead (owner: guard + document):** `scripts/validate_event_spine_wiring.py` + `tests/test_event_spine_wiring.py` enforce the realized invariant — one typed lifecycle path (EventSpine→audit consumer), an allowlist of sanctioned event-delivery surfaces so a NEW competing lifecycle bus fails the gate, and taxonomy closure (no RunEventType orphaned from the audit record). Added as a pre-commit hook. ADR 0032 "Realized architecture (M1–M7)" section documents the outcome. **MIGRATION COMPLETE.** |
183183

scripts/validate_event_spine_wiring.py

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -26,8 +26,15 @@
2626
(no orphaned typed event that cannot reach the audit record).
2727
B. No orphaned event bus — any class exposing a high-signal lifecycle-event
2828
delivery method must be in the curated allowlist below. A *new* competing
29-
lifecycle-event bus fails the gate, forcing a conscious architecture
30-
decision rather than silent drift.
29+
lifecycle-event bus of a common shape fails the gate, forcing a conscious
30+
architecture decision rather than silent drift.
31+
32+
LIMITATION (heuristic tripwire, not a proof): check B keys on specific
33+
method names and deliberately excludes generic ``publish`` / ``emit`` to
34+
avoid noise from skill-writer / marketplace / token-streaming. As a result
35+
a bus shaped like the integration ``RunEventStream`` (``subscribe`` +
36+
``emit``) is NOT detected. The check catches the common shapes; it does not
37+
guarantee detection of every conceivable event bus.
3138
3239
Run: python3 scripts/validate_event_spine_wiring.py
3340
Exit code 0 when clean, 1 on any violation.

tests/test_run_evidence.py

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -501,3 +501,55 @@ def test_m6_fold_preserves_created_at_in_command_timestamps():
501501
assert typed[0].created_at == '2026-06-13T13:00:01+00:00'
502502
folded = build_evidence_from_events(typed, root=root, run_id='r-ts')
503503
assert folded.commands_run[0].timestamp == '2026-06-13T13:00:01+00:00'
504+
505+
506+
# Audit event types that the evidence/proof-of-use extractors filter on. After
507+
# the M6 FOLD-T002 cutover, build_run_evidence_bundle routes through
508+
# read_run_events_from_audit, which DROPS any audit event whose type is not in
509+
# RunEventType. So production evidence is lossless ONLY IF every type below is
510+
# typed. This list is the enforced coupling: if you teach an extractor to read a
511+
# NEW audit event type, add it here — the test will then fail until that type is
512+
# also added to RunEventType (otherwise the cutover would silently drop it).
513+
EVIDENCE_EXTRACTOR_AUDIT_TYPES: frozenset[str] = frozenset(
514+
{
515+
'run_started',
516+
'run_failed',
517+
'tool_use',
518+
'tool_call_started',
519+
'tool_call_completed',
520+
'tool_error',
521+
'test_run',
522+
'approval_requested',
523+
'approval_granted',
524+
'approval_denied',
525+
'tool_call_approved',
526+
'tool_call_denied',
527+
'tool_call_pending_approval',
528+
'model_route',
529+
'provenance_collected',
530+
'skill_activated',
531+
'skill_lifecycle_transition',
532+
'git_sandbox_started',
533+
'git_sandbox_resolved',
534+
'undo_applied',
535+
}
536+
)
537+
538+
539+
def test_m6_every_evidence_extractor_type_is_typed() -> None:
540+
"""Guard the FOLD-T002 cutover: every audit type the evidence extractors read
541+
must be in RunEventType, or read_run_events_from_audit would silently drop it
542+
from production evidence (F2 from the post-migration review).
543+
"""
544+
from teaagent.runner._events import _AUDIT_EVENT_TO_RUN_EVENT_TYPE
545+
546+
missing = sorted(
547+
t
548+
for t in EVIDENCE_EXTRACTOR_AUDIT_TYPES
549+
if t not in _AUDIT_EVENT_TO_RUN_EVENT_TYPE
550+
)
551+
assert not missing, (
552+
f'evidence extractors read audit types not in RunEventType: {missing}. '
553+
f'The M6 cutover would silently drop them — add them to RunEventType + '
554+
f'the audit mapper in teaagent/runner/_events.py.'
555+
)

0 commit comments

Comments
 (0)