feat: ADR 0032 M6 FOLD-T001 — evidence bundle fold over the typed event stream
Adds build_evidence_from_events() as a parallel read-side builder that folds the
typed RunEvent stream (M2-T001 reader) into a RunEvidenceBundle. It shares a new
_assemble_evidence_bundle() helper with the legacy build_run_evidence_bundle, so
the two paths cannot drift — only the event SOURCE differs (RunStore dicts vs.
typed stream). Legacy stays the default; no fallback flag (Q1); parity is proven
by test.
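
A minimal sketch of the "two entry points, one assembler" shape described above. Only the idea is taken from this commit; `Bundle`, `Event`, the `command_executed` event name, and the function names are hypothetical stand-ins, not the real `RunEvidenceBundle`, `RunEvent`, or teaagent signatures.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class Bundle:
    """Hypothetical stand-in for RunEvidenceBundle."""
    run_id: str
    commands: list[dict[str, Any]] = field(default_factory=list)
    approvals: list[dict[str, Any]] = field(default_factory=list)


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for the typed RunEvent."""
    event_type: str
    payload: dict[str, Any]


def _assemble(run_id: str, events: Iterable[dict[str, Any]]) -> Bundle:
    """Single assembly routine; both builders funnel through it."""
    bundle = Bundle(run_id=run_id)
    for ev in events:
        # Event names here are illustrative assumptions.
        if ev["event_type"] == "command_executed":
            bundle.commands.append(ev["payload"])
        elif ev["event_type"].startswith("approval_"):
            bundle.approvals.append(ev["payload"])
    return bundle


def build_legacy(run_id: str, stored: list[dict[str, Any]]) -> Bundle:
    """Legacy path: events already arrive as plain dicts."""
    return _assemble(run_id, stored)


def build_from_events(run_id: str, events: Iterable[Event]) -> Bundle:
    """Fold path: normalize typed events, then reuse the same assembler."""
    normalized = (
        {"event_type": e.event_type, "payload": e.payload} for e in events
    )
    return _assemble(run_id, normalized)
```

Because the extraction logic lives only in `_assemble`, a change to it affects both builders at once, which is the drift-prevention property the message claims.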
Structural finding + fix: the typed RunEvent was LOSSY for evidence — it dropped
the top-level created_at that the extractors thread into command/test/approval
timestamps, so a faithful fold was impossible. Added optional RunEvent.created_at
(intrinsic event metadata), populated by read_run_events_from_audit from the
audit entry. Live emit path leaves it None (the fold reads persisted audit).
Backward-compatible: all RunEvent construction sites use keyword args.
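
A rough illustration of why an optional, defaulted field is backward-compatible for keyword-arg construction sites, and how a reader might copy the timestamp from a persisted audit entry. `RunEventSketch`, `read_events_from_audit_lines`, and the JSON key names are assumptions for this sketch, not the real `RunEvent` or `read_run_events_from_audit`.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass(frozen=True)
class RunEventSketch:
    """Hypothetical stand-in for the typed RunEvent."""
    event_type: str
    payload: dict[str, Any] = field(default_factory=dict)
    # New optional field: existing keyword-arg call sites keep working
    # because it defaults to None (as on the live emit path).
    created_at: Optional[str] = None


def read_events_from_audit_lines(lines: list[str]) -> Iterator[RunEventSketch]:
    """Parse audit JSONL lines, carrying the top-level created_at through."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        yield RunEventSketch(
            event_type=entry["event_type"],
            payload=entry.get("payload", {}),
            created_at=entry.get("created_at"),  # None if the entry lacks it
        )
```

An event built without `created_at` (for example `RunEventSketch(event_type="run_started")`) is still valid, so only the reader needs to know about the new field.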
- teaagent/runner/_events.py: RunEvent.created_at; reader populates it.
- teaagent/run_evidence.py: build_evidence_from_events + _assemble_evidence_bundle
refactor (legacy body unchanged, now shared).
- tests/test_run_evidence.py: test_m6_fold_equals_legacy_{success,failure,pending}
(non-vacuous: success asserts commands+approvals present) + a created_at
preservation test.
- Plan §7 M6 row + FOLD-T001 ticket marked DONE; FOLD-T002 cutover pending.
Every evidence event type is typed (M2+M3+M5), so with created_at restored the
typed reader is lossless for evidence; fold bundle == legacy bundle on the
success/failure/pending fixtures. Cancelled parity is deferred until
run_cancelled is spine-emitted (documented gap).
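
One way a non-vacuous parity check could be written, as a sketch only. `assert_fold_parity` and the dict-shaped bundle are hypothetical; the real tests live in `tests/test_run_evidence.py` and may be structured differently.

```python
from typing import Any, Callable

Bundle = dict[str, list[Any]]  # hypothetical simplified bundle shape


def assert_fold_parity(
    legacy: Callable[[], Bundle],
    fold: Callable[[], Bundle],
    must_be_nonempty: tuple[str, ...] = (),
) -> Bundle:
    """Assert the fold equals the legacy bundle and the fixture is not empty."""
    expected = legacy()
    actual = fold()
    # Guard against a vacuous pass: two empty bundles are trivially equal.
    for key in must_be_nonempty:
        assert expected.get(key), f"fixture is vacuous: no {key}"
    assert actual == expected, "fold bundle diverged from legacy bundle"
    return actual
```

The non-empty guard matters because equality alone would pass if both builders silently extracted nothing.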
Constraint: parallel builder only; legacy stays default; no behavior change; cancelled-fixture parity is a documented gap, not masked.
Tested: tests/test_run_evidence.py + tests/lifecycle/test_run_event_spine.py 37 passed; receipt + evidence-summary + adversarial acceptance 28 passed; mypy clean on changed files.
Not-tested: full suite not run on 3.12 (hypothesis missing in 3.14 sandbox).
Confidence: high
Roadmap-Status: unchanged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md
13 additions & 2 deletions (+13 -2)
@@ -178,7 +178,7 @@ consumers by M6.
 | ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
 | ADR-0032-M4 (CLOSED — owner decisions B + B-analog, 2026-06-13) | **No gate moves to an interceptor; approval AND budget enforcement both STAY INLINE.** Both proved runtime-stateful on assessment, a poor fit for the pure-interceptor model. **Approval** (decision B): live JIT/session state, tool handler, auto-mode-swappable policy — every coupling gap was invisible to a unit parity test (`docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). **Budget** (decision B-analog): it is three mechanisms — only the global cost cap (`_assert_cost_budget`) is stateless; the phase budget (live `phase_tracker`) and the warning ladder (`_budget_warning_levels_emitted` + `BudgetMonitor._emitted_levels`/`_prompted` dedup sets + an interactive `on_prompt` side-effect handler — the same `assert_allowed` shadow-coexistence trap that blocked approval) are stateful, and even the cost cap is enforced at two evolving-cost points per iteration that do not map 1:1 to events (`docs/work-log/m4-budget-stays-inline-2026-06-13.md`). Both gates' observability is already provided by M2 (their audit events — `tool_call_*`, `approval_*`, `budget_warning`, `budget_prompt`, `phase_budget_warning` — are typed + reader-surfaced); the M6 fold reads them without owning enforcement. Approval/budget behavior unchanged. **Net: plan gate (M3) is the sole governance gate moved to an interceptor.** |
 | ADR-0032-M5 (REVISED — observability-only, 2026-06-13) | **Hook OBSERVABILITY folds onto the spine; hook EXECUTION stays in the tool-dispatch layer.** Assessment found the planned "HookRegistry on spine" unsuitable for the same runtime-coupling reason as approval/budget: PreToolUse/PostToolUse run in `teaagent/tools.py::execute` and **mutate in-flight `arguments`/`result`** (the spine has no channel to ferry mutated payloads back to the dispatch site), and the 6 session-lifecycle hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) have **no production caller** — nothing to strangle; wiring them is feature work. Done: the 5 dispatch-layer hook audit events (`tool_hook_pre_mutation`, `tool_hook_pre_mutation_blocked`, `tool_hook_vetoed`, `tool_hook_post_mutation`, `tool_hook_post_failed`) are typed in `RunEventType` + mapped both directions, so the M2-T001 reader surfaces hook veto/mutation activity from the audit JSONL for the M6 fold. Mapping/reader only; audit bytes unchanged; hook execution + mutation semantics unchanged. See `docs/work-log/m5-hooks-observability-only-2026-06-13.md`. |
-| ADR-0032-M6 (was M2 fold; corrected scope A) | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. |
+| ADR-0032-M6 (was M2 fold; corrected scope A) — **FOLD-T001 DONE** | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. **FOLD-T001 landed**: `build_evidence_from_events()` is a parallel builder sharing `_assemble_evidence_bundle` with the legacy path (cannot drift; only the event *source* differs), parity-asserted on success/failure/pending fixtures (`tests/test_run_evidence.py::test_m6_fold_*`). Surfaced + fixed a structural gap: the typed `RunEvent` was **lossy** — it dropped the top-level `created_at` that the extractors thread into command/test/approval timestamps; added `RunEvent.created_at` (optional; reader populates it from audit). Legacy stays default. **FOLD-T002 (cutover: switch receipt/evidence default to the fold + retire synthetic fixtures) PENDING** — the behavior-changing slice. |
 | ADR-0032-M7 (was M6) | ContextBus and webhook sinks consume the spine; inline emission paths are deleted; validator shows no orphaned eventing modules. |

 ## 8. Task Plan
@@ -768,7 +768,18 @@ commit once Slice A is green.
 > orphaned-eventing **cleanup** phase = new **M7**. They run LAST, after the
 > evidence/receipt fold below.

-### ADR32-FOLD-T001: Evidence Bundle Fold (new M6; was ADR32-M2-T002, corrected scope A)
+### ADR32-FOLD-T001: Evidence Bundle Fold (new M6; was ADR32-M2-T002, corrected scope A) [DONE]
+
+> **DONE (2026-06-13).** `build_evidence_from_events()` added in
+> `teaagent/run_evidence.py`, sharing `_assemble_evidence_bundle` with the legacy
+> `build_run_evidence_bundle` so they cannot drift (only the event *source*
+> differs). Parity asserted on success/failure/pending fixtures
+> (`tests/test_run_evidence.py::test_m6_fold_equals_legacy_*`). Finding: the
+> typed `RunEvent` was lossy (dropped top-level `created_at`, which extractors
+> thread into command/test/approval timestamps) — fixed by adding optional
+> `RunEvent.created_at`, populated by the reader. Legacy stays default; no
+> fallback flag (Q1). Cancelled-fixture parity deferred until `run_cancelled`
+> is spine-emitted (documented gap, not a silent pass).

 - Goal: add `build_evidence_from_events()` that folds the **full** typed event
 stream (M0 lifecycle + M2-coverage events) into the existing