feat: ADR 0032 M6 FOLD-T002 — cut production evidence over to the typed-stream fold
build_run_evidence_bundle now derives evidence FROM the typed RunEvent stream
(read_run_events_from_audit + build_evidence_from_events), not raw audit dicts.
The typed stream is the production path; the raw-dict assembly survives only as
the shared _assemble_evidence_bundle helper that the fold also calls, so the two
cannot diverge. Every evidence-bearing event type is typed (M2+M3+M5), so the
typed reader is lossless here.
Finding (honest): the plan expected to retire "synthetic receipt-only fixtures
masking real-path gaps", but no such fixtures exist. The receipt/evidence path was
already event-backed (tests/test_run_receipt.py writes real RunStore events;
test_real_run_receipt_completeness_from_plan validates a real run). The direct
RunEvidenceBundle(...) constructions in the suite are legitimate downstream-
consumer/checker unit tests, not gap-masking fixtures. No real-path gaps
surfaced under the cutover.
- teaagent/run_evidence.py: build_run_evidence_bundle routes through the fold.
- tests/test_run_evidence.py: re-anchored the FOLD-T001 parity test against
_assemble_evidence_bundle (raw-dict path) so it stays non-circular now that
the public builder itself folds.
- Plan §7 M6 row + FOLD-T002 ticket marked DONE; M6 COMPLETE.
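
Why the parity test had to be re-anchored can be shown with a small sketch. All names below (assemble, lossy_fold, parity_holds) are hypothetical stand-ins, not the real test or helpers: once the public builder is the fold, comparing the two compares a function with itself, so only a comparison against the raw-dict assembly can catch a lossy fold.

```python
from typing import Any, Callable

Record = dict[str, Any]
Builder = Callable[[list[Record]], tuple[str, ...]]


def assemble(records: list[Record]) -> tuple[str, ...]:
    """Stand-in for the raw-dict assembly path (the independent reference)."""
    return tuple(r["payload"]["cmd"] for r in records if r["event_type"] == "command")


def lossy_fold(records: list[Record]) -> tuple[str, ...]:
    """A deliberately broken fold that drops the last record."""
    return assemble(records[:-1])


# After the cutover the public builder is the fold itself.
public_builder: Builder = lossy_fold


def parity_holds(reference: Builder, candidate: Builder, fixtures: list[list[Record]]) -> bool:
    """True when both builders agree on every fixture."""
    return all(reference(f) == candidate(f) for f in fixtures)


fixtures = [[{"event_type": "command", "payload": {"cmd": "pytest"}}]]

# Circular: the fold is compared with itself, so the defect goes unnoticed.
circular_result = parity_holds(public_builder, lossy_fold, fixtures)

# Anchored on the raw-dict path: the dropped record is detected.
anchored_result = parity_holds(assemble, lossy_fold, fixtures)
```

In this sketch the circular comparison passes despite the defect, while the anchored one fails, which is the property the re-anchored FOLD-T001 parity test is meant to preserve.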
Constraint: public API unchanged; raw-dict assembly retained as shared helper so fold==assembly is structurally guaranteed; no behavior change observed (lossless typing).
Tested: tests/test_run_evidence.py 15 passed; evidence/receipt/summary/5-min-proof/first-hour/adversarial 47 passed; all bundle consumers (skill/route/completeness/tui-cost/goal/provenance/summary/ws4-observability/conversation-ux/p0-harness) 171 passed; mypy clean.
Not-tested: full suite not run on 3.12 (hypothesis missing in 3.14 sandbox).
Confidence: high
Roadmap-Status: unchanged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md (17 additions & 2 deletions)
@@ -178,7 +178,7 @@ consumers by M6.
| ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
| ADR-0032-M4 (CLOSED — owner decisions B + B-analog, 2026-06-13) | **No gate moves to an interceptor; approval AND budget enforcement both STAY INLINE.** Both proved runtime-stateful on assessment, a poor fit for the pure-interceptor model. **Approval** (decision B): live JIT/session state, tool handler, auto-mode-swappable policy — every coupling gap was invisible to a unit parity test (`docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). **Budget** (decision B-analog): it is three mechanisms — only the global cost cap (`_assert_cost_budget`) is stateless; the phase budget (live `phase_tracker`) and the warning ladder (`_budget_warning_levels_emitted` + `BudgetMonitor._emitted_levels`/`_prompted` dedup sets + an interactive `on_prompt` side-effect handler — the same `assert_allowed` shadow-coexistence trap that blocked approval) are stateful, and even the cost cap is enforced at two evolving-cost points per iteration that do not map 1:1 to events (`docs/work-log/m4-budget-stays-inline-2026-06-13.md`). Both gates' observability is already provided by M2 (their audit events — `tool_call_*`, `approval_*`, `budget_warning`, `budget_prompt`, `phase_budget_warning` — are typed + reader-surfaced); the M6 fold reads them without owning enforcement. Approval/budget behavior unchanged. **Net: plan gate (M3) is the sole governance gate moved to an interceptor.** |
| ADR-0032-M5 (REVISED — observability-only, 2026-06-13) | **Hook OBSERVABILITY folds onto the spine; hook EXECUTION stays in the tool-dispatch layer.** Assessment found the planned "HookRegistry on spine" unsuitable for the same runtime-coupling reason as approval/budget: PreToolUse/PostToolUse run in `teaagent/tools.py::execute` and **mutate in-flight `arguments`/`result`** (the spine has no channel to ferry mutated payloads back to the dispatch site), and the 6 session-lifecycle hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) have **no production caller** — nothing to strangle; wiring them is feature work. Done: the 5 dispatch-layer hook audit events (`tool_hook_pre_mutation`, `tool_hook_pre_mutation_blocked`, `tool_hook_vetoed`, `tool_hook_post_mutation`, `tool_hook_post_failed`) are typed in `RunEventType` + mapped both directions, so the M2-T001 reader surfaces hook veto/mutation activity from the audit JSONL for the M6 fold. Mapping/reader only; audit bytes unchanged; hook execution + mutation semantics unchanged. See `docs/work-log/m5-hooks-observability-only-2026-06-13.md`. |
-
| ADR-0032-M6 (was M2 fold; corrected scope A) — **FOLD-T001 DONE** | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. **FOLD-T001 landed**: `build_evidence_from_events()` is a parallel builder sharing `_assemble_evidence_bundle` with the legacy path (cannot drift; only the event *source* differs), parity-asserted on success/failure/pending fixtures (`tests/test_run_evidence.py::test_m6_fold_*`). Surfaced + fixed a structural gap: the typed `RunEvent` was **lossy** — it dropped the top-level `created_at` that the extractors thread into command/test/approval timestamps; added `RunEvent.created_at` (optional; reader populates it from audit). Legacy stays default. **FOLD-T002 (cutover: switch receipt/evidence default to the fold + retire synthetic fixtures) PENDING** — the behavior-changing slice. |
+
| ADR-0032-M6 (was M2 fold; corrected scope A) — **COMPLETE (FOLD-T001 + T002)** | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1). **FOLD-T001**: `build_evidence_from_events()` parallel builder sharing `_assemble_evidence_bundle` with the legacy path (cannot drift; only the event *source* differs), parity-asserted (`tests/test_run_evidence.py::test_m6_fold_*`). Fixed a structural gap: the typed `RunEvent` was lossy — dropped top-level `created_at` (threaded into command/test/approval timestamps); added optional `RunEvent.created_at`, reader populates it. **FOLD-T002 (cutover DONE)**: `build_run_evidence_bundle` now routes production evidence THROUGH the typed reader + fold — the typed stream is the production path; the raw-dict assembly survives only as the shared helper (so the two cannot diverge). Suite-wide green (evidence/receipt/summary/5-min-proof/first-hour/adversarial + all bundle consumers, ~218 tests). **Finding: no synthetic receipt-only fixtures existed to retire** — the receipt/evidence path was already event-backed (`test_run_receipt.py` writes real RunStore events; `test_real_run_receipt_completeness_from_plan` validates a real run); direct `RunEvidenceBundle(...)` constructions are legitimate downstream-consumer/checker unit tests, not masking fixtures. The plan anticipated a gap that does not exist. Parity test re-anchored against `_assemble_evidence_bundle` (the raw-dict path) so it stays meaningful post-cutover. |
| ADR-0032-M7 (was M6) | ContextBus and webhook sinks consume the spine; inline emission paths are deleted; validator shows no orphaned eventing modules. |
## 8. Task Plan
@@ -805,7 +805,22 @@ commit once Slice A is green.
- Risk: medium (parity-gated, additive). Parallelizable: no.