Commit 8b280bb

johnteee and claude committed
feat: ADR 0032 M6 FOLD-T002 — cut production evidence over to the typed-stream fold
build_run_evidence_bundle now derives evidence FROM the typed RunEvent stream (read_run_events_from_audit + build_evidence_from_events), not raw audit dicts. The typed stream is the production path; the raw-dict assembly survives only as the shared _assemble_evidence_bundle helper that the fold also calls, so the two cannot diverge. Every evidence-bearing event type is typed (M2+M3+M5), so the typed reader is lossless here.

Finding (honest): the plan anticipated "synthetic receipt-only fixtures masking real-path gaps" to retire; they do not exist. The receipt/evidence path was already event-backed (tests/test_run_receipt.py writes real RunStore events; test_real_run_receipt_completeness_from_plan validates a real run). The direct RunEvidenceBundle(...) constructions in the suite are legitimate downstream-consumer/checker unit tests, not gap-masking fixtures. No real-path gaps surfaced under the cutover.

- teaagent/run_evidence.py: build_run_evidence_bundle routes through the fold.
- tests/test_run_evidence.py: re-anchored the FOLD-T001 parity test against _assemble_evidence_bundle (raw-dict path) so it stays non-circular now that the public builder itself folds.
- Plan §7 M6 row + FOLD-T002 ticket marked DONE; M6 COMPLETE.

Constraint: public API unchanged; raw-dict assembly retained as shared helper so fold==assembly is structurally guaranteed; no behavior change observed (lossless typing).
Tested: tests/test_run_evidence.py 15 passed; evidence/receipt/summary/5-min-proof/first-hour/adversarial 47 passed; all bundle consumers (skill/route/completeness/tui-cost/goal/provenance/summary/ws4-observability/conversation-ux/p0-harness) 171 passed; mypy clean.
Not-tested: full suite not run on 3.12 (hypothesis missing in 3.14 sandbox).
Confidence: high
Roadmap-Status: unchanged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
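The cutover described in the message above can be pictured with a small, self-contained sketch. It is not the teaagent code: every name below (`KNOWN_TYPES`, `RunEvent`, `EvidenceBundle`, `read_events`, `assemble`, `fold`, `build_bundle`) is a hypothetical stand-in, and the idea that the fold lowers typed events back to dicts before calling the shared helper is an assumption made only to show why a shared assembler prevents drift.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical taxonomy; the real RunEventType is not shown in this commit.
KNOWN_TYPES = {"command_finished", "test_finished"}


@dataclass(frozen=True)
class RunEvent:
    """Toy typed event. Field names are assumptions for illustration."""
    event_type: str
    payload: dict
    created_at: Optional[str] = None


@dataclass
class EvidenceBundle:
    run_id: str
    commands: list = field(default_factory=list)

    def to_dict(self) -> dict:
        return {"run_id": self.run_id, "commands": self.commands}


def read_events(raw: list) -> list:
    """Typed reader: keep only events whose type is in the taxonomy."""
    return [
        RunEvent(e["event_type"], dict(e.get("payload", {})), e.get("created_at"))
        for e in raw
        if e.get("event_type") in KNOWN_TYPES
    ]


def assemble(raw: list, run_id: str) -> EvidenceBundle:
    """Shared helper: the single place where evidence is extracted."""
    bundle = EvidenceBundle(run_id=run_id)
    for e in raw:
        if e.get("event_type") == "command_finished":
            bundle.commands.append(
                {"cmd": e["payload"]["cmd"], "at": e.get("created_at")}
            )
    return bundle


def fold(events: list, run_id: str) -> EvidenceBundle:
    """Fold: lower typed events to dicts, then reuse the shared helper."""
    raw = [
        {"event_type": ev.event_type, "payload": ev.payload, "created_at": ev.created_at}
        for ev in events
    ]
    return assemble(raw, run_id)


def build_bundle(raw: list, run_id: str) -> EvidenceBundle:
    """Public builder after the cutover: typed reader first, then the fold."""
    return fold(read_events(raw), run_id)
```

In this sketch an event outside the taxonomy is dropped by the reader, which is harmless only because no extractor in `assemble` reads it. That mirrors the losslessness argument in the commit message, under the stated assumptions.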
1 parent 5f32598 commit 8b280bb

4 files changed

Lines changed: 46 additions & 11 deletions

docs/generated/docs-inventory.md

Lines changed: 1 addition & 1 deletion
@@ -414,7 +414,7 @@ Do not edit this file manually — regenerate instead.
 | `ops/security-hardening.md` | working | 11733 | `0a385c7dab82` |
 | `ops/troubleshooting.md` | working | 9127 | `4921b6d50f5c` |
 | `permission-and-approval-playbook.md` | working | 6560 | `813bc74bb156` |
-| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 56141 | `37d1576baf5a` |
+| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 57704 | `53c78129190f` |
 | `plans/agent-ecosystem-acceptance-roadmap-2026-05-31.md` | archive | 29099 | `7c4a4972cfeb` |
 | `plans/community-pain-points-response-plan-2026-06-05.md` | archive | 7276 | `571d010133ad` |
 | `plans/competitive-positioning-plan-2026-05-31.md` | archive | 8726 | `d16dfd2bdd99` |

docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md

Lines changed: 17 additions & 2 deletions
@@ -178,7 +178,7 @@ consumers by M6.
 | ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
 | ADR-0032-M4 (CLOSED — owner decisions B + B-analog, 2026-06-13) | **No gate moves to an interceptor; approval AND budget enforcement both STAY INLINE.** Both proved runtime-stateful on assessment, a poor fit for the pure-interceptor model. **Approval** (decision B): live JIT/session state, tool handler, auto-mode-swappable policy — every coupling gap was invisible to a unit parity test (`docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). **Budget** (decision B-analog): it is three mechanisms — only the global cost cap (`_assert_cost_budget`) is stateless; the phase budget (live `phase_tracker`) and the warning ladder (`_budget_warning_levels_emitted` + `BudgetMonitor._emitted_levels`/`_prompted` dedup sets + an interactive `on_prompt` side-effect handler — the same `assert_allowed` shadow-coexistence trap that blocked approval) are stateful, and even the cost cap is enforced at two evolving-cost points per iteration that do not map 1:1 to events (`docs/work-log/m4-budget-stays-inline-2026-06-13.md`). Both gates' observability is already provided by M2 (their audit events — `tool_call_*`, `approval_*`, `budget_warning`, `budget_prompt`, `phase_budget_warning` — are typed + reader-surfaced); the M6 fold reads them without owning enforcement. Approval/budget behavior unchanged. **Net: plan gate (M3) is the sole governance gate moved to an interceptor.** |
 | ADR-0032-M5 (REVISED — observability-only, 2026-06-13) | **Hook OBSERVABILITY folds onto the spine; hook EXECUTION stays in the tool-dispatch layer.** Assessment found the planned "HookRegistry on spine" unsuitable for the same runtime-coupling reason as approval/budget: PreToolUse/PostToolUse run in `teaagent/tools.py::execute` and **mutate in-flight `arguments`/`result`** (the spine has no channel to ferry mutated payloads back to the dispatch site), and the 6 session-lifecycle hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) have **no production caller** — nothing to strangle; wiring them is feature work. Done: the 5 dispatch-layer hook audit events (`tool_hook_pre_mutation`, `tool_hook_pre_mutation_blocked`, `tool_hook_vetoed`, `tool_hook_post_mutation`, `tool_hook_post_failed`) are typed in `RunEventType` + mapped both directions, so the M2-T001 reader surfaces hook veto/mutation activity from the audit JSONL for the M6 fold. Mapping/reader only; audit bytes unchanged; hook execution + mutation semantics unchanged. See `docs/work-log/m5-hooks-observability-only-2026-06-13.md`. |
-| ADR-0032-M6 (was M2 fold; corrected scope A) — **FOLD-T001 DONE** | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. **FOLD-T001 landed**: `build_evidence_from_events()` is a parallel builder sharing `_assemble_evidence_bundle` with the legacy path (cannot drift; only the event *source* differs), parity-asserted on success/failure/pending fixtures (`tests/test_run_evidence.py::test_m6_fold_*`). Surfaced + fixed a structural gap: the typed `RunEvent` was **lossy** — it dropped the top-level `created_at` that the extractors thread into command/test/approval timestamps; added `RunEvent.created_at` (optional; reader populates it from audit). Legacy stays default. **FOLD-T002 (cutover: switch receipt/evidence default to the fold + retire synthetic fixtures) PENDING** — the behavior-changing slice. |
+| ADR-0032-M6 (was M2 fold; corrected scope A) — **COMPLETE (FOLD-T001 + T002)** | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1). **FOLD-T001**: `build_evidence_from_events()` parallel builder sharing `_assemble_evidence_bundle` with the legacy path (cannot drift; only the event *source* differs), parity-asserted (`tests/test_run_evidence.py::test_m6_fold_*`). Fixed a structural gap: the typed `RunEvent` was lossy — dropped top-level `created_at` (threaded into command/test/approval timestamps); added optional `RunEvent.created_at`, reader populates it. **FOLD-T002 (cutover DONE)**: `build_run_evidence_bundle` now routes production evidence THROUGH the typed reader + fold — the typed stream is the production path; the raw-dict assembly survives only as the shared helper (so the two cannot diverge). Suite-wide green (evidence/receipt/summary/5-min-proof/first-hour/adversarial + all bundle consumers, ~218 tests). **Finding: no synthetic receipt-only fixtures existed to retire** — the receipt/evidence path was already event-backed (`test_run_receipt.py` writes real RunStore events; `test_real_run_receipt_completeness_from_plan` validates a real run); direct `RunEvidenceBundle(...)` constructions are legitimate downstream-consumer/checker unit tests, not masking fixtures. The plan anticipated a gap that does not exist. Parity test re-anchored against `_assemble_evidence_bundle` (the raw-dict path) so it stays meaningful post-cutover. |
 | ADR-0032-M7 (was M6) | ContextBus and webhook sinks consume the spine; inline emission paths are deleted; validator shows no orphaned eventing modules. |
 
 ## 8. Task Plan
@@ -805,7 +805,22 @@ commit once Slice A is green.
 - Risk: medium (parity-gated, additive). Parallelizable: no.
   Human Review Required: no.
 
-### ADR32-FOLD-T002: Receipt Fold + Synthetic Fixture Retirement (was ADR32-M2-T003)
+### ADR32-FOLD-T002: Receipt Fold + Synthetic Fixture Retirement (was ADR32-M2-T003) [DONE]
+
+> **DONE (2026-06-13, owner chose "do the cutover now").** `build_run_evidence_bundle`
+> now routes production evidence through `read_run_events_from_audit` +
+> `build_evidence_from_events` — the typed stream is the production path; raw-dict
+> assembly survives only as the shared `_assemble_evidence_bundle` helper. Green
+> suite-wide (~218 tests across evidence/receipt/summary/5-min-proof/first-hour/
+> adversarial + all bundle consumers). **Finding: there were no synthetic
+> receipt-only fixtures to retire** — the receipt/evidence path was already
+> event-backed (`test_run_receipt.py` writes real RunStore events;
+> `test_real_run_receipt_completeness_from_plan` validates a real run). The
+> direct `RunEvidenceBundle(...)` constructions in the suite are legitimate
+> downstream-consumer/checker unit tests, not gap-masking fixtures. The
+> anticipated real-path gaps did not materialize. The FOLD-T001 parity test was
+> re-anchored against `_assemble_evidence_bundle` so it remains non-circular
+> after the cutover.
 
 - Goal: build receipts from the folded evidence and retire synthetic
   receipt-only fixtures that mask real-path gaps.

teaagent/run_evidence.py

Lines changed: 11 additions & 1 deletion
@@ -798,7 +798,17 @@ def build_run_evidence_bundle(
     except FileNotFoundError:
         return RunEvidenceBundle(run_id=run_id, goal_id=goal_id)
 
-    return _assemble_evidence_bundle(events, root=root, run_id=run_id, goal_id=goal_id)
+    # M6 FOLD-T002 cutover: production evidence is now derived from the TYPED
+    # event stream, not raw audit dicts. Every evidence-bearing audit event is
+    # typed in RunEventType (M2 + M3 + M5), so read_run_events_from_audit is
+    # lossless here; events whose type is not in the taxonomy are not read by any
+    # extractor anyway. The legacy raw-dict assembly is no longer the production
+    # path — it survives only as the shared _assemble_evidence_bundle helper that
+    # the fold also uses, so the two cannot diverge.
+    from teaagent.runner._events import read_run_events_from_audit
+
+    typed = read_run_events_from_audit(events)
+    return build_evidence_from_events(typed, root=root, run_id=run_id, goal_id=goal_id)
 
 
 def build_evidence_from_events(

tests/test_run_evidence.py

Lines changed: 17 additions & 7 deletions
@@ -322,17 +322,27 @@ def _write_run(root: str, run_id: str, events: list[dict]) -> None:
 
 
 def _assert_fold_matches_legacy(events: list[dict], run_id: str) -> RunEvidenceBundle:
-    """Legacy bundle (RunStore source) must equal the typed-stream fold."""
-    from teaagent.run_evidence import build_evidence_from_events
-    from teaagent.run_store import RunStore
+    """The typed-stream fold must equal the raw-audit-dict assembly.
+
+    Baselines against ``_assemble_evidence_bundle`` (the raw-dict path) rather
+    than ``build_run_evidence_bundle`` — after the M6 FOLD-T002 cutover the
+    public builder itself folds through the typed reader, so comparing it to the
+    fold would be circular. The invariant that matters is that routing raw audit
+    dicts through ``read_run_events_from_audit`` (typed reader) loses no evidence
+    versus assembling directly from those dicts.
+    """
+    from teaagent.run_evidence import (
+        _assemble_evidence_bundle,
+        build_evidence_from_events,
+    )
     from teaagent.runner._events import read_run_events_from_audit
 
     with tempfile.TemporaryDirectory() as root:
-        _write_run(root, run_id, events)
-
-        legacy = build_run_evidence_bundle(root, run_id)
+        # Raw-dict assembly (pre-cutover production path) is the baseline.
+        legacy = _assemble_evidence_bundle(events, root=root, run_id=run_id)
 
-        typed = read_run_events_from_audit(RunStore(root).show_run(run_id))
+        # Typed-stream fold (current production path) must match it.
+        typed = read_run_events_from_audit(events)
         folded = build_evidence_from_events(typed, root=root, run_id=run_id)
 
         assert folded.to_dict() == legacy.to_dict()
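
The re-anchored parity check above guards one invariant: passing raw audit dicts through a typed reader must lose nothing the assembler needs. A minimal sketch of that idea follows. All names here (`assemble`, `lossless_reader`, `lossy_reader`, `parity_holds`) and the event shape are hypothetical and unrelated to the real teaagent signatures. The lossy reader imitates the `created_at` drop that FOLD-T001 reportedly surfaced.

```python
from typing import Any, Callable

Event = dict  # toy event shape: {"type": ..., "cmd": ..., "created_at": ...}


def assemble(raw: list) -> list:
    """Baseline: extract (cmd, timestamp) pairs straight from raw dicts."""
    return [(e["cmd"], e.get("created_at")) for e in raw if e.get("type") == "command"]


def lossless_reader(raw: list) -> list:
    """Stand-in for a typed reader that preserves every field."""
    return [dict(e) for e in raw]


def lossy_reader(raw: list) -> list:
    """Stand-in for a reader that drops the top-level created_at."""
    return [{k: v for k, v in e.items() if k != "created_at"} for e in raw]


def parity_holds(reader: Callable[[list], list], raw: list) -> bool:
    """Compare the reader-routed assembly against the raw-dict baseline.

    The baseline never goes through the reader, so the comparison is not
    circular: a reader that loses a field changes only one side.
    """
    return assemble(reader(raw)) == assemble(raw)


if __name__ == "__main__":
    events = [{"type": "command", "cmd": "pytest", "created_at": "2026-06-13T00:00:00Z"}]
    print(parity_holds(lossless_reader, events))
    print(parity_holds(lossy_reader, events))
```

If the baseline were instead built through the same reader, both sides would lose `created_at` together and the check would pass for the lossy reader too. That is the circularity the docstring in the diff describes.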
