feat: ADR 0032 M6 FOLD-T001 — evidence bundle fold over the typed event stream

johnteee · claude · johnteee · commit 5f32598d720c · 2026-06-14T00:56:21.000+08:00
Adds build_evidence_from_events() as a parallel read-side builder that folds the
typed RunEvent stream (M2-T001 reader) into a RunEvidenceBundle. It shares a new
_assemble_evidence_bundle() helper with the legacy build_run_evidence_bundle, so
the two paths cannot drift — only the event SOURCE differs (RunStore dicts vs.
typed stream). Legacy stays the default; no fallback flag (Q1); parity is proven
by test.
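
Illustration only (not part of this change): a minimal sketch of the
"shared assembler, different source" shape described above. TypedEvent,
assemble, build_legacy, build_from_events and read_typed are hypothetical
stand-ins, not the real teaagent API; the real assembler extracts far more than
commands.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class TypedEvent:
    """Hypothetical stand-in for the typed RunEvent (illustration only)."""

    type: str
    run_id: str
    payload: Mapping[str, Any]
    seq: int
    created_at: Optional[str] = None


def assemble(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Shared assembly step: both builders end here, so they cannot drift."""
    commands = [
        {
            'command': e['payload']['input']['command'],
            'timestamp': e.get('created_at'),
        }
        for e in events
        if e['event_type'] == 'tool_use'
    ]
    return {'commands_run': commands}


def build_legacy(raw: list[dict[str, Any]]) -> dict[str, Any]:
    """Legacy-style source: raw audit dicts go straight to the assembler."""
    return assemble(raw)


def build_from_events(events: list[TypedEvent]) -> dict[str, Any]:
    """Fold-style source: typed events are mapped back to dicts first."""
    dicts = [
        {
            'event_type': e.type,
            'run_id': e.run_id,
            'payload': dict(e.payload),
            'created_at': e.created_at,
        }
        for e in events
    ]
    return assemble(dicts)


def read_typed(raw: list[dict[str, Any]]) -> list[TypedEvent]:
    """Hypothetical reader: audit dicts -> typed events, keeping created_at."""
    return [
        TypedEvent(
            type=e['event_type'],
            run_id=e.get('run_id', ''),
            payload=e.get('payload', {}),
            seq=i,
            created_at=e.get('created_at'),
        )
        for i, e in enumerate(raw)
    ]


raw = [
    {
        'event_type': 'tool_use',
        'run_id': 'r1',
        'payload': {'input': {'command': 'pytest -q'}},
        'created_at': '2026-06-13T10:00:01+00:00',
    }
]
# Parity is a test-time assertion, not a runtime switch.
assert build_from_events(read_typed(raw)) == build_legacy(raw)
```

Because both entry points delegate to one function, a parity test of this kind
mainly guards the typed reader and the dict mapping, which is where information
can be lost.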

Structural finding + fix: the typed RunEvent was LOSSY for evidence — it dropped
the top-level created_at that the extractors thread into command/test/approval
timestamps, so a faithful fold was impossible. Added optional RunEvent.created_at
(intrinsic event metadata), populated by read_run_events_from_audit from the
audit entry. Live emit path leaves it None (the fold reads persisted audit).
Backward-compatible: all RunEvent construction sites use keyword args.
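
Illustration only (not part of this change): a small sketch of why the missing
field made the fold lossy, and why an optional trailing field with a default
should not break keyword-argument construction sites. EventBefore, EventAfter,
read_entry and timestamp_seen_by_fold are hypothetical names used for the
example, not the real classes or reader.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class EventBefore:
    """Hypothetical pre-fix shape: nowhere to carry the audit timestamp."""

    type: str
    run_id: str
    payload: Mapping[str, Any]
    seq: int


@dataclass
class EventAfter:
    """Hypothetical post-fix shape: optional trailing field with a default."""

    type: str
    run_id: str
    payload: Mapping[str, Any]
    seq: int
    created_at: Optional[str] = None


def read_entry(cls: type, entry: dict[str, Any], seq: int) -> Any:
    """Build an event from an audit entry; only EventAfter keeps created_at."""
    kwargs: dict[str, Any] = {
        'type': entry['event_type'],
        'run_id': entry.get('run_id', ''),
        'payload': entry.get('payload', {}),
        'seq': seq,
    }
    if cls is EventAfter:
        kwargs['created_at'] = entry.get('created_at')
    return cls(**kwargs)


def timestamp_seen_by_fold(event: Any) -> Optional[str]:
    """What a fold could recover as the event timestamp."""
    return getattr(event, 'created_at', None)


entry = {
    'event_type': 'tool_use',
    'run_id': 'r-ts',
    'payload': {'input': {'command': 'echo hi'}},
    'created_at': '2026-06-13T13:00:01+00:00',
}

before = read_entry(EventBefore, entry, 0)
after = read_entry(EventAfter, entry, 0)

# A live-emit style construction site that never mentions created_at
# keeps working and simply gets None.
live = EventAfter(type='tool_use', run_id='r-ts', payload={}, seq=1)

assert timestamp_seen_by_fold(before) is None
assert timestamp_seen_by_fold(after) == entry['created_at']
assert live.created_at is None
```

The default only protects keyword-argument callers automatically; a positional
caller would also keep working here because the new field is last, but that is
a property of this sketch rather than something the change relies on.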

- teaagent/runner/_events.py: RunEvent.created_at; reader populates it.
- teaagent/run_evidence.py: build_evidence_from_events + _assemble_evidence_bundle
  refactor (legacy body unchanged, now shared).
- tests/test_run_evidence.py: test_m6_fold_equals_legacy_{success,failure,pending}
  (non-vacuous: success asserts commands+approvals present) + a created_at
  preservation test.
- Plan §7 M6 row + FOLD-T001 ticket marked DONE; FOLD-T002 cutover pending.

With created_at restored and every evidence event type typed (M2+M3+M5), the
typed reader is lossless for evidence; the fold bundle equals the legacy bundle
on the success/failure/pending fixtures. Cancelled parity is deferred until
run_cancelled is spine-emitted (documented gap).

Constraint: parallel builder only; legacy stays default; no behavior change; cancelled-fixture parity is a documented gap, not masked.
Tested: tests/test_run_evidence.py + tests/lifecycle/test_run_event_spine.py 37 passed; receipt + evidence-summary + adversarial acceptance 28 passed; mypy clean on changed files.
Not-tested: full suite not run on 3.12 (hypothesis missing in 3.14 sandbox).
Confidence: high
Roadmap-Status: unchanged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
diff --git a/docs/generated/docs-inventory.md b/docs/generated/docs-inventory.md
@@ -414,7 +414,7 @@ Do not edit this file manually — regenerate instead.
 | `ops/security-hardening.md` | working | 11733 | `0a385c7dab82` |
 | `ops/troubleshooting.md` | working | 9127 | `4921b6d50f5c` |
 | `permission-and-approval-playbook.md` | working | 6560 | `813bc74bb156` |
-| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 54690 | `4a6ca4a1b9b6` |
+| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 56141 | `37d1576baf5a` |
 | `plans/agent-ecosystem-acceptance-roadmap-2026-05-31.md` | archive | 29099 | `7c4a4972cfeb` |
 | `plans/community-pain-points-response-plan-2026-06-05.md` | archive | 7276 | `571d010133ad` |
 | `plans/competitive-positioning-plan-2026-05-31.md` | archive | 8726 | `d16dfd2bdd99` |
diff --git a/docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md b/docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md
@@ -178,7 +178,7 @@ consumers by M6.
 | ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
 | ADR-0032-M4 (CLOSED — owner decisions B + B-analog, 2026-06-13) | **No gate moves to an interceptor; approval AND budget enforcement both STAY INLINE.** Both proved runtime-stateful on assessment, a poor fit for the pure-interceptor model. **Approval** (decision B): live JIT/session state, tool handler, auto-mode-swappable policy — every coupling gap was invisible to a unit parity test (`docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). **Budget** (decision B-analog): it is three mechanisms — only the global cost cap (`_assert_cost_budget`) is stateless; the phase budget (live `phase_tracker`) and the warning ladder (`_budget_warning_levels_emitted` + `BudgetMonitor._emitted_levels`/`_prompted` dedup sets + an interactive `on_prompt` side-effect handler — the same `assert_allowed` shadow-coexistence trap that blocked approval) are stateful, and even the cost cap is enforced at two evolving-cost points per iteration that do not map 1:1 to events (`docs/work-log/m4-budget-stays-inline-2026-06-13.md`). Both gates' observability is already provided by M2 (their audit events — `tool_call_*`, `approval_*`, `budget_warning`, `budget_prompt`, `phase_budget_warning` — are typed + reader-surfaced); the M6 fold reads them without owning enforcement. Approval/budget behavior unchanged. **Net: plan gate (M3) is the sole governance gate moved to an interceptor.** |
 | ADR-0032-M5 (REVISED — observability-only, 2026-06-13) | **Hook OBSERVABILITY folds onto the spine; hook EXECUTION stays in the tool-dispatch layer.** Assessment found the planned "HookRegistry on spine" unsuitable for the same runtime-coupling reason as approval/budget: PreToolUse/PostToolUse run in `teaagent/tools.py::execute` and **mutate in-flight `arguments`/`result`** (the spine has no channel to ferry mutated payloads back to the dispatch site), and the 6 session-lifecycle hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) have **no production caller** — nothing to strangle; wiring them is feature work. Done: the 5 dispatch-layer hook audit events (`tool_hook_pre_mutation`, `tool_hook_pre_mutation_blocked`, `tool_hook_vetoed`, `tool_hook_post_mutation`, `tool_hook_post_failed`) are typed in `RunEventType` + mapped both directions, so the M2-T001 reader surfaces hook veto/mutation activity from the audit JSONL for the M6 fold. Mapping/reader only; audit bytes unchanged; hook execution + mutation semantics unchanged. See `docs/work-log/m5-hooks-observability-only-2026-06-13.md`. |
-| ADR-0032-M6 (was M2 fold; corrected scope A) | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. |
+| ADR-0032-M6 (was M2 fold; corrected scope A) — **FOLD-T001 DONE** | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. **FOLD-T001 landed**: `build_evidence_from_events()` is a parallel builder sharing `_assemble_evidence_bundle` with the legacy path (cannot drift; only the event *source* differs), parity-asserted on success/failure/pending fixtures (`tests/test_run_evidence.py::test_m6_fold_*`). Surfaced + fixed a structural gap: the typed `RunEvent` was **lossy** — it dropped the top-level `created_at` that the extractors thread into command/test/approval timestamps; added `RunEvent.created_at` (optional; reader populates it from audit). Legacy stays default. **FOLD-T002 (cutover: switch receipt/evidence default to the fold + retire synthetic fixtures) PENDING** — the behavior-changing slice. |
 | ADR-0032-M7 (was M6) | ContextBus and webhook sinks consume the spine; inline emission paths are deleted; validator shows no orphaned eventing modules. |
 
 ## 8. Task Plan
@@ -768,7 +768,18 @@ commit once Slice A is green.
 > orphaned-eventing **cleanup** phase = new **M7**. They run LAST, after the
 > evidence/receipt fold below.
 
-### ADR32-FOLD-T001: Evidence Bundle Fold (new M6; was ADR32-M2-T002, corrected scope A)
+### ADR32-FOLD-T001: Evidence Bundle Fold (new M6; was ADR32-M2-T002, corrected scope A) [DONE]
+
+> **DONE (2026-06-13).** `build_evidence_from_events()` added in
+> `teaagent/run_evidence.py`, sharing `_assemble_evidence_bundle` with the legacy
+> `build_run_evidence_bundle` so they cannot drift (only the event *source*
+> differs). Parity asserted on success/failure/pending fixtures
+> (`tests/test_run_evidence.py::test_m6_fold_equals_legacy_*`). Finding: the
+> typed `RunEvent` was lossy (dropped top-level `created_at`, which extractors
+> thread into command/test/approval timestamps) — fixed by adding optional
+> `RunEvent.created_at`, populated by the reader. Legacy stays default; no
+> fallback flag (Q1). Cancelled-fixture parity deferred until `run_cancelled`
+> is spine-emitted (documented gap, not a silent pass).
 
 - Goal: add `build_evidence_from_events()` that folds the **full** typed event
   stream (M0 lifecycle + M2-coverage events) into the existing
diff --git a/teaagent/run_evidence.py b/teaagent/run_evidence.py
@@ -9,12 +9,15 @@
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from pathlib import Path
-from typing import Any, Optional
+from typing import TYPE_CHECKING, Any, Optional
 
 from teaagent.asset_provenance import ProvenanceRecord
 from teaagent.proof_of_use import ProofOfUseBundle, build_proof_of_use
 from teaagent.run_store import RunStore
 
+if TYPE_CHECKING:
+    from teaagent.runner._events import RunEvent
+
 
 @dataclass
 class CommandEvidence:
@@ -795,6 +798,68 @@ def build_run_evidence_bundle(
     except FileNotFoundError:
         return RunEvidenceBundle(run_id=run_id, goal_id=goal_id)
 
+    return _assemble_evidence_bundle(events, root=root, run_id=run_id, goal_id=goal_id)
+
+
+def build_evidence_from_events(
+    events: list['RunEvent'],
+    *,
+    root: str | Path,
+    run_id: str,
+    goal_id: str = '',
+) -> RunEvidenceBundle:
+    """Fold a typed ``RunEvent`` stream into a :class:`RunEvidenceBundle`.
+
+    ADR 0032 M6 (FOLD-T001): the read-side counterpart to
+    :func:`build_run_evidence_bundle`. Where the legacy builder sources raw
+    audit dicts directly from :class:`RunStore`, this builder folds the **typed**
+    event stream produced by ``read_run_events_from_audit`` /
+    ``read_run_events_from_jsonl`` (M2-T001 reader). Because every evidence-
+    bearing audit event is now typed in ``RunEventType`` (M2 + M3 + M5), the
+    typed reader is lossless for evidence, so this bundle equals the legacy
+    bundle for the same run.
+
+    This is a **parallel** builder: the legacy path stays the default (no
+    fallback flag, ADR 0032 Q1); parity is asserted by test, not a runtime
+    switch. The two builders share :func:`_assemble_evidence_bundle`, so they
+    cannot drift — the only difference is the event *source*.
+
+    Args:
+        events: Typed run events (from the M2-T001 reader over persisted audit).
+        root: Workspace root directory.
+        run_id: Run identifier.
+        goal_id: Optional goal identifier to link this run to a GoalRecord.
+    """
+    from teaagent.runner._events import run_event_to_audit_event_type
+
+    event_dicts: list[dict[str, Any]] = [
+        {
+            'event_type': run_event_to_audit_event_type(e.type),
+            'run_id': e.run_id,
+            'payload': dict(e.payload),
+            'created_at': e.created_at,
+        }
+        for e in events
+    ]
+    return _assemble_evidence_bundle(
+        event_dicts, root=root, run_id=run_id, goal_id=goal_id
+    )
+
+
+def _assemble_evidence_bundle(
+    events: list[dict[str, Any]],
+    *,
+    root: str | Path,
+    run_id: str,
+    goal_id: str = '',
+) -> RunEvidenceBundle:
+    """Assemble a :class:`RunEvidenceBundle` from raw audit-event dicts.
+
+    Shared by :func:`build_run_evidence_bundle` (legacy/RunStore source) and
+    :func:`build_evidence_from_events` (typed-stream fold) so the two paths
+    cannot diverge. Pure over the supplied ``events`` plus on-disk artifacts
+    (undo journal, context health) keyed by ``root``/``run_id``.
+    """
     commands = extract_commands_run(events)
     tests = extract_tests(events)
     approvals = extract_approvals(events)
diff --git a/teaagent/runner/_events.py b/teaagent/runner/_events.py
@@ -95,12 +95,20 @@ class RunEvent:
 
     Each event carries a type, run identifier, monotonic sequence number,
     and typed payload (mapping of event-specific data).
+
+    ``created_at`` is the originating audit entry's ISO-8601 timestamp when the
+    event is read back from persisted audit (M6 fold); it is ``None`` for events
+    freshly emitted on the live spine (the in-process bus does not stamp time —
+    seq is the live ordering key). It is intrinsic, load-bearing evidence
+    metadata: the evidence fold threads it into command/test/approval timestamps,
+    so the typed stream must not drop it.
     """
 
     type: RunEventType
     run_id: str
     payload: Mapping[str, Any]
     seq: int
+    created_at: str | None = None
 
 
 # Subscriber protocol aliases for type hints
@@ -311,12 +319,15 @@ def read_run_events_from_audit(
         run_id = entry.get('run_id', '')
         payload = entry.get('payload', {})
 
-        # Construct a RunEvent with the mapped type, run_id, payload, and seq.
+        # Construct a RunEvent with the mapped type, run_id, payload, seq, and
+        # the originating audit timestamp (load-bearing evidence metadata; the
+        # M6 fold threads created_at into command/test/approval timestamps).
         event = RunEvent(
             type=run_event_type,
             run_id=run_id,
             payload=payload,
             seq=seq,
+            created_at=entry.get('created_at'),
         )
         events.append(event)
 
diff --git a/tests/test_run_evidence.py b/tests/test_run_evidence.py
@@ -303,3 +303,191 @@ def test_redaction_config_build_patterns():
     cfg_full = RedactionConfig()
     patterns_full = cfg_full.build_patterns()
     assert len(patterns_full) > 0
+
+
+# ---------------------------------------------------------------------------
+# ADR 0032 M6 (FOLD-T001): evidence-bundle fold over the typed event stream
+# ---------------------------------------------------------------------------
+
+
+def _write_run(root: str, run_id: str, events: list[dict]) -> None:
+    """Persist raw audit-event dicts as a RunStore JSONL for the run."""
+    import json
+
+    from teaagent.run_store import RunStore
+
+    path = RunStore(root).run_path(run_id)
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text('\n'.join(json.dumps(e) for e in events) + '\n', encoding='utf-8')
+
+
+def _assert_fold_matches_legacy(events: list[dict], run_id: str) -> RunEvidenceBundle:
+    """Legacy bundle (RunStore source) must equal the typed-stream fold."""
+    from teaagent.run_evidence import build_evidence_from_events
+    from teaagent.run_store import RunStore
+    from teaagent.runner._events import read_run_events_from_audit
+
+    with tempfile.TemporaryDirectory() as root:
+        _write_run(root, run_id, events)
+
+        legacy = build_run_evidence_bundle(root, run_id)
+
+        typed = read_run_events_from_audit(RunStore(root).show_run(run_id))
+        folded = build_evidence_from_events(typed, root=root, run_id=run_id)
+
+        assert folded.to_dict() == legacy.to_dict()
+    return legacy
+
+
+def test_m6_fold_equals_legacy_on_success_run():
+    """Success run with commands, tests, and an approval folds losslessly."""
+    events = [
+        {
+            'event_type': 'run_started',
+            'run_id': 'r-ok',
+            'payload': {},
+            'created_at': '2026-06-13T10:00:00+00:00',
+        },
+        {
+            'event_type': 'tool_use',
+            'run_id': 'r-ok',
+            'payload': {
+                'tool_name': 'workspace_run_shell',
+                'input': {'command': 'pytest -q'},
+                'call_id': 'c1',
+            },
+            'created_at': '2026-06-13T10:00:01+00:00',
+        },
+        {
+            'event_type': 'tool_call_completed',
+            'run_id': 'r-ok',
+            'payload': {
+                'tool_name': 'workspace_run_shell',
+                'call_id': 'c1',
+                'result': {'exit_code': 0, 'stdout': 'ok'},
+            },
+            'created_at': '2026-06-13T10:00:02+00:00',
+        },
+        {
+            'event_type': 'test_run',
+            'run_id': 'r-ok',
+            'payload': {
+                'test_name': 'unit',
+                'test_file': 'tests/test_x.py',
+                'passed': True,
+            },
+            'created_at': '2026-06-13T10:00:03+00:00',
+        },
+        {
+            'event_type': 'approval_requested',
+            'run_id': 'r-ok',
+            'payload': {'call_id': 'c2', 'tool_name': 'workspace_write_file'},
+            'created_at': '2026-06-13T10:00:04+00:00',
+        },
+        {
+            'event_type': 'approval_granted',
+            'run_id': 'r-ok',
+            'payload': {
+                'call_id': 'c2',
+                'tool_name': 'workspace_write_file',
+                'authority_type': 'jit_prompt',
+                'approved_by': 'user',
+            },
+            'created_at': '2026-06-13T10:00:05+00:00',
+        },
+        {
+            'event_type': 'run_completed',
+            'run_id': 'r-ok',
+            'payload': {'cost_cents': 1.0},
+            'created_at': '2026-06-13T10:00:06+00:00',
+        },
+    ]
+    bundle = _assert_fold_matches_legacy(events, 'r-ok')
+    # Guard against a vacuous pass: the fixture must actually carry evidence.
+    assert bundle.commands_run and bundle.approvals
+
+
+def test_m6_fold_equals_legacy_on_failure_run():
+    """Failed run with a tool error folds losslessly."""
+    events = [
+        {
+            'event_type': 'run_started',
+            'run_id': 'r-fail',
+            'payload': {},
+            'created_at': '2026-06-13T11:00:00+00:00',
+        },
+        {
+            'event_type': 'tool_use',
+            'run_id': 'r-fail',
+            'payload': {
+                'tool_name': 'workspace_run_shell',
+                'input': {'command': 'make build'},
+                'call_id': 'c1',
+            },
+            'created_at': '2026-06-13T11:00:01+00:00',
+        },
+        {
+            'event_type': 'tool_error',
+            'run_id': 'r-fail',
+            'payload': {
+                'tool_name': 'workspace_run_shell',
+                'call_id': 'c1',
+                'error': 'boom',
+            },
+            'created_at': '2026-06-13T11:00:02+00:00',
+        },
+        {
+            'event_type': 'run_failed',
+            'run_id': 'r-fail',
+            'payload': {'error': 'build failed'},
+            'created_at': '2026-06-13T11:00:03+00:00',
+        },
+    ]
+    _assert_fold_matches_legacy(events, 'r-fail')
+
+
+def test_m6_fold_equals_legacy_on_pending_approval_run():
+    """Pending-approval run (requested, not resolved) folds losslessly."""
+    events = [
+        {
+            'event_type': 'run_started',
+            'run_id': 'r-pend',
+            'payload': {},
+            'created_at': '2026-06-13T12:00:00+00:00',
+        },
+        {
+            'event_type': 'tool_call_pending_approval',
+            'run_id': 'r-pend',
+            'payload': {'call_id': 'c1', 'tool_name': 'workspace_write_file'},
+            'created_at': '2026-06-13T12:00:01+00:00',
+        },
+    ]
+    _assert_fold_matches_legacy(events, 'r-pend')
+
+
+def test_m6_fold_preserves_created_at_in_command_timestamps():
+    """The typed RunEvent carries created_at, so folded command timestamps
+    match the legacy (audit-sourced) timestamps rather than collapsing to None.
+    """
+    from teaagent.run_evidence import build_evidence_from_events
+    from teaagent.run_store import RunStore
+    from teaagent.runner._events import read_run_events_from_audit
+
+    events = [
+        {
+            'event_type': 'tool_use',
+            'run_id': 'r-ts',
+            'payload': {
+                'tool_name': 'workspace_run_shell',
+                'input': {'command': 'echo hi'},
+                'call_id': 'c1',
+            },
+            'created_at': '2026-06-13T13:00:01+00:00',
+        },
+    ]
+    with tempfile.TemporaryDirectory() as root:
+        _write_run(root, 'r-ts', events)
+        typed = read_run_events_from_audit(RunStore(root).show_run('r-ts'))
+        assert typed[0].created_at == '2026-06-13T13:00:01+00:00'
+        folded = build_evidence_from_events(typed, root=root, run_id='r-ts')
+        assert folded.commands_run[0].timestamp == '2026-06-13T13:00:01+00:00'