Commit 5f32598

johnteee and claude committed
feat: ADR 0032 M6 FOLD-T001 — evidence bundle fold over the typed event stream
Adds build_evidence_from_events() as a parallel read-side builder that folds the typed RunEvent stream (M2-T001 reader) into a RunEvidenceBundle. It shares a new _assemble_evidence_bundle() helper with the legacy build_run_evidence_bundle, so the two paths cannot drift: only the event SOURCE differs (RunStore dicts vs. typed stream). Legacy stays the default; no fallback flag (Q1); parity is proven by test.

Structural finding + fix: the typed RunEvent was LOSSY for evidence. It dropped the top-level created_at that the extractors thread into command/test/approval timestamps, so a faithful fold was impossible. Added optional RunEvent.created_at (intrinsic event metadata), populated by read_run_events_from_audit from the audit entry. Live emit path leaves it None (the fold reads persisted audit). Backward-compatible: all RunEvent construction sites use keyword args.

- teaagent/runner/_events.py: RunEvent.created_at; reader populates it.
- teaagent/run_evidence.py: build_evidence_from_events + _assemble_evidence_bundle refactor (legacy body unchanged, now shared).
- tests/test_run_evidence.py: test_m6_fold_equals_legacy_{success,failure,pending} (non-vacuous: success asserts commands+approvals present) + a created_at preservation test.
- Plan §7 M6 row + FOLD-T001 ticket marked DONE; FOLD-T002 cutover pending.

Every evidence event type is typed (M2+M3+M5), so the typed reader is lossless; fold bundle == legacy bundle on success/failure/pending fixtures. Cancelled parity deferred until run_cancelled is spine-emitted (documented gap).

Constraint: parallel builder only; legacy stays default; no behavior change; cancelled-fixture parity is a documented gap, not masked.
Tested: tests/test_run_evidence.py + tests/lifecycle/test_run_event_spine.py 37 passed; receipt + evidence-summary + adversarial acceptance 28 passed; mypy clean on changed files.
Not-tested: full suite not run on 3.12 (hypothesis missing in 3.14 sandbox).
Confidence: high
Roadmap-Status: unchanged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent be4209a commit 5f32598

5 files changed

Lines changed: 280 additions & 5 deletions

docs/generated/docs-inventory.md

Lines changed: 1 addition & 1 deletion
@@ -414,7 +414,7 @@ Do not edit this file manually — regenerate instead.
 | `ops/security-hardening.md` | working | 11733 | `0a385c7dab82` |
 | `ops/troubleshooting.md` | working | 9127 | `4921b6d50f5c` |
 | `permission-and-approval-playbook.md` | working | 6560 | `813bc74bb156` |
-| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 54690 | `4a6ca4a1b9b6` |
+| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 56141 | `37d1576baf5a` |
 | `plans/agent-ecosystem-acceptance-roadmap-2026-05-31.md` | archive | 29099 | `7c4a4972cfeb` |
 | `plans/community-pain-points-response-plan-2026-06-05.md` | archive | 7276 | `571d010133ad` |
 | `plans/competitive-positioning-plan-2026-05-31.md` | archive | 8726 | `d16dfd2bdd99` |

docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md

Lines changed: 13 additions & 2 deletions
@@ -178,7 +178,7 @@ consumers by M6.
 | ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
 | ADR-0032-M4 (CLOSED — owner decisions B + B-analog, 2026-06-13) | **No gate moves to an interceptor; approval AND budget enforcement both STAY INLINE.** Both proved runtime-stateful on assessment, a poor fit for the pure-interceptor model. **Approval** (decision B): live JIT/session state, tool handler, auto-mode-swappable policy — every coupling gap was invisible to a unit parity test (`docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). **Budget** (decision B-analog): it is three mechanisms — only the global cost cap (`_assert_cost_budget`) is stateless; the phase budget (live `phase_tracker`) and the warning ladder (`_budget_warning_levels_emitted` + `BudgetMonitor._emitted_levels`/`_prompted` dedup sets + an interactive `on_prompt` side-effect handler — the same `assert_allowed` shadow-coexistence trap that blocked approval) are stateful, and even the cost cap is enforced at two evolving-cost points per iteration that do not map 1:1 to events (`docs/work-log/m4-budget-stays-inline-2026-06-13.md`). Both gates' observability is already provided by M2 (their audit events — `tool_call_*`, `approval_*`, `budget_warning`, `budget_prompt`, `phase_budget_warning` — are typed + reader-surfaced); the M6 fold reads them without owning enforcement. Approval/budget behavior unchanged. **Net: plan gate (M3) is the sole governance gate moved to an interceptor.** |
 | ADR-0032-M5 (REVISED — observability-only, 2026-06-13) | **Hook OBSERVABILITY folds onto the spine; hook EXECUTION stays in the tool-dispatch layer.** Assessment found the planned "HookRegistry on spine" unsuitable for the same runtime-coupling reason as approval/budget: PreToolUse/PostToolUse run in `teaagent/tools.py::execute` and **mutate in-flight `arguments`/`result`** (the spine has no channel to ferry mutated payloads back to the dispatch site), and the 6 session-lifecycle hooks (SessionStart/End, UserPromptSubmit, PreCompact, Stop, SubagentStop) have **no production caller** — nothing to strangle; wiring them is feature work. Done: the 5 dispatch-layer hook audit events (`tool_hook_pre_mutation`, `tool_hook_pre_mutation_blocked`, `tool_hook_vetoed`, `tool_hook_post_mutation`, `tool_hook_post_failed`) are typed in `RunEventType` + mapped both directions, so the M2-T001 reader surfaces hook veto/mutation activity from the audit JSONL for the M6 fold. Mapping/reader only; audit bytes unchanged; hook execution + mutation semantics unchanged. See `docs/work-log/m5-hooks-observability-only-2026-06-13.md`. |
-| ADR-0032-M6 (was M2 fold; corrected scope A) | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. |
+| ADR-0032-M6 (was M2 fold; corrected scope A) — **FOLD-T001 DONE** | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. **FOLD-T001 landed**: `build_evidence_from_events()` is a parallel builder sharing `_assemble_evidence_bundle` with the legacy path (cannot drift; only the event *source* differs), parity-asserted on success/failure/pending fixtures (`tests/test_run_evidence.py::test_m6_fold_*`). Surfaced + fixed a structural gap: the typed `RunEvent` was **lossy** — it dropped the top-level `created_at` that the extractors thread into command/test/approval timestamps; added `RunEvent.created_at` (optional; reader populates it from audit). Legacy stays default. **FOLD-T002 (cutover: switch receipt/evidence default to the fold + retire synthetic fixtures) PENDING** — the behavior-changing slice. |
 | ADR-0032-M7 (was M6) | ContextBus and webhook sinks consume the spine; inline emission paths are deleted; validator shows no orphaned eventing modules. |
 
 ## 8. Task Plan
@@ -768,7 +768,18 @@ commit once Slice A is green.
 > orphaned-eventing **cleanup** phase = new **M7**. They run LAST, after the
 > evidence/receipt fold below.
 
-### ADR32-FOLD-T001: Evidence Bundle Fold (new M6; was ADR32-M2-T002, corrected scope A)
+### ADR32-FOLD-T001: Evidence Bundle Fold (new M6; was ADR32-M2-T002, corrected scope A) [DONE]
+
+> **DONE (2026-06-13).** `build_evidence_from_events()` added in
+> `teaagent/run_evidence.py`, sharing `_assemble_evidence_bundle` with the legacy
+> `build_run_evidence_bundle` so they cannot drift (only the event *source*
+> differs). Parity asserted on success/failure/pending fixtures
+> (`tests/test_run_evidence.py::test_m6_fold_equals_legacy_*`). Finding: the
+> typed `RunEvent` was lossy (dropped top-level `created_at`, which extractors
+> thread into command/test/approval timestamps) — fixed by adding optional
+> `RunEvent.created_at`, populated by the reader. Legacy stays default; no
+> fallback flag (Q1). Cancelled-fixture parity deferred until `run_cancelled`
+> is spine-emitted (documented gap, not a silent pass).
 
 - Goal: add `build_evidence_from_events()` that folds the **full** typed event
   stream (M0 lifecycle + M2-coverage events) into the existing

teaagent/run_evidence.py

Lines changed: 66 additions & 1 deletion
@@ -9,12 +9,15 @@
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from pathlib import Path
-from typing import Any, Optional
+from typing import TYPE_CHECKING, Any, Optional
 
 from teaagent.asset_provenance import ProvenanceRecord
 from teaagent.proof_of_use import ProofOfUseBundle, build_proof_of_use
 from teaagent.run_store import RunStore
 
+if TYPE_CHECKING:
+    from teaagent.runner._events import RunEvent
+
 
 @dataclass
 class CommandEvidence:
@@ -795,6 +798,68 @@ def build_run_evidence_bundle(
     except FileNotFoundError:
         return RunEvidenceBundle(run_id=run_id, goal_id=goal_id)
 
+    return _assemble_evidence_bundle(events, root=root, run_id=run_id, goal_id=goal_id)
+
+
+def build_evidence_from_events(
+    events: list['RunEvent'],
+    *,
+    root: str | Path,
+    run_id: str,
+    goal_id: str = '',
+) -> RunEvidenceBundle:
+    """Fold a typed ``RunEvent`` stream into a :class:`RunEvidenceBundle`.
+
+    ADR 0032 M6 (FOLD-T001): the read-side counterpart to
+    :func:`build_run_evidence_bundle`. Where the legacy builder sources raw
+    audit dicts directly from :class:`RunStore`, this builder folds the **typed**
+    event stream produced by ``read_run_events_from_audit`` /
+    ``read_run_events_from_jsonl`` (M2-T001 reader). Because every evidence-
+    bearing audit event is now typed in ``RunEventType`` (M2 + M3 + M5), the
+    typed reader is lossless for evidence, so this bundle equals the legacy
+    bundle for the same run.
+
+    This is a **parallel** builder: the legacy path stays the default (no
+    fallback flag, ADR 0032 Q1); parity is asserted by test, not a runtime
+    switch. The two builders share :func:`_assemble_evidence_bundle`, so they
+    cannot drift — the only difference is the event *source*.
+
+    Args:
+        events: Typed run events (from the M2-T001 reader over persisted audit).
+        root: Workspace root directory.
+        run_id: Run identifier.
+        goal_id: Optional goal identifier to link this run to a GoalRecord.
+    """
+    from teaagent.runner._events import run_event_to_audit_event_type
+
+    event_dicts: list[dict[str, Any]] = [
+        {
+            'event_type': run_event_to_audit_event_type(e.type),
+            'run_id': e.run_id,
+            'payload': dict(e.payload),
+            'created_at': e.created_at,
+        }
+        for e in events
+    ]
+    return _assemble_evidence_bundle(
+        event_dicts, root=root, run_id=run_id, goal_id=goal_id
+    )
+
+
+def _assemble_evidence_bundle(
+    events: list[dict[str, Any]],
+    *,
+    root: str | Path,
+    run_id: str,
+    goal_id: str = '',
+) -> RunEvidenceBundle:
+    """Assemble a :class:`RunEvidenceBundle` from raw audit-event dicts.
+
+    Shared by :func:`build_run_evidence_bundle` (legacy/RunStore source) and
+    :func:`build_evidence_from_events` (typed-stream fold) so the two paths
+    cannot diverge. Pure over the supplied ``events`` plus on-disk artifacts
+    (undo journal, context health) keyed by ``root``/``run_id``.
+    """
     commands = extract_commands_run(events)
     tests = extract_tests(events)
     approvals = extract_approvals(events)
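
The shared-helper shape of this change can be illustrated in isolation. The following is a minimal sketch, not the teaagent implementation: `TypedEvent`, `Bundle`, `_assemble`, `build_from_raw`, and `build_from_typed` are hypothetical stand-ins, and the real fold additionally maps the event type through `run_event_to_audit_event_type`. It only shows why two builders that differ in their event source, but share one assembler, produce equal bundles.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class TypedEvent:
    """Hypothetical stand-in for the typed RunEvent."""

    type: str
    run_id: str
    payload: Mapping[str, Any]
    seq: int
    created_at: Optional[str] = None


@dataclass
class Bundle:
    """Hypothetical stand-in for RunEvidenceBundle."""

    run_id: str
    commands: list[dict[str, Any]] = field(default_factory=list)


def _assemble(events: list[dict[str, Any]], *, run_id: str) -> Bundle:
    """Single assembly path: every builder funnels raw dicts through here."""
    commands = [
        {
            'command': e['payload'].get('input', {}).get('command', ''),
            'timestamp': e.get('created_at'),
        }
        for e in events
        if e.get('event_type') == 'tool_use'
    ]
    return Bundle(run_id=run_id, commands=commands)


def build_from_raw(raw: list[dict[str, Any]], *, run_id: str) -> Bundle:
    """Legacy-style builder: raw audit dicts go straight to the assembler."""
    return _assemble(list(raw), run_id=run_id)


def build_from_typed(events: list[TypedEvent], *, run_id: str) -> Bundle:
    """Fold-style builder: typed events are lowered to dicts, then assembled."""
    as_dicts = [
        {
            'event_type': e.type,  # assumption: type string equals the audit name
            'run_id': e.run_id,
            'payload': dict(e.payload),
            'created_at': e.created_at,
        }
        for e in events
    ]
    return _assemble(as_dicts, run_id=run_id)


RAW = [
    {
        'event_type': 'tool_use',
        'run_id': 'r1',
        'payload': {'input': {'command': 'pytest -q'}},
        'created_at': '2026-06-13T10:00:01+00:00',
    }
]
TYPED = [
    TypedEvent(
        type=d['event_type'],
        run_id=d['run_id'],
        payload=d['payload'],
        seq=i,
        created_at=d.get('created_at'),
    )
    for i, d in enumerate(RAW)
]
```

Because extraction logic lives only in `_assemble`, a later change to how commands are extracted applies to both builders at once; the only place the two paths can disagree is the lowering step in `build_from_typed`.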

teaagent/runner/_events.py

Lines changed: 12 additions & 1 deletion
@@ -95,12 +95,20 @@ class RunEvent:
 
     Each event carries a type, run identifier, monotonic sequence number,
     and typed payload (mapping of event-specific data).
+
+    ``created_at`` is the originating audit entry's ISO-8601 timestamp when the
+    event is read back from persisted audit (M6 fold); it is ``None`` for events
+    freshly emitted on the live spine (the in-process bus does not stamp time —
+    seq is the live ordering key). It is intrinsic, load-bearing evidence
+    metadata: the evidence fold threads it into command/test/approval timestamps,
+    so the typed stream must not drop it.
     """
 
     type: RunEventType
     run_id: str
     payload: Mapping[str, Any]
     seq: int
+    created_at: str | None = None
 
 
 # Subscriber protocol aliases for type hints
@@ -311,12 +319,15 @@ def read_run_events_from_audit(
         run_id = entry.get('run_id', '')
         payload = entry.get('payload', {})
 
-        # Construct a RunEvent with the mapped type, run_id, payload, and seq.
+        # Construct a RunEvent with the mapped type, run_id, payload, seq, and
+        # the originating audit timestamp (load-bearing evidence metadata; the
+        # M6 fold threads created_at into command/test/approval timestamps).
         event = RunEvent(
             type=run_event_type,
             run_id=run_id,
             payload=payload,
             seq=seq,
+            created_at=entry.get('created_at'),
         )
         events.append(event)
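
The backward-compatibility claim for the new field rests on two ordinary Python properties: a trailing dataclass field with a default does not break keyword-argument construction sites, and `dict.get` returns `None` when the audit entry has no timestamp. The sketch below illustrates both under stated assumptions: `Event`, `read_events`, and `to_audit_dict` are hypothetical names, and whether the real `RunEvent` is frozen is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for RunEvent (frozen is an assumption)."""

    type: str
    run_id: str
    payload: Mapping[str, Any]
    seq: int
    # New trailing field with a default: existing keyword-argument
    # construction sites keep working without modification.
    created_at: Optional[str] = None


def read_events(entries: list[dict[str, Any]]) -> list[Event]:
    """Read persisted audit entries back as typed events, keeping created_at."""
    events = []
    for seq, entry in enumerate(entries):
        events.append(
            Event(
                type=entry.get('event_type', ''),
                run_id=entry.get('run_id', ''),
                payload=entry.get('payload', {}),
                seq=seq,
                created_at=entry.get('created_at'),  # None if absent
            )
        )
    return events


def to_audit_dict(event: Event) -> dict[str, Any]:
    """Lower a typed event back to the audit-dict shape the extractors expect."""
    return {
        'event_type': event.type,
        'run_id': event.run_id,
        'payload': dict(event.payload),
        'created_at': event.created_at,
    }


ENTRY = {
    'event_type': 'tool_use',
    'run_id': 'r-ts',
    'payload': {'input': {'command': 'echo hi'}},
    'created_at': '2026-06-13T13:00:01+00:00',
}

# A pre-existing, live-style construction site that never passes created_at.
live = Event(type='tool_use', run_id='r-ts', payload={}, seq=0)
round_tripped = to_audit_dict(read_events([ENTRY])[0])
```

Without the `created_at` field, `round_tripped` could not equal `ENTRY`, which is the lossiness the commit message describes.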

tests/test_run_evidence.py

Lines changed: 188 additions & 0 deletions
@@ -303,3 +303,191 @@ def test_redaction_config_build_patterns():
     cfg_full = RedactionConfig()
     patterns_full = cfg_full.build_patterns()
     assert len(patterns_full) > 0
+
+
+# ---------------------------------------------------------------------------
+# ADR 0032 M6 (FOLD-T001): evidence-bundle fold over the typed event stream
+# ---------------------------------------------------------------------------
+
+
+def _write_run(root: str, run_id: str, events: list[dict]) -> None:
+    """Persist raw audit-event dicts as a RunStore JSONL for the run."""
+    import json
+
+    from teaagent.run_store import RunStore
+
+    path = RunStore(root).run_path(run_id)
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text('\n'.join(json.dumps(e) for e in events) + '\n', encoding='utf-8')
+
+
+def _assert_fold_matches_legacy(events: list[dict], run_id: str) -> RunEvidenceBundle:
+    """Legacy bundle (RunStore source) must equal the typed-stream fold."""
+    from teaagent.run_evidence import build_evidence_from_events
+    from teaagent.run_store import RunStore
+    from teaagent.runner._events import read_run_events_from_audit
+
+    with tempfile.TemporaryDirectory() as root:
+        _write_run(root, run_id, events)
+
+        legacy = build_run_evidence_bundle(root, run_id)
+
+        typed = read_run_events_from_audit(RunStore(root).show_run(run_id))
+        folded = build_evidence_from_events(typed, root=root, run_id=run_id)
+
+        assert folded.to_dict() == legacy.to_dict()
+        return legacy
+
+
+def test_m6_fold_equals_legacy_on_success_run():
+    """Success run with commands, tests, and an approval folds losslessly."""
+    events = [
+        {
+            'event_type': 'run_started',
+            'run_id': 'r-ok',
+            'payload': {},
+            'created_at': '2026-06-13T10:00:00+00:00',
+        },
+        {
+            'event_type': 'tool_use',
+            'run_id': 'r-ok',
+            'payload': {
+                'tool_name': 'workspace_run_shell',
+                'input': {'command': 'pytest -q'},
+                'call_id': 'c1',
+            },
+            'created_at': '2026-06-13T10:00:01+00:00',
+        },
+        {
+            'event_type': 'tool_call_completed',
+            'run_id': 'r-ok',
+            'payload': {
+                'tool_name': 'workspace_run_shell',
+                'call_id': 'c1',
+                'result': {'exit_code': 0, 'stdout': 'ok'},
+            },
+            'created_at': '2026-06-13T10:00:02+00:00',
+        },
+        {
+            'event_type': 'test_run',
+            'run_id': 'r-ok',
+            'payload': {
+                'test_name': 'unit',
+                'test_file': 'tests/test_x.py',
+                'passed': True,
+            },
+            'created_at': '2026-06-13T10:00:03+00:00',
+        },
+        {
+            'event_type': 'approval_requested',
+            'run_id': 'r-ok',
+            'payload': {'call_id': 'c2', 'tool_name': 'workspace_write_file'},
+            'created_at': '2026-06-13T10:00:04+00:00',
+        },
+        {
+            'event_type': 'approval_granted',
+            'run_id': 'r-ok',
+            'payload': {
+                'call_id': 'c2',
+                'tool_name': 'workspace_write_file',
+                'authority_type': 'jit_prompt',
+                'approved_by': 'user',
+            },
+            'created_at': '2026-06-13T10:00:05+00:00',
+        },
+        {
+            'event_type': 'run_completed',
+            'run_id': 'r-ok',
+            'payload': {'cost_cents': 1.0},
+            'created_at': '2026-06-13T10:00:06+00:00',
+        },
+    ]
+    bundle = _assert_fold_matches_legacy(events, 'r-ok')
+    # Guard against a vacuous pass: the fixture must actually carry evidence.
+    assert bundle.commands_run and bundle.approvals
+
+
+def test_m6_fold_equals_legacy_on_failure_run():
+    """Failed run with a tool error folds losslessly."""
+    events = [
+        {
+            'event_type': 'run_started',
+            'run_id': 'r-fail',
+            'payload': {},
+            'created_at': '2026-06-13T11:00:00+00:00',
+        },
+        {
+            'event_type': 'tool_use',
+            'run_id': 'r-fail',
+            'payload': {
+                'tool_name': 'workspace_run_shell',
+                'input': {'command': 'make build'},
+                'call_id': 'c1',
+            },
+            'created_at': '2026-06-13T11:00:01+00:00',
+        },
+        {
+            'event_type': 'tool_error',
+            'run_id': 'r-fail',
+            'payload': {
+                'tool_name': 'workspace_run_shell',
+                'call_id': 'c1',
+                'error': 'boom',
+            },
+            'created_at': '2026-06-13T11:00:02+00:00',
+        },
+        {
+            'event_type': 'run_failed',
+            'run_id': 'r-fail',
+            'payload': {'error': 'build failed'},
+            'created_at': '2026-06-13T11:00:03+00:00',
+        },
+    ]
+    _assert_fold_matches_legacy(events, 'r-fail')
+
+
+def test_m6_fold_equals_legacy_on_pending_approval_run():
+    """Pending-approval run (requested, not resolved) folds losslessly."""
+    events = [
+        {
+            'event_type': 'run_started',
+            'run_id': 'r-pend',
+            'payload': {},
+            'created_at': '2026-06-13T12:00:00+00:00',
+        },
+        {
+            'event_type': 'tool_call_pending_approval',
+            'run_id': 'r-pend',
+            'payload': {'call_id': 'c1', 'tool_name': 'workspace_write_file'},
+            'created_at': '2026-06-13T12:00:01+00:00',
+        },
+    ]
+    _assert_fold_matches_legacy(events, 'r-pend')
+
+
+def test_m6_fold_preserves_created_at_in_command_timestamps():
+    """The typed RunEvent carries created_at, so folded command timestamps
+    match the legacy (audit-sourced) timestamps rather than collapsing to None.
+    """
+    from teaagent.run_evidence import build_evidence_from_events
+    from teaagent.run_store import RunStore
+    from teaagent.runner._events import read_run_events_from_audit
+
+    events = [
+        {
+            'event_type': 'tool_use',
+            'run_id': 'r-ts',
+            'payload': {
+                'tool_name': 'workspace_run_shell',
+                'input': {'command': 'echo hi'},
+                'call_id': 'c1',
+            },
+            'created_at': '2026-06-13T13:00:01+00:00',
+        },
+    ]
+    with tempfile.TemporaryDirectory() as root:
+        _write_run(root, 'r-ts', events)
+        typed = read_run_events_from_audit(RunStore(root).show_run('r-ts'))
+        assert typed[0].created_at == '2026-06-13T13:00:01+00:00'
+        folded = build_evidence_from_events(typed, root=root, run_id='r-ts')
+        assert folded.commands_run[0].timestamp == '2026-06-13T13:00:01+00:00'
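
The "non-vacuous" guard in the success test matters because an equality check between two builders also passes when both return empty results. A minimal sketch of that idea, separate from the teaagent test helpers, might look like the following; `assert_parity`, `legacy_commands`, `folded_commands`, and `FIXTURE` are all hypothetical names used only for illustration.

```python
from typing import Any, Callable, Iterable

Events = list[dict[str, Any]]
Summary = dict[str, Any]


def assert_parity(
    legacy: Callable[[Events], Summary],
    candidate: Callable[[Events], Summary],
    fixture: Events,
    *,
    required_keys: Iterable[str] = (),
) -> Summary:
    """Assert candidate == legacy, and that the named keys are non-empty."""
    expected = legacy(fixture)
    actual = candidate(fixture)
    assert actual == expected, f'parity broken: {actual!r} != {expected!r}'
    for key in required_keys:
        # Two empty results are trivially equal; reject that as a vacuous pass.
        assert expected.get(key), f'vacuous fixture: {key!r} is empty'
    return expected


def legacy_commands(events: Events) -> Summary:
    """Reference path: comprehension over raw event dicts."""
    return {
        'commands': [
            e['payload']['input']['command']
            for e in events
            if e['event_type'] == 'tool_use'
        ]
    }


def folded_commands(events: Events) -> Summary:
    """Candidate path: explicit fold that should agree with the reference."""
    commands = []
    for e in events:
        if e['event_type'] == 'tool_use':
            commands.append(e['payload']['input']['command'])
    return {'commands': commands}


FIXTURE: Events = [
    {'event_type': 'run_started', 'payload': {}},
    {'event_type': 'tool_use', 'payload': {'input': {'command': 'pytest -q'}}},
]
```

With `required_keys=('commands',)`, an empty fixture fails the guard even though the two builders still agree, which is the same protection the `assert bundle.commands_run and bundle.approvals` line provides above.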
