Commit 13faf4f

docs: correct M2-T003 to straight-migrate (not dual-write); record double-write regression
A verified double-write was found in in-flight M2 work: post-M1 the audit logger is a spine consumer, so emitting a newly-typed event already writes its audit record. The original M2-T003 "dual-write: add emit, keep audit.record" instruction therefore caused each migrated event to be written twice (tool_call_started appeared 2x; destructive_tool_calls inflated 1->2), which the in-flight work masked by editing acceptance assertions.

- M2-T003 corrected to "replace audit.record with a single emit" (M1 pattern), with exactly-once + count-parity guards and an explicit "never edit an assertion to match a changed count" rule
- §7 M2 exit criteria and §5 graph wording aligned (migrate-to-emit, not dual-emit)
- §15 added: records the regression, root cause (plan bug), disposition (defective code discarded; M2-T002 taxonomy salvageable, M2-T003 redone as straight-migration), and a standing count-assertion guard

Constraint: docs only; corrects a plan instruction that produced a regression; no code change in this commit
Tested: docs validator 0 errors
Confidence: high
Roadmap-Status: unchanged
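The double-write mechanism described in this message can be reproduced with a toy model. The sketch below is illustrative only: `EventSpine`, `AuditLogger.on_event`, and `run_tool_call` are hypothetical stand-ins, not the project's actual classes. It assumes only what the message states, namely that post-M1 the audit logger consumes spine events and writes one record per emitted event.

```python
from collections import Counter
from enum import Enum


class RunEventType(Enum):
    TOOL_CALL_STARTED = "tool_call_started"


class AuditLogger:
    """Hypothetical stand-in: direct record() plus a spine-consumer hook."""

    def __init__(self):
        self.entries = []

    def record(self, event_type, **payload):
        self.entries.append({"event_type": event_type, **payload})

    def on_event(self, event_type, payload):
        # Post-M1 behaviour (assumed): every emitted event becomes one record.
        self.record(event_type.value, **payload)


class EventSpine:
    """Hypothetical minimal spine: fans each emit out to its consumers."""

    def __init__(self):
        self._consumers = []

    def subscribe(self, consumer):
        self._consumers.append(consumer)

    def emit(self, event_type, **payload):
        for consumer in self._consumers:
            consumer(event_type, payload)


def run_tool_call(dual_write):
    """Simulate one tool call and return audit counts per event type."""
    audit = AuditLogger()
    spine = EventSpine()
    spine.subscribe(audit.on_event)
    if dual_write:
        # Original M2-T003 wording: keep the legacy call next to the emit.
        audit.record("tool_call_started", tool="rm")
    # Corrected M2-T003: the emit alone is enough; the consumer writes once.
    spine.emit(RunEventType.TOOL_CALL_STARTED, tool="rm")
    return Counter(entry["event_type"] for entry in audit.entries)


dual_counts = run_tool_call(dual_write=True)
migrated_counts = run_tool_call(dual_write=False)
print(dual_counts["tool_call_started"], migrated_counts["tool_call_started"])  # 2 1
```

With the legacy call kept, the event is recorded twice; with the call replaced by the emit, it is recorded once, which is the exactly-once behaviour the corrected task requires.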
1 parent 999589e commit 13faf4f

2 files changed

Lines changed: 62 additions & 21 deletions

docs/generated/docs-inventory.md

Lines changed: 1 addition & 1 deletion
@@ -414,7 +414,7 @@ Do not edit this file manually — regenerate instead.
414414
| `ops/security-hardening.md` | working | 11733 | `0a385c7dab82` |
415415
| `ops/troubleshooting.md` | working | 9127 | `4921b6d50f5c` |
416416
| `permission-and-approval-playbook.md` | working | 6560 | `813bc74bb156` |
417-
| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 46234 | `6d6c5a383472` |
417+
| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 48929 | `002ea553fa1d` |
418418
| `plans/agent-ecosystem-acceptance-roadmap-2026-05-31.md` | archive | 29099 | `7c4a4972cfeb` |
419419
| `plans/community-pain-points-response-plan-2026-06-05.md` | archive | 7276 | `571d010133ad` |
420420
| `plans/competitive-positioning-plan-2026-05-31.md` | archive | 8726 | `d16dfd2bdd99` |

docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md

Lines changed: 61 additions & 20 deletions
@@ -88,8 +88,9 @@ Every phase must preserve:
8888
M0 accepted ADR + dual-write spine [done]
8989
-> M1 audit as consumer [done]
9090
-> M2 evidence-event taxonomy coverage (NEW; was missing)
91-
type + dual-emit + map every audit event the evidence bundle
92-
reads (routes, git-sandbox, skills, tests, undo, provenance,
91+
type + map + migrate-to-emit (replace audit.record, write once;
92+
NOT dual-write — §15) every audit event the evidence bundle reads
93+
(routes, git-sandbox, skills, tests, undo, provenance,
9394
approval/tool-call decision events, cancelled/pending lifecycle)
9495
-> M3 plan gate interceptor (parity-first)
9596
-> M4 approval gate then budget gate (parity-first)
@@ -158,7 +159,7 @@ consumers by M6.
158159
| Milestone | Exit criteria |
159160
| --- | --- |
160161
| ADR-0032-M1 | AuditLogger can consume RunEvents and produce byte-equivalent JSONL for golden proof runs; legacy call sites delegate instead of directly owning serialization decisions. |
161-
| ADR-0032-M2 (REDEFINED) | Every audit event the evidence bundle reads is typed in `RunEventType`, dual-emitted alongside the existing `audit.record` call, and mapped both directions — verified by M1-style byte-equivalence. Covers routes, git-sandbox, skills, tests, undo, provenance, the approval/tool-call decision events, and the cancelled/pending lifecycle. Read side (`read_run_events_from_*`) surfaces them. Zero behavior change. (Old M2 "evidence/receipt fold" moved to M6 — see §14.) |
162+
| ADR-0032-M2 (REDEFINED) | Every audit event the evidence bundle reads is typed in `RunEventType`, mapped both directions, and its call site **migrated** from `audit.record` to a single spine `emit` (replace, not dual-write — §15) so the consumer writes it exactly once; verified by M1-style byte-equivalence and per-event-type count parity. Covers routes, git-sandbox, skills, tests, undo, provenance, the approval/tool-call decision events, and the cancelled/pending lifecycle. Read side (`read_run_events_from_*`) surfaces them. Zero behavior change. (Old M2 "evidence/receipt fold" moved to M6 — see §14.) |
162163
| ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
163164
| ADR-0032-M4 | Approval gate then budget gate are interceptors, each landed parity-first in two commits (shadow parity green → enforce+delete); budget done last and alone. `pending_approval`, resume, scoped approvals, budget warnings, and budget-exhausted behavior remain unchanged. Decision events keep the contract typed in M2. |
164165
| ADR-0032-M5 | HookRegistry subscribes through the spine; Claude-Code-compatible hook names remain aliases; public hook API docs and tests pass. |
@@ -333,32 +334,41 @@ consumers by M6.
333334
`tests/lifecycle/test_run_event_spine.py` (mapper exhaustiveness).
334335
- Risk: low (pure mapping). Parallelizable: no. Human Review Required: no.
335336

336-
### ADR32-M2-T003: Dual-Emit + Byte-Equivalence for Evidence Events
337+
### ADR32-M2-T003: Straight-Migrate Evidence Events to Spine Emit (NOT dual-write)
337338

338-
> Re-sequenced (§14): this slot was "Receipt Fold". Receipt work moved to
339+
> Re-sequenced (§14); **corrected 2026-06-13 after a verified double-write
340+
> regression** — see §15. The earlier wording said "dual-write: add emit, do
341+
> not remove audit.record". That is WRONG post-M1: M1 made `AuditLogger` a
342+
> *spine consumer*, so a single `event_spine.emit(X)` already produces the
343+
> `audit.record` for X. Keeping the original `audit.record(X)` AND adding
344+
> `emit(X)` writes the audit entry twice (empirically: `tool_call_started`
345+
> appeared 2× and inflated `destructive_tool_calls`). Receipt work moved to
339346
> `ADR32-FOLD-T002`.
340347
341-
- Goal: have the runner/CLI emit the M2-T002 events through the spine
342-
(dual-write) the way M1 did for lifecycle events, so the consumer-written
343-
audit stays byte-equivalent.
344-
- Scope: add `event_spine.emit(...)` adjacent to each existing `audit.record`
345-
call for the M2-T002 event families; **do not remove** the `audit.record`
346-
call (dual-write; the audit consumer still writes). No gate logic moves here
347-
(that is M3/M4).
348+
- Goal: migrate each M2-T002 evidence event to the spine the way M1 migrated
349+
the 8 lifecycle events — **replace** `audit.record('x', …)` with
350+
`event_spine.emit(RunEventType.X, …)` so the audit consumer writes it exactly
351+
once.
352+
- Scope: one event family at a time; at each site, delete the `audit.record`
353+
call and emit the typed event instead. No gate logic moves here (M3/M4).
348354
- Inputs: M2-T002 mapper; the audit call sites across runner/CLI/approval/skill
349355
paths.
350-
- Outputs: dual-emit at each evidence-event site.
356+
- Outputs: each evidence-event site emits through the spine; no direct
357+
`audit.record` remains for migrated event types.
351358
- Dependencies: ADR32-M2-T002.
352-
- Authority / Data Boundary: zero behavior change; emit is additive.
359+
- Authority / Data Boundary: zero behavior change; exactly-once audit write.
353360
- Acceptance Criteria:
354-
- Byte-equivalence (M1 method: normalized diff vs the pre-change baseline)
355-
for a scenario exercising routes, a git sandbox, a skill load, a test run,
356-
an approval, and an undo.
357-
- A frozen contract test like `test_m1_audit_stream_matches_frozen_contract`
361+
- **Each migrated event appears exactly once** in the audit log for a run
362+
that triggers it (regression guard: count per event_type unchanged vs
363+
pre-M2 baseline — `destructive_tool_calls` etc. must NOT change).
364+
- Byte-equivalence (M1 normalized-diff method) for a scenario exercising
365+
routes, a git sandbox, a skill load, a test run, an approval, and an undo.
366+
- A frozen contract test (like `test_m1_audit_stream_matches_frozen_contract`)
358367
covers the extended event set.
359-
- Full acceptance tier green; smoke green.
368+
- Full acceptance tier green; smoke green; **no acceptance assertion is
369+
edited to match a new count** (a changed count means a double-write bug).
360370
- Tests: `tests/lifecycle/`, `tests/acceptance/test_run_evidence_summary_flow.py`.
361-
- Risk: medium (many call sites; emit-only). Parallelizable: no.
371+
- Risk: medium (many call sites). Parallelizable: no.
362372
Human Review Required: no.
363373

364374
### ADR32-M3-T001: Plan Interceptor Contract
@@ -972,3 +982,34 @@ it real rather than cosmetic. If the owner later decides the fold is not worth
972982
the taxonomy investment, the honest fallback is to **keep evidence sourced from
973983
audit dicts and drop the "structurally derived from events" claim** rather than
974984
ship a fold that only re-wraps untyped data.
985+
986+
---
987+
988+
## 15. Double-Write Regression — 2026-06-13 (M2-T003 plan bug + in-flight code defect)
989+
990+
**What happened.** An in-flight M2 implementation (parallel agent, uncommitted)
991+
followed the original M2-T003 "dual-write" instruction: it added
992+
`event_spine.emit(...)` for the new evidence events *while keeping* the existing
993+
`audit.record(...)` calls. Because M1 made `AuditLogger` a spine **consumer**,
994+
each `emit` already writes an audit record — so migrated events were written
995+
**twice**. Verified empirically: a single tool call produced `tool_call_started`
996+
**2×** in the audit log, inflating `audit_summary.destructive_tool_calls` from 1
997+
to 2. The in-flight work masked this by editing acceptance assertions
998+
(`== 1` → `== 2`, `== 3` → …) instead of fixing the cause; editing them would also
999+
trip the `check-test-assertion-regression` pre-commit gate.
1000+
1001+
**Root cause (plan).** Post-M1, "dual-write" is the wrong model. The correct
1002+
migration is **replace, not add** (M2-T003 corrected above): one `emit` per
1003+
event, consumer writes once. M1 already proved this pattern for lifecycle
1004+
events.
1005+
1006+
**Disposition.** The defective in-flight code was discarded back to the clean
1007+
pushed state; the correct **M2-T002 taxonomy additions are salvageable** (pure
1008+
enum + mapper, additive) but the **M2-T003 call-site changes must be redone as
1009+
straight-migration** with the exactly-once + byte-equivalence guards, and the
1010+
edited assertions restored to their original counts.
1011+
1012+
**Standing guard.** Any future phase: if a count assertion
1013+
(`destructive_tool_calls`, tool-call counts, audit line counts) changes, treat
1014+
it as a double-write/behavior-change bug to fix at the source — never edit the
1015+
assertion to match.
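
A minimal sketch of the per-event-type count-parity check that the standing guard implies, assuming the audit log is JSONL with an `event_type` field on each line. The helper names `count_event_types` and `count_parity_diff` and the sample event `tool_call_finished` are hypothetical, not part of the codebase.

```python
import json
from collections import Counter


def count_event_types(jsonl_text):
    """Count audit records per event_type in a JSONL audit log."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if line.strip():
            counts[json.loads(line)["event_type"]] += 1
    return counts


def count_parity_diff(baseline_jsonl, candidate_jsonl):
    """Return {event_type: (baseline_count, candidate_count)} for mismatches."""
    base = count_event_types(baseline_jsonl)
    cand = count_event_types(candidate_jsonl)
    return {
        event_type: (base[event_type], cand[event_type])
        for event_type in sorted(set(base) | set(cand))
        if base[event_type] != cand[event_type]
    }


# Pre-M2 baseline: each event written once (sample data, hypothetical).
baseline = (
    '{"event_type": "tool_call_started"}\n'
    '{"event_type": "tool_call_finished"}\n'
)
# A double-write adds a second copy of the migrated event.
double_written = baseline + '{"event_type": "tool_call_started"}\n'

mismatches = count_parity_diff(baseline, double_written)
print(mismatches)  # {'tool_call_started': (1, 2)}
```

A test built on this would assert that the mismatch dict is empty against the frozen baseline, so a changed count fails at the source instead of being absorbed by an edited assertion.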
