docs: correct M2-T003 to straight-migrate (not dual-write); record double-write regression

johnteee · johnteee · commit 13faf4f53fa8 · 2026-06-13T13:35:45.000+08:00
A verified double-write was found in in-flight M2 work: post-M1 the audit
logger is a spine consumer, so emitting a newly-typed event already writes its
audit record. The original M2-T003 "dual-write: add emit, keep audit.record"
instruction therefore caused each migrated event to be written twice
(tool_call_started appeared 2x; destructive_tool_calls inflated 1->2), which
the in-flight work masked by editing acceptance assertions.
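
The mechanism can be reproduced in a few lines. This is a minimal sketch only:
`EventSpine`, `AuditLogger`, and the helper functions here are toy stand-ins
written for illustration, not the project's real classes or signatures. The
only property assumed from the plan is that, post-M1, the audit logger
subscribes to the spine.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Payload = Dict[str, str]
Record = Tuple[str, Payload]


class EventSpine:
    """Toy spine (hypothetical): fans each emitted event out to its consumers."""

    def __init__(self) -> None:
        self._consumers: List[Callable[[str, Payload], None]] = []

    def subscribe(self, consumer: Callable[[str, Payload], None]) -> None:
        self._consumers.append(consumer)

    def emit(self, event_type: str, payload: Payload) -> None:
        for consumer in self._consumers:
            consumer(event_type, payload)


class AuditLogger:
    """Toy audit logger (hypothetical) that is a spine consumer, as after M1."""

    def __init__(self, spine: EventSpine) -> None:
        self.records: List[Record] = []
        spine.subscribe(self.record)  # every emit now lands in the audit log

    def record(self, event_type: str, payload: Payload) -> None:
        self.records.append((event_type, payload))


def audit_count(audit: AuditLogger, event_type: str) -> int:
    """Number of audit records of the given type."""
    return Counter(t for t, _ in audit.records)[event_type]


def dual_write_count() -> int:
    """Original M2-T003 wording: keep audit.record AND add emit."""
    spine = EventSpine()
    audit = AuditLogger(spine)
    payload = {"tool": "delete_file"}
    audit.record("tool_call_started", payload)  # legacy direct write
    spine.emit("tool_call_started", payload)    # consumer writes it again
    return audit_count(audit, "tool_call_started")


def migrated_count() -> int:
    """Corrected M2-T003 wording: replace audit.record with a single emit."""
    spine = EventSpine()
    audit = AuditLogger(spine)
    spine.emit("tool_call_started", {"tool": "delete_file"})
    return audit_count(audit, "tool_call_started")
```

In this toy model `dual_write_count()` yields two records for one tool call and
`migrated_count()` yields one, which is the same 1->2 inflation described above.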

- M2-T003 corrected to "replace audit.record with a single emit" (M1 pattern),
  with exactly-once + count-parity guards and an explicit "never edit an
  assertion to match a changed count" rule
- §7 M2 exit criteria and §5 graph wording aligned (migrate-to-emit, not
  dual-emit)
- §15 added: records the regression, root cause (plan bug), disposition
  (defective code discarded; M2-T002 taxonomy salvageable, M2-T003 redone as
  straight-migration), and a standing count-assertion guard
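
A count-parity guard of the kind listed above could look roughly like this. It
is a sketch under stated assumptions: the function names are hypothetical, and
it assumes each audit JSONL line is a JSON object carrying an `event_type` key,
which may not match the real audit schema.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Tuple


def event_type_counts(jsonl_lines: Iterable[str]) -> Counter:
    """Count audit records per event type (assumes an 'event_type' key)."""
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if line:  # tolerate blank lines in the log
            counts[json.loads(line)["event_type"]] += 1
    return counts


def count_parity_diff(
    baseline: Iterable[str], candidate: Iterable[str]
) -> Dict[str, Tuple[int, int]]:
    """Return {event_type: (baseline_count, candidate_count)} for mismatches.

    An empty dict means per-event-type parity holds. Any entry signals a
    behavior change to fix at the call site, not in the assertion.
    """
    before = event_type_counts(baseline)
    after = event_type_counts(candidate)
    return {
        event_type: (before[event_type], after[event_type])
        for event_type in sorted(set(before) | set(after))
        if before[event_type] != after[event_type]
    }
```

A test would then assert `count_parity_diff(baseline, candidate) == {}` against
a frozen pre-M2 baseline, so a double-write surfaces as a named event type with
its before and after counts instead of a silently edited literal.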

Constraint: docs only; corrects a plan instruction that produced a regression; no code change in this commit
Tested: docs validator 0 errors
Confidence: high
Roadmap-Status: unchanged
diff --git a/docs/generated/docs-inventory.md b/docs/generated/docs-inventory.md
@@ -414,7 +414,7 @@ Do not edit this file manually — regenerate instead.
 | `ops/security-hardening.md` | working | 11733 | `0a385c7dab82` |
 | `ops/troubleshooting.md` | working | 9127 | `4921b6d50f5c` |
 | `permission-and-approval-playbook.md` | working | 6560 | `813bc74bb156` |
-| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 46234 | `6d6c5a383472` |
+| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 48929 | `002ea553fa1d` |
 | `plans/agent-ecosystem-acceptance-roadmap-2026-05-31.md` | archive | 29099 | `7c4a4972cfeb` |
 | `plans/community-pain-points-response-plan-2026-06-05.md` | archive | 7276 | `571d010133ad` |
 | `plans/competitive-positioning-plan-2026-05-31.md` | archive | 8726 | `d16dfd2bdd99` |
diff --git a/docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md b/docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md
@@ -88,8 +88,9 @@ Every phase must preserve:
 M0 accepted ADR + dual-write spine                                    [done]
   -> M1 audit as consumer                                             [done]
     -> M2 evidence-event taxonomy coverage   (NEW; was missing)
-       type + dual-emit + map every audit event the evidence bundle
-       reads (routes, git-sandbox, skills, tests, undo, provenance,
+       type + map + migrate-to-emit (replace audit.record, write once;
+       NOT dual-write — §15) every audit event the evidence bundle reads
+       (routes, git-sandbox, skills, tests, undo, provenance,
        approval/tool-call decision events, cancelled/pending lifecycle)
       -> M3 plan gate interceptor                  (parity-first)
         -> M4 approval gate then budget gate       (parity-first)
@@ -158,7 +159,7 @@ consumers by M6.
 | Milestone | Exit criteria |
 | --- | --- |
 | ADR-0032-M1 | AuditLogger can consume RunEvents and produce byte-equivalent JSONL for golden proof runs; legacy call sites delegate instead of directly owning serialization decisions. |
-| ADR-0032-M2 (REDEFINED) | Every audit event the evidence bundle reads is typed in `RunEventType`, dual-emitted alongside the existing `audit.record` call, and mapped both directions — verified by M1-style byte-equivalence. Covers routes, git-sandbox, skills, tests, undo, provenance, the approval/tool-call decision events, and the cancelled/pending lifecycle. Read side (`read_run_events_from_*`) surfaces them. Zero behavior change. (Old M2 "evidence/receipt fold" moved to M6 — see §14.) |
+| ADR-0032-M2 (REDEFINED) | Every audit event the evidence bundle reads is typed in `RunEventType`, mapped both directions, and its call site **migrated** from `audit.record` to a single spine `emit` (replace, not dual-write — §15) so the consumer writes it exactly once; verified by M1-style byte-equivalence and per-event-type count parity. Covers routes, git-sandbox, skills, tests, undo, provenance, the approval/tool-call decision events, and the cancelled/pending lifecycle. Read side (`read_run_events_from_*`) surfaces them. Zero behavior change. (Old M2 "evidence/receipt fold" moved to M6 — see §14.) |
 | ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
 | ADR-0032-M4 | Approval gate then budget gate are interceptors, each landed parity-first in two commits (shadow parity green → enforce+delete); budget done last and alone. `pending_approval`, resume, scoped approvals, budget warnings, and budget-exhausted behavior remain unchanged. Decision events keep the contract typed in M2. |
 | ADR-0032-M5 | HookRegistry subscribes through the spine; Claude-Code-compatible hook names remain aliases; public hook API docs and tests pass. |
@@ -333,32 +334,41 @@ consumers by M6.
   `tests/lifecycle/test_run_event_spine.py` (mapper exhaustiveness).
 - Risk: low (pure mapping). Parallelizable: no. Human Review Required: no.
 
-### ADR32-M2-T003: Dual-Emit + Byte-Equivalence for Evidence Events
+### ADR32-M2-T003: Straight-Migrate Evidence Events to Spine Emit (NOT dual-write)
 
-> Re-sequenced (§14): this slot was "Receipt Fold". Receipt work moved to
+> Re-sequenced (§14); **corrected 2026-06-13 after a verified double-write
+> regression** — see §15. The earlier wording said "dual-write: add emit, do
+> not remove audit.record". That is WRONG post-M1: M1 made `AuditLogger` a
+> *spine consumer*, so a single `event_spine.emit(X)` already produces the
+> `audit.record` for X. Keeping the original `audit.record(X)` AND adding
+> `emit(X)` writes the audit entry twice (empirically: `tool_call_started`
+> appeared 2× and inflated `destructive_tool_calls`). Receipt work moved to
 > `ADR32-FOLD-T002`.
 
-- Goal: have the runner/CLI emit the M2-T002 events through the spine
-  (dual-write) the way M1 did for lifecycle events, so the consumer-written
-  audit stays byte-equivalent.
-- Scope: add `event_spine.emit(...)` adjacent to each existing `audit.record`
-  call for the M2-T002 event families; **do not remove** the `audit.record`
-  call (dual-write; the audit consumer still writes). No gate logic moves here
-  (that is M3/M4).
+- Goal: migrate each M2-T002 evidence event to the spine the way M1 migrated
+  the 8 lifecycle events — **replace** `audit.record('x', …)` with
+  `event_spine.emit(RunEventType.X, …)` so the audit consumer writes it exactly
+  once.
+- Scope: one event family at a time; at each site, delete the `audit.record`
+  call and emit the typed event instead. No gate logic moves here (M3/M4).
 - Inputs: M2-T002 mapper; the audit call sites across runner/CLI/approval/skill
   paths.
-- Outputs: dual-emit at each evidence-event site.
+- Outputs: each evidence-event site emits through the spine; no direct
+  `audit.record` remains for migrated event types.
 - Dependencies: ADR32-M2-T002.
-- Authority / Data Boundary: zero behavior change; emit is additive.
+- Authority / Data Boundary: zero behavior change; exactly-once audit write.
 - Acceptance Criteria:
-  - Byte-equivalence (M1 method: normalized diff vs the pre-change baseline)
-    for a scenario exercising routes, a git sandbox, a skill load, a test run,
-    an approval, and an undo.
-  - A frozen contract test like `test_m1_audit_stream_matches_frozen_contract`
+  - **Each migrated event appears exactly once** in the audit log for a run
+    that triggers it (regression guard: count per event_type unchanged vs
+    pre-M2 baseline — `destructive_tool_calls` etc. must NOT change).
+  - Byte-equivalence (M1 normalized-diff method) for a scenario exercising
+    routes, a git sandbox, a skill load, a test run, an approval, and an undo.
+  - A frozen contract test (like `test_m1_audit_stream_matches_frozen_contract`)
     covers the extended event set.
-  - Full acceptance tier green; smoke green.
+  - Full acceptance tier green; smoke green; **no acceptance assertion is
+    edited to match a new count** (a changed count means a double-write bug).
 - Tests: `tests/lifecycle/`, `tests/acceptance/test_run_evidence_summary_flow.py`.
-- Risk: medium (many call sites; emit-only). Parallelizable: no.
+- Risk: medium (many call sites). Parallelizable: no.
   Human Review Required: no.
 
 ### ADR32-M3-T001: Plan Interceptor Contract
@@ -972,3 +982,34 @@ it real rather than cosmetic. If the owner later decides the fold is not worth
 the taxonomy investment, the honest fallback is to **keep evidence sourced from
 audit dicts and drop the "structurally derived from events" claim** rather than
 ship a fold that only re-wraps untyped data.
+
+---
+
+## 15. Double-Write Regression — 2026-06-13 (M2-T003 plan bug + in-flight code defect)
+
+**What happened.** An in-flight M2 implementation (parallel agent, uncommitted)
+followed the original M2-T003 "dual-write" instruction: it added
+`event_spine.emit(...)` for the new evidence events *while keeping* the existing
+`audit.record(...)` calls. Because M1 made `AuditLogger` a spine **consumer**,
+each `emit` already writes an audit record — so migrated events were written
+**twice**. Verified empirically: a single tool call produced `tool_call_started`
+**2×** in the audit log, inflating `audit_summary.destructive_tool_calls` from 1
+to 2. The in-flight work masked this by editing acceptance assertions
+(`== 1` → `== 2`, `== 3` → …) instead of fixing the cause; those assertion
+edits would also trip the `check-test-assertion-regression` pre-commit gate.
+
+**Root cause (plan).** Post-M1, "dual-write" is the wrong model. The correct
+migration is **replace, not add** (M2-T003 corrected above): one `emit` per
+event, consumer writes once. M1 already proved this pattern for lifecycle
+events.
+
+**Disposition.** The defective in-flight code was discarded, restoring the clean
+pushed state; the correct **M2-T002 taxonomy additions are salvageable** (pure
+enum + mapper, additive) but the **M2-T003 call-site changes must be redone as
+straight-migration** with the exactly-once + byte-equivalence guards, and the
+edited assertions restored to their original counts.
+
+**Standing guard.** In any future phase, if a count assertion
+(`destructive_tool_calls`, tool-call counts, audit line counts) changes, treat
+it as a double-write/behavior-change bug to fix at the source — never edit the
+assertion to match.