docs: close M4; budget stays inline (owner decision B-analog); no gate moved beyond the plan gate

johnteee · claude · johnteee · commit 3e69ac0fbff3 · 2026-06-14T00:21:57.000+08:00
Assessed budget-gate interceptor suitability against the code before writing any
interceptor (per §13.3, the approval lesson). Finding: the budget "gate" is three
mechanisms, and only the global cost cap (_assert_cost_budget) is stateless. The
phase budget reads the live phase_tracker. The warning ladder keeps two mutable
dedup sets and calls an interactive on_prompt side-effect handler, which is the
same assert_allowed shadow-coexistence trap that blocked approval. Both are
runtime-stateful. Even the cost cap is enforced at two evolving-cost points per
iteration with no 1:1 event mapping.

Owner chose B-analog: budget enforcement stays inline. M4 closes with NO gate moved
beyond the plan gate (M3). Approval and budget observability already reach the M6 fold
via their M2-typed audit events.

- New work-log: docs/work-log/m4-budget-stays-inline-2026-06-13.md (full assessment)
- Plan §5 graph + §7 M4 row updated to "no gate moved; both stay inline"
- M4-T001/T002/T003 tickets marked CLOSED — not implemented (planning record retained)
- Superseded salvage stash (ApprovalGateInterceptor + BudgetGateInterceptor) dropped
- Docs inventory regenerated

No code changed; enforcement paths unchanged.

Constraint: docs-only; budget/approval enforcement code unchanged; runtime-stateful gates stay inline by evidenced finding.
Tested: docs inventory --check passes; validate_docs_consistency runs (pre-existing hypothesis-missing collection error in 3.14 sandbox is environmental, unrelated).
Not-tested: full suite not run on 3.12 (no code change).
Confidence: high
Roadmap-Status: unchanged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
diff --git a/docs/generated/docs-inventory.md b/docs/generated/docs-inventory.md
@@ -6,7 +6,7 @@
 Generated by `python3 scripts/generate_docs_inventory.py`.
 Do not edit this file manually — regenerate instead.
 
-**Markdown files:** 587
+**Markdown files:** 588
 
 | Path | Tier | Bytes | SHA256 (12) |
 | --- | --- | ---: | --- |
@@ -414,7 +414,7 @@ Do not edit this file manually — regenerate instead.
 | `ops/security-hardening.md` | working | 11733 | `0a385c7dab82` |
 | `ops/troubleshooting.md` | working | 9127 | `4921b6d50f5c` |
 | `permission-and-approval-playbook.md` | working | 6560 | `813bc74bb156` |
-| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 49574 | `fd53521a53fa` |
+| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 52091 | `ce92504ad57b` |
 | `plans/agent-ecosystem-acceptance-roadmap-2026-05-31.md` | archive | 29099 | `7c4a4972cfeb` |
 | `plans/community-pain-points-response-plan-2026-06-05.md` | archive | 7276 | `571d010133ad` |
 | `plans/competitive-positioning-plan-2026-05-31.md` | archive | 8726 | `d16dfd2bdd99` |
@@ -590,6 +590,7 @@ Do not edit this file manually — regenerate instead.
 | `wasm-skill-ci.md` | working | 974 | `8340d6f1e5c1` |
 | `work-log/documentation-optimization-work-items-2026-06-04.md` | archive | 11750 | `9233b40b0bce` |
 | `work-log/m4-approval-sliceB-blocked-2026-06-13.md` | archive | 7347 | `3981ed82bc08` |
+| `work-log/m4-budget-stays-inline-2026-06-13.md` | archive | 5727 | `0e7a6ee74954` |
 | `work-log/operator-friction-log.md` | working | 2560 | `fe79899db10f` |
 | `work-log/p0-p1-governance-implementation-ledger-2026-06-11.md` | archive | 5212 | `0b72cd69de32` |
 | `work-log/parallel-phase-0-implementation-report-2026-06-04.md` | archive | 13181 | `098186167459` |
diff --git a/docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md b/docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md
@@ -94,9 +94,10 @@ M0 accepted ADR + dual-write spine                                    [done]
        git-sandbox, skills, tests, undo, provenance, approval/tool-call
        decision events, cancelled/pending lifecycle
       -> M3 plan gate interceptor                  (parity-first)
-        -> M4 budget gate only (parity-first); approval STAYS INLINE
-           (owner decision B, 2026-06-13 — approval is runtime-stateful;
-           see m4-approval-sliceB-blocked report)
+        -> M4 CLOSED with NO gate moved: approval AND budget both STAY
+           INLINE (owner decisions B + B-analog, 2026-06-13 — both are
+           runtime-stateful; plan gate is the sole interceptor gate;
+           see m4-approval-sliceB-blocked + m4-budget-stays-inline reports)
           -> M5 HookRegistry on spine
             -> M6 evidence + receipt FOLD          (was M2; corrected
                scope A) — now genuinely event-typed because M2 completed
@@ -108,9 +109,17 @@ M0 accepted ADR + dual-write spine                                    [done]
 Why this order: the evidence/receipt fold is a **read-side consumer**; it can
 only be "derived from typed events" once *all* the events it reads are typed.
 That requires (a) the non-gate evidence events (M2 coverage) and (b) the gate
-decision events whose contract M3/M4 preserve. M4 stays approval-first,
-budget-second: approval carries `pending_approval`/resume semantics, budget is
-mostly threshold/action.
+decision events whose contract M3 preserves. **M4 closed with no gate moved**:
+on assessment, both remaining gates proved runtime-stateful. Approval carries
+live JIT/session state + handler + auto-mode-swappable policy (decision B).
+Budget is three mechanisms — only the global cost cap is stateless; the phase
+budget (live phase-tracker) and the warning ladder (two mutable dedup sets + an
+interactive `on_prompt` side-effect handler that mutates monitor state, the same
+`assert_allowed` shadow-coexistence trap that blocked approval) are stateful,
+and even the cost cap is checked at two evolving-cost points per iteration that
+don't map 1:1 to events (decision B-analog). Plan gate (M3) is the one gate that
+moved; approval and budget observability still reach the M6 fold via their typed
+audit events.
 
 ## 6. Usage Scenarios
 
@@ -164,7 +173,7 @@ consumers by M6.
 | ADR-0032-M1 | AuditLogger can consume RunEvents and produce byte-equivalent JSONL for golden proof runs; legacy call sites delegate instead of directly owning serialization decisions. |
 | ADR-0032-M2 (REDEFINED, taxonomy-only §16) | Every audit event the evidence bundle reads is typed in `RunEventType` and mapped both directions, so the M2-T001 reader surfaces it **from the audit JSONL** (mapper is sufficient; emit-site migration is NOT in M2 — it is deferred to the component milestones, §16). Covers routes, git-sandbox, skills, tests, undo, provenance, approval/tool-call decision events, cancelled/pending lifecycle. Pure additive; zero behavior change. (Old M2 "evidence/receipt fold" moved to M6 — §14.) |
 | ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
-| ADR-0032-M4 (REVISED — owner decision B, 2026-06-13) | Budget gate ONLY moves to an interceptor, parity-first in two commits (shadow parity green → enforce+delete), done alone. **Approval enforcement STAYS INLINE** — it is runtime-stateful (live JIT/session state, tool handler, auto-mode-swappable policy), a poor fit for the pure-interceptor model (every coupling gap was invisible to a unit parity test; see `docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). Approval observability is already provided by M2 (approval audit events are typed + reader-surfaced); the M6 fold reads them. Budget warnings/exhausted behavior unchanged. |
+| ADR-0032-M4 (CLOSED — owner decisions B + B-analog, 2026-06-13) | **No gate moves to an interceptor; approval AND budget enforcement both STAY INLINE.** Both proved runtime-stateful on assessment, a poor fit for the pure-interceptor model. **Approval** (decision B): live JIT/session state, tool handler, auto-mode-swappable policy — every coupling gap was invisible to a unit parity test (`docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). **Budget** (decision B-analog): it is three mechanisms — only the global cost cap (`_assert_cost_budget`) is stateless; the phase budget (live `phase_tracker`) and the warning ladder (`_budget_warning_levels_emitted` + `BudgetMonitor._emitted_levels`/`_prompted` dedup sets + an interactive `on_prompt` side-effect handler — the same `assert_allowed` shadow-coexistence trap that blocked approval) are stateful, and even the cost cap is enforced at two evolving-cost points per iteration that do not map 1:1 to events (`docs/work-log/m4-budget-stays-inline-2026-06-13.md`). Both gates' observability is already provided by M2 (their audit events — `tool_call_*`, `approval_*`, `budget_warning`, `budget_prompt`, `phase_budget_warning` — are typed + reader-surfaced); the M6 fold reads them without owning enforcement. Approval/budget behavior unchanged. **Net: plan gate (M3) is the sole governance gate moved to an interceptor.** |
 | ADR-0032-M5 | HookRegistry subscribes through the spine; Claude-Code-compatible hook names remain aliases; public hook API docs and tests pass. |
 | ADR-0032-M6 (was M2 fold; corrected scope A) | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. |
 | ADR-0032-M7 (was M6) | ContextBus and webhook sinks consume the spine; inline emission paths are deleted; validator shows no orphaned eventing modules. |
@@ -425,7 +434,14 @@ full acceptance run):**
   - `tests/lifecycle/test_run_event_spine.py`
 - Risk: high. Parallelizable: no. Human Review Required: no.
 
-### ADR32-M4-T001: Approval Interceptor Contract
+> **M4 TICKETS CLOSED — NOT IMPLEMENTED (owner decisions B + B-analog, 2026-06-13).**
+> T001/T002 (approval) and T003 (budget) below are retained as the *planning
+> record only*. On assessment both gates proved runtime-stateful and stay inline
+> (see the M4 row in §7 and the two work-log reports). The plan gate (M3) is the
+> sole governance gate that moved to an interceptor. No code from these tickets
+> shipped.
+
+### ADR32-M4-T001: Approval Interceptor Contract [CLOSED — not implemented]
 
 - Goal: define approval interceptor semantics without touching budget.
 - Scope: permission modes, scoped approvals, JIT/prompt approvals,
@@ -491,7 +507,16 @@ shadow parity test must cover those paths before any inline deletion.
   - `teaagent/runner/_approval_manager.py`
 - Risk: high. Parallelizable: no. Human Review Required: no.
 
-### ADR32-M4-T003: Budget Interceptor Contract
+### ADR32-M4-T003: Budget Interceptor Contract [CLOSED — not implemented]
+
+> **CLOSED by owner decision B-analog (2026-06-13).** Assessment found the budget
+> "gate" is three mechanisms: only the global cost cap is stateless; the phase
+> budget and warning ladder are runtime-stateful (live phase-tracker, two mutable
+> dedup sets, an interactive `on_prompt` side-effect handler — the same
+> shadow-coexistence trap that blocked approval), and even the cost cap is
+> enforced at two evolving-cost points per iteration with no 1:1 event mapping.
+> Budget enforcement stays inline. See
+> `docs/work-log/m4-budget-stays-inline-2026-06-13.md`.
 
 Per §13.3 the budget gate is the riskiest of the three (warning thresholds +
 prompt handler + hard stop), so it is migrated **last and alone**, after the
diff --git a/docs/work-log/m4-budget-stays-inline-2026-06-13.md b/docs/work-log/m4-budget-stays-inline-2026-06-13.md
@@ -0,0 +1,100 @@
+# M4 Budget Gate Stays Inline — Interceptor-Suitability Assessment
+
+> **Status:** RESOLVED — owner chose **B-analog: budget enforcement stays
+> inline**, 2026-06-13. No budget interceptor shipped. This mirrors the approval
+> resolution (decision B) and closes M4 with **no gate moved beyond the plan
+> gate (M3)**.
+
+## Why this assessment ran first
+
+Per the work plan §7 (M4 row) and §13.3, the budget gate's first required step
+is an interceptor-suitability assessment, not an immediate parity-first slice.
+The approval gate taught the lesson the hard way: a unit parity test hid three
+runtime-coupling gaps, and enforce-cutover would have regressed JIT-approved
+calls (`m4-approval-sliceB-blocked-2026-06-13.md`). So budget was assessed
+against the observable code before any interceptor was written.
+
+## Finding: the "budget gate" is three distinct mechanisms, not one
+
+| Mechanism | Inline impl (`teaagent/runner/_core.py`) | State / coupling | Interceptor-suitable? |
+|---|---|---|---|
+| **Global cost cap** | `_assert_cost_budget` (~line 213) | Pure function of `(cost_cents, budget.max_estimated_cost_cents)`; raises `BudgetExceededError` | **Yes** — the plan-gate analog |
+| **Phase budget** | `_check_phase_budget` (~line 246) | Reads live `self.phase_tracker` (current phase, phase iterations/tool-calls/cost); emits `phase_budget_warning`; raises | No — runtime-stateful (like approval) |
+| **Warning ladder** | `_check_budget_warnings` (~line 296) → `BudgetMonitor.check_at_threshold` | `self._budget_warning_levels_emitted` **and** `BudgetMonitor._emitted_levels` / `_prompted` dedup sets; **interactive `on_prompt` side-effect** (`budget_monitor.py:167-176`); emits `budget_warning` / `budget_prompt` / `budget_read_only_suggested`; may raise `RunCancelledError` | No — strictly worse than approval |
+
+So only ~⅓ of the gate (the global cost cap) is genuinely stateless.
+
+## Two hard blockers (evidence-backed)
+
+### 1. The warning ladder hits the exact `assert_allowed` shadow-coexistence trap
+
+`BudgetMonitor.check_at_threshold` (`teaagent/budget_monitor.py:108-129`) is
+**side-effecting**: it mutates `self._emitted_levels` (line 121) and invokes
+`on_prompt` (line 169, an interactive handler returning a bool that advances
+`_prompted`). This is the same shape as `ApprovalPolicy.assert_allowed`, whose
+side effects made a shadow interceptor unable to coexist with the inline path.
+A shadow budget interceptor calling `check_at_threshold` alongside the inline
+call would either:
+
+- **double-fire** `on_prompt` / double-emit `budget_warning`, or
+- if the dedup set is shared, **silently swallow** the inline call (whichever
+  runs first marks the level emitted) — a covert cutover, not a shadow.
+
+The previously documented `budget_warning` double-emit trap is one instance of
+this larger side-effect problem.
+
+### 2. Even the clean piece does not map 1:1 to events
+
+The global cost cap is enforced at **two evolving-cost points per iteration**:
+
+- `_core.py:948` — before `decide()`, with the prior iteration's `cost_cents`
+  (fail-fast before spending more);
+- `_core.py:966` — after `_read_usage()` refreshes `cost_cents` with the cost of
+  the model call just made (catch this iteration's overspend).
+
+`ITERATION_STARTED` fires once at `_core.py:936-938`, **before both**, and its
+payload is only `{'iteration': iterations}` — it does not even carry
+`cost_cents` (a loop-local). An `ITERATION_STARTED` interceptor could only
+approximate the line-948 semantics; covering the line-966 post-usage check would
+require emitting a **new** post-usage event into the audit stream — scope creep
+for marginal value.
+
+## Decision and rationale
+
+**Owner chose B-analog: budget enforcement stays inline.** The decision-B logic
+("runtime-stateful gates stay inline; do not force them into the
+interceptor-on-event model") applies *more* strongly to budget than it did to
+approval: budget has a side-effecting interactive handler, two mutable dedup
+sets, a live phase-tracker dependency, *and* a multi-point evolving-cost
+enforcement pattern with no clean event mapping.
+
+The alternatives were weighed and rejected:
+
+- **Narrow cost-cap-only slice:** moves ~⅓ of the gate, still needs a new
+  post-usage event for the line-966 check, and leaves the stateful majority
+  inline — high overhead, low value.
+- **Full heavy shim:** providers for phase-tracker / dedup sets / `on_prompt` +
+  new events — the exact coupling we rejected for approval, at greater cost.
+
+## Consequences
+
+- **No budget interceptor ships.** Budget warning/prompt/exhausted/phase
+  behavior is **unchanged** — the proven inline paths stay authoritative.
+- **Budget observability already reaches the M6 fold via M2:** the audit events
+  `budget_warning`, `budget_prompt`, `budget_read_only_suggested`,
+  `phase_budget_warning` are typed in `RunEventType` and surfaced by the
+  M2-T001 reader from the audit JSONL. The spine carries observability without
+  owning enforcement — identical shape to the approval resolution.
+- **The parallel tool's salvage stash (`stash@{0}`) is now fully superseded** —
+  both interceptors it held (`ApprovalGateInterceptor`, `BudgetGateInterceptor`)
+  are decided-unneeded. Dropped (recoverable via git reflog for ~90 days if
+  ever needed).
+
+## Net M4 outcome
+
+**M4 closes with no gate moved beyond M3.** The plan gate (M3) is the sole
+governance gate that became an EventSpine interceptor. Approval and budget are
+both legitimately runtime-stateful and stay inline by evidenced architectural
+finding. The strangler migration's remaining value is on the read side:
+M5 (HookRegistry on spine), M6 (evidence + receipt fold over the typed stream),
+M7 (ContextBus + webhook consumers).
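
Reviewer note (not part of the diff): the shadow-coexistence trap described in the work-log can be illustrated with a toy model. The sketch below is a minimal illustration under stated assumptions, not the project's code. `ToyBudgetMonitor`, `run_with_separate_state`, and `run_with_shared_state` are hypothetical names; only the general shape (a dedup set mutated by the check, plus an `on_prompt` callback) is taken from the assessment. It shows the two failure modes the work-log names: separate state double-fires the prompt, and shared state lets whichever call runs first swallow the other.

```python
from typing import Callable, List, Set


class ToyBudgetMonitor:
    """Hypothetical stand-in for a side-effecting threshold check."""

    def __init__(self, on_prompt: Callable[[int], bool]) -> None:
        self._emitted_levels: Set[int] = set()  # dedup state mutated by the check
        self._on_prompt = on_prompt

    def check_at_threshold(self, level: int) -> bool:
        """Return True if this call fired the prompt for `level`."""
        if level in self._emitted_levels:
            return False  # already emitted: this call is silently swallowed
        self._emitted_levels.add(level)  # side effect 1: mutate dedup state
        self._on_prompt(level)  # side effect 2: interactive handler
        return True


def run_with_separate_state(level: int) -> List[str]:
    """Shadow and inline each own a monitor: the prompt fires twice."""
    prompts: List[str] = []
    shadow = ToyBudgetMonitor(lambda lv: prompts.append(f"shadow:{lv}") or True)
    inline = ToyBudgetMonitor(lambda lv: prompts.append(f"inline:{lv}") or True)
    shadow.check_at_threshold(level)
    inline.check_at_threshold(level)
    return prompts


def run_with_shared_state(level: int) -> List[bool]:
    """Shadow and inline share one monitor: the first caller wins."""
    prompts: List[int] = []
    shared = ToyBudgetMonitor(lambda lv: prompts.append(lv) or True)
    shadow_fired = shared.check_at_threshold(level)  # shadow runs first
    inline_fired = shared.check_at_threshold(level)  # inline is deduplicated away
    return [shadow_fired, inline_fired]
```

In this toy model neither arrangement gives a true shadow: a shadow is supposed to observe without changing what the inline path does, and a check that mutates state on every call cannot do that without a refactor to separate the decision from its side effects.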