
Commit 3e69ac0

johnteee and claude committed
docs: M4 closed — budget stays inline (owner decision B-analog); no gate beyond plan moved
Assessed budget-gate interceptor-suitability against the code before writing any interceptor (per §13.3, the approval lesson).

Finding: the budget "gate" is three mechanisms. Only the global cost cap (_assert_cost_budget) is stateless. The phase budget (live phase_tracker) and the warning ladder (two mutable dedup sets + interactive on_prompt side-effect handler; the same assert_allowed shadow-coexistence trap that blocked approval) are runtime-stateful, and even the cost cap is enforced at two evolving-cost points per iteration with no 1:1 event mapping.

Owner chose B-analog: budget enforcement stays inline. M4 closes with NO gate moved beyond the plan gate (M3). Approval and budget observability already reach the M6 fold via their M2-typed audit events.

- New work-log: docs/work-log/m4-budget-stays-inline-2026-06-13.md (full assessment)
- Plan §5 graph + §7 M4 row updated to "no gate moved; both stay inline"
- M4-T001/T002/T003 tickets marked CLOSED, not implemented (planning record retained)
- Superseded salvage stash (ApprovalGateInterceptor + BudgetGateInterceptor) dropped
- Docs inventory regenerated

No code changed; enforcement paths unchanged.

Constraint: docs-only; budget/approval enforcement code unchanged; runtime-stateful gates stay inline by evidenced finding.
Tested: docs inventory --check passes; validate_docs_consistency runs (pre-existing hypothesis-missing collection error in 3.14 sandbox is environmental, unrelated).
Not-tested: full suite not run on 3.12 (no code change).
Confidence: high
Roadmap-Status: unchanged
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
1 parent 76828f0 commit 3e69ac0

3 files changed

Lines changed: 137 additions & 11 deletions


docs/generated/docs-inventory.md

Lines changed: 3 additions & 2 deletions
@@ -6,7 +6,7 @@
66
Generated by `python3 scripts/generate_docs_inventory.py`.
77
Do not edit this file manually — regenerate instead.
88

9-
**Markdown files:** 587
9+
**Markdown files:** 588
1010

1111
| Path | Tier | Bytes | SHA256 (12) |
1212
| --- | --- | ---: | --- |
@@ -414,7 +414,7 @@ Do not edit this file manually — regenerate instead.
414414
| `ops/security-hardening.md` | working | 11733 | `0a385c7dab82` |
415415
| `ops/troubleshooting.md` | working | 9127 | `4921b6d50f5c` |
416416
| `permission-and-approval-playbook.md` | working | 6560 | `813bc74bb156` |
417-
| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 49574 | `fd53521a53fa` |
417+
| `plans/adr-0032-m1-m6-work-plan-2026-06-13.md` | archive | 52091 | `ce92504ad57b` |
418418
| `plans/agent-ecosystem-acceptance-roadmap-2026-05-31.md` | archive | 29099 | `7c4a4972cfeb` |
419419
| `plans/community-pain-points-response-plan-2026-06-05.md` | archive | 7276 | `571d010133ad` |
420420
| `plans/competitive-positioning-plan-2026-05-31.md` | archive | 8726 | `d16dfd2bdd99` |
@@ -590,6 +590,7 @@ Do not edit this file manually — regenerate instead.
590590
| `wasm-skill-ci.md` | working | 974 | `8340d6f1e5c1` |
591591
| `work-log/documentation-optimization-work-items-2026-06-04.md` | archive | 11750 | `9233b40b0bce` |
592592
| `work-log/m4-approval-sliceB-blocked-2026-06-13.md` | archive | 7347 | `3981ed82bc08` |
593+
| `work-log/m4-budget-stays-inline-2026-06-13.md` | archive | 5727 | `0e7a6ee74954` |
593594
| `work-log/operator-friction-log.md` | working | 2560 | `fe79899db10f` |
594595
| `work-log/p0-p1-governance-implementation-ledger-2026-06-11.md` | archive | 5212 | `0b72cd69de32` |
595596
| `work-log/parallel-phase-0-implementation-report-2026-06-04.md` | archive | 13181 | `098186167459` |

docs/plans/adr-0032-m1-m6-work-plan-2026-06-13.md

Lines changed: 34 additions & 9 deletions
@@ -94,9 +94,10 @@ M0 accepted ADR + dual-write spine [done]
9494
git-sandbox, skills, tests, undo, provenance, approval/tool-call
9595
decision events, cancelled/pending lifecycle
9696
-> M3 plan gate interceptor (parity-first)
97-
-> M4 budget gate only (parity-first); approval STAYS INLINE
98-
(owner decision B, 2026-06-13 — approval is runtime-stateful;
99-
see m4-approval-sliceB-blocked report)
97+
-> M4 CLOSED with NO gate moved: approval AND budget both STAY
98+
INLINE (owner decisions B + B-analog, 2026-06-13 — both are
99+
runtime-stateful; plan gate is the sole interceptor gate;
100+
see m4-approval-sliceB-blocked + m4-budget-stays-inline reports)
100101
-> M5 HookRegistry on spine
101102
-> M6 evidence + receipt FOLD (was M2; corrected
102103
scope A) — now genuinely event-typed because M2 completed
@@ -108,9 +109,17 @@ M0 accepted ADR + dual-write spine [done]
108109
Why this order: the evidence/receipt fold is a **read-side consumer**; it can
109110
only be "derived from typed events" once *all* the events it reads are typed.
110111
That requires (a) the non-gate evidence events (M2 coverage) and (b) the gate
111-
decision events whose contract M3/M4 preserve. M4 stays approval-first,
112-
budget-second: approval carries `pending_approval`/resume semantics, budget is
113-
mostly threshold/action.
112+
decision events whose contract M3 preserves. **M4 closed with no gate moved**:
113+
on assessment, both remaining gates proved runtime-stateful. Approval carries
114+
live JIT/session state + handler + auto-mode-swappable policy (decision B).
115+
Budget is three mechanisms — only the global cost cap is stateless; the phase
116+
budget (live phase-tracker) and the warning ladder (two mutable dedup sets + an
117+
interactive `on_prompt` side-effect handler that mutates monitor state, the same
118+
`assert_allowed` shadow-coexistence trap that blocked approval) are stateful,
119+
and even the cost cap is checked at two evolving-cost points per iteration that
120+
don't map 1:1 to events (decision B-analog). Plan gate (M3) is the one gate that
121+
moved; approval and budget observability still reach the M6 fold via their typed
122+
audit events.
114123

115124
## 6. Usage Scenarios
116125

@@ -164,7 +173,7 @@ consumers by M6.
164173
| ADR-0032-M1 | AuditLogger can consume RunEvents and produce byte-equivalent JSONL for golden proof runs; legacy call sites delegate instead of directly owning serialization decisions. |
165174
| ADR-0032-M2 (REDEFINED, taxonomy-only §16) | Every audit event the evidence bundle reads is typed in `RunEventType` and mapped both directions, so the M2-T001 reader surfaces it **from the audit JSONL** (mapper is sufficient; emit-site migration is NOT in M2 — it is deferred to the component milestones, §16). Covers routes, git-sandbox, skills, tests, undo, provenance, approval/tool-call decision events, cancelled/pending lifecycle. Pure additive; zero behavior change. (Old M2 "evidence/receipt fold" moved to M6 — §14.) |
166175
| ADR-0032-M3 | Plan gate is an interceptor using `PlanValidator`, landed parity-first (§13.3): a shadow-parity test asserting interceptor==inline per reason code went green before the inline branch was deleted in a separate commit. Denials and reason codes match current behavior; adversarial and first-hour tests remain green. |
167-
| ADR-0032-M4 (REVISED — owner decision B, 2026-06-13) | Budget gate ONLY moves to an interceptor, parity-first in two commits (shadow parity green → enforce+delete), done alone. **Approval enforcement STAYS INLINE** — it is runtime-stateful (live JIT/session state, tool handler, auto-mode-swappable policy), a poor fit for the pure-interceptor model (every coupling gap was invisible to a unit parity test; see `docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). Approval observability is already provided by M2 (approval audit events are typed + reader-surfaced); the M6 fold reads them. Budget warnings/exhausted behavior unchanged. |
176+
| ADR-0032-M4 (CLOSED — owner decisions B + B-analog, 2026-06-13) | **No gate moves to an interceptor; approval AND budget enforcement both STAY INLINE.** Both proved runtime-stateful on assessment, a poor fit for the pure-interceptor model. **Approval** (decision B): live JIT/session state, tool handler, auto-mode-swappable policy — every coupling gap was invisible to a unit parity test (`docs/work-log/m4-approval-sliceB-blocked-2026-06-13.md`). **Budget** (decision B-analog): it is three mechanisms — only the global cost cap (`_assert_cost_budget`) is stateless; the phase budget (live `phase_tracker`) and the warning ladder (`_budget_warning_levels_emitted` + `BudgetMonitor._emitted_levels`/`_prompted` dedup sets + an interactive `on_prompt` side-effect handler — the same `assert_allowed` shadow-coexistence trap that blocked approval) are stateful, and even the cost cap is enforced at two evolving-cost points per iteration that do not map 1:1 to events (`docs/work-log/m4-budget-stays-inline-2026-06-13.md`). Both gates' observability is already provided by M2 (their audit events — `tool_call_*`, `approval_*`, `budget_warning`, `budget_prompt`, `phase_budget_warning` — are typed + reader-surfaced); the M6 fold reads them without owning enforcement. Approval/budget behavior unchanged. **Net: plan gate (M3) is the sole governance gate moved to an interceptor.** |
168177
| ADR-0032-M5 | HookRegistry subscribes through the spine; Claude-Code-compatible hook names remain aliases; public hook API docs and tests pass. |
169178
| ADR-0032-M6 (was M2 fold; corrected scope A) | Evidence and receipts are folded from the typed event stream and equal the legacy builder on success/failure/pending fixtures (cancelled once emitted in M2); the fold reads the full stream (no fallback flag, per Q1); synthetic receipt-only fixtures are retired or relabeled legacy. Runs only after M2 coverage + M3/M4 decision events exist. |
170179
| ADR-0032-M7 (was M6) | ContextBus and webhook sinks consume the spine; inline emission paths are deleted; validator shows no orphaned eventing modules. |
@@ -425,7 +434,14 @@ full acceptance run):**
425434
- `tests/lifecycle/test_run_event_spine.py`
426435
- Risk: high. Parallelizable: no. Human Review Required: no.
427436

428-
### ADR32-M4-T001: Approval Interceptor Contract
437+
> **M4 TICKETS CLOSED — NOT IMPLEMENTED (owner decisions B + B-analog, 2026-06-13).**
438+
> T001/T002 (approval) and T003 (budget) below are retained as the *planning
439+
> record only*. On assessment both gates proved runtime-stateful and stay inline
440+
> (see the M4 row in §7 and the two work-log reports). The plan gate (M3) is the
441+
> sole governance gate that moved to an interceptor. No code from these tickets
442+
> shipped.
443+
444+
### ADR32-M4-T001: Approval Interceptor Contract [CLOSED — not implemented]
429445

430446
- Goal: define approval interceptor semantics without touching budget.
431447
- Scope: permission modes, scoped approvals, JIT/prompt approvals,
@@ -491,7 +507,16 @@ shadow parity test must cover those paths before any inline deletion.
491507
- `teaagent/runner/_approval_manager.py`
492508
- Risk: high. Parallelizable: no. Human Review Required: no.
493509

494-
### ADR32-M4-T003: Budget Interceptor Contract
510+
### ADR32-M4-T003: Budget Interceptor Contract [CLOSED — not implemented]
511+
512+
> **CLOSED by owner decision B-analog (2026-06-13).** Assessment found the budget
513+
> "gate" is three mechanisms: only the global cost cap is stateless; the phase
514+
> budget and warning ladder are runtime-stateful (live phase-tracker, two mutable
515+
> dedup sets, an interactive `on_prompt` side-effect handler — the same
516+
> shadow-coexistence trap that blocked approval), and even the cost cap is
517+
> enforced at two evolving-cost points per iteration with no 1:1 event mapping.
518+
> Budget enforcement stays inline. See
519+
> `docs/work-log/m4-budget-stays-inline-2026-06-13.md`.
495520
496521
Per §13.3 the budget gate is the riskiest of the three (warning thresholds +
497522
prompt handler + hard stop), so it is migrated **last and alone**, after the
docs/work-log/m4-budget-stays-inline-2026-06-13.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1+
# M4 Budget Gate Stays Inline — Interceptor-Suitability Assessment
2+
3+
> **Status:** RESOLVED — owner chose **B-analog: budget enforcement stays
4+
> inline**, 2026-06-13. No budget interceptor shipped. This mirrors the approval
5+
> resolution (decision B) and closes M4 with **no gate moved beyond the plan
6+
> gate (M3)**.
7+
8+
## Why this assessment ran first
9+
10+
Per the work plan §7 (M4 row) and §13.3, the budget gate's first required step
11+
is an interceptor-suitability assessment, not an immediate parity-first slice.
12+
The approval gate taught the lesson the hard way: a unit parity test hid three
13+
runtime-coupling gaps, and enforce-cutover would have regressed JIT-approved
14+
calls (`m4-approval-sliceB-blocked-2026-06-13.md`). So budget was assessed
15+
against the observable code before any interceptor was written.
16+
17+
## Finding: the "budget gate" is three distinct mechanisms, not one
18+
19+
| Mechanism | Inline impl (`teaagent/runner/_core.py`) | State / coupling | Interceptor-suitable? |
20+
|---|---|---|---|
21+
| **Global cost cap** | `_assert_cost_budget` (~line 213) | Pure function of `(cost_cents, budget.max_estimated_cost_cents)`; raises `BudgetExceededError` | **Yes** — the plan-gate analog |
22+
| **Phase budget** | `_check_phase_budget` (~line 246) | Reads live `self.phase_tracker` (current phase, phase iterations/tool-calls/cost); emits `phase_budget_warning`; raises | No — runtime-stateful (like approval) |
23+
| **Warning ladder** | `_check_budget_warnings` (~line 296) → `BudgetMonitor.check_at_threshold` | `self._budget_warning_levels_emitted` **and** `BudgetMonitor._emitted_levels` / `_prompted` dedup sets; **interactive `on_prompt` side-effect** (`budget_monitor.py:167-176`); emits `budget_warning` / `budget_prompt` / `budget_read_only_suggested`; may raise `RunCancelledError` | No — strictly worse than approval |
24+
25+
So only ~⅓ of the gate (the global cost cap) is genuinely stateless.
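
To illustrate what "genuinely stateless" means for the cost-cap row above, here is a minimal sketch of a check with that shape. It is not the code in `_core.py`: the standalone function, the `Budget` dataclass, the strict `>` comparison, and treating `None` as "no cap" are all assumptions made for illustration. Only the names `BudgetExceededError` and `max_estimated_cost_cents` come from the table.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceededError(Exception):
    """Stand-in for the project's exception of the same name."""


@dataclass(frozen=True)
class Budget:
    # Assumption: None means "no cap configured".
    max_estimated_cost_cents: Optional[int] = None


def assert_cost_budget(cost_cents: int, budget: Budget) -> None:
    """Raise if the accumulated cost exceeds the cap; otherwise do nothing.

    The result depends only on the two arguments: no instance state is read
    or written, so calling it twice (inline plus shadow) is harmless.
    """
    cap = budget.max_estimated_cost_cents
    # Assumption: strict ">" comparison; the real boundary rule may differ.
    if cap is not None and cost_cents > cap:
        raise BudgetExceededError(f"cost {cost_cents}c exceeds cap {cap}c")


budget = Budget(max_estimated_cost_cents=100)
assert_cost_budget(90, budget)   # under the cap: returns None
assert_cost_budget(90, budget)   # repeating the call changes nothing
try:
    assert_cost_budget(110, budget)
    over_cap_raised = False
except BudgetExceededError:
    over_cap_raised = True
```

Because the check has no memory, a shadow copy can run next to the inline call without changing what either one observes. The other two mechanisms lack that property.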
26+
27+
## Two hard blockers (evidence-backed)
28+
29+
### 1. The warning ladder hits the exact `assert_allowed` shadow-coexistence trap
30+
31+
`BudgetMonitor.check_at_threshold` (`teaagent/budget_monitor.py:108-129`) is
32+
**side-effecting**: it mutates `self._emitted_levels` (line 121) and invokes
33+
`on_prompt` (line 169, an interactive handler returning a bool that advances
34+
`_prompted`). This is the same shape as `ApprovalPolicy.assert_allowed`, whose
35+
side effects made a shadow interceptor unable to coexist with the inline path.
36+
A shadow budget interceptor calling `check_at_threshold` alongside the inline
37+
call would either:
38+
39+
- **double-fire** `on_prompt` / double-emit `budget_warning`, or
40+
- if the dedup set is shared, **silently swallow** the inline call (whichever
41+
runs first marks the level emitted) — a covert cutover, not a shadow.
42+
43+
The previously documented `budget_warning` double-emit trap is one instance of
44+
this larger side-effect problem.
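
The two failure modes listed above can be reproduced with a toy model. This is a sketch under stated assumptions, not `BudgetMonitor` itself: `ToyBudgetMonitor`, its constructor, and the integer `level` argument are hypothetical, and only the dedup-set-plus-`on_prompt` shape is taken from the description above.

```python
from typing import Callable, List, Set


class ToyBudgetMonitor:
    """Hypothetical stand-in: a dedup set guarding a side-effecting prompt."""

    def __init__(self, on_prompt: Callable[[int], bool]) -> None:
        self._emitted_levels: Set[int] = set()
        self._on_prompt = on_prompt

    def check_at_threshold(self, level: int) -> bool:
        """Fire the prompt once per level; return True if this call fired it."""
        if level in self._emitted_levels:
            return False
        self._emitted_levels.add(level)   # mutation happens on every first hit
        self._on_prompt(level)            # interactive side effect
        return True


# Case 1: shadow and inline each own their dedup state -> the prompt fires twice.
prompts_separate: List[int] = []
shadow = ToyBudgetMonitor(lambda lvl: prompts_separate.append(lvl) or True)
inline = ToyBudgetMonitor(lambda lvl: prompts_separate.append(lvl) or True)
shadow.check_at_threshold(80)
inline.check_at_threshold(80)
double_fire_count = len(prompts_separate)

# Case 2: shadow and inline share one monitor -> whichever runs first wins.
prompts_shared: List[int] = []
shared = ToyBudgetMonitor(lambda lvl: prompts_shared.append(lvl) or True)
shadow_fired = shared.check_at_threshold(80)   # shadow runs first and fires
inline_fired = shared.check_at_threshold(80)   # inline is silently swallowed
```

In case 1 the user would be prompted twice for the same threshold. In case 2 the "shadow" has become the enforcing path without anyone switching it on, which is the covert cutover described above.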
45+
46+
### 2. Even the clean piece does not map 1:1 to events
47+
48+
The global cost cap is enforced at **two evolving-cost points per iteration**:
49+
50+
- `_core.py:948` — before `decide()`, with the prior iteration's `cost_cents`
51+
(fail-fast before spending more);
52+
- `_core.py:966` — after `_read_usage()` refreshes `cost_cents` with the cost of
53+
the model call just made (catch this iteration's overspend).
54+
55+
`ITERATION_STARTED` fires once at `_core.py:936-938`, **before both**, and its
56+
payload is only `{'iteration': iterations}` — it does not even carry
57+
`cost_cents` (a loop-local). An `ITERATION_STARTED` interceptor could only
58+
approximate the line-948 semantics; covering the line-966 post-usage check would
59+
require emitting a **new** post-usage event into the audit stream — scope creep
60+
for marginal value.
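
The ordering problem can be seen in a stripped-down loop. This is a simplified sketch, not the runner in `_core.py`: `run_loop`, the `call_costs` list, and the tuple-based event log are hypothetical. The event name, its `{'iteration': ...}` payload, and the position of the two checks follow the description above.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, Dict[str, int]]


def run_loop(call_costs: List[int], cap_cents: int):
    """Toy iteration loop with the two cost-cap checks described above."""
    events: List[Event] = []
    stopped_at: Optional[Tuple[str, int]] = None
    cost_cents = 0  # loop-local, never placed in any event payload

    for iteration, call_cost in enumerate(call_costs, start=1):
        # Fires once, before both checks, and carries only the iteration number.
        events.append(("ITERATION_STARTED", {"iteration": iteration}))

        # Check 1: before deciding, using the prior iteration's cost.
        if cost_cents > cap_cents:
            stopped_at = ("pre-decide", iteration)
            break

        cost_cents += call_cost  # model call, then usage refresh

        # Check 2: after the usage refresh, catching this iteration's overspend.
        if cost_cents > cap_cents:
            stopped_at = ("post-usage", iteration)
            break

    return events, stopped_at, cost_cents


events, stopped_at, final_cost = run_loop(call_costs=[40, 70], cap_cents=100)
```

With these inputs the overspend is caught by the second check of iteration 2. At the moment iteration 2's `ITERATION_STARTED` event fired, the accumulated cost was still 40, so an interceptor bound to that event had nothing to object to, and the payload would not have told it the cost in any case.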
61+
62+
## Decision and rationale
63+
64+
**Owner chose B-analog: budget enforcement stays inline.** The decision-B logic
65+
("runtime-stateful gates stay inline; do not force them into the
66+
interceptor-on-event model") applies *more* strongly to budget than it did to
67+
approval: budget has a side-effecting interactive handler, two mutable dedup
68+
sets, a live phase-tracker dependency, *and* a multi-point evolving-cost
69+
enforcement pattern with no clean event mapping.
70+
71+
The alternatives were weighed and rejected:
72+
73+
- **Narrow cost-cap-only slice:** moves ~⅓ of the gate, still needs a new
74+
post-usage event for the line-966 check, and leaves the stateful majority
75+
inline — high overhead, low value.
76+
- **Full heavy shim:** providers for phase-tracker / dedup sets / `on_prompt` +
77+
new events — the exact coupling we rejected for approval, at greater cost.
78+
79+
## Consequences
80+
81+
- **No budget interceptor ships.** Budget warning/prompt/exhausted/phase
82+
behavior is **unchanged** — the proven inline paths stay authoritative.
83+
- **Budget observability already reaches the M6 fold via M2:** the audit events
84+
`budget_warning`, `budget_prompt`, `budget_read_only_suggested`,
85+
`phase_budget_warning` are typed in `RunEventType` and surfaced by the
86+
M2-T001 reader from the audit JSONL. The spine carries observability without
87+
owning enforcement — identical shape to the approval resolution.
88+
- **The parallel tool's salvage stash (`stash@{0}`) is now fully superseded:**
89+
both interceptors it held (`ApprovalGateInterceptor`, `BudgetGateInterceptor`)
90+
are decided-unneeded. Dropped (a dropped stash leaves the reflog, but its commit
91+
stays recoverable via `git fsck --unreachable` until `git gc` prunes it).
92+
93+
## Net M4 outcome
94+
95+
**M4 closes with no gate moved beyond M3.** The plan gate (M3) is the sole
96+
governance gate that became an EventSpine interceptor. Approval and budget are
97+
both legitimately runtime-stateful and stay inline by evidenced architectural
98+
finding. The strangler migration's remaining value is on the read side:
99+
M5 (HookRegistry on spine), M6 (evidence + receipt fold over the typed stream),
100+
M7 (ContextBus + webhook consumers).
