
Commit 6ed32b2

refactor(runner): extract budget enforcement to shared governed-execution layer (ADR-0041 G1)
Behavior-preserving extraction of AgentRunner's per-iteration budget enforcement into a single shared layer. Risk-reported per the high-risk-paths gate; no approval/authorization or audit semantics changed.

- New teaagent/runner/_governed_execution.py: GovernedExecutionContext plus enforce_cost_budget / enforce_phase_budget / enforce_budget_warnings, a verbatim move of the _assert_cost_budget / _check_phase_budget / _check_budget_warnings bodies from _core.py. AgentRunner builds one context at construction, and its three budget methods now delegate to it (signatures unchanged).
- scripts/validate_runner_invariants.py: new §1.2 G1 gate (_check_governed_execution_authority) requires _core.py to import the layer and call the enforce_* functions, locking the delegation against regression.
- docs/reviews/a-p1-3-governed-execution-risk.md: reflective-risk report.
- ADR-0041 §1.3/§1.5: G1 marked partial (budget delegated; _authorize_tool_call approval extraction deferred to its own risk-reported slice); G2/G3 met.
- tests/runner/test_governed_execution.py: unit coverage for the extracted functions.

Action: A-P1-3
Constraint: behavior-preserving verbatim extraction of budget enforcement only; approval/authorization (_authorize_tool_call) deliberately left inline (larger blast radius, separate report); no audit event added or removed; rollback via git revert
Tested: 93 tests pass across tests/runner/ (incl. TestLiveDifferential + new test_governed_execution) + budget/governance/p0-harness + subagent lineage flow + subagent budget-inheritance integration; validate_runner_invariants.py exit 0 (now gates _core -> layer); import-cycle smoke OK; mypy clean on touched files; ruff (uv 0.15.17) clean; validate_docs_consistency.py exit 0
Not-tested: full repo suite not run by me (pre-commit runs the smoke subset + full mypy); _authorize_tool_call extraction not attempted
Confidence: high
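
The delegation pattern described in this message can be sketched as below. This is a minimal, self-contained illustration under stated assumptions, not the contents of `_governed_execution.py` (that module's diff is not shown on this page). `StubBudget`, `GovernedExecutionContextSketch`, and `RunnerSketch` are hypothetical stand-ins, the sketch's context carries only the one collaborator the cost check reads, and `frozen=True` is a choice made here rather than something the diff confirms. The function body follows the removed `_assert_cost_budget` body shown in the `_core.py` diff.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceededError(Exception):
    """Hypothetical stand-in for teaagent.errors.BudgetExceededError."""


@dataclass
class StubBudget:
    # Hypothetical stand-in for teaagent.budget.RunBudget: only the field
    # read by the cost check is modelled.
    max_estimated_cost_cents: Optional[float] = None


@dataclass(frozen=True)
class GovernedExecutionContextSketch:
    # Holds a reference to the runner's budget object (not a copy); frozen in
    # this sketch so the reference cannot be rebound after construction.
    budget: StubBudget


def enforce_cost_budget(ctx: GovernedExecutionContextSketch, cost_cents: float) -> None:
    # Same branches as the removed AgentRunner._assert_cost_budget body.
    max_cost = ctx.budget.max_estimated_cost_cents
    if max_cost is None:
        return  # no cost cap configured
    if max_cost == 0:
        # 0 means zero spend allowed: any positive cost exceeds it.
        if cost_cents > 0:
            raise BudgetExceededError('cost budget exceeded (zero cap)')
        return
    if cost_cents > max_cost:
        raise BudgetExceededError('cost budget exceeded')


class RunnerSketch:
    # Mirrors the AgentRunner side: build the context once in __init__ and keep
    # the private method's signature, forwarding to the shared function.
    def __init__(self, budget: StubBudget) -> None:
        self.budget = budget
        self._governed_execution = GovernedExecutionContextSketch(budget=self.budget)

    def _assert_cost_budget(self, cost_cents: float) -> None:
        enforce_cost_budget(self._governed_execution, cost_cents)
```

With a 100-cent cap, a cost of exactly 100 passes and 100.5 raises; with a zero cap, any positive cost raises with the `(zero cap)` message. Keeping the method as a thin forwarder is what lets every existing call site stay untouched.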
1 parent 1837ae1 commit 6ed32b2

7 files changed

Lines changed: 430 additions & 110 deletions


docs/adr/0041-execution-surface-unification-and-harness-thinning.md

Lines changed: 14 additions & 8 deletions
@@ -148,7 +148,7 @@ All must pass before Phase 2 starts:

 | Gate | Evidence |
 | --- | --- |
-| **G1 — Shared layer exists** | `teaagent/runner/_governed_execution.py` exported; `AgentRunner` delegates budget/approval/audit to it |
+| **G1 — Shared layer exists** | Partial: `teaagent/runner/_governed_execution.py` exported; `AgentRunner` delegates **budget** enforcement (cost/phase/warnings) to it. Approval/authorization (`_authorize_tool_call`) still inline — tracked as a separate slice |
 | **G2 — Subagent path uses layer** | `SubagentManager` constructs context via shared helpers; no parallel `assert_allowed` in `_manager.py` |
 | **G3 — Live differential** | Parametrized test (ADR-0040 §2 follow-up) runs identical tool+budget scenario through `AgentRunner` and `SubagentManager.run_subagent`; `assert_budget_invariant`, `assert_audit_invariant`, `assert_approval_invariant` pass on collected evidence |
 | **G4 — CI green** | `scripts/validate_runner_invariants.py`, full test suite, acceptance tier unchanged |
@@ -181,13 +181,19 @@ Behavior-preserving slices landed (2026-06-30):
 `SubagentManager.run_subagent` (via the `subagent` tool) and asserts
 `assert_audit_invariant`, `assert_audit_events_match`, `assert_approval_invariant`,
 and `assert_budget_invariant` on evidence collected from both surfaces.
-
-Still open for Phase 1: the `_governed_execution.py` module that unifies
-authorization, approval, audit, and full budget *enforcement* (§1.1) — the G1
-extraction from `_core.py` — plus the §1.2 import check for that module. **G1 is
-unmet** (a high-risk `_core.py` refactor requiring a `docs/reviews/*-risk.md`
-report); G3 is met; G2 is met for the budget invariant. Phase 2 is not yet
-authorized.
+- **G1 budget enforcement extracted.** `runner/_governed_execution.py` now owns
+per-iteration budget enforcement (`enforce_cost_budget` / `enforce_phase_budget`
+/ `enforce_budget_warnings`), a verbatim behavior-preserving move from `_core.py`;
+`AgentRunner` delegates to it. The §1.2 import gate in
+`scripts/validate_runner_invariants.py` requires `_core.py` to use the layer.
+Risk report: `docs/reviews/a-p1-3-governed-execution-risk.md`; unit tests:
+`tests/runner/test_governed_execution.py`.
+
+Still open for Phase 1 (G1 remainder): the **authorization** dimension —
+`AgentRunner._authorize_tool_call` reassigns `ApprovalPolicy` and calls
+run-summary emission, a larger blast radius deferred to its own reflective-risk
+slice. **G1 is partial** (budget enforcement delegated; approval still inline);
+**G2 and G3 are met**. Phase 2 is not yet authorized.

 ### Phase 2 — Domain reasoning migration (harness thinning)
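
The `enforce_phase_budget` function named in the ADR checks three per-phase metrics in a fixed order and stops at the first breach, recording one `phase_budget_warning` audit event before raising. The sketch below follows the removed `_check_phase_budget` body in the `_core.py` diff. It is an illustration, not teaagent code: `Phase`, `PhaseBudget`, `StubRunBudget`, `StubTracker`, `ListAudit`, and `PhaseContext` are hypothetical stand-ins, the phase-cost rule in `StubTracker` is an assumption, and the three near-identical record-then-raise blocks are factored into a `_breach` helper here for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, NoReturn, Optional


class BudgetExceededError(Exception):
    """Hypothetical stand-in for teaagent.errors.BudgetExceededError."""


class Phase(Enum):
    # Hypothetical phase names, for illustration only.
    EXPLORE = 'explore'
    EDIT = 'edit'


@dataclass
class PhaseBudget:
    max_iterations: int
    max_tool_calls: int
    max_estimated_cost_cents: Optional[float] = None


@dataclass
class StubRunBudget:
    per_phase: dict[Phase, PhaseBudget]

    def phase_budget_for(self, phase: Phase) -> PhaseBudget:
        return self.per_phase[phase]


@dataclass
class StubTracker:
    current_phase: Phase
    iterations: int = 0
    tool_calls: int = 0
    cost_at_phase_start: float = 0.0

    def phase_iterations(self) -> int:
        return self.iterations

    def phase_tool_calls(self) -> int:
        return self.tool_calls

    def phase_cost_cents(self, run_cost_cents: float) -> float:
        # Assumption: phase cost is the run cost accrued since the phase began.
        return run_cost_cents - self.cost_at_phase_start


@dataclass
class ListAudit:
    events: list[tuple[str, dict[str, Any]]] = field(default_factory=list)

    def record(self, event: str, run_id: str, **fields: Any) -> None:
        self.events.append((event, fields))  # run_id is not stored in this stub


@dataclass(frozen=True)
class PhaseContext:
    budget: StubRunBudget
    phase_tracker: StubTracker
    audit: ListAudit


def _breach(ctx: PhaseContext, run_id: str, phase: Phase, metric: str,
            current: float, limit: float, label: str) -> NoReturn:
    # Record exactly one phase_budget_warning event, then raise.
    ctx.audit.record('phase_budget_warning', run_id, phase=phase.value,
                     metric=metric, current=current, limit=limit)
    raise BudgetExceededError(f'phase {phase.value} {label} budget exceeded')


def enforce_phase_budget(ctx: PhaseContext, *, run_id: str, cost_cents: float) -> None:
    tracker = ctx.phase_tracker
    phase = tracker.current_phase
    pb = ctx.budget.phase_budget_for(phase)
    # Fixed order: iterations, then tool calls, then the optional cost cap.
    phase_iters = tracker.phase_iterations()
    if phase_iters > pb.max_iterations:
        _breach(ctx, run_id, phase, 'iterations', phase_iters, pb.max_iterations, 'iteration')
    phase_tools = tracker.phase_tool_calls()
    if phase_tools > pb.max_tool_calls:
        _breach(ctx, run_id, phase, 'tool_calls', phase_tools, pb.max_tool_calls, 'tool-call')
    phase_cost = tracker.phase_cost_cents(cost_cents)
    cap = pb.max_estimated_cost_cents
    if cap is not None and phase_cost > cap:
        _breach(ctx, run_id, phase, 'cost', phase_cost, cap, 'cost')
```

Because the first breach raises, a phase that is over both its iteration and tool-call limits produces only the `iterations` event; the ordering is part of the audit-stream shape the risk report treats as an asset.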

docs/generated/docs-inventory.md

Lines changed: 3 additions & 2 deletions
@@ -6,7 +6,7 @@
 Generated by `python3 scripts/generate_docs_inventory.py`.
 Do not edit this file manually — regenerate instead.

-**Markdown files:** 611
+**Markdown files:** 612

 | Path | Tier | Bytes | SHA256 (12) |
 | --- | --- | ---: | --- |
@@ -44,7 +44,7 @@ Do not edit this file manually — regenerate instead.
 | `adr/0031-shadow-mode-exit-criteria.md` | working | 3598 | `46a9a0d5eaac` |
 | `adr/0032-run-event-taxonomy.md` | working | 16065 | `b9f0c0d7c30a` |
 | `adr/0040-second-framework-invariants.md` | working | 7554 | `00b53102ace3` |
-| `adr/0041-execution-surface-unification-and-harness-thinning.md` | working | 17046 | `163b9b9d4ad6` |
+| `adr/0041-execution-surface-unification-and-harness-thinning.md` | working | 17638 | `74544dd0eba8` |
 | `adr/README.md` | working | 7611 | `c91dd1b63df7` |
 | `agent-contribution-contract.md` | constitution | 5204 | `9c2dad1195d2` |
 | `agent-mode-operator-guide.md` | working | 2778 | `25b258ab7bfe` |
@@ -527,6 +527,7 @@ Do not edit this file manually — regenerate instead.
 | `retrospective/review-system.md` | working | 15180 | `61db6643e4aa` |
 | `retrospective/tool-capability-review.md` | working | 15681 | `20ca7506fc04` |
 | `reviews/a-p0-2-observability-risk.md` | working | 2781 | `8a2295be6857` |
+| `reviews/a-p1-3-governed-execution-risk.md` | working | 4979 | `bcb6e5e1fb88` |
 | `reviews/a-p1-4-approval-migration-risk.md` | working | 6858 | `66927f1fb201` |
 | `reviews/daily-driver-critique-and-counterarguments-2026-06-04.md` | archive | 4904 | `a6f32a23c7e5` |
 | `reviews/daily-driver-docs-package-review-2026-06-02.md` | archive | 2288 | `a180df555135` |

docs/reviews/a-p1-3-governed-execution-risk.md

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+# Reflective-Risk Report: Governed-Execution Budget Extraction (A-P1-3)
+
+ADR 0041 Phase 1, gate G1 (budget dimension). Date: 2026-06-30.
+High-risk path touched: `teaagent/runner/_core.py`.
+
+## Goal
+
+Extract the primary runner's per-iteration **budget enforcement** (cost ceiling,
+phase budget, graduated cost warnings) from `teaagent/runner/_core.py` into a
+shared `teaagent/runner/_governed_execution.py` layer that `AgentRunner`
+delegates to, so the budget invariants are defined once and inherited by
+subagents (which execute through `AgentRunner` via `run_chat_agent`).
+Behavior-preserving; no approval/authorization or policy change.
+
+## Stakeholders
+
+Harness maintainers; every agent run (primary + subagent) relies on correct
+budget enforcement; the governance gates of ADR-0009 / ADR-0040.
+
+## Assets at Risk
+
+- Budget-enforcement correctness (overspend protection).
+- Audit event stream shape (`phase_budget_warning`, `budget_warning`,
+`budget_prompt`, `budget_read_only_suggested`).
+- Run-loop control flow (`BudgetExceededError` / `RunCancelledError` propagation).
+
+## Threat Model
+
+A subtle behavior change during extraction could (a) fail to raise on an
+over-budget run (overspend), (b) change audit event order/fields (breaks audit
+consumers / schema), or (c) alter the exception type so the run loop mishandles
+it.
+
+## Assumption Audit
+
+- ASSUMPTION: `self.budget`, `self.phase_tracker`, `self.audit`,
+`self._budget_monitor`, and `self._budget_warning_levels_emitted` are never
+reassigned after `__init__`. VERIFIED by reading `_core.py` — set once; the
+warning set is mutated in place (`.add`), not reassigned. A context built once
+holding the same references therefore stays in sync.
+- ASSUMPTION: the three methods read no other mutable `self` state. VERIFIED by
+reading the bodies — only the five collaborators above.
+
+## Evidence Check
+
+- The extracted functions are a verbatim copy of the original method bodies,
+parameterized over a `GovernedExecutionContext` of the same collaborators (the
+diff is a pure move).
+- Method signatures are unchanged; all run-loop call sites are untouched.
+
+## Authority / Tool Boundary
+
+- In scope: `teaagent/runner/_core.py` (3 budget method bodies + one `__init__`
+context construction), new `teaagent/runner/_governed_execution.py`,
+`scripts/validate_runner_invariants.py`, tests, ADR.
+- Out of scope (explicitly deferred): `_authorize_tool_call` / approval-policy
+extraction; sandbox; audit chain; policy semantics.
+
+## Failure Modes
+
+- Import cycle (`runner` <-> new module): mitigated — the module imports only
+leaf modules (`errors`/`budget`/`budget_monitor`/`phase_tracker`/`audit`);
+import smoke passes.
+- Context desync if a collaborator is reassigned in future: guarded by the
+no-reassignment invariant (documented in the module) and the full test suite.
+
+## Worst-case Scenario
+
+A budget check silently stops raising, so a run overspends its cost cap. Bounded
+by the full budget/runner suites and the live differential test, which assert
+enforcement still triggers; a regression fails CI before merge.
+
+## Safe Dry-run Plan
+
+Behavior-preserving pure move, verified offline by running the existing budget,
+runner, governance, and subagent suites (93 passed) plus the live differential
+test and new unit tests — no production run, no external I/O.
+
+## Rollback Plan
+
+`git revert` the commit. The change is additive (one new module) plus three
+method-body delegations and a static gate; reverting restores the inline methods
+exactly. No data migration, no persisted-state change.
+
+## Bounded Execution
+
+Single commit; only the files listed above; no network; no destructive ops;
+verified by local test suites and the runner-invariant gate before commit.
+
+## Audit Log Plan
+
+Audit emission is byte-identical: every `audit.record(...)` call moved verbatim
+into the shared functions. No audit event added or removed.
+
+## Human Review Required
+
+Yes — high-risk path (`teaagent/runner/_core.py`). This report is the
+reflective-risk artifact; the `check-high-risk-paths` pre-commit hook gates the
+commit on its presence.
+
+## Human Approval Gate
+
+Owner authorized the G1 extraction in-session. Budget dimension only; the
+higher-blast-radius `_authorize_tool_call` extraction remains deferred to a
+separate report and review.
+
+## Acceptance Criteria
+
+- `_governed_execution.py` owns budget enforcement; `AgentRunner` delegates to it.
+- All existing budget / runner / governance / subagent tests pass unchanged.
+- `validate_runner_invariants.py` passes and now gates `_core.py` -> shared layer.
+- ruff + mypy clean; the G3 differential test stays green.
+
+## Go / No-go Decision
+
+**GO** for the budget dimension — bounded, behavior-preserving, fully verified,
+trivially reversible. **NO-GO** for bundling `_authorize_tool_call` into this
+change: it reassigns `ApprovalPolicy` and calls run-summary emission, a larger
+blast radius that warrants its own reflective-risk report and review.
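
The report's no-reassignment assumption matters because the context stores references to the runner's collaborators, not copies. The illustration below uses hypothetical `Ctx` and `Runner` classes (not teaagent code) to show both sides: an in-place `.add` through either name stays visible to both, while rebinding the runner attribute leaves the context silently pointing at the old set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ctx:
    # Hypothetical minimal context: it stores a reference to the runner's set.
    budget_warning_levels_emitted: set[int]


class Runner:
    def __init__(self) -> None:
        self._budget_warning_levels_emitted: set[int] = set()
        # Built once, holding the same set object (no copy is made).
        self._ctx = Ctx(budget_warning_levels_emitted=self._budget_warning_levels_emitted)


def demo() -> tuple[bool, bool]:
    runner = Runner()
    # In-place mutation via the context is visible through the runner attribute.
    runner._ctx.budget_warning_levels_emitted.add(50)
    in_sync = 50 in runner._budget_warning_levels_emitted
    # Rebinding the runner attribute (what the invariant rules out) desyncs them:
    # the context still holds the old set, which contains 50.
    runner._budget_warning_levels_emitted = set()
    desynced = (
        50 in runner._ctx.budget_warning_levels_emitted
        and 50 not in runner._budget_warning_levels_emitted
    )
    return in_sync, desynced


in_sync, desynced = demo()
print(in_sync, desynced)  # True True
```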

scripts/validate_runner_invariants.py

Lines changed: 36 additions & 0 deletions
@@ -52,6 +52,18 @@
 _BUDGET_CLAMP_SYMBOL = 'compute_clamped_budget'
 _BUDGET_CLAMP_FILES: tuple[str, ...] = ('teaagent/subagents/_manager.py',)

+# ADR 0041 Phase 1 (G1): the primary runner must delegate per-iteration budget
+# enforcement to the shared governed-execution layer, not re-implement it inline.
+_GOVERNED_EXEC_IMPORTS: frozenset[str] = frozenset(
+    {'_governed_execution', 'teaagent.runner._governed_execution'}
+)
+_GOVERNED_EXEC_SYMBOLS: tuple[str, ...] = (
+    'enforce_cost_budget',
+    'enforce_phase_budget',
+    'enforce_budget_warnings',
+)
+_GOVERNED_EXEC_FILES: tuple[str, ...] = ('teaagent/runner/_core.py',)
+


 def _collect_imports(source: str) -> set[str]:
     tree = ast.parse(source)
@@ -113,12 +125,36 @@ def _check_budget_clamp_authority(rel_path: str) -> list[str]:
     return []


+def _check_governed_execution_authority(rel_path: str) -> list[str]:
+    """ADR 0041 Phase 1 G1: budget enforcement is delegated to the shared layer."""
+    file_path = _REPO_ROOT / rel_path
+    if not file_path.is_file():
+        return [f'{rel_path}: file not found']
+    source = file_path.read_text(encoding='utf-8')
+    imports = _collect_imports(source)
+    if not (imports & _GOVERNED_EXEC_IMPORTS):
+        return [
+            f'{rel_path}: missing governed-execution import — must import '
+            f'teaagent.runner._governed_execution and delegate budget enforcement '
+            f'instead of re-implementing it inline (ADR 0041 Phase 1, G1)'
+        ]
+    missing = [s for s in _GOVERNED_EXEC_SYMBOLS if s not in source]
+    if missing:
+        return [
+            f'{rel_path}: imports the governed-execution layer but does not call '
+            f'{", ".join(missing)} (ADR 0041 Phase 1, G1)'
+        ]
+    return []
+
+
 def validate() -> list[str]:
     errors: list[str] = []
     for rel_path in _SECOND_FRAMEWORK_FILES:
         errors.extend(_check_file(rel_path))
     for rel_path in _BUDGET_CLAMP_FILES:
         errors.extend(_check_budget_clamp_authority(rel_path))
+    for rel_path in _GOVERNED_EXEC_FILES:
+        errors.extend(_check_governed_execution_authority(rel_path))
     return errors

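The new gate combines two static checks: an AST scan for the import, then a plain substring scan for the three function names. The sketch below runs the same logic on a source string instead of a file. `collect_imports` is a hypothetical reconstruction (only the first line of `_collect_imports` is visible in the diff), and the error messages are shortened.

```python
import ast

# Same constants as the gate in the diff above.
GOVERNED_EXEC_IMPORTS = frozenset({'_governed_execution', 'teaagent.runner._governed_execution'})
GOVERNED_EXEC_SYMBOLS = ('enforce_cost_budget', 'enforce_phase_budget', 'enforce_budget_warnings')


def collect_imports(source: str) -> set[str]:
    # Hypothetical reconstruction of _collect_imports: record `import x` names
    # and the module of `from x import ...`. For a relative import such as
    # `from ._governed_execution import ...`, ast stores module='_governed_execution'.
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


def check_governed_execution_source(rel_path: str, source: str) -> list[str]:
    # Check 1 (AST): the shared layer must be imported.
    if not (collect_imports(source) & GOVERNED_EXEC_IMPORTS):
        return [f'{rel_path}: missing governed-execution import (ADR 0041 Phase 1, G1)']
    # Check 2 (text): every enforce_* name must appear somewhere in the file.
    missing = [s for s in GOVERNED_EXEC_SYMBOLS if s not in source]
    if missing:
        return [
            f'{rel_path}: imports the governed-execution layer but does not call '
            f'{", ".join(missing)} (ADR 0041 Phase 1, G1)'
        ]
    return []
```

Because the symbol check is a substring test over the whole file, names that appear only in the `from ._governed_execution import (...)` statement also satisfy it. In `_core.py` the names appear both there and at the delegating call sites.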

teaagent/runner/_core.py

Lines changed: 25 additions & 100 deletions
@@ -11,7 +11,7 @@
 from teaagent.audit import AuditLogger
 from teaagent.auto_mode import AutoModeConfig
 from teaagent.budget import RunBudget
-from teaagent.budget_monitor import BudgetAction, BudgetMonitor
+from teaagent.budget_monitor import BudgetMonitor
 from teaagent.context import ContextCompactor
 from teaagent.errors import (
     AgentHarnessError,
@@ -49,6 +49,12 @@
 from ._approval_manager import RunnerApprovalCoordinator # noqa: E402
 from ._auto_mode_manager import AutoModeManager # noqa: E402
 from ._events import EventSpine, RunEventType, register_audit_consumer # noqa: E402
+from ._governed_execution import ( # noqa: E402
+    GovernedExecutionContext,
+    enforce_budget_warnings,
+    enforce_cost_budget,
+    enforce_phase_budget,
+)
 from ._plan_validator import PlanGateInterceptor, PlanValidator # noqa: E402
 from ._types import ( # noqa: E402
     ApprovalHandler,
@@ -166,6 +172,13 @@ def __init__(
         self._budget_warning_levels_emitted: set[int] = set()
         self._budget_prompted = False
         self._compaction_warning_emitted = False
+        self._governed_execution = GovernedExecutionContext(
+            budget=self.budget,
+            phase_tracker=self.phase_tracker,
+            audit=self.audit,
+            budget_monitor=self._budget_monitor,
+            budget_warning_levels_emitted=self._budget_warning_levels_emitted,
+        )
         self.plan_validator = PlanValidator(
             approval_policy=self.approval_policy,
             require_plan=require_plan,
@@ -211,16 +224,7 @@ def __init__(
         self.plan_validator.set_read_only_lint_errors(lint_errors)

     def _assert_cost_budget(self, cost_cents: float) -> None:
-        max_cost = self.budget.max_estimated_cost_cents
-        if max_cost is None:
-            return
-        # 0 means zero spend allowed - any positive cost exceeds it
-        if max_cost == 0:
-            if cost_cents > 0:
-                raise BudgetExceededError('cost budget exceeded (zero cap)')
-            return
-        if cost_cents > max_cost:
-            raise BudgetExceededError('cost budget exceeded')
+        enforce_cost_budget(self._governed_execution, cost_cents)

     def _read_usage(
         self,
@@ -250,97 +254,18 @@ def _check_phase_budget(
         cost_cents: float,
         tool_calls: int,
     ) -> None:
-        tracker = self.phase_tracker
-        phase = tracker.current_phase
-        pb = self.budget.phase_budget_for(phase)
-
-        phase_iters = tracker.phase_iterations()
-        if phase_iters > pb.max_iterations:
-            self.audit.record(
-                'phase_budget_warning',
-                run_id,
-                phase=phase.value,
-                metric='iterations',
-                current=phase_iters,
-                limit=pb.max_iterations,
-            )
-            raise BudgetExceededError(f'phase {phase.value} iteration budget exceeded')
-
-        phase_tools = tracker.phase_tool_calls()
-        if phase_tools > pb.max_tool_calls:
-            self.audit.record(
-                'phase_budget_warning',
-                run_id,
-                phase=phase.value,
-                metric='tool_calls',
-                current=phase_tools,
-                limit=pb.max_tool_calls,
-            )
-            raise BudgetExceededError(f'phase {phase.value} tool-call budget exceeded')
-
-        phase_cost = tracker.phase_cost_cents(cost_cents)
-        if (
-            pb.max_estimated_cost_cents is not None
-            and phase_cost > pb.max_estimated_cost_cents
-        ):
-            self.audit.record(
-                'phase_budget_warning',
-                run_id,
-                phase=phase.value,
-                metric='cost',
-                current=phase_cost,
-                limit=pb.max_estimated_cost_cents,
-            )
-            raise BudgetExceededError(f'phase {phase.value} cost budget exceeded')
+        enforce_phase_budget(
+            self._governed_execution,
+            run_id=run_id,
+            cost_cents=cost_cents,
+        )

     def _check_budget_warnings(self, *, run_id: str, cost_cents: float) -> None:
-        budget_cap = self.budget.max_estimated_cost_cents
-        if budget_cap is None:
-            return
-        # 0 cap is enforced by _assert_cost_budget; no warnings needed
-        if budget_cap == 0:
-            return
-        max_cost = float(budget_cap)
-        percent = (cost_cents / max_cost) * 100.0
-        for level in (50, 80, 90, 100):
-            if percent < level or level in self._budget_warning_levels_emitted:
-                continue
-            self._budget_warning_levels_emitted.add(level)
-
-            action = self._budget_monitor.check_at_threshold(
-                run_id=run_id,
-                cost_cents=cost_cents,
-                threshold=level,
-            )
-
-            self.audit.record(
-                'budget_warning',
-                run_id,
-                level=level,
-                percent=percent,
-                cost_cents=cost_cents,
-                max_cost_cents=max_cost,
-            )
-
-            if action == BudgetAction.PROMPT_CONFIRM:
-                self.audit.record(
-                    'budget_prompt',
-                    run_id,
-                    percent=percent,
-                    cost_cents=cost_cents,
-                    max_cost_cents=max_cost,
-                    approved=False,
-                )
-                raise RunCancelledError('run cancelled: budget at 90%')
-
-            if action == BudgetAction.SUGGEST_READ_ONLY:
-                self.audit.record(
-                    'budget_read_only_suggested',
-                    run_id,
-                    percent=percent,
-                    cost_cents=cost_cents,
-                    max_cost_cents=max_cost,
-                )
+        enforce_budget_warnings(
+            self._governed_execution,
+            run_id=run_id,
+            cost_cents=cost_cents,
+        )

     def _check_compaction_warning(
         self, *, context: RunContext, input_tokens: int, output_tokens: int
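
The graduated-warning logic removed from `_check_budget_warnings` fires each threshold (50/80/90/100%) at most once per run and lets the budget monitor escalate. The sketch below follows that removed body as a free function taking its collaborators as keyword arguments rather than a context object. `ScriptedMonitor` and `ListAudit` are hypothetical stand-ins, and `BudgetAction.CONTINUE` is an assumed name (only `PROMPT_CONFIRM` and `SUGGEST_READ_ONLY` appear in the diff).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class RunCancelledError(Exception):
    """Hypothetical stand-in for teaagent.errors.RunCancelledError."""


class BudgetAction(Enum):
    # PROMPT_CONFIRM and SUGGEST_READ_ONLY appear in the diff; CONTINUE is assumed.
    CONTINUE = 'continue'
    SUGGEST_READ_ONLY = 'suggest_read_only'
    PROMPT_CONFIRM = 'prompt_confirm'


@dataclass
class ScriptedMonitor:
    # Hypothetical monitor: returns a scripted action per threshold.
    actions: dict[int, BudgetAction]

    def check_at_threshold(self, *, run_id: str, cost_cents: float, threshold: int) -> BudgetAction:
        return self.actions.get(threshold, BudgetAction.CONTINUE)


@dataclass
class ListAudit:
    events: list[tuple[str, dict[str, Any]]] = field(default_factory=list)

    def record(self, event: str, run_id: str, **fields: Any) -> None:
        self.events.append((event, fields))  # run_id is not stored in this stub


def enforce_budget_warnings(*, cap: Optional[float], monitor: ScriptedMonitor, audit: ListAudit,
                            emitted: set[int], run_id: str, cost_cents: float) -> None:
    if cap is None or cap == 0:
        return  # no cap, or a zero cap (handled by the cost check instead)
    max_cost = float(cap)
    percent = (cost_cents / max_cost) * 100.0
    for level in (50, 80, 90, 100):
        if percent < level or level in emitted:
            continue
        emitted.add(level)  # each level fires at most once per run
        action = monitor.check_at_threshold(run_id=run_id, cost_cents=cost_cents, threshold=level)
        audit.record('budget_warning', run_id, level=level, percent=percent,
                     cost_cents=cost_cents, max_cost_cents=max_cost)
        if action == BudgetAction.PROMPT_CONFIRM:
            audit.record('budget_prompt', run_id, percent=percent, cost_cents=cost_cents,
                         max_cost_cents=max_cost, approved=False)
            # Message text copied verbatim from the removed method body.
            raise RunCancelledError('run cancelled: budget at 90%')
        if action == BudgetAction.SUGGEST_READ_ONLY:
            audit.record('budget_read_only_suggested', run_id, percent=percent,
                         cost_cents=cost_cents, max_cost_cents=max_cost)


monitor = ScriptedMonitor({80: BudgetAction.SUGGEST_READ_ONLY, 90: BudgetAction.PROMPT_CONFIRM})
audit = ListAudit()
emitted: set[int] = set()
# One jump from 0% to 85% emits both the 50% and 80% warnings in a single call.
enforce_budget_warnings(cap=100, monitor=monitor, audit=audit, emitted=emitted,
                        run_id='r1', cost_cents=85.0)
print([name for name, _ in audit.events])
# ['budget_warning', 'budget_warning', 'budget_read_only_suggested']
```

Because `emitted` is mutated in place, repeated calls at the same spend add no events, which is why the runner's warning set must be shared with (not copied into) the context.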
