Commit c13f871
paper-burst friction-test cleanup (AI-native-Systems-Research#203) (AI-native-Systems-Research#204)
* paper-burst friction-test cleanup (AI-native-Systems-Research#203)
Closes AI-native-Systems-Research#193, AI-native-Systems-Research#194, AI-native-Systems-Research#195, AI-native-Systems-Research#196, AI-native-Systems-Research#197, AI-native-Systems-Research#198, AI-native-Systems-Research#199, AI-native-Systems-Research#200, AI-native-Systems-Research#201, AI-native-Systems-Research#202.
Refs AI-native-Systems-Research#203.
Ten nous-side improvements surfaced by a clean-room friction-test of
the `paper-burst` campaign on top of post-AI-native-Systems-Research#189 nous (PR AI-native-Systems-Research#192). Each
fix lands with regression coverage; the full suite goes from 939 → 993
passing tests (+54 new, 0 regressions).
Behaviour changes
-----------------
**Critical**
* `AI-native-Systems-Research#193` — SDK sandbox configurable. ``ClaudeAgentOptions.permission_mode``
defaults to ``"bypassPermissions"`` so campaigns can write outside
the launched cwd (the orchestrator runs the agent with cwd=worktree
but BLIS / build / test outputs land at <main-repo>/.nous/<run>/...).
Operators opt out via ``campaign.sandbox: default`` or
``nous run --sandbox default``.
* `AI-native-Systems-Research#194` — ``state.json.iteration`` matches ``runs/iter-N/`` from the
start of iter-1. Engine increments on leaving INIT (whether via
PRE_WORK or directly to DESIGN) in addition to DONE→DESIGN. Fixes
``nous status`` showing "iter 0" while artifacts live in iter-1/,
and silences the misleading "starting fresh" path on resume.
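The sandbox resolution for `AI-native-Systems-Research#193` can be pictured roughly as below. This is a minimal sketch, not the code in ``orchestrator/sdk_dispatch.py``: the helper name ``resolve_permission_mode`` and the lookup table are hypothetical, and mapping ``default`` to ``None`` is an assumption.

```python
from typing import Mapping, Optional

# Hypothetical lookup: campaign/CLI value -> value handed to the SDK options.
# Assumption: "default" means "leave permission_mode unset" (None).
_SANDBOX_TO_PERMISSION_MODE = {
    "bypass": "bypassPermissions",
    "default": None,
}


def resolve_permission_mode(
    cli_sandbox: Optional[str], campaign: Mapping[str, object]
) -> Optional[str]:
    """CLI flag wins, then campaign.sandbox, then the bypass default."""
    if cli_sandbox is not None:
        choice = cli_sandbox
    else:
        choice = campaign.get("sandbox", "bypass")
    if not isinstance(choice, str) or choice not in _SANDBOX_TO_PERMISSION_MODE:
        raise ValueError(f"unknown sandbox value: {choice!r}")
    return _SANDBOX_TO_PERMISSION_MODE[choice]
```

Rejecting unknown values at resolution time keeps a typo in ``campaign.sandbox`` from silently falling back to bypass.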
**Operator UX**
* `AI-native-Systems-Research#195` — ``nous status`` "last tool" view actually populates.
``_tee_event`` now walks ``message.content`` for ``ToolUseBlock``
entries (where ``name`` actually lives) instead of trying
``getattr(message, "tool_name")`` on the top-level message (which
was always None on AssistantMessage).
* `AI-native-Systems-Research#197` — ``--max-iterations`` survives ``nous resume``. The effective
cap is persisted to ``state.json`` on first run and read back on
resume when no CLI flag is supplied. CLI flag still wins; resume
prints which source it used.
* `AI-native-Systems-Research#198` — ``nous stop`` honours phase boundaries, not just iteration
boundaries. ``_enter_phase`` checks the STOP sentinel before each
phase transition, so a long EXECUTE_ANALYZE can be halted at the
next gate boundary instead of waiting for the next iteration.
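To illustrate the `AI-native-Systems-Research#195` fix, here is a small sketch of pulling the last tool name out of ``message.content``. The three dataclasses are stand-ins written for this example (they are not imported from the SDK), and ``last_tool_name`` is a hypothetical helper, not the real ``_tee_event``.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class TextBlock:  # stand-in for an SDK text block
    text: str


@dataclass
class ToolUseBlock:  # stand-in: the tool name lives on the block
    name: str
    input: dict = field(default_factory=dict)


@dataclass
class AssistantMessage:  # stand-in: no top-level tool_name attribute
    content: List[Any]


def last_tool_name(message: Any) -> Optional[str]:
    """Return the name of the last ToolUseBlock in message.content, if any."""
    content = getattr(message, "content", None)
    if not isinstance(content, list):
        return None
    name = None
    for block in content:
        if isinstance(block, ToolUseBlock):
            name = block.name  # later blocks overwrite earlier ones
    return name


msg = AssistantMessage([TextBlock("running"), ToolUseBlock("Bash"), ToolUseBlock("Read")])
print(getattr(msg, "tool_name", None))  # the old lookup: always None
print(last_tool_name(msg))              # the block walk finds "Read"
```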
**Structural / extensibility**
* `AI-native-Systems-Research#199` — Per-campaign iter-root whitelist extension. New
``campaign.validation.iter_root_extensions`` lets paper-* campaigns
declare ``analysis_summary.json`` / ``manifest.json`` / etc. as
allowed iter-root artifacts. Validator merges with the global
``_KNOWN_ROOT_FILES`` whitelist.
* `AI-native-Systems-Research#200` — ``ExecuteAnalyzeIncompleteError`` analog of AI-native-Systems-Research#187 for the
EXECUTE phase. Fires before ``validate_execution`` when
``experiment_plan.yaml`` / ``findings.json`` / ``principle_updates.json``
is missing on disk after dispatch. Names the missing files and
lists the four common causes (max_turns exhaustion, subprocess
hang, polling-loop stall, API stall) each pointing at a concrete
artifact. Writes ``failure_type: "execute_incomplete"`` to
retry_log.jsonl.
* `AI-native-Systems-Research#201` — Post-turn SDK silence detector. After each SDK turn,
``summarize_silence_gaps`` walks the streaming ``executor_log.jsonl``
and writes ``failure_type: "sdk_silence"`` to retry_log.jsonl when
the longest gap between events exceeds
``campaign.sdk_timeouts.silence_threshold_seconds`` (default 600s).
Observation-only — doesn't interrupt or fail the turn.
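The gap check behind `AI-native-Systems-Research#201` could look something like the sketch below. It is illustrative only: the helper names, the ``ts`` field (assumed to be epoch seconds on each JSONL event), and the extra ``longest_gap_seconds`` key in the row are assumptions, not the actual ``summarize_silence_gaps`` contract.

```python
import json
from typing import Iterable, Optional


def longest_gap_seconds(lines: Iterable[str], ts_key: str = "ts") -> float:
    """Longest gap between consecutive timestamped events in a JSONL stream."""
    prev = None
    longest = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written trailing line
        ts = event.get(ts_key) if isinstance(event, dict) else None
        if not isinstance(ts, (int, float)):
            continue
        if prev is not None:
            longest = max(longest, float(ts - prev))
        prev = ts
    return longest


def silence_row(lines: Iterable[str], threshold_seconds: float = 600.0) -> Optional[dict]:
    """Build a retry_log-style row, or None when nothing should be logged."""
    if threshold_seconds <= 0:  # threshold=0 opts out of the detector
        return None
    gap = longest_gap_seconds(lines)
    if gap <= threshold_seconds:
        return None
    return {"failure_type": "sdk_silence", "longest_gap_seconds": gap}
```

Because the helper only returns a row, the caller decides whether to append it to retry_log.jsonl, which keeps the detector observation-only.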
**Polish**
* `AI-native-Systems-Research#196` — Dispatch log line uses ``type(self).__name__`` so SDK
campaigns log as ``SDKDispatcher``, not ``CLIDispatcher`` (the
parent class whose ``logger.info`` was leaking the name).
* `AI-native-Systems-Research#202` — Resume's "starting fresh" warning rephrased to informational
wording ("treating as iter-1; existing artifacts preserved"), demoted
WARNING → INFO. After AI-native-Systems-Research#194 this path rarely fires.
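The `AI-native-Systems-Research#196` behaviour is plain Python and can be shown with two toy classes. The class names mirror the ones above, but the bodies and the ``describe`` method are hypothetical.

```python
class CLIDispatcher:
    def describe(self) -> str:
        # type(self) is the runtime class, so subclasses report their own name
        # even though this method is defined on the parent.
        return f"dispatching via {type(self).__name__}"


class SDKDispatcher(CLIDispatcher):
    pass


print(CLIDispatcher().describe())
print(SDKDispatcher().describe())
```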
Tests
-----
993 passed, 1 skipped, 0 failed (was 939 on `reflective`; +54 new).
New behavioural test files:
* ``test_sdk_sandbox.py`` — ``permission_mode`` defaults to bypass;
campaign override; explicit kwarg override; schema validation.
* ``test_sdk_silence_detector.py`` — gap detection helper; threshold
trigger; below-threshold no-op; opt-out via threshold=0.
* ``test_max_iterations_persist.py`` — persist/read helpers; resume
resolution chain (CLI > state > campaign > default).
* ``test_execute_artifact_assertion.py`` — missing-artifact helper,
diagnostic message shape, retry_log entry shape.
Updated regression coverage in ``test_engine.py`` (iteration counter
ticks on leaving INIT and on DONE→DESIGN), ``test_campaign.py`` (resume
warning is INFO with new wording), ``test_validate.py`` (iter-root
extension whitelist), ``test_nous_stop.py`` (phase-boundary stop
checks), ``test_cli_dispatch.py`` (log line uses runtime class name),
``test_sdk_dispatch.py`` (last-tool extraction from ToolUseBlock).
Migration notes
---------------
* No breaking changes to existing campaigns. All new fields (``sandbox``,
``sdk_timeouts``, ``validation.iter_root_extensions``) are optional
with sensible defaults; old campaigns keep working unchanged.
* Programmatic callers of ``validate_design`` / ``validate_execution``
may now pass an optional ``campaign`` kwarg to opt into the iter-root
extension whitelist. Omitting it preserves the strict default.
* ``state.json`` gains an optional ``max_iterations`` field. Old state
files without it work fine on resume — the resolver falls back to
campaign.yaml + hardcoded default.
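The resume-time resolution chain (CLI > state > campaign > default) can be sketched as follows. The function name and the ``max_iterations`` key on the campaign mapping are assumptions for illustration; only the ``state.json`` field name and the default of 10 come from this commit.

```python
from typing import Any, Mapping, Optional, Tuple

HARDCODED_DEFAULT = 10  # the hardcoded default named in this commit


def resolve_max_iterations(
    cli_value: Optional[int],
    state: Any,
    campaign: Mapping[str, Any],
) -> Tuple[int, str]:
    """Return (effective cap, source label) so resume can print the source."""
    if cli_value is not None:
        return cli_value, "cli"
    if isinstance(state, Mapping) and isinstance(state.get("max_iterations"), int):
        return state["max_iterations"], "state.json"
    # Assumption: the campaign carries the cap under "max_iterations".
    if isinstance(campaign.get("max_iterations"), int):
        return campaign["max_iterations"], "campaign.yaml"
    return HARDCODED_DEFAULT, "default"
```

An old state file without the field simply falls through to the last two steps, which matches the migration note above.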
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* review fixups: tighter typing + missing wiring tests + doc accuracy
PR AI-native-Systems-Research#204 review surfaced one bug-tier finding plus a handful of
suggestions; this commit folds in everything.
Changes
-------
* `orchestrator/sdk_dispatch.py` — defensive Python-side validation of
``campaign.sdk_timeouts.silence_threshold_seconds`` at construction
(mirroring the ``sandbox`` enum check). Schema enforces
``minimum: 0`` but ad-hoc dict callers bypass schema; this closes
the gap. Also tightened ``SDKRunner.permission_mode`` to
``Literal["bypassPermissions"] | None`` (was ``str | None``) and
promoted the silence-detector fallback log from ``logger.debug`` →
``logger.warning`` so a perpetually-failing detector surfaces.
* `orchestrator/iteration.py` — ``_enter_phase`` now requires
``work_dir`` (was optional, defaulted to None which silently skipped
the stop-sentinel check). All in-repo callers already pass it.
``ExecuteAnalyzeIncompleteError.__init__`` validates that
``missing`` is non-empty and every entry is in
``_REQUIRED_EXECUTE_ARTIFACTS`` — kills the typo class at the raise
site. Error message clarified about where each missing artifact
belongs (iter root vs ``results/``).
* `orchestrator/campaign.py` — ``_persist_max_iterations`` docstring
rewritten to distinguish silent benign no-ops (state.json absent /
not a dict / value unchanged) from logged failures (read/parse/write
errors).
* `orchestrator/cli.py` — ``_cmd_resume``'s max_iterations resolution
chain comment merged steps 3+4 (campaign.yaml fallback and hardcoded
default 10 are the same code path).
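The raise-site validation described for ``ExecuteAnalyzeIncompleteError`` might look like this sketch. The tuple contents are taken from the three file names listed earlier in this message; the real ``_REQUIRED_EXECUTE_ARTIFACTS`` entries (for example, whether they carry a ``results/`` prefix), the base class, and the message text are all assumptions.

```python
from typing import Iterable

# Assumption: bare file names; the real constant may use relative paths.
_REQUIRED_EXECUTE_ARTIFACTS = (
    "experiment_plan.yaml",
    "findings.json",
    "principle_updates.json",
)


class ExecuteAnalyzeIncompleteError(RuntimeError):
    """Raised when required EXECUTE_ANALYZE artifacts are absent after dispatch."""

    def __init__(self, missing: Iterable[str]) -> None:
        missing = list(missing)
        if not missing:
            raise ValueError("missing must name at least one artifact")
        unknown = [m for m in missing if m not in _REQUIRED_EXECUTE_ARTIFACTS]
        if unknown:
            # Catches typos at the raise site instead of in the log reader.
            raise ValueError(f"not a required artifact: {unknown}")
        self.missing = missing
        super().__init__("EXECUTE_ANALYZE incomplete; missing: " + ", ".join(missing))
```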
Tests (+6 new, 0 regressions; 993 → 999 total)
----------------------------------------------
* ``test_execute_artifact_assertion.py`` — new end-to-end test pinning
the orchestrator glue: when EXECUTE_ANALYZE's dispatch returns
cleanly without writing artifacts, ``run_iteration`` writes the
retry_log row and raises ``ExecuteAnalyzeIncompleteError``. Uses a
no-op ``InlineDispatcher.dispatch`` stub, so there is no live LLM and
no signal-file polling.
* ``test_nous_stop.py`` — new end-to-end test pinning AI-native-Systems-Research#198 call-site
wiring: a STOP sentinel staged at DESIGN halts at the
HUMAN_DESIGN_GATE boundary instead of running the gate. Catches any
regression that drops ``work_dir`` from one of the four
``_enter_phase`` call sites in ``run_iteration``.
* ``test_max_iterations_persist.py`` — two new tests for AI-native-Systems-Research#197
resolution-chain steps 3 (campaign.yaml fallback) and 4 (hardcoded
default 10) when state.json has no ``max_iterations`` field.
* ``test_sdk_sandbox.py`` — two new tests for AI-native-Systems-Research#193 CLI flag end-to-end
(``--sandbox default`` overrides ``campaign.sandbox: bypass``;
absent flag preserves campaign value).
* ``test_nous_stop.py::test_enter_phase_requires_work_dir`` —
replaces the prior "without work_dir skips sentinel" test with one
that pins the new contract (TypeError on missing kwarg).
* ``test_integration_real_execution.py`` — updated 8 ``_enter_phase``
call sites to pass ``tmp_path`` as ``work_dir``.
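The new ``_enter_phase`` contract (``work_dir`` required, sentinel checked before each transition) can be illustrated with a small stand-in. Everything here is hypothetical: the function name, the ``StopRequested`` exception, and the sentinel file name ``STOP`` are assumptions, and the keyword-only signature is one way to get the TypeError the test pins.

```python
from pathlib import Path


class StopRequested(Exception):
    """Hypothetical signal that the operator asked the run to halt."""


def enter_phase(phase: str, *, work_dir: Path) -> str:
    # work_dir has no default, so a caller that forgets it gets a TypeError
    # instead of silently skipping the sentinel check.
    if (work_dir / "STOP").exists():  # assumed sentinel file name
        raise StopRequested(f"stop requested before entering {phase}")
    return phase
```

With this shape, a sentinel staged during DESIGN is seen at the next call, for example before HUMAN_DESIGN_GATE, rather than at the next iteration.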
Reviewed but skipped
--------------------
The "worktree leak on validate_execution failure path" finding from
the silent-failure-hunter agent did not reproduce on inspection:
``validate_execution`` only logs warnings, doesn't raise. The
worktree cleanup at iteration.py line 1011 runs unconditionally after
that. The two raise sites further down (findings.json missing /
schema fail) fire AFTER cleanup, so no leak. The agent referenced
``ScriptError``/``RecoverableExecutionError`` handlers that don't
exist in the file, so the finding appears to have been hallucinated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent c6a6ee6 commit c13f871
20 files changed
Lines changed: 1921 additions & 57 deletions
File tree
- orchestrator
- schemas
- tests