Problem
nous stop <target> (added in the #189 wave) writes a STOP sentinel that the orchestrator honors between iterations. The check sits in the iteration loop in orchestrator/campaign.py, before each run_iteration() call.
That granularity is too coarse for the most common reason an operator wants to stop: the current iteration has gone off the rails (agent stuck, BLIS subprocess hanging, runaway costs), and they want to halt now, not 30+ minutes later when EXECUTE_ANALYZE eventually completes.
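To make the granularity problem concrete, here is a toy model of the current behavior. It is a sketch, not the orchestrator's code: the sentinel is assumed to be a plain file named STOP in the work directory, the body of `_raise_if_stopped` is guessed, and `write_stop_sentinel`, `run_campaign` and `demo` are hypothetical. A stop written during iter-1's DESIGN phase still lets every phase of iter-1 run before the loop notices it.

```python
"""Toy model of today's behavior: STOP is only checked between iterations."""
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

PHASES = ("DESIGN", "HUMAN_DESIGN_GATE", "EXECUTE_ANALYZE")


class CampaignStopped(Exception):
    """Raised when the STOP sentinel is found."""


def write_stop_sentinel(work_dir: Path, reason: str) -> None:
    # Hypothetical stand-in for `nous stop <target>`: drop a file the loop polls.
    (work_dir / "STOP").write_text(reason)


def _raise_if_stopped(work_dir: Path, where: str) -> None:
    # Assumed body: raise if the sentinel file exists, echoing the operator's reason.
    sentinel = work_dir / "STOP"
    if sentinel.exists():
        raise CampaignStopped(f"stopped_by_user: {sentinel.read_text()} ({where})")


def run_campaign(work_dir: Path, n_iters: int, run_iteration: Callable[[int], None]) -> None:
    for i in range(1, n_iters + 1):
        _raise_if_stopped(work_dir, where=f"before iter-{i}")  # the only check today
        run_iteration(i)


def demo() -> List[Tuple[str, str]]:
    """Operator runs `nous stop` during iter-1 DESIGN; iter-1 still runs to the end."""
    log: List[Tuple[str, str]] = []
    with tempfile.TemporaryDirectory() as d:
        work_dir = Path(d)

        def run_iteration(i: int) -> None:
            for phase in PHASES:
                log.append((f"iter-{i}", phase))
                if i == 1 and phase == "DESIGN":
                    write_stop_sentinel(work_dir, "agent stuck")

        try:
            run_campaign(work_dir, n_iters=3, run_iteration=run_iteration)
        except CampaignStopped as exc:
            log.append(("stopped", str(exc)))
    return log
```

Running `demo()` records all three iter-1 phases before the stop is noticed at the iter-2 boundary.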
Repro (paper-burst friction-test, 2026-05-26)
EXECUTE_ANALYZE was running 50 BLIS arms in parallel. One arm (externality-credit_seed11) hung at 100% CPU for 40+ minutes, which blocked the agent's tool call and therefore the SDK turn. The operator wanted to stop. nous stop wrote the sentinel, but the sentinel would not be checked until the iteration-loop boundary before iter-2, which would only be reached after iter-1's stuck EXECUTE_ANALYZE eventually returned. Net result: ctrl-C / kill was needed for an immediate halt, and nous stop was effectively a no-op for this case.
Proposal
Add phase-boundary stop checks inside run_iteration itself, in addition to the existing iteration-loop check:
```python
# orchestrator/iteration.py
def run_iteration(...):
    ...
    if _enter_phase(engine, "DESIGN"):
        _raise_if_stopped(work_dir, where="before DESIGN")
        ...
    if _enter_phase(engine, "HUMAN_DESIGN_GATE"):
        _raise_if_stopped(work_dir, where="before HUMAN_DESIGN_GATE")
        ...
    if _enter_phase(engine, "EXECUTE_ANALYZE"):
        _raise_if_stopped(work_dir, where="before EXECUTE_ANALYZE")
        ...
```
This still doesn't interrupt a stuck SDK turn mid-flight (the agent's tool call is opaque to the orchestrator), but it lets the operator halt cleanly on any phase boundary instead of waiting for the next iteration boundary.
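A self-contained sketch of the proposed behavior, assuming the same file-based STOP sentinel. It walks a fixed phase list instead of modeling the real `_enter_phase(engine, ...)` gating, this `run_iteration` is a simplified stand-in with a made-up signature, and `CampaignStopped` carrying a `where` attribute is an assumption. Writing the sentinel while DESIGN runs now prevents HUMAN_DESIGN_GATE from starting.

```python
"""Sketch: phase-boundary STOP checks inside one iteration (assumed shapes)."""
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

PHASES = ("DESIGN", "HUMAN_DESIGN_GATE", "EXECUTE_ANALYZE")


class CampaignStopped(Exception):
    def __init__(self, reason: str, where: str) -> None:
        super().__init__(f"stopped_by_user: {reason} ({where})")
        self.reason = reason
        self.where = where


def _raise_if_stopped(work_dir: Path, where: str) -> None:
    # Assumed body: the sentinel is a file named STOP holding the operator's reason.
    sentinel = work_dir / "STOP"
    if sentinel.exists():
        raise CampaignStopped(sentinel.read_text().strip() or "no reason given", where)


def run_iteration(work_dir: Path, handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each phase in order, checking the sentinel at every phase boundary."""
    finished: List[str] = []
    for phase in PHASES:
        _raise_if_stopped(work_dir, where=f"before {phase}")
        handlers[phase]()
        finished.append(phase)
    return finished


def demo() -> Tuple[List[str], Optional[str], Optional[str]]:
    """`nous stop` lands while DESIGN is running; HUMAN_DESIGN_GATE never starts."""
    started: List[str] = []
    with tempfile.TemporaryDirectory() as d:
        work_dir = Path(d)

        def design() -> None:
            started.append("DESIGN")
            (work_dir / "STOP").write_text("BLIS arm hung")  # simulates `nous stop`

        handlers = {
            "DESIGN": design,
            "HUMAN_DESIGN_GATE": lambda: started.append("HUMAN_DESIGN_GATE"),
            "EXECUTE_ANALYZE": lambda: started.append("EXECUTE_ANALYZE"),
        }
        try:
            run_iteration(work_dir, handlers)
        except CampaignStopped as exc:
            return started, exc.where, str(exc)
    return started, None, None
```

Each boundary check is a single file-existence test, which is negligible next to phases that run for minutes.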
Stretch: a --reason "I want to halt mid-phase" option could set a more aggressive flag that the dispatcher checks while teeing streaming events, raising CampaignStopped from inside the SDK callback. That would be a real interrupt, but messier; phase-boundary checks come first.
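One possible shape for the stretch goal, assuming the dispatcher consumes SDK events as a synchronous iterator (the real dispatcher may be async). `tee_events`, `hard_stop_requested` and the separate `STOP_NOW` flag file are hypothetical names, not existing orchestrator APIs. Each event is teed to the sink before the flag is checked, so the transcript stays complete up to the abort point.

```python
"""Sketch: check a hard-stop flag while teeing streaming SDK events."""
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


class CampaignStopped(Exception):
    """Raised to abort the campaign from inside the event loop."""


def hard_stop_requested(work_dir: Path) -> bool:
    # Hypothetical flag, separate from the phase-boundary STOP sentinel.
    return (work_dir / "STOP_NOW").exists()


def tee_events(
    events: Iterable[dict], sink: Callable[[dict], None], work_dir: Path
) -> Iterator[dict]:
    """Forward each event to `sink`, then abort if a hard stop was requested."""
    for event in events:
        sink(event)  # log first so the transcript includes the last event seen
        if hard_stop_requested(work_dir):
            raise CampaignStopped("stopped_by_user: hard stop mid-phase")
        yield event


def demo() -> Tuple[List[int], List[int], Optional[str]]:
    with tempfile.TemporaryDirectory() as d:
        work_dir = Path(d)
        logged: List[dict] = []
        consumed: List[int] = []

        def fake_sdk_stream() -> Iterator[dict]:
            for seq in range(5):
                if seq == 2:
                    (work_dir / "STOP_NOW").touch()  # operator requests a hard stop
                yield {"seq": seq}

        try:
            for event in tee_events(fake_sdk_stream(), logged.append, work_dir):
                consumed.append(event["seq"])
        except CampaignStopped as exc:
            return [e["seq"] for e in logged], consumed, str(exc)
        return [e["seq"] for e in logged], consumed, None
```

Note that the check only runs when an event arrives, so a turn that is silently blocked on a hung tool call (as in the repro) would still not be interrupted by this alone.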
Files to touch
orchestrator/iteration.py — sprinkle _raise_if_stopped calls at each _enter_phase site.
orchestrator/campaign.py — surface "stopped at phase X" in the ledger row's error field (currently says "stopped_by_user: ...", which is fine).
tests/test_nous_stop.py — add a fixture where the sentinel is written between phases and assert the next phase doesn't start.
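A possible shape for the new test, with the iteration runner stubbed out so the file runs on its own; the real test would drive orchestrator.iteration.run_iteration instead. `fake_run_iteration`, `ledger_error` and the "(stopped at phase X)" suffix are assumptions, not the current campaign.py format. Under pytest, `tmp_path` is the built-in temporary-directory fixture.

```python
"""Sketch of tests/test_nous_stop.py for phase-boundary stops (assumed APIs)."""
from pathlib import Path
from typing import Callable, List

PHASES = ("DESIGN", "HUMAN_DESIGN_GATE", "EXECUTE_ANALYZE")


class CampaignStopped(Exception):
    def __init__(self, reason: str, phase: str) -> None:
        super().__init__(reason)
        self.reason = reason
        self.phase = phase


def fake_run_iteration(work_dir: Path, on_phase: Callable[[str], None]) -> None:
    """Hypothetical stand-in for run_iteration with a check before every phase."""
    for phase in PHASES:
        stop = work_dir / "STOP"  # assumed sentinel path
        if stop.exists():
            raise CampaignStopped(stop.read_text(), phase)
        on_phase(phase)


def ledger_error(exc: CampaignStopped) -> str:
    """Ledger error string: keep the existing prefix, add the phase (assumed format)."""
    return f"stopped_by_user: {exc.reason} (stopped at phase {exc.phase})"


def test_sentinel_between_phases_blocks_next_phase(tmp_path: Path) -> None:
    started: List[str] = []

    def on_phase(phase: str) -> None:
        started.append(phase)
        if phase == "DESIGN":
            # Simulate `nous stop` landing after DESIGN, before the next phase.
            (tmp_path / "STOP").write_text("operator halt")

    try:
        fake_run_iteration(tmp_path, on_phase)
    except CampaignStopped as exc:
        error = ledger_error(exc)
    else:
        raise AssertionError("expected CampaignStopped")

    assert started == ["DESIGN"]  # HUMAN_DESIGN_GATE never started
    assert error == "stopped_by_user: operator halt (stopped at phase HUMAN_DESIGN_GATE)"
```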
Discovered in
paper-burst friction-test, 2026-05-26.