Skip to content

Commit efc748f

Browse files
authored
fix(stop+validator): phase-aware stop docs + required_iter_root v2 (AI-native-Systems-Research#198 + AI-native-Systems-Research#199) (AI-native-Systems-Research#244)
* fix(stop+validator): phase-aware stop docs + required_iter_root v2 Closes AI-native-Systems-Research#198 Closes AI-native-Systems-Research#199 ## AI-native-Systems-Research#198 — phase-aware nous stop The behavioral half of AI-native-Systems-Research#198 was already shipped (orchestrator/iteration.py _enter_phase honours STOP sentinel before each phase transition; comprehensive test coverage in TestEnterPhaseHonorsSentinel + TestRunIterationHaltsAtPhaseBoundary). What lagged was user-facing documentation. Two surfaces still said "iteration boundary": * orchestrator/cli.py:416 — schema-doc render's See-also section. * orchestrator/cli.py:879 — argparse help= for `nous stop`. Both now describe phase-boundary semantics, with the argparse subparser also gaining a `description=` so `nous stop --help` shows the full explanation including which phase boundaries are honoured. Tests in TestStopDocumentationReflectsPhaseAwareness pin both surfaces against the legacy phrasing. ## AI-native-Systems-Research#199 v2 — required_iter_root The v1 (iter_root_extensions) was already shipped. This PR adds the stretch goal from the issue body: campaigns can declare files that MUST exist at the iter root after EXECUTE_ANALYZE, with structured "required iter-root file missing: X" errors when they're absent. Wiring: * campaign.schema.yaml gains a validation.required_iter_root field (array of filenames). * orchestrator/validate.py grows _campaign_required_iter_root() and _check_required_iter_root() helpers. * validate_execution enforces presence; required ⊆ allowed (a required file is also implicitly allowed at iter-root, so campaigns don't have to list it in both blocks). * validate_design merges required into the allowed-set for the unexpected-file check (so campaigns that write the file during DESIGN don't get rejected), but does NOT enforce presence at design time — most required artifacts are EXECUTE-phase outputs. Tests in TestRequiredIterRoot cover happy path, missing-required-fails, implicit allowed-at-iter-root, schema acceptance, and backward-compat (no-required-block keeps default behaviour). Full suite: 1245 passed, 1 skipped (+9 vs upstream/reflective). * test: cover validate_design required-merge + known-root overlap (AI-native-Systems-Research#199 v2) Two test additions per pr-review-toolkit feedback on AI-native-Systems-Research#244: 1. test_required_file_at_design_time_is_allowed_not_required — pins the validate_design behaviour added in this PR: required ⊆ allowed merge lets a required file written during DESIGN pass the unexpected-file check, AND validate_design does NOT enforce required-presence (that's validate_execution's job because most required artifacts are EXECUTE-phase outputs). Without this test, a future refactor that inlines only the EXECUTE branch would silently regress DESIGN. 2. test_required_overlapping_known_root_file_still_enforced — a campaign declaring a globally-known file (e.g. findings.json) as required must still see "required iter-root file missing" when the file is absent, independent of the global whitelist semantics. Suite: 1247 passed, 1 skipped (+11 vs upstream/reflective baseline).
1 parent 46a288f commit efc748f

5 files changed

Lines changed: 328 additions & 8 deletions

File tree

orchestrator/cli.py

Lines changed: 12 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -413,7 +413,7 @@ def _render_schema_markdown(schema: dict, *, artifact: str) -> str:
413413
lines.append("- `nous create-campaign --to ./campaign.yaml` — scaffold a heavily-commented starting point.")
414414
lines.append("- `nous run campaign.yaml` — run a campaign (default `--agent sdk`).")
415415
lines.append("- `nous run campaign.yaml --bundle ./bundle.yaml` — skip DESIGN with a pre-authored bundle (#188).")
416-
lines.append("- `nous stop <target>` — ask a running campaign to halt at the next iteration boundary.")
416+
lines.append("- `nous stop <target>` — ask a running campaign to halt at the next phase boundary (#198).")
417417
lines.append("- `nous status --watch <target>` — live progress, including a STUCK marker after 5 min of silence.")
418418
return "\n".join(lines)
419419

@@ -879,7 +879,17 @@ def main():
879879
p_stop = subparsers.add_parser(
880880
"stop",
881881
help="Ask a running campaign to halt cleanly at the next "
882-
"iteration boundary by writing a STOP sentinel.",
882+
"phase boundary by writing a STOP sentinel (#198: honoured "
883+
"at DESIGN / HUMAN_DESIGN_GATE / EXECUTE_ANALYZE / "
884+
"HUMAN_FINDINGS_GATE transitions, not just between iterations).",
885+
description=(
886+
"Write a STOP sentinel that the running campaign honours at "
887+
"the next phase boundary (#198). Phase boundaries are DESIGN, "
888+
"HUMAN_DESIGN_GATE, EXECUTE_ANALYZE, and HUMAN_FINDINGS_GATE — "
889+
"so the operator can halt cleanly without waiting for the "
890+
"next iteration. Mid-phase interruption (a wedged BLIS "
891+
"subprocess, a stuck SDK turn) is still SIGINT's job."
892+
),
883893
)
884894
p_stop.add_argument(
885895
"target",

orchestrator/schemas/campaign.schema.yaml

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -164,6 +164,21 @@ properties:
164164
Additional filenames allowed at the iter root (in addition to
165165
the global whitelist in orchestrator/validate.py). Names only,
166166
not paths — files in subdirectories are always allowed.
167+
required_iter_root:
168+
type: array
169+
items:
170+
type: string
171+
minLength: 1
172+
uniqueItems: true
173+
description: >
174+
Filenames that MUST exist at the iter root after EXECUTE_ANALYZE
175+
completes (#199 v2). validate_execution emits a "required
176+
iter-root file missing: X" error for each missing entry,
177+
turning silent omissions into structured failures (sister to
178+
#187's DesignIncompleteError and #200's
179+
ExecuteAnalyzeIncompleteError). Required files are implicitly
180+
allowed at the iter root — no need to also list them in
181+
iter_root_extensions. Names only, not paths.
167182
168183
max_turns:
169184
type: object

orchestrator/validate.py

Lines changed: 48 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -82,6 +82,40 @@ def _campaign_iter_root_extensions(campaign: dict | None) -> frozenset[str]:
8282
return frozenset(str(x) for x in extensions if x)
8383

8484

85+
def _campaign_required_iter_root(campaign: dict | None) -> frozenset[str]:
86+
"""Read ``campaign.validation.required_iter_root`` (#199 v2).
87+
88+
Files declared here are treated as MUST-EXIST at validate_execution
89+
time. Required ⊆ allowed: a required file is also implicitly an
90+
iter-root extension, so campaigns don't need to list it twice.
91+
"""
92+
if not campaign:
93+
return frozenset()
94+
validation = campaign.get("validation") or {}
95+
required = validation.get("required_iter_root") or []
96+
return frozenset(str(x) for x in required if x)
97+
98+
99+
def _check_required_iter_root(
100+
iter_dir: Path, required: set[str] | frozenset[str],
101+
) -> list[str]:
102+
"""Return one error per required iter-root file that's missing.
103+
104+
#199 v2: campaign.validation.required_iter_root declares files the
105+
campaign promises to produce by EXECUTE_ANALYZE end. Missing entries
106+
are surfaced with a clear "required iter-root file missing: X"
107+
message so the operator (or a future incomplete-iteration diagnostic
108+
in the spirit of #187 / #200) sees what the campaign committed to.
109+
"""
110+
errors: list[str] = []
111+
if not iter_dir.is_dir():
112+
return errors
113+
for name in sorted(required):
114+
if not (iter_dir / name).exists():
115+
errors.append(f"required iter-root file missing: {name}")
116+
return errors
117+
118+
85119
def _validate_ground_truth_independence(bundle: dict) -> list[str]:
86120
"""Cross-field check that the ground truth can disagree with the detector (issue #85).
87121
@@ -279,9 +313,13 @@ def validate_design(iter_dir: Path, campaign: dict | None = None) -> dict:
279313
elif handoff_path.stat().st_size == 0:
280314
errors.append("handoff_snapshot.md is empty")
281315

282-
errors.extend(
283-
_check_unexpected_files(iter_dir, _campaign_iter_root_extensions(campaign))
284-
)
316+
# #199 v2: required ⊆ allowed at design time too. We don't enforce
317+
# required-presence here (most required files are written during
318+
# EXECUTE, not DESIGN), but if the campaign agent does write one
319+
# during DESIGN, the unexpected-file check must not reject it.
320+
extensions = _campaign_iter_root_extensions(campaign)
321+
required = _campaign_required_iter_root(campaign)
322+
errors.extend(_check_unexpected_files(iter_dir, extensions | required))
285323

286324
if errors:
287325
return {"status": "fail", "errors": errors}
@@ -399,9 +437,13 @@ def validate_execution(iter_dir: Path, campaign: dict | None = None) -> dict:
399437
except KeyError as exc:
400438
errors.append(f"bundle.yaml arm missing required field: {exc}")
401439

402-
errors.extend(
403-
_check_unexpected_files(iter_dir, _campaign_iter_root_extensions(campaign))
404-
)
440+
# #199 v2: required ⊆ allowed (a required file is also implicitly
441+
# allowed at iter-root, so campaigns don't have to declare it
442+
# twice). Merge before the unexpected-file check.
443+
extensions = _campaign_iter_root_extensions(campaign)
444+
required = _campaign_required_iter_root(campaign)
445+
errors.extend(_check_unexpected_files(iter_dir, extensions | required))
446+
errors.extend(_check_required_iter_root(iter_dir, required))
405447

406448
if errors:
407449
return {"status": "fail", "errors": errors}

tests/test_nous_stop.py

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -395,3 +395,61 @@ def test_sentinel_after_design_halts_before_design_gate(
395395
campaign, work_dir, iteration=1, agent="inline",
396396
auto_approve=True,
397397
)
398+
399+
400+
# ─── Documentation parity (#198) ─────────────────────────────────────────
401+
#
402+
# The behavior of `nous stop` is honoured at every phase boundary
403+
# post-#198 (see TestEnterPhaseHonorsSentinel above). User-facing
404+
# documentation must reflect that — the legacy "iteration boundary"
405+
# wording lags the post-#198 reality and misleads operators.
406+
407+
408+
class TestStopDocumentationReflectsPhaseAwareness:
409+
"""#198: every user-facing surface must describe `nous stop` as
410+
halting at a phase boundary, not an iteration boundary."""
411+
412+
def test_argparse_help_says_phase_boundary(
413+
self, monkeypatch, capsys,
414+
) -> None:
415+
"""`nous stop --help` (the argparse text) must announce phase
416+
boundary, not iteration boundary."""
417+
import sys
418+
from orchestrator.cli import main
419+
420+
monkeypatch.setattr(sys, "argv", ["nous", "stop", "--help"])
421+
with pytest.raises(SystemExit):
422+
main() # --help triggers SystemExit(0)
423+
out = capsys.readouterr().out
424+
# argparse line-wraps long descriptions, so normalize whitespace
425+
# before substring-matching.
426+
normalized = " ".join(out.split())
427+
assert "phase boundary" in normalized, (
428+
f"#198: nous stop help must say 'phase boundary'; got: "
429+
f"{normalized!r}"
430+
)
431+
assert "iteration boundary" not in normalized, (
432+
f"#198: legacy 'iteration boundary' phrasing leaked into "
433+
f"argparse help: {normalized!r}"
434+
)
435+
436+
def test_schema_doc_render_says_phase_boundary(self, capsys) -> None:
437+
"""`nous schema` markdown render references `nous stop` in its
438+
See-also section. That string must announce phase boundary too."""
439+
from orchestrator.cli import _cmd_schema
440+
441+
ns = argparse.Namespace(artifact="campaign", format="md")
442+
_cmd_schema(ns)
443+
out = capsys.readouterr().out
444+
stop_lines = [ln for ln in out.splitlines() if "`nous stop" in ln]
445+
assert stop_lines, (
446+
"schema doc render should describe `nous stop` in See-also"
447+
)
448+
joined = "\n".join(stop_lines)
449+
assert "phase boundary" in joined, (
450+
f"#198: schema doc must say 'phase boundary'; got: {joined!r}"
451+
)
452+
assert "iteration boundary" not in joined, (
453+
f"#198: legacy 'iteration boundary' leaked into schema doc: "
454+
f"{joined!r}"
455+
)

tests/test_validate.py

Lines changed: 195 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -453,3 +453,198 @@ def test_subdirectories_ignored(self, tmp_path: Path) -> None:
453453
(d / "results").mkdir()
454454
(d / "inputs").mkdir()
455455
assert _check_unexpected_files(d) == []
456+
457+
458+
# ─── #199 v2 — required_iter_root ────────────────────────────────────────
459+
#
460+
# Sister field to iter_root_extensions: campaigns can declare files that
461+
# MUST exist at the iter root after EXECUTE_ANALYZE completes. Missing
462+
# entries fail validate_execution. Mirrors the #187 / #200 pattern of
463+
# turning silent omissions into structured errors.
464+
465+
466+
class TestRequiredIterRoot:
467+
"""#199 v2: campaign.validation.required_iter_root makes campaign-
468+
specific iter-root files mandatory at validate_execution time. The
469+
validator fails with a clear ``required iter-root file missing: X``
470+
error so operators see what the campaign promised to produce."""
471+
472+
def test_missing_required_file_fails_execution(
473+
self, tmp_path: Path,
474+
) -> None:
475+
d = tmp_path / "iter-1"
476+
_setup_execution(d)
477+
# No analysis_summary.json on disk.
478+
campaign = {
479+
"validation": {
480+
"required_iter_root": ["analysis_summary.json"],
481+
# Required files are also implicitly allowed (so a
482+
# campaign doesn't have to list them in both blocks).
483+
},
484+
}
485+
result = validate_execution(d, campaign=campaign)
486+
assert result["status"] == "fail"
487+
assert any(
488+
"required iter-root file missing" in e
489+
and "analysis_summary.json" in e
490+
for e in result["errors"]
491+
), result["errors"]
492+
493+
def test_present_required_file_passes(self, tmp_path: Path) -> None:
494+
d = tmp_path / "iter-1"
495+
_setup_execution(d)
496+
(d / "analysis_summary.json").write_text("{}")
497+
campaign = {
498+
"validation": {
499+
"required_iter_root": ["analysis_summary.json"],
500+
},
501+
}
502+
result = validate_execution(d, campaign=campaign)
503+
assert result["status"] == "pass", result
504+
505+
def test_required_file_implicitly_allowed_at_iter_root(
506+
self, tmp_path: Path,
507+
) -> None:
508+
"""A required file should NOT need to be listed in
509+
iter_root_extensions to avoid the unexpected-file rejection.
510+
Required ⊆ allowed — the validator merges them automatically."""
511+
d = tmp_path / "iter-1"
512+
_setup_execution(d)
513+
(d / "probe_report.md").write_text("# Probe report\n")
514+
campaign = {
515+
"validation": {
516+
"required_iter_root": ["probe_report.md"],
517+
# Note: iter_root_extensions intentionally omitted.
518+
},
519+
}
520+
result = validate_execution(d, campaign=campaign)
521+
assert result["status"] == "pass", result
522+
# Negative: the unexpected-file error must not have fired.
523+
if result.get("errors"):
524+
assert not any(
525+
"unexpected file" in e and "probe_report.md" in e
526+
for e in result["errors"]
527+
), result["errors"]
528+
529+
def test_missing_required_listed_alongside_other_errors(
530+
self, tmp_path: Path,
531+
) -> None:
532+
"""Required-file check runs even when other errors exist —
533+
validation collects all errors so the operator sees the full
534+
picture in one pass."""
535+
d = tmp_path / "iter-1"
536+
_setup_execution(d)
537+
(d / "findings.json").unlink() # primary error
538+
campaign = {
539+
"validation": {
540+
"required_iter_root": ["analysis_summary.json"],
541+
},
542+
}
543+
result = validate_execution(d, campaign=campaign)
544+
assert result["status"] == "fail"
545+
joined = " | ".join(result["errors"])
546+
assert "findings.json" in joined
547+
assert "analysis_summary.json" in joined
548+
549+
def test_no_required_block_keeps_default_behavior(
550+
self, tmp_path: Path,
551+
) -> None:
552+
"""Campaigns without required_iter_root behave exactly as
553+
before this PR (backward-compat)."""
554+
d = tmp_path / "iter-1"
555+
_setup_execution(d)
556+
# No analysis_summary.json on disk; no required_iter_root either.
557+
result = validate_execution(d, campaign={})
558+
assert result["status"] == "pass", result
559+
560+
def test_required_combined_with_extensions(
561+
self, tmp_path: Path,
562+
) -> None:
563+
"""A campaign declares both required and optional iter-root
564+
files; only the required one is enforced for presence."""
565+
d = tmp_path / "iter-1"
566+
_setup_execution(d)
567+
(d / "analysis_summary.json").write_text("{}") # required, present
568+
# manifest.json is "extensions" (optional) and absent — no error.
569+
campaign = {
570+
"validation": {
571+
"required_iter_root": ["analysis_summary.json"],
572+
"iter_root_extensions": ["manifest.json"],
573+
},
574+
}
575+
result = validate_execution(d, campaign=campaign)
576+
assert result["status"] == "pass", result
577+
578+
def test_required_file_at_design_time_is_allowed_not_required(
579+
self, tmp_path: Path,
580+
) -> None:
581+
"""validate_design merges required ⊆ allowed (so a campaign that
582+
writes a required file during DESIGN doesn't get rejected by the
583+
unexpected-file check), but does NOT enforce required-presence
584+
at design time — most required artifacts are EXECUTE-phase
585+
outputs (e.g. probe_report.md is written during EXECUTE for
586+
paper-* campaigns).
587+
588+
Pins both invariants:
589+
(a) present-during-design → no "unexpected file" rejection
590+
(b) absent-during-design → still passes design (no required-presence enforcement)
591+
"""
592+
campaign = {
593+
"validation": {"required_iter_root": ["probe_report.md"]},
594+
}
595+
596+
# (a) Present during DESIGN — required ⊆ allowed must let it through.
597+
d1 = tmp_path / "iter-1"
598+
_setup_design(d1)
599+
(d1 / "probe_report.md").write_text("# Probe report\n")
600+
result = validate_design(d1, campaign=campaign)
601+
assert result["status"] == "pass", result
602+
603+
# (b) Absent during DESIGN — must still pass (only validate_execution
604+
# enforces required-presence).
605+
d2 = tmp_path / "iter-2"
606+
_setup_design(d2)
607+
result = validate_design(d2, campaign=campaign)
608+
assert result["status"] == "pass", result
609+
610+
def test_required_overlapping_known_root_file_still_enforced(
611+
self, tmp_path: Path,
612+
) -> None:
613+
"""A campaign may declare a file already in _KNOWN_ROOT_FILES
614+
(e.g. findings.json) as required. The required-presence check
615+
must still fire when the file is missing, even though the
616+
unexpected-file check would never have flagged it.
617+
"""
618+
d = tmp_path / "iter-1"
619+
_setup_execution(d)
620+
(d / "findings.json").unlink() # in _KNOWN_ROOT_FILES, now absent.
621+
campaign = {
622+
"validation": {"required_iter_root": ["findings.json"]},
623+
}
624+
result = validate_execution(d, campaign=campaign)
625+
assert result["status"] == "fail"
626+
assert any(
627+
"required iter-root file missing" in e and "findings.json" in e
628+
for e in result["errors"]
629+
), result["errors"]
630+
631+
def test_schema_accepts_required_iter_root(self) -> None:
632+
"""The campaign.schema.yaml must accept the new field — without
633+
a schema entry, jsonschema validation in nous run would reject
634+
the campaign before the validator ever sees it."""
635+
from orchestrator.validate import _load_yaml_schema
636+
import jsonschema
637+
638+
schema = _load_yaml_schema("campaign.schema.yaml")
639+
campaign = {
640+
"research_question": "Does X work?",
641+
"run_id": "demo",
642+
"target_system": {"name": "T", "description": "d"},
643+
"prompts": {"methodology_layer": "p"},
644+
"validation": {
645+
"required_iter_root": ["probe_report.md"],
646+
"iter_root_extensions": ["analysis_summary.json"],
647+
},
648+
}
649+
# Should not raise.
650+
jsonschema.validate(campaign, schema)

0 commit comments

Comments
 (0)