Skip to content

Commit 285d555

Browse files
sriumcpclaude
andcommitted
Address PR review: harden NOUS_CAMPAIGN_PARENT resolver + close in-progress detection trap (#239)
Comprehensive fixup addressing all 12 issues identified by review agents. CRITICAL fixes: - C1 / silent-failure-hunter C3: `_cmd_run` now uses `find_existing_work_dir` to check BOTH legacy and env-var locations for in-progress detection. Fixes the regression where toggling NOUS_CAMPAIGN_PARENT between runs could silently allow a parallel run that corrupts shared worktrees. - C2 / silent-failure-hunter C1: empty/whitespace `NOUS_CAMPAIGN_PARENT` now raises ValueError instead of silently falling through to legacy. Bash `export NOUS_CAMPAIGN_PARENT=$UNSET` is a common typo that the user explicitly tried to opt out of. - C3 / pr-test-analyzer gap #2: `tests/conftest.py` autouse fixture now strips `NOUS_CAMPAIGN_PARENT` so existing tests don't fail for contributors who follow README's recommendation to export it. - C4 / comment-analyzer #1: README "byte-identical" claim corrected; scope tightened to "the resolved work_dir is unchanged". IMPORTANT fixes: - I1 / pr-test-analyzer gap #1: 4 new CLI tests cover yaml + bare-run-id with env-var, including the test that would have caught C1. - I2 / comment-analyzer #3 / silent-failure-hunter C4: README adds migration command (`mv <repo>/.nous/<run> $NOUS_CAMPAIGN_PARENT/<run>`). `_cmd_run` emits a migration hint when a campaign is found at the legacy path while env var is set. - I3 / silent-failure-hunter I5: state.json's `work_dir` field is now read by `find_existing_work_dir` — prefers the recorded path when it points to an existing campaign elsewhere (handles post-creation `mv`). - I4 / comment-analyzer #10: `cli.resolve_work_dir` legacy branch delegates to the resolver via `find_existing_work_dir` instead of reimplementing path logic. - I5 / silent-failure-hunter I1: `repo_path` validated for existence in resolver; raises FileNotFoundError on typos. - I6 / code-reviewer #2: explicit `Path(...)` wrap restored at CLI boundary for type robustness. - I7 / pr-test-analyzer gap #3: schema validator regression test added; state.json shape verified post-setup. - I8 / silent-failure-hunter D1: collision detection. `setup_work_dir` refuses to clobber a same-named campaign at the env-var path that records a different `repo_path`. Adds `repo_path` field to state.json schema. Suggestion fixes: - comment-analyzer #2: Resolution rules consolidated in `work_dir_resolver.py:RESOLUTION RULES` marker comment; other docstrings reference it. Reduces drift across 5 places. - comment-analyzer #5: schema description tightened (drop implementation detail; mark fields as machine-local). - comment-analyzer #7: `_TEMPLATE` bullet structure restructured for unambiguous conditional reading. - comment-analyzer #4: cross-machine portability story documented in schema descriptions (machine-local with caveat). - comment-analyzer #8: README line 166 (post-#239) updated to reflect the conditional default. - silent-failure-hunter I2: setup_work_dir wraps mkdir with OSError surfacing env-var context. - silent-failure-hunter I4: `_find_repo_root` failure message now mentions `NOUS_CAMPAIGN_PARENT` as an alternative. - pr-test-analyzer suggestions: edge-case tests added (whitespace env var, trailing slash, env-var-change scenario, corrupt state.json, recorded-path-wins, recorded-path-nonexistent-falls-back). Schema additions: - `state.json` gains `work_dir` and `repo_path` fields (both nullable strings; absolute paths recorded at setup_work_dir). - `state.schema.json` extends with both fields and machine-local caveats in descriptions. Test count: - Before: 12 tests (work_dir_resolver only) - After: 32 tests covering resolver + setup + CLI + schema validation - Full suite: 1,215 pass, 2 skip (was 1,195/2/0 — 20 new tests, 0 regressions). Backward compatibility preserved: - `NOUS_CAMPAIGN_PARENT` unset → resolved path is unchanged - Pre-#239 campaigns at `<repo>/.nous/<run>/` continue to be found by `find_existing_work_dir` even when env var is set (migration grace). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 9bf9732 commit 285d555

9 files changed

Lines changed: 732 additions & 154 deletions

File tree

README.md

Lines changed: 14 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -88,9 +88,20 @@ By default, Nous creates each campaign's working directory at `<target_repo>/.no
8888
export NOUS_CAMPAIGN_PARENT=~/Documents/Projects/nous-campaigns
8989
```
9090

91-
When `NOUS_CAMPAIGN_PARENT` is set, campaign artifacts live at `$NOUS_CAMPAIGN_PARENT/<run_id>/` — wholly outside the target repo. Code worktrees (per-arm BLIS branches, #133) continue to live at `<target>/.nous-experiments/<run_id>/<arm>/` because they ARE code FOR the target. The target repo's working tree stays clean. Backward-compat: when the env var is unset, behavior is byte-identical to today.
91+
When `NOUS_CAMPAIGN_PARENT` is set, campaign artifacts live at `$NOUS_CAMPAIGN_PARENT/<run_id>/` — wholly outside the target repo. Code worktrees (per-arm BLIS branches, #133) continue to live at `<target>/.nous-experiments/<run_id>/<arm>/` because they ARE code FOR the target. The target repo's working tree stays clean. Backward-compat: when the env var is unset, the resolved `work_dir` is unchanged — `<repo_path>/.nous/<run_id>/`, exactly as today.
9292

93-
The resolved absolute work_dir is recorded in each campaign's `state.json` (`work_dir` field) so the campaign location is robust to env var changes between runs.
93+
The resolved absolute `work_dir` and `repo_path` are recorded in each campaign's `state.json` so the campaign location and target survive env-var changes between runs. `nous resume` and `nous status` look up campaigns at both the env-var location AND the legacy location, so existing pre-#239 campaigns continue to work without immediate migration.
94+
95+
**Migrating existing campaigns** to the new location is a one-time operation per campaign:
96+
97+
```bash
98+
# 1. Set the env var first (in your shell rc).
99+
# 2. Move existing campaign(s):
100+
mv <target_repo>/.nous/<run_id> $NOUS_CAMPAIGN_PARENT/<run_id>
101+
# 3. Continue using the campaign as before. Nous will find it at the new location.
102+
```
103+
104+
state.json's recorded `work_dir` will be stale until the campaign's next setup; that's fine — `find_existing_work_dir` checks the actual directory existence, not just state.json's recorded path.
94105

95106
### 1. Install Nous
96107

@@ -152,7 +163,7 @@ target_system:
152163
repo_path: /path/to/your/repo
153164
```
154165
155-
When `repo_path` is set, the campaign directory is created inside the target repo at `.nous/<run_id>/`. All artifacts live there.
166+
When `repo_path` is set, the campaign directory is created at `$NOUS_CAMPAIGN_PARENT/<run_id>/` if you've set that env var (recommended — see [Environment setup](#environment-setup) above), or otherwise at the legacy `<target_repo>/.nous/<run_id>/`. All artifacts live there.
156167

157168
To discover the full schema (required vs optional fields, descriptions
158169
verbatim from the schema source), run:

orchestrator/cli.py

Lines changed: 92 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -14,23 +14,32 @@ def _find_repo_root(start=None):
1414
if parent == current:
1515
break
1616
current = parent
17-
print("Could not find .nous/ directory in any parent", file=sys.stderr)
17+
print(
18+
f"Could not find .nous/ directory in any parent of {Path.cwd()}. "
19+
f"Either run from inside the target repo, pass an explicit "
20+
f"work_dir path, or set NOUS_CAMPAIGN_PARENT (#239).",
21+
file=sys.stderr,
22+
)
1823
sys.exit(1)
1924

2025

2126
def resolve_work_dir(target):
2227
"""Resolve a CLI ``target`` (yaml path | dir | run_id) to a work_dir.
2328
24-
For yaml inputs and bare run_ids, honors ``NOUS_CAMPAIGN_PARENT``
25-
(#239) consistently with ``setup_work_dir``: if the env var is set,
26-
the work_dir is ``$NOUS_CAMPAIGN_PARENT/<run_id>/``; else legacy
27-
``<repo_path>/.nous/<run_id>/`` or CWD-walk.
29+
Honors NOUS_CAMPAIGN_PARENT (#239) and finds existing campaigns
30+
that may live at the legacy ``<repo>/.nous/<run_id>/`` path even
31+
when the env var is set (so users with pre-#239 campaigns can still
32+
run ``nous status`` / ``nous resume`` without first migrating).
2833
2934
For an explicit dir target with state.json, the dir is taken at
30-
face value (state.json's own existence implies the work_dir is
31-
valid at that path).
35+
face value.
3236
"""
33-
from orchestrator.work_dir_resolver import resolve_work_dir as _resolve
37+
import os
38+
39+
from orchestrator.work_dir_resolver import (
40+
ENV_VAR,
41+
find_existing_work_dir,
42+
)
3443

3544
if target.endswith(".yaml") or target.endswith(".yml"):
3645
p = Path(target)
@@ -46,12 +55,33 @@ def resolve_work_dir(target):
4655
print(f"Campaign file {target} is empty or not a YAML mapping", file=sys.stderr)
4756
sys.exit(1)
4857
try:
49-
repo_path = data["target_system"]["repo_path"]
58+
# Wrap in Path() for type-robustness: surfaces a clear error
59+
# immediately if repo_path is the wrong type (e.g. an int
60+
# from a hand-edited yaml) rather than failing later in the
61+
# resolver with a less helpful message.
62+
repo_path = (
63+
Path(data["target_system"]["repo_path"])
64+
if data["target_system"].get("repo_path") is not None
65+
else None
66+
)
5067
run_id = data["run_id"]
5168
except (KeyError, TypeError) as exc:
5269
print(f"Campaign file {target} missing required field: {exc}", file=sys.stderr)
5370
sys.exit(1)
54-
return _resolve(run_id, repo_path)
71+
try:
72+
work_dir = find_existing_work_dir(run_id, repo_path)
73+
except (ValueError, FileNotFoundError) as exc:
74+
print(f"Campaign location resolution failed: {exc}", file=sys.stderr)
75+
sys.exit(1)
76+
if work_dir is None:
77+
print(
78+
f"Work directory not found for run_id={run_id!r} "
79+
f"(checked {ENV_VAR} location and "
80+
f"{repo_path!s}/.nous/{run_id}/ if applicable).",
81+
file=sys.stderr,
82+
)
83+
sys.exit(1)
84+
return work_dir
5585

5686
p = Path(target)
5787
if p.is_dir() and (p / "state.json").exists():
@@ -62,16 +92,29 @@ def resolve_work_dir(target):
6292
sys.exit(1)
6393

6494
run_id = target
65-
# #239: prefer NOUS_CAMPAIGN_PARENT if set; else fall back to the
66-
# CWD-walk pattern.
67-
import os
68-
if os.environ.get("NOUS_CAMPAIGN_PARENT"):
69-
work_dir = _resolve(run_id, repo_path=None)
70-
else:
95+
# Bare run_id: prefer find_existing_work_dir (env-var path), then
96+
# fall back to CWD-walk for the legacy invocation pattern.
97+
work_dir = None
98+
try:
99+
work_dir = find_existing_work_dir(run_id, repo_path=None)
100+
except ValueError as exc:
101+
print(f"Campaign location resolution failed: {exc}", file=sys.stderr)
102+
sys.exit(1)
103+
if work_dir is None and not os.environ.get(ENV_VAR):
104+
# No env var set: fall through to legacy CWD-walk for
105+
# backward-compat with `nous status <run_id>` invoked from
106+
# inside the target repo.
71107
root = _find_repo_root()
72-
work_dir = root / ".nous" / run_id
73-
if not work_dir.is_dir():
74-
print(f"Work directory not found: {work_dir}", file=sys.stderr)
108+
candidate = root / ".nous" / run_id
109+
if candidate.is_dir():
110+
work_dir = candidate
111+
if work_dir is None:
112+
print(
113+
f"Work directory not found for run_id={run_id!r}. "
114+
f"Either run from inside the target repo, pass an explicit "
115+
f"work_dir path, or set {ENV_VAR}.",
116+
file=sys.stderr,
117+
)
75118
sys.exit(1)
76119
return work_dir
77120

@@ -106,23 +149,44 @@ def _cmd_run(args):
106149
run_id = args.run_id or campaign.get("run_id") or (campaign_path.parent.name + "-run")
107150
repo_path = campaign["target_system"].get("repo_path")
108151

109-
# #239: honor NOUS_CAMPAIGN_PARENT consistently with setup_work_dir.
110-
# The state.json's absolute path is whichever resolver branch fires
111-
# for the current environment — env-var-set or legacy fallback.
112-
from orchestrator.work_dir_resolver import resolve_work_dir as _resolve
113-
state_path = _resolve(run_id, repo_path) / "state.json"
114-
if state_path.exists():
115-
state = json.loads(state_path.read_text())
152+
# #239: in-progress detection must check BOTH the legacy and
153+
# env-var locations — otherwise toggling NOUS_CAMPAIGN_PARENT
154+
# between runs would silently allow a parallel run that corrupts
155+
# shared worktrees. find_existing_work_dir consults all candidates
156+
# plus state.json's recorded work_dir.
157+
import os as _os
158+
159+
from orchestrator.work_dir_resolver import (
160+
ENV_VAR,
161+
find_existing_work_dir,
162+
)
163+
try:
164+
existing = find_existing_work_dir(run_id, repo_path)
165+
except (ValueError, FileNotFoundError) as exc:
166+
print(f"Campaign location resolution failed: {exc}", file=sys.stderr)
167+
sys.exit(1)
168+
if existing is not None:
169+
state = json.loads((existing / "state.json").read_text())
116170
# #236: read via helper so legacy ``phase`` keys still resolve.
117171
from orchestrator.engine import read_phase_field
118172
phase = read_phase_field(state)
119173
if phase != "INIT":
120174
print(
121-
f"Run '{run_id}' already in progress (phase={phase}). "
122-
f"Use 'nous resume' to continue.",
175+
f"Run '{run_id}' already in progress at {existing} "
176+
f"(phase={phase}). Use 'nous resume' to continue.",
123177
file=sys.stderr,
124178
)
125179
sys.exit(1)
180+
# Migration hint: if env var is set but the existing campaign
181+
# lives at the legacy location, point the user at `mv`.
182+
if _os.environ.get(ENV_VAR) and ".nous" in existing.parts:
183+
print(
184+
f"Note: campaign found at legacy location {existing}. "
185+
f"To migrate to {ENV_VAR}: "
186+
f"`mv {existing} ${ENV_VAR}/{run_id}` and re-run. "
187+
f"Continuing at the legacy location for now.",
188+
file=sys.stderr,
189+
)
126190

127191
work_dir = setup_work_dir(run_id, repo_path=repo_path)
128192

orchestrator/create_campaign.py

Lines changed: 8 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -121,15 +121,16 @@
121121
122122
# Path to the target system's git repo. Used for two distinct things
123123
# (#239 keeps them cleanly separated):
124+
#
124125
# 1. Code worktrees per arm (#133) live at
125-
# <repo_path>/.nous-experiments/<run_id>/<arm>/. Always relative
126-
# to the target repo — they ARE code FOR the target.
126+
# <repo_path>/.nous-experiments/<run_id>/<arm>/. Always —
127+
# they ARE code FOR the target repo.
128+
#
127129
# 2. Campaign artifacts (state, ledger, principles, findings, JSON
128-
# results) live at:
129-
# - $NOUS_CAMPAIGN_PARENT/<run_id>/ if env var set (recommended)
130-
# - <repo_path>/.nous/<run_id>/ legacy default (untracked
131-
# in target — pollutes git
132-
# status, see #239)
130+
# results) live at $NOUS_CAMPAIGN_PARENT/<run_id>/ if you've
131+
# set that env var (recommended — see below); otherwise at the
132+
# legacy <repo_path>/.nous/<run_id>/, which pollutes the
133+
# target's git status (#239).
133134
#
134135
# Recommended setup: export NOUS_CAMPAIGN_PARENT=~/Documents/Projects/nous-campaigns
135136
# in your shell rc. Campaign artifacts then live outside the target,

orchestrator/iteration.py

Lines changed: 55 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
import argparse
2222
import json
2323
import logging
24+
import os
2425
import re
2526
import shutil
2627
import sys
@@ -726,16 +727,17 @@ def _merge_principles(work_dir: Path, iter_dir: Path) -> None:
726727
def setup_work_dir(run_id: str, repo_path: str | None = None) -> Path:
727728
"""Create and initialize a working directory from templates.
728729
729-
Resolution rules (#239):
730-
1. If ``NOUS_CAMPAIGN_PARENT`` is set, the work_dir is
731-
``$NOUS_CAMPAIGN_PARENT/<run_id>/``. Target repo is unaffected.
732-
2. Else, if repo_path is provided, falls back to legacy
733-
``<repo_path>/.nous/<run_id>/`` (preserves backward-compat).
734-
3. Else, falls back to ``<run_id>/`` in the current directory.
730+
See ``orchestrator/work_dir_resolver.py`` for the canonical
731+
RESOLUTION RULES — this function delegates path resolution there.
732+
Briefly: ``$NOUS_CAMPAIGN_PARENT/<run_id>/`` if env var set, else
733+
legacy ``<repo_path>/.nous/<run_id>/``, else ``<run_id>/`` relative
734+
to CWD (#239).
735735
736-
The resolved absolute work_dir is recorded in state.json's
737-
``work_dir`` field — this is the per-campaign source of truth for
738-
location, robust to env var changes between runs.
736+
The resolved absolute work_dir AND repo_path are recorded in
737+
state.json — this is the per-campaign source of truth for location.
738+
Used by collision detection (refuses to clobber a same-named
739+
campaign that targets a different repo, #239 D1) and by future
740+
resume/discovery tooling.
739741
740742
Worktrees are NOT affected: they continue to live at
741743
``<repo_path>/.nous-experiments/<run_id>/<arm>/`` because they are
@@ -745,28 +747,65 @@ def setup_work_dir(run_id: str, repo_path: str | None = None) -> Path:
745747
Also writes a per-campaign ``.claude/settings.json`` permission policy
746748
(issue #135) so dispatchers can pass ``--settings <path>`` instead of
747749
``--dangerously-skip-permissions``.
750+
751+
Raises:
752+
ValueError: ``NOUS_CAMPAIGN_PARENT`` set to empty/whitespace, OR
753+
an existing state.json at the resolved path records a
754+
different ``repo_path`` (run_id collision under env var).
755+
FileNotFoundError: ``repo_path`` provided but doesn't exist.
756+
OSError: filesystem error creating the work_dir (wrapped with
757+
env-var context if ``NOUS_CAMPAIGN_PARENT`` is set).
748758
"""
749759
from orchestrator.settings_template import (
750760
render_campaign_settings,
751761
settings_path_for,
752762
write_campaign_settings,
753763
)
754-
from orchestrator.work_dir_resolver import resolve_work_dir
764+
from orchestrator.work_dir_resolver import ENV_VAR, resolve_work_dir
755765

756766
work_dir = resolve_work_dir(run_id, repo_path)
757-
work_dir.mkdir(parents=True, exist_ok=True)
767+
768+
# #239 D1: collision detection. If state.json already exists with a
769+
# different recorded repo_path, the user has accidentally collided
770+
# two campaigns under the same run_id. Refuse loudly rather than
771+
# silently corrupt the existing campaign.
772+
existing_state = work_dir / "state.json"
773+
if existing_state.exists():
774+
try:
775+
prior = json.loads(existing_state.read_text())
776+
except (json.JSONDecodeError, OSError):
777+
prior = {}
778+
prior_repo = prior.get("repo_path")
779+
new_repo = str(Path(repo_path).resolve()) if repo_path else None
780+
if prior_repo and new_repo and prior_repo != new_repo:
781+
raise ValueError(
782+
f"run_id collision at {work_dir}: existing state.json "
783+
f"records repo_path={prior_repo!r} but this run targets "
784+
f"repo_path={new_repo!r}. Run_ids must be globally unique "
785+
f"under {ENV_VAR}; rename one campaign's run_id."
786+
)
787+
788+
try:
789+
work_dir.mkdir(parents=True, exist_ok=True)
790+
except OSError as exc:
791+
env_hint = ""
792+
if os.environ.get(ENV_VAR):
793+
env_hint = f" (resolved via {ENV_VAR}={os.environ[ENV_VAR]!r})"
794+
raise OSError(
795+
f"could not create campaign work_dir at {work_dir}{env_hint}: {exc}"
796+
) from exc
797+
758798
for t in ["state.json", "ledger.json", "principles.json"]:
759799
dest = work_dir / t
760800
if not dest.exists():
761801
shutil.copy(TEMPLATES_DIR / t, dest)
762802
state = json.loads((work_dir / "state.json").read_text())
763803
state["run_id"] = run_id
764-
# #239: record the resolved absolute work_dir as per-campaign source
765-
# of truth. Survives env var changes: if NOUS_CAMPAIGN_PARENT is
766-
# later unset or changed, downstream tooling can still find the
767-
# campaign by reading this field rather than re-deriving from
768-
# convention.
804+
# #239: record resolved paths as per-campaign source of truth.
805+
# work_dir survives env var changes; repo_path enables collision
806+
# detection and future cross-machine discovery.
769807
state["work_dir"] = str(work_dir.resolve())
808+
state["repo_path"] = str(Path(repo_path).resolve()) if repo_path else None
770809
atomic_write(work_dir / "state.json", json.dumps(state, indent=2) + "\n")
771810

772811
# Per-campaign permission policy. Idempotent: don't overwrite a settings

orchestrator/schemas/state.schema.json

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,11 @@
4040
},
4141
"work_dir": {
4242
"type": ["string", "null"],
43-
"description": "Absolute path to the campaign's work_dir, recorded at setup_work_dir (#239). The per-campaign source of truth for location: even if the user changes NOUS_CAMPAIGN_PARENT or the legacy <repo>/.nous/<run_id>/ default between runs, this field preserves the original resolved path. Null before setup. Set to the resolved Path(work_dir).resolve() at first setup; downstream tooling can read this rather than re-deriving from convention."
43+
"description": "Absolute filesystem path to the campaign's work_dir, recorded at setup_work_dir (#239). Per-campaign source of truth for location, robust to NOUS_CAMPAIGN_PARENT changes between runs. **Machine-local**: tools that travel state.json across machines must validate Path(work_dir).exists() before trusting it (or rederive from the local environment). Null before setup."
44+
},
45+
"repo_path": {
46+
"type": ["string", "null"],
47+
"description": "Absolute path to the target system's repo, recorded at setup_work_dir (#239). Used for collision detection (refusing to clobber a same-named campaign that targets a different repo when NOUS_CAMPAIGN_PARENT is set) and to identify which target a campaign belongs to. **Machine-local**, same caveats as work_dir. Null before setup or when the campaign was authored without a target_system.repo_path."
4448
}
4549
}
4650
}

orchestrator/templates/state.json

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,5 +5,6 @@
55
"family": null,
66
"timestamp": "1970-01-01T00:00:00Z",
77
"config_ref": null,
8-
"work_dir": null
8+
"work_dir": null,
9+
"repo_path": null
910
}

0 commit comments

Comments
 (0)