Skip to content

Commit 3d6f4fd

Browse files
sriumcpclaude
andcommitted
fix(review): address PR #234 review findings
Addresses 4 reviewer reports against PR #234. All findings actioned except those explicitly out of scope (silent-failure-hunter's "write to sibling file" alternative — overkill given the tripwire is non-blocking). ## Critical fixes 1. **Porcelain parser rewrite** (orchestrator/worktree.py) The substring filter `"M" in status` was too loose (matched phantom code combinations) and too narrow (missed staged-add `A`, renames `R`, copies `C`, typechanges `T`). Rename paths in porcelain v1 use `orig -> new` syntax that the old parser reported as a single garbage path. New: explicit `_PORCELAIN_WRITE_CODES` set, `_parse_porcelain_line` helper that splits rename destinations correctly. Untracked is handled via the `??` special case; deletions (`D`) are documented as intentionally NOT surfaced (removing a file isn't a "write"). 2. **Tripwire hoisted into execute-incomplete branch** (orchestrator/iteration.py) The original implementation only ran `detect_undeclared_writes` on EXECUTE_ANALYZE success. But that's exactly when undeclared writes matter LEAST — the executor wrote findings.json and likely declared what it needed. The case that matters MOST is `_missing_execute_artifacts` (max-turn exhaustion, subprocess crash) where the executor wrote partial code and never got to declare it. New `_detect_undeclared_writes_for_iter` helper runs in both branches; results land in retry_log.jsonl on incomplete and findings.json on success. 3. **`detect_undeclared_writes` git-failure log: DEBUG → WARNING** The whole point of the helper is "turn silent loss into an auditable trail." Swallowing diagnostic failures at DEBUG re-introduces the silent loss for the failure case. Now logs at WARNING with returncode + stderr. 4. **`_declared_code_change_paths` YAML failure logged at ERROR** bundle.yaml is a system boundary; corruption is operator- actionable. Previously: silent empty-set return. Now: ERROR log naming the parse error and the bundle path before returning empty. ## Important fixes 5. **`except BaseException` → `except Exception`** in create_experiment_worktree's extras-cleanup path. BaseException catches KeyboardInterrupt/SystemExit; we don't want to trigger a `git worktree remove` subprocess on Ctrl-C. 6. **Existing-path collision in `_link_worktree_extras`** now logs a loud, explanatory WARNING (was a quiet warn-and-continue) — names the collision, explains why declaring a tracked path as an extra is almost certainly a misconfiguration, and tells the operator how to fix. 7. **`_record_undeclared_writes_in_findings` JSON failure logged at ERROR** with the undeclared-paths list. Previously: silent return. Now the operator can recover the list from logs even when findings.json is corrupt. 8. **Parity: success branch now uses `experiment_id` AND `experiment_dir` guards**, matching the cleanup path's `experiment_id` guard. A future refactor that decouples them won't silently disable the tripwire. ## Test additions (+8 behavioral tests) - `test_modify_then_stage_flagged` — porcelain MM - `test_added_staged_file_flagged` — porcelain A - `test_renamed_file_reports_destination` — rename parser - `test_deleted_file_not_flagged` — codify the D-skip behavior - `test_renamed_destination_can_be_declared` — declared-paths filter survives rename round-trip - `test_git_failure_logs_warning` — diagnostic failures aren't silent - `test_malformed_findings_logs_error_with_paths` — recovery info in logs - `test_corrupted_bundle_logs_error` — bundle corruption is loud ## Test relaxations (review-flagged brittleness) - `TestWorktreeDisciplineGuidance` no longer asserts the literal string "Do not `cd` to the parent repo" — anchors on "Worktree discipline", "worktree_extras", "code_changes" instead. Editorial prose tweaks won't break the test; the *concepts* still must be present. - `test_logs_missing_placeholders_and_context_keys` now matches by substring (not exact list-repr). A future format swap (JSON, comma-joined) preserves the diagnostic intent. ## Test plan - 1175 passed, 1 skipped (was 1167; +8 new). Zero regressions. - All tests behavioral; real on-disk git repos in tmp_path; caplog for diagnostic-line assertions. Refs #228, closes #229 #230 #231 #232 (v1). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b6dfa08 commit 3d6f4fd

4 files changed

Lines changed: 289 additions & 49 deletions

File tree

orchestrator/iteration.py

Lines changed: 71 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -52,13 +52,31 @@ class IterationOutcome(str, Enum):
5252

5353
def _declared_code_change_paths(bundle_path: Path) -> set[str]:
5454
"""Read ``bundle.yaml`` and return every ``arms[].code_changes[].file``
55-
relative path declared on any arm. Returns an empty set if the file
56-
is missing, unparseable, or declares no ``code_changes`` (#230)."""
55+
declared on any arm (#230).
56+
57+
Returns the set of declared file paths verbatim — paths are NOT
58+
normalized. If the bundle declares an absolute path or a ``./``-
59+
prefixed path, it won't match the relative paths that
60+
``detect_undeclared_writes`` reports; that mismatch is the bundle
61+
author's responsibility (the bundle schema already constrains
62+
``file`` shape).
63+
64+
Returns an empty set when the bundle is missing or declares no
65+
``code_changes``. A YAML parse failure is logged at ERROR (the
66+
bundle is a system boundary; corruption is operator-actionable)
67+
and an empty set returned so cleanup proceeds.
68+
"""
5769
if not bundle_path.exists():
5870
return set()
5971
try:
6072
bundle = yaml.safe_load(bundle_path.read_text()) or {}
61-
except yaml.YAMLError:
73+
except yaml.YAMLError as exc:
74+
logger.error(
75+
"_declared_code_change_paths: bundle.yaml parse failed at %s "
76+
"(%s); treating as if no code_changes were declared. Every "
77+
"executor write will be flagged as undeclared until this is fixed.",
78+
bundle_path, exc,
79+
)
6280
return set()
6381
arms = bundle.get("arms") or []
6482
declared: set[str] = set()
@@ -77,17 +95,50 @@ def _record_undeclared_writes_in_findings(
7795
) -> None:
7896
"""Merge ``worktree_uncommitted_writes`` into ``findings.json`` (#230).
7997
80-
No-op if findings.json is missing or unparseable — the schema
81-
validator downstream will already complain and we don't want to
82-
block worktree cleanup on a malformed findings artifact."""
98+
No-op if findings.json is missing — the cleanup may be running in
99+
the execute-incomplete branch where findings was never produced
100+
(the caller surfaces the data via retry_log there instead).
101+
102+
A JSONDecodeError on the existing findings is logged at ERROR
103+
(corrupted findings is operator-actionable) and the function
104+
returns without writing — modifying a corrupt JSON file would
105+
only make recovery harder.
106+
"""
83107
if not undeclared or not findings_path.exists():
84108
return
85109
try:
86110
findings = json.loads(findings_path.read_text())
87-
except json.JSONDecodeError:
111+
except json.JSONDecodeError as exc:
112+
logger.error(
113+
"_record_undeclared_writes_in_findings: findings.json at %s "
114+
"is not valid JSON (%s); the undeclared-writes list will not "
115+
"be persisted. Undeclared paths: %s",
116+
findings_path, exc, undeclared,
117+
)
88118
return
89119
findings["worktree_uncommitted_writes"] = sorted(set(undeclared))
90120
atomic_write(findings_path, json.dumps(findings, indent=2) + "\n")
121+
122+
123+
def _detect_undeclared_writes_for_iter(
124+
iter_dir: Path,
125+
experiment_dir: Path,
126+
) -> list[str]:
127+
"""Detect undeclared writes in ``experiment_dir`` and log a WARNING
128+
if any are found (#230). Returns the list so the caller can decide
129+
where to persist it (findings.json on success, retry_log on
130+
incomplete). Pure tripwire — never raises, never blocks cleanup."""
131+
from orchestrator.worktree import detect_undeclared_writes
132+
declared = _declared_code_change_paths(iter_dir / "bundle.yaml")
133+
undeclared = detect_undeclared_writes(experiment_dir, declared)
134+
if undeclared:
135+
logger.warning(
136+
"Executor wrote %d files in the experiment worktree "
137+
"without declaring them in bundle.arms[].code_changes; "
138+
"they will be lost on cleanup: %s",
139+
len(undeclared), undeclared[:20],
140+
)
141+
return undeclared
91142
DEFAULTS_PATH = Path(__file__).resolve().parent / "defaults.yaml"
92143
_ARM_TYPE_RE = re.compile(r"^[a-zA-Z0-9_-]+$")
93144

@@ -1048,13 +1099,22 @@ def _max_turns_for(phase_key: str) -> int:
10481099
# "X not found" from validate_execution.
10491100
missing = _missing_execute_artifacts(iter_dir)
10501101
if missing:
1102+
# #230: even on incomplete, the executor may have written
1103+
# partial code into the worktree — exactly the case where
1104+
# undeclared writes matter most. Capture before cleanup.
1105+
incomplete_undeclared: list[str] = []
1106+
if repo_path and experiment_id and experiment_dir is not None:
1107+
incomplete_undeclared = _detect_undeclared_writes_for_iter(
1108+
iter_dir, experiment_dir,
1109+
)
10511110
from orchestrator.metrics import log_retry_event
10521111
log_retry_event(work_dir / "llm_metrics.jsonl", {
10531112
"iteration": iteration,
10541113
"phase": "execute-analyze",
10551114
"failure_type": "execute_incomplete",
10561115
"missing_artifacts": missing,
10571116
"max_turns": _max_turns_for("execute_analyze"),
1117+
"undeclared_writes": incomplete_undeclared,
10581118
})
10591119
# Clean up the experiment worktree so a re-run isn't blocked.
10601120
if repo_path and experiment_id:
@@ -1076,17 +1136,11 @@ def _max_turns_for(phase_key: str) -> int:
10761136
# worktree is removed below. Persist into findings.json so the
10771137
# design agent on iter-N+1 can see what to declare in
10781138
# ``code_changes``. Tripwire only — never blocks cleanup.
1079-
if repo_path and experiment_dir is not None:
1080-
from orchestrator.worktree import detect_undeclared_writes
1081-
declared = _declared_code_change_paths(iter_dir / "bundle.yaml")
1082-
undeclared = detect_undeclared_writes(experiment_dir, declared)
1139+
if repo_path and experiment_id and experiment_dir is not None:
1140+
undeclared = _detect_undeclared_writes_for_iter(
1141+
iter_dir, experiment_dir,
1142+
)
10831143
if undeclared:
1084-
logger.warning(
1085-
"Executor wrote %d files in the experiment worktree "
1086-
"without declaring them in bundle.arms[].code_changes; "
1087-
"they will be lost on cleanup: %s",
1088-
len(undeclared), undeclared[:20],
1089-
)
10901144
_record_undeclared_writes_in_findings(
10911145
iter_dir / "findings.json", undeclared,
10921146
)

orchestrator/worktree.py

Lines changed: 89 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -73,9 +73,12 @@ def create_experiment_worktree(
7373
if extras:
7474
try:
7575
_link_worktree_extras(repo_path, worktree_dir, extras)
76-
except BaseException:
76+
except Exception:
7777
# If symlinking fails (bad extras config), don't leak the
7878
# half-built worktree + branch — clean up before re-raising.
79+
# Scoped to ``Exception`` so a Ctrl-C (KeyboardInterrupt)
80+
# during setup aborts fast instead of triggering a `git
81+
# worktree remove` subprocess that may itself stall.
7982
remove_experiment_worktree(repo_path, experiment_id)
8083
raise
8184

@@ -89,16 +92,24 @@ def _link_worktree_extras(
8992
) -> None:
9093
"""Symlink each entry in ``extras`` from ``repo_path`` into ``worktree_dir``.
9194
92-
Each entry must be a relative path (no leading ``/``, no ``..``
93-
traversal that escapes ``repo_path``). The source path must exist in
94-
``repo_path``. Parent directories under the worktree are created as
95-
needed.
96-
97-
If a path under the worktree already exists at the target location
98-
(typically because the entry refers to a tracked path that the
99-
worktree checkout already populated), the existing path is left
100-
untouched and a warning is logged — do not silently overwrite the
101-
real checkout with a symlink.
95+
Validation order:
96+
97+
1. Each entry must be a non-empty relative path (no leading ``/``).
98+
2. Resolved source must lie under ``repo_path`` — ``..`` traversal
99+
is permitted *syntactically* but rejected if the resolved path
100+
escapes the repo boundary.
101+
3. Source must exist in ``repo_path``.
102+
4. If the target path already exists in the worktree (typically a
103+
tracked path the checkout populated), the existing file is left
104+
untouched. This is logged at WARNING with the entry name so
105+
operators can spot a misconfigured extras list — declaring a
106+
tracked path as an extra is almost always a mistake (the agent
107+
reads main's working tree instead of main's HEAD).
108+
109+
On failure mid-loop, prior symlinks created in this call are NOT
110+
rolled back here — the caller's ``except Exception`` in
111+
``create_experiment_worktree`` (which calls
112+
``remove_experiment_worktree``) sweeps the whole worktree.
102113
"""
103114
for entry in extras:
104115
if not entry or os.path.isabs(entry):
@@ -122,28 +133,65 @@ def _link_worktree_extras(
122133

123134
link_path = worktree_dir / entry
124135
if link_path.exists() or link_path.is_symlink():
136+
# Loud warning, not silent — the executor will see main's
137+
# tracked content here instead of the symlinked target,
138+
# which subverts the campaign author's intent.
125139
logger.warning(
126-
"worktree_extras: %s already present in worktree; leaving "
127-
"the existing path untouched (entry was %r)",
128-
link_path, entry,
140+
"worktree_extras: %r collides with an existing path in "
141+
"the worktree (%s) — leaving it untouched. This usually "
142+
"means the entry refers to a tracked path; tracked paths "
143+
"should NOT be declared as extras (they're already in "
144+
"the worktree checkout). Drop %r from worktree_extras "
145+
"or rename the source.",
146+
entry, link_path, entry,
129147
)
130148
continue
131149
link_path.parent.mkdir(parents=True, exist_ok=True)
132150
os.symlink(source, link_path)
133151
logger.info("worktree_extras: linked %s -> %s", link_path, source)
134152

135153

154+
# Porcelain v1 status codes that indicate the executor produced or
155+
# changed file content the bundle didn't declare. ``M`` = modified, ``A``
156+
# = added (staged), ``R`` = renamed, ``C`` = copied, ``T`` = typechange.
157+
# ``D`` (deleted) is intentionally NOT here: removing a tracked file
158+
# isn't a "write," and surfacing it would turn ``git rm`` between arms
159+
# into noise. Untracked is signalled by the special ``??`` prefix and
160+
# handled separately in the parser.
161+
_PORCELAIN_WRITE_CODES = frozenset({"M", "A", "R", "C", "T"})
162+
163+
164+
def _parse_porcelain_line(line: str) -> tuple[str, str, str] | None:
165+
"""Parse one ``git status --porcelain`` v1 line.
166+
167+
Returns ``(index_status, worktree_status, path)`` or ``None`` for
168+
blank/short lines. For renames and copies (``R``, ``C``), the
169+
porcelain format is ``XY orig -> new``; this returns the *new*
170+
path so the caller treats the destination as the relevant write.
171+
"""
172+
if len(line) < 4:
173+
return None
174+
index_st, worktree_st = line[0], line[1]
175+
rest = line[3:]
176+
if " -> " in rest:
177+
rest = rest.split(" -> ", 1)[1]
178+
return index_st, worktree_st, rest.strip()
179+
180+
136181
def detect_undeclared_writes(
137182
worktree_path: Path,
138183
declared_paths: set[str] | None = None,
139184
) -> list[str]:
140-
"""Return paths in ``worktree_path`` that the executor wrote without
185+
"""Return paths the executor wrote in ``worktree_path`` without
141186
declaring them via the bundle's ``code_changes`` (#230).
142187
143188
Parses ``git -C <worktree_path> status --porcelain`` and reports
144-
every untracked file (``??``) and modified-tracked file (``M``) that
145-
is not already covered by ``declared_paths``. Each returned path is
146-
relative to ``worktree_path``.
189+
every porcelain line whose status indicates a write — untracked
190+
(``??``), modified (``M``), staged-add (``A``), renamed (``R``),
191+
copied (``C``), or typechanged (``T``) — in either the index or
192+
the worktree column. Deletions (``D``) are not surfaced (see
193+
``_PORCELAIN_WRITE_CODES``). Each returned path is relative to
194+
``worktree_path``; for renames, the destination path is reported.
147195
148196
Symlinks (typically created by ``worktree_extras``, #229) are
149197
excluded — they're orchestrator-managed inputs, not undeclared
@@ -154,6 +202,12 @@ def detect_undeclared_writes(
154202
``code_changes`` entry will lose the file when the worktree is
155203
cleaned up. Reporting it loudly turns the silent loss into an
156204
auditable trail.
205+
206+
Returns an empty list when ``worktree_path`` is missing or when
207+
``git status`` itself fails — the cleanup path must not break on
208+
diagnostics. Failures are logged at WARNING with returncode +
209+
stderr, so an empty return on a real git failure is loud, not
210+
silent.
157211
"""
158212
declared_paths = declared_paths or set()
159213
worktree_path = Path(worktree_path)
@@ -168,22 +222,30 @@ def detect_undeclared_writes(
168222
check=False,
169223
)
170224
if result.returncode != 0:
171-
# Worktree may have been removed already, or git is unhappy —
172-
# don't make cleanup fail because diagnostics failed.
173-
logger.debug(
174-
"git status --porcelain on %s exited %s: %s",
225+
# Diagnostic failure — we still must not block cleanup, but the
226+
# whole point of this function is "turn silent loss into an
227+
# auditable trail," so log loudly rather than at DEBUG.
228+
logger.warning(
229+
"detect_undeclared_writes: git status failed for %s "
230+
"(returncode=%s); returning empty list. stderr=%s",
175231
worktree_path, result.returncode, result.stderr.strip(),
176232
)
177233
return []
178234

179235
undeclared: list[str] = []
180236
for line in result.stdout.splitlines():
181-
if len(line) < 4:
237+
parsed = _parse_porcelain_line(line)
238+
if parsed is None:
182239
continue
183-
# Porcelain v1 prefix: 2 status chars + space + path.
184-
status = line[:2]
185-
path = line[3:].strip()
186-
if not (status.startswith("??") or "M" in status):
240+
index_st, worktree_st, path = parsed
241+
# Untracked is the special ``??`` prefix.
242+
if index_st == "?" and worktree_st == "?":
243+
relevant = True
244+
else:
245+
relevant = bool(
246+
_PORCELAIN_WRITE_CODES & {index_st, worktree_st}
247+
)
248+
if not relevant:
187249
continue
188250
if path in declared_paths:
189251
continue

tests/test_prompt_loader.py

Lines changed: 14 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -210,11 +210,12 @@ class TestWorktreeDisciplineGuidance:
210210
)
211211

212212
def test_full_template_carries_discipline_section(self):
213+
# Structural anchors only — match the *concepts* the section
214+
# must convey, not the exact prose. Editorial tweaks to the
215+
# surrounding language must not break this test.
213216
text = (self.REAL_PROMPTS_DIR / "execute_analyze.md").read_text()
214217
assert "Worktree discipline" in text
215218
assert "worktree_extras" in text
216-
assert "Do not `cd` to the parent repo" in text
217-
# Persistence rule must be present so executors don't lose work.
218219
assert "code_changes" in text
219220

220221
def test_thin_template_carries_discipline_section(self):
@@ -273,16 +274,24 @@ def test_logs_missing_placeholders_and_context_keys(self, prompts_dir, caplog):
273274
with pytest.raises(ValueError, match="iteration_mode, mode_guidance"):
274275
loader.load("execute_analyze", partial_context)
275276

276-
# The forensic log line must carry the two new fields.
277+
# The forensic log line must carry the two new fields. Match by
278+
# substring (not exact list-repr) so a future format swap (e.g.
279+
# comma-joined values, JSON, structured logging) doesn't break
280+
# the diagnostic intent — what matters is that a human reading
281+
# the log can see the missing names AND the present names.
277282
record = next(
278283
(r for r in caplog.records if r.levelname == "ERROR"
279284
and "prompt render failed" in r.getMessage()),
280285
None,
281286
)
282287
assert record is not None, "expected ERROR-level diagnostic log line"
283288
msg = record.getMessage()
284-
assert "missing_placeholders=['iteration_mode', 'mode_guidance']" in msg
285-
assert "context_keys=['iter_dir', 'target_system']" in msg
289+
assert "missing_placeholders=" in msg
290+
assert "iteration_mode" in msg
291+
assert "mode_guidance" in msg
292+
assert "context_keys=" in msg
293+
assert "iter_dir" in msg
294+
assert "target_system" in msg
286295
assert "template=execute_analyze" in msg
287296

288297
def test_no_log_on_successful_render(self, prompts_dir, caplog):

0 commit comments

Comments
 (0)