fix(review): address PR #234 review findings

sriumcp · claude · sriumcp · commit 3d6f4fdada0a · 2026-05-28T06:55:07.000-04:00
Addresses 4 reviewer reports against PR #234. All findings actioned except those explicitly out of scope (silent-failure-hunter's "write to sibling file" alternative, which is overkill given the tripwire is non-blocking).

## Critical fixes

1. **Porcelain parser rewrite** (orchestrator/worktree.py)
   The substring filter `"M" in status` was too loose (matched phantom code combinations) and too narrow (missed staged-add `A`, renames `R`, copies `C`, typechanges `T`). Rename paths in porcelain v1 use `orig -> new` syntax that the old parser reported as a single garbage path. New: explicit `_PORCELAIN_WRITE_CODES` set, `_parse_porcelain_line` helper that splits rename destinations correctly. Untracked is handled via the `??` special case; deletions (`D`) are documented as intentionally NOT surfaced (removing a file isn't a "write").

2. **Tripwire hoisted into execute-incomplete branch** (orchestrator/iteration.py)
   The original implementation only ran `detect_undeclared_writes` on EXECUTE_ANALYZE success. But that's exactly when undeclared writes matter LEAST: the executor wrote findings.json and likely declared what it needed. The case that matters MOST is `_missing_execute_artifacts` (max-turn exhaustion, subprocess crash), where the executor wrote partial code and never got to declare it. New `_detect_undeclared_writes_for_iter` helper runs in both branches; results land in retry_log.jsonl on incomplete and findings.json on success.

3. **`detect_undeclared_writes` git-failure log: DEBUG → WARNING**
   The whole point of the helper is "turn silent loss into an auditable trail." Swallowing diagnostic failures at DEBUG re-introduces the silent loss for the failure case. Now logs at WARNING with returncode + stderr.

4. **`_declared_code_change_paths` YAML failure logged at ERROR**
   bundle.yaml is a system boundary; corruption is operator-actionable. Previously: silent empty-set return. Now: ERROR log naming the parse error and the bundle path before returning empty.
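To make item 1 concrete, here is a minimal sketch contrasting the old substring filter with the explicit-code-set approach. The names `old_filter`, `new_filter`, and `SAMPLE` are hypothetical and exist only for this illustration; the sample lines are hand-written porcelain v1 lines, not output captured from a real repository, and the real helpers live in orchestrator/worktree.py.

```python
from __future__ import annotations

# Same write codes the commit message lists: modified, added, renamed,
# copied, typechanged. "D" is deliberately absent.
WRITE_CODES = frozenset({"M", "A", "R", "C", "T"})


def old_filter(line: str) -> str | None:
    """Pre-fix behaviour as described in item 1: substring match on status."""
    if len(line) < 4:
        return None
    status, path = line[:2], line[3:].strip()
    if status.startswith("??") or "M" in status:
        return path
    return None


def new_filter(line: str) -> str | None:
    """Post-fix behaviour: explicit code set, rename destination split out."""
    if len(line) < 4:
        return None
    index_st, worktree_st, rest = line[0], line[1], line[3:]
    if " -> " in rest:
        # Renames/copies are "XY orig -> new"; keep only the destination.
        rest = rest.split(" -> ", 1)[1]
    path = rest.strip()
    if (index_st, worktree_st) == ("?", "?"):
        return path  # untracked
    if WRITE_CODES & {index_st, worktree_st}:
        return path
    return None


# Hand-written sample lines (assumption: illustrative only).
SAMPLE = [
    "?? scratch.py",
    " M tracked.py",
    "A  staged_new.py",
    "RM old_name.py -> new_name.py",
    "R  a.py -> b.py",
    " D removed.py",
]

old_paths = [p for p in map(old_filter, SAMPLE) if p is not None]
new_paths = [p for p in map(new_filter, SAMPLE) if p is not None]
print(old_paths)
print(new_paths)
```

On these samples the old filter drops the staged add and the pure rename and reports `old_name.py -> new_name.py` as one path, while the new filter reports each destination and still skips the deletion.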
## Important fixes

5. **`except BaseException` → `except Exception`** in create_experiment_worktree's extras-cleanup path. BaseException catches KeyboardInterrupt/SystemExit; we don't want to trigger a `git worktree remove` subprocess on Ctrl-C.

6. **Existing-path collision in `_link_worktree_extras`** now logs a loud, explanatory WARNING (was a quiet warn-and-continue): it names the collision, explains why declaring a tracked path as an extra is almost certainly a misconfiguration, and tells the operator how to fix it.

7. **`_record_undeclared_writes_in_findings` JSON failure logged at ERROR** with the undeclared-paths list. Previously: silent return. Now the operator can recover the list from logs even when findings.json is corrupt.

8. **Parity: success branch now uses `experiment_id` AND `experiment_dir` guards**, matching the cleanup path's `experiment_id` guard. A future refactor that decouples them won't silently disable the tripwire.

## Test additions (+8 behavioral tests)

- `test_modify_then_stage_flagged`: porcelain MM
- `test_added_staged_file_flagged`: porcelain A
- `test_renamed_file_reports_destination`: rename parser
- `test_deleted_file_not_flagged`: codify the D-skip behavior
- `test_renamed_destination_can_be_declared`: declared-paths filter survives rename round-trip
- `test_git_failure_logs_warning`: diagnostic failures aren't silent
- `test_malformed_findings_logs_error_with_paths`: recovery info in logs
- `test_corrupted_bundle_logs_error`: bundle corruption is loud

## Test relaxations (review-flagged brittleness)

- `TestWorktreeDisciplineGuidance` no longer asserts the literal string "Do not `cd` to the parent repo"; it anchors on "Worktree discipline", "worktree_extras", "code_changes" instead. Editorial prose tweaks won't break the test; the *concepts* still must be present.
- `test_logs_missing_placeholders_and_context_keys` now matches by substring (not exact list-repr).
A future format swap (JSON, comma-joined) preserves the diagnostic intent.

## Test plan

- 1175 passed, 1 skipped (was 1167; +8 new). Zero regressions.
- All tests behavioral; real on-disk git repos in tmp_path; caplog for diagnostic-line assertions.

Refs #228, closes #229 #230 #231 #232 (v1).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/orchestrator/iteration.py b/orchestrator/iteration.py
@@ -52,13 +52,31 @@ class IterationOutcome(str, Enum):
 
 def _declared_code_change_paths(bundle_path: Path) -> set[str]:
     """Read ``bundle.yaml`` and return every ``arms[].code_changes[].file``
-    relative path declared on any arm. Returns an empty set if the file
-    is missing, unparseable, or declares no ``code_changes`` (#230)."""
+    declared on any arm (#230).
+
+    Returns the set of declared file paths verbatim — paths are NOT
+    normalized. If the bundle declares an absolute path or a ``./``-
+    prefixed path, it won't match the relative paths that
+    ``detect_undeclared_writes`` reports; that mismatch is the bundle
+    author's responsibility (the bundle schema already constrains
+    ``file`` shape).
+
+    Returns an empty set when the bundle is missing or declares no
+    ``code_changes``. A YAML parse failure is logged at ERROR (the
+    bundle is a system boundary; corruption is operator-actionable)
+    and an empty set returned so cleanup proceeds.
+    """
     if not bundle_path.exists():
         return set()
     try:
         bundle = yaml.safe_load(bundle_path.read_text()) or {}
-    except yaml.YAMLError:
+    except yaml.YAMLError as exc:
+        logger.error(
+            "_declared_code_change_paths: bundle.yaml parse failed at %s "
+            "(%s); treating as if no code_changes were declared. Every "
+            "executor write will be flagged as undeclared until this is fixed.",
+            bundle_path, exc,
+        )
         return set()
     arms = bundle.get("arms") or []
     declared: set[str] = set()
@@ -77,17 +95,50 @@ def _record_undeclared_writes_in_findings(
 ) -> None:
     """Merge ``worktree_uncommitted_writes`` into ``findings.json`` (#230).
 
-    No-op if findings.json is missing or unparseable — the schema
-    validator downstream will already complain and we don't want to
-    block worktree cleanup on a malformed findings artifact."""
+    No-op if findings.json is missing — the cleanup may be running in
+    the execute-incomplete branch where findings was never produced
+    (the caller surfaces the data via retry_log there instead).
+
+    A JSONDecodeError on the existing findings is logged at ERROR
+    (corrupted findings is operator-actionable) and the function
+    returns without writing — modifying a corrupt JSON file would
+    only make recovery harder.
+    """
     if not undeclared or not findings_path.exists():
         return
     try:
         findings = json.loads(findings_path.read_text())
-    except json.JSONDecodeError:
+    except json.JSONDecodeError as exc:
+        logger.error(
+            "_record_undeclared_writes_in_findings: findings.json at %s "
+            "is not valid JSON (%s); the undeclared-writes list will not "
+            "be persisted. Undeclared paths: %s",
+            findings_path, exc, undeclared,
+        )
         return
     findings["worktree_uncommitted_writes"] = sorted(set(undeclared))
     atomic_write(findings_path, json.dumps(findings, indent=2) + "\n")
+
+
+def _detect_undeclared_writes_for_iter(
+    iter_dir: Path,
+    experiment_dir: Path,
+) -> list[str]:
+    """Detect undeclared writes in ``experiment_dir`` and log a WARNING
+    if any are found (#230). Returns the list so the caller can decide
+    where to persist it (findings.json on success, retry_log on
+    incomplete). Pure tripwire — never raises, never blocks cleanup."""
+    from orchestrator.worktree import detect_undeclared_writes
+    declared = _declared_code_change_paths(iter_dir / "bundle.yaml")
+    undeclared = detect_undeclared_writes(experiment_dir, declared)
+    if undeclared:
+        logger.warning(
+            "Executor wrote %d files in the experiment worktree "
+            "without declaring them in bundle.arms[].code_changes; "
+            "they will be lost on cleanup: %s",
+            len(undeclared), undeclared[:20],
+        )
+    return undeclared
 DEFAULTS_PATH = Path(__file__).resolve().parent / "defaults.yaml"
 _ARM_TYPE_RE = re.compile(r"^[a-zA-Z0-9_-]+$")
 
@@ -1048,13 +1099,22 @@ def _max_turns_for(phase_key: str) -> int:
         # "X not found" from validate_execution.
         missing = _missing_execute_artifacts(iter_dir)
         if missing:
+            # #230: even on incomplete, the executor may have written
+            # partial code into the worktree — exactly the case where
+            # undeclared writes matter most. Capture before cleanup.
+            incomplete_undeclared: list[str] = []
+            if repo_path and experiment_id and experiment_dir is not None:
+                incomplete_undeclared = _detect_undeclared_writes_for_iter(
+                    iter_dir, experiment_dir,
+                )
             from orchestrator.metrics import log_retry_event
             log_retry_event(work_dir / "llm_metrics.jsonl", {
                 "iteration": iteration,
                 "phase": "execute-analyze",
                 "failure_type": "execute_incomplete",
                 "missing_artifacts": missing,
                 "max_turns": _max_turns_for("execute_analyze"),
+                "undeclared_writes": incomplete_undeclared,
             })
             # Clean up the experiment worktree so a re-run isn't blocked.
             if repo_path and experiment_id:
@@ -1076,17 +1136,11 @@ def _max_turns_for(phase_key: str) -> int:
         # worktree is removed below. Persist into findings.json so the
         # design agent on iter-N+1 can see what to declare in
         # ``code_changes``. Tripwire only — never blocks cleanup.
-        if repo_path and experiment_dir is not None:
-            from orchestrator.worktree import detect_undeclared_writes
-            declared = _declared_code_change_paths(iter_dir / "bundle.yaml")
-            undeclared = detect_undeclared_writes(experiment_dir, declared)
+        if repo_path and experiment_id and experiment_dir is not None:
+            undeclared = _detect_undeclared_writes_for_iter(
+                iter_dir, experiment_dir,
+            )
             if undeclared:
-                logger.warning(
-                    "Executor wrote %d files in the experiment worktree "
-                    "without declaring them in bundle.arms[].code_changes; "
-                    "they will be lost on cleanup: %s",
-                    len(undeclared), undeclared[:20],
-                )
                 _record_undeclared_writes_in_findings(
                     iter_dir / "findings.json", undeclared,
                 )
diff --git a/orchestrator/worktree.py b/orchestrator/worktree.py
@@ -73,9 +73,12 @@ def create_experiment_worktree(
     if extras:
         try:
             _link_worktree_extras(repo_path, worktree_dir, extras)
-        except BaseException:
+        except Exception:
             # If symlinking fails (bad extras config), don't leak the
             # half-built worktree + branch — clean up before re-raising.
+            # Scoped to ``Exception`` so a Ctrl-C (KeyboardInterrupt)
+            # during setup aborts fast instead of triggering a `git
+            # worktree remove` subprocess that may itself stall.
             remove_experiment_worktree(repo_path, experiment_id)
             raise
 
@@ -89,16 +92,24 @@ def _link_worktree_extras(
 ) -> None:
     """Symlink each entry in ``extras`` from ``repo_path`` into ``worktree_dir``.
 
-    Each entry must be a relative path (no leading ``/``, no ``..``
-    traversal that escapes ``repo_path``). The source path must exist in
-    ``repo_path``. Parent directories under the worktree are created as
-    needed.
-
-    If a path under the worktree already exists at the target location
-    (typically because the entry refers to a tracked path that the
-    worktree checkout already populated), the existing path is left
-    untouched and a warning is logged — do not silently overwrite the
-    real checkout with a symlink.
+    Validation order:
+
+    1. Each entry must be a non-empty relative path (no leading ``/``).
+    2. Resolved source must lie under ``repo_path`` — ``..`` traversal
+       is permitted *syntactically* but rejected if the resolved path
+       escapes the repo boundary.
+    3. Source must exist in ``repo_path``.
+    4. If the target path already exists in the worktree (typically a
+       tracked path the checkout populated), the existing file is left
+       untouched. This is logged at WARNING with the entry name so
+       operators can spot a misconfigured extras list — declaring a
+       tracked path as an extra is almost always a mistake (the agent
+       reads main's working tree instead of main's HEAD).
+
+    On failure mid-loop, prior symlinks created in this call are NOT
+    rolled back here — the caller's ``except Exception`` in
+    ``create_experiment_worktree`` (which calls
+    ``remove_experiment_worktree``) sweeps the whole worktree.
     """
     for entry in extras:
         if not entry or os.path.isabs(entry):
@@ -122,28 +133,65 @@ def _link_worktree_extras(
 
         link_path = worktree_dir / entry
         if link_path.exists() or link_path.is_symlink():
+            # Loud warning, not silent — the executor will see main's
+            # tracked content here instead of the symlinked target,
+            # which subverts the campaign author's intent.
             logger.warning(
-                "worktree_extras: %s already present in worktree; leaving "
-                "the existing path untouched (entry was %r)",
-                link_path, entry,
+                "worktree_extras: %r collides with an existing path in "
+                "the worktree (%s) — leaving it untouched. This usually "
+                "means the entry refers to a tracked path; tracked paths "
+                "should NOT be declared as extras (they're already in "
+                "the worktree checkout). Drop %r from worktree_extras "
+                "or rename the source.",
+                entry, link_path, entry,
             )
             continue
         link_path.parent.mkdir(parents=True, exist_ok=True)
         os.symlink(source, link_path)
         logger.info("worktree_extras: linked %s -> %s", link_path, source)
 
 
+# Porcelain v1 status codes that indicate the executor produced or
+# changed file content the bundle didn't declare. ``M`` = modified, ``A``
+# = added (staged), ``R`` = renamed, ``C`` = copied, ``T`` = typechange.
+# ``D`` (deleted) is intentionally NOT here: removing a tracked file
+# isn't a "write," and surfacing it would turn ``git rm`` between arms
+# into noise. Untracked is signalled by the special ``??`` prefix and
+# handled separately in the parser.
+_PORCELAIN_WRITE_CODES = frozenset({"M", "A", "R", "C", "T"})
+
+
+def _parse_porcelain_line(line: str) -> tuple[str, str, str] | None:
+    """Parse one ``git status --porcelain`` v1 line.
+
+    Returns ``(index_status, worktree_status, path)`` or ``None`` for
+    blank/short lines. For renames and copies (``R``, ``C``), the
+    porcelain format is ``XY orig -> new``; this returns the *new*
+    path so the caller treats the destination as the relevant write.
+    """
+    if len(line) < 4:
+        return None
+    index_st, worktree_st = line[0], line[1]
+    rest = line[3:]
+    if " -> " in rest:
+        rest = rest.split(" -> ", 1)[1]
+    return index_st, worktree_st, rest.strip()
+
+
 def detect_undeclared_writes(
     worktree_path: Path,
     declared_paths: set[str] | None = None,
 ) -> list[str]:
-    """Return paths in ``worktree_path`` that the executor wrote without
+    """Return paths the executor wrote in ``worktree_path`` without
     declaring them via the bundle's ``code_changes`` (#230).
 
     Parses ``git -C <worktree_path> status --porcelain`` and reports
-    every untracked file (``??``) and modified-tracked file (``M``) that
-    is not already covered by ``declared_paths``. Each returned path is
-    relative to ``worktree_path``.
+    every porcelain line whose status indicates a write — untracked
+    (``??``), modified (``M``), staged-add (``A``), renamed (``R``),
+    copied (``C``), or typechanged (``T``) — in either the index or
+    the worktree column. Deletions (``D``) are not surfaced (see
+    ``_PORCELAIN_WRITE_CODES``). Each returned path is relative to
+    ``worktree_path``; for renames, the destination path is reported.
 
     Symlinks (typically created by ``worktree_extras``, #229) are
     excluded — they're orchestrator-managed inputs, not undeclared
@@ -154,6 +202,12 @@ def detect_undeclared_writes(
     ``code_changes`` entry will lose the file when the worktree is
     cleaned up. Reporting it loudly turns the silent loss into an
     auditable trail.
+
+    Returns an empty list when ``worktree_path`` is missing or when
+    ``git status`` itself fails — the cleanup path must not break on
+    diagnostics. Failures are logged at WARNING with returncode +
+    stderr, so an empty return on a real git failure is loud, not
+    silent.
     """
     declared_paths = declared_paths or set()
     worktree_path = Path(worktree_path)
@@ -168,22 +222,30 @@ def detect_undeclared_writes(
         check=False,
     )
     if result.returncode != 0:
-        # Worktree may have been removed already, or git is unhappy —
-        # don't make cleanup fail because diagnostics failed.
-        logger.debug(
-            "git status --porcelain on %s exited %s: %s",
+        # Diagnostic failure — we still must not block cleanup, but the
+        # whole point of this function is "turn silent loss into an
+        # auditable trail," so log loudly rather than at DEBUG.
+        logger.warning(
+            "detect_undeclared_writes: git status failed for %s "
+            "(returncode=%s); returning empty list. stderr=%s",
             worktree_path, result.returncode, result.stderr.strip(),
         )
         return []
 
     undeclared: list[str] = []
     for line in result.stdout.splitlines():
-        if len(line) < 4:
+        parsed = _parse_porcelain_line(line)
+        if parsed is None:
             continue
-        # Porcelain v1 prefix: 2 status chars + space + path.
-        status = line[:2]
-        path = line[3:].strip()
-        if not (status.startswith("??") or "M" in status):
+        index_st, worktree_st, path = parsed
+        # Untracked is the special ``??`` prefix.
+        if index_st == "?" and worktree_st == "?":
+            relevant = True
+        else:
+            relevant = bool(
+                _PORCELAIN_WRITE_CODES & {index_st, worktree_st}
+            )
+        if not relevant:
             continue
         if path in declared_paths:
             continue
diff --git a/tests/test_prompt_loader.py b/tests/test_prompt_loader.py
@@ -210,11 +210,12 @@ class TestWorktreeDisciplineGuidance:
     )
 
     def test_full_template_carries_discipline_section(self):
+        # Structural anchors only — match the *concepts* the section
+        # must convey, not the exact prose. Editorial tweaks to the
+        # surrounding language must not break this test.
         text = (self.REAL_PROMPTS_DIR / "execute_analyze.md").read_text()
         assert "Worktree discipline" in text
         assert "worktree_extras" in text
-        assert "Do not `cd` to the parent repo" in text
-        # Persistence rule must be present so executors don't lose work.
         assert "code_changes" in text
 
     def test_thin_template_carries_discipline_section(self):
@@ -273,16 +274,24 @@ def test_logs_missing_placeholders_and_context_keys(self, prompts_dir, caplog):
             with pytest.raises(ValueError, match="iteration_mode, mode_guidance"):
                 loader.load("execute_analyze", partial_context)
 
-        # The forensic log line must carry the two new fields.
+        # The forensic log line must carry the two new fields. Match by
+        # substring (not exact list-repr) so a future format swap (e.g.
+        # comma-joined values, JSON, structured logging) doesn't break
+        # the diagnostic intent — what matters is that a human reading
+        # the log can see the missing names AND the present names.
         record = next(
             (r for r in caplog.records if r.levelname == "ERROR"
              and "prompt render failed" in r.getMessage()),
             None,
         )
         assert record is not None, "expected ERROR-level diagnostic log line"
         msg = record.getMessage()
-        assert "missing_placeholders=['iteration_mode', 'mode_guidance']" in msg
-        assert "context_keys=['iter_dir', 'target_system']" in msg
+        assert "missing_placeholders=" in msg
+        assert "iteration_mode" in msg
+        assert "mode_guidance" in msg
+        assert "context_keys=" in msg
+        assert "iter_dir" in msg
+        assert "target_system" in msg
         assert "template=execute_analyze" in msg
 
     def test_no_log_on_successful_render(self, prompts_dir, caplog):
diff --git a/tests/test_worktree.py b/tests/test_worktree.py