Skip to content

Commit 76ed590

Browse files
sriumcpclaude
andauthored
fix(isolation): close all 4 subissues of tracker #228 (worktree, resume, persistence) (#234)
* fix(isolation): close all 4 subissues of tracker #228 Lands v1 scope of every subissue under tracker #228. Friction observed across two campaigns on 2026-05-27 (paper-burst + cross-account-signal- pooling) where the experiment worktree's isolation guarantee was silently undermined: executors `cd` to the parent repo to use gitignored deps, write Python modules into the worktree without declaring them as code_changes, and on resume occasionally hit unreplaced-placeholder errors. Closes #229 — `target_system.worktree_extras` schema field + auto-symlink gitignored deps from main into each experiment worktree on creation. Each entry must be a non-empty relative path resolving under repo_path; absolute paths and ../ traversal are rejected at creation time. Source must exist. On extras failure, the half-built worktree + branch are cleaned up before re-raising (no leak). Symlinks live inside the worktree dir, so `git worktree remove --force` reaps them with the rest of the dir. Closes #230 — Pre-cleanup `git status --porcelain` warns on undeclared writes; surfaced in `findings.json` under a new optional `worktree_uncommitted_writes` key (schema-allowed). Filters symlinks (orchestrator-managed inputs from #229) and any path declared in `bundle.arms[].code_changes[].file`. Tripwire only — never blocks cleanup. Closes #231 — "Worktree discipline" guidance added to `execute_analyze.md` (full) and `execute_analyze_thin.md`. Tells the executor: stay in the worktree, reference parent assets via `worktree_extras` symlinks (relative paths, not absolute paths into main), declare any new files via `code_changes`. Prose-only change — no new placeholders, no behavior change to dispatch. Closes #232 — Forensic logging in `prompt_loader.py`: when rendering fails on unreplaced placeholders, log `template`, `resolved_path`, `missing_placeholders`, and `context_keys` at ERROR before raising. Diagnostic only — no speculative fix for the resume-time bug. The intermittent Unreplaced-placeholders failure on iter-2 EXECUTE_ANALYZE is gated on reproduction; this PR ships the seine. Refs #228 (tracker). ## Test plan - 1167 passed, 1 skipped (was 1133 + 1 in #227 baseline; +34 new behavioral tests). - All tests use real on-disk fixtures or seam-injected fakes per CLAUDE.md (no live LLM calls). - New behavioral coverage: - test_worktree.py: TestWorktreeExtras (9), TestDetectUndeclared Writes (6), TestDeclaredCodeChangePaths (5), TestRecord UndeclaredWritesInFindings (5) - test_prompt_loader.py: TestWorktreeDisciplineGuidance (3), TestPlaceholderDiagnosticLogging (2) ## Out of scope (v2 — explicit, in tracker #228) - Post-execution `git -C <main> diff` fail-loud check on main's working tree (defense in depth for #229) - `auto_capture_writes` synthetic `code_changes` flag (alternative for #230) - Actual fix for the resume-time placeholder bug (gated on reproduction; #232 ships the diagnostic) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): address PR #234 review findings Addresses 4 reviewer reports against PR #234. All findings actioned except those explicitly out of scope (silent-failure-hunter's "write to sibling file" alternative — overkill given the tripwire is non-blocking). ## Critical fixes 1. **Porcelain parser rewrite** (orchestrator/worktree.py) The substring filter `"M" in status` was too loose (matched phantom code combinations) and too narrow (missed staged-add `A`, renames `R`, copies `C`, typechanges `T`). Rename paths in porcelain v1 use `orig -> new` syntax that the old parser reported as a single garbage path. New: explicit `_PORCELAIN_WRITE_CODES` set, `_parse_porcelain_line` helper that splits rename destinations correctly. Untracked is handled via the `??` special case; deletions (`D`) are documented as intentionally NOT surfaced (removing a file isn't a "write"). 2. **Tripwire hoisted into execute-incomplete branch** (orchestrator/iteration.py) The original implementation only ran `detect_undeclared_writes` on EXECUTE_ANALYZE success. But that's exactly when undeclared writes matter LEAST — the executor wrote findings.json and likely declared what it needed. The case that matters MOST is `_missing_execute_artifacts` (max-turn exhaustion, subprocess crash) where the executor wrote partial code and never got to declare it. New `_detect_undeclared_writes_for_iter` helper runs in both branches; results land in retry_log.jsonl on incomplete and findings.json on success. 3. **`detect_undeclared_writes` git-failure log: DEBUG → WARNING** The whole point of the helper is "turn silent loss into an auditable trail." Swallowing diagnostic failures at DEBUG re-introduces the silent loss for the failure case. Now logs at WARNING with returncode + stderr. 4. **`_declared_code_change_paths` YAML failure logged at ERROR** bundle.yaml is a system boundary; corruption is operator- actionable. Previously: silent empty-set return. Now: ERROR log naming the parse error and the bundle path before returning empty. ## Important fixes 5. **`except BaseException` → `except Exception`** in create_experiment_worktree's extras-cleanup path. BaseException catches KeyboardInterrupt/SystemExit; we don't want to trigger a `git worktree remove` subprocess on Ctrl-C. 6. **Existing-path collision in `_link_worktree_extras`** now logs a loud, explanatory WARNING (was a quiet warn-and-continue) — names the collision, explains why declaring a tracked path as an extra is almost certainly a misconfiguration, and tells the operator how to fix. 7. **`_record_undeclared_writes_in_findings` JSON failure logged at ERROR** with the undeclared-paths list. Previously: silent return. Now the operator can recover the list from logs even when findings.json is corrupt. 8. **Parity: success branch now uses `experiment_id` AND `experiment_dir` guards**, matching the cleanup path's `experiment_id` guard. A future refactor that decouples them won't silently disable the tripwire. ## Test additions (+8 behavioral tests) - `test_modify_then_stage_flagged` — porcelain MM - `test_added_staged_file_flagged` — porcelain A - `test_renamed_file_reports_destination` — rename parser - `test_deleted_file_not_flagged` — codify the D-skip behavior - `test_renamed_destination_can_be_declared` — declared-paths filter survives rename round-trip - `test_git_failure_logs_warning` — diagnostic failures aren't silent - `test_malformed_findings_logs_error_with_paths` — recovery info in logs - `test_corrupted_bundle_logs_error` — bundle corruption is loud ## Test relaxations (review-flagged brittleness) - `TestWorktreeDisciplineGuidance` no longer asserts the literal string "Do not `cd` to the parent repo" — anchors on "Worktree discipline", "worktree_extras", "code_changes" instead. Editorial prose tweaks won't break the test; the *concepts* still must be present. - `test_logs_missing_placeholders_and_context_keys` now matches by substring (not exact list-repr). A future format swap (JSON, comma-joined) preserves the diagnostic intent. ## Test plan - 1175 passed, 1 skipped (was 1167; +8 new). Zero regressions. - All tests behavioral; real on-disk git repos in tmp_path; caplog for diagnostic-line assertions. Refs #228, closes #229 #230 #231 #232 (v1). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 0ae90a9 commit 76ed590

9 files changed

Lines changed: 933 additions & 5 deletions

File tree

orchestrator/iteration.py

Lines changed: 114 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,97 @@ class IterationOutcome(str, Enum):
4848

4949
TEMPLATES_DIR = Path(__file__).resolve().parent / "templates"
5050
SCHEMAS_DIR = Path(__file__).resolve().parent / "schemas"
51+
52+
53+
def _declared_code_change_paths(bundle_path: Path) -> set[str]:
54+
"""Read ``bundle.yaml`` and return every ``arms[].code_changes[].file``
55+
declared on any arm (#230).
56+
57+
Returns the set of declared file paths verbatim — paths are NOT
58+
normalized. If the bundle declares an absolute path or a ``./``-
59+
prefixed path, it won't match the relative paths that
60+
``detect_undeclared_writes`` reports; that mismatch is the bundle
61+
author's responsibility (the bundle schema already constrains
62+
``file`` shape).
63+
64+
Returns an empty set when the bundle is missing or declares no
65+
``code_changes``. A YAML parse failure is logged at ERROR (the
66+
bundle is a system boundary; corruption is operator-actionable)
67+
and an empty set returned so cleanup proceeds.
68+
"""
69+
if not bundle_path.exists():
70+
return set()
71+
try:
72+
bundle = yaml.safe_load(bundle_path.read_text()) or {}
73+
except yaml.YAMLError as exc:
74+
logger.error(
75+
"_declared_code_change_paths: bundle.yaml parse failed at %s "
76+
"(%s); treating as if no code_changes were declared. Every "
77+
"executor write will be flagged as undeclared until this is fixed.",
78+
bundle_path, exc,
79+
)
80+
return set()
81+
arms = bundle.get("arms") or []
82+
declared: set[str] = set()
83+
for arm in arms:
84+
if not isinstance(arm, dict):
85+
continue
86+
for change in arm.get("code_changes") or []:
87+
if isinstance(change, dict) and isinstance(change.get("file"), str):
88+
declared.add(change["file"])
89+
return declared
90+
91+
92+
def _record_undeclared_writes_in_findings(
93+
findings_path: Path,
94+
undeclared: list[str],
95+
) -> None:
96+
"""Merge ``worktree_uncommitted_writes`` into ``findings.json`` (#230).
97+
98+
No-op if findings.json is missing — the cleanup may be running in
99+
the execute-incomplete branch where findings was never produced
100+
(the caller surfaces the data via retry_log there instead).
101+
102+
A JSONDecodeError on the existing findings is logged at ERROR
103+
(corrupted findings is operator-actionable) and the function
104+
returns without writing — modifying a corrupt JSON file would
105+
only make recovery harder.
106+
"""
107+
if not undeclared or not findings_path.exists():
108+
return
109+
try:
110+
findings = json.loads(findings_path.read_text())
111+
except json.JSONDecodeError as exc:
112+
logger.error(
113+
"_record_undeclared_writes_in_findings: findings.json at %s "
114+
"is not valid JSON (%s); the undeclared-writes list will not "
115+
"be persisted. Undeclared paths: %s",
116+
findings_path, exc, undeclared,
117+
)
118+
return
119+
findings["worktree_uncommitted_writes"] = sorted(set(undeclared))
120+
atomic_write(findings_path, json.dumps(findings, indent=2) + "\n")
121+
122+
123+
def _detect_undeclared_writes_for_iter(
124+
iter_dir: Path,
125+
experiment_dir: Path,
126+
) -> list[str]:
127+
"""Detect undeclared writes in ``experiment_dir`` and log a WARNING
128+
if any are found (#230). Returns the list so the caller can decide
129+
where to persist it (findings.json on success, retry_log on
130+
incomplete). Pure tripwire — never raises, never blocks cleanup."""
131+
from orchestrator.worktree import detect_undeclared_writes
132+
declared = _declared_code_change_paths(iter_dir / "bundle.yaml")
133+
undeclared = detect_undeclared_writes(experiment_dir, declared)
134+
if undeclared:
135+
logger.warning(
136+
"Executor wrote %d files in the experiment worktree "
137+
"without declaring them in bundle.arms[].code_changes; "
138+
"they will be lost on cleanup: %s",
139+
len(undeclared), undeclared[:20],
140+
)
141+
return undeclared
51142
DEFAULTS_PATH = Path(__file__).resolve().parent / "defaults.yaml"
52143
_ARM_TYPE_RE = re.compile(r"^[a-zA-Z0-9_-]+$")
53144

@@ -960,8 +1051,9 @@ def _max_turns_for(phase_key: str) -> int:
9601051
create_experiment_worktree,
9611052
remove_experiment_worktree,
9621053
)
1054+
extras = campaign.get("target_system", {}).get("worktree_extras") or []
9631055
experiment_dir, experiment_id = create_experiment_worktree(
964-
Path(repo_path), iteration,
1056+
Path(repo_path), iteration, extras=extras,
9651057
)
9661058
(iter_dir / ".experiment_id").write_text(experiment_id)
9671059
print(f" Experiment worktree: {experiment_dir}")
@@ -1007,13 +1099,22 @@ def _max_turns_for(phase_key: str) -> int:
10071099
# "X not found" from validate_execution.
10081100
missing = _missing_execute_artifacts(iter_dir)
10091101
if missing:
1102+
# #230: even on incomplete, the executor may have written
1103+
# partial code into the worktree — exactly the case where
1104+
# undeclared writes matter most. Capture before cleanup.
1105+
incomplete_undeclared: list[str] = []
1106+
if repo_path and experiment_id and experiment_dir is not None:
1107+
incomplete_undeclared = _detect_undeclared_writes_for_iter(
1108+
iter_dir, experiment_dir,
1109+
)
10101110
from orchestrator.metrics import log_retry_event
10111111
log_retry_event(work_dir / "llm_metrics.jsonl", {
10121112
"iteration": iteration,
10131113
"phase": "execute-analyze",
10141114
"failure_type": "execute_incomplete",
10151115
"missing_artifacts": missing,
10161116
"max_turns": _max_turns_for("execute_analyze"),
1117+
"undeclared_writes": incomplete_undeclared,
10171118
})
10181119
# Clean up the experiment worktree so a re-run isn't blocked.
10191120
if repo_path and experiment_id:
@@ -1031,6 +1132,18 @@ def _max_turns_for(phase_key: str) -> int:
10311132
"Executor artifacts failed post-check validation: %s",
10321133
result["errors"],
10331134
)
1135+
# #230: surface undeclared writes that would be lost when the
1136+
# worktree is removed below. Persist into findings.json so the
1137+
# design agent on iter-N+1 can see what to declare in
1138+
# ``code_changes``. Tripwire only — never blocks cleanup.
1139+
if repo_path and experiment_id and experiment_dir is not None:
1140+
undeclared = _detect_undeclared_writes_for_iter(
1141+
iter_dir, experiment_dir,
1142+
)
1143+
if undeclared:
1144+
_record_undeclared_writes_in_findings(
1145+
iter_dir / "findings.json", undeclared,
1146+
)
10341147
# Clean up worktree only on success
10351148
if repo_path and experiment_id:
10361149
remove_experiment_worktree(Path(repo_path), experiment_id)

orchestrator/prompt_loader.py

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -60,9 +60,21 @@ def load(self, template_name: str, context: dict[str, str]) -> str:
6060

6161
remaining = _PLACEHOLDER_RE.findall(text)
6262
if remaining:
63+
missing = sorted(set(remaining))
64+
# #232: forensic logging on the resume-time placeholder bug.
65+
# The error message names the missing placeholders; we add
66+
# what keys WERE present in the context so the next
67+
# occurrence produces evidence pointing at the actual cause
68+
# (phase string mismatch, stale loader, resume-path field
69+
# uninitialized, ...).
70+
logger.error(
71+
"prompt render failed: template=%s resolved_path=%s "
72+
"missing_placeholders=%s context_keys=%s",
73+
template_name, path, missing, sorted(context.keys()),
74+
)
6375
raise ValueError(
6476
f"Unreplaced placeholders in {template_name}.md: "
65-
f"{', '.join(sorted(set(remaining)))}"
77+
f"{', '.join(missing)}"
6678
)
6779

6880
logger.debug("Loaded prompt %s (%d chars)", template_name, len(text))

orchestrator/schemas/campaign.schema.yaml

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -84,6 +84,23 @@ properties:
8484
type: ["string", "null"]
8585
minLength: 1
8686
description: "Path to target system git repo. Used by CLIDispatcher for code-access agents. If set, experiments run in isolated worktrees."
87+
worktree_extras:
88+
type: array
89+
items:
90+
type: string
91+
minLength: 1
92+
uniqueItems: true
93+
description: >
94+
Paths (relative to repo_path) to symlink into each experiment
95+
worktree on creation (#229). Use this for gitignored assets
96+
the executor needs from main — virtualenvs, pre-fetched data
97+
dirs, prior-iteration outputs, build artifacts. Without this
98+
mechanism, executors discover that the worktree is missing
99+
their tooling and `cd` back to the parent repo, silently
100+
breaking isolation. Each entry must be a relative path that
101+
resolves under repo_path; absolute paths and ``..`` traversal
102+
are rejected at worktree creation. Source must exist in main
103+
at the time of worktree creation.
87104
88105
models:
89106
type: object

orchestrator/schemas/findings.schema.json

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,12 @@
2424
"minimum": 0,
2525
"maximum": 100,
2626
"description": "Percentage of total effect from single dominant component, if detected."
27+
},
28+
"worktree_uncommitted_writes": {
29+
"type": "array",
30+
"items": { "type": "string", "minLength": 1 },
31+
"uniqueItems": true,
32+
"description": "#230 — paths the executor wrote inside the experiment worktree without declaring them in the bundle's `code_changes`. Surfaced just before worktree cleanup; logged at WARNING. Empty array (or absent) means the executor declared everything it wrote, or wrote nothing untracked. The orchestrator does not block cleanup on undeclared writes — this is a tripwire, not a gate."
2733
}
2834
},
2935
"$defs": {

0 commit comments

Comments
 (0)