fix(isolation): close all 4 subissues of tracker #228 (worktree, resume, persistence) (#234)
* fix(isolation): close all 4 subissues of tracker #228
Lands the v1 scope of every subissue under tracker #228. Friction was
observed across two campaigns on 2026-05-27 (paper-burst and
cross-account-signal-pooling) in which the experiment worktree's
isolation guarantee was silently undermined: executors `cd` to the
parent repo to use gitignored deps, write Python modules into the
worktree without declaring them as code_changes, and on resume
occasionally hit unreplaced-placeholder errors.
Closes #229: `target_system.worktree_extras` schema field +
auto-symlink gitignored deps from main into each experiment
worktree on creation. Each entry must be a non-empty relative path
resolving under repo_path; absolute paths and ../ traversal are
rejected at creation time. Source must exist. On extras failure,
the half-built worktree + branch are cleaned up before re-raising
(no leak). Symlinks live inside the worktree dir, so
`git worktree remove --force` reaps them with the rest of the dir.
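The path rules described for #229 can be sketched roughly as follows. This is a minimal illustration, not the code in `orchestrator/worktree.py`: the helper names `resolve_extra` and `link_extras` are hypothetical, and cleanup of a half-built worktree on failure is left out.

```python
import tempfile
from pathlib import Path


def resolve_extra(repo_path: Path, entry: str) -> Path:
    """Validate one worktree_extras entry and return its source path (hypothetical helper)."""
    if not entry or not entry.strip():
        raise ValueError("worktree_extras entry must be non-empty")
    if Path(entry).is_absolute():
        raise ValueError(f"absolute path rejected: {entry!r}")
    repo_root = repo_path.resolve()
    source = (repo_root / entry).resolve()
    # Reject ../ traversal: the resolved source must stay under the repo root.
    if not source.is_relative_to(repo_root):
        raise ValueError(f"path escapes repo_path: {entry!r}")
    if not source.exists():
        raise FileNotFoundError(f"extras source does not exist: {source}")
    return source


def link_extras(repo_path: Path, worktree: Path, extras: list) -> None:
    """Symlink each validated extra from the main checkout into the worktree."""
    for entry in extras:
        source = resolve_extra(repo_path, entry)
        link = worktree / entry
        link.parent.mkdir(parents=True, exist_ok=True)
        # The link lives inside the worktree dir, so removing the dir removes it too.
        link.symlink_to(source, target_is_directory=source.is_dir())
```

Validation runs before any symlink is created for a given entry, so a bad entry raises without leaving a link for that entry behind.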
Closes #230: pre-cleanup `git status --porcelain` warns on
undeclared writes; surfaced in `findings.json` under a new optional
`worktree_uncommitted_writes` key (schema-allowed). Filters
symlinks (orchestrator-managed inputs from #229) and any path
declared in `bundle.arms[].code_changes[].file`. Tripwire only —
never blocks cleanup.
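As a rough sketch of the filtering step for #230, assuming the bundle has already been loaded into a dict and the changed paths have already been extracted from `git status --porcelain` (the function names below are illustrative, not the ones in the repo):

```python
import tempfile
from pathlib import Path


def declared_paths(bundle: dict) -> set:
    """Collect every bundle.arms[].code_changes[].file value."""
    paths = set()
    for arm in bundle.get("arms") or []:
        for change in arm.get("code_changes") or []:
            file = change.get("file")
            if file:
                paths.add(file)
    return paths


def filter_undeclared(worktree: Path, changed: list, declared: set) -> list:
    """Drop declared paths and symlinks; what remains is an undeclared write."""
    undeclared = set()
    for rel in changed:
        if rel in declared:
            continue
        # Symlinks are orchestrator-managed inputs (worktree_extras), not executor writes.
        if (worktree / rel).is_symlink():
            continue
        undeclared.add(rel)
    return sorted(undeclared)
```

The result is only reported; nothing in this sketch raises or blocks, which matches the tripwire-only intent.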
Closes #231: "Worktree discipline" guidance added to
`execute_analyze.md` (full) and `execute_analyze_thin.md`. Tells
the executor: stay in the worktree, reference parent assets via
`worktree_extras` symlinks (relative paths, not absolute paths
into main), declare any new files via `code_changes`. Prose-only
change — no new placeholders, no behavior change to dispatch.
Closes #232: forensic logging in `prompt_loader.py`. When
rendering fails on unreplaced placeholders, log
`template`, `resolved_path`, `missing_placeholders`, and
`context_keys` at ERROR before raising. Diagnostic only — no
speculative fix for the resume-time bug. The intermittent
Unreplaced-placeholders failure on iter-2 EXECUTE_ANALYZE is
gated on reproduction; this PR ships the seine.
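A minimal sketch of the log-before-raise pattern for #232. The `{{name}}` placeholder syntax, the `render` signature, and the logger name are assumptions for illustration; only the four logged fields come from this commit.

```python
import logging
import re

logger = logging.getLogger("prompt_loader_sketch")

# Assumed placeholder syntax: {{name}}. The real template syntax may differ.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, resolved_path: str, text: str, context: dict) -> str:
    """Substitute placeholders; log forensic detail at ERROR before raising."""
    missing = sorted(set(_PLACEHOLDER.findall(text)) - set(context))
    if missing:
        logger.error(
            "unreplaced placeholders: template=%s resolved_path=%s "
            "missing_placeholders=%s context_keys=%s",
            template, resolved_path, missing, sorted(context),
        )
        raise ValueError(f"unreplaced placeholders in {template}: {missing}")
    return _PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), text)
```

Logging the context keys next to the missing names lets an operator tell "the key was never supplied" apart from "the key was supplied under a different name".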
Refs #228 (tracker).
## Test plan
- 1167 passed, 1 skipped (was 1133 + 1 in #227 baseline; +34 new
behavioral tests).
- All tests use real on-disk fixtures or seam-injected fakes per
CLAUDE.md (no live LLM calls).
- New behavioral coverage:
  - test_worktree.py: TestWorktreeExtras (9), TestDetectUndeclaredWrites (6),
    TestDeclaredCodeChangePaths (5), TestRecordUndeclaredWritesInFindings (5)
  - test_prompt_loader.py: TestWorktreeDisciplineGuidance (3),
    TestPlaceholderDiagnosticLogging (2)
## Out of scope (v2 — explicit, in tracker #228)
- Post-execution `git -C <main> diff` fail-loud check on main's
working tree (defense in depth for #229)
- `auto_capture_writes` synthetic `code_changes` flag (alternative
for #230)
- Actual fix for the resume-time placeholder bug (gated on
reproduction; #232 ships the diagnostic)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(review): address PR #234 review findings
Addresses 4 reviewer reports against PR #234. All findings actioned
except those explicitly out of scope (silent-failure-hunter's "write
to sibling file" alternative — overkill given the tripwire is
non-blocking).
## Critical fixes
1. **Porcelain parser rewrite** (orchestrator/worktree.py)
The substring filter `"M" in status` was too loose (matched
phantom code combinations) and too narrow (missed staged-add `A`,
renames `R`, copies `C`, typechanges `T`). Rename paths in
porcelain v1 use `orig -> new` syntax that the old parser
reported as a single garbage path.
New: explicit `_PORCELAIN_WRITE_CODES` set, `_parse_porcelain_line`
helper that splits rename destinations correctly. Untracked is
handled via the `??` special case; deletions (`D`) are documented
as intentionally NOT surfaced (removing a file isn't a "write").
2. **Tripwire hoisted into execute-incomplete branch**
(orchestrator/iteration.py) The original implementation only ran
`detect_undeclared_writes` on EXECUTE_ANALYZE success. But that's
exactly when undeclared writes matter LEAST — the executor wrote
findings.json and likely declared what it needed. The case that
matters MOST is `_missing_execute_artifacts` (max-turn exhaustion,
subprocess crash) where the executor wrote partial code and never
got to declare it. New `_detect_undeclared_writes_for_iter`
helper runs in both branches; results land in retry_log.jsonl
on incomplete and findings.json on success.
3. **`detect_undeclared_writes` git-failure log: DEBUG → WARNING**
The whole point of the helper is "turn silent loss into an
auditable trail." Swallowing diagnostic failures at DEBUG
re-introduces the silent loss for the failure case. Now logs at
WARNING with returncode + stderr.
4. **`_declared_code_change_paths` YAML failure logged at ERROR**
bundle.yaml is a system boundary; corruption is operator-
actionable. Previously: silent empty-set return. Now: ERROR log
naming the parse error and the bundle path before returning empty.
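To illustrate the porcelain parsing described in fix 1, here is a minimal sketch under stated assumptions. The contents of the write-code set are inferred from the codes this message lists (`M`, `A`, `R`, `C`, `T`), the function name is illustrative, and quoted paths (which git emits for unusual filenames) are not handled.

```python
from typing import Optional

# Assumed contents; the commit names the set but does not list its members.
_PORCELAIN_WRITE_CODES = frozenset("MARCT")


def parse_porcelain_line(line: str) -> Optional[str]:
    """Return the written path for one `git status --porcelain` v1 line, else None."""
    if len(line) < 4:
        return None
    # Format: two status characters, one space, then the path.
    status, path = line[:2], line[3:]
    if status == "??":
        return path  # untracked file: special case
    if not any(code in _PORCELAIN_WRITE_CODES for code in status):
        return None  # e.g. deletions (D) are intentionally not surfaced
    if ("R" in status or "C" in status) and " -> " in path:
        # Renames and copies are printed as "orig -> new"; report the destination.
        path = path.split(" -> ", 1)[1]
    return path
```

Checking each status character against an explicit set avoids the substring-match problem, since only the two-character status field is inspected and never the path.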
## Important fixes
5. **`except BaseException` → `except Exception`** in
create_experiment_worktree's extras-cleanup path. BaseException
catches KeyboardInterrupt/SystemExit; we don't want to trigger a
`git worktree remove` subprocess on Ctrl-C.
6. **Existing-path collision in `_link_worktree_extras`** now logs a
loud, explanatory WARNING (was a quiet warn-and-continue) — names
the collision, explains why declaring a tracked path as an extra
is almost certainly a misconfiguration, and tells the operator
how to fix.
7. **`_record_undeclared_writes_in_findings` JSON failure logged at
ERROR** with the undeclared-paths list. Previously: silent return.
Now the operator can recover the list from logs even when
findings.json is corrupt.
8. **Parity: success branch now uses `experiment_id` AND
`experiment_dir` guards**, matching the cleanup path's
`experiment_id` guard. A future refactor that decouples them
won't silently disable the tripwire.
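Fix 7 could look something like the sketch below: on a corrupt findings.json, the undeclared-paths list goes into the ERROR log so it is still recoverable. The function name, return value, and logger name are assumptions, not the repo's actual `_record_undeclared_writes_in_findings`.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("worktree_sketch")


def record_undeclared_writes(findings_path: Path, undeclared: list) -> bool:
    """Write the list under worktree_uncommitted_writes; return True on success."""
    if not undeclared:
        return False
    try:
        findings = json.loads(findings_path.read_text(encoding="utf-8"))
        if not isinstance(findings, dict):
            raise ValueError("findings.json root is not an object")
    except (OSError, ValueError) as exc:  # JSONDecodeError subclasses ValueError
        # Include the paths so the operator can recover them from the log.
        logger.error(
            "could not update %s (%s); undeclared writes: %s",
            findings_path, exc, undeclared,
        )
        return False
    findings["worktree_uncommitted_writes"] = sorted(set(undeclared))
    findings_path.write_text(json.dumps(findings, indent=2), encoding="utf-8")
    return True
```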
## Test additions (+8 behavioral tests)
- `test_modify_then_stage_flagged` — porcelain MM
- `test_added_staged_file_flagged` — porcelain A
- `test_renamed_file_reports_destination` — rename parser
- `test_deleted_file_not_flagged` — codify the D-skip behavior
- `test_renamed_destination_can_be_declared` — declared-paths
filter survives rename round-trip
- `test_git_failure_logs_warning` — diagnostic failures aren't silent
- `test_malformed_findings_logs_error_with_paths` — recovery info in logs
- `test_corrupted_bundle_logs_error` — bundle corruption is loud
## Test relaxations (review-flagged brittleness)
- `TestWorktreeDisciplineGuidance` no longer asserts the literal
string "Do not `cd` to the parent repo" — anchors on
"Worktree discipline", "worktree_extras", "code_changes" instead.
Editorial prose tweaks won't break the test; the *concepts* still
must be present.
- `test_logs_missing_placeholders_and_context_keys` now matches by
substring (not exact list-repr). A future format swap (JSON,
comma-joined) preserves the diagnostic intent.
## Test plan
- 1175 passed, 1 skipped (was 1167; +8 new). Zero regressions.
- All tests behavioral; real on-disk git repos in tmp_path; caplog
for diagnostic-line assertions.
Refs #228, closes #229, #230, #231, #232 (v1).
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
orchestrator/schemas/findings.schema.json (6 additions, 0 deletions)
@@ -24,6 +24,12 @@
       "minimum": 0,
       "maximum": 100,
       "description": "Percentage of total effect from single dominant component, if detected."
+    },
+    "worktree_uncommitted_writes": {
+      "type": "array",
+      "items": { "type": "string", "minLength": 1 },
+      "uniqueItems": true,
+      "description": "#230 — paths the executor wrote inside the experiment worktree without declaring them in the bundle's `code_changes`. Surfaced just before worktree cleanup; logged at WARNING. Empty array (or absent) means the executor declared everything it wrote, or wrote nothing untracked. The orchestrator does not block cleanup on undeclared writes — this is a tripwire, not a gate."
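For readers without a JSON Schema validator to hand, the constraints on `worktree_uncommitted_writes` shown in the schema change (array, non-empty string items, unique items) amount to roughly the following check. This is an illustrative stand-in written for this note, not code from the repo.

```python
def is_valid_uncommitted_writes(value) -> bool:
    """Mirror the schema: an array of unique strings, each at least one character."""
    if not isinstance(value, list):
        return False
    if not all(isinstance(p, str) and len(p) >= 1 for p in value):
        return False
    # uniqueItems: no path may appear twice.
    return len(set(value)) == len(value)
```

An empty list passes, consistent with the description's note that an empty array means nothing undeclared was written.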