Commit e1f95b9

NagyVikt authored
fix(agent-worktree-prune): skip worktrees with live processes (#570)
`gx branch finish --cleanup` (and the underlying `agent-worktree-prune.sh`) deleted active Codex agent worktrees while the Codex TUI was still running inside them. After the cwd disappeared, Codex tried to refresh skills and run stop hooks and crashed with `No such file or directory (os error 2)`.

- Add `has_live_process_in_worktree()` that walks `/proc/*/cwd` and returns true when any live process's cwd (including the "(deleted)" suffix that appears after a partial unlink) resolves inside the managed worktree.
- Call it from `process_entry()` BEFORE branch/worktree removal. Live hits skip with `[agent-worktree-prune] Skipping live process worktree: <path>` and bump the `skipped_active` counter.
- Regression test: spawn a long-running Node child inside a detached agent worktree, run `gx doctor`, assert the worktree is preserved and the log line emits.

Fail-open on platforms without /proc: the check returns false and the existing prune flow proceeds unchanged.

Co-authored-by: NagyVikt <nagy.viktordp@gmail.com>
1 parent 42b9ac3 commit e1f95b9

6 files changed

Lines changed: 138 additions & 0 deletions

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-05-12
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+## Why
+
+`gx branch finish --cleanup` (and the underlying `agent-worktree-prune.sh`) deleted the active Codex agent worktree while the Codex TUI was still running inside it. Once the cwd disappeared, Codex tried to refresh skills / run stop hooks and crashed with `No such file or directory (os error 2)` / `failed to reload config`. The operator lost the session and any in-flight unsaved work in that pane.
+
+## What Changes
+
+- Add `has_live_process_in_worktree()` to `templates/scripts/agent-worktree-prune.sh` that walks `/proc/*/cwd` and returns true when any live process's cwd resolves to inside the managed worktree (including the "(deleted)" suffix that appears after a partial unlink).
+- Call it from `process_entry()` BEFORE any branch/worktree removal. When a live process is detected, the worktree is skipped and a clear `[agent-worktree-prune] Skipping live process worktree: <path>` line is logged. The `skipped_active` counter is incremented for the summary.
+- Add a regression test (`test/doctor.test.js`) that spawns a long-running Node child process inside a detached agent worktree, runs `gx doctor` cleanup, and asserts the worktree is preserved and the log line is emitted.
+
+## Impact
+
+- Affects `templates/scripts/agent-worktree-prune.sh` (the script copied into managed repos by `gx setup`) and the doctor cleanup path it drives.
+- No public API change; just a stricter precondition before destructive cleanup.
+- Risk: if `/proc` is unavailable (non-Linux), the live-process check returns false and prune behaves exactly as before. Fail-open is the correct posture here: we'd rather over-cleanup on platforms without `/proc` than block all cleanup permanently.
+- Rollout: no migration. New behavior takes effect the moment users pick up the updated `agent-worktree-prune.sh` (via `gx setup` / template copy or via a fresh clone of the repo).
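
The matching rule described under "What Changes" can be isolated from the `/proc` walk and exercised on plain strings. The following is a minimal sketch, not the committed code: the helper name `cwd_is_inside_worktree` is hypothetical, and it uses `case` rather than the script's `[[ ... ]]` so that it also runs under plain `sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): decides whether a cwd string,
# as read from /proc/<pid>/cwd, lies inside a worktree path.
cwd_is_inside_worktree() {
  local live_cwd="$1"
  local wt="$2"

  # Once a directory is unlinked, the link target gains a " (deleted)"
  # suffix; strip it so the comparison sees the original path.
  live_cwd=${live_cwd%" (deleted)"}

  # Match the worktree root itself or anything below it. The "/" boundary
  # keeps a sibling such as "<wt>-old" from matching.
  case "$live_cwd" in
    "$wt" | "$wt"/*) return 0 ;;
  esac
  return 1
}
```

The sketch assumes `wt` is passed without a trailing slash; a trailing slash would make the subdirectory pattern `"$wt"/*` miss.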
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+## ADDED Requirements
+
+### Requirement: agent-worktree-prune skips worktrees with live processes
+
+The `agent-worktree-prune.sh` cleanup script SHALL NOT remove a managed agent worktree, nor delete its branch, while any live process on the host has its current working directory resolved to a path inside that worktree.
+
+#### Scenario: Live process inside detached agent worktree preserves the worktree
+
+- **GIVEN** a managed agent worktree at `<repo>/.omc/agent-worktrees/<slug>` is in detached-HEAD state and would otherwise satisfy the prune criteria
+- **AND** a live process on the host has its cwd inside that worktree (as reported by `/proc/*/cwd`)
+- **WHEN** `agent-worktree-prune.sh` runs against the parent repo
+- **THEN** the worktree directory continues to exist after the run
+- **AND** a `[agent-worktree-prune] Skipping live process worktree: <path>` line is emitted to stdout
+- **AND** the `skipped_active` counter is incremented in the run summary
+- **AND** regressions are covered by a `test/doctor.test.js` case that spawns a child process inside a detached worktree and asserts both the preservation and the log line.
+
+#### Scenario: No /proc available falls back to legacy behavior
+
+- **GIVEN** the host does not expose `/proc` (e.g., the script runs on a platform without procfs)
+- **WHEN** `agent-worktree-prune.sh` runs
+- **THEN** the live-process check returns false (fail-open)
+- **AND** the rest of the prune flow proceeds exactly as it did before this change, so cleanup on non-Linux hosts is not permanently blocked.
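
Both scenarios can be tried without a real procfs by pointing the scan at a stand-in directory. This is a sketch under stated assumptions: the function name `has_live_process_under` and its second parameter `proc_root` are hypothetical (the committed function hard-codes `/proc`), and the fake tree in the usage comment only imitates the `<pid>/cwd` symlink layout.

```bash
#!/usr/bin/env bash
# Hypothetical variant of the live-process check with an injectable proc root.
has_live_process_under() {
  local wt="$1"
  local proc_root="${2:-/proc}"   # assumption: parameter added for testing only
  local link target

  # Fail-open: with no procfs there is nothing to inspect, so report
  # "no live process" and let the caller continue with the legacy flow.
  [ -d "$proc_root" ] || return 1

  for link in "$proc_root"/[0-9]*/cwd; do
    # readlink fails for unreadable entries (other users' processes) and for
    # the unexpanded glob pattern when nothing matches; both are skipped.
    target="$(readlink "$link" 2>/dev/null || true)"
    [ -n "$target" ] || continue
    target=${target%" (deleted)"}
    case "$target" in
      "$wt" | "$wt"/*) return 0 ;;
    esac
  done
  return 1
}

# Usage against a fake tree:
#   mkdir -p fake/proc/123 && ln -s /srv/repo/wt/sub fake/proc/123/cwd
#   has_live_process_under /srv/repo/wt fake/proc && echo "would skip"
```

Returning 1 for a missing root is what makes the second scenario hold: the caller cannot tell "no procfs" from "no live process", which is the intended fail-open trade-off.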
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+## Definition of Done
+
+This change is complete only when **all** of the following are true:
+
+- Every checkbox below is checked.
+- The agent branch reaches `MERGED` state on `origin` and the PR URL + state are recorded in the completion handoff.
+- If any step blocks (test failure, conflict, ambiguous result), append a `BLOCKED:` line under section 4 explaining the blocker and **STOP**. Do not tick remaining cleanup boxes; do not silently skip the cleanup pipeline.
+
+## Handoff
+
+- Handoff: change=`agent-codex-skip-live-codex-worktree-cleanup-2026-05-13-01-13`; branch=`agent/<your-name>/<branch-slug>`; scope=`TODO`; action=`continue this sandbox or finish cleanup after a usage-limit/manual takeover`.
+- Copy prompt: Continue `agent-codex-skip-live-codex-worktree-cleanup-2026-05-13-01-13` on branch `agent/<your-name>/<branch-slug>`. Work inside the existing sandbox, review `openspec/changes/agent-codex-skip-live-codex-worktree-cleanup-2026-05-13-01-13/tasks.md`, continue from the current state instead of creating a new sandbox, and when the work is done run `gx branch finish --branch agent/<your-name>/<branch-slug> --base dev --via-pr --wait-for-merge --cleanup`.
+
+## 1. Specification
+
+- [x] 1.1 Finalize proposal scope and acceptance criteria for `agent-codex-skip-live-codex-worktree-cleanup-2026-05-13-01-13`.
+- [x] 1.2 Define normative requirements in `specs/skip-live-codex-worktree-cleanup/spec.md`.
+
+## 2. Implementation
+
+- [x] 2.1 Implement scoped behavior changes (`has_live_process_in_worktree` + `process_entry` precondition in `templates/scripts/agent-worktree-prune.sh`).
+- [x] 2.2 Add/update focused regression coverage (`test/doctor.test.js`: "preserves detached agent worktrees with live processes").
+
+## 3. Verification
+
+- [x] 3.1 Run targeted project verification commands. Evidence: `node --test --test-name-pattern="preserves detached agent worktrees with live processes" test/doctor.test.js` → 1 passed; sibling tests ("auto-prunes detached-HEAD agent worktrees", "preserves stranded worktrees when GUARDEX_SKIP_AUTO_WORKTREE_PRUNE=1") also pass (3/3).
+- [x] 3.2 Run `openspec validate agent-codex-skip-live-codex-worktree-cleanup-2026-05-13-01-13 --type change --strict`. Evidence: "Change ... is valid".
+- [x] 3.3 Run `openspec validate --specs`. Evidence: "No items found to validate" (no main spec deltas in this change).
+
+## 4. Cleanup (mandatory; run before claiming completion)
+
+- [ ] 4.1 Run the cleanup pipeline: `gx branch finish --branch agent/codex/skip-live-codex-worktree-cleanup-2026-05-13-01-13 --base main --via-pr --wait-for-merge --cleanup`. This handles commit -> push -> PR create -> merge wait -> worktree prune in one invocation.
+- [ ] 4.2 Record the PR URL and final merge state (`MERGED`) in the completion handoff.
+- [ ] 4.3 Confirm the sandbox worktree is gone (`git worktree list` no longer shows the agent path; `git branch -a` shows no surviving local/remote refs for the branch).
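
The worktree half of step 4.3 can be scripted instead of checked by eye. A minimal sketch, assuming the machine-readable output of `git worktree list --porcelain`, which prints one `worktree <path>` line per registered worktree; the helper name `worktree_is_listed` is hypothetical and not part of `gx`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `git worktree list --porcelain` output on stdin
# and succeeds when the given path is still registered as a worktree.
worktree_is_listed() {
  # -F: fixed string, -x: whole-line match, -q: exit status only.
  grep -Fxq -- "worktree $1"
}

# Usage (not executed here):
#   if git -C "$repo" worktree list --porcelain | worktree_is_listed "$agent_path"; then
#     echo "worktree still present: $agent_path"
#   fi
```

Matching the whole line avoids a false positive when the agent path is a prefix of another worktree's path.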

templates/scripts/agent-worktree-prune.sh

Lines changed: 25 additions & 0 deletions
@@ -382,6 +382,26 @@ read_branch_activity_epoch() {
 
 skipped_recent=0
 
+has_live_process_in_worktree() {
+  local wt="$1"
+  local proc_cwd=""
+
+  [[ -d /proc ]] || return 1
+
+  for proc_cwd in /proc/[0-9]*/cwd; do
+    [[ -e "$proc_cwd" ]] || continue
+    local live_cwd=""
+    live_cwd="$(readlink "$proc_cwd" 2>/dev/null || true)"
+    [[ -n "$live_cwd" ]] || continue
+    live_cwd="${live_cwd% (deleted)}"
+    if [[ "$live_cwd" == "$wt" || "$live_cwd" == "${wt}"/* ]]; then
+      return 0
+    fi
+  done
+
+  return 1
+}
+
 branch_idle_gate() {
   local branch="$1"
   local wt="$2"
@@ -501,6 +521,11 @@ process_entry() {
     echo "[agent-worktree-prune] Skipping active cwd worktree: ${wt}"
     return
   fi
+  if has_live_process_in_worktree "$wt"; then
+    skipped_active=$((skipped_active + 1))
+    echo "[agent-worktree-prune] Skipping live process worktree: ${wt}"
+    return
+  fi
 
   local remove_reason=""
   local branch_delete_mode="safe"
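
The comparison in `has_live_process_in_worktree()` is a plain string match, and `/proc/<pid>/cwd` reports the physical (symlink-free) path. If `wt` could ever be reached through a symlink, normalizing it first would keep the two sides comparable. Whether that can happen is an assumption here, since the diff does not show how `process_entry()` derives `wt`. The helper name `canonical_dir` below is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the physical path of a directory, or fails
# if the directory does not exist or cannot be entered.
canonical_dir() {
  local dir="$1"
  # cd -P resolves symlinks; the subshell leaves the caller's cwd untouched.
  (cd -P -- "$dir" 2>/dev/null && pwd -P)
}

# Usage (not executed here):
#   wt_physical="$(canonical_dir "$wt")" && has_live_process_in_worktree "$wt_physical"
```

When the directory is already gone, `canonical_dir` fails and the caller can fall back to the unnormalized path.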

test/doctor.test.js

Lines changed: 39 additions & 0 deletions
@@ -1141,6 +1141,45 @@ test('gx doctor auto-prunes detached-HEAD agent worktrees under .omc/agent-workt
   );
 });
 
+
+test('gx doctor preserves detached agent worktrees with live processes', async () => {
+  const repoDir = initRepoOnBranch('main');
+  seedCommit(repoDir);
+
+  const worktreeRoot = path.join(repoDir, '.omc', 'agent-worktrees');
+  fs.mkdirSync(worktreeRoot, { recursive: true });
+  const liveWorktree = path.join(worktreeRoot, 'live-agent-worktree');
+
+  let result = runHumanCmd('git', ['branch', 'agent/claude/live-demo'], repoDir);
+  assert.equal(result.status, 0, result.stderr || result.stdout);
+  result = runHumanCmd('git', ['worktree', 'add', liveWorktree, 'agent/claude/live-demo'], repoDir);
+  assert.equal(result.status, 0, result.stderr || result.stdout);
+  result = runHumanCmd('git', ['-C', liveWorktree, 'checkout', '--detach', 'HEAD'], repoDir);
+  assert.equal(result.status, 0, result.stderr || result.stdout);
+  result = runHumanCmd('git', ['branch', '-D', 'agent/claude/live-demo'], repoDir);
+  assert.equal(result.status, 0, result.stderr || result.stdout);
+
+  const child = cp.spawn(process.execPath, ['-e', 'setInterval(() => {}, 1000);'], {
+    cwd: liveWorktree,
+    stdio: 'ignore',
+  });
+  const exitPromise = new Promise((resolve) => {
+    child.once('exit', (code, signal) => resolve({ code, signal }));
+  });
+
+  try {
+    assert.equal(isPidAlive(child.pid), true, 'live worktree process should be running');
+    result = runNode(['doctor', '--target', repoDir, '--skip-agents', '--no-global-install'], repoDir);
+    assert.equal(result.status, 0, result.stderr || result.stdout);
+    const combined = `${result.stdout}\n${result.stderr}`;
+    assert.match(combined, /Skipping live process worktree/);
+    assert.equal(fs.existsSync(liveWorktree), true, 'live process worktree should be preserved');
+  } finally {
+    child.kill('SIGTERM');
+    await exitPromise;
+  }
+});
+
 test('gx doctor preserves stranded worktrees when GUARDEX_SKIP_AUTO_WORKTREE_PRUNE=1', () => {
   const repoDir = initRepoOnBranch('main');
   seedCommit(repoDir);
