Skip to content

fix(remove): atomic CAS branch deletion unifies safe-delete semantics#2903

Merged
max-sixty merged 15 commits into
mainfrom
agent-a3bad31af2bb71c8e-cas
Jun 28, 2026
Merged

fix(remove): atomic CAS branch deletion unifies safe-delete semantics#2903
max-sixty merged 15 commits into
mainfrom
agent-a3bad31af2bb71c8e-cas

Conversation

@max-sixty

@max-sixty max-sixty commented May 25, 2026

Copy link
Copy Markdown
Owner

Three follow-ups to #2870 plus a CAS-based unification of safe-delete semantics. (#2900 was the small/incremental path for the first two items; it has been closed as superseded — its commits are all contained here.)

Item 1 — drop the defensive check_lock (refactor)

The check_lock RwLock<()> in step_prune was carried over from #2808's Windows .git/config race fix. After #2870 restructured the flow, the phase ordering already serializes integration-check readers against removals: the background par_iter is the only reader, the for ... in rx loop exits only once that thread has finished, and removals run strictly after. Belt-and-suspenders per CLAUDE.md — removed.

Item 2 — warn when background branch deletion silently retains (fix)

Both branch-deletion paths in execute_instant_removal_or_fallback swallowed errors at log::debug!. Promoted the surprise Ok(NotDeleted) (planner predicted deletion, branch survived — a hook moved the tip) to a warning_message with a wt remove -D <branch> recovery hint. Suppressed when the planner already predicted retention so the existing print_hints unmerged-branch message doesn't duplicate. Err failures now log at warn level (developer-facing) rather than user-actionable warnings — the failure modes (git update-ref exec error, refs DB I/O) aren't actionable beyond re-running.

Item 3 / CAS rewrite — atomic safe-delete (the big one)

delete_branch_if_safe now uses git update-ref -d refs/heads/<branch> <expected-sha> instead of git branch -D. The expected SHA comes from the snapshot the integration check already consulted, so the check→delete sequence becomes one atomic step against a known SHA. If anything moved the ref in between, git rejects the CAS — the new BranchDeletionOutcome::RetainedRaced outcome is returned and surfaced to the user. Force-delete keeps git branch -D (explicit override).

This unifies the divergent safe-delete semantics in execute_instant_removal_or_fallback:

  • Fast path + SynchronousForNonCurrent fallback routed through delete_branch_if_safe, so they pick up CAS for free.
  • Detached fallback (rename failed AND current worktree): the shell && git branch -d <branch> is replaced by a foreground integration check + atomic && git update-ref -d <ref> <expected-sha> tail (build_cas_branch_delete_tail). Squash-merged / patch-id / ancestor branches the planner accepts are now accepted at delete time too, matching the fast path.
  • Branch-only deletion (handle_branch_only_output) switches from bare git branch -D (effectively a force-delete after a stale integration check) to delete_branch_if_safe. CAS protects it too.

Follow-up consolidation (commit 971e8b4)

The "branch kept because its tip moved during the delete" message had drifted into three shapes — two near-identical inline strings plus a gap: the foreground worktree path routed RetainedRaced through show_unmerged_hint and printed the generic "Branch unmerged; run -D" hint, whose bare -D would force-delete the just-arrived racing commits. All three emit paths (warn_if_branch_retained, handle_branch_only_output, print_hints) now route through one retained_raced_branch_message helper, with a dedicated RetainedRaced early-return arm in print_hints; show_unmerged_hint is now NotDeleted-only. Also converged the recovery command to the canonical wt remove -D <branch> (via suggest_command) and inlined the single-caller snapshot_sha wrapper.

Race coverage

CAS closes the TOCTOU window inside delete_branch_if_safe (between its own capture_refs and update-ref -d). Hook-driven races continue to be caught by the existing fresh integration check that runs after pre-remove hooks; CAS narrows the residual window from "between check and delete" to "never". The race rejection is unit-simulated (the ref is moved externally), not yet driven end-to-end through a real wt remove + hook.

Tests

Unit tests in git::remove::tests: cas_rejects_delete_when_branch_advances (pins the CAS rejection mechanism), cas_deletes_when_branch_unchanged, cas_propagates_error_when_ref_vanished, and deletes_via_fallback_when_branch_absent_from_snapshot. In output::handlers::tests: every warn_if_branch_retained arm, build_remove_command_with_tail_appends_only_when_present, and retained_raced_branch_message_lead_in_varies. test_prune_fallback_config_race_canary's wrapper-git script matches the new update-ref -d refs/heads/<branch> shape.

cargo run -- hook pre-merge --yes passes locally: 4263 nextest tests + doctests + pre-commit + clippy + docs, no snapshot drift.

Known gaps (deferred)

  • No end-to-end integration test of a real mid-delete race (a pre-remove hook advancing the branch); the mechanism is unit-covered.
  • The FAQ What can Worktrunk delete? doesn't yet mention the fail-closed-on-race behavior.

This was written by Claude Code on behalf of Maximilian Roos

max-sixty and others added 2 commits May 25, 2026 14:28
The `check_lock` in `step_prune` was carried over from #2808's
Windows `.git/config` race fix. After #2870 restructured the flow so
hook approval is exact, the phase ordering already serializes
integration-check readers against the rename-failure fallback's
`.git/config` rewrite: the background par_iter is the only reader, and
the channel-drain loop (`for (idx, result) in rx`) only exits once that
thread has finished. Removals run strictly after.

That made the lock belt-and-suspenders — one mechanism per guarantee,
per CLAUDE.md. Drop the `RwLock<()>`, the `Arc` wrapping it, the
read-guard in the par_iter worker, the write-guard at the top of
`try_remove`, and the `check_lock` field on `RemovalContext`. Comments
on the par_iter spawn and the `SynchronousForNonCurrent` variant now
name the real invariants (phase ordering, summary-accurate iteration)
rather than the dropped lock.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e branch

Both branch-deletion paths in `execute_instant_removal_or_fallback`
swallowed errors at `log::debug!` — invisible in default verbosity.
When the worktree is already gone but the branch survives (a hook
advanced it, a `git branch -D` exec failed, refs scan errored), the
user got no signal that the second half of the operation didn't happen.

Promote both to a `warning_message` with a `wt remove --force-delete <branch>`
recovery hint. To avoid duplicating the planner's existing "Branch
unmerged" hint, the warning suppresses the `Ok(NotDeleted)` case when
the planner already predicted retention (unmerged, or
`--no-delete-branch`); race-driven `NotDeleted` outcomes still warn.
`Err` always warns, since the prior `log::debug!` was invisible.

`BackgroundRemoval` carries a new `planner_expected_retention` bit so
the helper can make this distinction without re-deriving planner state.

This adds noise to a previously silent path — the tradeoff is that
silent half-completion of a destructive operation is worse than an
extra warning line on the rare race.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@worktrunk-bot worktrunk-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CAS approach unifies the four SafeDelete paths cleanly: one library helper (delete_branch_if_safe) for fast/synchronous/branch-only, plus a CAS-tail builder for the Detached fallback. The update-ref -d <ref> <oid> primitive picks up IntegrationReason semantics for free across all paths — the description of the prior && git branch -d <branch> shell tail silently losing squash-merged/patch-id matches is accurate. The race-rejection discrimination (try update-ref -d, then rev-parse --verify --quiet to distinguish "ref still present" from "real error") is the right shape per the structured-output rule.

The only gap is the one you raised in test plan item 3, and it shows up the most on handle_named_removed_worktree_foreground. In that path output.branch_result carries Ok(RetainedRaced) straight through RemovalDisplayInfo::from_branch_resulthandle_branch_deletion_result, which sets show_unmerged_hint = true. RemovalDisplayInfo::print_message then prints "Removed feature worktree" with no mention that the branch was kept (since branch_deleted() is false), and print_hints falls into the self.show_unmerged_hint || … arm and emits "Branch unmerged; to delete, run wt remove -D". The branch wasn't unmerged from the planner's view — it was integrated and then moved — so that hint mis-attributes the cause.

The other three paths handle this correctly: the rename-into-trash fast path and SynchronousForNonCurrent go through warn_if_branch_retained (handlers.rs:311), and handle_branch_only_output has its own dedicated branch (handlers.rs:1123). The detached fallback can't surface it (CAS happens in the detached child), so foreground is the one missing case.

Cleanest fix is probably either: (1) split show_unmerged_hint in BranchDeletionDisplay from a show_race_warning flag for RetainedRaced, then have print_hints emit the dedicated "moved during deletion" message; or (2) keep the field but add a RetainedRaced early branch in print_hints that prints the warning before falling through.

Nit: assert_ne!(original_sha, advanced_sha) works because TestRepo::git_output trims (src/testing/mod.rs:1120), but it's worth knowing — if that trim ever goes away, the assertion becomes vacuous since the trailing \n would always make the strings differ.

The `Err` arm of `warn_if_branch_retained` covered failure modes the user
cannot act on differently than re-running the command (`git branch -D` exec
errors, refs DB I/O failures). Promote it from `warning_message` to
`log::warn!` to match the writing-user-outputs guideline for non-user-fixable
issues, and let the `Ok(NotDeleted)` race path stay user-visible since the
recovery is `wt remove --force-delete <branch>`.

Net effect on the prior commit: removes the duplicate "to retry, run ..."
recovery hint for I/O errors (the prior wording suggested the same recovery
either way, conflating two different failure shapes) and cuts an
integration-test-unreachable code path that was the source of the
codecov/patch miss on #2900.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@max-sixty max-sixty force-pushed the agent-a3bad31af2bb71c8e-cas branch from c127ed5 to 96f7f14 Compare May 25, 2026 22:14
@max-sixty max-sixty changed the base branch from agent-a3bad31af2bb71c8e to main May 25, 2026 22:16
max-sixty and others added 3 commits May 25, 2026 15:21
Integration tests cover only one of the four arms — the
NotDeleted+planner-expected-deletion warning surfaced by
test_merge_pre_remove_new_commit_keeps_branch. Add a binary-crate unit
test that calls the helper with each outcome (Ok(NotDeleted) with both
planner predictions, Ok(ForceDeleted), Ok(Integrated), Err) so the
suppression logic and the log::warn Err arm are exercised in coverage.

Output goes to stderr and isn't captured — the assertion is the call
doesn't panic on any arm. User-visible message shapes stay pinned by
the integration tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three follow-ups to #2870 collapse into one rewrite of
`delete_branch_if_safe`: it now performs an atomic compare-and-swap
delete via `git update-ref -d refs/heads/<branch> <expected-sha>`,
where the expected SHA comes from the snapshot the integration check
already consulted. If the ref moves between the integration check and
the delete (a hook, a concurrent push), git rejects the CAS — the
branch is retained (fail-closed) and we surface the new
`BranchDeletionOutcome::RetainedRaced` outcome rather than dropping
the unmerged commits silently. Force-delete still uses `git branch -D`
(the user's explicit override).

This unifies the divergent safe-delete semantics in
`execute_instant_removal_or_fallback`:

  - Fast path and `SynchronousForNonCurrent` fallback already routed
    through `delete_branch_if_safe`, so they pick up CAS for free.
  - Detached fallback used to bake `&& git branch -d <branch>` into
    the shell, which only honored git's reachability-from-HEAD check
    (so squash-merged / patch-id / ancestor branches the planner
    accepted as integrated were rejected at delete time). Now
    `build_cas_branch_delete_tail` runs worktrunk's full integration
    check in the foreground and, if integrated, appends an atomic
    `git update-ref -d <ref> <expected-sha>` to the detached shell —
    same semantics as the other paths, plus CAS safety against tip
    movement between the foreground check and the detached delete.
  - `handle_branch_only_output` also switches its
    safe-delete path from `git branch -D` (an unsafe force-delete after
    a stale integration check) to `delete_branch_if_safe`, so the
    CAS protects branch-only deletions the same way.

The display layer absorbs the new variant: `flag_note` treats
`RetainedRaced` like `NotDeleted` for the success badge,
`handle_branch_deletion_result` shows the unmerged hint, and the
branch-only output path emits a dedicated "moved during deletion"
warning with a `wt remove --force-delete <branch>` recovery hint.

CAS protects the TOCTOU window inside `delete_branch_if_safe` (between
its own snapshot capture and `update-ref -d`). Hook-driven races are
still caught by the existing fresh integration check that runs after
pre-remove hooks; CAS narrows the residual unprotected window from
\"between check and delete\" to \"never\".

Tests: `git update-ref -d <ref> <sha>` is structured — exit 0 on
deletion, non-zero on stale-SHA rejection (\"cannot lock ref ... is at
X but expected Y\"). The CAS helper re-checks ref presence via
`rev-parse --verify --quiet` rather than parsing the error message,
keeping with the structured-output policy. Two new unit tests in
`git::remove::tests`:

  - `cas_rejects_delete_when_branch_advances` — captures a snapshot,
    moves the branch externally, then calls `delete_branch_if_safe`
    and asserts `RetainedRaced` plus surviving tip at the post-race
    SHA. Pins the CAS rejection mechanism.
  - `cas_deletes_when_branch_unchanged` — sanity check the common
    integrated case still deletes.

`test_prune_fallback_config_race_canary`'s wrapper-git script now
matches `update-ref -d refs/heads/<branch>` alongside the
`branch -d|-D <branch>` shapes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The doc rewrite that came with the CAS commit only mentioned the
NotDeleted-race case. Spell out RetainedRaced, NotDeleted (with the
planner-predicted-retention suppression), the silent success arms, and
the log::warn! Err arm so the file reads consistently after the rebase
onto the PR-2900 simplification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@max-sixty max-sixty force-pushed the agent-a3bad31af2bb71c8e-cas branch from 96f7f14 to 992790f Compare May 25, 2026 22:22

@worktrunk-bot worktrunk-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CAS rewrite addresses the previous review's structural concerns. One small follow-up on the new test (inline).

Comment thread src/output/handlers.rs
worktrunk-bot and others added 2 commits May 25, 2026 22:35
The arm this PR introduces — the only one that warns regardless of
planner_expected_retention — was the one missing from the per-arm test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…8e-cas

# Conflicts:
#	src/commands/step/prune.rs
worktrunk-bot
worktrunk-bot previously approved these changes May 28, 2026

@worktrunk-bot worktrunk-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codecov/patch is failing on the CAS changes. Triaging the report:

False positives (test exists, llvm-cov mis-attributes the middle lines of eprintln!(warning_message(cformat!(...)))): src/output/handlers.rs L324 and L335 in warn_if_branch_retained. The eprintln! openers at L322 and L333 are hit (L322 hits=2, L333 hits=1 in cobertura.xml), so the new warn_if_branch_retained_arms test added in 80e4b5e is doing its job — the gap is reporting noise inside the macro expansion.

Genuine gaps (hits=0 in cobertura):

  • src/output/handlers.rs L1127/L1129 — the RetainedRaced arm of handle_branch_only_output. Distinct from warn_if_branch_retained; the new unit test doesn't reach this function. Easiest fix: a sibling unit test that calls handle_branch_only_output with a BranchDeletionOutcome::RetainedRaced and asserts it doesn't panic, mirroring the pattern at handlers.rs:2016.
  • src/output/handlers.rs L393 — the None => remove_command branch of build_remove_command_with_tail. The Some(tail) arm is exercised (L391 hits=1); the no-tail arm is trivial to cover with a unit test on the helper.
  • src/git/remove.rs L437-438 — the snapshot_sha returns None fallback. By the inline comment this is intentionally a corner-case preservation of pre-CAS behavior, reachable only via a synthetic snapshot/branch mismatch. Either construct a RefSnapshot that hits it, or — if the path is genuinely unreachable — remove it and let the caller fall through to a single deletion code path per the CLAUDE.md coverage guidance ("remove unused code… where falling through to a general handler suffices").
  • src/git/remove.rs L495 — the real-error propagation in cas_delete_branch_outcome when both update-ref -d and the post-failure rev-parse --verify fail. Hardest to exercise without a fault-injection hook.

I'm dismissing my prior approval per CLAUDE.md's "never merge a PR with failing codecov/patch without explicit user approval" — the two helper-function gaps look mechanical to close.

max-sixty added 4 commits May 31, 2026 12:21
Close codecov/patch gaps on the new CAS branch-deletion paths:
- cas_delete_branch_outcome: the true-error arm (update-ref fails and the
  ref is already gone) propagates the error rather than reporting
  RetainedRaced.
- build_remove_command_with_tail: the no-tail branch returns the bare
  worktree-removal command.
delete_branch_if_safe falls back to a plain `branch -D` when the branch is
integrated — resolved via the live rev-parse fallback in
compute_integration_lazy — but absent from the captured snapshot, so
snapshot_sha has no SHA to compare-and-swap against. The arm is reachable
when the branch is created after the snapshot; exercise it directly rather
than leaving it as the lone uncovered patch path.
max-sixty and others added 3 commits June 27, 2026 16:48
# Conflicts:
#	src/output/handlers.rs
…t paths

The "branch kept because its tip moved during the delete" message had
drifted into three shapes: two near-identical inline strings (the
worktree-removal warning and the branch-only warning, differing only in
lead-in) plus a gap — the foreground worktree path routed `RetainedRaced`
through `show_unmerged_hint` and printed the generic "Branch unmerged;
run -D" hint, whose bare `-D` would force-delete the just-arrived racing
commits, the opposite of the cautious "inspect commits, then force-delete
if safe" the other paths gave.

Collapse all three onto one `retained_raced_branch_message` helper (a
sibling to `retained_unmerged_branch_messages`, same can't-drift
rationale), and give `print_hints` an explicit `RetainedRaced` arm so the
foreground path surfaces the correct message. `RetainedRaced` is no longer
folded into `show_unmerged_hint`, since every caller now handles it
explicitly.

While here:

- Converge the force-delete recovery command to the canonical
  `wt remove -D <branch>` via `suggest_command`, replacing the hardcoded
  `--force-delete` spelling these messages had drifted to.
- Inline the single-caller `snapshot_sha` wrapper in `delete_branch_if_safe`;
  it now reads `snapshot.local_branch(branch)` directly — the same accessor
  the detached-fallback CAS tail already used.

Behavior is unchanged except the now-correct foreground raced message and
the `-D` spelling; no snapshot pinned the raced/not-integrated strings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@worktrunk-bot worktrunk-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both concerns from the prior reviews are now resolved by the three new commits:

  • Foreground RetainedRaced mis-attribution (971e8b4) — the worktree path no longer folds RetainedRaced into show_unmerged_hint; print_hints gets a dedicated early-return arm, and all three emit paths (warn_if_branch_retained, handle_branch_only_output, print_hints) now route through one retained_raced_branch_message helper. I traced the readers: show_unmerged_hint is now NotDeleted-only, and every remaining NotDeleted | RetainedRaced grouping (flag_note) is correct since both yield an empty note. The raced framing ("inspect commits, then run wt remove -D if safe") is the right fix over the generic "Branch unmerged; run -D" hint, which mis-described an integrated-but-moved branch.
  • codecov/patch — now passing on dca30c2. The new tests (c58a41, 813f63) target the exact gaps flagged: the CAS true-error arm, the no-tail build_remove_command_with_tail branch, and the snapshot-skewed non-CAS fallback.

This change touches the branch-deletion / force-delete surface (src/git/remove.rs, src/output/handlers.rs, the wt remove -D recovery command), so per the repo's data-loss-surface review policy I'm not issuing an approval — that merge call is yours. No new findings on the incremental diff.

@max-sixty max-sixty merged commit 1eb4e58 into main Jun 28, 2026
37 checks passed
@max-sixty max-sixty deleted the agent-a3bad31af2bb71c8e-cas branch June 28, 2026 04:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants