Skip to content

Commit 1eb4e58

Browse files
max-sixtyclaudeworktrunk-bot
authored
fix(remove): atomic CAS branch deletion unifies safe-delete semantics (#2903)
Three follow-ups to #2870 plus a CAS-based unification of safe-delete semantics. (#2900 was the small/incremental path for the first two items; it has been closed as superseded — its commits are all contained here.) ## Item 1 — drop the defensive `check_lock` (refactor) The `check_lock` `RwLock<()>` in `step_prune` was carried over from #2808's Windows `.git/config` race fix. After #2870 restructured the flow, the phase ordering already serializes integration-check readers against removals: the background `par_iter` is the only reader, the `for ... in rx` loop exits only once that thread has finished, and removals run strictly after. Belt-and-suspenders per CLAUDE.md — removed. ## Item 2 — warn when background branch deletion silently retains (fix) Both branch-deletion paths in `execute_instant_removal_or_fallback` swallowed errors at `log::debug!`. Promoted the surprise `Ok(NotDeleted)` (planner predicted deletion, branch survived — a hook moved the tip) to a `warning_message` with a `wt remove -D <branch>` recovery hint. Suppressed when the planner already predicted retention so the existing `print_hints` unmerged-branch message doesn't duplicate. `Err` failures now log at warn level (developer-facing) rather than user-actionable warnings — the failure modes (`git update-ref` exec error, refs DB I/O) aren't actionable beyond re-running. ## Item 3 / CAS rewrite — atomic safe-delete (the big one) `delete_branch_if_safe` now uses `git update-ref -d refs/heads/<branch> <expected-sha>` instead of `git branch -D`. The expected SHA comes from the snapshot the integration check already consulted, so the check→delete sequence becomes one atomic step against a known SHA. If anything moved the ref in between, git rejects the CAS — the new `BranchDeletionOutcome::RetainedRaced` outcome is returned and surfaced to the user. Force-delete keeps `git branch -D` (explicit override). This unifies the divergent safe-delete semantics in `execute_instant_removal_or_fallback`: - **Fast path** + **SynchronousForNonCurrent fallback** routed through `delete_branch_if_safe`, so they pick up CAS for free. - **Detached fallback** (rename failed AND current worktree): the shell `&& git branch -d <branch>` is replaced by a foreground integration check + atomic `&& git update-ref -d <ref> <expected-sha>` tail (`build_cas_branch_delete_tail`). Squash-merged / patch-id / ancestor branches the planner accepts are now accepted at delete time too, matching the fast path. - **Branch-only deletion** (`handle_branch_only_output`) switches from bare `git branch -D` (effectively a force-delete after a stale integration check) to `delete_branch_if_safe`. CAS protects it too. ## Follow-up consolidation (commit `971e8b4`) The "branch kept because its tip moved during the delete" message had drifted into three shapes — two near-identical inline strings plus a gap: the foreground worktree path routed `RetainedRaced` through `show_unmerged_hint` and printed the generic "Branch unmerged; run `-D`" hint, whose bare `-D` would force-delete the just-arrived racing commits. All three emit paths (`warn_if_branch_retained`, `handle_branch_only_output`, `print_hints`) now route through one `retained_raced_branch_message` helper, with a dedicated `RetainedRaced` early-return arm in `print_hints`; `show_unmerged_hint` is now `NotDeleted`-only. Also converged the recovery command to the canonical `wt remove -D <branch>` (via `suggest_command`) and inlined the single-caller `snapshot_sha` wrapper. ### Race coverage CAS closes the TOCTOU window inside `delete_branch_if_safe` (between its own `capture_refs` and `update-ref -d`). Hook-driven races continue to be caught by the existing fresh integration check that runs after pre-remove hooks; CAS narrows the residual window from "between check and delete" to "never". The race rejection is unit-simulated (the ref is moved externally), not yet driven end-to-end through a real `wt remove` + hook. ### Tests Unit tests in `git::remove::tests`: `cas_rejects_delete_when_branch_advances` (pins the CAS rejection mechanism), `cas_deletes_when_branch_unchanged`, `cas_propagates_error_when_ref_vanished`, and `deletes_via_fallback_when_branch_absent_from_snapshot`. In `output::handlers::tests`: every `warn_if_branch_retained` arm, `build_remove_command_with_tail_appends_only_when_present`, and `retained_raced_branch_message_lead_in_varies`. `test_prune_fallback_config_race_canary`'s wrapper-git script matches the new `update-ref -d refs/heads/<branch>` shape. `cargo run -- hook pre-merge --yes` passes locally: 4263 nextest tests + doctests + pre-commit + clippy + docs, no snapshot drift. ### Known gaps (deferred) - No end-to-end integration test of a real mid-delete race (a `pre-remove` hook advancing the branch); the mechanism is unit-covered. - The FAQ [What can Worktrunk delete?](docs/content/faq.md#what-can-worktrunk-delete) doesn't yet mention the fail-closed-on-race behavior. > _This was written by Claude Code on behalf of Maximilian Roos_ --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: worktrunk-bot <254187624+worktrunk-bot@users.noreply.github.com>
1 parent 02c0621 commit 1eb4e58

3 files changed

Lines changed: 514 additions & 42 deletions

File tree

src/git/remove.rs

Lines changed: 199 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -225,6 +225,12 @@ impl BranchDeletionMode {
225225
pub enum BranchDeletionOutcome {
226226
/// Branch was not deleted — it was not integrated, and deletion was not forced.
227227
NotDeleted,
228+
/// Branch was integrated but the atomic compare-and-swap deletion was
229+
/// rejected because the ref moved between the integration check and the
230+
/// delete attempt — e.g. a hook or concurrent process advanced it. The
231+
/// branch is retained (fail-closed), and the caller surfaces this as a
232+
/// warning so the user can decide whether to re-check and delete.
233+
RetainedRaced,
228234
/// Branch was force-deleted without an integration check.
229235
ForceDeleted,
230236
/// Branch was deleted because it was integrated (the specific reason is attached).
@@ -412,8 +418,28 @@ pub fn delete_branch_if_safe(
412418

413419
let outcome = match reason {
414420
Some(r) => {
415-
repo.run_command(&["branch", "-D", branch_name])?;
416-
BranchDeletionOutcome::Integrated(r)
421+
// Atomic compare-and-swap against the snapshotted SHA. If the ref
422+
// moved between `integration_reason` and the delete (e.g. a hook
423+
// advanced the branch), `git update-ref -d <ref> <expected>` fails
424+
// closed: the branch is retained and we surface a `RetainedRaced`
425+
// outcome rather than dropping the unmerged commits silently.
426+
//
427+
// Read the SHA from the snapshot inventory (`local_branch`) rather
428+
// than `resolve()`, so it reflects the same `refs/heads/` walk
429+
// `integration_reason` consulted.
430+
match snapshot.local_branch(branch_name) {
431+
Some(b) => cas_delete_branch_outcome(repo, branch_name, &b.commit_sha, r)?,
432+
// Snapshot doesn't carry the branch SHA — extremely unusual
433+
// (the caller just captured refs, the branch is present in the
434+
// integration check). Fall through to a non-CAS delete rather
435+
// than failing the whole operation: this preserves the
436+
// pre-CAS behavior in a corner case rather than introducing a
437+
// new error class.
438+
None => {
439+
repo.run_command(&["branch", "-D", branch_name])?;
440+
BranchDeletionOutcome::Integrated(r)
441+
}
442+
}
417443
}
418444
None => BranchDeletionOutcome::NotDeleted,
419445
};
@@ -424,6 +450,46 @@ pub fn delete_branch_if_safe(
424450
})
425451
}
426452

453+
/// Atomically delete `refs/heads/<branch>` iff it currently points at
454+
/// `expected_sha`, and translate the result into a [`BranchDeletionOutcome`].
455+
///
456+
/// `git update-ref -d <ref> <oid>` is git's compare-and-swap delete primitive:
457+
/// the ref is removed only if its current value matches `<oid>`. If it has
458+
/// moved (a hook or concurrent process advanced the branch), the command
459+
/// exits non-zero with a `cannot lock ref` message and the ref is left alone —
460+
/// fail-closed semantics that protect unmerged commits.
461+
///
462+
/// When `update-ref` fails, distinguishes "ref moved" (the ref still exists
463+
/// → `RetainedRaced`) from "real error" (the ref is gone or git itself
464+
/// failed → propagate the original error) by re-checking with `rev-parse
465+
/// --verify --quiet`, which has a structured exit code (0 = present, 1 =
466+
/// absent) rather than relying on locale-sensitive error-message text.
467+
fn cas_delete_branch_outcome(
468+
repo: &Repository,
469+
branch_name: &str,
470+
expected_sha: &str,
471+
reason: IntegrationReason,
472+
) -> anyhow::Result<BranchDeletionOutcome> {
473+
let ref_name = format!("refs/heads/{branch_name}");
474+
let update_err = match repo.run_command(&["update-ref", "-d", &ref_name, expected_sha]) {
475+
Ok(_) => return Ok(BranchDeletionOutcome::Integrated(reason)),
476+
Err(e) => e,
477+
};
478+
479+
// CAS failed. Re-check the ref to distinguish a race rejection (ref
480+
// moved → still present) from a true error (refs DB I/O, permissions,
481+
// git missing → propagate). `rev-parse --verify --quiet` returns exit
482+
// 0 when the ref exists, exit 1 when it does not — no message parsing.
483+
if repo
484+
.run_command(&["rev-parse", "--verify", "--quiet", &ref_name])
485+
.is_ok()
486+
{
487+
Ok(BranchDeletionOutcome::RetainedRaced)
488+
} else {
489+
Err(update_err)
490+
}
491+
}
492+
427493
/// Generate a staging path for worktree removal.
428494
///
429495
/// Places the staging directory inside `<git-common-dir>/wt/trash/` so it is
@@ -445,12 +511,143 @@ pub(crate) fn generate_removing_path(trash_dir: &Path, worktree_path: &Path) ->
445511
#[cfg(test)]
446512
mod tests {
447513
use super::*;
514+
use crate::testing::TestRepo;
515+
516+
/// When the branch tip moves between snapshot capture and the deletion
517+
/// attempt, the atomic compare-and-swap rejects the delete and surfaces
518+
/// `RetainedRaced` rather than dropping the new commits silently.
519+
///
520+
/// The setup mimics a hook (or any concurrent writer) that advances the
521+
/// branch after the planner observed it as integrated: we capture refs
522+
/// when `feature` is at the same commit as `main` (trivially integrated),
523+
/// then move `feature` forward, then call `delete_branch_if_safe` with
524+
/// the stale snapshot. Integration check still says "integrated" (the
525+
/// snapshotted SHA is reachable from main), but CAS catches the live
526+
/// tip having moved and refuses the delete.
527+
#[test]
528+
fn cas_rejects_delete_when_branch_advances() {
529+
let test = TestRepo::with_initial_commit();
530+
test.run_git(&["branch", "feature"]);
531+
let repo = Repository::at(test.root_path()).unwrap();
532+
533+
// Snapshot captures `feature` at the initial commit (same as `main`).
534+
let snapshot = repo.capture_refs().unwrap();
535+
let original_sha = snapshot.local_branch("feature").unwrap().commit_sha.clone();
536+
537+
// Race: advance `feature` after the snapshot.
538+
test.run_git(&["checkout", "feature"]);
539+
std::fs::write(test.root_path().join("race.txt"), "boom\n").unwrap();
540+
test.run_git(&["add", "race.txt"]);
541+
test.run_git(&["commit", "-m", "post-snapshot advance"]);
542+
test.run_git(&["checkout", "main"]);
543+
544+
let advanced_sha = test.git_output(&["rev-parse", "feature"]);
545+
assert_ne!(
546+
original_sha, advanced_sha,
547+
"test setup: tip must have moved"
548+
);
549+
550+
let result = delete_branch_if_safe(&repo, &snapshot, "feature", "main", false).unwrap();
551+
assert!(
552+
matches!(result.outcome, BranchDeletionOutcome::RetainedRaced),
553+
"expected RetainedRaced, got a different outcome"
554+
);
555+
556+
// Branch survives, still at the post-race SHA.
557+
let live = test.git_output(&["rev-parse", "feature"]);
558+
assert_eq!(live, advanced_sha, "branch must not be deleted nor reset");
559+
}
560+
561+
/// Plain integrated case still deletes via CAS. Sanity check that the
562+
/// new code path doesn't break the common case.
563+
#[test]
564+
fn cas_deletes_when_branch_unchanged() {
565+
let test = TestRepo::with_initial_commit();
566+
test.run_git(&["branch", "feature"]);
567+
let repo = Repository::at(test.root_path()).unwrap();
568+
569+
let snapshot = repo.capture_refs().unwrap();
570+
let result = delete_branch_if_safe(&repo, &snapshot, "feature", "main", false).unwrap();
571+
assert!(
572+
matches!(result.outcome, BranchDeletionOutcome::Integrated(_)),
573+
"expected Integrated, got a different outcome"
574+
);
575+
576+
// Ref should be gone.
577+
let exit = std::process::Command::new("git")
578+
.args(["rev-parse", "--verify", "--quiet", "refs/heads/feature"])
579+
.current_dir(test.root_path())
580+
.status()
581+
.unwrap();
582+
assert!(!exit.success(), "branch should have been deleted");
583+
}
584+
585+
/// When the branch ref vanishes between snapshot capture and the CAS
586+
/// delete, `git update-ref -d` fails *and* the ref is already absent, so
587+
/// the outcome is a real error (propagated) — distinct from the
588+
/// `RetainedRaced` case where the ref moved but still exists.
589+
#[test]
590+
fn cas_propagates_error_when_ref_vanished() {
591+
let test = TestRepo::with_initial_commit();
592+
test.run_git(&["branch", "feature"]);
593+
let repo = Repository::at(test.root_path()).unwrap();
594+
595+
// Snapshot captures `feature`, then it is deleted out-of-band.
596+
let snapshot = repo.capture_refs().unwrap();
597+
test.run_git(&["branch", "-D", "feature"]);
598+
599+
// Integration still reads "integrated" from the stale snapshot, the CAS
600+
// update-ref fails, and rev-parse confirms the ref is gone → error.
601+
let result = delete_branch_if_safe(&repo, &snapshot, "feature", "main", false);
602+
assert!(
603+
result.is_err(),
604+
"expected a propagated error when the ref vanished, got Ok"
605+
);
606+
}
607+
608+
/// When the branch is integrated but absent from the captured snapshot
609+
/// (created after capture), `integration_reason` still resolves it via the
610+
/// live `rev-parse` fallback, yet the snapshot carries no SHA to CAS
611+
/// against. The delete falls back to a plain `branch -D` and reports
612+
/// `Integrated` — the non-CAS arm that exists for exactly this skew.
613+
#[test]
614+
fn deletes_via_fallback_when_branch_absent_from_snapshot() {
615+
let test = TestRepo::with_initial_commit();
616+
let repo = Repository::at(test.root_path()).unwrap();
617+
618+
// Capture refs BEFORE `feature` exists, so the snapshot carries no SHA
619+
// for it (forcing the snapshot-miss, non-CAS arm).
620+
let snapshot = repo.capture_refs().unwrap();
621+
assert!(
622+
snapshot.local_branch("feature").is_none(),
623+
"test setup: snapshot must predate the branch"
624+
);
625+
626+
// Create `feature` at main's commit → trivially integrated (same
627+
// commit), resolvable live but missing from the stale snapshot.
628+
test.run_git(&["branch", "feature"]);
629+
630+
let result = delete_branch_if_safe(&repo, &snapshot, "feature", "main", false).unwrap();
631+
assert!(
632+
matches!(result.outcome, BranchDeletionOutcome::Integrated(_)),
633+
"expected Integrated via the non-CAS fallback"
634+
);
635+
636+
// Branch was deleted.
637+
let exit = std::process::Command::new("git")
638+
.args(["rev-parse", "--verify", "--quiet", "refs/heads/feature"])
639+
.current_dir(test.root_path())
640+
.status()
641+
.unwrap();
642+
assert!(!exit.success(), "branch should have been deleted");
643+
}
448644

449645
#[test]
450646
fn test_branch_deletion_outcome_matching() {
451647
// Ensure the match patterns work correctly
452648
let outcomes = [
453649
(BranchDeletionOutcome::NotDeleted, false),
650+
(BranchDeletionOutcome::RetainedRaced, false),
454651
(BranchDeletionOutcome::ForceDeleted, true),
455652
(
456653
BranchDeletionOutcome::Integrated(IntegrationReason::SameCommit),

0 commit comments

Comments
 (0)