Skip to content

gix-diff: order-independent and file-name-aware rename tracking#2687

Open
Amey Pawar (ameyypawar) wants to merge 2 commits into
GitoxideLabs:mainfrom
ameyypawar:fix/1832-rename-tracker-order-independent
Open

gix-diff: order-independent and file-name-aware rename tracking#2687
Amey Pawar (ameyypawar) wants to merge 2 commits into
GitoxideLabs:mainfrom
ameyypawar:fix/1832-rename-tracker-order-independent

Conversation

@ameyypawar

@ameyypawar Amey Pawar (ameyypawar) commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Fixes #1832.

Problem

The rewrites tracker's pre-match sort broke same-id ties by path.start/path.end — offsets into path_backing that are assigned in push order. Since changes can be pushed in a nondeterministic order (the index-worktree status feeds the tracker from a dirwalk thread and an index-traversal thread), the matcher could select a different source for identical-content candidates between runs. This is what makes gix-status's index_as_worktree_with_renames::changed_and_untracked_and_renamed fail intermittently on CI.

The same order-dependence also made merges direction-sensitive: in the rename-within-rename-2 baseline, one merge direction produced a clean tree while the other produced a spurious content conflict — the asymmetry noted in the fixture itself ("It should of course be reversible…").

Changes

  • Commit 1: sort tracker items by their observable identity — id → location → change-kind → relation → entry-mode — a total order independent of push order. id stays primary so find_match's partition_point identity search is unaffected; items that still compare equal are identical in every observable field and thus interchangeable. Adds permutation tests that run a fixed change-set through every input order, for both the rename and the exhaustive-copy paths, and updates the realistic_renames_2 snapshot where two same-id entries now emit in path order (a pure reorder).
  • Commit 2: prefer sources whose file name matches the destination's when pairing candidates — in both the identity and similarity paths — in the spirit of Git's basename-driven matching. This makes the deterministic choice also the semantically better one; previously the pairing among identical candidates was arbitrary, which tree-baseline.sh had documented: "it finds a rename of a/w to a-renamed/z … because it has no basename based matching like Git".

Tree-merge baseline updates (gix-merge)

114 of 117 baseline cases are byte-identical. The three exceptions are the deviating cases whose hand-crafted expectations had recorded the old order-dependent pairings:

  • rename-within-rename-2-A-B-deviates: both directions now produce the same cleanly merged tree — the result that was present but commented out in the fixture, and content-identical to what Git computes (Git additionally keeps the doubly-renamed file at both rename/rename destinations, while we compose the nested directory renames into one). The expected-reversed workaround branch is no longer needed.
  • conflicting-rename-complex-A-B (+ reversed): contents now follow their true renames (y.f and z pair by name; a/x.f and a/w, which B deleted, conflict with A's directory rename and keep our side). The documented a/wa-renamed/z mismatch no longer happens, and forward/reversed merges now produce the same tree with mirror-image conflict stages.

Fixture archives regenerated for SHA-1 and SHA-256.

Note

In the identity pass, when no file-name match exists the scan now covers the whole same-id range before falling back to the first eligible candidate (previously it returned there). This only matters for degenerate ranges of many identical blobs — Git scans its full same-hash bucket for basename scoring as well — but I'm happy to bound it if preferred.

Validation

  • cargo test -p gix-diff — includes the new permutation tests
  • cargo test -p gix-merge and GIX_TEST_FIXTURE_HASH=sha256 cargo test -p gix-merge tree::run_baseline — all 117 baseline cases in both hashes
  • cargo test -p gix-status — includes the previously-flaky changed_and_untracked_and_renamed
  • cargo test -p gix, cargo test -p gitoxide-core
  • cargo fmt --all -- --check, cargo clippy -p gix-diff -p gix-merge --tests -- -D warnings

…deLabs#1832)

The rewrites tracker sorted its items with a same-id tiebreak on the path byte-range offsets, which are assigned in the order changes are pushed. As changes can arrive nondeterministically - the index-worktree status feeds the tracker from a dirwalk thread and an index-traversal thread - the matcher could pick different sources for identical-content candidates between runs.

Sort by each item's observable identity instead (id, then location, change-kind, relation and entry-mode), a total order that is independent of push order. Items that still compare equal are identical in every observed field and thus interchangeable for matching.

Adds permutation tests for the rename and exhaustive-copy paths, and updates the realistic_renames_2 snapshot where two same-id entries now emit in path order (a pure reorder).
@Byron Sebastian Thiel (Byron) marked this pull request as draft July 2, 2026 09:05
@Byron

Copy link
Copy Markdown
Member

Thanks a lot for contributing. Making the rename tracking more robust and also more aligned with what Git does is definitely appreciated.

For now, I put this PR back into draft mode, as CI is legitimately failing in tests.

GitoxideLabs#1832)

When several sources are equally viable for a destination - identical ids in the identity pass, or equal similarity in the similarity pass - prefer the one whose file name matches the destination's, in the spirit of Git's basename-driven matching. This makes the now-deterministic choice also the semantically better one, where previously the pairing among identical candidates was arbitrary.

The tree-merge baseline had recorded the arbitrary pairings in its three deviating cases, along with their consequences. These expectations are updated to the improved outcomes:

- rename-within-rename-2: both merge directions now produce the same cleanly merged tree - the result that was previously present but commented out in favor of the order-dependent conflict, making the merge reversible as intended. The expected-reversed workaround branch is no longer needed.
- conflicting-rename-complex (both directions): contents follow their true renames now that files pair up by name, and the previously documented mismatch of finding a rename of a/w to a-renamed/z no longer happens. Forward and reversed merges produce the same tree and mirror-image conflict stages.

All other 114 baseline cases are unchanged, in SHA-1 and SHA-256. Note that in the identity pass, when no file-name match exists, the scan now covers the whole same-id range before falling back to the first eligible candidate, where it stopped previously.
@ameyypawar Amey Pawar (ameyypawar) changed the title gix-diff: make rename tracking independent of the input order gix-diff: order-independent and file-name-aware rename tracking Jul 2, 2026
@ameyypawar Amey Pawar (ameyypawar) marked this pull request as ready for review July 2, 2026 17:29

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecf6eba5c0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".

// Like Git's basename-driven matching, try sources whose file name matches the
// destination's first - contents can be equally similar in either direction, but
// matching names are the better pairing.
for match_filename in [true, false] {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Compare all candidates before applying basename preference

When similarity-based matching is enabled, this two-pass loop returns as soon as any same-filename source clears the threshold, so a weaker basename match can preempt a much stronger non-basename match. For example, if old/foo is only ~60% similar to new/foo but bar is ~90% similar, the first pass will pair old/foo -> new/foo without ever checking bar, which can make diffs and merge rename resolution attach changes to the wrong source.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rename tracker shouldn't be order-dependent

2 participants