gix-diff: order-independent and file-name-aware rename tracking#2687
gix-diff: order-independent and file-name-aware rename tracking#2687Amey Pawar (ameyypawar) wants to merge 2 commits into
Conversation
…deLabs#1832) The rewrites tracker sorted its items with a same-id tiebreak on the path byte-range offsets, which are assigned in the order changes are pushed. As changes can arrive nondeterministically - the index-worktree status feeds the tracker from a dirwalk thread and an index-traversal thread - the matcher could pick different sources for identical-content candidates between runs. Sort by each item's observable identity instead (id, then location, change-kind, relation and entry-mode), a total order that is independent of push order. Items that still compare equal are identical in every observed field and thus interchangeable for matching. Adds permutation tests for the rename and exhaustive-copy paths, and updates the realistic_renames_2 snapshot where two same-id entries now emit in path order (a pure reorder).
|
Thanks a lot for contributing. Making the rename tracking more robust and also more aligned with what Git does is definitely appreciated. For now, I put this PR back into draft mode, as CI is legitimately failing in tests. |
GitoxideLabs#1832) When several sources are equally viable for a destination - identical ids in the identity pass, or equal similarity in the similarity pass - prefer the one whose file name matches the destination's, in the spirit of Git's basename-driven matching. This makes the now-deterministic choice also the semantically better one, where previously the pairing among identical candidates was arbitrary. The tree-merge baseline had recorded the arbitrary pairings in its three deviating cases, along with their consequences. These expectations are updated to the improved outcomes: - rename-within-rename-2: both merge directions now produce the same cleanly merged tree - the result that was previously present but commented out in favor of the order-dependent conflict, making the merge reversible as intended. The expected-reversed workaround branch is no longer needed. - conflicting-rename-complex (both directions): contents follow their true renames now that files pair up by name, and the previously documented mismatch of finding a rename of a/w to a-renamed/z no longer happens. Forward and reversed merges produce the same tree and mirror-image conflict stages. All other 114 baseline cases are unchanged, in SHA-1 and SHA-256. Note that in the identity pass, when no file-name match exists, the scan now covers the whole same-id range before falling back to the first eligible candidate, where it stopped previously.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ecf6eba5c0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".
| // Like Git's basename-driven matching, try sources whose file name matches the | ||
| // destination's first - contents can be equally similar in either direction, but | ||
| // matching names are the better pairing. | ||
| for match_filename in [true, false] { |
There was a problem hiding this comment.
Compare all candidates before applying basename preference
When similarity-based matching is enabled, this two-pass loop returns as soon as any same-filename source clears the threshold, so a weaker basename match can preempt a much stronger non-basename match. For example, if old/foo is only ~60% similar to new/foo but bar is ~90% similar, the first pass will pair old/foo -> new/foo without ever checking bar, which can make diffs and merge rename resolution attach changes to the wrong source.
Useful? React with 👍 / 👎.
Fixes #1832.
Problem
The rewrites tracker's pre-match sort broke same-id ties by
path.start/path.end— offsets intopath_backingthat are assigned in push order. Since changes can be pushed in a nondeterministic order (the index-worktree status feeds the tracker from a dirwalk thread and an index-traversal thread), the matcher could select a different source for identical-content candidates between runs. This is what makesgix-status'sindex_as_worktree_with_renames::changed_and_untracked_and_renamedfail intermittently on CI.The same order-dependence also made merges direction-sensitive: in the
rename-within-rename-2baseline, one merge direction produced a clean tree while the other produced a spurious content conflict — the asymmetry noted in the fixture itself ("It should of course be reversible…").Changes
id → location → change-kind → relation → entry-mode— a total order independent of push order.idstays primary sofind_match'spartition_pointidentity search is unaffected; items that still compare equal are identical in every observable field and thus interchangeable. Adds permutation tests that run a fixed change-set through every input order, for both the rename and the exhaustive-copy paths, and updates therealistic_renames_2snapshot where two same-id entries now emit in path order (a pure reorder).tree-baseline.shhad documented: "it finds a rename of a/w to a-renamed/z … because it has no basename based matching like Git".Tree-merge baseline updates (gix-merge)
114 of 117 baseline cases are byte-identical. The three exceptions are the deviating cases whose hand-crafted expectations had recorded the old order-dependent pairings:
rename-within-rename-2-A-B-deviates: both directions now produce the same cleanly merged tree — the result that was present but commented out in the fixture, and content-identical to what Git computes (Git additionally keeps the doubly-renamed file at both rename/rename destinations, while we compose the nested directory renames into one). Theexpected-reversedworkaround branch is no longer needed.conflicting-rename-complex-A-B(+ reversed): contents now follow their true renames (y.fandzpair by name;a/x.fanda/w, which B deleted, conflict with A's directory rename and keep our side). The documenteda/w→a-renamed/zmismatch no longer happens, and forward/reversed merges now produce the same tree with mirror-image conflict stages.Fixture archives regenerated for SHA-1 and SHA-256.
Note
In the identity pass, when no file-name match exists the scan now covers the whole same-id range before falling back to the first eligible candidate (previously it returned there). This only matters for degenerate ranges of many identical blobs — Git scans its full same-hash bucket for basename scoring as well — but I'm happy to bound it if preferred.
Validation
cargo test -p gix-diff— includes the new permutation testscargo test -p gix-mergeandGIX_TEST_FIXTURE_HASH=sha256 cargo test -p gix-merge tree::run_baseline— all 117 baseline cases in both hashescargo test -p gix-status— includes the previously-flakychanged_and_untracked_and_renamedcargo test -p gix,cargo test -p gitoxide-corecargo fmt --all -- --check,cargo clippy -p gix-diff -p gix-merge --tests -- -D warnings