fix(split): support splitting renamed files #1665
Draft
claytonrcarter wants to merge 11 commits into
9c994a6 to cbd941c
DUMMY_DATE is used to populate GIT_AUTHOR_DATE and GIT_COMMITTER_DATE, both of which expect dates in RFC2822 format, but DUMMY_DATE was itself not in RFC2822 format, and it was made less so by appending a 2-digit timezone offset. This has not been an issue thus far because we were only passing the date to git as a string, and git seems to be fairly liberal when parsing dates.

`git record --new` seems to be the first time we're actually creating wholly new commits (vs just modifying existing commits), so we need to set a current time on these new commits. To do so, we need to parse DUMMY_DATE into a SystemTime with chrono, and chrono is not lenient when parsing dates, leading to various errors:

1. The existing format yielded Invalid, because of the time/year order.
2. Fixing the order of the year led to TooShort, because of the 2-digit timezone offset.
3. Fixing the timezone offset yielded Impossible, because 2020-10-29 was not a Wednesday.

Fun! Regardless, I don't expect these changes to have any impact outside of the upcoming tests for `record --new`.

Old format: Wed 29 Oct 12:34:56 2020 PDT -02
New format: Thu, 29 Oct 2020 12:34:56 -0200

Ref: https://github.com/git/git/blob/master/Documentation/date-formats.adoc
Ref: https://www.rfc-editor.org/rfc/rfc2822#section-3.3
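The weekday mismatch behind the third error can be checked by hand. The following is a minimal std-only sketch, not chrono's own validation logic; the function names are hypothetical and it uses Sakamoto's day-of-week method to show why "Wed" had to become "Thu" for 29 Oct 2020.

```rust
/// Day of the week for a Gregorian date, 0 = Sunday .. 6 = Saturday.
/// Uses Sakamoto's method; this is an illustration only and is NOT
/// how chrono validates RFC2822 dates internally.
fn day_of_week(year: i32, month: u32, day: u32) -> u32 {
    // Per-month offsets used by Sakamoto's method.
    const OFFSETS: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    // January and February are counted as part of the previous year.
    let y = if month < 3 { year - 1 } else { year };
    let sum = y + y / 4 - y / 100 + y / 400 + OFFSETS[(month - 1) as usize] + day as i32;
    sum.rem_euclid(7) as u32
}

/// Three-letter weekday name as it would appear in an RFC2822 date.
fn weekday_name(year: i32, month: u32, day: u32) -> &'static str {
    const NAMES: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
    NAMES[day_of_week(year, month, day) as usize]
}
```

Calling `weekday_name(2020, 10, 29)` yields `"Thu"`, which matches the weekday in the new DUMMY_DATE format.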
`record --new` is the first feature in which we create entirely new commits (vs rewriting or splitting existing commits) and I was running into test failures in CI that seemed to only differ by timezone. Explicitly setting the time zone in the test git environment resolved the issue.
Fixes various defects encountered during testing.
Prep only, no logic changes yet.
Prep only, no logic changes. This should make the next diff more concise.
When --before is used:

1. The commit C is created normally as a child of HEAD (B).
2. A c_alt commit is created in-memory as a child of B's parent (A), with the same tree and metadata as C.
3. A RewriteEvent records C as obsolete (replaced by c_alt).
4. `git reset --soft HEAD~` moves HEAD back to B, orphaning C and making it invisible in the DAG (preventing constraint cycles).
5. A rebase plan moves B to be a child of c_alt, yielding B'.

The result is A <- c_alt <- B', with HEAD and any branch tracking B'.

Four test cases covering the new --before behaviour:

- test_record_before: basic happy path - working-copy changes are inserted before HEAD as a new commit; HEAD (and its branch) advance to the rebased original HEAD.
- test_record_before_with_new: --new --before inserts an empty commit before HEAD and rebases HEAD on top of it.
- test_record_before_rewrite_public_commit: warns when HEAD is a public commit and prints a specific `git move` invocation to force-proceed.
- test_record_before_merge_conflict: conflict during the rebase of HEAD is surfaced with the Before remediation message.
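The five --before steps can be traced on a toy parent map. This is only a sketch under heavy assumptions: `ToyRepo` and its methods are hypothetical, commits are plain names rather than real objects, and nothing here calls git or git-branchless.

```rust
use std::collections::HashMap;

/// Hypothetical toy model: commit name -> parent name (None for the root).
struct ToyRepo {
    parents: HashMap<String, Option<String>>,
    head: String,
    /// (obsolete commit, replacement) pairs, standing in for RewriteEvents.
    rewrites: Vec<(String, String)>,
}

impl ToyRepo {
    /// Starts with the history A <- B and HEAD at B.
    fn new() -> Self {
        let mut parents = HashMap::new();
        parents.insert("A".to_string(), None);
        parents.insert("B".to_string(), Some("A".to_string()));
        ToyRepo { parents, head: "B".to_string(), rewrites: Vec::new() }
    }

    fn add(&mut self, name: &str, parent: Option<String>) {
        self.parents.insert(name.to_string(), parent);
    }

    fn record_before(&mut self) {
        let old_head = self.head.clone();
        let grandparent = self.parents[&old_head].clone();
        // 1. C is created normally as a child of HEAD (B).
        self.add("C", Some(old_head.clone()));
        self.head = "C".to_string();
        // 2. c_alt is created as a child of B's parent (A).
        self.add("c_alt", grandparent);
        // 3. C is recorded as obsolete, replaced by c_alt.
        self.rewrites.push(("C".to_string(), "c_alt".to_string()));
        // 4. The soft reset moves HEAD back to B, orphaning C.
        self.head = old_head.clone();
        // 5. B is rebased onto c_alt, yielding B'.
        let rebased = format!("{old_head}'");
        self.add(&rebased, Some("c_alt".to_string()));
        self.head = rebased;
    }

    /// Commit names from HEAD back to the root.
    fn ancestry(&self) -> Vec<String> {
        let mut chain = Vec::new();
        let mut cur = Some(self.head.clone());
        while let Some(name) = cur {
            cur = self.parents[&name].clone();
            chain.push(name);
        }
        chain
    }
}
```

After `record_before()`, walking from HEAD gives B', c_alt, A, and C is no longer reachable from HEAD.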
Add a new FileExtractSpec enum to git-branchless-opts with two variants:
- WholeFile(String): existing whole-file behaviour (no change to users)
- LineRange { file, start_line, end_line }: extract only the diff hunks
overlapping a specific line range in the target (post-commit) file
Implement FromStr so clap parses CLI args directly into the enum:
file.txt -> WholeFile
file.txt:42 -> LineRange { start_line: 42, end_line: 42 }
file.txt:10-42 -> LineRange { start_line: 10, end_line: 42 }
The ':/path' git repo-relative prefix is handled specially (never
treated as containing a line-range suffix). Windows drive-letter paths
are also handled correctly via rfind(':') + digit validation.
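The parsing rules above could look roughly like the following. This is a sketch, not the code in this PR: the field types (`usize`), the `String` error type, the `parse_line` helper, and the choice to reject a zero or reversed range with an error are all assumptions.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FileExtractSpec {
    WholeFile(String),
    // Field types are assumed; the commit message only names the fields.
    LineRange { file: String, start_line: usize, end_line: usize },
}

/// Hypothetical helper: parses a non-empty, all-digit string.
fn parse_line(s: &str) -> Option<usize> {
    if !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

impl FromStr for FileExtractSpec {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // ':/path' is git's repo-relative prefix; never look for a suffix.
        if s.starts_with(":/") {
            return Ok(FileExtractSpec::WholeFile(s.to_string()));
        }
        // Only the LAST ':' can introduce a line range, so a drive letter
        // such as 'C:' is left alone unless digits follow the final colon.
        let Some(idx) = s.rfind(':') else {
            return Ok(FileExtractSpec::WholeFile(s.to_string()));
        };
        let (file, suffix) = (&s[..idx], &s[idx + 1..]);
        let range = match suffix.split_once('-') {
            Some((start, end)) => parse_line(start).zip(parse_line(end)),
            None => parse_line(suffix).map(|n| (n, n)),
        };
        match range {
            Some((start_line, end_line))
                if !file.is_empty() && start_line >= 1 && start_line <= end_line =>
            {
                Ok(FileExtractSpec::LineRange { file: file.to_string(), start_line, end_line })
            }
            // Assumption: a zero or reversed range is an error, not a file name.
            Some(_) => Err(format!("invalid line range in '{s}'")),
            None => Ok(FileExtractSpec::WholeFile(s.to_string())),
        }
    }
}
```

With this sketch, `"file.txt:10-42".parse::<FileExtractSpec>()` produces a `LineRange`, while `":/src/a.rs:10"` and `"C:\\dir\\file.txt"` both stay `WholeFile`.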
Update the Split command's files field from Vec<String> to
Vec<FileExtractSpec> and expand its doc comment accordingly.
When a FileExtractSpec::LineRange is passed to split(), extract only the
diff hunks that overlap the specified line range in the post-commit
(new) version of the file rather than swapping the entire file entry.
How it works
============
A new helper, select_hunks_by_line_range(), walks the scm_record::File
sections produced by process_diff_for_record(), tracking the running
new-file line counter:
- Unchanged sections advance the counter by their line count.
- Changed sections are marked as checked (is_checked = true) when
their new-file span overlaps [start_line, end_line].
- Pure-removal sections (no Added lines) are matched when their
insertion point falls within the range.
After selection, File::get_selected_contents() returns:
- selected: parent content + checked hunks applied
- unselected: parent content + unchecked hunks applied
These are written as blobs via repo.create_blob_from_contents() and
grafted into the remainder tree with hydrate_tree().
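The selection walk described above can be sketched with a simplified stand-in for the section type. The `Section` enum below is hypothetical and much smaller than scm_record's real types; the return value (a count of selected hunks) and the choice of the next new-file line as the insertion point for pure removals are assumptions.

```rust
/// Simplified stand-in for scm_record's sections (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Section {
    Unchanged { lines: usize },
    Changed { removed: usize, added: usize, is_checked: bool },
}

/// Marks every changed section overlapping [start_line, end_line] in the
/// new (post-commit) file and returns how many sections were selected.
fn select_hunks_by_line_range(
    sections: &mut [Section],
    start_line: usize,
    end_line: usize,
) -> usize {
    // 1-based number of the next line in the new file.
    let mut next_line = 1;
    let mut selected = 0;
    for section in sections.iter_mut() {
        match section {
            // Unchanged sections only advance the counter.
            Section::Unchanged { lines } => next_line += *lines,
            Section::Changed { added, is_checked, .. } => {
                let overlaps = if *added == 0 {
                    // Pure removal: match on its insertion point
                    // (assumed to be the next new-file line).
                    start_line <= next_line && next_line <= end_line
                } else {
                    let first = next_line;
                    let last = next_line + *added - 1;
                    first <= end_line && start_line <= last
                };
                *is_checked = overlaps;
                if overlaps {
                    selected += 1;
                }
                // Only added lines exist in the new file.
                next_line += *added;
            }
        }
    }
    selected
}
```

A return value of zero corresponds to the "no matching hunks" case, which the real command rejects with an error.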
Mode-specific behaviour
=======================
InsertAfter / DetachAfter / Discard:
remainder_tree[file] = unselected blob (target minus selected hunks)
The extracted commit is still produced by cherry_pick_fast(), which
performs a 3-way merge; for independent hunks this cleanly restores
the selected changes on top of the remainder.
InsertBefore:
remainder_tree[file] = selected blob (parent plus selected hunks)
The original target is rebased on top via move_subtree(), adding the
remaining hunks.
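The mode-specific choice of blob reduces to a single match. In this sketch the `SplitMode` enum and `remainder_blob` function are hypothetical names, and blobs are represented as plain string slices rather than git objects.

```rust
/// Hypothetical enum mirroring the four modes named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SplitMode {
    InsertAfter,
    DetachAfter,
    Discard,
    InsertBefore,
}

/// Picks the file contents that go into the remainder tree.
/// `selected` is parent + checked hunks; `unselected` is parent + unchecked hunks.
fn remainder_blob<'a>(mode: SplitMode, selected: &'a str, unselected: &'a str) -> &'a str {
    match mode {
        // The remainder keeps everything except the extracted hunks.
        SplitMode::InsertAfter | SplitMode::DetachAfter | SplitMode::Discard => unselected,
        // The remainder becomes the new parent, carrying only the extracted hunks.
        SplitMode::InsertBefore => selected,
    }
}
```

Only InsertBefore takes the selected blob, because there the extracted hunks must land in the earlier commit and the original target is rebased on top of it.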
Error handling
==============
- Binary files: rejected with a descriptive message.
- No matching hunks: rejected with the line range and filename.
- File not changed in commit: same error as whole-file extraction.
Dependencies: add scm-record as a direct dependency of the
git-branchless crate so that scm_record::Section and ChangeType are
available in split.rs.
Testing
=======
New tests for the FileExtractSpec::LineRange feature:
test_split_hunk_by_line_number
File with two hunks far apart (lines 1 and 10). Split with 'test.txt:1'
and verify: remainder keeps the line-10 change; extracted commit
introduces only the line-1 change on top.
test_split_hunk_line_range
Same two-hunk setup, but use 'test.txt:10-10' to extract the bottom
hunk instead. Verifies the complementary split direction.
test_split_hunk_insert_before
Same setup with --before: the line-1 hunk becomes a new parent commit
while the rebased original contributes only the line-10 change.
test_split_hunk_no_match_error
Single-hunk file; specifying a line with no changed hunk produces
exit code 1 and a descriptive error message.
Tests use git show COMMIT:path to check file contents directly,
avoiding commit-hash dependence in snapshot strings.
This is a 1st pass; still TODO:
- confirm rename w/ changed contents can split
- confirm rename w/ other files can split
cbd941c to bf02f5b
wip, currently includes #1612 #1663 and #1664