
fix(split): support splitting renamed files #1665

Draft
claytonrcarter wants to merge 11 commits into arxanas:master from claytonrcarter:split-rename

@claytonrcarter
Collaborator

WIP; currently includes #1612, #1663, and #1664.

@claytonrcarter claytonrcarter force-pushed the split-rename branch 3 times, most recently from 9c994a6 to cbd941c Compare May 20, 2026 03:15
DUMMY_DATE is used to populate GIT_AUTHOR_DATE and GIT_COMMITTER_DATE, both of
which expect dates in RFC2822 format, but DUMMY_DATE was itself not in RFC2822
format, and it was made less so by appending a 2 digit timezone offset.

This has not been an issue thus far because we were only passing the date to
git as a string, and git seems to be fairly liberal when parsing dates. `git
record --new` seems to be the first time we're actually creating wholly new
commits (vs just modifying existing commits), so we need to set a current time
on these new commits. To do so, we need to parse DUMMY_DATE into a SystemTime
with chrono, and chrono is not lenient when parsing dates, leading to various
errors:

1. The existing format yielded Invalid, because of the time/year order.
2. Fixing the order of the year led to TooShort, because of the 2 digit
   timezone offset.
3. Fixing the timezone offset yielded Impossible, because 2020-10-29 was
   not a Wednesday.

Fun! Regardless, I don't expect these changes to have any impact outside of
the upcoming tests for `record --new`.

Old format: Wed 29 Oct 12:34:56 2020 PDT -02
New format: Thu, 29 Oct 2020 12:34:56 -0200
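
The third error in the list above comes down to a weekday that contradicts the calendar date. A minimal, std-only sketch of that consistency check follows; it is not how chrono implements it, and the helper names `day_of_week`, `day_name`, and `weekday_is_consistent` are hypothetical, introduced only for illustration.

```rust
/// Day of week for a Gregorian date, 0 = Sunday .. 6 = Saturday.
/// Uses Sakamoto's method; `month` is 1-based.
fn day_of_week(year: i32, month: u32, day: u32) -> u32 {
    const OFFSETS: [i32; 12] = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4];
    // January and February count as part of the previous year.
    let y = if month < 3 { year - 1 } else { year };
    let w = (y + y / 4 - y / 100 + y / 400 + OFFSETS[(month - 1) as usize] + day as i32) % 7;
    w as u32
}

/// Three-letter day name as it appears in an RFC 2822 date.
fn day_name(year: i32, month: u32, day: u32) -> &'static str {
    const NAMES: [&str; 7] = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
    NAMES[day_of_week(year, month, day) as usize]
}

/// True when the day name written in the date string agrees with the date.
fn weekday_is_consistent(written: &str, year: i32, month: u32, day: u32) -> bool {
    written == day_name(year, month, day)
}
```

Under this check, the old `Wed` prefix is inconsistent with 29 Oct 2020, while the new `Thu` prefix agrees with it.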

Ref: https://github.com/git/git/blob/master/Documentation/date-formats.adoc
Ref: https://www.rfc-editor.org/rfc/rfc2822#section-3.3

`record --new` is the first feature in which we create entirely new
commits (vs rewriting or splitting existing commits) and I was running
into test failures in CI that seemed to only differ by timezone.
Explicitly setting the time zone in the test git environment resolved
the issue.

Fixes various defects encountered during testing.

Prep only, no logic changes yet.

Prep only, no logic changes. This should make the next diff more concise.

When --before is used:
1. The commit C is created normally as a child of HEAD (B).
2. A c_alt commit is created in-memory as a child of B's parent (A),
   with the same tree and metadata as C.
3. A RewriteEvent records C as obsolete (replaced by c_alt).
4. `git reset --soft HEAD~` moves HEAD back to B, orphaning C and
   making it invisible in the DAG (preventing constraint cycles).
5. A rebase plan moves B to be a child of c_alt, yielding B'.

The result is A <- c_alt <- B', with HEAD and any branch tracking B'.
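
The five steps can be replayed on a toy in-memory graph to see why the result is A <- c_alt <- B'. This is only a sketch: the `Dag` type and `record_before` function are hypothetical stand-ins, not git-branchless code, and real commits, trees, and rebase plans are reduced to name-to-parent entries.

```rust
use std::collections::HashMap;

/// Toy commit graph for illustration only: commit name -> parent name.
struct Dag {
    parent: HashMap<String, Option<String>>,
    head: String,
    /// (obsolete commit, replacement) pairs, standing in for RewriteEvents.
    rewrites: Vec<(String, String)>,
}

impl Dag {
    fn commit(&mut self, name: &str, parent: Option<&str>) {
        self.parent.insert(name.to_string(), parent.map(str::to_string));
    }

    /// Commits reachable from HEAD, oldest first.
    fn history(&self) -> Vec<String> {
        let mut out = Vec::new();
        let mut cur = Some(self.head.clone());
        while let Some(name) = cur {
            cur = self.parent.get(&name).cloned().flatten();
            out.push(name);
        }
        out.reverse();
        out
    }
}

/// Replays the `--before` steps on the graph A <- B, with HEAD at B.
fn record_before() -> Dag {
    let mut dag = Dag { parent: HashMap::new(), head: String::new(), rewrites: Vec::new() };
    dag.commit("A", None);
    dag.commit("B", Some("A"));
    dag.head = "B".to_string();

    // 1. C is created normally as a child of HEAD (B).
    dag.commit("C", Some("B"));
    dag.head = "C".to_string();
    // 2. c_alt is a child of B's parent (A), with C's tree and metadata.
    dag.commit("c_alt", Some("A"));
    // 3. Record C as obsolete, replaced by c_alt.
    dag.rewrites.push(("C".to_string(), "c_alt".to_string()));
    // 4. Equivalent of `git reset --soft HEAD~`: HEAD moves back to B.
    dag.head = dag.parent["C"].clone().expect("C has a parent");
    // 5. Rebase B onto c_alt, yielding B'; HEAD follows B'.
    dag.commit("B'", Some("c_alt"));
    dag.head = "B'".to_string();
    dag
}
```

After step 4 nothing reachable from HEAD points at C, which is what keeps it out of the rebase constraints in this model.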

Four test cases covering the new --before behaviour:

- test_record_before: basic happy path - working-copy changes are
  inserted before HEAD as a new commit; HEAD (and its branch) advance
  to the rebased original HEAD.
- test_record_before_with_new: --new --before inserts an empty commit
  before HEAD and rebases HEAD on top of it.
- test_record_before_rewrite_public_commit: warns when HEAD is a public
  commit and prints a specific `git move` invocation to force-proceed.
- test_record_before_merge_conflict: conflict during the rebase of HEAD
  is surfaced with the Before remediation message.

Add a new FileExtractSpec enum to git-branchless-opts with two variants:
- WholeFile(String): existing whole-file behaviour (no change to users)
- LineRange { file, start_line, end_line }: extract only the diff hunks
  overlapping a specific line range in the target (post-commit) file

Implement FromStr so clap parses CLI args directly into the enum:
  file.txt         -> WholeFile
  file.txt:42      -> LineRange { start_line: 42, end_line: 42 }
  file.txt:10-42   -> LineRange { start_line: 10, end_line: 42 }

The ':/path' git repo-relative prefix is handled specially (never
treated as containing a line-range suffix). Windows drive-letter paths
are also handled correctly via rfind(':') + digit validation.
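
The parsing rules above can be sketched with a standalone `FromStr` implementation. This is an approximation written from the description, not the code in the PR: the `String` error type, the `usize` line numbers, and the rejection of zero or reversed ranges are assumptions.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
enum FileExtractSpec {
    WholeFile(String),
    LineRange { file: String, start_line: usize, end_line: usize },
}

impl FromStr for FileExtractSpec {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let whole = || -> Result<Self, Self::Err> { Ok(FileExtractSpec::WholeFile(s.to_string())) };
        // A ':/path' repo-relative prefix never carries a line-range suffix.
        if s.starts_with(":/") {
            return whole();
        }
        // Only the text after the LAST colon can be a line range, so a
        // Windows drive letter ("C:\...") is left alone.
        let Some(idx) = s.rfind(':') else { return whole() };
        let (file, suffix) = (&s[..idx], &s[idx + 1..]);
        let is_digits = |t: &str| !t.is_empty() && t.bytes().all(|b| b.is_ascii_digit());
        let (start, end) = match suffix.split_once('-') {
            Some((a, b)) if is_digits(a) && is_digits(b) => (a, b),
            None if is_digits(suffix) => (suffix, suffix),
            _ => return whole(), // suffix is not a line range: treat as a path
        };
        if file.is_empty() {
            return whole();
        }
        let start_line: usize = start.parse().map_err(|e| format!("invalid start line: {e}"))?;
        let end_line: usize = end.parse().map_err(|e| format!("invalid end line: {e}"))?;
        // Assumption: lines are 1-based and ranges must not be reversed.
        if start_line == 0 || start_line > end_line {
            return Err(format!("invalid line range {start_line}-{end_line}"));
        }
        Ok(FileExtractSpec::LineRange { file: file.to_string(), start_line, end_line })
    }
}
```

Because the type implements `FromStr`, `"file.txt:10-42".parse::<FileExtractSpec>()` yields the `LineRange` variant, and a bare `C:\dir\file.txt` falls through to `WholeFile` since the text after its only colon is not numeric.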

Update the Split command's files field from Vec<String> to
Vec<FileExtractSpec> and expand its doc comment accordingly.

When a FileExtractSpec::LineRange is passed to split(), extract only the
diff hunks that overlap the specified line range in the post-commit
(new) version of the file rather than swapping the entire file entry.

How it works
============
A new helper, select_hunks_by_line_range(), walks the scm_record::File
sections produced by process_diff_for_record(), tracking the running
new-file line counter:

  - Unchanged sections advance the counter by their line count.
  - Changed sections are marked as checked (is_checked = true) when
    their new-file span overlaps [start_line, end_line].
  - Pure-removal sections (no Added lines) are matched when their
    insertion point falls within the range.
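
The line-counting walk described above can be sketched on a simplified section type. The `Section` enum here is a hypothetical stand-in that stores only line counts; it is not the `scm_record` type, and treating the insertion point of a pure removal as the next new-file line number is an assumption.

```rust
/// Simplified stand-in for a diff section; not the real `scm_record` types.
#[derive(Debug)]
enum Section {
    Unchanged { lines: usize },
    Changed { removed: usize, added: usize, is_checked: bool },
}

/// Checks every changed section that overlaps the 1-based, inclusive range
/// [start_line, end_line] in the new file. Returns how many were checked.
fn select_hunks_by_line_range(sections: &mut [Section], start_line: usize, end_line: usize) -> usize {
    let mut next_line = 1; // new-file number of the next line to be emitted
    let mut selected = 0;
    for section in sections.iter_mut() {
        match section {
            // Unchanged sections only advance the counter.
            Section::Unchanged { lines } => next_line += *lines,
            Section::Changed { added, is_checked, .. } => {
                let overlaps = if *added == 0 {
                    // Pure removal: match on its insertion point.
                    start_line <= next_line && next_line <= end_line
                } else {
                    // The hunk spans next_line ..= last in the new file.
                    let last = next_line + *added - 1;
                    next_line <= end_line && start_line <= last
                };
                if overlaps {
                    *is_checked = true;
                    selected += 1;
                }
                next_line += *added;
            }
        }
    }
    selected
}
```

A return value of zero corresponds to the "no matching hunks" case, which the caller can turn into an error.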

After selection, File::get_selected_contents() returns:
  - selected:   parent content + checked hunks applied
  - unselected: parent content + unchecked hunks applied
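
To make the selected/unselected split concrete, here is a small sketch that rebuilds both versions of a file from a list of sections. It mirrors the description above rather than the actual `File::get_selected_contents()` implementation; the `Section` type and the `selected_and_unselected` function are hypothetical.

```rust
/// Simplified diff section holding line text (each line ends in '\n').
enum Section<'a> {
    Unchanged(Vec<&'a str>),
    Changed { removed: Vec<&'a str>, added: Vec<&'a str>, is_checked: bool },
}

/// Returns (selected, unselected):
///   selected   = parent content with only the checked hunks applied
///   unselected = parent content with only the unchecked hunks applied
fn selected_and_unselected(sections: &[Section<'_>]) -> (String, String) {
    let mut selected = String::new();
    let mut unselected = String::new();
    for section in sections {
        match section {
            Section::Unchanged(lines) => {
                let text = lines.concat();
                selected.push_str(&text);
                unselected.push_str(&text);
            }
            Section::Changed { removed, added, is_checked } => {
                // A checked hunk contributes its new lines to `selected` and
                // its old (parent) lines to `unselected`; unchecked is the reverse.
                let (sel, unsel) = if *is_checked { (added, removed) } else { (removed, added) };
                selected.push_str(&sel.concat());
                unselected.push_str(&unsel.concat());
            }
        }
    }
    (selected, unselected)
}
```

With a parent of `a`, `b`, `c` where the `a -> A` hunk is checked and the `c -> C` hunk is not, the selected text is `A`, `b`, `c` and the unselected text is `a`, `b`, `C`.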

These are written as blobs via repo.create_blob_from_contents() and
grafted into the remainder tree with hydrate_tree().

Mode-specific behaviour
=======================
InsertAfter / DetachAfter / Discard:
  remainder_tree[file] = unselected blob (target minus selected hunks)
  The extracted commit is still produced by cherry_pick_fast(), which
  performs a 3-way merge; for independent hunks this cleanly restores
  the selected changes on top of the remainder.

InsertBefore:
  remainder_tree[file] = selected blob (parent plus selected hunks)
  The original target is rebased on top via move_subtree(), adding the
  remaining hunks.
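
The choice of blob per mode reduces to a single match. The sketch below assumes a hypothetical `SplitMode` enum whose variants are named after the modes listed above; the real type and its name may differ.

```rust
/// Hypothetical enum naming the split modes described above.
#[derive(Clone, Copy, Debug)]
enum SplitMode {
    InsertAfter,
    DetachAfter,
    Discard,
    InsertBefore,
}

/// Picks the blob that is written to remainder_tree[file].
fn remainder_blob<'a>(mode: SplitMode, selected: &'a str, unselected: &'a str) -> &'a str {
    match mode {
        // The remainder keeps everything except the selected hunks.
        SplitMode::InsertAfter | SplitMode::DetachAfter | SplitMode::Discard => unselected,
        // The remainder becomes the new parent, holding only the selected hunks.
        SplitMode::InsertBefore => selected,
    }
}
```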

Error handling
==============
  - Binary files: rejected with a descriptive message.
  - No matching hunks: rejected with the line range and filename.
  - File not changed in commit: same error as whole-file extraction.

Dependencies: add scm-record as a direct dependency of the
git-branchless crate so that scm_record::Section and ChangeType are
available in split.rs.

Testing
=======
New tests for the FileExtractSpec::LineRange feature:

test_split_hunk_by_line_number
  File with two hunks far apart (lines 1 and 10). Split with 'test.txt:1'
  and verify: remainder keeps the line-10 change; extracted commit
  introduces only the line-1 change on top.

test_split_hunk_line_range
  Same two-hunk setup, but use 'test.txt:10-10' to extract the bottom
  hunk instead. Verifies the complementary split direction.

test_split_hunk_insert_before
  Same setup with --before: the line-1 hunk becomes a new parent commit
  while the rebased original contributes only the line-10 change.

test_split_hunk_no_match_error
  Single-hunk file; specifying a line with no changed hunk produces
  exit code 1 and a descriptive error message.

Tests use git show COMMIT:path to check file contents directly,
avoiding commit-hash dependence in snapshot strings.

This is a 1st pass; still TODO:
- confirm rename w/ changed contents can split
- confirm rename w/ other files can split