rev-list: use merge-base --independent algorithm when possible#2082

Open
derrickstolee wants to merge 3 commits into gitgitgadget:master from derrickstolee:maximal-faster

@derrickstolee derrickstolee commented Apr 6, 2026

The --maximal-only option was added to git rev-list in b4e8f60 (revision: add --maximal-only option, 2026-01-22). The discussion [1] noted that 'git rev-list --maximal-only <refs>' behaves the same as 'git merge-base --independent <refs>', assuming that no other walk modifiers are provided to the revision walk. Under those assumptions, the merge-base algorithm can be faster when the refs share most of their history.

[1] https://lore.kernel.org/git/pull.2032.v2.git.1769097958549.gitgitgadget@gmail.com/

This series updates the revision walk to use the merge-base algorithm when possible. It checks the rev_info struct for options that would change the walk and also looks for negative references. If neither is present, the merge-base algorithm is used instead.

The series is broken into three patches that could theoretically be squashed into a single patch.

  1. The first demonstrates the equivalence of these two commands via some tests.
  2. The second creates a performance test and documents the current behavior.
  3. The third updates the implementation and demonstrates the improvement in the case of no walk modifiers.

Thanks,
-Stolee

cc: gitster@pobox.com
cc: j6t@kdbg.org

Add a test that verifies the 'git rev-list --maximal-only' option
produces the same set of commits as 'git merge-base --independent'. This
equivalence was noted when the feature was first created, but we are
about to update the implementation to use a common algorithm in this
case where the user intention is identical.

Signed-off-by: Derrick Stolee <stolee@gmail.com>

Add a performance test that compares 'git rev-list --maximal-only'
against 'git merge-base --independent'. These two commands are asking
essentially the same thing, but the rev-list implementation is more
generic and hence slower. These performance tests demonstrate that gap
in the current state and will also be used to show the equivalence in
the future.

We also add a case with '--since' to force the generic walk logic for
rev-list even when we make that future change to use the merge-base
algorithm on a simple walk.

When run on my copy of git.git, I see these results:

  Test                                      HEAD
  ----------------------------------------------
  6011.2: merge-base --independent          0.03
  6011.3: rev-list --maximal-only           0.06
  6011.4: rev-list --maximal-only --since   0.06

These numbers are low, but the --independent calculation is interesting
due to having a lot of local branches that are actually independent.

Running the same test on a fresh clone of the Linux kernel repository
shows a larger difference between the algorithms, especially because the
--independent algorithm is extremely fast when there are no independent
references selected:

  Test                                      HEAD
  ----------------------------------------------
  6011.2: merge-base --independent          0.00
  6011.3: rev-list --maximal-only           0.70
  6011.4: rev-list --maximal-only --since   0.70

Signed-off-by: Derrick Stolee <stolee@gmail.com>

The 'git rev-list --maximal-only' option filters the output to only
independent commits. A commit is independent if it is not reachable from
other listed commits. Currently this is implemented by doing a full
revision walk and marking parents with CHILD_VISITED to skip non-maximal
commits.
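
To make the definition concrete, here is a toy Python sketch. It is not Git's implementation: history is modeled as a hypothetical `parents` dict, and the function name is made up for illustration. It walks everything reachable from the inputs and marks each visited commit's parents, loosely mirroring the CHILD_VISITED idea described above.

```python
def maximal_by_full_walk(parents, tips):
    """Return the tips that are not reachable from any other tip.

    `parents` maps a commit name to a list of its parent names. It is a
    toy stand-in for the commit graph, not Git's data structures.
    """
    child_visited = set()  # commits that are a parent of some walked commit
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        for parent in parents.get(commit, []):
            child_visited.add(parent)
            stack.append(parent)
    # Keep input order, drop duplicates, and drop anything marked as an ancestor.
    return [t for t in dict.fromkeys(tips) if t not in child_visited]


# A <- B <- C and B <- D: C and D both descend from B.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(maximal_by_full_walk(history, ["B", "C", "D"]))
```

Note that the walk visits every ancestor of the inputs, including `A`, even though `A` can never affect the answer.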

The 'git merge-base --independent' command computes the same result
using reduce_heads(), which uses the more efficient remove_redundant()
algorithm. This is significantly faster because it avoids walking the
entire commit graph.
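
As a rough illustration of how such a computation can avoid the full walk, the following Python sketch stops descending once it is below the lowest input commit. This is an assumption-laden toy, not the remove_redundant() code: the `parents` and `generation` dicts and the function name are hypothetical, and `generation` is assumed to be strictly larger for a commit than for any of its parents.

```python
def maximal_with_cutoff(parents, generation, tips):
    """Return the tips not reachable from another tip, with an early cutoff.

    Assumes generation[c] > generation[p] for every parent p of c, so an
    ancestor of a tip can never sit above that tip's generation.
    """
    tips = list(dict.fromkeys(tips))  # de-duplicate, keep input order
    if not tips:
        return []
    tip_set = set(tips)
    min_gen = min(generation[t] for t in tips)
    redundant = set()
    seen = set()
    # Start from the parents so that a tip is only flagged when it is
    # reached from a different tip.
    stack = [p for t in tips for p in parents.get(t, [])]
    while stack:
        commit = stack.pop()
        # Anything below the lowest tip cannot be a tip, so stop there.
        if commit in seen or generation[commit] < min_gen:
            continue
        seen.add(commit)
        if commit in tip_set:
            redundant.add(commit)
        stack.extend(parents.get(commit, []))
    return [t for t in tips if t not in redundant]


# A <- B <- C and B <- D, with hand-assigned generation numbers.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
gens = {"A": 1, "B": 2, "C": 3, "D": 3}
print(maximal_with_cutoff(history, gens, ["C", "D"]))
```

With inputs `C` and `D`, the walk looks at `B`, sees it is below the cutoff, and stops without ever visiting `A`. When the inputs sit close together near the top of a long history, this is where the savings come from.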

Add a fast path in rev-list that detects when --maximal-only is the only
interesting option and all input commits are positive (no revision
ranges). In this case, use reduce_heads() directly instead of doing a
full revision walk.

In order to preserve the rest of the output filtering, this computation
is done opportunistically in a new prepare_maximal_independent() function
when possible. If successful, it populates revs->commits with the list
of independent commits and sets revs->no_walk to prevent any other walk
from occurring. This lets any custom output be handled by the existing
output code inside traverse_commit_list_filtered(). A new test
demonstrates that this output is preserved.

The fast path is only used when no other flags complicate the walk or
output format: no UNINTERESTING commits, no limiting options (max-count,
age filters, path filters, grep filters), no output formatting beyond
plain OIDs, and no object listing flags.
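
The conditions above amount to a single eligibility check. The Python sketch below shows the shape of such a check under stated assumptions: `WalkOptions` and all of its field names are hypothetical stand-ins chosen for illustration and do not correspond to the actual members of the rev_info struct.

```python
from dataclasses import dataclass


@dataclass
class WalkOptions:
    """Hypothetical summary of a revision walk request (not rev_info)."""
    maximal_only: bool = False
    has_uninteresting: bool = False  # any negative reference or range
    max_count: int = -1              # -1 means "no limit"
    max_age: int = -1                # -1 means "no age filter"
    min_age: int = -1
    has_pathspec: bool = False
    has_grep: bool = False
    custom_format: bool = False      # anything beyond plain OIDs
    list_objects: bool = False       # trees, blobs, tags


def can_use_fast_path(opts):
    """True only when --maximal-only is the sole interesting option."""
    if not opts.maximal_only or opts.has_uninteresting:
        return False
    if opts.max_count != -1 or opts.max_age != -1 or opts.min_age != -1:
        return False
    if opts.has_pathspec or opts.has_grep:
        return False
    return not (opts.custom_format or opts.list_objects)


print(can_use_fast_path(WalkOptions(maximal_only=True)))
print(can_use_fast_path(WalkOptions(maximal_only=True, max_age=1700000000)))
```

The second call models a walk with an age filter such as '--since', which is the case the performance test uses to force the generic walk.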

Running the p6011 performance test for my copy of git.git, I see the
following improvement with this change:

  Test                                     HEAD~1  HEAD
  ------------------------------------------------------------
  6011.2: merge-base --independent          0.03   0.03 +0.0%
  6011.3: rev-list --maximal-only           0.06   0.03 -50.0%
  6011.4: rev-list --maximal-only --since   0.06   0.06 +0.0%

And for a fresh clone of the Linux kernel repository, I see:

  Test                                     HEAD~1  HEAD
  ------------------------------------------------------------
  6011.2: merge-base --independent          0.00   0.00 =
  6011.3: rev-list --maximal-only           0.70   0.00 -100.0%
  6011.4: rev-list --maximal-only --since   0.70   0.70 +0.0%

In both cases, the performance of 'git rev-list --maximal-only' now
matches that of 'git merge-base --independent', as expected.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
@derrickstolee

/submit

gitgitgadget bot commented Apr 6, 2026

Submitted as pull.2082.git.1775482048.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2082/derrickstolee/maximal-faster-v1

To fetch this version to local tag pr-2082/derrickstolee/maximal-faster-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2082/derrickstolee/maximal-faster-v1
