Commit c30871b

Colin Stagner authored and gitster committed
contrib/subtree: reduce recursion during split

On Debian-alikes, POSIX sh has a hardcoded recursion depth of 1000.
This limit operates like bash's `$FUNCNEST` [1], but it does not
actually respect `$FUNCNEST`. This is non-standard behavior. On other
distros, the sh recursion depth is limited only by the available
stack size.

With certain history graphs, subtree splits are recursive, with one
recursion per commit. Attempting to split complex repos that have
thousands of commits, like [2], may fail on these distros.

Reduce the amount of recursion required by eagerly discovering the
complete range of commits to process.

The recursion is a side-effect of the rejoin-finder in
`find_existing_splits`. Rejoin mode, as in

    git subtree split --rejoin -b hax main ...

improves the speed of later splits by merging the split history back
into `main`. This gives the splitting algorithm a stopping point. The
rejoin maps one commit on `main` to one split commit on `hax`. If we
encounter this commit, we know that it maps to `hax`.

But this is only a single point in the history. Many splits require
history from before the rejoin. See patch content for examples.

If pre-rejoin history is required, `check_parents` recursively
discovers each individual parent, with one recursion per commit. The
recursion deepens the entire tree, even if an older rejoin is
available. This quickly overwhelms the Debian sh stack.

Instead of recursively processing each commit, process *all* the
commits back to the next obvious starting point: i.e., either the
next-oldest --rejoin or the beginning of history. This is where the
recursion is likely to stop anyway.

While this still requires recursion, it is *considerably* less
recursive.

[1]: https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html#index-FUNCNEST
[2]: https://github.com/christian-heusel/aur.git

Signed-off-by: Colin Stagner <ask+git@howdoi.land>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
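
To make the depth argument in the commit message concrete, here is a minimal sketch in POSIX sh. It is not code from git-subtree.sh: the toy history (commit N has the single parent N-1) and the names `walk_recursive` and `walk_batch` are assumptions made for illustration. It only contrasts a walk that makes one function call per commit with a walk that first discovers the whole range and then processes it in a flat loop.

```shell
#!/bin/sh
# Toy model (hypothetical): commit N has the single parent N-1 and
# commit 1 is the root. None of these names exist in git-subtree.sh.

max_depth=0
processed=0

# One nested call per commit: the call depth grows with the history.
walk_recursive () {
	rev=$1 depth=$2
	test "$depth" -gt "$max_depth" && max_depth=$depth
	if test "$rev" -gt 1
	then
		walk_recursive $((rev - 1)) $((depth + 1))
	fi
	# Counted after the parents, i.e. ancestor-first.
	processed=$((processed + 1))
}

# Discover the full range up front, then handle it in a flat loop:
# the call depth stays constant however long the history is.
walk_batch () {
	tip=$1
	rev=1
	while test "$rev" -le "$tip"
	do
		processed=$((processed + 1))
		rev=$((rev + 1))
	done
}

processed=0; walk_recursive 200 1
echo "recursive: processed=$processed max_depth=$max_depth"

processed=0; walk_batch 200
echo "batch: processed=$processed"
```

The history length of 200 is chosen to stay well below the 1000-call limit described above. With a few thousand commits, the recursive form is the one expected to fail on the affected shells, while the batch form is unaffected.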
1 parent 3b3ace4 commit c30871b

contrib/subtree/git-subtree.sh

Lines changed: 54 additions & 2 deletions
@@ -315,15 +315,67 @@ cache_miss () {
 }
 
 # Usage: check_parents [REVS...]
+#
+# During a split, check that every commit in REVS has already been
+# processed via `process_split_commit`. If not, deepen the history
+# until it is.
+#
+# Commits authored by `subtree split` have to be created in the
+# same order as every other git commit: ancestor-first, with new
+# commits building on old commits. The traversal order normally
+# ensures this is the case, but it also excludes --rejoins commits
+# by default.
+#
+# The --rejoin tells us, "this mainline commit is equivalent to
+# this split commit." The relationship is only known for that
+# exact commit---and not before or after it. Frequently, commits
+# prior to a rejoin are not needed... but, just as often, they
+# are! Consider this history graph:
+#
+#      --D---
+#     /      \
+# A--B--C--R--X--Y  main
+#         /      /
+# a--b--c       /   split
+#       \      /
+#        --e--/
+#
+# The main branch has commits A, B, and C. main is split into
+# commits a, b, and c. The split history is rejoined at R.
+#
+# There are at least two cases where we might need the A-B-C
+# history that is prior to R:
+#
+# 1. Commit D is based on history prior to R, but
+#    it isn't merged into mainline until after R.
+#
+# 2. Commit e is based on old split history. It is merged
+#    back into mainline with a subtree merge. Again, this
+#    happens after R.
+#
+# check_parents detects these cases and deepens the history
+# to the next available rejoin.
 check_parents () {
 	missed=$(cache_miss "$@") || exit $?
 	local indent=$(($indent + 1))
 	for miss in $missed
 	do
 		if ! test -r "$cachedir/notree/$miss"
 		then
-			debug "incorrect order: $miss"
-			process_split_commit "$miss" ""
+			debug "found commit excluded by --rejoin: $miss. skipping to the next --rejoin..."
+			unrevs="$(find_existing_splits "$dir" "$miss" "$repository")" || exit 1
+
+			find_commits_to_split "$miss" "$unrevs" |
+			while read -r rev parents
+			do
+				process_split_commit "$rev" "$parents"
+			done
+
+			if ! test -r "$cachedir/$miss" &&
+				! test -r "$cachedir/notree/$miss"
+			then
+				die "failed to deepen history at $miss"
+			fi
 		fi
 	done
 }
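
The new body of `check_parents` follows a "process the whole range, then verify on disk" pattern. The sketch below isolates that control flow with toy data. `list_range`, `process_one`, and `deepen` are hypothetical stand-ins and not the real `find_commits_to_split` or `process_split_commit`. The temporary directory is likewise only an assumption standing in for `$cachedir`.

```shell
#!/bin/sh
# Hypothetical stand-ins; only the control flow mirrors the patch.
cachedir=$(mktemp -d) || exit 1
trap 'rm -rf "$cachedir"' EXIT

# Emit "rev parent..." lines, ancestors first (toy data).
list_range () {
	printf '%s\n' 'A' 'B A' 'C B'
}

# Record that a commit was handled by writing a cache file.
process_one () {
	echo "split-of-$1" >"$cachedir/$1"
}

deepen () {
	miss=$1

	# One flat loop over the whole range instead of one nested
	# function call per commit.
	list_range |
	while read -r rev parents
	do
		process_one "$rev"
	done

	# In many shells the loop above runs in a subshell, so its
	# variables do not survive. Check the files it left behind.
	if ! test -r "$cachedir/$miss"
	then
		echo "failed to deepen history at $miss" >&2
		return 1
	fi
}

deepen C && echo "C -> $(cat "$cachedir/C")"
```

Checking the cache after the loop, rather than trusting a variable set inside it, is what lets the final test catch a range that never reached the requested commit.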
