Commit c759896

pks-t authored and gitster committed
builtin/maintenance: introduce "geometric" strategy
We have two different repacking strategies in Git:

  - The "gc" strategy uses git-gc(1).

  - The "incremental" strategy uses multi-pack indices and `git
    multi-pack-index repack` to merge together smaller packfiles as
    determined by a specific batch size.

The former strategy is our old and trusted default, whereas the latter
has historically been used for our scheduled maintenance. But both
strategies have their shortcomings:

  - The "gc" strategy performs regular all-into-one repacks. Furthermore
    it is rather inflexible, as it is not easily possible for a user to
    enable or disable specific subtasks.

  - The "incremental" strategy is not a full replacement for the "gc"
    strategy as it doesn't know to prune stale data.

So today, we don't have a strategy that is well-suited for large repos
while being a full replacement for the "gc" strategy.

Introduce a new "geometric" strategy that aims to fill this gap. This
strategy invokes all the usual cleanup tasks that git-gc(1) does like
pruning reflogs and rerere caches as well as stale worktrees. But where
it differs from both the "gc" and "incremental" strategy is that it uses
our geometric repacking infrastructure exposed by git-repack(1) to
repack packfiles. The advantage of geometric repacking is that we only
need to perform an all-into-one repack when the object count in a repo
has grown significantly.

One downside of this strategy is that pruning of unreferenced objects is
not going to happen regularly anymore. Every geometric repack knows to
soak up all loose objects regardless of their reachability, and merging
two or more packs doesn't consider reachability, either. Consequently,
the number of unreachable objects will grow over time.

This is remedied by doing an all-into-one repack instead of a geometric
repack whenever we determine that the geometric repack would end up
merging all packfiles anyway. This all-into-one repack then performs our
usual reachability checks and writes unreachable objects into a cruft
pack. As cruft packs won't ever be merged during geometric repacks we
can thus phase out these objects over time.

Of course, this still means that we retain unreachable objects for far
longer than with the "gc" strategy. But the maintenance strategy is
intended especially for large repositories, where the basic assumption
is that the set of unreachable objects will be significantly dwarfed by
the number of reachable objects.

If this assumption is ever proven to be too disadvantageous we could for
example introduce a time-based strategy: if the largest packfile has not
been touched for longer than $T, we perform an all-into-one repack. But
for now, such a mechanism is deferred into the future as it is not clear
yet whether it is needed in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
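The decision described in the commit message, "merge a few small packs" versus "the geometric repack would merge everything, so do an all-into-one repack instead", can be illustrated with a small sketch. This is not Git's actual code: the helper names `geometric_split` and `needs_all_into_one` are hypothetical, packs are modeled only by their object counts (sorted ascending), and overflow handling is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not Git's implementation: given pack object counts
 * sorted in ascending order, return how many of the smallest packs would
 * have to be merged so that the resulting packs form a geometric
 * progression with the given factor. 0 means nothing needs merging.
 * Overflow of factor * count is deliberately ignored in this sketch.
 */
static size_t geometric_split(const uint64_t *counts, size_t nr, uint64_t factor)
{
	size_t i, split;
	uint64_t total = 0;

	assert(factor >= 1);
	if (nr == 0)
		return 0;

	/* Walk down from the largest pack until the progression breaks. */
	for (i = nr - 1; i > 0; i--)
		if (counts[i] < factor * counts[i - 1])
			break;

	/* Everything up to and including the offending pack gets merged. */
	split = i ? i + 1 : 0;

	for (i = 0; i < split; i++)
		total += counts[i];

	/* The merged pack may break the progression again; absorb more packs. */
	for (i = split; i < nr; i++) {
		if (counts[i] >= factor * total)
			break;
		total += counts[i];
		split++;
	}

	return split;
}

/* True when the geometric repack would end up merging every pack. */
static int needs_all_into_one(const uint64_t *counts, size_t nr, uint64_t factor)
{
	return nr > 0 && geometric_split(counts, nr, factor) == nr;
}
```

With a factor of 2, object counts of 100, 100 and 1000 would merge only the two small packs, whereas counts of 10, 10 and 15 would merge all three, which is the case where the strategy falls back to an all-into-one repack with a cruft pack.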
1 parent b67da71 commit c759896

3 files changed

Lines changed: 47 additions & 1 deletion

Documentation/config/maintenance.adoc

Lines changed: 9 additions & 0 deletions
@@ -32,6 +32,15 @@ The possible strategies are:
   strategy for scheduled maintenance.
 * `gc`: This strategy runs the `gc` task. This is the default strategy for
   manual maintenance.
+* `geometric`: This strategy performs geometric repacking of packfiles and
+  keeps auxiliary data structures up-to-date. The strategy expires data in the
+  reflog and removes worktrees that cannot be located anymore. When the
+  geometric repacking strategy would decide to do an all-into-one repack, then
+  the strategy generates a cruft pack for all unreachable objects. Objects that
+  are already part of a cruft pack will be expired.
++
+This repacking strategy is a full replacement for the `gc` strategy and is
+recommended for large repositories.
 * `incremental`: This setting optimizes for performing small maintenance
   activities that do not delete any data. This does not schedule the `gc`
   task, but runs the `prefetch` and `commit-graph` tasks hourly, the

builtin/gc.c

Lines changed: 19 additions & 0 deletions
@@ -1878,12 +1878,31 @@ static const struct maintenance_strategy incremental_strategy = {
 	},
 };
 
+static const struct maintenance_strategy geometric_strategy = {
+	.tasks = {
+		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
+		[TASK_GEOMETRIC_REPACK].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_GEOMETRIC_REPACK].schedule = SCHEDULE_DAILY,
+		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_PACK_REFS].schedule = SCHEDULE_DAILY,
+		[TASK_RERERE_GC].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_RERERE_GC].schedule = SCHEDULE_WEEKLY,
+		[TASK_REFLOG_EXPIRE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_REFLOG_EXPIRE].schedule = SCHEDULE_WEEKLY,
+		[TASK_WORKTREE_PRUNE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
+		[TASK_WORKTREE_PRUNE].schedule = SCHEDULE_WEEKLY,
+	},
+};
+
 static struct maintenance_strategy parse_maintenance_strategy(const char *name)
 {
 	if (!strcasecmp(name, "incremental"))
 		return incremental_strategy;
 	if (!strcasecmp(name, "gc"))
 		return gc_strategy;
+	if (!strcasecmp(name, "geometric"))
+		return geometric_strategy;
 	die(_("unknown maintenance strategy: '%s'"), name);
 }
 
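As a reading aid for the hunk above, here is a self-contained miniature of a table-driven strategy using designated initializers and a case-insensitive lookup. All names (`struct strategy`, `task_runs`, `lookup_strategy`, the `SCHED_*` and `TASK_*` constants) are hypothetical and much smaller than the real `struct maintenance_strategy`. The selection rule is an assumption: a task is taken to run when its own schedule is at least as frequent as the requested one, which is consistent with the weekly expectation in the t7900 test of this commit.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical miniature of a strategy table; not the types used in gc.c. */
enum schedule { SCHED_NONE = 0, SCHED_WEEKLY, SCHED_DAILY, SCHED_HOURLY };
enum task { TASK_COMMIT_GRAPH, TASK_REPACK, TASK_RERERE_GC, TASK__COUNT };

struct strategy {
	enum schedule schedule[TASK__COUNT];
};

/* Designated initializers leave unlisted tasks at SCHED_NONE (zero). */
static const struct strategy geometric = {
	.schedule = {
		[TASK_COMMIT_GRAPH] = SCHED_HOURLY,
		[TASK_REPACK] = SCHED_DAILY,
		[TASK_RERERE_GC] = SCHED_WEEKLY,
	},
};

/*
 * Assumption for this sketch: a task runs when it is scheduled at least
 * as often as the requested schedule, so a weekly run also picks up
 * daily and hourly tasks.
 */
static int task_runs(const struct strategy *s, enum task t,
		     enum schedule requested)
{
	return s->schedule[t] != SCHED_NONE && s->schedule[t] >= requested;
}

/* Case-insensitive lookup; returns NULL instead of dying on unknown names. */
static const struct strategy *lookup_strategy(const char *name)
{
	if (!strcasecmp(name, "geometric"))
		return &geometric;
	return NULL;
}
```

Under that assumption, a weekly run selects all three tasks in the sketch, while an hourly run selects only the commit-graph task.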

t/t7900-maintenance.sh

Lines changed: 19 additions & 1 deletion
@@ -931,11 +931,29 @@ test_expect_success 'maintenance.strategy is respected' '
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
 
-		test_strategy gc --schedule=weekly <<-\EOF
+		test_strategy gc --schedule=weekly <<-\EOF &&
 		git pack-refs --all --prune
 		git reflog expire --all
 		git gc --quiet --no-detach --skip-foreground-tasks
 		EOF
+
+		test_strategy geometric <<-\EOF &&
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
+
+		test_strategy geometric --schedule=weekly <<-\EOF
+		git pack-refs --all --prune
+		git reflog expire --all
+		git repack -d -l --geometric=2 --quiet --write-midx
+		git commit-graph write --split --reachable --no-progress
+		git worktree prune --expire 3.months.ago
+		git rerere gc
+		EOF
 	)
 '
 
