Skip to content

Commit 79888bf

Browse files
ttaylorrgitster
authored andcommitted
pack-objects: support excluded-open packs with --stdin-packs
In cd846ba (pack-objects: introduce '--stdin-packs=follow', 2025-06-23), pack-objects learned to traverse through commits in included packs when using '--stdin-packs=follow', rescuing reachable objects from unlisted packs into the output. When we encounter a commit in an excluded pack during this rescuing phase we will traverse through its parents. But because we set `revs.no_kept_objects = 1`, commit simplification will prevent us from showing it via `get_revision()`. (In practice, `--stdin-packs=follow` walks commits down to the roots, but only opens up trees for ones that do not appear in an excluded pack.) But there are certain cases where we *do* need to see the parents of an object in an excluded pack. Namely, if an object is rescue-able, but only reachable from object(s) which appear in excluded packs, then commit simplification will exclude those commits from the object traversal, and we will never see a copy of that object, and thus not rescue it. This is what causes the failure in the previous commit during repacking. When performing a geometric repack, packs above the geometric split that weren't part of the previous MIDX (e.g., packs pushed directly into `$GIT_DIR/objects/pack`) may not have full object closure. When those packs are listed as excluded via the '^' marker, the reachability traversal encounters the sequence described above, and may miss objects which we expect to rescue with `--stdin-packs=follow`. Introduce a new "excluded-open" pack prefix, '!'. Like '^'-prefixed packs, objects from '!'-prefixed packs are excluded from the resulting pack. But unlike '^', commits in '!'-prefixed packs *are* used as starting points for the follow traversal, and the traversal does not treat them as a closure boundary. In order to distinguish excluded-closed from excluded-open packs during the traversal, introduce a new `pack_keep_in_core_open` bit on `struct packed_git`, along with a corresponding `KEPT_PACK_IN_CORE_OPEN` flag for the kept-pack cache. In `add_object_entry_from_pack()`, move the `want_object_in_pack()` check to *after* `add_pending_oid()`. This is necessary so that commits from excluded-open packs are added as traversal tips even though their objects won't appear in the output. As a consequence, the caller `for_each_object_in_pack()` will always provide a non-NULL 'p', hence we are able to drop the "if (p)" conditional. The `include_check` and `include_check_obj` callbacks on `rev_info` are used to halt the walk at closed-excluded packs, since objects behind a '^' boundary are guaranteed to have closure and need not be rescued. The following commit will make use of this new functionality within the repack layer to resolve the test failure demonstrated in the previous commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
1 parent 4199cce commit 79888bf

5 files changed

Lines changed: 213 additions & 32 deletions

File tree

Documentation/git-pack-objects.adoc

Lines changed: 18 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -94,13 +94,24 @@ base-name::
9494
included packs (those not beginning with `^`), excluding any
9595
objects listed in the excluded packs (beginning with `^`).
9696
+
97-
When `mode` is "follow", objects from packs not listed on stdin receive
98-
special treatment. Objects within unlisted packs will be included if
99-
those objects are (1) reachable from the included packs, and (2) not
100-
found in any excluded packs. This mode is useful, for example, to
101-
resurrect once-unreachable objects found in cruft packs to generate
102-
packs which are closed under reachability up to the boundary set by the
103-
excluded packs.
97+
When `mode` is "follow" packs may additionally be prefixed with `!`,
98+
indicating that they are excluded but not necessarily closed under
99+
reachability. In addition to objects in included packs, the resulting
100+
pack may include additional objects based on the following:
101+
+
102+
--
103+
* If any packs are marked with `!`, then objects reachable from such
104+
packs or included ones via objects outside of excluded-closed packs
105+
will be included. In this case, all `^` packs are treated as closed
106+
under reachability.
107+
* Otherwise (if there are no `!` packs), objects within unlisted packs
108+
will be included if those objects are (1) reachable from the
109+
included packs, and (2) not found in any excluded packs.
110+
--
111+
+
112+
This mode is useful, for example, to resurrect once-unreachable
113+
objects found in cruft packs to generate packs which are closed under
114+
reachability up to the boundary set by the excluded packs.
104115
+
105116
Incompatible with `--revs`, or options that imply `--revs` (such as
106117
`--all`), with the exception of `--unpacked`, which is compatible.

builtin/pack-objects.c

Lines changed: 86 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -217,6 +217,7 @@ static int have_non_local_packs;
217217
static int incremental;
218218
static int ignore_packed_keep_on_disk;
219219
static int ignore_packed_keep_in_core;
220+
static int ignore_packed_keep_in_core_open;
220221
static int ignore_packed_keep_in_core_has_cruft;
221222
static int allow_ofs_delta;
222223
static struct pack_idx_option pack_idx_opts;
@@ -1618,7 +1619,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
16181619
/*
16191620
* Then handle .keep first, as we have a fast(er) path there.
16201621
*/
1621-
if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) {
1622+
if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core ||
1623+
ignore_packed_keep_in_core_open) {
16221624
/*
16231625
* Set the flags for the kept-pack cache to be the ones we want
16241626
* to ignore.
@@ -1632,6 +1634,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
16321634
flags |= KEPT_PACK_ON_DISK;
16331635
if (ignore_packed_keep_in_core)
16341636
flags |= KEPT_PACK_IN_CORE;
1637+
if (ignore_packed_keep_in_core_open)
1638+
flags |= KEPT_PACK_IN_CORE_OPEN;
16351639

16361640
/*
16371641
* If the object is in a pack that we want to ignore, *and* we
@@ -1643,6 +1647,8 @@ static int want_found_object(const struct object_id *oid, int exclude,
16431647
return 0;
16441648
if (ignore_packed_keep_in_core && p->pack_keep_in_core)
16451649
return 0;
1650+
if (ignore_packed_keep_in_core_open && p->pack_keep_in_core_open)
1651+
return 0;
16461652
if (has_object_kept_pack(p->repo, oid, flags))
16471653
return 0;
16481654
} else {
@@ -3742,36 +3748,42 @@ static int add_object_entry_from_pack(const struct object_id *oid,
37423748
void *_data)
37433749
{
37443750
off_t ofs;
3751+
struct object_info oi = OBJECT_INFO_INIT;
37453752
enum object_type type = OBJ_NONE;
37463753

37473754
display_progress(progress_state, ++nr_seen);
37483755

37493756
if (have_duplicate_entry(oid, 0))
37503757
return 0;
37513758

3752-
ofs = nth_packed_object_offset(p, pos);
3753-
if (!want_object_in_pack(oid, 0, &p, &ofs))
3754-
return 0;
3755-
3756-
if (p) {
3757-
struct object_info oi = OBJECT_INFO_INIT;
3759+
stdin_packs_found_nr++;
37583760

3759-
oi.typep = &type;
3760-
if (packed_object_info(p, ofs, &oi) < 0) {
3761-
die(_("could not get type of object %s in pack %s"),
3762-
oid_to_hex(oid), p->pack_name);
3763-
} else if (type == OBJ_COMMIT) {
3764-
struct rev_info *revs = _data;
3765-
/*
3766-
* commits in included packs are used as starting points for the
3767-
* subsequent revision walk
3768-
*/
3769-
add_pending_oid(revs, NULL, oid, 0);
3770-
}
3761+
ofs = nth_packed_object_offset(p, pos);
37713762

3772-
stdin_packs_found_nr++;
3763+
oi.typep = &type;
3764+
if (packed_object_info(p, ofs, &oi) < 0) {
3765+
die(_("could not get type of object %s in pack %s"),
3766+
oid_to_hex(oid), p->pack_name);
3767+
} else if (type == OBJ_COMMIT) {
3768+
struct rev_info *revs = _data;
3769+
/*
3770+
* commits in included packs are used as starting points
3771+
* for the subsequent revision walk
3772+
*
3773+
* Note that we do want to walk through commits that are
3774+
* present in excluded-open ('!') packs to pick up any
3775+
* objects reachable from them not present in the
3776+
* excluded-closed ('^') packs.
3777+
*
3778+
* However, we'll only add those objects to the packing
3779+
* list after checking `want_object_in_pack()` below.
3780+
*/
3781+
add_pending_oid(revs, NULL, oid, 0);
37733782
}
37743783

3784+
if (!want_object_in_pack(oid, 0, &p, &ofs))
3785+
return 0;
3786+
37753787
create_object_entry(oid, type, 0, 0, 0, p, ofs);
37763788

37773789
return 0;
@@ -3838,11 +3850,23 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
38383850
return 0;
38393851
}
38403852

3853+
static int stdin_packs_include_check_obj(struct object *obj, void *data UNUSED)
3854+
{
3855+
return !has_object_kept_pack(to_pack.repo, &obj->oid,
3856+
KEPT_PACK_IN_CORE);
3857+
}
3858+
3859+
static int stdin_packs_include_check(struct commit *commit, void *data)
3860+
{
3861+
return stdin_packs_include_check_obj((struct object *)commit, data);
3862+
}
3863+
38413864
struct stdin_pack_info {
38423865
struct packed_git *p;
38433866
enum {
38443867
STDIN_PACK_INCLUDE = (1<<0),
38453868
STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
3869+
STDIN_PACK_EXCLUDE_OPEN = (1<<2),
38463870
} kind;
38473871
};
38483872

@@ -3872,7 +3896,19 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
38723896
for_each_string_list_item(item, &keys) {
38733897
struct stdin_pack_info *info = item->util;
38743898

3875-
if (info->kind & STDIN_PACK_INCLUDE)
3899+
if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
3900+
/*
3901+
* When open-excluded packs ("!") are present, stop
3902+
* the parent walk at closed-excluded ("^") packs.
3903+
* Objects behind a "^" boundary are guaranteed to
3904+
* have closure and should not be rescued.
3905+
*/
3906+
revs->include_check = stdin_packs_include_check;
3907+
revs->include_check_obj = stdin_packs_include_check_obj;
3908+
}
3909+
3910+
if ((info->kind & STDIN_PACK_INCLUDE) ||
3911+
(info->kind & STDIN_PACK_EXCLUDE_OPEN))
38763912
for_each_object_in_pack(info->p,
38773913
add_object_entry_from_pack,
38783914
revs,
@@ -3882,7 +3918,8 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
38823918
string_list_clear(&keys, 0);
38833919
}
38843920

3885-
static void stdin_packs_read_input(struct rev_info *revs)
3921+
static void stdin_packs_read_input(struct rev_info *revs,
3922+
enum stdin_packs_mode mode)
38863923
{
38873924
struct strbuf buf = STRBUF_INIT;
38883925
struct strmap packs = STRMAP_INIT;
@@ -3894,7 +3931,8 @@ static void stdin_packs_read_input(struct rev_info *revs)
38943931

38953932
if (!*key)
38963933
continue;
3897-
if (*key == '^')
3934+
if (*key == '^' ||
3935+
(*key == '!' && mode == STDIN_PACKS_MODE_FOLLOW))
38983936
key++;
38993937

39003938
info = strmap_get(&packs, key);
@@ -3905,6 +3943,8 @@ static void stdin_packs_read_input(struct rev_info *revs)
39053943

39063944
if (*buf.buf == '^')
39073945
info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
3946+
else if (*buf.buf == '!' && mode == STDIN_PACKS_MODE_FOLLOW)
3947+
info->kind |= STDIN_PACK_EXCLUDE_OPEN;
39083948
else
39093949
info->kind |= STDIN_PACK_INCLUDE;
39103950

@@ -3942,6 +3982,20 @@ static void stdin_packs_read_input(struct rev_info *revs)
39423982
p->pack_keep_in_core = 1;
39433983
}
39443984

3985+
if (info->kind & STDIN_PACK_EXCLUDE_OPEN) {
3986+
/*
3987+
* Marking excluded open packs as kept in-core
3988+
* (open) for the same reason as we marked
3989+
* exclude closed packs as kept in-core.
3990+
*
3991+
* Use a separate flag here to ensure we don't
3992+
* halt our traversal at these packs, since they
3993+
* are not guaranteed to have closure.
3994+
*
3995+
*/
3996+
p->pack_keep_in_core_open = 1;
3997+
}
3998+
39453999
info->p = p;
39464000
}
39474001

@@ -3985,7 +4039,15 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
39854039

39864040
/* avoids adding objects in excluded packs */
39874041
ignore_packed_keep_in_core = 1;
3988-
stdin_packs_read_input(&revs);
4042+
if (mode == STDIN_PACKS_MODE_FOLLOW) {
4043+
/*
4044+
* In '--stdin-packs=follow' mode, additionally ignore
4045+
* objects in excluded-open packs to prevent them from
4046+
* appearing in the resulting pack.
4047+
*/
4048+
ignore_packed_keep_in_core_open = 1;
4049+
}
4050+
stdin_packs_read_input(&revs, mode);
39894051
if (rev_list_unpacked)
39904052
add_unreachable_loose_objects(&revs);
39914053

packfile.c

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2246,7 +2246,8 @@ struct packed_git **packfile_store_get_kept_pack_cache(struct packfile_store *st
22462246
struct packed_git *p = e->pack;
22472247

22482248
if ((p->pack_keep && (flags & KEPT_PACK_ON_DISK)) ||
2249-
(p->pack_keep_in_core && (flags & KEPT_PACK_IN_CORE))) {
2249+
(p->pack_keep_in_core && (flags & KEPT_PACK_IN_CORE)) ||
2250+
(p->pack_keep_in_core_open && (flags & KEPT_PACK_IN_CORE_OPEN))) {
22502251
ALLOC_GROW(packs, nr + 1, alloc);
22512252
packs[nr++] = p;
22522253
}

packfile.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ struct packed_git {
2828
unsigned pack_local:1,
2929
pack_keep:1,
3030
pack_keep_in_core:1,
31+
pack_keep_in_core_open:1,
3132
freshened:1,
3233
do_not_close:1,
3334
pack_promisor:1,
@@ -266,6 +267,7 @@ int packfile_store_freshen_object(struct packfile_store *store,
266267
enum kept_pack_type {
267268
KEPT_PACK_ON_DISK = (1 << 0),
268269
KEPT_PACK_IN_CORE = (1 << 1),
270+
KEPT_PACK_IN_CORE_OPEN = (1 << 2),
269271
};
270272

271273
/*

t/t5331-pack-objects-stdin.sh

Lines changed: 105 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -415,4 +415,109 @@ test_expect_success '--stdin-packs=follow tolerates missing commits' '
415415
stdin_packs__follow_with_only HEAD HEAD^{tree}
416416
'
417417

418+
test_expect_success '--stdin-packs=follow with open-excluded packs' '
419+
test_when_finished "rm -fr repo" &&
420+
421+
git init repo &&
422+
(
423+
cd repo &&
424+
git config set maintenance.auto false &&
425+
426+
git branch -M main &&
427+
428+
# Create the following commit structure:
429+
#
430+
# A <-- B <-- D (main)
431+
# ^
432+
# \
433+
# C (other)
434+
test_commit A &&
435+
test_commit B &&
436+
git checkout -B other &&
437+
test_commit C &&
438+
git checkout main &&
439+
test_commit D &&
440+
441+
A="$(echo A | git pack-objects --revs $packdir/pack)" &&
442+
B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
443+
C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
444+
D="$(echo B..D | git pack-objects --revs $packdir/pack)" &&
445+
446+
C_ONLY="$(git rev-parse other | git pack-objects $packdir/pack)" &&
447+
448+
git prune-packed &&
449+
450+
# Create a pack using --stdin-packs=follow where:
451+
#
452+
# - pack D is included,
453+
# - pack C_ONLY is excluded, but open,
454+
# - pack B is excluded, but closed, and
455+
# - packs A and C are unknown
456+
#
457+
# The resulting pack should therefore contain:
458+
#
459+
# - objects from the included pack D,
460+
# - A.t (rescued via D^{tree}), and
461+
# - C^{tree} and C.t (rescued via pack C_ONLY)
462+
#
463+
# , but should omit:
464+
#
465+
# - C (excluded via C_ONLY),
466+
# - objects from pack B (trivially excluded-closed)
467+
# - A and A^{tree} (ancestors of B)
468+
P=$(git pack-objects --stdin-packs=follow $packdir/pack <<-EOF
469+
pack-$D.pack
470+
!pack-$C_ONLY.pack
471+
^pack-$B.pack
472+
EOF
473+
) &&
474+
475+
{
476+
objects_in_packs $D &&
477+
git rev-parse A:A.t "C^{tree}" C:C.t
478+
} >expect.raw &&
479+
sort expect.raw >expect &&
480+
481+
objects_in_packs $P >actual &&
482+
test_cmp expect actual
483+
)
484+
'
485+
486+
test_expect_success '--stdin-packs with !-delimited pack without follow' '
487+
test_when_finished "rm -fr repo" &&
488+
489+
git init repo &&
490+
(
491+
test_commit A &&
492+
test_commit B &&
493+
test_commit C &&
494+
495+
A="$(echo A | git pack-objects --revs $packdir/pack)" &&
496+
B="$(echo A..B | git pack-objects --revs $packdir/pack)" &&
497+
C="$(echo B..C | git pack-objects --revs $packdir/pack)" &&
498+
499+
cat >in <<-EOF &&
500+
!pack-$A.pack
501+
pack-$B.pack
502+
pack-$C.pack
503+
EOF
504+
505+
# Without --stdin-packs=follow, we treat the first
506+
# line of input as a literal packfile name, and thus
507+
# expect pack-objects to complain of a missing pack
508+
test_must_fail git pack-objects --stdin-packs --stdout \
509+
>/dev/null <in 2>err &&
510+
test_grep "could not find pack .!pack-$A.pack." err &&
511+
512+
# With --stdin-packs=follow, we treat the second line
513+
# of input as indicating pack-$A.pack is an excluded
514+
# open pack, and thus expect pack-objects to succeed
515+
P=$(git pack-objects --stdin-packs=follow $packdir/pack <in) &&
516+
517+
objects_in_packs $B $C >expect &&
518+
objects_in_packs $P >actual &&
519+
test_cmp expect actual
520+
)
521+
'
522+
418523
test_done

0 commit comments

Comments
 (0)