Commit 0318ea1

pks-t authored and gitster committed
commit-graph: fix writing generations with dates exceeding 34 bits
The `timestamp_t` type is declared as `uintmax_t` and thus typically has 64 bits of precision. Usually, the full precision of such dates is not required: it would be comforting to know that Git is still around in millions of years, but all in all the chance is rather low.

We abuse this fact in the commit-graph: instead of storing the full 64 bits of precision, committer dates only store 34 bits. This is still plenty of headroom, as it means that we can represent dates until year 2514. Commits which are dated beyond that year will simply get a date whose remaining bits are masked.

The result of this is somewhat curious: the committer date will be different depending on whether a commit gets parsed via the commit-graph or via the object database. This isn't really too much of an issue in general though, as we don't typically use the date parsed from the commit-graph in user-facing output.

But with 024b4c9 (commit: make `repo_parse_commit_no_graph()` more robust, 2026-02-16) it started to become a problem when writing the commit-graph itself. This commit changed `repo_parse_commit_no_graph()` so that we re-parse the commit via the object database in case it was already parsed beforehand via the commit-graph.

The consequence is that we may now act with two different commit dates at different stages:

  - Initially, we use the 34-bit precision timestamp when writing the chunk generation data. We thus correctly compute the offsets relative to the on-disk timestamp here.

  - Later, when writing the overflow data, we may end up with the full-precision timestamp. When the date is larger than 34 bits the result of this is an underflow when computing the offset.

This causes a mismatch in the number of generation data overflow records we want to write, and that ultimately causes Git to die.

Introduce a new helper function that computes the generation offset for a commit while correctly masking the date to 34 bits. This makes the previously-implicit assumptions about the commit date precision explicit and thus hopefully less fragile going forward. Adapt sites that compute the offset to use the function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
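
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not code from the patch: the `timestamp_t` typedef, the `DATE_MASK` and `OFFSET_MAX` macros, and the three helper functions are stand-ins invented for illustration. The value of `OFFSET_MAX` is an assumption about Git's `GENERATION_NUMBER_V2_OFFSET_MAX`, and treating the generation as equal to the on-disk date is a simplification.

```c
#include <stdint.h>

/* Stand-in for Git's timestamp_t, which the commit message says is uintmax_t. */
typedef uintmax_t timestamp_t;

/* Same 34-bit mask as in the new helper; the macro name is hypothetical. */
#define DATE_MASK ((((timestamp_t) 1) << 34) - 1)

/* Assumed limit for offsets stored inline; the real constant may differ. */
#define OFFSET_MAX ((((timestamp_t) 1) << 31) - 1)

/* Old computation: subtracts whatever date the commit currently carries. */
static timestamp_t offset_from_full_date(timestamp_t generation, timestamp_t date)
{
	return generation - date;
}

/* New computation: always subtracts the date as it is stored on disk. */
static timestamp_t offset_from_masked_date(timestamp_t generation, timestamp_t date)
{
	return generation - (date & DATE_MASK);
}

/* Whether an offset would need a record in the overflow chunk. */
static int needs_overflow_record(timestamp_t offset)
{
	return offset > OFFSET_MAX;
}
```

With the date from the new test, `9223372036854775`, and a generation equal to the masked date, `offset_from_masked_date()` yields 0 and no overflow record is needed. `offset_from_full_date()` instead subtracts a value larger than the generation, so the unsigned result wraps around to a huge number and `needs_overflow_record()` reports an overflow. Two passes that disagree in this way would count different numbers of overflow records, which is the failure the commit message describes.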
1 parent bb5da75 commit 0318ea1

2 files changed: +48 -3 lines changed

commit-graph.c

Lines changed: 28 additions & 3 deletions
@@ -1319,6 +1319,31 @@ static int write_graph_chunk_data(struct hashfile *f,
 	return 0;
 }
 
+/*
+ * Compute the generation offset between the commit date and its generation.
+ * This is what's ultimately stored as generation number in the commit graph.
+ *
+ * Note that the computation of the commit date is more involved than you might
+ * think. Instead of using the full commit date, we're in fact masking bits so
+ * that only the 34 lowest bits are considered. This results from the fact that
+ * commit graphs themselves only ever store 34 bits of the commit date
+ * themselves.
+ *
+ * This means that if we have a commit date that exceeds 34 bits we'll end up
+ * in situations where depending on whether the commit has been parsed from the
+ * object database or the commit graph we'll have different dates, where the
+ * ones parsed from the object database would have full 64 bit precision.
+ *
+ * But ultimately, we only ever want the offset to be relative to what we
+ * actually end up storing on disk, and hence we have to mask all the other
+ * bits.
+ */
+static timestamp_t compute_generation_offset(struct commit *c)
+{
+	timestamp_t masked_date = c->date & (((timestamp_t) 1 << 34) - 1);
+	return commit_graph_data_at(c)->generation - masked_date;
+}
+
 static int write_graph_chunk_generation_data(struct hashfile *f,
 					     void *data)
 {
@@ -1329,7 +1354,7 @@ static int write_graph_chunk_generation_data(struct hashfile *f,
 		struct commit *c = ctx->commits.items[i];
 		timestamp_t offset;
 		repo_parse_commit(ctx->r, c);
-		offset = commit_graph_data_at(c)->generation - c->date;
+		offset = compute_generation_offset(c);
 		display_progress(ctx->progress, ++ctx->progress_cnt);
 
 		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
@@ -1350,7 +1375,7 @@ static int write_graph_chunk_generation_data_overflow(struct hashfile *f,
 	int i;
 	for (i = 0; i < ctx->commits.nr; i++) {
 		struct commit *c = ctx->commits.items[i];
-		timestamp_t offset = commit_graph_data_at(c)->generation - c->date;
+		timestamp_t offset = compute_generation_offset(c);
 		display_progress(ctx->progress, ++ctx->progress_cnt);
 
 		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
@@ -1733,7 +1758,7 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
 
 	for (i = 0; i < ctx->commits.nr; i++) {
 		struct commit *c = ctx->commits.items[i];
-		timestamp_t offset = commit_graph_data_at(c)->generation - c->date;
+		timestamp_t offset = compute_generation_offset(c);
 		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX)
 			ctx->num_generation_data_overflows++;
 	}

t/t5318-commit-graph.sh

Lines changed: 20 additions & 0 deletions
@@ -417,6 +417,26 @@ test_expect_success TIME_IS_64BIT,TIME_T_IS_64BIT 'lower layers have overflow ch
 	test_cmp full/.git/objects/info/commit-graph commit-graph-upgraded
 '
 
+test_expect_success TIME_IS_64BIT,TIME_T_IS_64BIT 'overflow chunk when replacing commit-graph' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		cat >commit <<-EOF &&
+		tree $(test_oid empty_tree)
+		author Example <committer@example.com> 9223372036854775 +0000
+		committer Example <committer@example.com> 9223372036854775 +0000
+
+		Weird commit date
+		EOF
+		commit_id=$(git hash-object -t commit -w commit) &&
+		git reset --hard "$commit_id" &&
+		git commit-graph write --reachable &&
+		git commit-graph write --reachable --split=replace &&
+		git log
+	)
+'
+
 # the verify tests below expect the commit-graph to contain
 # exactly the commits reachable from the commits/8 branch.
 # If the file changes the set of commits in the list, then the
