
Commit a626ce1

Merge branch 'bc/sha1-256-interop-01' into seen
The beginning of SHA1-SHA256 interoperability work.

Comments?

* bc/sha1-256-interop-01:
  fixup! docs: add documentation for loose objects
  t: add a prerequisite for a compatibility hash
  Allow specifying compatibility hash
  fsck: consider gpgsig headers expected in tags
  rev-parse: allow printing compatibility hash
  docs: add documentation for loose objects
  docs: improve ambiguous areas of pack format documentation
  docs: reflect actual double signature for tags
  docs: update offset order for pack index v3
  docs: update pack index v3 format
2 parents dd5735a + 1883ba6 commit a626ce1

13 files changed

Lines changed: 239 additions & 29 deletions

Documentation/fsck-msgids.adoc

Lines changed: 6 additions & 0 deletions
@@ -10,6 +10,12 @@
 `badFilemode`::
 (INFO) A tree contains a bad filemode entry.
 
+`badGpgsig`::
+(ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header.
+
+`badHeaderContinuation`::
+(ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated.
+
 `badName`::
 (ERROR) An author/committer name is empty.
 
Documentation/git-rev-parse.adoc

Lines changed: 6 additions & 5 deletions
@@ -324,11 +324,12 @@ The following options are unaffected by `--path-format`:
 path of the current directory relative to the top-level
 directory.
 
---show-object-format[=(storage|input|output)]::
-Show the object format (hash algorithm) used for the repository
-for storage inside the `.git` directory, input, or output. For
-input, multiple algorithms may be printed, space-separated.
-If not specified, the default is "storage".
+--show-object-format[=(storage|input|output|compat)]::
+Show the object format (hash algorithm) used for the repository for storage
+inside the `.git` directory, input, output, or compatibility. For input,
+multiple algorithms may be printed, space-separated. If `compat` is
+requested and no compatibility algorithm is enabled, prints an empty line. If
+not specified, the default is "storage".
 
 --show-ref-format::
 Show the reference storage format used for the repository.

Documentation/gitformat-loose.adoc

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+gitformat-loose(5)
+==================
+
+NAME
+----
+gitformat-loose - Git loose object format
+
+
+SYNOPSIS
+--------
+[verse]
+$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+$GIT_DIR/objects/loose-object-idx
+$GIT_DIR/objects/loose-map/map-*.map
+
+DESCRIPTION
+-----------
+
+Loose objects are how Git initially stores most of its primary repository data.
+Over the lifetime of a repository, objects are usually written as loose objects
+initially and then converted into packs.
+
+== Loose objects
+
+Each loose object contains a prefix, followed immediately by the data of the
+object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
+`tree`, `commit`, or `tag` and `size` is the size of the data (without the
+prefix) as a decimal integer expressed in ASCII.
+
+The entire contents, prefix and data concatenated, is then compressed with zlib
+and the compressed data is stored in the file. The object ID of the object is
+the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
+
+The file for the loose object is stored under the `objects` directory, with the
+first two hex characters of the object ID being the directory and the remaining
+characters being the file name.
+
+As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
+and, in a SHA-256 repository, would have the object ID
+`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
+stored under
+`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
+
+Similarly, a blob containing the contents `abc` would have the uncompressed
+data of `blob 3\0abc`.
+
+GIT
+---
+Part of the linkgit:git[1] suite

Documentation/gitformat-pack.adoc

Lines changed: 18 additions & 0 deletions
@@ -32,6 +32,10 @@ In a repository using the traditional SHA-1, pack checksums, index checksums,
 and object IDs (object names) mentioned below are all computed using SHA-1.
 Similarly, in SHA-256 repositories, these values are computed using SHA-256.
 
+CRC32 checksums are always computed over the entire packed object, including
+the header (n-byte type and length); the base object name or offset, if any;
+and the entire compressed object. The CRC32 algorithm used is that of zlib.
+
 == pack-*.pack files have the following format:
 
 - A header appears at the beginning and consists of the following:
@@ -80,6 +84,15 @@ Valid object types are:
 
 Type 5 is reserved for future expansion. Type 0 is invalid.
 
+=== Object encoding
+
+Unlike loose objects, packed objects do not have a prefix containing the type,
+size, and a NUL byte. These are not necessary because they can be determined by
+the n-byte type and length that prefixes the data and so they are omitted from
+the compressed and deltified data.
+
+The computation of the object ID still uses this prefix, however.
+
 === Size encoding
 
 This document uses the following "size encoding" of non-negative
@@ -92,6 +105,11 @@ values are more significant.
 This size encoding should not be confused with the "offset encoding",
 which is also used in this document.
 
+When encoding the size of an undeltified object in a pack, the size is that of
+the uncompressed raw object. For deltified objects, it is the size of the
+uncompressed delta. The base object name or offset is not included in the size
+computation.
+
 === Deltified representation
 
 Conceptually there are only four object types: commit, tree, tag and

Documentation/meson.build

Lines changed: 1 addition & 0 deletions
@@ -173,6 +173,7 @@ manpages = {
 'gitformat-chunk.adoc' : 5,
 'gitformat-commit-graph.adoc' : 5,
 'gitformat-index.adoc' : 5,
+'gitformat-loose.adoc' : 5,
 'gitformat-pack.adoc' : 5,
 'gitformat-signature.adoc' : 5,
 'githooks.adoc' : 5,

Documentation/technical/hash-function-transition.adoc

Lines changed: 21 additions & 21 deletions
@@ -227,9 +227,9 @@ network byte order):
 ** 4-byte length in bytes of shortened object names. This is the
 shortest possible length needed to make names in the shortened
 object name table unambiguous.
-** 4-byte integer, recording where tables relating to this format
+** 8-byte integer, recording where tables relating to this format
 are stored in this index file, as an offset from the beginning.
-* 4-byte offset to the trailer from the beginning of this file.
+* 8-byte offset to the trailer from the beginning of this file.
 * Zero or more additional key/value pairs (4-byte key, 4-byte
 value). Only one key is supported: 'PSRC'. See the "Loose objects
 and unreachable objects" section for supported values and how this
@@ -260,12 +260,10 @@ network byte order):
 compressed data to be copied directly from pack to pack during
 repacking without undetected data corruption.
 
-* A table of 4-byte offset values. For an object in the table of
-sorted shortened object names, the value at the corresponding
-index in this table indicates where that object can be found in
-the pack file. These are usually 31-bit pack file offsets, but
-large offsets are encoded as an index into the next table with the
-most significant bit set.
+* A table of 4-byte offset values. The index of this table in pack order
+indicates where that object can be found in the pack file. These are
+usually 31-bit pack file offsets, but large offsets are encoded as
+an index into the next table with the most significant bit set.
 
 * A table of 8-byte offset entries (empty for pack files less than
 2 GiB). Pack files are organized with heavily used objects toward
@@ -276,10 +274,10 @@ network byte order):
 up to and not including the table of CRC32 values.
 - Zero or more NUL bytes.
 - The trailer consists of the following:
-* A copy of the 20-byte SHA-256 checksum at the end of the
+* A copy of the 32-byte SHA-256 checksum at the end of the
 corresponding packfile.
 
-* 20-byte SHA-256 checksum of all of the above.
+* 32-byte SHA-256 checksum of all of the above.
 
 Loose object index
 ~~~~~~~~~~~~~~~~~~
@@ -427,17 +425,19 @@ ordinary unsigned commit.
 
 Signed Tags
 ~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the tag object format to allow
-signing tags without relying on SHA-1. Its signed payload is the
-SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
-SIGNATURE-----" delimited in-body signature removed.
-
-This means tags can be signed
-
-1. using SHA-1 only, as in existing signed tag objects
-2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
-signature.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
+We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
+allow signing tags in both formats. The in-body signature is used for the
+signature in the current hash algorithm and the header is used for the
+signature in the other algorithm. Thus, a dual-signature tag will contain both
+an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
+object or both an in-body signature and a gpgsig header for the SHA-256 format
+of an object.
+
+The signed payload of the tag is the content of the tag in the current
+algorithm with both its gpgsig and gpgsig-sha256 fields and
+"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.
+
+This means tags can be signed using one or both algorithms.
 
 Mergetag embedding
 ~~~~~~~~~~~~~~~~~~

builtin/rev-parse.c

Lines changed: 10 additions & 1 deletion
@@ -1107,11 +1107,20 @@ int cmd_rev_parse(int argc,
 	const char *val = arg ? arg : "storage";
 
 	if (strcmp(val, "storage") &&
+	    strcmp(val, "compat") &&
 	    strcmp(val, "input") &&
 	    strcmp(val, "output"))
 		die(_("unknown mode for --show-object-format: %s"),
 		    arg);
-	puts(the_hash_algo->name);
+
+	if (!strcmp(val, "compat")) {
+		if (the_repository->compat_hash_algo)
+			puts(the_repository->compat_hash_algo->name);
+		else
+			putchar('\n');
+	} else {
+		puts(the_hash_algo->name);
+	}
 	continue;
 }
 if (!strcmp(arg, "--show-ref-format")) {

fsck.c

Lines changed: 18 additions & 0 deletions
@@ -1067,6 +1067,24 @@ int fsck_tag_standalone(const struct object_id *oid, const char *buffer,
 	else
 		ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
 
+	if (buffer < buffer_end && (skip_prefix(buffer, "gpgsig ", &buffer) || skip_prefix(buffer, "gpgsig-sha256 ", &buffer))) {
+		eol = memchr(buffer, '\n', buffer_end - buffer);
+		if (!eol) {
+			ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_GPGSIG, "invalid format - unexpected end after 'gpgsig' or 'gpgsig-sha256' line");
+			goto done;
+		}
+		buffer = eol + 1;
+
+		while (buffer < buffer_end && starts_with(buffer, " ")) {
+			eol = memchr(buffer, '\n', buffer_end - buffer);
+			if (!eol) {
+				ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_HEADER_CONTINUATION, "invalid format - unexpected end in 'gpgsig' or 'gpgsig-sha256' continuation line");
+				goto done;
+			}
+			buffer = eol + 1;
+		}
+	}
+
 	if (buffer < buffer_end && !starts_with(buffer, "\n")) {
 		/*
 		 * The verify_headers() check will allow

fsck.h

Lines changed: 2 additions & 0 deletions
@@ -25,9 +25,11 @@ enum fsck_msg_type {
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
 	/* errors */ \
+	FUNC(BAD_HEADER_CONTINUATION, ERROR) \
 	FUNC(BAD_DATE, ERROR) \
 	FUNC(BAD_DATE_OVERFLOW, ERROR) \
 	FUNC(BAD_EMAIL, ERROR) \
+	FUNC(BAD_GPGSIG, ERROR) \
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PACKED_REF_ENTRY, ERROR) \

t/t1450-fsck.sh

Lines changed: 54 additions & 0 deletions
@@ -454,6 +454,60 @@ test_expect_success 'tag with NUL in header' '
 	test_grep "error in tag $tag.*unterminated header: NUL at offset" out
 '
 
+test_expect_success 'tag accepts gpgsig header even if not validly signed' '
+	test_oid_cache <<-\EOF &&
+	header sha1:gpgsig-sha256
+	header sha256:gpgsig
+	EOF
+	header=$(test_oid header) &&
+	sha=$(git rev-parse HEAD) &&
+	cat >good-tag <<-EOF &&
+	object $sha
+	type commit
+	tag good
+	tagger T A Gger <tagger@example.com> 1234567890 -0000
+	$header -----BEGIN PGP SIGNATURE-----
+	 Not a valid signature
+	 -----END PGP SIGNATURE-----
+
+	This is a good tag.
+	EOF
+
+	tag=$(git hash-object --literally -t tag -w --stdin <good-tag) &&
+	test_when_finished "remove_object $tag" &&
+	git update-ref refs/tags/good $tag &&
+	test_when_finished "git update-ref -d refs/tags/good" &&
+	git -c fsck.extraHeaderEntry=error fsck --tags
+'
+
+test_expect_success 'tag rejects invalid headers' '
+	test_oid_cache <<-\EOF &&
+	header sha1:gpgsig-sha256
+	header sha256:gpgsig
+	EOF
+	header=$(test_oid header) &&
+	sha=$(git rev-parse HEAD) &&
+	cat >bad-tag <<-EOF &&
+	object $sha
+	type commit
+	tag good
+	tagger T A Gger <tagger@example.com> 1234567890 -0000
+	$header -----BEGIN PGP SIGNATURE-----
+	 Not a valid signature
+	 -----END PGP SIGNATURE-----
+	junk
+
+	This is a bad tag with junk at the end of the headers.
+	EOF
+
+	tag=$(git hash-object --literally -t tag -w --stdin <bad-tag) &&
+	test_when_finished "remove_object $tag" &&
+	git update-ref refs/tags/bad $tag &&
+	test_when_finished "git update-ref -d refs/tags/bad" &&
+	test_must_fail git -c fsck.extraHeaderEntry=error fsck --tags 2>out &&
+	test_grep "error in tag $tag.*invalid format - extra header" out
+'
+
 test_expect_success 'cleaned up' '
 	git fsck >actual 2>&1 &&
 	test_must_be_empty actual
