normative: draft the CBOR subset in canonical-encoding.md
Pin the structural canonical-CBOR rules ahead of the value domain: the
permitted major types, the ban on tags and indefinite lengths, the minimal
head rule, single-item framing, and the decode-depth bound (Section 7.3).
Cross-reference the existing reject/ vectors that already exercise them, and
defer the map-key type and ordering rules to their own section. Drop the
subset bullet from the to-draft list (now four).
Signed-off-by: Chris Raynor <chris@raynor.tech>
canonical-encoding.md: 42 additions, 2 deletions
@@ -10,14 +10,54 @@ The encoding is deterministic CBOR (RFC 8949). This document restricts CBOR to a
## Scope of this draft

-This draft pins the **value domain**: the scalar values a canonical artifact may carry and how each reaches its single byte form. Four sections remain to be drafted, and the conformance vectors lead each:
+This draft pins the **CBOR subset** and the **value domain**: the structural rules every canonical artifact obeys, and the scalar values it may carry with the single byte form each reaches. Four sections remain to be drafted, and the conformance vectors lead each:

-- the permitted CBOR subset (the major types allowed, the ban on tags and indefinite lengths, the minimal-length rules), of which the existing `reject/` vectors already pin part;
 - the map-key rules (single-type keys per map, integer keys for fixed schema fields, text keys for open name-indexed maps, no float, container, mixed, or duplicate keys, canonical ordering);
 - the signed envelope and identifier layout, with the signed-input boundary;
 - the algorithm-tag namespace and its reserved private range;
 - the unit vocabulary that schema fields reference by code.

+## CBOR subset
+
+Deterministic CBOR is RFC 8949 with its degrees of freedom removed. Plain CBOR can encode one logical value many ways: a tag or no tag, a short head or a padded one, a definite or an indefinite length, a float that compares equal to another but differs in bytes. Each such choice is a second byte form of one value, and each breaks content-addressing. This subset removes the choices, so a value has one structure and one length encoding before the value conventions below even apply. The rules track the core deterministic encoding of RFC 8949, narrowed further to the major types Murmur uses.
+
+### One data item
+
+A canonical artifact is exactly one CBOR data item. A decoder MUST consume the whole input as that single item and MUST reject any trailing byte, rather than stop at the first complete item and ignore the rest. Trailing bytes are how a second, unread value rides inside one that verifies, so an artifact that does not account for every byte is refused. The `reject/trailing-bytes` vector pins this.
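A minimal Python sketch of the single-item rule follows. `read_head`, `item_end`, and `check_single_item` are hypothetical names, not part of the specification; the walker understands only the definite-length forms of the permitted major types and recurses without a depth limit, so it illustrates the framing check rather than serving as a hardened decoder.

```python
# Sketch of the single-item rule. Function names are hypothetical, and the
# walker only understands the subset's definite-length major types.

def read_head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Return (major type, argument, offset after the head) for buf[pos]."""
    if pos >= len(buf):
        raise ValueError("truncated input")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    if info < 24:
        return major, info, pos + 1          # argument held in the initial byte
    if info <= 27:
        size = 1 << (info - 24)              # 1, 2, 4, or 8 argument bytes
        end = pos + 1 + size
        if end > len(buf):
            raise ValueError("truncated head")
        return major, int.from_bytes(buf[pos + 1:end], "big"), end
    raise ValueError(f"additional info {info} not supported")


def item_end(buf: bytes, pos: int) -> int:
    """Return the offset just past the data item that starts at pos."""
    major, arg, pos = read_head(buf, pos)
    if major in (2, 3):                      # byte or text string: arg bytes follow
        if pos + arg > len(buf):
            raise ValueError("truncated string")
        return pos + arg
    if major in (4, 5):                      # array: arg items; map: arg pairs
        for _ in range(arg if major == 4 else 2 * arg):
            pos = item_end(buf, pos)
        return pos
    if major == 6:
        raise ValueError("tags are not part of the subset")
    return pos                               # integers and simple values: head only


def check_single_item(buf: bytes) -> None:
    """Accept buf only if it is exactly one data item with no trailing byte."""
    end = item_end(buf, 0)
    if end != len(buf):
        raise ValueError(f"{len(buf) - end} trailing byte(s)")
```

Here `820102` (the array `[1, 2]`) is accepted, while `0100` (the integer 1 followed by a stray `0x00`) is refused with "1 trailing byte(s)". The check compares the end of the first item with the end of the input, so it cannot stop early and silently ignore what follows.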
+
+### Permitted major types
+
+Only these CBOR major types appear in a canonical artifact:
+
+- **0** and **1**, unsigned and negative integers (Integers, below);
+- **2**, byte strings (Text and byte strings, below);
+- **3**, text strings (Text and byte strings, below);
+- **4**, arrays;
+- **5**, maps;
+- **7** restricted to the two simple values true and false (Booleans, and the absence of null, below).
+
+Every other use of major type 7 is excluded: the half, single, and double floats (No floating point, below), null and undefined (Booleans, and the absence of null, below), and every remaining simple value. Major type 6, the tag, is excluded entirely (No tags, below). A decoder MUST reject a major type, or a major-type-7 value, outside this list. The decimal and the rational are arrays of integers, not tagged numbers, so they need no type beyond the array and the integer.
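The type restriction reduces to a test on the initial byte of every head a decoder reads. This Python sketch (the names are hypothetical) accepts major types 0 to 5 and, from major type 7, only `0xf4` (false) and `0xf5` (true). It says nothing about head length or definiteness, so `0x9f`, an indefinite array, still passes this particular check.

```python
# Hypothetical check on the initial byte of each head a decoder reads.
ALLOWED_MAJOR_TYPES = {0, 1, 2, 3, 4, 5}   # integers, strings, arrays, maps
FALSE, TRUE = 0xF4, 0xF5                   # the only major-type-7 items kept


def check_initial_byte(initial: int) -> None:
    """Raise unless the head's major type (or simple value) is permitted."""
    major = initial >> 5
    if major in ALLOWED_MAJOR_TYPES or initial in (FALSE, TRUE):
        return
    if major == 6:
        raise ValueError("tag (major type 6) not permitted")
    # Everything else is major type 7: floats 0xf9 to 0xfb, null 0xf6,
    # undefined 0xf7, other simple values, and the break byte 0xff.
    raise ValueError(f"major-type-7 item 0x{initial:02x} not permitted")
```

Matching whole initial bytes for true and false, rather than the simple-value number, also refuses the two-byte form `0xf8 0x14`, which RFC 8949 does not accept as well-formed for simple values below 32.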
+
+### No tags
+
+A CBOR tag (major type 6) MUST NOT appear in a canonical artifact. A tag is an optional annotation a decoder is free to ignore, so a value and its tagged form are two encodings of one thing. The meaning of a Murmur field comes from its schema position, never from a tag on the wire (Domain-declared magnitudes, below). The tagged decimal-fraction, bigfloat, and bignum forms are excluded by this rule, which is why the decimal and the rational are bare two-element arrays. The algorithm that a digest or key names is carried by the tag mechanism of specification Section 7.1, which is a field in the schema, not a CBOR tag.
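Rejecting tags means walking the item rather than scanning bytes, since a byte such as `0xc4` can sit inside a string payload. The Python sketch below uses hypothetical names and, for illustration, RFC 8949's decimal-fraction example (tag 4 wrapping `[-2, 27315]`, the value 273.15). It demonstrates only the tag; it does not claim Murmur's own decimal layout, which the value-domain rules define.

```python
# Sketch: walk one item and report whether any head is a tag.
# read_head and find_tag are hypothetical names, not part of the specification.

def read_head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Return (major type, argument, offset after the head) for buf[pos]."""
    if pos >= len(buf):
        raise ValueError("truncated input")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    if info < 24:
        return major, info, pos + 1
    if info <= 27:
        end = pos + 1 + (1 << (info - 24))   # 1, 2, 4, or 8 argument bytes
        if end > len(buf):
            raise ValueError("truncated head")
        return major, int.from_bytes(buf[pos + 1:end], "big"), end
    raise ValueError(f"additional info {info} not supported")


def find_tag(buf: bytes, pos: int = 0) -> tuple[bool, int]:
    """Walk the item at pos; return (saw_tag, offset where the walk stopped)."""
    major, arg, pos = read_head(buf, pos)
    if major == 6:
        return True, pos                     # first tag found: the artifact is refused
    if major in (2, 3):
        return False, pos + arg              # string payload is skipped, not scanned
    if major in (4, 5):
        for _ in range(arg if major == 4 else 2 * arg):
            saw_tag, pos = find_tag(buf, pos)
            if saw_tag:
                return True, pos
    return False, pos


# RFC 8949's decimal-fraction example: tag 4 around [-2, 27315], i.e. 273.15.
tagged = bytes.fromhex("c48221196ab3")
untagged = tagged[1:]                        # the same array with the tag removed
print(find_tag(tagged))                      # (True, 1)
print(find_tag(untagged))                    # (False, 5)
print(find_tag(bytes.fromhex("41c4")))       # (False, 2): 0xc4 here is payload
```

A decoder that ignored tag 4 would read `tagged` and `untagged` as the same array, which is exactly the second byte form this rule removes.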
+
+### No indefinite lengths
+
+Every byte string, text string, array, and map MUST carry a definite length in its head. The indefinite-length forms, and the break stop that closes them, MUST NOT appear, and a decoder MUST reject them. An indefinite length lets one value arrive as several chunks, which is a second byte form and a streaming-decode hazard at once. The `reject/indefinite-length-array` vector pins this.
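Under the subset, additional-information value 31 never appears: on major types 2 to 5 it opens an indefinite item, and as `0xff` it is the break stop code. A hypothetical per-head check in Python, applied to each initial byte as it is decoded, could look like this:

```python
# Hypothetical check run on each head's initial byte before its argument is read.
INDEFINITE = 31   # additional-information value for indefinite length and break


def check_definite(initial: int) -> None:
    """Raise if this initial byte starts an indefinite item or is the break code."""
    major, info = initial >> 5, initial & 0x1F
    if initial == 0xFF:
        raise ValueError("break stop code 0xff not permitted")
    if info == INDEFINITE:
        raise ValueError(f"indefinite length on major type {major} not permitted")


# The array [1, 2] two ways: definite 82 01 02, indefinite 9f 01 02 ff.
check_definite(0x82)                  # definite array of two: accepted
for initial in (0x5F, 0x7F, 0x9F, 0xBF, 0xFF):
    try:
        check_definite(initial)
    except ValueError as err:
        print(f"0x{initial:02x}: {err}")
```

Checking the initial byte is enough because the indefinite marker lives there; no chunk or break byte need be read before the input is refused.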
+
+### Minimal encoding
+
+The head of every item MUST use the shortest of CBOR's argument forms that holds its value: the immediate form for an argument under 24, then the one, two, four, and eight byte forms in turn, never a longer head where a shorter one fits. This is one rule with a wide reach. It governs an integer value, a string or container length, and the element count of an array or map alike, because each is a CBOR argument. The value 0 is the single byte `0x00`, never `0x18 0x00`. A text string of length ten carries that length in its initial byte, `0x6a`, never in a following argument byte as `0x78 0x0a`. A decoder MUST reject a non-minimal head. The `reject/non-minimal-uint` vector pins the integer case; the rule is identical for every length and count.
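The minimal-head rule is easiest to state as a round trip: decode the argument, re-encode it in the shortest form, and require the bytes to match. The Python sketch below assumes major types 0 to 5 only (the major-type-7 float forms are excluded anyway); `encode_head` and `check_minimal_head` are hypothetical names.

```python
# Hypothetical helpers for the minimal-head rule, major types 0 to 5 only.

def encode_head(major: int, arg: int) -> bytes:
    """Return the shortest head for a non-negative argument."""
    if arg < 24:
        return bytes([(major << 5) | arg])          # immediate form
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | info]) + arg.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def check_minimal_head(buf: bytes, pos: int) -> int:
    """Raise if the head at buf[pos] is longer than needed; return the next offset."""
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    if info < 24:
        return pos + 1
    if info > 27:
        raise ValueError(f"additional info {info} not permitted")
    end = pos + 1 + (1 << (info - 24))
    if end > len(buf):
        raise ValueError("truncated head")
    arg = int.from_bytes(buf[pos + 1:end], "big")
    if encode_head(major, arg) != buf[pos:end]:
        raise ValueError(f"non-minimal head for argument {arg}")
    return end


print(encode_head(0, 0).hex())    # 00
print(encode_head(3, 10).hex())   # 6a  (text string of length ten)
print(encode_head(0, 500).hex())  # 1901f4
try:
    check_minimal_head(b"\x18\x00", 0)   # 0 padded into a one-byte argument
except ValueError as err:
    print(err)                           # non-minimal head for argument 0
```

Because the same `encode_head` serves both the encoder and the check, the two cannot disagree about which form is minimal, and the check applies unchanged to integer values, lengths, and counts.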
+
+### Maps and ordering
+
+A map is structurally a definite-length, minimally headed container like any other, and this subset governs that framing. Its keys carry further rules: the permitted key types, the single key type per map, the ban on duplicates, and the canonical sort by encoded key bytes. Those belong to the map-key section (still to draft, above), and the `reject/duplicate-map-key` and `reject/unsorted-map-keys` vectors already exercise them. The key rules are kept distinct from the framing rule because a key's type and order are value conventions, not container structure.
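The key rules are still to be drafted, but the ordering named here can be sketched. Assuming the order is plain bytewise lexicographic comparison of each key's encoded bytes, as in RFC 8949's core deterministic encoding, a strictly-increasing check rejects unsorted and duplicate keys in one pass. `check_key_order` is a hypothetical name, and it takes keys already encoded.

```python
def check_key_order(encoded_keys: list[bytes]) -> None:
    """Require map keys, as encoded bytes, to be strictly increasing.

    Strictly increasing means sorted with no duplicate. Python compares
    bytes lexicographically, byte by byte, with a shorter prefix first.
    """
    for prev, cur in zip(encoded_keys, encoded_keys[1:]):
        if cur == prev:
            raise ValueError(f"duplicate map key {cur.hex()}")
        if cur < prev:
            raise ValueError(f"map key {cur.hex()} out of order")


# Integer keys 1, 10, 24 encode as 01, 0a, 1818: already in order.
check_key_order([b"\x01", b"\x0a", b"\x18\x18"])
# Text keys "b" (61 62) and "aa" (62 61 61): the length in the head
# puts the shorter key first.
check_key_order([b"\x61b", b"\x62aa"])
```

Sorting on encoded bytes rather than decoded values means the order depends only on the wire form, which the minimal-head rule already makes unique.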
+
+### Bounded structure
+
+Nesting is finite, and decoding it is resource-bounded. A canonical artifact has no cyclic or self-referential structure, since CBOR has none, but adversarial input can still nest arrays and maps deeply enough to exhaust a constrained decoder. Decoding MUST be bounded in memory and time under a declared limit, and MUST fail to the declared safe state when the limit is exceeded, by the input-cost rule of specification Section 7.3. A bound on nesting depth is the structural half of that rule.
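A depth bound can be enforced without recursion by keeping, for each open container, a count of the items still owed. In this Python sketch `MAX_DEPTH = 16` is a hypothetical declared limit, not a value from specification Section 7.3, and raising `ValueError` stands in for failing to the declared safe state, which is left to the caller.

```python
# Sketch of a depth-bounded walk. MAX_DEPTH is a hypothetical declared limit;
# the real bound comes from the input-cost rule, not from this example.
MAX_DEPTH = 16


def read_head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Return (major type, argument, offset after the head) for buf[pos]."""
    if pos >= len(buf):
        raise ValueError("truncated input")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    if info < 24:
        return major, info, pos + 1
    if info <= 27:
        end = pos + 1 + (1 << (info - 24))   # 1, 2, 4, or 8 argument bytes
        if end > len(buf):
            raise ValueError("truncated head")
        return major, int.from_bytes(buf[pos + 1:end], "big"), end
    raise ValueError(f"additional info {info} not supported")


def check_depth(buf: bytes, max_depth: int = MAX_DEPTH) -> int:
    """Walk one item without recursion; fail once nesting exceeds max_depth.

    Returns the offset after the item. Every head read consumes at least one
    byte, so work is bounded by len(buf), and the stack by max_depth + 1.
    """
    pos = 0
    owed = [1]                           # items still due at each open level
    while owed:
        if owed[-1] == 0:                # this level is complete
            owed.pop()
            continue
        owed[-1] -= 1
        major, arg, pos = read_head(buf, pos)
        if major in (2, 3):              # skip string payload
            pos += arg
            if pos > len(buf):
                raise ValueError("truncated string")
        elif major in (4, 5):            # open an array or map one level deeper
            if len(owed) > max_depth:
                raise ValueError(f"nesting deeper than {max_depth}")
            owed.append(arg if major == 4 else 2 * arg)
    return pos


print(check_depth(bytes.fromhex("818100"), max_depth=2))   # 3
# 100,000 nested one-element arrays: rejected without exhausting the stack.
try:
    check_depth(b"\x81" * 100_000 + b"\x00")
except ValueError as err:
    print(err)                           # nesting deeper than 16
```

Because each loop iteration either closes a level or consumes at least one input byte, the work stays linear in the input length even when a head claims an enormous element count.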
+
## Value domain
A declared deadline, rate, or threshold must mean one number and encode to one byte form. The rules below give each scalar value exactly one canonical encoding.