@@ -5,6 +5,92 @@ Per `PLAN.md` §1 rule 6, every resolution of an ambiguity or deviation from

 ---

+## D23 — Phase 8 fuzzing is a seeded in-house harness; libFuzzer waits for Phase 11
+
+**Phase:** 8 · **Status:** accepted
+
+`PLAN.md` Phase 8's exit demands "fuzzer clean on adversarial input", while
+the dedicated fuzz harness (cargo-fuzz/libFuzzer) is a Phase 11 deliverable.
+Pulling `cargo-fuzz` forward would add a nightly-only toolchain and external
+deps that D4's reasoning avoids.
+
+**Decision:** Phase 8 ships a deterministic, seeded adversarial-input suite
+(`proto/tests/proto_fuzz.rs`): 100k random byte strings, 100k corpus
+mutations (bit flips, truncations, overwrites, splices), and hand-built
+hostile container shapes (depth bombs, claimed-giant containers,
+str32/bin32 length lies). Every decoded request must also survive a
+canonical re-encode/re-decode round-trip. Coverage-guided fuzzing arrives
+with Phase 11's harness; this suite stays as the fast deterministic gate.
+
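A minimal sketch of how a seeded, dependency-free mutation loop of this kind could look. It is not the contents of `proto/tests/proto_fuzz.rs`: the `Rng`, `mutate`, `decode`, and `run_suite` names are hypothetical, the PRNG is a plain xorshift64* chosen only because it needs no external crate, and `decode` is a toy stand-in for the real decoder.

```rust
/// Small xorshift64* generator: deterministic for a given seed, no external crates.
struct Rng(u64);

impl Rng {
    fn new(seed: u64) -> Self {
        // The xorshift state must be non-zero, so a zero seed is replaced.
        Rng(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Value in 0..n (n must be non-zero). Slight modulo bias is acceptable here.
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Apply one random mutation: bit flip, truncation, overwrite, or splice.
fn mutate(rng: &mut Rng, input: &[u8], other: &[u8]) -> Vec<u8> {
    let mut out = input.to_vec();
    if out.is_empty() {
        return out;
    }
    match rng.below(4) {
        0 => {
            // Bit flip: toggle one bit of one byte.
            let i = rng.below(out.len());
            out[i] ^= 1 << rng.below(8);
        }
        1 => {
            // Truncation: keep a strict prefix.
            let n = rng.below(out.len());
            out.truncate(n);
        }
        2 => {
            // Overwrite: replace one byte with a random value.
            let i = rng.below(out.len());
            out[i] = rng.next_u64() as u8;
        }
        _ => {
            // Splice: a prefix of `input` followed by a suffix of `other`.
            let cut = rng.below(out.len());
            out.truncate(cut);
            let from = if other.is_empty() { 0 } else { rng.below(other.len()) };
            out.extend_from_slice(&other[from..]);
        }
    }
    out
}

/// Hypothetical stand-in for the real decoder: returns Err on bad input, never panics.
fn decode(bytes: &[u8]) -> Result<u8, &'static str> {
    match bytes.first() {
        Some(&0xC1) => Err("reserved byte"),
        Some(&b) => Ok(b),
        None => Err("empty"),
    }
}

/// Feed `iterations` mutated inputs to the decoder; returns how many decoded.
/// `corpus` must be non-empty. The same seed always yields the same inputs.
fn run_suite(seed: u64, corpus: &[&[u8]], iterations: usize) -> usize {
    let mut rng = Rng::new(seed);
    let mut decoded = 0;
    for _ in 0..iterations {
        let a = corpus[rng.below(corpus.len())];
        let b = corpus[rng.below(corpus.len())];
        if decode(&mutate(&mut rng, a, b)).is_ok() {
            decoded += 1;
        }
    }
    decoded
}
```

Because every input derives from the seed, a failing iteration can be replayed exactly by rerunning with the same seed and corpus, which is the property that makes such a suite usable as a fast gate.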
+## D22 — Clause desugaring semantics: aggregates, distinct, having
+
+**Phase:** 8 · **Status:** accepted
+
+`SPEC.md` §5.4 fixes the clause order (FROM → WHERE → GROUP → HAVING →
+PROJECT → ORDER → LIMIT) but leaves three details open. The clause lowerer
+desugars into a pipeline and reuses the pipeline fold, so equivalence is by
+construction; these rules define the desugaring:
+
+- **Aggregates in the clause `select` list** become named `group` outputs:
+  the alias names the output (`{as:["spent",{sum:…}]}` → agg `spent`), an
+  unaliased aggregate gets its function name (`{count:1}` → `count`), and a
+  name collision is a typed error. v1 allows aggregates only as the *whole*
+  select item (no `{add:[{sum:x},1]}`); grouping is implied by `group_by`
+  *or* select-list aggregates.
+- **`distinct` is an IR operator** (`Plan::Distinct`), placed after PROJECT
+  and before ORDER in the clause order. `ARCHITECTURE.md` §3.7's operator
+  list omits it though the stage exists in `SPEC.md` §5.3/§11; an explicit
+  operator beats desugaring into `Aggregate`, which would entangle lowering
+  with planning. (Addition, not contradiction; flagged per rule 6.)
+- **`having` without grouping** is rejected at lowering; `{distinct:false}`
+  is an explicit no-op stage. Structural shape (pipeline starts at a `scan`,
+  later sources arrive via `join`, `group` outputs really are aggregate
+  calls) is also enforced at lowering; names, types, and §6 safety stay in
+  the Phase 9 validator.
+
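The aggregate-naming rule in the first bullet can be sketched as follows. The `SelectItem` and `LowerError` types are hypothetical simplifications (the real AST is not shown here), and the sketch only checks collisions between aggregate outputs, which is an assumption about the scope of "name collision".

```rust
use std::collections::HashSet;

/// Hypothetical, simplified select item; not the project's actual AST.
#[allow(dead_code)]
enum SelectItem {
    Column(String),
    Aggregate { func: String, alias: Option<String> },
}

#[derive(Debug, PartialEq)]
enum LowerError {
    /// Two aggregate outputs would share the same name.
    DuplicateOutput(String),
}

/// Name each select-list aggregate: the alias if present, otherwise the
/// function name. A repeated name is reported as a typed error.
fn aggregate_names(items: &[SelectItem]) -> Result<Vec<String>, LowerError> {
    let mut seen = HashSet::new();
    let mut names = Vec::new();
    for item in items {
        if let SelectItem::Aggregate { func, alias } = item {
            let name = alias.clone().unwrap_or_else(|| func.clone());
            // `insert` returns false when the name was already present.
            if !seen.insert(name.clone()) {
                return Err(LowerError::DuplicateOutput(name));
            }
            names.push(name);
        }
    }
    Ok(names)
}
```

Under these assumptions, an aliased `sum` plus a bare `count` yields the outputs `spent` and `count`, while two bare `count` aggregates are rejected.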
+## D21 — Protocol version field and cursor-token envelope
+
+**Phase:** 8 · **Status:** accepted
+
+`PLAN.md` Phase 8 requires a protocol version field and a keyset cursor
+token; `SPEC.md` §5's grammar shows neither a version nor the token's bytes.
+
+**Decision:**
+- **Version:** requests may carry a top-level `v` (int). A missing `v` means
+  version 1; any value other than 1 is a typed `UnsupportedVersion` error.
+  Results always carry `v:1`. Error results are `{v, ok:false, code, error}`,
+  where `code` is the stable `SPEC.md` §9 category identifier (the shape §5.6
+  leaves implicit for the failure case).
+- **Cursor token:** opaque bytes `[version 0x01][crc32c(payload) BE][payload]`.
+  The payload (keyset position) is defined with the executor in Phase 9; the
+  envelope is fixed now so a truncated or mangled token is a clean
+  `Validation` error instead of a nonsense seek. CRC32C moved from `pager`
+  to `common` (same in-house routine, D3) so `proto` shares it without
+  depending on storage crates.
+
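To make the cursor-token envelope concrete, here is a minimal sketch of wrapping and checking a payload. It is illustrative only: `encode_token`, `decode_token`, and `TokenError` are hypothetical names, the bitwise `crc32c` is a textbook Castagnoli implementation rather than the routine in `common`, and in the real protocol each failure would surface as a `Validation` error.

```rust
const TOKEN_VERSION: u8 = 0x01;

#[derive(Debug, PartialEq)]
enum TokenError {
    Truncated,
    BadVersion(u8),
    BadChecksum,
}

/// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Build `[version 0x01][crc32c(payload) BE][payload]`.
fn encode_token(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(5 + payload.len());
    out.push(TOKEN_VERSION);
    out.extend_from_slice(&crc32c(payload).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Check the envelope and return the payload slice, or a typed error.
fn decode_token(token: &[u8]) -> Result<&[u8], TokenError> {
    // One version byte plus four checksum bytes is the minimum envelope.
    if token.len() < 5 {
        return Err(TokenError::Truncated);
    }
    if token[0] != TOKEN_VERSION {
        return Err(TokenError::BadVersion(token[0]));
    }
    let stored = u32::from_be_bytes([token[1], token[2], token[3], token[4]]);
    let payload = &token[5..];
    if crc32c(payload) != stored {
        return Err(TokenError::BadChecksum);
    }
    Ok(payload)
}
```

The payload is returned only after the version and checksum both pass, so the later keyset-position parser never sees bytes that were damaged in transit.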
+## D20 — Two-stage hardened decode: bytes → bounded Doc tree → AST
+
+**Phase:** 8 · **Status:** accepted
+
+`ARCHITECTURE.md` §6 requires limits enforced *before* allocating and no
+unbounded recursion, but does not fix the decoder's architecture.
+
+**Decision:** decoding is two stages. A small hardened reader produces a
+generic `Doc` tree (null/bool/int/float/str/bin/array/map) under
+`DecodeLimits`: max message size (checked before reading), max depth
+(explicit counter), and max node count (budget charged per node; container
+item counts are validated against both the remaining bytes and the remaining
+budget before any `Vec` allocation). It rejects the reserved byte `0xC1`,
+ext types, non-string and duplicate map keys, invalid UTF-8, ints outside
+`i64`, and trailing bytes. The AST mapping then works on the already-safe
+tree and enforces the grammar (unknown ops/stages/expressions/fields are
+typed errors). Defaults: 1 MiB / depth 64 (matching `types::MAX_JSON_DEPTH`)
+/ 100k nodes, embedder-configurable per `SPEC.md` §8. The node cap bounds
+the intermediate tree's memory, so the two-stage shape gives an attacker no
+extra memory to exploit and keeps all byte-level hardening in one ~200-line
+module. Insert-row values that are containers become `json` values
+(re-encoded canonical MessagePack), matching §5.5's `data:{role:"admin"}`
+example.
+
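The "validate before allocating" rule can be sketched as a pair of admission checks. This is an assumption-laden illustration, not the actual reader: the `DecodeLimits` field names, `Budget`, `DecodeError`, `admit_message`, and `admit_container` are all hypothetical, and only the default values (1 MiB, depth 64, 100k nodes) come from the decision text.

```rust
#[derive(Debug, Clone, Copy)]
struct DecodeLimits {
    max_message_bytes: usize,
    max_depth: usize,
    max_nodes: usize,
}

impl Default for DecodeLimits {
    fn default() -> Self {
        // Defaults stated in the decision: 1 MiB / depth 64 / 100k nodes.
        DecodeLimits { max_message_bytes: 1 << 20, max_depth: 64, max_nodes: 100_000 }
    }
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    MessageTooLarge,
    TooDeep,
    LengthExceedsInput,
    NodeBudgetExceeded,
}

/// Remaining node budget for one message.
struct Budget {
    nodes_left: usize,
}

/// Size check performed before any byte of the message is read.
fn admit_message(len: usize, limits: &DecodeLimits) -> Result<Budget, DecodeError> {
    if len > limits.max_message_bytes {
        return Err(DecodeError::MessageTooLarge);
    }
    Ok(Budget { nodes_left: limits.max_nodes })
}

/// Checks run on a container header before `Vec::with_capacity(claimed_items)`.
/// `min_bytes_per_item` is 1 for array elements and 2 for map entries, since
/// every encoded value occupies at least one byte.
fn admit_container(
    claimed_items: usize,
    min_bytes_per_item: usize,
    remaining_bytes: usize,
    depth: usize,
    limits: &DecodeLimits,
    budget: &mut Budget,
) -> Result<(), DecodeError> {
    if depth > limits.max_depth {
        return Err(DecodeError::TooDeep);
    }
    // A count that cannot fit in the remaining input is a length lie;
    // overflow in the multiplication is treated the same way.
    match claimed_items.checked_mul(min_bytes_per_item) {
        Some(needed) if needed <= remaining_bytes => {}
        _ => return Err(DecodeError::LengthExceedsInput),
    }
    if claimed_items > budget.nodes_left {
        return Err(DecodeError::NodeBudgetExceeded);
    }
    budget.nodes_left -= claimed_items;
    Ok(())
}
```

With checks of this shape, a header that claims billions of items in a 16-byte message fails on the remaining-bytes test, so the allocation size is always bounded by both the input length and the node budget.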
 ## D19 — Reclamation only runs inside a committing batch

 **Phase:** 7 · **Status:** accepted (bug fix of Phase 4 behavior)