Commit bd628e3
committed
fix(quasicryth-research): codex P2 + coderabbit review — bug + 4 nits
Addresses PR #461 review feedback.
LOAD-BEARING BUG (codex P2 / coderabbit Critical):
CowArt silently dropped keys ≥ 256
==================================================
The original three-variant ART (Node4 / Node16 / Node256) was
byte-keyed at the leaf level — Node256 only handled values 0..255.
With u32 word-IDs, any corpus of 257+ unique words would silently
lose entries from the unigram trie. Result:
- Variant::Flat round-tripped correctly (HashMap-based)
- Variant::CowRadix produced OutOfVocabulary on word_id ≥ 256
even though the codebook was sized to include every unique word
Tests masked the bug because they used 5-word vocabularies.
Fix: replace the three-variant ArtNode enum with a single
sparse-children node:
struct ArtNode {
children: BTreeMap<u32, Arc<ArtNode>>,
leaf: Option<u32>,
}
- Loses the ART byte-keyed Node4/Node16/Node256 branch-free
optimization. The optimization assumed byte keys; u32 keys
don't fit it without per-byte decomposition (which would be a
much bigger refactor).
- Gains correctness for arbitrary u32 keys including word IDs
≥ 256 (which is most real text).
- Preserves the COW property — every insert returns a new root
via path-copy, prior roots stay valid. This is the
architectural point of the variant, and it's what the
workspace's append-only doctrine needs.
- BTreeMap (not HashMap) for deterministic iteration order,
useful for any future serialization or cross-impl comparison.
Two regression tests added so this bug can't recur silently:
- cow_art_handles_arbitrary_u32_keys
Inserts 302 keys spanning 0..300 + 1_000_000 + u32::MAX;
verifies every one round-trips. The original implementation
would have dropped 1_000_000 and u32::MAX silently.
- cow_radix_codebook_handles_large_vocabulary
Builds a 300-unique-word codebook via CowRadixCodebook; asserts
every word ID (including 256..299) is findable via
unigram_index(). This is the exact codex P2 scenario.
Total tests: 84 (was 83). +2 from the regression tests, +1 from a
renamed-and-tightened existing test.
SECONDARY FINDINGS
==================
coderabbit Critical — sanddrift_tiling docstring:
The module docstring claimed all generators satisfy the no-adjacent-S
invariant, but sanddrift's substitution L→LSSL produces SS pairs by
design (LL forbidden, not SS). The upstream gen_sanddrift_tiles in
fib.c also bypasses the SS→L merge for the same reason — preserving
the substitution structure.
Fix: update module docstring to name sanddrift as the documented
exception; rename + strengthen the sanddrift test to assert the
ACTUAL invariant (LL forbidden), not the wrong one (no-adjacent-S).
Behaviour unchanged — matches the C reference.
coderabbit Minor — Cargo.toml comments misrepresent crate scope:
Both workspace Cargo.toml and crate Cargo.toml had stale "algebraic
core only" comments from phase 0. Updated to reflect the full
pipeline shipped in phases 1-6 (arithmetic coder, tokenization,
codebook variants, compress/decompress).
coderabbit Minor — Sturmian assertion too loose:
tests/paper_theorems.rs::sturmian_factor_complexity_is_n_plus_1
asserted `factors.len() <= n + 1`, which would pass for degenerate
(sub-Sturmian, periodic) streams. Sturmian minimality (Paper §4.10,
Thm 7 corollary) requires EXACTLY n+1 distinct length-n factors.
Strengthened to assert_eq! with a clearer error message.
This catches drift toward either degenerate or super-Sturmian
streams.
Verification:
cargo test --manifest-path crates/quasicryth-research/Cargo.toml
→ 68 unit + 9 paper-theorem + 7 cross-variant = 84 passed
cargo clippy --all-targets -- -D warnings clean
cargo fmt clean
Zero deps preserved. No unsafe.1 parent 7fed9b9 commit bd628e3
5 files changed
Lines changed: 134 additions & 262 deletions
File tree
- crates/quasicryth-research
- src
- tests
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
47 | 47 | | |
48 | 48 | | |
49 | 49 | | |
50 | | - | |
51 | | - | |
52 | | - | |
53 | | - | |
54 | | - | |
55 | | - | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
56 | 60 | | |
57 | 61 | | |
58 | 62 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
7 | | - | |
| 7 | + | |
8 | 8 | | |
9 | 9 | | |
10 | | - | |
11 | | - | |
12 | | - | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
13 | 14 | | |
14 | | - | |
15 | | - | |
16 | | - | |
17 | | - | |
18 | | - | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
19 | 25 | | |
20 | 26 | | |
21 | 27 | | |
| |||
0 commit comments