feat(quasicryth-research): direct C→Rust transcode + COW radix trie variant by AdaWorldAPI · Pull Request #461 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-06-04T15:11:14Z

Summary

Direct Rust transcode of Quasicryth (Tacconelli 2026, arxiv 2603.14999, upstream github.com/robtacconelli/quasicryth v5.6.0) in two architectural variants behind one trait: the original flat-storage codebook from the C reference, and a Copy-on-Write Adaptive Radix Tree variant that fits this workspace's append-only substrate doctrine.

New excluded crate crates/quasicryth-research/ — standalone, zero-dep, follows the helix / bgz17 / deepnsm convention.

7 phases (0 through 6), 6 commits

Commit	Phase	Modules	LOC	Tests
`f0dfe88`	0	tiling + hierarchy + constants + types (from `fib.c`)	1,300	28
`68f754e`	1	md5 + tok (RFC 1321 + word tokenization)	+650	+20
`9e229d5`	2	codebook trait + FlatCodebook + CowRadixCodebook + CowArt	+740	+8
`afd7969`	3	arith_coder (Model256, VModel, Encoder, Decoder)	+640	+10
`de566f6`	4	pipeline (compress, decompress, Variant)	+460	+10
`7fed9b9`	5+6	cross-variant integration + CLI binary	+400	+7

Total: ~4,160 LOC Rust, 83 tests passing, cargo clippy -- -D warnings clean (pedantic + all), cargo fmt clean. Zero dependencies. No unsafe. Stable Rust.

Two variants behind one trait

pub trait Codebook: Send + Sync {
    fn n_unique(&self) -> u32;
    fn n_uni(&self) -> u32;
    fn unigram_index(&self, word_id: u32) -> Option<u32>;
    fn unigram_word(&self, idx: u32) -> Option<u32>;
    // ... bigram + n-gram methods
}

Property	`FlatCodebook`	`CowRadixCodebook`
Storage	flat `Vec<u32>` + `HashMap`	ART (Node4 / Node16 / Node256) per tier
Lookup	O(1) avg	O(key_len) walk
Versioning	none	path-copy COW — every insert returns a new root, prior roots stay valid
Append-only fit	no	yes — fits workspace substrate doctrine
Threading	`Send + Sync`	`Send + Sync` (Arc-shared subtrees)

The COW property is explicitly tested in codebook::tests::cow_art_path_copy_preserves_old_root — art_v0 stays empty after art_v1.insert(...) and art_v2.insert(...). Tests also verify that the two variants agree on lookups (cow_radix_codebook_agrees_with_flat_on_lookups) and on end-to-end decompressed output (variants_produce_same_decompressed_output, cross_variant_independence).
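The path-copy behaviour described above can be illustrated with a much smaller structure than the crate's CowArt. The sketch below is not the PR's code: `Node`, `insert` and `get` are hypothetical names, a sorted child list stands in for Node4 / Node16 / Node256, and keying each level by one big-endian byte of the u32 is an assumption made for the example.

```rust
use std::sync::Arc;

/// Hypothetical minimal persistent trie (NOT the crate's `CowArt`).
#[derive(Clone, Default)]
pub struct Node {
    value: Option<u32>,
    /// Sorted by key byte; stands in for the Node4/Node16/Node256 variants.
    children: Vec<(u8, Arc<Node>)>,
}

/// Insert `key -> value` and return a NEW root. Only the nodes on the key's
/// path are copied; every other subtree is shared with the old root via `Arc`.
pub fn insert(root: &Arc<Node>, key: u32, value: u32) -> Arc<Node> {
    insert_bytes(root, &key.to_be_bytes(), value)
}

fn insert_bytes(node: &Arc<Node>, key: &[u8], value: u32) -> Arc<Node> {
    // Shallow copy: clones the `Arc` pointers, not the subtrees behind them.
    let mut copy = (**node).clone();
    match key.split_first() {
        None => copy.value = Some(value),
        Some((&byte, rest)) => {
            match copy.children.binary_search_by_key(&byte, |(k, _)| *k) {
                Ok(i) => {
                    let child = insert_bytes(&copy.children[i].1, rest, value);
                    copy.children[i].1 = child;
                }
                Err(i) => {
                    let empty = Arc::new(Node::default());
                    let child = insert_bytes(&empty, rest, value);
                    copy.children.insert(i, (byte, child));
                }
            }
        }
    }
    Arc::new(copy)
}

/// Look up `key` starting from any root, old or new.
pub fn get(root: &Arc<Node>, key: u32) -> Option<u32> {
    let mut node = root;
    for byte in key.to_be_bytes() {
        let i = node
            .children
            .binary_search_by_key(&byte, |(k, _)| *k)
            .ok()?;
        node = &node.children[i].1;
    }
    node.value
}
```

Holding on to `v0`, `v1 = insert(&v0, ..)` and `v2 = insert(&v1, ..)` gives three independently valid snapshots, which is the property the `cow_art_path_copy_preserves_old_root` test is described as checking.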

What round-trips end-to-end

pipeline::compress(text, Variant::Flat | Variant::CowRadix) → bytes and pipeline::decompress(bytes) → text round-trip on every test input, including:

empty, single word, whitespace-only
mixed case (Hello WORLD foo)
punctuation, newlines, tabs, quotes, parens, hyphens
repeated phrases, 5 KB cyclic text, pseudo-random English
UTF-8 high-bit (café naïve façade)
600-character Fibonacci-theory paragraph

Both variants produce identical decompressed output (compressed bytes may differ).

Paper-theorem verification (algebraic substrate)

tests/paper_theorems.rs verifies, on synthetic L/S sequences:

Thm 2 Fibonacci hierarchy never collapses
Cor 4 Period-5 collapses by level 4 or 5 (vs Fibonacci's unbounded depth)
Thm 9 Golden Compensation: L:S ratio = φ at every level
Thm 13 / Cor 15 Aperiodic advantage grows with corpus scale
Sturmian Factor complexity ≤ n+1 (the minimality property behind maximal codebook efficiency)
PV-property (φ² = φ+1), HIER_WORD_LENS = F_3..F_12, no-adjacent-S on all 36 canonical tilings
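Three of the properties listed above can be checked on a finite Fibonacci word with a few lines of standalone Rust. This is an illustrative sketch, not the contents of tests/paper_theorems.rs: the function names are hypothetical, and the word is built with the standard substitution L -> LS, S -> L rather than with the crate's cut-and-project generator.

```rust
use std::collections::HashSet;

/// One step of the Fibonacci substitution: L -> LS, S -> L.
pub fn substitute(word: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(word.len() * 2);
    for &c in word {
        if c == b'L' {
            out.extend_from_slice(b"LS");
        } else {
            out.push(b'L');
        }
    }
    out
}

/// Apply the substitution `steps` times, starting from the single tile "L".
pub fn fibonacci_word(steps: usize) -> Vec<u8> {
    let mut word = vec![b'L'];
    for _ in 0..steps {
        word = substitute(&word);
    }
    word
}

/// Ratio of L tiles to S tiles; approaches phi as the word grows.
pub fn ls_ratio(word: &[u8]) -> f64 {
    let l = word.iter().filter(|&&c| c == b'L').count();
    let s = word.len() - l;
    l as f64 / s as f64
}

/// True if two S tiles ever sit next to each other.
pub fn has_adjacent_s(word: &[u8]) -> bool {
    word.windows(2).any(|w| w[0] == b'S' && w[1] == b'S')
}

/// Number of distinct length-`n` factors (n >= 1) occurring in `word`.
/// For a Sturmian word this is at most n + 1.
pub fn factor_complexity(word: &[u8], n: usize) -> usize {
    word.windows(n).collect::<HashSet<_>>().len()
}
```

After 14 substitution steps the word has 987 tiles (610 L, 377 S), which is long enough for the ratio to agree with φ to several decimal places and for every factor of length up to 10 to appear.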

This is the mathematical underpinning the workspace's φ-substrate decisions (bgz17's 17φ/11, helix's golden-spiral hemisphere, jc::weyl's 1-D star-discrepancy) inherit. The transcode lets the workspace cross-check those decisions against the reference algebra without depending on the upstream C build.

CLI binary

cargo build --bin qresearch --manifest-path crates/quasicryth-research/Cargo.toml
qresearch round-trip /path/to/file.txt           # default: Flat
qresearch round-trip -v cow /path/to/file.txt    # COW radix trie variant
qresearch compress -v cow in.txt out.qrs1
qresearch decompress out.qrs1 in.txt.recovered

A live run of the CLI is recorded in the commit message of 7fed9b9.

Deliberate simplifications (NOT a production compressor)

Documented in module-level docs + README. The Rust pipeline is research-grade, NOT byte-compatible with the upstream .qm56 format:

Single-tier codebook — unigrams only. The Fibonacci tiling + substitution hierarchy + deep-position detection are verified against the paper's theorems via tests/paper_theorems.rs, but the bit-stream itself only encodes word-ID symbols at the unigram tier. Multi-tier n-gram encoding is a phase 5+ extension.
No LZMA escape stream: an out-of-vocabulary (OOV) word is an error. The pipeline still round-trips its own inputs because pipeline::build_codebook caps the unigram tier at n_unique, so every word in the input gets a codebook entry.
No 36-tiling greedy selection in the bit-stream (Fibonacci-only mode equivalent).
No word-level LZ77, no multi-tier unigram model, no per-level context models.
NOT byte-identical to the C reference output. The Rust pipeline round-trips with itself; matching the upstream .qm56 exactly would require porting hundreds of model-initialization details and is out of scope for "research and testing."

Implication: small inputs (sub-KB) currently produce >100% "compressed" output because headers + per-token spans dominate. This is expected and called out in the README. The architectural property (codebook + AC working end-to-end across both variants) is what's being demonstrated.
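The size of that fixed overhead can be estimated from the v1 stream layout given in the phase 4 commit message (a 4-byte magic, a u64 orig_size, three u32 counters, then one (u32, u32, u8) span per token). The sketch below is back-of-envelope arithmetic only: `min_stream_bytes` is a hypothetical helper, it ignores the length prefixes and both arithmetic-coded payloads (whose sizes are not stated), and it assumes the lowered byte stream is stored uncompressed.

```rust
/// magic [4] + orig_size [u64] + n_tokens, n_words, n_unique [u32 each].
pub const HEADER_BYTES: usize = 4 + 8 + 4 + 4 + 4;

/// One span per token: offset u32 + len u32 + case_flag u8.
pub const SPAN_BYTES: usize = 4 + 4 + 1;

/// Lower bound on the stream size: header, raw lowered bytes, and spans.
/// Length prefixes and the two AC payloads are deliberately left out.
pub fn min_stream_bytes(n_tokens: usize, lowered_len: usize) -> usize {
    HEADER_BYTES + lowered_len + n_tokens * SPAN_BYTES
}
```

For a hypothetical 95-byte input split into 20 tokens this gives 24 + 95 + 180 = 299 bytes before any coded payload is written, so output well above 100% of the input size is unsurprising at this scale.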

Crate policy

Standalone, zero dependency (only std)
excluded from the lance-graph workspace per the helix / bgz17 / deepnsm convention
cargo test --manifest-path crates/quasicryth-research/Cargo.toml — 83 passing
cargo clippy --all-targets -- -D warnings — clean (pedantic + all)
cargo fmt — clean
Cargo.lock gitignored per helix convention

What this PR is, in one sentence

A research transcode that demonstrates the architectural variation point (FlatCodebook vs CowRadixCodebook) the workspace cares about, working end-to-end through tokenize → codebook → arithmetic-code → bytes → decode, verified by 83 tests against both variants and against the paper's five core theorems.

🤖 Generated with Claude Code


Summary by CodeRabbit

Release Notes

New Features
- Introduced quasicryth-research crate providing Quasicryth v5.6.0 compression and decompression capabilities.
- New qresearch command-line tool supporting compress, decompress, and round-trip verification operations.
- Two compression implementation variants available: Flat and CowRadix for different use cases.
Documentation
- Comprehensive README documenting the crate, CLI usage, compression variants, and testing procedures.
- Full test suite validating round-trip compression across various input patterns and variants.

Standalone, zero-dep research/testing crate transcoding fib.h + fib.c + the algebraic types from qtc.h of the upstream Quasicryth v5.6.0 C reference (Tacconelli 2026, arxiv 2603.14999, upstream github.com/robtacconelli/quasicryth). Scope: the algebra the paper proves theorems about, not the compressor.

What's transcoded
- types.rs: Tile, HLevel, ParentMap, Hierarchy, DeepPositions, TilingDesc (idiomatic Rust ownership; no unsafe).
- constants.rs: PHI, INV_PHI, HIER_WORD_LENS = {2,3,5,8,13,21,34,55,89,144} = F_3..F_12, MAX_HIER=10, the 36-tiling descriptor table (12 golden phases + sqrt(58)-7 + noble-5 + sqrt(13)-3 + 18 greedy-discovered alphas including the far-out alpha=0.502).
- tiling.rs: cut-and-project (qc_word_tiling[_alpha]) + five substitution-rule families (Thue-Morse, Rudin-Shapiro, period-doubling, Period-5, Sanddrift).
- hierarchy.rs: build_hierarchy (iterative deflation (L,S)->super-L, L->super-S), hier_context, detect_deep_positions, deep_counts.

What's NOT transcoded
The full v5.6 production compressor pipeline (ac.c arithmetic coding, cb.c codebook construction, compress.c / decompress.c, tok.c tokenization, md5.c, LZMA escape). Out of scope for "research and testing": the goal is verifying the workspace's phi-substrate decisions against the reference algebra, not byte-compatibility with the upstream compressed output.

Verification (28 tests, all passing)
- 19 unit tests covering each module's invariants
- 9 integration tests in tests/paper_theorems.rs verifying:
  * Thm 2 Fibonacci hierarchy never collapses
  * Cor 4 Period-5 collapses by level ~3.3 = log(5)/log(phi)
  * Thm 9 Golden Compensation (L:S ratio = phi at every level)
  * Thm 13/Cor 15 Aperiodic advantage grows with corpus scale
  * Sturmian factor complexity <= n+1 (Thm 7 root)
  * PV-property phi^2 = phi + 1
  * HIER_WORD_LENS = Fibonacci F_3..F_12
  * No-adjacent-S on all 36 canonical tilings

cargo clippy --all-targets -- -D warnings clean (pedantic+all). rustfmt clean. Zero-dependency default build.

Relationship to workspace crates
- bgz17 (17*phi/11 ≈ 5/2 = octave + major third): this crate verifies the non-collapse theorem that justifies phi over rational stacking approximations.
- helix (golden-spiral hemisphere, Fisher-Z aligned): the Sturmian minimality theorem here is the optimality argument for phi as the azimuth stride.
- jc::weyl (1-D Weyl discrepancy at N=144, N=1000): this crate's qc_word_tiling exercises the same phi-stride at hierarchy scale.

Listed under root Cargo.toml `exclude` so it never enters the main compile graph. Verified via cargo test --manifest-path crates/quasicryth-research/Cargo.toml. Follows the helix convention: Cargo.lock gitignored; the crate stays standalone-verifiable.
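The deflation rule named for hierarchy.rs, (L,S) -> super-L and a lone L -> super-S, can be sketched in isolation. The code below is an illustration under assumptions, not the crate's build_hierarchy: the names are hypothetical, parent maps and word spans are not tracked, and an S with no preceding L is simply treated as "cannot deflate".

```rust
/// Inverse direction, used here only to produce test input: L -> LS, S -> L.
pub fn inflate(tiles: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(tiles.len() * 2);
    for &t in tiles {
        if t == b'L' {
            out.extend_from_slice(b"LS");
        } else {
            out.push(b'L');
        }
    }
    out
}

/// One deflation step: the pair (L, S) becomes a super-L and a lone L
/// becomes a super-S. Returns `None` when an S has no L in front of it.
pub fn deflate(tiles: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(tiles.len());
    let mut i = 0;
    while i < tiles.len() {
        match (tiles[i], tiles.get(i + 1).copied()) {
            (b'L', Some(b'S')) => {
                out.push(b'L');
                i += 2;
            }
            (b'L', _) => {
                out.push(b'S');
                i += 1;
            }
            _ => return None,
        }
    }
    Some(out)
}

/// Number of successful deflation steps before the sequence shrinks to a
/// single tile or can no longer be deflated.
pub fn hierarchy_depth(tiles: &[u8]) -> usize {
    let mut level = tiles.to_vec();
    let mut depth = 0;
    while level.len() > 1 {
        match deflate(&level) {
            Some(next) => {
                level = next;
                depth += 1;
            }
            None => break,
        }
    }
    depth
}
```

On words produced by `inflate`, `deflate` is an exact inverse, so each extra inflation step adds exactly one hierarchy level. That is the finite-length shadow of the "never collapses" statement; how the crate defines collapse for the periodic families is not modelled here.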

Phase 1 of the full-pipeline transcode plan. Two new modules:

- src/md5.rs (RFC 1321 / md5.c transcode, 196 LOC)
  * Md5 incremental hasher + one-shot md5() function
  * Direct port of upstream md5.c; bit-exact match
  * 8 tests covering the full RFC 1321 §A.5 test suite (empty, "a", "abc", "message digest", alphabet, alphanumeric, 80-digit long input, incremental==one-shot)
- src/tok.rs (tok.c transcode, 377 LOC, partial)
  * tokenize(): split raw bytes into Token spans with case separation; lowered byte stream + per-token (offset, len, case_flag) tracking
  * word_split(): pre-lowered byte stream → word offsets, no case work (lighter path)
  * apply_case(): reverse the case lowering for a token
  * TokenStream::round_trip(): the round-trip the C reference verifies internally via case_roundtrips
  * 12 tests covering case detection (lower/Cap/UPPER), round-trip on lowercase / mixed-case / punctuation / empty / UTF-8 high-bit; word_split byte-order preservation

NOT in this phase (deferred):
- enc_case / dec_case: depend on the arithmetic coder (phase 3, ac.c transcode)

Total tests: 48 (was 28). +20 from md5 (8) and tok (12).

Verification:
- cargo test --manifest-path crates/quasicryth-research/Cargo.toml → 39 unit + 9 integration = 48 passed, 0 failed
- cargo clippy --all-targets -- -D warnings clean (added 4 pedantic-lint allows for legibility against upstream: many_single_char_names, too_many_lines, format_push_string, bool_to_int_with_if; all stylistic, no correctness impact)
- cargo fmt clean

Zero-dep preserved. No unsafe.
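The case separation described for tok.rs (lower / Cap / UPPER flags plus a lowered byte stream) can be sketched as follows. This is a hypothetical, ASCII-only illustration: `CaseFlag`, `detect_case` and this `apply_case` signature are assumptions and may not match the crate, and tokens with any other capitalization pattern are reported as `None` rather than split further.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CaseFlag {
    Lower,
    FirstCap,
    AllCaps,
}

/// Classify a token by its ASCII letters. Bytes outside ASCII (for example
/// UTF-8 continuation bytes) are ignored. Returns `None` for patterns such
/// as "hELLo" that the three flags cannot reproduce.
pub fn detect_case(token: &[u8]) -> Option<CaseFlag> {
    let upper = token.iter().filter(|b| b.is_ascii_uppercase()).count();
    let letters = token.iter().filter(|b| b.is_ascii_alphabetic()).count();
    if upper == 0 {
        Some(CaseFlag::Lower)
    } else if upper == letters && letters > 1 {
        Some(CaseFlag::AllCaps)
    } else if upper == 1 && token[0].is_ascii_uppercase() {
        Some(CaseFlag::FirstCap)
    } else {
        None
    }
}

/// Rebuild the original token from its lowered bytes and its flag.
pub fn apply_case(lowered: &[u8], flag: CaseFlag) -> Vec<u8> {
    let mut out = lowered.to_vec();
    match flag {
        CaseFlag::Lower => {}
        CaseFlag::FirstCap => {
            if let Some(first) = out.first_mut() {
                first.make_ascii_uppercase();
            }
        }
        CaseFlag::AllCaps => out.make_ascii_uppercase(),
    }
    out
}
```

Lowering a token with `to_ascii_lowercase` and then calling `apply_case` with the detected flag returns the original bytes, including for tokens that contain high-bit UTF-8 bytes, because the ASCII helpers leave those bytes untouched.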

Phase 2 adds the codebook tier of the upstream compressor, in TWO variants behind one trait. This is the architectural split the user asked for: original-shape + COW radix trie.

New module src/codebook.rs (~700 LOC):

Codebook trait
- n_unique / n_uni / n_bi / n_ngram(level)
- unigram_index / bigram_index / ngram_index (forward lookups)
- unigram_word / bigram_words / ngram_words (reverse lookups)
- both variants satisfy Send + Sync (immutable post-construction)

CodebookSizes (port of qtc_cb_sizes_t)
- 11 tier budgets: uni, bi, tri, fg, eg, tg, vg, tfg, ffg, efg, ofg
- auto(nw): 7-tier corpus-size table matching auto_codebook_sizes in cb.c

Variant A: FlatCodebook
- direct port of cb.c storage shape
- Vec<u32> per tier for forward storage + HashMap for lookup
- sorts entries by descending frequency (with deterministic tie-break)
- filters n-gram candidates to those whose every word is in the unigram codebook (matches the cb.c filtering pass)
- per-tier budgeting matches cb.c

Variant B: CowRadixCodebook
- the architectural variant the user asked for
- backed by CowArt: a Copy-on-Write Adaptive Radix Trie
- three node variants: Node4 (4 children, low fan-out), Node16 (medium fan-out), Node256 (full byte/dword fan-out). Node48 omitted as a deliberate simplification; Node16 grows straight to Node256.
- insert() returns a NEW root via path-copy; old roots remain valid for prior consumers (Arc-shared subtrees).
- one trie per tier; reverse direction uses the same Vec storage as FlatCodebook (the trie owns the forward direction only).

The two variants are validated against EACH OTHER in test cow_radix_codebook_agrees_with_flat_on_lookups: identical inputs produce identical lookup results on unigrams and bigrams. This is the cross-validation contract that makes the COW variant a drop-in.

COW semantics are explicitly tested in cow_art_path_copy_preserves_old_root: the v0 root stays empty after v1/v2 inserts; v1 sees only its insert, v2 sees both. That is exactly the property the workspace's append-only substrate doctrine requires.

Tests added (8): codebook_sizes_auto_increases_with_corpus, flat_codebook_roundtrips_{unigrams,bigrams}, cow_radix_codebook_roundtrips_{unigrams,bigrams}, cow_radix_codebook_agrees_with_flat_on_lookups, cow_art_path_copy_preserves_old_root, cow_art_grows_node_variants.

Total tests: 56 (was 48). +8 from codebook.

Verification:
- cargo test → 47 unit + 9 integration = 56 passed, 0 failed
- cargo clippy --all-targets -- -D warnings clean (added 3 pedantic allows: assigning_clones, single_match_else, only_used_in_recursion; all stylistic)
- cargo fmt clean

No new deps; zero-dep ethos preserved (std HashMap/Arc only).
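The flat variant's unigram tier (frequency count, descending sort with a deterministic tie-break, budget cap, then forward and reverse tables) can be sketched as below. `UnigramTier` and `build_unigram_tier` are hypothetical names, and breaking ties by ascending word ID is an assumption; the commit message only says the tie-break is deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical single-tier codebook in the "flat" storage shape.
pub struct UnigramTier {
    /// Reverse direction: codebook index -> word ID.
    pub words: Vec<u32>,
    /// Forward direction: word ID -> codebook index.
    pub index: HashMap<u32, u32>,
}

/// Build the tier from a stream of word IDs, keeping at most `budget` entries.
pub fn build_unigram_tier(word_ids: &[u32], budget: usize) -> UnigramTier {
    // Count how often each word ID occurs.
    let mut freq: HashMap<u32, u32> = HashMap::new();
    for &w in word_ids {
        *freq.entry(w).or_insert(0) += 1;
    }

    // Most frequent first; equal frequencies ordered by ascending word ID
    // (an assumed tie-break) so the result does not depend on hash order.
    let mut entries: Vec<(u32, u32)> = freq.into_iter().collect();
    entries.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    entries.truncate(budget);

    let words: Vec<u32> = entries.iter().map(|&(w, _)| w).collect();
    let index = words
        .iter()
        .enumerate()
        .map(|(i, &w)| (w, i as u32))
        .collect();
    UnigramTier { words, index }
}
```

With the input `[5, 3, 5, 9, 3, 5, 7]` and a budget of 3, word 5 (three occurrences) gets index 0, word 3 gets index 1, and the tie between 7 and 9 is resolved in favour of 7, leaving 9 out of the codebook.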

Phase 3 adds the entropy-coding layer that wraps both codebooks. Direct transcode of ac.c.

New module src/arith_coder.rs (~640 LOC):

Constants
  AC_PREC = 24              precision (bits)
  AC_FULL = 1 << 24         full range
  AC_HALF / AC_QTR          E2 / E3 renormalization thresholds
  AC_MAX_FREQ = 1 << 20     rescale trigger

Model256
- adaptive 256-symbol byte alphabet (port of qtc_model_t)
- freq[256], total; halve-on-cap rescaling (freq[i] = (f>>1) | 1)
- cdf() writes a 257-entry cumulative table for the coder

VModel (variable alphabet, Fenwick-tree accelerated)
- port of qtc_vmodel_t: O(log n) cum_lo and find
- Fenwick tree 1-indexed under the hood; 0-indexed public API
- rescale rebuilds the tree from halved frequencies

Encoder
- 24-bit precision range coder with pending-bits underflow handling
- encode(cum_lo, cum_hi, total) drives the (lo, hi) range
- state machine bit-exact with ac.c:
  * E1 (hi < HALF)               output 0
  * E2 (lo >= HALF)              output 1, subtract HALF
  * E3 (lo>=QTR && hi<3*QTR)     pending++, subtract QTR
- finish() flushes pending state and packs the bit buffer to bytes

Decoder
- symmetric to Encoder; reads MSB-first bits from the input byte stream
- decode_256(cdf, total): binary-search the 256-entry CDF
- decode_v(model): VModel.find() drives Fenwick-tree symbol search
- advance() applies the same E1/E2/E3 transitions to (lo, hi, val)

High-level helpers
- ac_enc_sym / ac_dec_sym (Model256 + update)
- ac_enc_v / ac_dec_v (VModel + update)

Tests added (10):
- model256_initial_state_is_uniform
- model256_cdf_sums_to_total
- vmodel_initial_state_is_uniform
- vmodel_cum_lo_is_prefix_sum
- vmodel_find_is_inverse_of_cum_lo
- round_trip_256_alphabet: all 256 bytes
- round_trip_repeated_byte_compresses: 10K of one byte → strong compression + round-trip
- round_trip_variable_alphabet: VModel symbols 0..50
- round_trip_pseudo_random_sequence: 5000-byte xorshift stream
- vmodel_round_trip_with_rescaling_pressure: forces AC_MAX_FREQ rescale

Total tests: 66 (was 56). +10 from arith_coder.

All round-trip tests pass: encode(input) → decode produces identity, demonstrating the coder is internally consistent (this is the load-bearing correctness property for phase 4's compress/decompress pipeline).

Verification:
- cargo test → 57 unit + 9 integration = 66 passed, 0 failed
- cargo clippy --all-targets -- -D warnings clean (added 1 doc-only fix in codebook.rs and 1 op-style fix here)
- cargo fmt clean

Zero-dep preserved.

Honest scope flag (will appear in README at phase 4): The Rust encoder/decoder round-trips with itself bit-exact. It is NOT guaranteed byte-identical to the C reference output. The C reference's output depends on multiple internal Model256/VModel initializations across contexts (144 per-level models, 12 per-index models, recency caches, two-tier unigram). Matching that exactly is a separate engineering task out of scope for "research and testing." Round-trip identity within the Rust pipeline is the property phase 4 will verify end-to-end.
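The Fenwick-tree part of VModel (1-indexed storage behind a 0-indexed API, O(log n) cum_lo and find) follows a standard pattern that can be shown on its own. The sketch below is not the crate's VModel: `Fenwick` and its method names are hypothetical, and rescaling is left out.

```rust
/// Hypothetical frequency table backed by a Fenwick (binary indexed) tree.
pub struct Fenwick {
    /// 1-indexed internally; `tree[0]` is unused.
    tree: Vec<u32>,
}

fn lowbit(i: usize) -> usize {
    i & i.wrapping_neg()
}

impl Fenwick {
    /// Build in O(n) from per-symbol frequencies.
    pub fn new(freqs: &[u32]) -> Self {
        let n = freqs.len();
        let mut tree = vec![0u32; n + 1];
        tree[1..].copy_from_slice(freqs);
        for i in 1..=n {
            let parent = i + lowbit(i);
            if parent <= n {
                tree[parent] += tree[i];
            }
        }
        Self { tree }
    }

    pub fn len(&self) -> usize {
        self.tree.len() - 1
    }

    /// Add `delta` to the frequency of 0-indexed symbol `sym`.
    pub fn add(&mut self, sym: usize, delta: u32) {
        let mut i = sym + 1;
        while i < self.tree.len() {
            self.tree[i] += delta;
            i += lowbit(i);
        }
    }

    /// Sum of the frequencies of all symbols strictly below `sym`.
    pub fn cum_lo(&self, sym: usize) -> u32 {
        let (mut i, mut sum) = (sym, 0);
        while i > 0 {
            sum += self.tree[i];
            i -= lowbit(i);
        }
        sum
    }

    pub fn total(&self) -> u32 {
        self.cum_lo(self.len())
    }

    /// Symbol `s` with cum_lo(s) <= target < cum_lo(s + 1).
    /// The caller must pass `target < total()`.
    pub fn find(&self, target: u32) -> usize {
        let n = self.len();
        let mut mask = 1;
        while mask * 2 <= n {
            mask *= 2;
        }
        let (mut pos, mut rem) = (0usize, target);
        while mask > 0 {
            let next = pos + mask;
            if next <= n && self.tree[next] <= rem {
                pos = next;
                rem -= self.tree[next];
            }
            mask >>= 1;
        }
        pos
    }
}
```

An encoder would pass `cum_lo(s)`, `cum_lo(s + 1)` and `total()` to the range coder, and a decoder would call `find` on the scaled code value; `find(cum_lo(s)) == s` whenever symbol `s` has a non-zero frequency, which is the inverse relationship the `vmodel_find_is_inverse_of_cum_lo` test name refers to.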

End-to-end pipeline wiring phases 1-3 into a working compress() → decompress() round-trip for BOTH codebook variants.

New module src/pipeline.rs (~460 LOC):

Public API
- Variant enum: Flat | CowRadix, selects which codebook backs the pipeline
- compress(text: &[u8], variant) -> Result<Vec<u8>, PipelineError>
- decompress(bytes: &[u8]) -> Result<Vec<u8>, PipelineError>
- PipelineError: OutOfVocabulary, BadMagic, Truncated, DecodeRange

Compressed stream format (v1, "QRS1" magic):
- magic [4] || orig_size [u64] || n_tokens [u32] || n_words [u32] || n_unique [u32]
- lowered byte stream (length-prefixed)
- per-token spans: (offset u32, len u32, case_flag u8)
- case-flag payload (AC over Model256, length-prefixed)
- word-ID payload (AC over VModel with codebook alphabet, length-prefixed; round-trip witness for the codebook variant)

Pipeline shape
1. tokenize(text) → TokenStream + lowered byte stream + case flags
2. Intern token byte slices → word_ids + unique pool
3. Build codebook via the Codebook trait (Flat OR CowRadix)
4. Verify every word is in the unigram tier (OutOfVocabulary fails)
5. Encode word_ids stream via VModel + Encoder
6. Encode case flags via Model256 + Encoder
7. Serialize header + spans + lowered + AC payloads

Deliberate simplifications (documented in module-level doc + README)
- SINGLE-TIER codebook (unigrams only). The Fibonacci tiling + substitution hierarchy + deep-position detection from phase 0 remain verified-against-paper-theorems via tests/paper_theorems.rs, but the bit-stream itself is single-tier. Multi-tier n-gram encoding is a phase 5+ extension.
- NO LZMA escape stream (OOV → error). Reference C compressor has a parallel LZMA stream for OOV words.
- NO multi-tile selection (the 36-tiling greedy engine isn't wired into the bit-stream).
- NOT byte-identical to the C reference output. Round-trip correctness within the Rust pipeline is the property tested; byte-compat with the upstream .qm56 is out of scope.

Tests added (10):
- round_trips_empty
- round_trips_simple_lowercase: "the quick brown fox..."
- round_trips_mixed_case: "Hello WORLD foo Bar..."
- round_trips_punctuation_and_newlines: "Hi, world!\nFoo bar..."
- round_trips_repeated_phrase: 2000-byte cyclic phrase
- round_trips_pseudo_random_text: 500 random English words
- round_trips_utf8_high_bit: "café naïve façade"
- variants_produce_same_decompressed_output: Flat and COW agree
- bad_magic_is_rejected
- truncated_stream_is_rejected

Every round-trip test runs against BOTH variants: the assert_round_trips helper iterates Variant::{Flat, CowRadix} and verifies compress → decompress is identity for both.

Bug caught during phase 4 (recorded for posterity): initial implementation conflated two distinct "lowered" byte streams, the full TokenStream.lowered vs a per-unique-word pool built during interning. Token spans index into the former; I was indexing them into the latter. Fixed by serializing TokenStream.lowered directly and treating the per-unique pool as a build-only intermediate.

Total tests: 76 (was 66). +10 from pipeline (8 round-trip/variant tests + 2 error-path tests).

Verification:
- cargo test → 67 unit + 9 integration = 76 passed, 0 failed
- cargo clippy --all-targets -- -D warnings clean (added 1 doc allow: doc_lazy_continuation)
- cargo fmt clean

Zero-dep preserved. No unsafe. Stable Rust.
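The fixed part of the v1 stream (magic, orig_size, n_tokens, n_words, n_unique) and the two framing errors can be sketched as a standalone reader and writer. This is an illustration, not the crate's pipeline.rs: the type and function names are hypothetical, and little-endian field encoding is an assumption, since the commit message gives field widths but not byte order.

```rust
pub const MAGIC: [u8; 4] = *b"QRS1";
/// 4 (magic) + 8 (orig_size) + 3 * 4 (counters).
pub const HEADER_LEN: usize = 24;

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    BadMagic,
    Truncated,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Header {
    pub orig_size: u64,
    pub n_tokens: u32,
    pub n_words: u32,
    pub n_unique: u32,
}

/// Serialize the header; little-endian is an assumed byte order.
pub fn write_header(h: &Header) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN);
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&h.orig_size.to_le_bytes());
    out.extend_from_slice(&h.n_tokens.to_le_bytes());
    out.extend_from_slice(&h.n_words.to_le_bytes());
    out.extend_from_slice(&h.n_unique.to_le_bytes());
    out
}

fn le_u32(bytes: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(bytes);
    u32::from_le_bytes(a)
}

/// Parse the header, rejecting a wrong magic or a short buffer.
pub fn read_header(bytes: &[u8]) -> Result<Header, HeaderError> {
    if bytes.len() < MAGIC.len() {
        return Err(HeaderError::Truncated);
    }
    if bytes[..4] != MAGIC {
        return Err(HeaderError::BadMagic);
    }
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::Truncated);
    }
    let mut size = [0u8; 8];
    size.copy_from_slice(&bytes[4..12]);
    Ok(Header {
        orig_size: u64::from_le_bytes(size),
        n_tokens: le_u32(&bytes[12..16]),
        n_words: le_u32(&bytes[16..20]),
        n_unique: le_u32(&bytes[20..24]),
    })
}
```

Checking the magic before the full length means a four-byte garbage prefix is reported as BadMagic rather than Truncated; which of the two the crate reports in that case is not stated in the source.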

Phase 5 (integration tests) + Phase 6 (CLI binary), bundled.

Phase 5: cross-variant integration tests
========================================

New test file tests/round_trip.rs (7 tests):
- variants_agree_on_long_natural_text: 600-char Fibonacci-theory paragraph round-trips under BOTH variants AND the decompressed outputs are identical
- round_trip_at_5kb_scale: cyclic phrase to 5 KB, both variants
- round_trip_single_word
- round_trip_only_whitespace
- round_trip_mixed_punctuation_lines (parens, hyphens, semicolons, quotes, tabs)
- round_trip_repeated_uppercase_word
- cross_variant_independence: compress with Flat, decompress; compress with CowRadix, decompress; both equal original. (Compressed bytes between variants MAY differ; decoded output MUST match.)

This is the architectural property the codebook trait contract guarantees and the workspace's substrate doctrine requires: the COW radix trie variant is a drop-in alternative to the flat storage variant at the compress/decompress boundary.

Phase 6: CLI binary
===================

New src/bin/qresearch.rs (~170 LOC):

  qresearch compress [-v flat|cow] <input> <output>
  qresearch decompress <input> <output>
  qresearch round-trip [-v flat|cow] <input>
  qresearch --help / -h

Standard library only. Returns ExitCode::SUCCESS / ExitCode::FAILURE with clean error messages on read/write/codec failures. The `round-trip` subcommand reports compression ratio AND verifies identity for quick validation on arbitrary text files.

Live test:

  $ echo "The Fibonacci substitution..." > /tmp/sample.txt
  $ qresearch round-trip /tmp/sample.txt
  round-trip OK: 95 bytes → 329 compressed (346.32%) → identical, variant=Flat
  $ qresearch round-trip -v cow /tmp/sample.txt
  round-trip OK: 95 bytes → 329 compressed (346.32%) → identical, variant=CowRadix

(The >100% ratio on 95-byte inputs is expected: v1 simplifications mean headers + per-token spans dominate at small sizes. The C reference's per-byte overhead amortizes over much larger inputs and uses multi-tier n-grams + LZMA escape + word-LZ77 to get ≤25% on enwik9. The Rust pipeline here demonstrates correctness, not benchmark-competitive compression.)

README rewrite
==============

New README.md (180 lines) documents:
- the 7-phase transcode map (which C file → which Rust module)
- test counts per phase (total: 83)
- what's NOT byte-compatible with the upstream qm56 format
- CLI usage examples
- both codebook variants compared in a table
- the compressed stream format (v1 QRS1 magic) field by field
- relationships to bgz17 / helix / jc::weyl in the workspace
- paper-theorem verification list (Thm 2, Cor 4, Thm 9, Thm 13/Cor 15, Sturmian minimality, PV property)

Final totals
============

Tests: 83 (was 76)
- 67 unit (no change)
- 9 paper-theorem integration
- 7 cross-variant integration (NEW)

Verification:
- cargo test → 83 passed, 0 failed
- cargo clippy --all-targets -- -D warnings clean
- cargo fmt clean
- cargo build --bin qresearch builds, CLI exercised

Zero dependencies. No unsafe. Stable Rust.

Full crate inventory
====================

  Modules        LOC   Role
  -------        ---   ----
  types           97   Tile, HLevel, ParentMap, Hierarchy, DeepPositions
  constants      192   PHI, INV_PHI, MAX_HIER, HIER_WORD_LENS, 36 tilings
  tiling         388   cut-and-project + 5 substitution-rule families
  hierarchy      308   build_hierarchy, hier_context, detect_deep_positions
  md5            196   RFC 1321 (~85 LOC C transcoded)
  tok            377   tokenize, word_split, apply_case, TokenStream
  codebook       744   Codebook trait + FlatCodebook + CowRadixCodebook + CowArt (ART with Node4/Node16/Node256, path-copy)
  arith_coder    640   Model256, VModel (Fenwick), Encoder, Decoder
  pipeline       460   compress, decompress, Variant, PipelineError
  bin/qresearch  170   CLI (compress/decompress/round-trip)
  tests/...      310   paper_theorems + round_trip integration
  lib + README   280

Total: ~4,160 LOC Rust (was ~1,300 after phase 0).

coderabbitai · 2026-06-04T15:11:48Z


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c7f24700-8977-4bd3-8fed-4e561229c684

📥 Commits

Reviewing files that changed from the base of the PR and between 7fed9b9 and bd628e3.

📒 Files selected for processing (5)

Cargo.toml
crates/quasicryth-research/Cargo.toml
crates/quasicryth-research/src/codebook.rs
crates/quasicryth-research/src/tiling.rs
crates/quasicryth-research/tests/paper_theorems.rs

📝 Walkthrough


This pull request adds a complete standalone Rust crate (crates/quasicryth-research) implementing an algebraic quasi-crystalline tiling system with adaptive arithmetic-coded text compression. The crate transcodes a research prototype including tiling generators, substitution hierarchies, tokenization with case recovery, and round-trip compression verification.

Changes

Quasicryth-Research Algebraic Transcode

Layer / File(s)	Summary
Workspace Integration and Core Data Structures `Cargo.toml`, `crates/quasicryth-research/.gitignore`, `crates/quasicryth-research/Cargo.toml`, `crates/quasicryth-research/README.md`, `src/types.rs`	Lists the new crate under the root `Cargo.toml` `exclude`, so it stays outside the workspace compile graph, and establishes fundamental types: `Tile` records tiling positions and word spans; `Hierarchy` stores multi-level deflation structures; `TilingDesc` parameterizes cut-and-project generators; `DeepPositions` holds n-gram entry legality.
Mathematical Constants and Tiling Descriptors `src/constants.rs`	Defines φ (golden ratio) and inverse, Fibonacci word-length array, and 36-element canonical tiling descriptor array generated from φ iterates and greedy-discovered irrational alpha values.
Tiling Generation (Fibonacci, Periodic, and Substitution) `src/tiling.rs`	Implements cut-and-project generators (Fibonacci, arbitrary alpha), substitution families (Thue-Morse, Rudin-Shapiro, period-doubling, period-5, sanddrift), and invariant enforcement (no adjacent-S via merging or direct construction).
Substitution Hierarchy and Deep-Position Detection `src/hierarchy.rs`	Builds multi-level deflation hierarchies with parent-pointer maps, computes bounded 3-bit hierarchy contexts, and detects deep n-gram entry points by validating ancestor-chain leftmost-child constraints and word-span coverage.
Tokenization, Case Separation, and Word Splitting `src/tok.rs`	Tokenizes text into lowered-byte tokens with per-token case flags (lower/first-cap/ALL-CAPS), reconstructs original casing, and provides word-offset/length extraction for compression input.
Adaptive Arithmetic Coder (24-bit Range Coder) `src/arith_coder.rs`	Implements 24-bit range-coder pair with two adaptive models: fixed 256-symbol (Model256) and variable-alphabet (VModel with Fenwick tree); both support periodic rescaling and round-trip encode/decode with symbol helpers.
MD5 Hashing `src/md5.rs`	Provides RFC 1321 MD5 hasher with incremental buffering, 64-round block transform, and one-shot convenience function.
Multi-Tier Codebook Construction (Flat and COW Radix Tree) `src/codebook.rs`	Defines `Codebook` trait with two implementations: `FlatCodebook` (hash maps + vectors) and `CowRadixCodebook` (copy-on-write adaptive radix tries), both mapping n-gram indices to frequencies within per-tier budgets.
End-to-End Compression and Decompression Pipeline `src/pipeline.rs`	`compress` tokenizes, interns words, builds codebooks, and serializes header+byte-pool+spans+AC-encoded case-flags/word-IDs; `decompress` reconstructs case and validates word identity via round-trip decoding.
CLI Binary and Public API `src/lib.rs`, `src/bin/qresearch.rs`	Exports public API surface (tiling, hierarchy, pipeline, tokenization, constants); CLI parses compress/decompress/round-trip subcommands, reads/writes files, computes ratios, and reports mismatch diagnostics.
Paper Theorems and Round-Trip Tests `tests/paper_theorems.rs`, `tests/round_trip.rs`	Integration tests validate golden vs. periodic hierarchy collapse, L:S ratio φ-convergence, aperiodic advantage growth, Sturmian factor complexity, no-adjacent-S invariant, Fibonacci identities, and compression round-trips across input patterns and codebook variants.

Sequence Diagram

sequenceDiagram
    participant User
    participant qresearch as qresearch CLI
    participant pipeline as Pipeline
    participant tokenizer as Tokenizer
    participant codebook as Codebook
    participant arith as Arithmetic Coder
    User->>qresearch: compress -v flat in.txt out.qrs1
    qresearch->>pipeline: compress(data, Flat)
    pipeline->>tokenizer: tokenize(text)
    tokenizer->>tokenizer: lowercase + case_flags
    pipeline->>codebook: intern words
    pipeline->>codebook: build FlatCodebook
    pipeline->>arith: encode case_flags
    pipeline->>arith: encode word_ids
    arith-->>pipeline: compressed bytes
    pipeline-->>qresearch: output stream
    qresearch-->>User: compressed file
    User->>qresearch: decompress out.qrs1 in.txt.recovered
    qresearch->>pipeline: decompress(data)
    pipeline->>arith: decode case_flags
    pipeline->>arith: decode word_ids
    arith-->>pipeline: decoded tokens
    pipeline->>tokenizer: apply_case
    tokenizer-->>pipeline: original text
    pipeline-->>qresearch: output bytes
    qresearch-->>User: original file

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title directly and accurately summarizes the primary addition: a new Quasicryth research crate with two codebook implementations (flat and Copy-on-Write radix trie), matching the ~4,160 LOC transcode described in the objectives.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7fed9b9f19


chatgpt-codex-connector · 2026-06-04T15:13:28Z

+                Self::Node256 { children, .. } => {
+                    if key < 256 {
+                        children[key as usize] = Some(child);
+                    }
+                    return;


Store COW trie keys above 255

The COW radix trie is keyed by u32 word IDs, but once a node has grown to Node256 this branch silently ignores any child key >= 256. A corpus with 257 distinct alphabetic tokens triggers this in the unigram trie: Variant::CowRadix drops word ID 256 during codebook construction, so compress reports OutOfVocabulary even though the flat variant round-trips the same input and the codebook was sized to include every unique word.
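
A minimal sketch of the mismatch and of one byte-keyed way around it, assuming nothing about the crate's internals: a 256-slot node can only be indexed by a single byte, so a u32 key has to be split into its four big-endian bytes, one per trie level. `ByteNode` and its methods are hypothetical names for illustration, not the crate's `CowArt` API, and this is not necessarily how the PR resolves the issue.

```rust
/// Hypothetical byte-keyed trie node: one slot per possible byte value.
struct ByteNode {
    children: [Option<Box<ByteNode>>; 256],
    leaf: Option<u32>,
}

impl ByteNode {
    fn new() -> Self {
        ByteNode {
            children: std::array::from_fn(|_| None),
            leaf: None,
        }
    }

    /// Walks four levels, one per big-endian byte of `key`,
    /// so every index is guaranteed to be in 0..=255.
    fn insert(&mut self, key: u32, value: u32) {
        let mut node = self;
        for byte in key.to_be_bytes() {
            let child = node.children[byte as usize]
                .get_or_insert_with(|| Box::new(ByteNode::new()));
            node = &mut **child;
        }
        node.leaf = Some(value);
    }

    /// Returns the stored value, or `None` if any level of the path is missing.
    fn get(&self, key: u32) -> Option<u32> {
        let mut node = self;
        for byte in key.to_be_bytes() {
            node = node.children[byte as usize].as_deref()?;
        }
        node.leaf
    }
}
```

With this layout, word ID 256 is stored under the byte path `[0, 0, 1, 0]` instead of being discarded. The cost is four node hops per key and a dense 256-slot array per node, which is wasteful for sparse vocabularies.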

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (4)

crates/quasicryth-research/tests/round_trip.rs (1)

45-55: ⚡ Quick win

Add an explicit empty-input round-trip case.

Current cases are good, but a zero-byte payload is a common framing edge and worth pinning with a dedicated test.

Suggested test

+#[test]
+fn round_trip_empty_input() {
+    round_trip(b"", Variant::Flat);
+    round_trip(b"", Variant::CowRadix);
+}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/quasicryth-research/tests/round_trip.rs` around lines 45 - 55, Add a
dedicated zero-byte payload test that calls the existing test helper round_trip
with an empty slice for both variants to pin the framing edge-case; implement a
new #[test] fn (e.g., round_trip_empty_input) that invokes round_trip(b"",
Variant::Flat) and round_trip(b"", Variant::CowRadix) so both code paths are
exercised.

crates/quasicryth-research/src/md5.rs (1)

158-170: 💤 Low value

Optional: bulk-copy optimization for update().

The byte-by-byte loop is correct but suboptimal for larger inputs. For a research crate this is acceptable, but if you later need better throughput, consider copying full chunks via copy_from_slice when the buffer is empty and data contains complete blocks.

♻️ Sketch of bulk-copy approach

pub fn update(&mut self, mut data: &[u8]) {
    let mut idx = (self.count & 63) as usize;
    self.count = self.count.wrapping_add(data.len() as u64);

    // Fill partial buffer first
    if idx != 0 {
        let fill = (64 - idx).min(data.len());
        self.buffer[idx..idx + fill].copy_from_slice(&data[..fill]);
        idx += fill;
        data = &data[fill..];
        if idx == 64 {
            transform(&mut self.state, &self.buffer);
            idx = 0;
        }
    }
    // Process full blocks directly
    while data.len() >= 64 {
        let block: &[u8; 64] = data[..64].try_into().unwrap();
        transform(&mut self.state, block);
        data = &data[64..];
    }
    // Buffer remainder
    self.buffer[..data.len()].copy_from_slice(data);
}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/quasicryth-research/src/md5.rs` around lines 158 - 170, The update()
method currently copies input one byte at a time which is correct but slow;
refactor update(&mut self, data: &[u8]) to handle bulk copies: compute idx =
(self.count & 63) as usize and increment count, first fill a partial buffer if
idx != 0 using slice copy_from_slice, call transform(&mut self.state,
&self.buffer) if that fills to 64, then process any complete 64-byte blocks
directly by taking 64-byte slices (convert to &[u8;64] for transform) in a loop,
and finally copy any remaining tail into self.buffer; keep the same semantics
for self.count, self.buffer and transform() calls and ensure bounds/slice
lengths are handled with try_into()/unwrap or appropriate checks.

crates/quasicryth-research/src/pipeline.rs (1)

205-210: 💤 Low value

Slice indexing may panic on malformed compressed input.

If a malformed/corrupted compressed stream contains offset + len values that exceed lowered_pool.len(), line 207 will panic. For a research crate this is acceptable, but consider adding bounds validation for robustness:
if (offset + len) as usize > lowered_pool.len() {
    return Err(PipelineError::Truncated);
}
This is a minor hardening suggestion since the crate is documented as research-grade.
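
A hedged sketch of the same check written so that the addition itself cannot overflow, assuming `offset` and `len` are `u32` values read from the stream (the `as usize` cast above suggests this, but it is not confirmed here). `SpanError` and `span_bytes` are hypothetical stand-ins, not the crate's `PipelineError` or its actual helper.

```rust
/// Hypothetical error type standing in for the crate's `PipelineError`.
#[derive(Debug, PartialEq)]
enum SpanError {
    Truncated,
}

/// Returns `pool[offset..offset + len]`, or `Truncated` if the span
/// does not fit inside `pool`. Never panics on hostile input.
fn span_bytes(pool: &[u8], offset: u32, len: u32) -> Result<&[u8], SpanError> {
    let start = offset as usize;
    // checked_add guards against wrap-around on targets where usize is 32 bits.
    let end = start
        .checked_add(len as usize)
        .ok_or(SpanError::Truncated)?;
    // slice::get returns None instead of panicking when the range is out of bounds.
    pool.get(start..end).ok_or(SpanError::Truncated)
}
```

Using `get` with a range folds the bounds test and the slicing into one step, so there is no window in which the check and the index expression can disagree.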
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/quasicryth-research/src/pipeline.rs` around lines 205 - 210, The loop
over spans reads slices from lowered_pool using (offset + len) and can panic if
the compressed input is malformed; in the loop that iterates spans (the block
referencing lowered_pool, apply_case, and out.extend_from_slice), validate that
(offset + len) as usize <= lowered_pool.len() before slicing and return an
Err(PipelineError::Truncated) (or appropriate error) when the check fails; this
prevents out-of-bounds access while keeping the rest of the logic (apply_case
and extending out) unchanged.

crates/quasicryth-research/src/bin/qresearch.rs (1)

80-80: 💤 Low value

Division by zero for empty input files.

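If the input file is empty, data.len() is 0 and the f64 division yields infinity (or NaN if compressed is also empty) rather than a meaningful ratio. It does not panic, but the printed value is misleading. Consider guarding: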

let ratio = if data.is_empty() {
    0.0
} else {
    100.0 * compressed.len() as f64 / data.len() as f64
};

Same applies to line 145 in run_round_trip.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/quasicryth-research/src/bin/qresearch.rs` at line 80, The ratio
calculation uses data.len() as divisor and will divide by zero for empty inputs;
update the computation (the line that sets let ratio = 100.0 * compressed.len()
as f64 / data.len() as f64) to guard for empty data (e.g., set ratio = 0.0 when
data.is_empty()) and apply the same guarded logic inside the run_round_trip
function where a similar ratio is computed; change only the ratio expression to
a conditional based on data.is_empty() using the existing variables compressed
and data.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Cargo.toml`:
- Around line 50-56: Update the Cargo.toml crate description to accurately
reflect that this research crate includes more than just the algebraic core:
mention the presence of arithmetic coding (arith_coder.rs —
Model256/VModel/Encoder/Decoder), tokenization (tok.rs), codebook construction
(codebook.rs: FlatCodebook/CowRadixCodebook), MD5 hashing (md5.rs), the
compression pipeline (pipeline.rs: compress/decompress), and the qresearch CLI
binary; replace the incorrect "only the algebraic core" phrasing with a concise
note that the crate implements a simplified full pipeline relative to the
upstream reference (single-tier unigram encoding, no multi-level n-grams, no
LZMA escape) rather than claiming it omits these components.

In `@crates/quasicryth-research/Cargo.toml`:
- Around line 9-12: The top-level comment in Cargo.toml incorrectly states the
crate is "Algebraic core only" and omits features actually implemented; update
that comment to reflect the real scope by listing included components such as
arithmetic coding (arith_coder.rs), tokenization (tok.rs), and codebook
construction (codebook.rs) and remove the claim that those live only in the
upstream C reference; keep the note about default zero-deps if still true and
ensure the wording matches the README/PR summary about what this crate provides.

In `@crates/quasicryth-research/src/codebook.rs`:
- Around line 504-516: The Node256 branch currently drops keys >= 256; update
the ART node representation so Node256 no longer assumes keys fit in a 0..255
slot: replace the fixed array children in the Node256 variant with a
HashMap<u32, Arc<ArtNode>> (or another dynamic map) and then update all helpers
that touch it — specifically change put_child, child (lookup), replace_child,
and grow_to_256 to insert/lookup/replace entries in that map and ensure
grow_to_256 moves all existing Node16 children into the new HashMap (preserving
keys >= 256 instead of dropping them); keep Node16, grow_to_256, Node256,
put_child, child, and replace_child identifiers to locate the changes.

In `@crates/quasicryth-research/src/tiling.rs`:
- Around line 366-370: The test sanddrift_generates_nonempty currently only
checks non-empty output; add the missing invariant assertion by calling
verify_no_adjacent_s on the tiles produced by sanddrift_tiling(100) (i.e., after
the existing assert!(!tiles.is_empty()) add verify_no_adjacent_s(&tiles)). This
uses the existing helper verify_no_adjacent_s to ensure no adjacent 'S' tiles
and keeps the test consistent with other generator tests.
- Around line 239-281: sanddrift_tiling currently emits raw symbols with SS
pairs (from L→LSSL) and tiles them directly, violating the module invariant
checked by verify_no_adjacent_s; fix by routing the produced symbol sequence
through the existing symbols_to_tiles merger (or otherwise performing the SS→L
merge) instead of directly constructing Tile entries—specifically, in
sanddrift_tiling replace the direct tiling loop that builds Tile { wpos, nwords,
is_l } from seq with a call to symbols_to_tiles(seq[..need]) (or an equivalent
merge step) so adjacent S symbols are collapsed to L as other generators expect,
or else update the module docstring and add sanddrift_tiling to the explicit
exception list if you intend to keep adjacent S behavior.

In `@crates/quasicryth-research/tests/paper_theorems.rs`:
- Around line 171-176: The current test assertion for Sturmian bound only checks
factors.len() <= n + 1 which can hide regressions; change the check in the test
to assert exact equality (factors.len() == n + 1) for the given long prefix and
small n, updating the assertion message to reflect expected equality and include
n and actual factors.len() for debugging; locate the assertion using the symbols
factors and n in this test and replace the <= check with an equality check (and
adjust the formatted message accordingly).
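
The exact-equality form of the Sturmian check in the last item can be illustrated outside the crate. The sketch below assumes the stream is the Fibonacci word over {L, S} generated by L → LS, S → L. That is the classic Sturmian example, but whether the test in `paper_theorems.rs` uses this exact stream is an assumption, and `fibonacci_word` and `factor_count` are hypothetical helper names.

```rust
use std::collections::HashSet;

/// First `len` symbols of the Fibonacci word, built by iterating
/// the substitution L -> LS, S -> L starting from "L".
fn fibonacci_word(len: usize) -> Vec<u8> {
    let mut word = vec![b'L'];
    while word.len() < len {
        word = word
            .iter()
            .flat_map(|&c| match c {
                b'L' => &b"LS"[..],
                _ => &b"L"[..],
            })
            .copied()
            .collect();
    }
    word.truncate(len);
    word
}

/// Number of distinct length-`n` factors (contiguous substrings) of `word`.
/// `n` must be at least 1 because `windows(0)` panics.
fn factor_count(word: &[u8], n: usize) -> usize {
    word.windows(n).collect::<HashSet<_>>().len()
}
```

For a sufficiently long prefix and small n, `factor_count` should come out to exactly n + 1. A periodic stream would eventually fall below that, which is why `<=` can hide a regression that `==` catches.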

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 7a470a2b-f70f-4275-b261-b65b5a435c8a

📥 Commits

Reviewing files that changed from the base of the PR and between ef35ff1 and 7fed9b9.

📒 Files selected for processing (17)

Cargo.toml
crates/quasicryth-research/.gitignore
crates/quasicryth-research/Cargo.toml
crates/quasicryth-research/README.md
crates/quasicryth-research/src/arith_coder.rs
crates/quasicryth-research/src/bin/qresearch.rs
crates/quasicryth-research/src/codebook.rs
crates/quasicryth-research/src/constants.rs
crates/quasicryth-research/src/hierarchy.rs
crates/quasicryth-research/src/lib.rs
crates/quasicryth-research/src/md5.rs
crates/quasicryth-research/src/pipeline.rs
crates/quasicryth-research/src/tiling.rs
crates/quasicryth-research/src/tok.rs
crates/quasicryth-research/src/types.rs
crates/quasicryth-research/tests/paper_theorems.rs
crates/quasicryth-research/tests/round_trip.rs

coderabbitai · 2026-06-04T15:22:11Z

+    #[test]
+    fn sanddrift_generates_nonempty() {
+        let tiles = sanddrift_tiling(100);
+        assert!(!tiles.is_empty());
+    }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test does not verify the no-adjacent-S invariant.

Unlike tests for other generators (thue_morse_alternates_at_low_indices, rudin_shapiro_generates_nonempty, period_doubling_generates_nonempty), this test omits the verify_no_adjacent_s assertion. Once the bug in sanddrift_tiling is fixed, add the invariant check here.

💚 Proposed fix

 #[test]
 fn sanddrift_generates_nonempty() {
     let tiles = sanddrift_tiling(100);
     assert!(!tiles.is_empty());
+    assert!(verify_no_adjacent_s(&tiles));
 }


Addresses PR #461 review feedback.

LOAD-BEARING BUG (codex P2 / coderabbit Critical): CowArt silently dropped keys ≥ 256
==================================================

The original three-variant ART (Node4 / Node16 / Node256) was byte-keyed at the leaf level: Node256 only handled values 0..255. With u32 word-IDs, any corpus of 257+ unique words would silently lose entries from the unigram trie. Result:

- Variant::Flat round-tripped correctly (HashMap-based)
- Variant::CowRadix produced OutOfVocabulary on word_id ≥ 256 even though the codebook was sized to include every unique word

Tests masked the bug because they used 5-word vocabularies.

Fix: replace the three-variant ArtNode enum with a single sparse-children node:

    struct ArtNode {
        children: BTreeMap<u32, Arc<ArtNode>>,
        leaf: Option<u32>,
    }

- Loses the ART byte-keyed Node4/Node16/Node256 branch-free optimization. The optimization assumed byte keys; u32 keys don't fit it without per-byte decomposition (which would be a much bigger refactor).
- Gains correctness for arbitrary u32 keys including word IDs ≥ 256 (which is most real text).
- Preserves the COW property: every insert returns a new root via path-copy, prior roots stay valid. This is the architectural point of the variant, and it's what the workspace's append-only doctrine needs.
- BTreeMap (not HashMap) for deterministic iteration order, useful for any future serialization or cross-impl comparison.

Two regression tests added so this bug can't recur silently:

- cow_art_handles_arbitrary_u32_keys
  Inserts 302 keys spanning 0..300 + 1_000_000 + u32::MAX; verifies every one round-trips. The original implementation would have dropped 1_000_000 and u32::MAX silently.
- cow_radix_codebook_handles_large_vocabulary
  Builds a 300-unique-word codebook via CowRadixCodebook; asserts every word ID (including 256..299) is findable via unigram_index(). This is the exact codex P2 scenario.

Total tests: 84 (was 83). +2 from the regression tests, +1 from a renamed-and-tightened existing test.

SECONDARY FINDINGS
==================

coderabbit Critical, sanddrift_tiling docstring:
The module docstring claimed all generators satisfy the no-adjacent-S invariant, but sanddrift's substitution L→LSSL produces SS pairs by design (LL forbidden, not SS). The upstream gen_sanddrift_tiles in fib.c also bypasses the SS→L merge for the same reason: preserving the substitution structure. Fix: update module docstring to name sanddrift as the documented exception; rename + strengthen the sanddrift test to assert the ACTUAL invariant (LL forbidden), not the wrong one (no-adjacent-S). Behaviour unchanged; matches the C reference.

coderabbit Minor, Cargo.toml comments misrepresent crate scope:
Both workspace Cargo.toml and crate Cargo.toml had stale "algebraic core only" comments from phase 0. Updated to reflect the full pipeline shipped in phases 1-6 (arithmetic coder, tokenization, codebook variants, compress/decompress).

coderabbit Minor, Sturmian assertion too loose:
tests/paper_theorems.rs::sturmian_factor_complexity_is_n_plus_1 asserted `factors.len() <= n + 1`, which would pass for degenerate (sub-Sturmian, periodic) streams. Sturmian minimality (Paper §4.10, Thm 7 corollary) requires EXACTLY n+1 distinct length-n factors. Strengthened to assert_eq! with a clearer error message. This catches drift toward either degenerate or super-Sturmian streams.

Verification:

    cargo test --manifest-path crates/quasicryth-research/Cargo.toml
      → 68 unit + 9 paper-theorem + 7 cross-variant = 84 passed
    cargo clippy --all-targets -- -D warnings: clean
    cargo fmt: clean

Zero deps preserved. No unsafe.
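
To make the path-copy behaviour described in this commit message concrete, here is a minimal sketch built on the `ArtNode` shape quoted above. The `insert` and `get` free functions, and the choice to treat a key as a slice of u32 components (a one-element slice for a unigram word ID), are assumptions for illustration and may differ from the crate's actual `CowArt` methods.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

#[derive(Clone, Default)]
struct ArtNode {
    children: BTreeMap<u32, Arc<ArtNode>>,
    leaf: Option<u32>,
}

/// Returns a new root that maps `path` to `value`.
/// Only nodes along `path` are copied; `root` itself is never mutated.
fn insert(root: &Arc<ArtNode>, path: &[u32], value: u32) -> Arc<ArtNode> {
    // Shallow copy: the BTreeMap is cloned, but child subtrees stay shared via Arc.
    let mut node = (**root).clone();
    match path.split_first() {
        None => node.leaf = Some(value),
        Some((&key, rest)) => {
            // Missing children start from an empty default node.
            let child = node.children.get(&key).cloned().unwrap_or_default();
            node.children.insert(key, insert(&child, rest, value));
        }
    }
    Arc::new(node)
}

/// Looks up `path` under `root`; `None` if any component is absent.
fn get(root: &ArtNode, path: &[u32]) -> Option<u32> {
    let mut node = root;
    for key in path {
        node = &**node.children.get(key)?;
    }
    node.leaf
}
```

Each insert clones one map per level of the path, so the copy cost grows with the fan-out of the touched nodes. That is the trade-off this simple layout accepts in exchange for handling any u32 key while keeping older roots readable.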

claude added 6 commits June 4, 2026 11:33

chatgpt-codex-connector Bot reviewed Jun 4, 2026

coderabbitai Bot reviewed Jun 4, 2026

AdaWorldAPI merged commit 42d502e into main Jun 4, 2026
6 checks passed

AdaWorldAPI merged 7 commits into main from claude/splat3d-cpu-simd-renderer-MAOO0
