Commit 42d502e

Merge pull request #461 from AdaWorldAPI/claude/splat3d-cpu-simd-renderer-MAOO0
feat(quasicryth-research): direct C→Rust transcode + COW radix trie variant
2 parents 4132e77 + bd628e3 commit 42d502e

17 files changed

Lines changed: 4504 additions & 0 deletions

Cargo.toml

Lines changed: 12 additions & 0 deletions
@@ -47,6 +47,18 @@ exclude = [
 # Kept out of the workspace so the nondeterministic autoencoder never
 # enters the deterministic lance-graph compile path (determinism boundary).
 "crates/lance-graph-arm-discovery",
+# Quasicryth research transcode (Tacconelli 2026, arxiv 2603.14999):
+# standalone zero-dep research crate. Verifies the workspace's φ-substrate
+# decisions (bgz17 17φ/11, helix golden-spiral, jc::weyl) against the
+# reference algebra. Covers tilings + hierarchy + deep-position detection
+# PLUS arithmetic coding + tokenization + codebook construction
+# (FlatCodebook + CowRadixCodebook variants) + an end-to-end
+# compress/decompress pipeline that round-trips under both variants.
+# NOT byte-compatible with the upstream .qm56 output: it simplifies the
+# v5.6 multi-tier n-gram + LZMA-escape + word-LZ77 + per-context-model
+# machinery to a single-tier unigram pipeline (see crate README).
+# Verified via `cargo test --manifest-path crates/quasicryth-research/Cargo.toml`.
+"crates/quasicryth-research",
 ]
 resolver = "2"

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
/target
Cargo.lock
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
[package]
name = "quasicryth-research"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
publish = false
description = "Direct Rust transcode of Quasicryth (Tacconelli 2026, arxiv 2603.14999): Fibonacci quasicrystal tilings + substitution hierarchy + deep n-gram position detection + arithmetic coding + tokenization + codebook construction (FlatCodebook + CowRadixCodebook variants) + end-to-end compress/decompress pipeline. Research/testing crate, NOT byte-compatible with the upstream .qm56 production format."

# Standalone codec constitution (matches helix / bgz17 / deepnsm): the default
# build is ZERO dependencies. Covers the algebraic core (fib.c) PLUS the
# compression layers (ac.c arithmetic coder, tok.c tokenization, cb.c codebook
# construction, md5.c) PLUS an end-to-end pipeline that round-trips under both
# the original FlatCodebook AND the COW radix trie variant.
#
# Pipeline simplifications vs. the upstream v5.6 compressor are deliberate
# and documented in the crate README: single-tier unigram encoding (no
# multi-level n-grams), no LZMA escape stream (OOV → error), no word-level
# LZ77, no per-level context models. The Rust pipeline round-trips with
# itself; it is NOT byte-identical to the upstream .qm56 format.
#
# Upstream: https://github.com/robtacconelli/quasicryth (v5.6.0).
# Paper proves five theorems (non-collapse, PV-property, Sturmian minimality,
# Golden Compensation, bounded overhead); all five are verified on synthetic
# data in tests/paper_theorems.rs.
[dependencies]

[dev-dependencies]

# Empty [workspace] so cargo treats this crate as standalone when invoked via
# --manifest-path. Listed under the root Cargo.toml `exclude` so workspace-wide
# commands never pull it into the deterministic main compile graph.
[workspace]
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# quasicryth-research

Direct Rust transcode of **Quasicryth** (Tacconelli 2026,
[arxiv 2603.14999](https://arxiv.org/abs/2603.14999), upstream
[github.com/robtacconelli/quasicryth](https://github.com/robtacconelli/quasicryth)
v5.6.0).

**Purpose:** research and testing. Two goals:

1. Validate the workspace's φ-substrate decisions (bgz17's `17φ/11`, helix's
   golden-spiral hemisphere) against the reference algebra, without
   depending on the upstream C build.
2. Demonstrate the **codebook architecture in two variants** behind one
   trait: the original flat storage shape from the C reference, and a
   COW Adaptive Radix Tree variant that fits the workspace's append-only
   substrate doctrine.

## What's transcoded

| Phase | Upstream C | Rust module | Status | Tests |
|---|---|---|---|---|
| 0 | `fib.c` + types from `qtc.h` | `tiling`, `hierarchy`, `constants`, `types` | shipped | 19 unit + 9 paper-theorem integration |
| 1 | `md5.c` + `tok.c` (partial) | `md5`, `tok` | shipped | 8 RFC-1321 vectors + 12 tokenizer round-trips |
| 2 | `cb.c` (algorithmic shape) | `codebook` (FlatCodebook + CowRadixCodebook) | shipped | 8 codebook tests, cross-variant validation |
| 3 | `ac.c` | `arith_coder` (Model256, VModel, Encoder, Decoder) | shipped | 9 round-trip tests at multiple scales |
| 4 | `compress.c` + `decompress.c` (simplified) | `pipeline` (compress/decompress) | shipped | 11 round-trip tests covering both variants |
| 5 | integration | `tests/round_trip.rs` + `tests/paper_theorems.rs` | shipped | 7 cross-variant integration tests |
| 6 | `main.c` | `bin/qresearch.rs` | shipped | CLI: compress / decompress / round-trip |

**Total tests: 83 passing.** Zero dependencies. No `unsafe`. Stable Rust.

## What this is NOT

The Rust pipeline is **NOT byte-compatible** with the upstream `.qm56`
format. The full v5.6 production compressor ships:

- multi-level adaptive arithmetic coding with 144 specialised level
  context models + 12 per-index models + recency caches + two-tier
  unigram model + word-level LZ77,
- 36-tiling greedy selection per block,
- LZMA escape stream for OOV words,
- frequency-counter pruning for memory bounds on large inputs,
- multi-level n-gram codebooks (3-gram through 144-gram).

The Rust pipeline here is **simplified to a single-tier (unigram)
encoding** so the codebook trait abstraction is the clean variation
point and the COW radix trie variant is exercised end-to-end. Multi-tier
n-gram encoding is left as a future extension beyond the phases listed
in the table above. The Fibonacci tiling + substitution hierarchy +
deep-position detection are verified to satisfy the paper's five core
theorems (see `tests/paper_theorems.rs`), but the bit-stream itself only
encodes word-ID symbols at the unigram tier.

The Rust pipeline **round-trips with itself**: `decompress(compress(x))
== x` for every test input under both `Variant::Flat` and
`Variant::CowRadix`. This is the property the integration tests verify.

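
To make the single-tier idea concrete, here is a minimal sketch of a unigram codebook in which out-of-vocabulary words are an error rather than an escape. The `UnigramCodebook` type and its methods are hypothetical names invented for illustration; they are not the crate's `FlatCodebook` or `Codebook` trait, and the arithmetic-coding stage is left out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical single-tier codebook: each distinct word gets one ID,
/// assigned in order of descending frequency.
struct UnigramCodebook {
    words: Vec<String>,          // ID -> word
    index: HashMap<String, u32>, // word -> ID
}

impl UnigramCodebook {
    /// Rank words by frequency; ties are broken lexicographically so the
    /// resulting IDs are deterministic.
    fn build(tokens: &[&str]) -> Self {
        let mut freq: HashMap<&str, usize> = HashMap::new();
        for &t in tokens {
            *freq.entry(t).or_insert(0) += 1;
        }
        let mut ranked: Vec<(&str, usize)> = freq.into_iter().collect();
        ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
        let words: Vec<String> = ranked.iter().map(|(w, _)| w.to_string()).collect();
        let index: HashMap<String, u32> = words
            .iter()
            .enumerate()
            .map(|(i, w)| (w.clone(), i as u32))
            .collect();
        Self { words, index }
    }

    /// Map words to IDs; an unknown word is an error (no escape stream).
    fn encode(&self, tokens: &[&str]) -> Result<Vec<u32>, String> {
        tokens
            .iter()
            .map(|t| {
                self.index
                    .get(*t)
                    .copied()
                    .ok_or_else(|| format!("out-of-vocabulary word: {t}"))
            })
            .collect()
    }

    /// Map IDs back to words; returns `None` if any ID is out of range.
    fn decode(&self, ids: &[u32]) -> Option<Vec<&str>> {
        ids.iter()
            .map(|&i| self.words.get(i as usize).map(String::as_str))
            .collect()
    }
}
```

Under these assumptions, `decode(encode(x)) == x` holds for any token sequence drawn from the vocabulary the codebook was built on, which is the word-level analogue of the round-trip property stated above.
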
## Verification

```bash
# All 83 tests:
cargo test --manifest-path crates/quasicryth-research/Cargo.toml

# Paper-theorem suite only (the original 9 algebraic claims):
cargo test --manifest-path crates/quasicryth-research/Cargo.toml \
  --test paper_theorems

# Cross-variant integration suite (Phase 5):
cargo test --manifest-path crates/quasicryth-research/Cargo.toml \
  --test round_trip
```

Lint discipline:

```bash
cargo clippy --manifest-path crates/quasicryth-research/Cargo.toml \
  --all-targets -- -D warnings
cargo fmt --manifest-path crates/quasicryth-research/Cargo.toml --check
```

Both run clean.

## CLI

```bash
cargo build --release --manifest-path crates/quasicryth-research/Cargo.toml \
  --bin qresearch
./crates/quasicryth-research/target/release/qresearch round-trip /path/to/file.txt
./crates/quasicryth-research/target/release/qresearch round-trip -v cow /path/to/file.txt
./crates/quasicryth-research/target/release/qresearch compress -v flat in.txt out.qrs1
./crates/quasicryth-research/target/release/qresearch decompress out.qrs1 in.txt.recovered
```

The `round-trip` subcommand compresses to memory, decompresses, and
verifies identity, which is useful for quick fuzz-style validation on
real text.

## Paper-theorem verification (algebraic substrate)

Tests in `tests/paper_theorems.rs` verify, on synthetic L/S sequences:

- **Thm 2** Fibonacci hierarchy never collapses (both L and S supertiles persist).
- **Cor 4** A period-5 tiling collapses by level 4 or 5 (vs Fibonacci's unbounded depth).
- **Thm 9** Golden Compensation: L:S ratio = φ at every level.
- **Thm 13/Cor 15** Aperiodic advantage grows with scale.
- **Sturmian** Factor complexity ≤ n+1 (the minimality property that
  gives maximal codebook efficiency, Thm 7).

Plus algebraic and structural invariants: PV-property (φ² = φ+1),
`HIER_WORD_LENS = F_3..F_12`, no-adjacent-S on all 36 canonical tilings.

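
Several of these invariants can be checked on the plain Fibonacci word with a few lines of code. The sketch below is not taken from `tests/paper_theorems.rs`; the function names are hypothetical, and it uses the standard substitution L → LS, S → L rather than the crate's `qc_word_tiling`.

```rust
use std::collections::HashSet;

/// Build a prefix of the Fibonacci word by iterating L -> LS, S -> L.
fn fibonacci_word(iterations: usize) -> Vec<u8> {
    let mut word = vec![b'L'];
    for _ in 0..iterations {
        let mut next = Vec::with_capacity(word.len() * 2);
        for &tile in &word {
            if tile == b'L' {
                next.extend_from_slice(b"LS");
            } else {
                next.push(b'L');
            }
        }
        word = next;
    }
    word
}

/// True if the word never contains two S tiles in a row.
fn no_adjacent_s(word: &[u8]) -> bool {
    word.windows(2).all(|w| !(w[0] == b'S' && w[1] == b'S'))
}

/// Ratio of L tiles to S tiles; approaches the golden ratio as the word grows.
fn l_to_s_ratio(word: &[u8]) -> f64 {
    let l = word.iter().filter(|&&t| t == b'L').count() as f64;
    let s = word.iter().filter(|&&t| t == b'S').count() as f64;
    l / s
}

/// Number of distinct factors (contiguous substrings) of length n, n >= 1.
fn factor_complexity(word: &[u8], n: usize) -> usize {
    word.windows(n).collect::<HashSet<_>>().len()
}
```

On a finite prefix the L:S ratio is a quotient of consecutive Fibonacci numbers, so it only approximates φ; a tolerance-based comparison is the appropriate check. A Sturmian word has exactly n+1 factors of each length n, so a long enough prefix reaches that bound and never exceeds it.
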
## Codebook variants

| Property | `FlatCodebook` | `CowRadixCodebook` |
|---|---|---|
| Storage | flat `Vec<u32>` per tier + `HashMap` for lookup | Adaptive Radix Tree (Node4 / Node16 / Node256) per tier |
| Build cost | O(n log n) frequency sort | O(n log n) sort + O(n · key_len) trie inserts |
| Lookup | O(1) average (HashMap) | O(key_len) tree walk |
| Memory | dense | sparse, shared across versions (Arc) |
| Versioning | no | **path-copy COW**: every insert returns a new root, prior roots stay valid |
| Append-only | no | **yes**, fits the workspace's substrate doctrine |
| Threading | `Send + Sync` (immutable post-build) | `Send + Sync` (Arc-shared subtrees, immutable) |

Both implement the `Codebook` trait. The `pipeline::Variant` enum
picks between them. Tests validate equivalence: under identical
inputs, both produce the same `compress → decompress` output.

The COW property is **explicitly exercised** in
`codebook::tests::cow_art_path_copy_preserves_old_root`: `art_v0`
stays empty after `art_v1.insert(...)` and `art_v2.insert(...)`;
each version sees only its own inserts. This is the architectural
property the workspace's append-only doctrine requires.

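
The path-copy behaviour can be illustrated with a much smaller structure than a full Adaptive Radix Tree. The sketch below is an assumption-laden stand-in, not the crate's `CowRadixCodebook`: `CowTrie`, `Node`, and `insert_node` are hypothetical names, and each node keeps a sorted `Vec` of children instead of Node4 / Node16 / Node256 layouts.

```rust
use std::sync::Arc;

/// One trie node; children are sorted by key byte and shared via Arc.
#[derive(Clone, Default)]
struct Node {
    value: Option<u32>,
    children: Vec<(u8, Arc<Node>)>,
}

/// A persistent trie: cloning is cheap, inserting never mutates in place.
#[derive(Clone, Default)]
struct CowTrie {
    root: Arc<Node>,
}

impl CowTrie {
    /// Return a new version containing the key; `self` is left untouched.
    fn insert(&self, key: &[u8], value: u32) -> CowTrie {
        CowTrie { root: Arc::new(insert_node(&self.root, key, value)) }
    }

    /// Walk the trie one byte at a time: O(key_len) node visits.
    fn get(&self, key: &[u8]) -> Option<u32> {
        let mut node = &self.root;
        for &b in key {
            let i = node.children.binary_search_by_key(&b, |(k, _)| *k).ok()?;
            node = &node.children[i].1;
        }
        node.value
    }
}

/// Copy only the nodes on the path to the key; every other subtree is
/// shared with the previous version through an Arc clone.
fn insert_node(node: &Node, key: &[u8], value: u32) -> Node {
    let mut copy = node.clone(); // shallow copy: child pointers only
    match key.split_first() {
        None => copy.value = Some(value),
        Some((&b, rest)) => match copy.children.binary_search_by_key(&b, |(k, _)| *k) {
            Ok(i) => {
                let child = insert_node(&copy.children[i].1, rest, value);
                copy.children[i].1 = Arc::new(child);
            }
            Err(i) => {
                let child = insert_node(&Node::default(), rest, value);
                copy.children.insert(i, (b, Arc::new(child)));
            }
        },
    }
    copy
}
```

With this shape, `let v1 = v0.insert(b"cat", 1)` leaves `v0` empty, and a later `v1.insert(b"car", 2)` leaves `v1` without `car`, mirroring the version isolation the test above describes.
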
## Compressed stream format (v1)

The Rust pipeline writes a self-contained format with the magic
`QRS1` ("Quasicryth Research Simplified v1"):

```
magic       : [4] "QRS1"
orig_size   : u64 little-endian
n_tokens    : u32
n_words     : u32
n_unique    : u32
lowered_size: u32
lowered     : [u8; lowered_size]  the lowered byte stream
spans       : [(u32 offset, u32 len, u8 case_flag); n_tokens]
case_size   : u32
case_data   : [u8; case_size]     AC over Model256 (token case flags)
word_size   : u32
word_data   : [u8; word_size]     AC over VModel (codebook indices)
```

This is **not** the upstream `.qm56` format. That is by design; see the
"What this is NOT" section above.

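
As an illustration of the fixed-size prefix of this layout, the sketch below writes and parses the first six fields (28 bytes). It is not the crate's `pipeline` code: `Qrs1Header`, `write_header`, and `read_header` are hypothetical names, and it assumes the `u32` fields are little-endian like `orig_size`, which the layout above states only for the `u64`.

```rust
/// Hypothetical view of the fixed-size fields that follow the magic.
#[derive(Debug, PartialEq)]
struct Qrs1Header {
    orig_size: u64,
    n_tokens: u32,
    n_words: u32,
    n_unique: u32,
    lowered_size: u32,
}

const MAGIC: &[u8] = b"QRS1";
/// 4 (magic) + 8 (orig_size) + 4 * 4 (the four u32 fields).
const HEADER_LEN: usize = 28;

/// Serialize the header; all integers little-endian (an assumption for u32).
fn write_header(h: &Qrs1Header) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN);
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&h.orig_size.to_le_bytes());
    for v in [h.n_tokens, h.n_words, h.n_unique, h.lowered_size] {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Decode a little-endian u32 from the first four bytes of `b`.
fn le_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Parse the header, rejecting short input and a wrong magic.
fn read_header(buf: &[u8]) -> Result<Qrs1Header, String> {
    if buf.len() < HEADER_LEN {
        return Err("truncated header".to_string());
    }
    if !buf.starts_with(MAGIC) {
        return Err("bad magic".to_string());
    }
    let mut size = [0u8; 8];
    size.copy_from_slice(&buf[4..12]);
    Ok(Qrs1Header {
        orig_size: u64::from_le_bytes(size),
        n_tokens: le_u32(&buf[12..16]),
        n_words: le_u32(&buf[16..20]),
        n_unique: le_u32(&buf[20..24]),
        lowered_size: le_u32(&buf[24..28]),
    })
}
```

Checking the magic and the length before touching any field keeps the parser from indexing past the end of a truncated or foreign file. The variable-length sections (`lowered`, `spans`, `case_data`, `word_data`) would follow the same length-prefixed pattern.
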
## Relationship to workspace crates

- **bgz17**: uses `17φ/11 ≈ 5/2` (major tenth) as octave-stacking
  constant for codebook hierarchy depth; this crate verifies the
  non-collapse theorem that justifies φ over rational stacking
  approximations.
- **helix**: uses pure φ for golden-angle azimuth and `√u` for
  equal-area hemisphere placement; this crate verifies the Sturmian
  minimality that makes φ optimal among irrational slopes.
- **jc::weyl**: proves 1-D `{k·φ⁻¹ mod 1}` star-discrepancy is
  minimal at N=144 and N=1000; this crate's `qc_word_tiling`
  exercises the same φ-stride at hierarchy scale.

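
The constants named in this list are easy to reproduce numerically. The sketch below is purely illustrative and does not call into bgz17, helix, or jc::weyl: every function name is hypothetical, and `golden_spiral_point` assumes the usual sunflower construction on a unit disc (azimuth from the fractional part of k/φ, radius √u), which may differ from helix's actual hemisphere mapping.

```rust
/// Golden ratio φ = (1 + √5) / 2, the positive root of x² = x + 1.
fn phi() -> f64 {
    (1.0 + 5.0_f64.sqrt()) / 2.0
}

/// The stacking constant 17φ/11, which lies close to 5/2.
fn stacking_constant() -> f64 {
    17.0 * phi() / 11.0
}

/// First n terms of the Weyl sequence {k·φ⁻¹ mod 1}, for k = 1..=n.
fn weyl_sequence(n: usize) -> Vec<f64> {
    let inv_phi = 1.0 / phi();
    (1..=n).map(|k| (k as f64 * inv_phi).fract()).collect()
}

/// Assumed sunflower placement of point k out of n on the unit disc:
/// the azimuth advances by a φ-stride, the radius is √u with u in (0, 1).
fn golden_spiral_point(k: usize, n: usize) -> (f64, f64) {
    let u = (k as f64 + 0.5) / n as f64;
    let azimuth = 2.0 * std::f64::consts::PI * (k as f64 / phi()).fract();
    let r = u.sqrt();
    (r * azimuth.cos(), r * azimuth.sin())
}
```

Numerically, `stacking_constant()` differs from 2.5 by less than one part in a thousand, which is the sense of the `≈` in the bgz17 bullet.
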
## Upstream

`https://github.com/robtacconelli/quasicryth` (v5.6.0 as of the
transcode date). The upstream is the canonical reference; this crate
tracks its algebraic surface and pipeline shape only and does NOT
attempt byte-for-byte compatibility with its compressed output.

## Crate policy

Standalone, zero-dependency, `exclude`d from the lance-graph
workspace, following the same convention as `bgz17`, `deepnsm`, `helix`,
`bgz-tensor`. Verified via `cargo test --manifest-path`.
