@@ -67,9 +67,10 @@ storage and bandwidth on the table. A first-class lossy quantization path
 opens up new workloads — billion-scale ANN search, on-disk KV-cache, embedding
 serving — where a lossless format is over-engineered.

-**Vortex needs a clean answer to "where does lossy data live?"** The original
-RFC modelled TurboQuant as a new physical encoding of the `Vector` extension
-type. That model breaks down on three concrete questions:
+**Vortex needs a clean answer to "where does lossy data live?"** A natural
+starting point would be to model TurboQuant as a new physical encoding of the
+`Vector` extension type, since encoding-as-compression is how lossless schemes
+already work. That model breaks down on three concrete questions:

 - Are quantized vectors unit-normalized? After scalar quantization on a rotated
   unit vector, the inverse transform does not generally recover a unit vector.
@@ -136,6 +137,12 @@ and the result is named `Vector`, not "the original floats." `TQDecode(TQEncode(
 returns values close to `v` (within MSE bounds), not `v` itself, and that
 distinction is now visible in the operator chain.

+`TQDecode` is a regular scalar function, analogous to a numeric cast or a
+parser, not a privileged canonicalization mechanism. It happens to be the
+_only_ path back to `Vector`, but the routing is no different from any
+other function call: the user writes it, the planner sees it, the executor
+runs it.
+
 ### Implications for the three-stage plan

 Both principles hold across every stage of the long-term plan:
@@ -438,28 +445,28 @@ shares. Stages 1, 2, and 3 in §6–§8 below extend or refine specific aspects;
 nothing in this section ever goes away.

 ```text
-┌────────────────────────────────────────────────────────┐
-│  User-facing API surface                               │
-│                                                        │
-│  Vector<F, d> ──TQEncode──▶ Extension<TurboQuant>      │
-│               ◀──TQDecode──                            │
-└────────────────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────┐
+│  User-facing API surface                                │
+│                                                         │
+│  Vector<F, d> ──TQEncode──▶ Extension<TurboQuant>       │
+│               ◀──TQDecode──                             │
+└─────────────────────────────────────────────────────────┘
                              │
                              │ ExtVTable
                              ▼
-┌────────────────────────────────────────────────────────┐
-│  Extension<TurboQuant>                                 │
-│  metadata (prost): {element_ptype, dimensions,         │
-│            bit_width, seed, num_rounds, [block_size]}  │
-│                                                        │
-│  storage: Struct {                                     │
-│    norms: Primitive<F>                     (Stage 1)   │
+┌─────────────────────────────────────────────────────────┐
+│  Extension<TurboQuant>                                  │
+│  metadata (prost): {element_ptype, dimensions,          │
+│            bit_width, seed, num_rounds, [block_size]}   │
+│                                                         │
+│  storage: Struct {                                      │
+│    norms: Primitive<F>                     (Stage 1)    │
 │         | FSL<F, num_blocks>               (Stage 2 k>1)│
-│    codes: FSL<u8, padded_dim>                          │
-│         | FSL<u8, num_blocks*block_size>               │
-│         | PDXArray<u8, ...>                (Stage 3)   │
-│  }                                                     │
-└────────────────────────────────────────────────────────┘
+│    codes: FSL<u8, padded_dim>                           │
+│         | FSL<u8, num_blocks*block_size>                │
+│         | PDXArray<u8, ...>                (Stage 3)    │
+│  }                                                      │
+└─────────────────────────────────────────────────────────┘
             ▲                               ▲
             │                               │
             │ derived (not stored)          │ derived (not stored)
@@ -664,9 +671,11 @@ as a Stage 1 refinement:
   (after Max-Lloyd converges) using EDEN's optimization criterion. The
   implementer should consult EDEN [15] for the precise criterion — the note
   [14] defers to "methods described in the EDEN works" and does not
-  reproduce the algorithm itself. Reference implementation:
-  https://github.com/amitport/EDEN-Distributed-Mean-Estimation (MIT;
-  PyTorch and TensorFlow).
+  reproduce the algorithm itself. The authors' official reproduction lives
+  at https://github.com/amitport/EDEN-Distributed-Mean-Estimation (PyTorch
+  and TensorFlow); note that this repository has **no LICENSE file**, so
+  it is reference reading only. Vortex's implementation must be written
+  clean-room from the EDEN paper.
 - Cache `(centroids, S)` together under the existing `(padded_dim,
   bit_width)` key in the `DashMap`.
 - Apply `S` at quantization time (encode: scale `r * S` before
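The caching step above stores centroids and `S` as one entry under a single `(padded_dim, bit_width)` key. A minimal sketch of that pattern follows, under stated assumptions: it uses a `std` `Mutex<HashMap>` in place of the `DashMap` the text names, `Codebook` and `get_codebook` are hypothetical names, and the `compute` closure stands in for the Max-Lloyd and EDEN-`S` derivation, which is not shown.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical cached entry: Max-Lloyd centroids plus the EDEN scale `S`.
#[derive(Debug)]
pub struct Codebook {
    pub centroids: Vec<f32>,
    pub scale: f32,
}

/// Cache key: (padded_dim, bit_width).
type Key = (u32, u8);

/// Process-local cache, created lazily on first use.
fn cache() -> &'static Mutex<HashMap<Key, Arc<Codebook>>> {
    static CACHE: OnceLock<Mutex<HashMap<Key, Arc<Codebook>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Hypothetical lookup: returns the codebook for `(padded_dim, bit_width)`,
/// running `compute` only on the first request for that key. Centroids and
/// `S` live in one entry, so a caller never sees one without the other.
pub fn get_codebook(
    padded_dim: u32,
    bit_width: u8,
    compute: impl FnOnce() -> Codebook,
) -> Arc<Codebook> {
    // Holding the lock while computing serializes first-time derivations;
    // a sharded concurrent map avoids that contention.
    let mut map = cache().lock().expect("codebook cache poisoned");
    let entry = map
        .entry((padded_dim, bit_width))
        .or_insert_with(|| Arc::new(compute()));
    Arc::clone(entry)
}
```

Keeping both values behind one `Arc` means the encode path can fetch `centroids` and `S` with a single lookup, matching the "cache together" rule.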
@@ -695,13 +704,28 @@ and selecting at decode time. We recommend (a) — see §13 "Migration."
   diminishing returns vs. lossless schemes.

 ```rust
+// vortex-turboquant/src/config.rs at ff120401 (method bodies elided). Fields
+// are private; read them via bit_width(), seed(), and num_rounds(). All three
+// are required at construction: Stage 1 has no default-config helper yet.
 pub struct TurboQuantConfig {
-    pub bit_width: u8,       // 1..=8
-    pub seed: Option<u64>,   // default 42 (or session-configurable)
-    pub num_rounds: u8,      // default 3
+    bit_width: u8,   // 1..=8 (validated in try_new)
+    seed: u64,       // SORF seed
+    num_rounds: u8,  // > 0 (validated in try_new)
+}
+
+impl TurboQuantConfig {
+    pub fn try_new(bit_width: u8, seed: u64, num_rounds: u8) -> VortexResult<Self>;
+    pub fn bit_width(&self) -> u8;
+    pub fn seed(&self) -> u64;
+    pub fn num_rounds(&self) -> u8;
 }
 ```

+Stage 1 stabilization may add a `Default` impl with the recommended values
+(bit_width 8, num_rounds 3, a session-derived seed). The seed is not
+currently optional: callers pass a concrete `u64`, and the session is
+responsible for choosing one.
+
 ### Power-of-2 padding

 SORF requires a power-of-2 input dimension. Non-power-of-2 dimensions are
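A minimal, dependency-free sketch of the `try_new`-plus-accessors shape in the config listing above. It assumes the two checks named in the field comments are the only ones, borrows the D.3 error-message wording as a guess at what `try_new` reports, and swaps `VortexResult<Self>` for `Result<Self, String>`; it is an illustration, not the code at ff120401.

```rust
/// Largest supported bit width (`MAX_BIT_WIDTH` in the RFC).
const MAX_BIT_WIDTH: u8 = 8;

/// Stand-alone sketch of the Stage 1 config. `Result<Self, String>` replaces
/// `VortexResult<Self>` so the block compiles without Vortex crates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TurboQuantConfig {
    bit_width: u8,  // 1..=8, checked in try_new
    seed: u64,      // SORF seed, always supplied by the caller or session
    num_rounds: u8, // > 0, checked in try_new
}

impl TurboQuantConfig {
    pub fn try_new(bit_width: u8, seed: u64, num_rounds: u8) -> Result<Self, String> {
        if !(1..=MAX_BIT_WIDTH).contains(&bit_width) {
            return Err(format!("TurboQuant bit_width must be 1-8, got {bit_width}"));
        }
        if num_rounds == 0 {
            return Err(format!("TurboQuant num_rounds must be > 0, got {num_rounds}"));
        }
        Ok(Self { bit_width, seed, num_rounds })
    }

    pub fn bit_width(&self) -> u8 {
        self.bit_width
    }

    pub fn seed(&self) -> u64 {
        self.seed
    }

    pub fn num_rounds(&self) -> u8 {
        self.num_rounds
    }
}
```

Because the seed has no default, every construction site names one explicitly, e.g. `TurboQuantConfig::try_new(8, seed, 3)`, which keeps encodes reproducible from the metadata alone.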
@@ -1581,9 +1605,11 @@ a prost message; new fields are added as `optional` so that older readers
 treat them as their default value.

 Stage 1 → Stage 2: add `block_size: Option<u32>`. Stage 1 writers leave it
-as `None`; Stage 1 readers see `None` and treat the array as a single
-padded block. Stage 2 writers populate it when k > 1; Stage 2 readers
-honor it.
+unset; Stage 1 readers see it unset and treat the array as a single padded
+block. Stage 2 writers populate it **only when k > 1**; for k = 1
+(power-of-2 dimensions) Stage 2 writers also leave it unset, producing an
+artifact wire-format-identical to a Stage 1 artifact at the same dimension.
+Stage 2 readers accept both forms, but writers must follow this rule.

 Stage 2 → Stage 3: no metadata change. The codes child's physical encoding
 shifts from `FixedSizeListArray` to `PDXArray`; readers detect via
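The Stage 1 → Stage 2 writer and reader rules for `block_size` reduce to two small functions. A sketch, assuming a Stage 2 writer knows its block count `k` (`num_blocks`) and block size `B`; both function names are hypothetical and do not correspond to Vortex APIs.

```rust
/// Hypothetical writer rule: the value to store in metadata tag 6.
/// `None` means "leave the field unset", which is required whenever the
/// dimension is covered by a single block (k = 1).
fn block_size_to_write(num_blocks: u32, block_size: u32) -> Option<u32> {
    (num_blocks > 1).then_some(block_size)
}

/// Hypothetical reader rule: the block size to decode with. Unset means one
/// padded block spanning `padded_dim`; an explicit value is honored as written.
fn block_size_to_read(stored: Option<u32>, padded_dim: u32) -> u32 {
    stored.unwrap_or(padded_dim)
}
```

With these rules a Stage 2 writer at d = 1024 emits exactly the metadata a Stage 1 writer would, while d = 768 with k = 3, B = 256 stores `Some(256)`.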
@@ -1699,7 +1725,7 @@ The ICLR 2026 camera-ready proceedings may use different numbering._

 [1] Zandieh, A., Daliri, M., Hadian, M. and Mirrokni, V. "TurboQuant: Online
 Vector Quantization with Near-optimal Distortion Rate." ICLR 2026.
-arXiv:2504.19874, April 2025.
+arXiv:2504.19874v1, April 2025.

 [2] Ailon, N. and Chazelle, B. "The Fast Johnson-Lindenstrauss Transform and
 Approximate Nearest Neighbors." SIAM J. Comput. 39(1):302-322, 2009.
@@ -1901,45 +1927,57 @@ intentionally leaves open.

 ### D.2 Extension type metadata (prost)

-The on-disk metadata is the prost-encoded form of:
+**Current schema** at `vortex-turboquant/src/vtable.rs` (ff120401):

 ```rust
-// Reproduced from vortex-turboquant/src/vtable.rs at ff120401.
+#[derive(Clone, PartialEq, Message)]
 struct TurboQuantMetadataProto {
-    element_ptype: PType,         // tag 1, enum
-    dimensions: u32,              // tag 2
-    bit_width: u32,               // tag 3 (fits in u8 at the type level)
-    seed: u64,                    // tag 4
-    num_rounds: u32,              // tag 5 (fits in u8 at the type level)
-    // Stage 2 addition (not in current source):
-    block_size: optional u32,     // tag 6
-    // Future:
-    unbiased: optional bool,      // tag 7 — reserve for EDEN unbiased mode
+    #[prost(enumeration = "PType", tag = "1")]
+    element_ptype: i32,
+    #[prost(uint32, tag = "2")]
+    dimensions: u32,
+    #[prost(uint32, tag = "3")]
+    bit_width: u32, // u8 at the type level
+    #[prost(uint64, tag = "4")]
+    seed: u64,
+    #[prost(uint32, tag = "5")]
+    num_rounds: u32, // u8 at the type level
 }
 ```

-The current `TurboQuantMetadataProto` in `vortex-turboquant/src/vtable.rs`
-defines tags 1–5. Stage 2 adds `block_size` at tag 6; future stages
-add additional optional tags as needed. Prost optional-field semantics
-mean older readers ignore unknown tags, so the wire format is forward-
-compatible by construction.
+**This RFC proposes** adding the following tags, in this order, as the
+corresponding stages land:
+
+- **Tag 6: `block_size: Option<u32>`** (Stage 2). Stage 1 writers leave
+  this unset; readers treat unset as "single padded block." Stage 2
+  writers set it when k > 1; for k = 1 (power-of-2 dimensions) writers
+  also leave it unset, keeping the wire format bit-identical to Stage 1's
+  (e.g., at d = 1024; see the resolved open question on this in §16).
+- **Tag 7: reserved** for a future `unbiased: Option<bool>` flag if and when
+  EDEN's native b-bit unbiased mode is added (§15). Reserving the tag now
+  keeps the field-add policy clean.
+
+Protobuf decoders (prost included) skip unknown tags, so older readers
+ignore newly added fields; the schema is forward-compatible by construction.

 **Constraint:** tags 1–5 are part of Vortex's stable contract once Stage 1
 ships in a release. Renumbering or repurposing them would be a wire-format
-break.
+break. Tag 6's semantics are locked once Stage 2 ships.

 ### D.3 Validation

-The `ExtVTable::validate_dtype` implementation enforces:
-
-- `dimensions >= MIN_DIMENSION` (`= 128`). On violation: error
-  `"TurboQuant dimensions must be >= 128, got {N}"`.
-- `1 ≤ bit_width ≤ MAX_BIT_WIDTH` (`= 8`). On violation: error
-  `"TurboQuant bit_width must be 1-8, got {N}"`.
-- `num_rounds > 0`. On violation: error
-  `"TurboQuant num_rounds must be > 0, got 0"`.
-- `element_ptype.is_float()` (one of F16, F32, F64). On violation: error
-  `"TurboQuant element_ptype must be a float, got {ptype}"`.
+The `ExtVTable::validate_dtype` implementation enforces the following. Error
+messages are shown as `vortex_ensure!` format templates; each `{var}` is
+replaced with the offending value at runtime.
+
+- `dimensions >= MIN_DIMENSION` (`= 128`). On violation:
+  `"TurboQuant dimensions must be >= 128, got {dimensions}"`.
+- `1 ≤ bit_width ≤ MAX_BIT_WIDTH` (`= 8`). On violation:
+  `"TurboQuant bit_width must be 1-8, got {bit_width}"`.
+- `num_rounds > 0`. On violation:
+  `"TurboQuant num_rounds must be > 0, got {num_rounds}"`.
+- `element_ptype.is_float()` (one of F16, F32, F64). On violation:
+  `"TurboQuant element_ptype must be a float, got {element_ptype}"`.
 - Storage dtype is `Struct { norms: Primitive<element_ptype>, codes:
   FixedSizeList<u8, padded_dim> }` with matching row-validity propagation.

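The D.3 metadata checks can be sketched as a plain function. This assumes the checks run in the listed order and stop at the first failure; `PType` here is a hypothetical four-variant stand-in for Vortex's enum, `validate_metadata` is a hypothetical name, and `{element_ptype}` is rendered with `Debug` rather than whatever `Display` impl Vortex uses. The storage-dtype check is left out.

```rust
/// Hypothetical subset of Vortex's primitive types, enough for the float check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PType {
    U8,
    F16,
    F32,
    F64,
}

impl PType {
    fn is_float(self) -> bool {
        matches!(self, PType::F16 | PType::F32 | PType::F64)
    }
}

const MIN_DIMENSION: u32 = 128;
const MAX_BIT_WIDTH: u32 = 8;

/// Sketch of the metadata checks from D.3, in the order listed there.
/// Returns the first violation as an error string.
fn validate_metadata(
    element_ptype: PType,
    dimensions: u32,
    bit_width: u32,
    num_rounds: u32,
) -> Result<(), String> {
    if dimensions < MIN_DIMENSION {
        return Err(format!("TurboQuant dimensions must be >= 128, got {dimensions}"));
    }
    if !(1..=MAX_BIT_WIDTH).contains(&bit_width) {
        return Err(format!("TurboQuant bit_width must be 1-8, got {bit_width}"));
    }
    if num_rounds == 0 {
        return Err(format!("TurboQuant num_rounds must be > 0, got {num_rounds}"));
    }
    if !element_ptype.is_float() {
        return Err(format!("TurboQuant element_ptype must be a float, got {element_ptype:?}"));
    }
    Ok(())
}
```

Note that `bit_width` and `num_rounds` are checked as `u32` here because that is their wire type (tags 3 and 5), even though they fit in `u8` at the type level.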
@@ -1954,9 +1992,9 @@ fits in `u32` (i.e., `2^31`).
19541992
19551993``` text
19561994fn tq_encode(v: Vector<F, d>, cfg: &TurboQuantConfig) -> TurboQuantArray:
1957- padded_dim = next_power_of_two(d)
1958- if padded_dim does not fit u32:
1959- return Err(OverflowError)
1995+ if d > MAX_DIMENSION: # MAX_DIMENSION = 2^31
1996+ return Err(OverflowError) # next_power_of_two would overflow
1997+ padded_dim = next_power_of_two(d) # checked: ≤ 2^31
19601998
19611999 n = ‖v‖₂ # in input dtype F (f16/f32/f64)
19622000 if n > 0:
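The overflow guard in the encode pseudocode maps directly onto Rust's `u32` methods. A sketch, with `padded_dim` as a hypothetical helper name and a `String` error standing in for `OverflowError`:

```rust
/// 2^31: the largest power of two representable in a u32.
const MAX_DIMENSION: u32 = 1 << 31;

/// Hypothetical helper for the first step of `tq_encode`: reject dimensions
/// whose next power of two would not fit in u32, then pad.
fn padded_dim(d: u32) -> Result<u32, String> {
    if d > MAX_DIMENSION {
        return Err(format!("dimension {d} exceeds MAX_DIMENSION = {}", MAX_DIMENSION));
    }
    // For d <= 2^31 the next power of two is at most 2^31, so this cannot overflow.
    Ok(d.next_power_of_two())
}
```

`u32::checked_next_power_of_two` returns `None` exactly when `d > 2^31`, so `d.checked_next_power_of_two().ok_or(...)` is an equivalent one-line form of the same guard.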
@@ -1985,6 +2023,11 @@ fn tq_encode(v: Vector<F, d>, cfg: &TurboQuantConfig) -> TurboQuantArray:
 `centroids = get_centroids(padded_dim, bit_width)` and `S = get_eden_scale(padded_dim, bit_width)`
 both come from the process-local cache keyed on `(padded_dim, bit_width)`.

+**Licensing note.** The derivation of the EDEN scale `S` must be implemented
+clean-room from the EDEN paper [15]. The authors' official reproduction at
+https://github.com/amitport/EDEN-Distributed-Mean-Estimation has no
+LICENSE file and therefore cannot be copied or adapted into Vortex.
+
 ### D.5 Decode (Stage 1) pseudocode

 ```text
@@ -2162,22 +2205,39 @@ Stages 2 and 3 add:
21622205
21632206### D.11 Performance budgets
21642207
2165- Goals (from §11) become budgets here, with verification commands:
2166-
2167- - ** Encode throughput, Stage 1, AVX-512, d = 768, b = 8** : ≥ 1 M vectors/sec.
2168- Verify with ` cargo run -p vortex-turboquant --release --bench encode_decode ` .
2169- - ** Decode throughput, Stage 1, AVX-512, d = 768, b = 8** : ≥ 1 M vectors/sec.
2170- - ** Encode throughput, Stage 2, AVX-512, d = 768, k = 3, B = 256, b = 8** :
2171- ≥ 1.3 M vectors/sec (≥ 30% faster than Stage 1 padded, matching the FLOP
2172- ratio in §11).
2173- - ** Compression ratio, b = 8** : 3.0× (Stage 1 padded at d = 768), 3.9×
2174- (Stage 2 k = 3 at d = 768), 4.0× (any stage at d = 1024).
2175- - ** Normalized MSE, b = 8, d ≥ 128** : < 5e-5 on Gaussian inputs.
2176- - ** Stage 3 PDX scan throughput, AVX-512, b = 4, d = 768** : ≥ 1.5×
2177- Stage 2's row-major kernel throughput on 1 M-row scan.
2178-
2179- These are starting budgets; the Experimental plan (§12) refines them with
2180- real workloads.
2208+ The goals in §11 become measurable budgets once Stage 1 stabilization has
2209+ baseline numbers. Concrete budgets are deliberately ** not yet pinned** in
2210+ this RFC — pinning them with fabricated round numbers would mislead the
2211+ implementer about what "on track" means. The plan is:
2212+
2213+ 1 . ** Baseline.** Run the existing benchmark
2214+ ` cargo bench -p vortex-turboquant --bench encode_decode --release ` on a
2215+ reference machine (e.g., AVX-512 Sapphire Rapids, ARM Neoverse-V2) at
2216+ d ∈ {768, 1024, 1536} and b ∈ {4, 8}, recording encode and decode
2217+ throughput in vectors/sec.
2218+ 2 . ** Pin the baselines as budgets** in this appendix, with the machine
2219+ and command used to produce them.
2220+ 3 . ** Stage 2 and Stage 3 budgets** then become "X% of Stage 1 padded" with
2221+ ` X ` chosen against the FLOP-ratio analysis in §11 (Stage 2 should be
2222+ ~ 40% faster than padded Stage 1 at d=768; Stage 3 PDX should be 1.5–2×
2223+ Stage 2 row-major per [ 4, Table 4] ).
2224+
2225+ Until those baselines land, the only budget the RFC commits to is the
2226+ ** compression ratio** (which is determined by the wire format and is
2227+ exact, not measured):
2228+
2229+ | Configuration | Per-vec bits (f32 input) | Ratio |
2230+ | -------------------------------- | ------------------------ | ----- |
2231+ | Stage 1 padded, d=768, b=8 | 8,224 | 3.0× |
2232+ | Stage 2 (k=3, B=256), d=768, b=8 | 6,240 | 3.9× |
2233+ | Stage 1 or 2 (k=1), d=1024, b=8 | 8,224 | 4.0× |
2234+
2235+ And the ** normalized MSE bound** (which is from Theorem 1 — see §4 and
2236+ Appendix A): ` ≤ 2.72 / 4^b ` per vector in expectation on Haar-random
2237+ rotations; the SORF approximation observes ~ 20% slack over this bound at
2238+ d=1024 in ` vortex-turboquant ` 's test suite, so the operational assertion
2239+ is ` normalized_mse_per_vector ≤ 3.3 / 4^b ` until tighter empirical bounds
2240+ land.
21812241
21822242### D.12 Registry / dispatch wiring
21832243
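The compression-ratio table follows from simple arithmetic, assuming one f32 norm per block and `b` bits per code with no other per-vector overhead. The sketch below reproduces those figures; the helper names are hypothetical. `mse_bound` evaluates the `c / 4^b` form shared by the analytic (c = 2.72) and operational (c = 3.3) MSE bounds.

```rust
/// Hypothetical helper: bits per encoded vector, i.e. one f32 norm per block
/// plus `bit_width` bits for each of the `num_blocks * block_size` codes.
fn bits_per_vector(num_blocks: u32, block_size: u32, bit_width: u32) -> u32 {
    let norm_bits = 32 * num_blocks;
    let code_bits = num_blocks * block_size * bit_width;
    norm_bits + code_bits
}

/// Hypothetical helper: uncompressed f32 size (32 * d bits) over encoded size.
fn compression_ratio(d: u32, encoded_bits: u32) -> f64 {
    f64::from(32 * d) / f64::from(encoded_bits)
}

/// Hypothetical helper: normalized-MSE bound c / 4^b for bit width b.
fn mse_bound(c: f64, bit_width: u32) -> f64 {
    c / 4f64.powi(bit_width as i32)
}
```

For example, Stage 1 padded at d=768 stores 1 × 32 + 1024 × 8 = 8,224 bits against 768 × 32 = 24,576 uncompressed, about 2.99×, which the table rounds to 3.0×.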
@@ -2245,13 +2305,7 @@ work, not blockers on the design:
    exact mixing function is a Stage 2 implementation detail that should be
    pinned before the first Stage 2 file is written; the wire format
    becomes load-bearing once any file is shipped.
-4. **Wire-format identity at d=1024 across Stage 1 → Stage 2 writers.** When
-   a Stage 2 writer emits a power-of-2-dimension TurboQuant array (so
-   k = 1), should it write `block_size = None` (matching Stage 1 readers'
-   default exactly) or `block_size = Some(padded_dim)`? Stage 2 readers
-   accept both; Stage 1 readers only accept `None`. Writers must converge
-   to one or the other to preserve cross-version write/read interop.
-5. **Whether to lower `MIN_DIMENSION` after Stage 1 experimental
+4. **Whether to lower `MIN_DIMENSION` after Stage 1 experimental
    validation.** If the experimental plan supports lowering to 64–96, the
    change is a wire-format break in the sense that files written at
    d < 128 by a new writer would be rejected by an old reader. Surface
@@ -2281,3 +2335,11 @@ re-litigate.
   Stage 1 cleanup task in §14 "Current state and known gaps." Removal
   follows once `vortex/examples/turboquant_vector_search.rs` migrates to
   the new `TQEncode` / `TQDecode` path.
+- **Wire-format identity at d = 1024 across Stage 1 → Stage 2 writers**:
+  resolved. Stage 2 writers leave `block_size` unset whenever k = 1
+  (i.e., for any power-of-2 dimension, which needs no decomposition).
+  This produces an artifact wire-format-identical to a Stage 1 writer's at
+  d = 1024, and Stage 1 readers can still read it. Stage 2 readers also
+  accept `Some(padded_dim)` and decode it as a single padded block, but
+  validation flags it as an encoder bug; conforming writers always leave
+  the field unset. See §13 "Migration and compatibility."