Commit ea8ab73

lwwmanning and claude committed
[RFC 60] Address second /rfc-review round: license accuracy, code-claim parity, framing
Fixes a BLOCKER, two MAJORs, and 9 MINOR/NIT items identified in the second review of the RFC.

BLOCKER:
- amitport/EDEN-Distributed-Mean-Estimation is unlicensed (no LICENSE file). Removed the false "(MIT; PyTorch and TensorFlow)" annotation in §6 and added an explicit "no LICENSE — reference reading only, clean-room re-implementation required" note. Same note now also in Appendix D.4 for implementer-facing visibility.

MAJORs:
- §2 Motivation: replaced "The original RFC modelled TurboQuant as a new physical encoding..." with framing that doesn't reference the abandoned RFC 33 draft.
- §6 TurboQuantConfig Rust snippet: aligned with the actual code at ff120401 — private fields, seed: u64 (not Option<u64>), explicit try_new constructor and getters. Noted that a Default impl may come in Stage 1 stabilization.
- Appendix D.11 performance budgets: replaced fabricated round numbers with an explicit "TBD pending Stage 1 stabilization benchmarking" plan; only the compression-ratio budgets (which are exact from the wire format) remain pinned, plus the MSE bound from Theorem 1 with the SORF approximation slack.

MINOR/NIT:
- [1] arXiv version pinned to v1 in the reference entry (consistent with [4] and [12]).
- Appendix D.2: reframed prost schema as "current schema (tags 1–5)" + "this RFC proposes (tag 6 for Stage 2, tag 7 reserved)" instead of the misleading "Reproduced from..." that added unsourced fields.
- Appendix D.4: reordered the overflow check (validate `d <= MAX_DIMENSION` before calling `next_power_of_two`, which would otherwise overflow first).
- Resolved Open Question #4 (wire-format identity at d=1024): Stage 2 writers leave `block_size` unset for k=1 so the artifact is bit-identical to a Stage 1 file. Documented the resolution in §13 and in the "Resolved during initial drafting" list.
- §5 architecture diagram: fixed the cosmetic alignment on the box right borders.
- §3.2 added a clarifying sentence that TQDecode is a regular scalar function (not a privileged canonicalization mechanism).
- Appendix D.3: tightened error-message text to match the actual vortex_ensure! placeholder format ({dimensions}, {bit_width}, etc., not "got 0" / "got N" constants).

Signed-off-by: Will Manning <will@willmanning.io>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Will Manning <will@willmanning.io>
1 parent eff7709 commit ea8ab73

1 file changed

Lines changed: 146 additions & 84 deletions

rfcs/0060-block-turboquant.md

@@ -67,9 +67,10 @@ storage and bandwidth on the table. A first-class lossy quantization path
6767
opens up new workloads — billions-scale ANN search, on-disk KV-cache, embedding
6868
serving — where a lossless format is over-engineered.
6969

70-
**Vortex needs a clean answer to "where does lossy data live?"** The original
71-
RFC modelled TurboQuant as a new physical encoding of the `Vector` extension
72-
type. That model breaks down on three concrete questions:
70+
**Vortex needs a clean answer to "where does lossy data live?"** A natural
71+
starting point would be to model TurboQuant as a new physical encoding of the
72+
`Vector` extension type — encoding-as-compression is how lossless schemes
73+
already work. That model breaks down on three concrete questions:
7374

7475
- Are quantized vectors unit-normalized? After scalar quantization on a rotated
7576
unit vector, the inverse transform does not generally recover a unit vector.
@@ -136,6 +137,12 @@ and the result is named `Vector`, not "the original floats." `TQDecode(TQEncode(
136137
returns values close to `v` (within MSE bounds), not `v` itself, and that
137138
distinction is now visible in the operator chain.
138139

140+
`TQDecode` is a regular scalar function — analogous to a numeric cast or a
141+
parser — not a privileged canonicalization mechanism. It happens to be the
142+
_only_ path back to `Vector`, but the routing is no different from any
143+
other function call: the user writes it, the planner sees it, the executor
144+
runs it.
145+
139146
### Implications for the three-stage plan
140147

141148
Both principles hold across every stage of the long-term plan:
@@ -438,28 +445,28 @@ shares. Stages 1, 2, and 3 in §6–§8 below extend or refine specific aspects;
438445
nothing in this section ever goes away.
439446

440447
```text
441-
┌────────────────────────────────────────────────────────┐
442-
│ User-facing API surface │
443-
│ │
444-
│ Vector<F, d> ──TQEncode──▶ Extension<TurboQuant> │
445-
│ ◀──TQDecode── │
446-
└────────────────────────────────────────────────────────┘
448+
┌────────────────────────────────────────────────────────┐
449+
│ User-facing API surface │
450+
│ │
451+
│ Vector<F, d> ──TQEncode──▶ Extension<TurboQuant> │
452+
│ ◀──TQDecode── │
453+
└────────────────────────────────────────────────────────┘
447454
448455
│ ExtVTable
449456
450-
┌────────────────────────────────────────────────────────┐
451-
│ Extension<TurboQuant> │
452-
│ metadata (prost): {element_ptype, dimensions, │
453-
│ bit_width, seed, num_rounds, [block_size]} │
454-
│ │
455-
│ storage: Struct { │
456-
│ norms: Primitive<F> (Stage 1) │
457+
┌────────────────────────────────────────────────────────┐
458+
│ Extension<TurboQuant> │
459+
│ metadata (prost): {element_ptype, dimensions, │
460+
│ bit_width, seed, num_rounds, [block_size]} │
461+
│ │
462+
│ storage: Struct { │
463+
│ norms: Primitive<F> (Stage 1) │
457464
│ | FSL<F, num_blocks> (Stage 2 k>1)│
458-
│ codes: FSL<u8, padded_dim> │
459-
│ | FSL<u8, num_blocks*block_size> │
460-
│ | PDXArray<u8, ...> (Stage 3) │
461-
│ } │
462-
└────────────────────────────────────────────────────────┘
465+
│ codes: FSL<u8, padded_dim> │
466+
│ | FSL<u8, num_blocks*block_size> │
467+
│ | PDXArray<u8, ...> (Stage 3) │
468+
│ } │
469+
└────────────────────────────────────────────────────────┘
463470
▲ ▲
464471
│ │
465472
│ derived (not stored) │ derived (not stored)
@@ -664,9 +671,11 @@ as a Stage 1 refinement:
664671
(after Max-Lloyd converges) using EDEN's optimization criterion. The
665672
implementer should consult EDEN [15] for the precise criterion — the note
666673
[14] defers to "methods described in the EDEN works" and does not
667-
reproduce the algorithm itself. Reference implementation:
668-
https://github.com/amitport/EDEN-Distributed-Mean-Estimation (MIT;
669-
PyTorch and TensorFlow).
674+
reproduce the algorithm itself. The authors' official reproduction lives
675+
at https://github.com/amitport/EDEN-Distributed-Mean-Estimation (PyTorch
676+
and TensorFlow); note that this repository has **no LICENSE file**, so
677+
it is reference reading only — Vortex's implementation must be
678+
clean-room from the EDEN paper.
670679
- Cache `(centroids, S)` together under the existing `(padded_dim,
671680
bit_width)` key in the `DashMap`.
672681
- Apply `S` at quantization time (encode: scale `r * S` before
@@ -695,13 +704,28 @@ and selecting at decode time. We recommend (a) — see §13 "Migration."
695704
diminishing returns vs. lossless schemes.
696705

697706
```rust
707+
// vortex-turboquant/src/config.rs, ff120401. Fields are private; access via
708+
// accessors (bit_width(), seed(), num_rounds()). All three are required at
709+
// construction — Stage 1 does not currently expose a default-config helper.
698710
pub struct TurboQuantConfig {
699-
pub bit_width: u8, // 1..=8
700-
pub seed: Option<u64>, // default 42 (or session-configurable)
701-
pub num_rounds: u8, // default 3
711+
bit_width: u8, // 1..=8 (validated in try_new)
712+
seed: u64, // SORF seed
713+
num_rounds: u8, // > 0 (validated in try_new)
714+
}
715+
716+
impl TurboQuantConfig {
717+
pub fn try_new(bit_width: u8, seed: u64, num_rounds: u8) -> VortexResult<Self>;
718+
pub fn bit_width(&self) -> u8;
719+
pub fn seed(&self) -> u64;
720+
pub fn num_rounds(&self) -> u8;
702721
}
703722
```
704723

724+
Stage 1 stabilization may add a `Default` impl with the recommended values
725+
(bit_width 8, num_rounds 3, a session-derived seed). The seed is not
726+
currently optional — callers pass a concrete `u64` and the session is
727+
responsible for choosing one.
728+
705729
### Power-of-2 padding
706730

707731
SORF requires power-of-2 input dimension. Non-power-of-2 dimensions are
@@ -1581,9 +1605,11 @@ a prost message; new fields are added as `optional` so that older readers
15811605
treat them as their default value.
15821606

15831607
Stage 1 → Stage 2: add `block_size: Option<u32>`. Stage 1 writers leave it
1584-
as `None`; Stage 1 readers see `None` and treat the array as a single
1585-
padded block. Stage 2 writers populate it when k > 1; Stage 2 readers
1586-
honor it.
1608+
unset; Stage 1 readers see unset and treat the array as a single padded
1609+
block. Stage 2 writers populate it **only when k > 1**; for k = 1
1610+
(power-of-2 dimensions) Stage 2 writers also leave it unset, producing a
1611+
wire-format-identical artifact to Stage 1 at the same dimension. Stage 2
1612+
readers accept both forms but writers converge on this rule.
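
The writer and reader rule described above can be sketched as two small helpers. Both function names are hypothetical and do not appear in `vortex-turboquant`; `num_blocks` stands for k and `block_size` for B in the RFC's notation.

```rust
/// Hypothetical helper: the value a Stage 2 writer stores in metadata tag 6.
/// For k = 1 the field stays unset, so the metadata matches a Stage 1 file.
pub fn block_size_to_write(num_blocks: u32, block_size: u32) -> Option<u32> {
    if num_blocks > 1 {
        Some(block_size)
    } else {
        None
    }
}

/// Hypothetical reader-side counterpart: an unset field means
/// "single padded block", so the block spans the whole padded dimension.
pub fn effective_block_size(stored: Option<u32>, padded_dim: u32) -> u32 {
    stored.unwrap_or(padded_dim)
}
```

Under these assumptions a d = 1024 array (k = 1) round-trips through `None`, while a d = 768 array split into three blocks of 256 stores `Some(256)`.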
15871613

15881614
Stage 2 → Stage 3: no metadata change. The codes child's physical encoding
15891615
shifts from `FixedSizeListArray` to `PDXArray`; readers detect via
@@ -1699,7 +1725,7 @@ The ICLR 2026 camera-ready proceedings may use different numbering._
16991725

17001726
[1] Zandieh, A., Daliri, M., Hadian, M. and Mirrokni, V. "TurboQuant: Online
17011727
Vector Quantization with Near-optimal Distortion Rate." ICLR 2026.
1702-
arXiv:2504.19874, April 2025.
1728+
arXiv:2504.19874v1, April 2025.
17031729

17041730
[2] Ailon, N. and Chazelle, B. "The Fast Johnson-Lindenstrauss Transform and
17051731
Approximate Nearest Neighbors." SIAM J. Comput. 39(1):302-322, 2009.
@@ -1901,45 +1927,57 @@ intentionally leaves open.
19011927

19021928
### D.2 Extension type metadata (prost)
19031929

1904-
The on-disk metadata is the prost-encoded form of:
1930+
**Current schema** at `vortex-turboquant/src/vtable.rs` (ff120401):
19051931

19061932
```rust
1907-
// Reproduced from vortex-turboquant/src/vtable.rs at ff120401.
1933+
#[derive(Clone, PartialEq, Message)]
19081934
struct TurboQuantMetadataProto {
1909-
element_ptype: PType, // tag 1, enum
1910-
dimensions: u32, // tag 2
1911-
bit_width: u32, // tag 3 (fits in u8 at the type level)
1912-
seed: u64, // tag 4
1913-
num_rounds: u32, // tag 5 (fits in u8 at the type level)
1914-
// Stage 2 addition (not in current source):
1915-
block_size: optional u32, // tag 6
1916-
// Future:
1917-
unbiased: optional bool, // tag 7 — reserve for EDEN unbiased mode
1935+
#[prost(enumeration = "PType", tag = "1")]
1936+
element_ptype: i32,
1937+
#[prost(uint32, tag = "2")]
1938+
dimensions: u32,
1939+
#[prost(uint32, tag = "3")]
1940+
bit_width: u32, // u8 at the type level
1941+
#[prost(uint64, tag = "4")]
1942+
seed: u64,
1943+
#[prost(uint32, tag = "5")]
1944+
num_rounds: u32, // u8 at the type level
19181945
}
19191946
```
19201947

1921-
The current `TurboQuantMetadataProto` in `vortex-turboquant/src/vtable.rs`
1922-
defines tags 1–5. Stage 2 adds `block_size` at tag 6; future stages
1923-
add additional optional tags as needed. Prost optional-field semantics
1924-
mean older readers ignore unknown tags, so the wire format is forward-
1925-
compatible by construction.
1948+
**This RFC proposes** adding the following tags, in this order, as the
1949+
corresponding stages land:
1950+
1951+
- **Tag 6 — `block_size: Option<u32>`** (Stage 2). Stage 1 writers leave
1952+
this unset; readers treat unset as "single padded block." Stage 2
1953+
writers set it when k > 1; for k = 1 (power-of-2 dimensions) writers
1954+
also leave it unset, preserving bit-identical wire format with Stage 1
1955+
at d = 1024 (see Open Questions resolution #4 in §16).
1956+
- **Tag 7 — reserved** for a future `unbiased: Option<bool>` flag when (if)
1957+
EDEN's native b-bit unbiased mode is added (§15). Reserving the tag now
1958+
keeps the field-add policy clean.
1959+
1960+
Prost optional-field semantics mean older readers ignore unknown tags, so
1961+
the schema is forward-compatible by construction.
19261962

19271963
**Constraint:** tags 1–5 are part of Vortex's stable contract once Stage 1
19281964
ships in a release. Renumbering or repurposing them would be a wire-format
1929-
break.
1965+
break. Tag 6's semantics are locked once Stage 2 ships.
19301966

19311967
### D.3 Validation
19321968

1933-
The `ExtVTable::validate_dtype` implementation enforces:
1934-
1935-
- `dimensions >= MIN_DIMENSION` (`= 128`). On violation: error
1936-
`"TurboQuant dimensions must be >= 128, got {N}"`.
1937-
- `1 ≤ bit_width ≤ MAX_BIT_WIDTH` (`= 8`). On violation: error
1938-
`"TurboQuant bit_width must be 1-8, got {N}"`.
1939-
- `num_rounds > 0`. On violation: error
1940-
`"TurboQuant num_rounds must be > 0, got 0"`.
1941-
- `element_ptype.is_float()` (one of F16, F32, F64). On violation: error
1942-
`"TurboQuant element_ptype must be a float, got {ptype}"`.
1969+
The `ExtVTable::validate_dtype` implementation enforces (error templates
1970+
shown with `vortex_ensure!` placeholder substitution, where `{var}` is
1971+
filled with the offending value at runtime):
1972+
1973+
- `dimensions >= MIN_DIMENSION` (`= 128`). On violation:
1974+
`"TurboQuant dimensions must be >= 128, got {dimensions}"`.
1975+
- `1 ≤ bit_width ≤ MAX_BIT_WIDTH` (`= 8`). On violation:
1976+
`"TurboQuant bit_width must be 1-8, got {bit_width}"`.
1977+
- `num_rounds > 0`. On violation:
1978+
`"TurboQuant num_rounds must be > 0, got {num_rounds}"`.
1979+
- `element_ptype.is_float()` (one of F16, F32, F64). On violation:
1980+
`"TurboQuant element_ptype must be a float, got {element_ptype}"`.
19431981
- Storage dtype is `Struct { norms: Primitive<element_ptype>, codes:
19441982
FixedSizeList<u8, padded_dim> }` with matching row-validity propagation.
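
As a rough illustration of the scalar checks listed above, the following self-contained sketch uses a stand-in `PType` enum and plain `String` errors. Both are assumptions: the real implementation uses Vortex's `PType` and `vortex_ensure!`, and the function name `validate_metadata` is hypothetical. The storage-dtype check is omitted.

```rust
/// Stand-in for Vortex's `PType`; only enough variants to show the float check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PType {
    U8,
    I32,
    F16,
    F32,
    F64,
}

impl PType {
    pub fn is_float(self) -> bool {
        matches!(self, PType::F16 | PType::F32 | PType::F64)
    }
}

pub const MIN_DIMENSION: u32 = 128;
pub const MAX_BIT_WIDTH: u32 = 8;

/// Hypothetical validator mirroring the four scalar checks and their
/// error templates. The first failing check wins.
pub fn validate_metadata(
    element_ptype: PType,
    dimensions: u32,
    bit_width: u32,
    num_rounds: u32,
) -> Result<(), String> {
    if dimensions < MIN_DIMENSION {
        return Err(format!(
            "TurboQuant dimensions must be >= 128, got {dimensions}"
        ));
    }
    if !(1..=MAX_BIT_WIDTH).contains(&bit_width) {
        return Err(format!("TurboQuant bit_width must be 1-8, got {bit_width}"));
    }
    if num_rounds == 0 {
        return Err(format!(
            "TurboQuant num_rounds must be > 0, got {num_rounds}"
        ));
    }
    if !element_ptype.is_float() {
        // Debug formatting of the stand-in enum; the real message format may differ.
        return Err(format!(
            "TurboQuant element_ptype must be a float, got {element_ptype:?}"
        ));
    }
    Ok(())
}
```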
19451983

@@ -1954,9 +1992,9 @@ fits in `u32` (i.e., `2^31`).
19541992

19551993
```text
19561994
fn tq_encode(v: Vector<F, d>, cfg: &TurboQuantConfig) -> TurboQuantArray:
1957-
padded_dim = next_power_of_two(d)
1958-
if padded_dim does not fit u32:
1959-
return Err(OverflowError)
1995+
if d > MAX_DIMENSION: # MAX_DIMENSION = 2^31
1996+
return Err(OverflowError) # next_power_of_two would overflow
1997+
padded_dim = next_power_of_two(d) # checked: ≤ 2^31
19601998
19611999
n = ‖v‖₂ # in input dtype F (f16/f32/f64)
19622000
if n > 0:
@@ -1985,6 +2023,11 @@ fn tq_encode(v: Vector<F, d>, cfg: &TurboQuantConfig) -> TurboQuantArray:
19852023
`centroids = get_centroids(padded_dim, bit_width)` and `S = get_eden_scale(padded_dim, bit_width)`
19862024
both come from the process-local cache keyed on `(padded_dim, bit_width)`.
19872025

2026+
**Licensing note.** The EDEN-`S` derivation must be implemented clean-room
2027+
from the EDEN paper [15]. The authors' official reproduction at
2028+
https://github.com/amitport/EDEN-Distributed-Mean-Estimation has no
2029+
LICENSE file and therefore cannot be copied or adapted into Vortex.
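
The reordered bounds check at the top of the D.4 pseudocode can be sketched in Rust as follows. The function name `padded_dim` and the `String` error are illustrative assumptions; only `MAX_DIMENSION = 2^31` comes from the pseudocode.

```rust
/// From the pseudocode: the largest power of two that fits in a `u32`.
pub const MAX_DIMENSION: u32 = 1 << 31;

/// Hypothetical helper: validate `d` first, then pad to a power of two.
pub fn padded_dim(d: u32) -> Result<u32, String> {
    // The bound must be checked before padding: for d > 2^31,
    // `u32::next_power_of_two` panics in debug builds and wraps to 0
    // in release builds.
    if d > MAX_DIMENSION {
        return Err(format!(
            "dimension {d} exceeds MAX_DIMENSION ({MAX_DIMENSION})"
        ));
    }
    Ok(d.next_power_of_two())
}
```

An alternative with the same effect is `u32::checked_next_power_of_two`, which returns `None` instead of overflowing; the explicit comparison is shown here because it matches the order of operations in the pseudocode.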
2030+
19882031
### D.5 Decode (Stage 1) pseudocode
19892032

19902033
```text
@@ -2162,22 +2205,39 @@ Stages 2 and 3 add:
21622205

21632206
### D.11 Performance budgets
21642207

2165-
Goals (from §11) become budgets here, with verification commands:
2166-
2167-
- **Encode throughput, Stage 1, AVX-512, d = 768, b = 8**: ≥ 1 M vectors/sec.
2168-
Verify with `cargo run -p vortex-turboquant --release --bench encode_decode`.
2169-
- **Decode throughput, Stage 1, AVX-512, d = 768, b = 8**: ≥ 1 M vectors/sec.
2170-
- **Encode throughput, Stage 2, AVX-512, d = 768, k = 3, B = 256, b = 8**:
2171-
≥ 1.3 M vectors/sec (≥ 30% faster than Stage 1 padded, matching the FLOP
2172-
ratio in §11).
2173-
- **Compression ratio, b = 8**: 3.0× (Stage 1 padded at d = 768), 3.9×
2174-
(Stage 2 k = 3 at d = 768), 4.0× (any stage at d = 1024).
2175-
- **Normalized MSE, b = 8, d ≥ 128**: < 5e-5 on Gaussian inputs.
2176-
- **Stage 3 PDX scan throughput, AVX-512, b = 4, d = 768**: ≥ 1.5×
2177-
Stage 2's row-major kernel throughput on 1 M-row scan.
2178-
2179-
These are starting budgets; the Experimental plan (§12) refines them with
2180-
real workloads.
2208+
The goals in §11 become measurable budgets once Stage 1 stabilization has
2209+
baseline numbers. Concrete budgets are deliberately **not yet pinned** in
2210+
this RFC — pinning them with fabricated round numbers would mislead the
2211+
implementer about what "on track" means. The plan is:
2212+
2213+
1. **Baseline.** Run the existing benchmark
2214+
`cargo bench -p vortex-turboquant --bench encode_decode` on a
2215+
reference machine (e.g., AVX-512 Sapphire Rapids, ARM Neoverse-V2) at
2216+
d ∈ {768, 1024, 1536} and b ∈ {4, 8}, recording encode and decode
2217+
throughput in vectors/sec.
2218+
2. **Pin the baselines as budgets** in this appendix, with the machine
2219+
and command used to produce them.
2220+
3. **Stage 2 and Stage 3 budgets** then become "X% of Stage 1 padded" with
2221+
`X` chosen against the FLOP-ratio analysis in §11 (Stage 2 should be
2222+
~40% faster than padded Stage 1 at d=768; Stage 3 PDX should be 1.5–2×
2223+
Stage 2 row-major per [4, Table 4]).
2224+
2225+
Until those baselines land, the only budget the RFC commits to is the
2226+
**compression ratio** (which is determined by the wire format and is
2227+
exact, not measured):
2228+
2229+
| Configuration | Per-vec bits (f32 input) | Ratio |
2230+
| -------------------------------- | ------------------------ | ----- |
2231+
| Stage 1 padded, d=768, b=8 | 8,224 | 3.0× |
2232+
| Stage 2 (k=3, B=256), d=768, b=8 | 6,240 | 3.9× |
2233+
| Stage 1 or 2 (k=1), d=1024, b=8 | 8,224 | 4.0× |
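
The per-vector bit counts in the table can be reproduced with a few lines of arithmetic. This sketch assumes one norm per block stored in the input dtype (32 bits for f32) and codes that occupy exactly `bit_width` bits each, which holds for `u8` codes at b = 8. The function names are hypothetical.

```rust
/// Bits stored per vector: quantized codes plus one norm per block.
/// Stage 1 is the special case `num_blocks = 1`, `block_size = padded_dim`.
pub fn per_vector_bits(num_blocks: u64, block_size: u64, bit_width: u64, elem_bits: u64) -> u64 {
    num_blocks * block_size * bit_width + num_blocks * elem_bits
}

/// Ratio of the uncompressed vector (`d` elements of `elem_bits` each)
/// to the stored representation.
pub fn compression_ratio(d: u64, elem_bits: u64, stored_bits: u64) -> f64 {
    (d * elem_bits) as f64 / stored_bits as f64
}
```

For example, `per_vector_bits(1, 1024, 8, 32)` gives 8,224 bits for Stage 1 at d = 768 (padded to 1024), and `per_vector_bits(3, 256, 8, 32)` gives 6,240 bits for Stage 2 with k = 3 and B = 256.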
2234+
2235+
And the **normalized MSE bound** from Theorem 1 (see §4 and Appendix A):
2236+
`≤ 2.72 / 4^b` per vector, in expectation under Haar-random rotations.
2237+
SORF only approximates a Haar-random rotation, and `vortex-turboquant`'s
2238+
test suite needs ~20% slack over this bound at d=1024, so the operational
2239+
assertion is `normalized_mse_per_vector ≤ 3.3 / 4^b` until tighter
2240+
empirical bounds land.
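
A minimal sketch of the two MSE thresholds as code, assuming the constants 2.72 and 3.3 quoted above. The function names are hypothetical and not taken from the `vortex-turboquant` test suite.

```rust
/// Theorem 1 bound on normalized MSE for Haar-random rotations: 2.72 / 4^b.
pub fn theorem_bound(bit_width: u8) -> f64 {
    2.72 / 4.0_f64.powi(i32::from(bit_width))
}

/// Operational bound with the ~20% SORF slack: 3.3 / 4^b.
pub fn operational_bound(bit_width: u8) -> f64 {
    3.3 / 4.0_f64.powi(i32::from(bit_width))
}

/// Hypothetical assertion helper for a measured per-vector normalized MSE.
pub fn within_budget(normalized_mse: f64, bit_width: u8) -> bool {
    normalized_mse <= operational_bound(bit_width)
}
```

At b = 8 the divisor is 4^8 = 65,536, so the Theorem 1 bound is about 4.15e-5 and the operational bound about 5.04e-5.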
21812241

21822242
### D.12 Registry / dispatch wiring
21832243

@@ -2245,13 +2305,7 @@ work, not blockers on the design:
22452305
exact mixing function is a Stage 2 implementation detail that should be
22462306
pinned before the first Stage 2 file is written; the wire format
22472307
becomes load-bearing once any file is shipped.
2248-
4. **Wire-format identity at d=1024 across Stage 1 → Stage 2 writers.** When
2249-
a Stage 2 writer emits a power-of-2-dimension TurboQuant array (so
2250-
k = 1), should it write `block_size = None` (matching Stage 1 readers'
2251-
default exactly) or `block_size = Some(padded_dim)`? Stage 2 readers
2252-
accept both; Stage 1 readers only accept `None`. Writers must converge
2253-
to one or the other to preserve cross-version write/read interop.
2254-
5. **Whether to lower `MIN_DIMENSION` after Stage 1 experimental
2308+
4. **Whether to lower `MIN_DIMENSION` after Stage 1 experimental
22552309
validation.** If the experimental plan supports lowering to 64–96, the
22562310
change is a wire-format break in the sense that files written at
22572311
d < 128 by a new writer would be rejected by an old reader. Surface
@@ -2281,3 +2335,11 @@ re-litigate.
22812335
Stage 1 cleanup task in §14 "Current state and known gaps." Removal
22822336
follows once `vortex/examples/turboquant_vector_search.rs` migrates to
22832337
the new `TQEncode` / `TQDecode` path.
2338+
- **Wire-format identity at d = 1024 across Stage 1 → Stage 2 writers**
2339+
resolved: Stage 2 writers leave `block_size` unset whenever k = 1
2340+
(i.e., for any power-of-2 dimension that doesn't require decomposition).
2341+
This produces a wire-format-identical artifact to a Stage 1 writer at
2342+
d = 1024 and remains readable by Stage 1 readers. Stage 2 readers accept
2343+
both `unset` (treat as single padded block) and `Some(padded_dim)` (an
2344+
encoder bug to be flagged in validation), but writers must converge on
2345+
`unset`. See §13 "Migration and compatibility."
