
Commit fc8f347

lwwmanning and claude committed
[RFC 33] Rewrite around extension-type model and EDEN prior art
Substantial rewrite of RFC 33 to reflect the architecture that landed in vortex-turboquant (PR #7829) and the design principles recorded in tracking issue vortex-data/vortex#7830:

- TurboQuant is a logical extension type (vortex.turboquant), not a physical encoding of Vector. Storage is a 2-field Struct {norms, codes} with all numeric metadata in the prost-serialized extension dtype.
- Decompression is explicit via TQDecode; there is no canonicalization back to Vector. The default compressor passes TurboQuant arrays through opaquely.
- vortex-turboquant lives outside the main vortex dependency tree; opt-in via vortex_turboquant::initialize.

Stages 2 (block decomposition) and 3 (PDX physical layout) are re-derived through the extension-type lens and kept in scope as the long-term plan, not demoted to future work.

Also incorporates the EDEN paper (arXiv:2604.18555, ICML 2022) as prior art the original RFC missed: EDEN predates TurboQuant, uses the same RHT + Lloyd-Max scalar quantizer family, and demonstrates that TurboQuant is a suboptimal special case. We adopt EDEN's optimized scalar scale S as a Stage 1 drop-in refinement (no storage/metadata change) and recommend EDEN's native b-bit unbiased mode over TurboQuant's MSE+QJL stacking for any future unbiased path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
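The following is a small, self-contained Python sketch of the RHT + Lloyd-Max scalar quantizer family referred to in the message. It encodes one vector into a (scale, codes) pair that mirrors the {norms, codes} struct. It is not the vortex-turboquant API. The following choices are all assumptions made for illustration:

- The function names (`rht`, `inverse_rht`, `lloyd_max_gaussian`, `encode`, `decode`) are hypothetical.
- The codebook is fit to a standard Gaussian as an approximation of the post-rotation coordinate distribution.
- The EDEN-style scale is taken to be the MSE-minimizing (least-squares) form S = <y, q> / <q, q>, folded into the per-vector norm slot so that storage stays unchanged.

```python
import bisect
import math
import random

# Hypothetical sketch of an RHT + Lloyd-Max quantizer; NOT the vortex-turboquant API.

def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of 2.
    The normalized Sylvester Hadamard matrix H is symmetric and self-inverse."""
    a = list(v)
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [e * s for e in a]

def rht(x, signs):
    """Randomized Hadamard transform R x = H D x, with D = diag(signs); R is orthogonal."""
    return fwht([s * v for s, v in zip(signs, x)])

def inverse_rht(y, signs):
    """R^T y = D H y, since H and D are both self-inverse."""
    return [s * v for s, v in zip(signs, fwht(y))]

def _pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def _cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def lloyd_max_gaussian(bits, iters=500):
    """Lloyd-Max codebook for N(0, 1); returns (centroids, decision thresholds)."""
    k = 2 ** bits
    c = [-2 + 4 * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        t = [-math.inf] + [(c[i] + c[i + 1]) / 2 for i in range(k - 1)] + [math.inf]
        # Move each centroid to the conditional mean of N(0, 1) on its cell.
        c = [(_pdf(t[i]) - _pdf(t[i + 1])) / (_cdf(t[i + 1]) - _cdf(t[i])) for i in range(k)]
    return c, [(c[i] + c[i + 1]) / 2 for i in range(k - 1)]

def encode(x, signs, centroids, thresholds, eden_scale=True):
    """Return (scale, codes): one per-vector scalar plus b-bit code indices."""
    d = len(x)
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return 0.0, [0] * d
    y = rht(x, signs)
    # Normalize so rotated coordinates are roughly N(0, 1) before quantizing.
    codes = [bisect.bisect_left(thresholds, v * math.sqrt(d) / norm) for v in y]
    q = [centroids[i] for i in codes]
    if eden_scale:
        # Least-squares scale: argmin_S ||y - S q||^2 = <y, q> / <q, q>.
        scale = sum(a * b for a, b in zip(y, q)) / sum(b * b for b in q)
    else:
        # Plain scaling: undo the normalization with ||x|| / sqrt(d).
        scale = norm / math.sqrt(d)
    return scale, codes

def decode(scale, codes, signs, centroids):
    """Reconstruct x_hat = R^T (scale * q)."""
    return inverse_rht([scale * centroids[i] for i in codes], signs)

if __name__ == "__main__":
    rng = random.Random(0)
    d = 64
    signs = [rng.choice((-1, 1)) for _ in range(d)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    centroids, thresholds = lloyd_max_gaussian(bits=2)
    for flag in (False, True):
        s, codes = encode(x, signs, centroids, thresholds, eden_scale=flag)
        x_hat = decode(s, codes, signs, centroids)
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
        print(f"eden_scale={flag}: ||x - x_hat|| = {err:.4f}")
```

The rotation R is orthogonal, so the error measured in the rotated domain equals the error in the original domain. As a result, the least-squares scale can never do worse than the plain ||x|| / sqrt(d) scale for the same codes. That property is what makes a scale swap of this kind a drop-in refinement. In this sketch, both variants write the same (scale, codes) pair.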
1 parent 5cf675e commit fc8f347

1 file changed

Lines changed: 1724 additions & 0 deletions
