Continuation of #7167, authored by @lwwmanning.
## Summary
Lossy quantization for vector data (e.g., embeddings) based on
TurboQuant (https://arxiv.org/abs/2504.19874). Implements the MSE-only
variant (Stage 1 of RFC 0033) at 1-8 bits per coordinate (0 for empty
arrays), defaulting to 8-bit near-lossless compression. A small amount
of error remains even at 8 bits because we use an SRHT rather than a
Haar-random orthogonal rotation matrix, so the paper's distributional
assumptions hold only approximately.
Key components:
- TurboQuant array encoding with 4 slots: quantized codes, norms,
centroids, and rotation signs. Note that we should probably abstract the
codes and centroids as a dictionary encoded thing so we don't have to
duplicate pushdown rules, and we might want to make a matrix
multiplication `ScalarFn` expression for the rotations.
- Structured Random Hadamard Transform (SRHT) for `O(d log d)` rotation,
fully self-contained with no external linear algebra library. This is
what Claude came up with; testing shows that while it is practical and
more efficient, we lose some of the guarantees that a Haar-random
orthogonal matrix provides. Since the rotation is abstracted into a
discrete step of the algorithm, it is easy to experiment with
alternatives.
- Max-Lloyd centroid computation on the Beta distribution for the given
dimension.
- Approximate cosine similarity and dot product computed directly on
quantized arrays without full decompression.
- Pluggable `TurboQuantScheme` for the cascading compressor.
- Minimum dimension of 128 (`TurboQuant::MIN_DIMENSION`) for SRHT
quality guarantees.
- Default 8-bit encoding (MSE ~4e-5, exact 4x compression on `f32`).
- Adds `vortex_tensor::initialize()` for session registration of tensor
types, encodings, and scalar functions.
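The SRHT step above can be sketched as random sign flips followed by a fast Walsh-Hadamard transform. This is a minimal, dependency-free illustration of the technique, not the PR's actual implementation; the function names and the power-of-two dimension assumption are ours.

```rust
/// In-place fast Walsh-Hadamard transform, O(d log d).
/// Assumes the length is a power of two (illustrative simplification).
fn fwht(v: &mut [f32]) {
    let d = v.len();
    assert!(d.is_power_of_two());
    let mut h = 1;
    while h < d {
        for chunk in v.chunks_mut(2 * h) {
            for i in 0..h {
                let (a, b) = (chunk[i], chunk[i + h]);
                chunk[i] = a + b;
                chunk[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

/// Apply an SRHT-style rotation: flip signs (the "rotation signs" slot),
/// run the Hadamard transform, then scale by 1/sqrt(d) so the overall
/// transform is orthonormal and preserves the L2 norm.
fn srht(v: &mut [f32], signs: &[f32]) {
    let d = v.len();
    for (x, s) in v.iter_mut().zip(signs) {
        *x *= s;
    }
    fwht(v);
    let scale = 1.0 / (d as f32).sqrt();
    for x in v.iter_mut() {
        *x *= scale;
    }
}

fn main() {
    let signs = vec![1.0, -1.0, 1.0, 1.0, -1.0, 1.0, 1.0, -1.0];
    let mut v = vec![1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
    let before: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    srht(&mut v, &signs);
    let after: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    // Norm preservation is the property the tests below check.
    assert!((before - after).abs() < 1e-3);
    println!("norm preserved: {before} ~= {after}");
}
```

Applying the same transform twice (signs after the second FWHT) inverts it, which is why exporting the signs is enough to reconstruct the rotation.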
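For the centroid computation, the PR derives Max-Lloyd centroids analytically for the Beta distribution of rotated coordinates. As a hedged illustration of the underlying technique only, here is the generic Lloyd-Max iteration on empirical samples (the function name and initialization are ours, not the PR's API):

```rust
/// Generic Lloyd-Max quantizer design on empirical samples: alternate
/// nearest-centroid assignment and centroid recomputation until the
/// centroids are (approximately) MSE-optimal for the sample distribution.
fn lloyd_max(samples: &[f32], k: usize, iters: usize) -> Vec<f32> {
    let min = samples.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = samples.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    // Initialize centroids uniformly across the sample range.
    let mut c: Vec<f32> = (0..k)
        .map(|i| min + (max - min) * (i as f32 + 0.5) / k as f32)
        .collect();
    for _ in 0..iters {
        let mut sums = vec![0.0f32; k];
        let mut counts = vec![0usize; k];
        for &s in samples {
            // Assign each sample to its nearest centroid.
            let j = (0..k)
                .min_by(|&a, &b| {
                    (s - c[a]).abs().partial_cmp(&(s - c[b]).abs()).unwrap()
                })
                .unwrap();
            sums[j] += s;
            counts[j] += 1;
        }
        // Move each centroid to the mean of its assigned samples.
        for j in 0..k {
            if counts[j] > 0 {
                c[j] = sums[j] / counts[j] as f32;
            }
        }
    }
    c
}

fn main() {
    let samples = vec![-1.0, -0.9, 0.9, 1.0];
    let c = lloyd_max(&samples, 2, 10);
    // Symmetric samples yield symmetric centroids.
    assert!(c[0] < 0.0 && c[1] > 0.0);
    println!("centroids: {c:?}");
}
```

Working analytically against the known Beta distribution, as the PR does, avoids sampling and makes the centroids cacheable per (dimension, bit-width) pair.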
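The quantized-domain similarity idea can be sketched as: estimate the cosine from the per-coordinate codes (mapped through the centroid table), then rescale by the stored norms to recover a dot product. This is an illustrative sketch with invented names and a simplified layout, not the PR's actual kernel:

```rust
/// Map quantized codes back to their centroid values.
fn reconstruct(codes: &[u8], centroids: &[f32]) -> Vec<f32> {
    codes.iter().map(|&c| centroids[c as usize]).collect()
}

/// Approximate dot product from two quantized vectors: estimate the
/// cosine from the centroid reconstructions, then rescale by the
/// stored original norms. Assumes at least one nonzero code per vector.
fn approx_dot(
    codes_a: &[u8], norm_a: f32,
    codes_b: &[u8], norm_b: f32,
    centroids: &[f32],
) -> f32 {
    let a = reconstruct(codes_a, centroids);
    let b = reconstruct(codes_b, centroids);
    let dot: f32 = a.iter().zip(&b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    // cos(a, b) estimated in the quantized domain, rescaled by norms.
    norm_a * norm_b * dot / (na * nb)
}

fn main() {
    // 1-bit example: two identical vectors, so cosine = 1 and the
    // approximate dot product is exactly the product of the norms.
    let centroids = vec![-1.0, 1.0];
    let codes = vec![1u8, 1, 0, 0];
    let d = approx_dot(&codes, 2.0, &codes, 3.0, &centroids);
    assert!((d - 6.0).abs() < 1e-5);
    println!("approx dot: {d}");
}
```

Because the rotation is orthonormal, cosines are preserved by it, which is why the estimate can skip undoing the SRHT entirely.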
## API Changes
- Adds `TurboQuant` encoding in `vortex-tensor` with
`turboquant_encode()` and `TurboQuantConfig`, and new types
`TurboQuantData` and `TurboQuantArray`.
- Adds `TurboQuantScheme` for compressor integration.
- Adds `TurboQuant::MIN_DIMENSION` (128) constant.
- Adds `float_from_f32<T: Float + FromPrimitive>` shared helper for
infallible f32-to-float conversion.
## Testing (Claude-generated)
- Roundtrip tests across bit widths (1-8) and dimensions (128, 256, 768,
1024).
- MSE quality bounds verified against the theoretical bound from Theorem
1.
- Edge cases: empty arrays, single-row arrays, all-zero vectors,
dimension rejection below 128.
- Float type coverage: f16, f32, f64 input encoding and roundtrip.
- Nullable vector support: validity propagation through encode, decode,
slice, take, and L2 norm readthrough.
- Quantized-domain cosine similarity and dot product accuracy tests.
- Serde roundtrip (serialize/deserialize).
- Compute pushdown tests: slice, take, scalar_at.
- Compression ratio estimates for typical embedding dimensions.
- Centroid correctness: count, sorted, symmetric, within bounds,
caching, boundary rejection.
- SRHT rotation: determinism, roundtrip, norm preservation, sign
export/import roundtrip.
---------
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Co-authored-by: Will Manning <will@willmanning.io>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>