
Add Lossy extension type #7755

Closed
connortsui20 wants to merge 4 commits into develop from ct/lossy

Conversation

@connortsui20
Contributor

Summary

Tracking issue: #7754

Note that right now this is just a prototype built by Claude; the end product might look very different.

API Changes

TODO

Testing

TODO

claude added 4 commits May 1, 2026 12:49
- New `Lossy<X>` ext type at vortex-array/src/extension/lossy.rs
- Float-only recursive validation; rejects bool, struct, non-float prim, nested Lossy
- ExtVTable::can_be_lossy default-false hook for ext types to opt in
- DType::peel_lossy and ExtensionArray::peel_lossy helpers
- Vector::can_be_lossy = true to enable Lossy<Vector<f32>>

Foundation chunk; compressor/kernel/turboquant changes are separate chunks (the validation and peel hooks are sketched below).

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
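
Below is a minimal, self-contained sketch of the validation and opt-in behavior this commit describes. The names `can_be_lossy` and `peel_lossy` come from the commit message; the enum shape, the `opts_in` field standing in for the `ExtVTable` hook, and every signature are simplifications for illustration, not the real vortex-array API.

```rust
// Toy stand-ins; the real vortex-array DType is richer than this.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PType { F32, F64, I64 }

#[derive(Clone, Debug, PartialEq)]
enum DType {
    Bool,
    Primitive(PType),
    Struct(Vec<DType>),
    // `lossy` marks the Lossy<X> wrapper itself; `opts_in` models the
    // default-false ExtVTable::can_be_lossy hook.
    Extension { lossy: bool, opts_in: bool, storage: Box<DType> },
}

impl DType {
    /// Float-only recursive validation: rejects bool, struct, non-float
    /// primitives, and nested Lossy; other ext types must opt in.
    fn can_be_lossy(&self) -> bool {
        match self {
            DType::Primitive(PType::F32 | PType::F64) => true,
            DType::Extension { lossy: true, .. } => false, // no Lossy<Lossy<..>>
            DType::Extension { opts_in, storage, .. } => {
                *opts_in && storage.can_be_lossy()
            }
            _ => false,
        }
    }

    /// Peel exactly one Lossy layer; every other dtype passes through.
    /// One layer is all that can exist, since nesting is rejected above.
    fn peel_lossy(&self) -> &DType {
        match self {
            DType::Extension { lossy: true, storage, .. } => storage,
            other => other,
        }
    }
}

fn main() {
    // Vector opts in, so Lossy<Vector<f32>> validates via the f32 storage.
    let vector_f32 = DType::Extension {
        lossy: false,
        opts_in: true,
        storage: Box::new(DType::Primitive(PType::F32)),
    };
    assert!(vector_f32.can_be_lossy());
    assert!(!DType::Bool.can_be_lossy());
    assert!(!DType::Primitive(PType::I64).can_be_lossy());
}
```

One design point worth noting: because nested Lossy is rejected at validation time, `peel_lossy` only ever needs to remove a single layer.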
Tensor scalar-fn kernels (inner_product, l2_norm, l2_denorm, sorf_transform)
and the AnyTensor matcher previously called metadata_opt::<AnyTensor> /
metadata_opt::<AnyVector> directly on the outer ExtDTypeRef, which would
silently fail to match when the dtype was Lossy<Vector<f32>>. Peel one
Lossy layer before metadata lookup, and at runtime peel the outer Lossy
ExtensionArray so the kernel uniformly operates on the inner Vector /
FixedShapeTensor storage.

- AnyTensor::try_match peels AnyLossy and re-dispatches on the inner ext
- validate_tensor_float_input peels at the dtype level
- inner_product / l2_norm / l2_denorm executors peel both the dtype and
  (where they materialize the storage FSL) the ExtensionArray itself
- l2_norm constant fast path peels nested extension scalars down to the list
- SorfTransform::return_dtype peels at the dtype level
- New utils helper peel_lossy_extension_array unifies the array-side peel
  (sketched after this commit)
- Tests assert inner_product and l2_norm produce identical results on
  Lossy<Vector<f64>> and on the underlying Vector<f64> arrays

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
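
As a rough illustration of the peel-then-dispatch pattern these bullets describe: `peel_lossy_extension_array` is the helper named above, but the `ExtensionArray` stand-in, the string ids, and the kernel body below are invented for the sketch and do not reflect the real vortex-array or vortex-tensor types.

```rust
// Toy extension array; real Vortex arrays carry a dtype and child arrays.
struct ExtensionArray {
    ext_id: String,                     // e.g. "vortex.lossy", "vortex.vector"
    storage: Vec<f32>,                  // stand-in for the storage array
    inner: Option<Box<ExtensionArray>>, // the wrapped ext array, if any
}

/// Peel the outer Lossy layer so kernels uniformly operate on the inner
/// Vector / FixedShapeTensor storage.
fn peel_lossy_extension_array(array: &ExtensionArray) -> &ExtensionArray {
    match (array.ext_id.as_str(), &array.inner) {
        ("vortex.lossy", Some(inner)) => inner,
        _ => array,
    }
}

/// Matching on the outer ext id alone would silently reject
/// Lossy<Vector<f32>> inputs; peeling first restores the match.
fn inner_product(a: &ExtensionArray, b: &ExtensionArray) -> Option<f32> {
    let (a, b) = (peel_lossy_extension_array(a), peel_lossy_extension_array(b));
    if a.ext_id != "vortex.vector" || b.ext_id != "vortex.vector" {
        return None;
    }
    Some(a.storage.iter().zip(&b.storage).map(|(x, y)| x * y).sum())
}

fn main() {
    let vector = |data: Vec<f32>| ExtensionArray {
        ext_id: "vortex.vector".into(),
        storage: data,
        inner: None,
    };
    let lossy = |inner: ExtensionArray| ExtensionArray {
        ext_id: "vortex.lossy".into(),
        storage: Vec::new(),
        inner: Some(Box::new(inner)),
    };
    let a = lossy(vector(vec![1.0, 2.0]));
    let b = vector(vec![3.0, 4.0]);
    // Identical result on Lossy<Vector<f32>> and the bare Vector<f32>.
    assert_eq!(inner_product(&a, &b), Some(11.0));
}
```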
- Add CompressorContext.lossy_allowed (default false) and with_lossy_allowed builder
- Add Scheme::is_lossy default-false trait method
- Compressor peels Lossy<X> at the Extension arm, recurses with lossy_allowed=true,
  re-wraps result with original Lossy<X> dtype
- Candidate scheme filter excludes is_lossy() schemes when !lossy_allowed
  (see the sketch after this commit)
- Remove the two AnyScalarFn size-check HACKs in compressor.rs; the lossy
  branch now carries the user's consent, so the size bypass is no longer
  needed

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
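
A few toy types can picture the consent plumbing. `CompressorContext.lossy_allowed`, `with_lossy_allowed`, `Scheme::is_lossy`, and `TurboQuantScheme` are names from these commits; `FsstScheme` and every signature below are assumptions for the sketch, not the real vortex-btrblocks API.

```rust
#[derive(Clone, Copy, Default)]
struct CompressorContext {
    lossy_allowed: bool, // default false: stay lossless unless consented
}

impl CompressorContext {
    fn with_lossy_allowed(mut self, allowed: bool) -> Self {
        self.lossy_allowed = allowed;
        self
    }
}

trait Scheme {
    /// Default false: only value-changing schemes override this.
    fn is_lossy(&self) -> bool {
        false
    }
}

struct TurboQuantScheme;
impl Scheme for TurboQuantScheme {
    fn is_lossy(&self) -> bool {
        true
    }
}

struct FsstScheme; // a lossless scheme keeps the default
impl Scheme for FsstScheme {}

/// Candidate filter: lossy schemes become eligible only after the
/// compressor peels a Lossy<X> wrapper and recurses with consent set.
fn candidates<'a>(ctx: CompressorContext, all: &[&'a dyn Scheme]) -> Vec<&'a dyn Scheme> {
    all.iter()
        .copied()
        .filter(|s| ctx.lossy_allowed || !s.is_lossy())
        .collect()
}

fn main() {
    let schemes: [&dyn Scheme; 2] = [&TurboQuantScheme, &FsstScheme];
    let strict = CompressorContext::default();
    assert_eq!(candidates(strict, &schemes).len(), 1); // TurboQuant excluded
    let consented = strict.with_lossy_allowed(true);
    assert_eq!(candidates(consented, &schemes).len(), 2); // both eligible
}
```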
…or<f32>>

- TurboQuantScheme::is_lossy() = true so the cascading compressor's lossy
  filter applies.
- Add TurboQuantScheme to ALL_SCHEMES so it is eligible by default; gating
  now happens at the dtype level via the user's Lossy<X> consent.
- Remove the with_turboquant() builder method; vortex-btrblocks's
  unstable_encodings feature no longer pulls in vortex-tensor as an
  optional dep (vortex-tensor is now a hard dep).
- Register vortex_array::extension::Lossy in the default DTypeSession so
  Lossy-wrapped columns can be deserialized out of vortex-file.
- E2E test in vortex-file/tests: Lossy<Vector<f32>> triggers TurboQuant
  on the default builder, the file round-trip preserves the Lossy dtype, and
  decompressed values fall within a per-vector normalized MSE bound of
  0.05 (sketched after this commit); bare Vector<f32> does not trigger
  TurboQuant.
- Update the turboquant_vector_search example and vector-search-bench to
  wrap the embedding column (and the cosine-similarity query literal) in
  Lossy<Vector<f32>> so TurboQuant fires under the default builder.

Completes the lossy-storage prototype with end-to-end coverage.

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
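
For concreteness, here is one way to read the 0.05 acceptance bound. The PR text does not show how the e2e test normalizes, so take the definition below as an assumed per-vector normalized MSE, not the test's actual code.

```rust
/// Assumed definition: mean squared error divided by the vector's mean
/// squared magnitude, so the 0.05 bound is scale-invariant per vector.
fn normalized_mse(original: &[f32], decoded: &[f32]) -> f32 {
    assert_eq!(original.len(), decoded.len());
    let n = original.len() as f32;
    let mse = original
        .iter()
        .zip(decoded)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        / n;
    let energy = original.iter().map(|x| x * x).sum::<f32>() / n;
    mse / energy
}

fn main() {
    // A lightly perturbed "decompressed" vector sits well inside the bound.
    let original = [1.0_f32, -2.0, 3.0, -4.0];
    let decoded = [1.02_f32, -1.97, 3.05, -3.90];
    assert!(normalized_mse(&original, &decoded) <= 0.05);
}
```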
connortsui20 added the changelog/feature (A new feature) label May 1, 2026

codspeed-hq Bot commented May 1, 2026

Merging this PR will degrade performance by 25.03%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
❌ 3 regressed benchmarks
✅ 1165 untouched benchmarks
⏩ 138 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode       | Benchmark                            | BASE     | HEAD     | Efficiency |
| ---------- | ------------------------------------ | -------- | -------- | ---------- |
| Simulation | varbinview_zip_fragmented_mask       | 6.5 ms   | 7.3 ms   | -10.28%    |
| Simulation | varbinview_zip_block_mask            | 2.9 ms   | 3.7 ms   | -21.64%    |
| Simulation | new_bp_prim_test_between[i64, 32768] | 177.3 µs | 236.5 µs | -25.03%    |
| Simulation | bitwise_not_vortex_buffer_mut[128]   | 275.3 ns | 246.1 ns | +11.85%    |

Comparing ct/lossy (d1edd41) with develop (5e5572b)


Footnotes

  1. 138 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

