[claude] fastlanes: allow signed integers in Delta encoding #7918
Closed
joseph-isaacs wants to merge 5 commits into
Lifts the `is_unsigned_int` gate on `DeltaArray` so `i8` / `i16` / `i32` / `i64` columns can be delta-encoded.

The upstream FastLanes kernels (`Delta::delta`, `Transpose::transpose`) are bounded on `T: FastLanes: Unsigned`, so signed inputs are processed by reinterpret-casting the underlying buffer to the same-width unsigned counterpart, running the existing kernel, and reinterpret-casting back. `wrapping_sub`/`wrapping_add` are bit-identical for signed and unsigned operands under two's-complement, so the round-trip is exact.

Note that the encoded delta bytes for inputs that cross zero have the high bits set (e.g. delta `-1i8` = `0xFF`); naively bit-packing those would force the bit width to `T`. A follow-up should compose `Delta` with `FoR` so the deltas are stored as `value - min(delta)` before bit-packing. See `encodings/fastlanes/src/delta/FUSED_DECODE.md` for a design note on a fused triple-kernel (unpack + add-reference + undelta) that addresses the decode bandwidth.

Also guards `Delta::cast` against signed sources: value-preserving casts of signed deltas (e.g. `-1i8` -> `4294967295u32`) break the wrapping-add invariant during decompression, so signed sources fall back to the decompress-and-reencode path.

Signed-off-by: Claude <noreply@anthropic.com>
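The bit-identity claim above can be checked with a small scalar sketch. This is not the FastLanes kernel: it uses a single implicit base of 0 and no transposition, and the function names `delta_via_unsigned`, `undelta_via_unsigned`, and `delta_signed` are hypothetical helpers introduced only for illustration.

```rust
/// Delta-encode `i8` values by routing every operation through `u8`.
/// The same-width `as` casts keep the bit pattern unchanged.
fn delta_via_unsigned(values: &[i8]) -> Vec<i8> {
    let mut prev = 0u8; // assumed implicit base of 0
    values
        .iter()
        .map(|&v| {
            let u = v as u8;
            let d = u.wrapping_sub(prev);
            prev = u;
            d as i8
        })
        .collect()
}

/// Inverse of `delta_via_unsigned`: prefix sum with wrapping addition in `u8`.
fn undelta_via_unsigned(deltas: &[i8]) -> Vec<i8> {
    let mut acc = 0u8;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d as u8);
            acc as i8
        })
        .collect()
}

/// Reference implementation that stays in the signed domain.
fn delta_signed(values: &[i8]) -> Vec<i8> {
    let mut prev = 0i8;
    values
        .iter()
        .map(|&v| {
            let d = v.wrapping_sub(prev);
            prev = v;
            d
        })
        .collect()
}
```

Feeding both encoders an input that crosses zero and spans the full `i8` range should yield identical deltas, and decoding should return the original values.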
Measures the encoded byte budget under three bit-packing strategies for four representative signed `i32` shapes (monotone, sensor-like wobble around zero, large-negative offset, near-monotone with backtracks):

| Workload                          | range          | Wnaive | Wffor | Wzz | ratio   |
|-----------------------------------|----------------|-------:|------:|----:|--------:|
| monotone i32 (0..N)               | [0, 1]         |      1 |     1 |   2 |  15.97x |
| sensor i32 in [-100, 100]         | [-196, 199]    |     32 |     9 |   9 |   3.20x |
| offset i32 base=-1e9              | [0, 1]         |      1 |     1 |   2 |  15.97x |
| near-monotone i32 (5% backtrack)  | [-2, 1]        |     32 |     2 |   3 |  10.65x |

The "naive" column is the OR-mask of the raw delta bit-patterns: a single negative delta sets every high bit and forces `W = T`, which is why the two workloads with negative deltas (`sensor`, `near-monotone`) blow up to 32 bits. FFoR brings them to 9 and 2 bits. ZigZag matches FFoR only on the symmetric `sensor` workload and loses on every asymmetric column.

Asserts that FFoR never exceeds naive, drops below `T` whenever a negative delta is present, and beats ZigZag on the asymmetric workloads. Run with `--nocapture` to see the table.

Signed-off-by: Claude <noreply@anthropic.com>
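A minimal sketch of how the `Wnaive` and `Wffor` columns can be derived from a delta range, assuming "naive" means the bit length of the OR of all raw delta bit patterns and "FFoR" means the bit length of `max(delta - min(delta))`. The helpers `bits`, `naive_width`, and `ffor_width` are hypothetical names, not the functions used in the PR's test.

```rust
/// Number of bits needed to store `x` as an unsigned value (0 needs 0 bits).
fn bits(x: u32) -> u32 {
    32 - x.leading_zeros()
}

/// Width forced by OR-ing the raw two's-complement bit patterns together.
/// Any negative delta has its top bit set, so the result is 32.
fn naive_width(deltas: &[i32]) -> u32 {
    bits(deltas.iter().fold(0u32, |mask, &d| mask | d as u32))
}

/// Width after subtracting the minimum delta (frame of reference).
/// `wrapping_sub` followed by `as u32` gives the exact unsigned distance
/// even when `d - min` would not fit in an `i32`.
fn ffor_width(deltas: &[i32]) -> u32 {
    let min = deltas.iter().copied().min().unwrap_or(0);
    bits(
        deltas
            .iter()
            .map(|&d| d.wrapping_sub(min) as u32)
            .max()
            .unwrap_or(0),
    )
}
```

With deltas spanning `[-196, 199]` this gives widths 32 and 9, and with deltas spanning `[-2, 1]` it gives 32 and 2, in line with the `sensor` and `near-monotone` rows.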
Extends the synthetic workload report with two extra columns: bases byte size and the FFoR bit-width those bases would pack to. For 8K-element i32 inputs the bases buffer is ~50% of the FFoR total on monotone-like columns, and the bases sequence inherits the smoothness of the input, so recursively packing the bases with FoR gives a further ~1.4x on top of FFoR(deltas):

| workload                          | FFoR (B) | ratio  | bases (B) | Wb | +bcomp | ratio  |
|-----------------------------------|---------:|-------:|----------:|---:|-------:|-------:|
| monotone i32 (0..N)               |     2052 | 15.97x |      1024 | 13 |   1448 | 22.63x |
| sensor i32 in [-100, 100]         |    10244 |  3.20x |      1024 |  8 |   9480 |  3.46x |
| offset i32 base=-1e9              |     2052 | 15.97x |      1024 | 13 |   1448 | 22.63x |
| near-monotone i32 (5% backtrack)  |     3076 | 10.65x |      1024 | 13 |   2472 | 13.26x |

This is already structurally enabled: the bases child is an `ArrayRef`, and the btrblocks compressor at `vortex-btrblocks/src/schemes/integer.rs:917` already routes bases through `compress_child` so the cascading compressor picks whatever encoding fits (typically FoR + BitPacked).

Signed-off-by: Claude <noreply@anthropic.com>
REUSE compliance: markdown files outside the patterns in `REUSE.toml` need inline SPDX comments.

Signed-off-by: Claude <noreply@anthropic.com>
Merging this PR will degrade performance by 32.11%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `decompress_rd[f32, (100000, 0.01)]` | 413.1 µs | 585.8 µs | -29.48% |
| ❌ | Simulation | `decompress_rd[f64, (100000, 0.01)]` | 668.8 µs | 1,023.4 µs | -34.65% |
| ❌ | Simulation | `decompress_rd[f32, (100000, 0.1)]` | 413.2 µs | 585.8 µs | -29.47% |
| ❌ | Simulation | `decompress_rd[f64, (100000, 0.1)]` | 668.8 µs | 1,023.4 µs | -34.65% |
Comparing claude/vortex-delta-negative-values-yQh1m (48d5602) with develop (da19bca)
Pre-merge polish across the three things a reviewer would notice:

* `DeltaArray` docstring: add a signed `i32` example next to the unsigned one so users see signed support is first-class. Verified by doctest.
* Conformance: extend `test_delta_consistency` and `test_delta_binary_numeric` with i32 / i64 / i8 cases (crossing zero, all-negative, single-negative). These run the array-trait conformance harness, so any operation that's silently broken for signed inputs surfaces here.
* `cast.rs`: expand the comment justifying why signed sources fall back to decompress-and-re-encode (the wrapping-add invariant breaks under value-preserving widening; the same hazard applies to cross-signedness).
* `synthetic_workload_compression` table: rename duplicate "ratio" columns to `FFoR x` / `+bcomp x` so the report is unambiguous.

256 -> 263 tests, all pass. Clippy clean. Fmt clean.

Signed-off-by: Claude <noreply@anthropic.com>
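The widening hazard named in the `cast.rs` bullet can be illustrated with a scalar sketch. It assumes a single implicit base of 0 and uses hypothetical helpers (`delta_i8`, `undelta_i8`, `undelta_i32`); it does not reproduce the actual `Delta::cast` code path.

```rust
/// Delta-encode in the 8-bit domain with wrapping subtraction.
fn delta_i8(values: &[i8]) -> Vec<i8> {
    let mut prev = 0i8;
    values
        .iter()
        .map(|&v| {
            let d = v.wrapping_sub(prev);
            prev = v;
            d
        })
        .collect()
}

/// Decode in the 8-bit domain: the wrap-around undoes the encoder's wrap.
fn undelta_i8(deltas: &[i8]) -> Vec<i8> {
    let mut acc = 0i8;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}

/// Decode in the 32-bit domain: sums no longer wrap at 8 bits.
fn undelta_i32(deltas: &[i32]) -> Vec<i32> {
    let mut acc = 0i32;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d);
            acc
        })
        .collect()
}
```

For the input `[100i8, -100]` the second delta is `-200`, which wraps to `56` in `i8`. Decoding in `i8` recovers `-100` because `100 + 56` wraps back, but widening the deltas to `i32` with `i32::from` and decoding there yields `156`, which is why a value-preserving cast of the deltas alone is not enough.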
Summary
Lifts the `is_unsigned_int` gate on `DeltaArray` so `i8` / `i16` / `i32` / `i64` columns can be delta-encoded.

The upstream FastLanes `Delta::delta` / `Transpose::transpose` kernels are bounded on `T: FastLanes: Unsigned`, so signed inputs are processed by reinterpret-casting the underlying buffer to the same-width unsigned counterpart, running the existing kernel, then reinterpret-casting back. `wrapping_sub` / `wrapping_add` are bit-identical for signed and unsigned operands under two's-complement, so the round-trip is exact.

Also:

* `Delta::cast` now bails on signed sources. Value-preserving casts of signed deltas (e.g. `-1i8 → 4294967295u32`) break the wrapping-add invariant during decompression; the slow decompress-and-re-encode path handles those.
* `encodings/fastlanes/src/delta/FUSED_DECODE.md` describes a triple-fused `unpack + add-reference + undelta` kernel for future work. Today decoding a `Delta(FoR(BitPacked))` makes three passes over the buffer; a fused kernel cuts that to one.

Measured impact

Synthetic workloads (32 KiB raw, `i32`, 8 × 1024 elements each): `Wnaive = 32` on the two workloads with negative deltas confirms the bit-packing collapse on raw signed deltas (any negative two's-complement delta sets the high bits and the OR mask forces `W = T`). FFoR brings them to 9 and 2 bits.

Test plan

* `cargo build -p vortex-fastlanes` (clean)
* `cargo nextest run -p vortex-fastlanes`: 256/256 pass, including 7 new signed `rstest` cases (`i8_full_range`, `i32_crossing_zero`, `i32_all_negative`, `i16_crossing_zero`, `i64_large_negative`, `nullable_i32_crossing`, `i32_non_negative`)
* `cargo clippy -p vortex-fastlanes --all-targets --all-features`: no warnings
* `cargo fmt --all -- --check`: clean
* `./scripts/public-api.sh`: not run locally (no public API surface added; `unsigned_counterpart` is `pub(crate)`). Worth verifying in CI.

Generated by Claude Code