
[claude] fastlanes: allow signed integers in Delta encoding#7918

Closed
joseph-isaacs wants to merge 5 commits into develop from claude/vortex-delta-negative-values-yQh1m

Conversation

@joseph-isaacs
Contributor

Summary

Lifts the is_unsigned_int gate on DeltaArray so i8 / i16 / i32 / i64 columns can be delta-encoded.

The upstream FastLanes Delta::delta / Transpose::transpose kernels are bounded on T: FastLanes: Unsigned, so signed inputs are processed by reinterpret-casting the underlying buffer to the same-width unsigned counterpart, running the existing kernel, then reinterpret-casting back. wrapping_sub / wrapping_add are bit-identical for signed and unsigned operands under two's-complement, so the round-trip is exact.
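
The bit-identity claim can be checked with a minimal sketch. The helper names `delta_encode_via_unsigned` and `delta_decode_via_unsigned` are hypothetical, and the loop is a plain scalar delta with no FastLanes transposition or lanes. It only illustrates that doing the arithmetic in `u32` and casting back reproduces the signed inputs, including across overflow.

```rust
/// Delta-encode signed values by doing the subtraction in the unsigned domain.
/// Hypothetical scalar helper, not the FastLanes kernel.
fn delta_encode_via_unsigned(values: &[i32], base: i32) -> Vec<i32> {
    let mut prev = base as u32; // `as` between same-width ints keeps the bits
    values
        .iter()
        .map(|&v| {
            let cur = v as u32;
            let delta = cur.wrapping_sub(prev);
            prev = cur;
            delta as i32 // reinterpret the delta back as signed
        })
        .collect()
}

/// Inverse of the encoder above: a wrapping prefix sum in the unsigned domain.
fn delta_decode_via_unsigned(deltas: &[i32], base: i32) -> Vec<i32> {
    let mut acc = base as u32;
    deltas
        .iter()
        .map(|&d| {
            acc = acc.wrapping_add(d as u32);
            acc as i32
        })
        .collect()
}
```

Encoding `[i32::MIN, -1, 0, 7, i32::MAX, i32::MIN]` and decoding it again returns the same values, even though several of the intermediate deltas overflow.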

Also:

  • Delta::cast now bails on signed sources. Value-preserving casts of signed deltas (e.g. -1i8 → 4294967295u32) break the wrapping-add invariant during decompression, so signed sources fall back to the slow decompress-and-re-encode path.
  • Added encodings/fastlanes/src/delta/FUSED_DECODE.md describing a triple-fused unpack + add-reference + undelta kernel for future work. Today decoding a Delta(FoR(BitPacked)) makes three passes over the buffer; a fused kernel cuts that to one.
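
To illustrate the cast hazard from the first bullet, here is a small sketch of one decode step at `i8` width and the same step after widening base and delta to `i32`. The function names are hypothetical and this is not the `Delta::cast` code. It only shows why a delta that relied on wrap-around at the narrow width no longer decodes to the right value once widened.

```rust
/// One undelta step at the original width: wrapping add, as a delta decoder does.
fn undelta_i8(prev: i8, delta: i8) -> i8 {
    prev.wrapping_add(delta)
}

/// The same step after a value-preserving widening of base and delta to i32.
/// The add no longer wraps at 8 bits, so the result can differ.
fn undelta_widened(prev: i8, delta: i8) -> i32 {
    i32::from(prev).wrapping_add(i32::from(delta))
}

/// Delta between two consecutive i8 values, computed with wrapping subtraction.
fn delta_i8(prev: i8, cur: i8) -> i8 {
    cur.wrapping_sub(prev)
}
```

For the consecutive values `127i8, -128i8` the stored delta is `1`. Decoding at `i8` width gives `-128` back, while the widened decode gives `128`, which is not the widened original.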

Measured impact

Synthetic workloads (32 KiB raw, i32, 8 × 1024 elements each):

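| Workload                          | Δ range        | Wnaive | Wffor | ratio   |
|-----------------------------------|----------------|-------:|------:|--------:|
| monotone i32 (0..N)               | [0, 1]         |      1 |     1 | 15.97x  |
| sensor i32 in [-100, 100]         | [-196, 199]    |     32 |     9 |  3.20x  |
| offset i32 base=-1e9              | [0, 1]         |      1 |     1 | 15.97x  |
| near-monotone i32 (5% backtrack)  | [-2, 1]        |     32 |     2 | 10.65x  |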

Wnaive = 32 on the two workloads with negative deltas confirms the bit-packing collapse on raw signed deltas (any negative two's-complement delta sets the high bits and the OR mask forces W = T). FFoR brings them to 9 and 2 bits.
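
The width collapse can be reproduced with a short sketch. `naive_width` and `for_width` are hypothetical names, not the functions used in the benchmark. The first ORs the raw two's-complement patterns together, the second subtracts the minimum delta before measuring, which is the frame-of-reference step.

```rust
/// Bit width needed to pack the raw delta bit patterns: OR them all together
/// and count the significant bits of the mask.
fn naive_width(deltas: &[i32]) -> u32 {
    let mask = deltas.iter().fold(0u32, |m, &d| m | d as u32);
    32 - mask.leading_zeros()
}

/// Bit width after frame-of-reference: store `delta - min(delta)` instead,
/// so the largest stored value is the range of the deltas.
fn for_width(deltas: &[i32]) -> u32 {
    let Some(&min) = deltas.iter().min() else {
        return 0; // empty input needs no bits
    };
    let max_offset = deltas
        .iter()
        .map(|&d| (d as u32).wrapping_sub(min as u32)) // d >= min, so this is d - min
        .max()
        .unwrap_or(0);
    32 - max_offset.leading_zeros()
}
```

With deltas spanning `[-196, 199]` the naive width is 32 and the frame-of-reference width is 9. With deltas spanning `[-2, 1]` the widths are 32 and 2, matching the table rows above.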

Test plan

  • cargo build -p vortex-fastlanes (clean)
  • cargo nextest run -p vortex-fastlanes — 256/256 pass, including 7 new signed rstest cases (i8_full_range, i32_crossing_zero, i32_all_negative, i16_crossing_zero, i64_large_negative, nullable_i32_crossing, i32_non_negative)
  • cargo clippy -p vortex-fastlanes --all-targets --all-features — no warnings
  • cargo fmt --all -- --check — clean
  • ./scripts/public-api.sh — not run locally (no public API surface added; unsigned_counterpart is pub(crate)). Worth verifying in CI.

Generated by Claude Code

claude added 3 commits May 13, 2026 21:12
Lifts the `is_unsigned_int` gate on `DeltaArray` so `i8` / `i16` / `i32` / `i64`
columns can be delta-encoded. The upstream FastLanes kernels (`Delta::delta`,
`Transpose::transpose`) are bounded on `T: FastLanes: Unsigned`, so signed
inputs are processed by reinterpret-casting the underlying buffer to the
same-width unsigned counterpart, running the existing kernel, and
reinterpret-casting back. `wrapping_sub`/`wrapping_add` are bit-identical for
signed and unsigned operands under two's-complement, so the round-trip is
exact.

Note that the encoded delta bytes for inputs that cross zero have the high
bits set (e.g. delta `-1i8` = `0xFF`); naively bit-packing those would force
the bit width to `T`. A follow-up should compose `Delta` with `FoR` so the
deltas are stored as `value - min(delta)` before bit-packing. See
encodings/fastlanes/src/delta/FUSED_DECODE.md for a design note on a fused
triple-kernel (unpack + add-reference + undelta) that addresses the decode
bandwidth.

Also guards `Delta::cast` against signed sources: value-preserving casts of
signed deltas (e.g. `-1i8` -> `4294967295u32`) break the wrapping-add
invariant during decompression, so signed sources fall back to the
decompress-and-reencode path.

Signed-off-by: Claude <noreply@anthropic.com>

Measures the encoded byte budget under three bit-packing strategies for four
representative signed `i32` shapes (monotone, sensor-like wobble around zero,
large-negative offset, near-monotone with backtracks):

| Workload                          | range          | Wnaive | Wffor | Wzz | ratio   |
|-----------------------------------|----------------|-------:|------:|----:|--------:|
| monotone i32 (0..N)               | [0, 1]         |      1 |     1 |   2 | 15.97x  |
| sensor i32 in [-100, 100]         | [-196, 199]    |     32 |     9 |   9 |  3.20x  |
| offset i32 base=-1e9              | [0, 1]         |      1 |     1 |   2 | 15.97x  |
| near-monotone i32 (5% backtrack)  | [-2, 1]        |     32 |     2 |   3 | 10.65x  |

The "naive" column is the OR-mask of the raw delta bit-patterns: a single
negative delta sets every high bit and forces `W = T`, which is why the two
workloads with negative deltas (`sensor`, `near-monotone`) blow up to 32 bits.
FFoR brings them to 9 and 2 bits. ZigZag matches FFoR only on the symmetric
`sensor` workload and loses on every asymmetric column.

Asserts that FFoR never exceeds naive, drops below `T` whenever a negative
delta is present, and beats ZigZag on the asymmetric workloads. Run with
`--nocapture` to see the table.

Signed-off-by: Claude <noreply@anthropic.com>

Extends the synthetic workload report with two extra columns: bases byte size
and the FFoR bit-width those bases would pack to. For 8K-element i32 inputs
the bases buffer is ~50% of the FFoR total on monotone-like columns, and the
bases sequence inherits the smoothness of the input, so recursively packing
the bases with FoR gives a further ~1.4x on top of FFoR(deltas):

  workload                              FFoR (B)  ratio   bases (B) Wb +bcomp ratio
  monotone i32 (0..N)                       2052  15.97x       1024  13   1448 22.63x
  sensor i32 in [-100, 100]                10244   3.20x       1024   8   9480  3.46x
  offset i32 base=-1e9                      2052  15.97x       1024  13   1448 22.63x
  near-monotone i32 (5% backtrack)          3076  10.65x       1024  13   2472 13.26x

This is already structurally enabled: the bases child is an `ArrayRef`, and
the btrblocks compressor at vortex-btrblocks/src/schemes/integer.rs:917
already routes bases through `compress_child` so the cascading compressor
picks whatever encoding fits (typically FoR + BitPacked).
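
The byte counts in the table are consistent with a simple size model, sketched below. The breakdown is an assumption, since the report does not spell it out: packed deltas, plus 256 `i32` bases (1024 B), plus a 4-byte constant per FoR layer that is assumed here to be the reference value. All names are hypothetical.

```rust
const ELEMENTS: usize = 8 * 1024; // elements per workload, from the description
const BASES: usize = 256; // 1024 B of bases / 4 B per i32
const FOR_REFERENCE_BYTES: usize = 4; // assumption: one i32 reference per FoR layer

/// Bytes needed to bit-pack `count` values at `width_bits` bits each.
fn packed_bytes(count: usize, width_bits: usize) -> usize {
    (count * width_bits + 7) / 8
}

/// FFoR(deltas) with the bases stored raw.
fn ffor_total(delta_width: usize) -> usize {
    packed_bytes(ELEMENTS, delta_width) + FOR_REFERENCE_BYTES + BASES * 4
}

/// FFoR(deltas) with the bases themselves FoR + bit-packed at `bases_width`.
fn ffor_total_with_packed_bases(delta_width: usize, bases_width: usize) -> usize {
    packed_bytes(ELEMENTS, delta_width)
        + FOR_REFERENCE_BYTES
        + packed_bytes(BASES, bases_width)
        + FOR_REFERENCE_BYTES
}
```

Under this model `ffor_total(1)` is 2052 and `ffor_total_with_packed_bases(1, 13)` is 1448, which are the `FFoR (B)` and `+bcomp` values in the monotone row.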

Signed-off-by: Claude <noreply@anthropic.com>

@joseph-isaacs joseph-isaacs added the changelog/feature A new feature label May 13, 2026 — with Claude

REUSE compliance — markdown files outside the patterns in REUSE.toml need
inline SPDX comments.

Signed-off-by: Claude <noreply@anthropic.com>

@joseph-isaacs joseph-isaacs added the do not merge Pull requests that are not intended to merge label May 13, 2026
@joseph-isaacs joseph-isaacs changed the title fastlanes: allow signed integers in Delta encoding [claude] fastlanes: allow signed integers in Delta encoding May 13, 2026

@codspeed-hq

codspeed-hq Bot commented May 13, 2026

Merging this PR will degrade performance by 32.11%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

❌ 4 regressed benchmarks
✅ 1206 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode       | Benchmark                          | BASE     | HEAD       | Efficiency |
|------------|------------------------------------|---------:|-----------:|-----------:|
| Simulation | decompress_rd[f32, (100000, 0.01)] | 413.1 µs |   585.8 µs |    -29.48% |
| Simulation | decompress_rd[f64, (100000, 0.01)] | 668.8 µs | 1,023.4 µs |    -34.65% |
| Simulation | decompress_rd[f32, (100000, 0.1)]  | 413.2 µs |   585.8 µs |    -29.47% |
| Simulation | decompress_rd[f64, (100000, 0.1)]  | 668.8 µs | 1,023.4 µs |    -34.65% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/vortex-delta-negative-values-yQh1m (48d5602) with develop (da19bca)


Pre-merge polish across the four things a reviewer would notice:

* DeltaArray docstring: add a signed `i32` example next to the unsigned one
  so users see signed support is first-class. Verified by doctest.
* Conformance: extend `test_delta_consistency` and `test_delta_binary_numeric`
  with i32 / i64 / i8 cases (crossing zero, all-negative, single-negative).
  These run the array-trait conformance harness, so any operation that's
  silently broken for signed inputs surfaces here.
* cast.rs: expand the comment justifying why signed sources fall back to
  decompress-and-re-encode (the wrapping-add invariant breaks under
  value-preserving widening; the same hazard applies to cross-signedness).
* synthetic_workload_compression table: rename duplicate "ratio" columns
  to `FFoR x` / `+bcomp x` so the report is unambiguous.

256 -> 263 tests, all pass. Clippy clean. Fmt clean.

Signed-off-by: Claude <noreply@anthropic.com>