Conversation
Merging this PR will degrade performance by 14.84%
Performance Changes
Polar Signals Profiling Results (Latest Run)
Benchmarks: PolarSignals Profiling
Vortex (geomean): 0.921x ➖
datafusion / vortex-file-compressed (0.921x ➖, 1↑ 0↓)

File Sizes: PolarSignals Profiling
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)

Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.050x ➖, 0↑ 3↓)
datafusion / vortex-compact (1.024x ➖, 0↑ 1↓)
datafusion / parquet (1.072x ➖, 1↑ 6↓)
datafusion / arrow (1.038x ➖, 1↑ 4↓)
duckdb / vortex-file-compressed (1.081x ➖, 0↑ 4↓)
duckdb / vortex-compact (1.067x ➖, 0↑ 4↓)
duckdb / parquet (1.036x ➖, 3↑ 5↓)
duckdb / duckdb (1.003x ➖, 1↑ 1↓)
Full attributed analysis

File Sizes: TPC-H SF=1 on NVME
File Size Changes (11 files changed, +0.0% overall, 2↑ 9↓)

Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.998x ➖, 1↑ 1↓)
datafusion / vortex-compact (0.971x ➖, 1↑ 0↓)
datafusion / parquet (0.985x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.998x ➖, 2↑ 1↓)
duckdb / vortex-compact (1.008x ➖, 0↑ 1↓)
duckdb / parquet (0.942x ➖, 3↑ 0↓)
Full attributed analysis

File Sizes: FineWeb NVMe
File Size Changes (2 files changed, +0.0% overall, 1↑ 1↓)

Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.893x ✅, 59↑ 0↓)
datafusion / vortex-compact (0.902x ➖, 48↑ 0↓)
datafusion / parquet (0.909x ➖, 39↑ 0↓)
duckdb / vortex-file-compressed (0.913x ➖, 32↑ 0↓)
duckdb / vortex-compact (0.925x ➖, 19↑ 0↓)
duckdb / parquet (0.927x ➖, 25↑ 0↓)
duckdb / duckdb (0.933x ➖, 19↑ 0↓)
Full attributed analysis

File Sizes: TPC-DS SF=1 on NVME
File Size Changes (27 files changed, +0.5% overall, 9↑ 18↓)

Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.000x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.965x ➖, 2↑ 0↓)
datafusion / parquet (0.983x ➖, 0↑ 0↓)
datafusion / arrow (0.976x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.971x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.971x ➖, 0↑ 0↓)
duckdb / parquet (0.934x ➖, 2↑ 0↓)
duckdb / duckdb (0.999x ➖, 0↑ 0↓)
Full attributed analysis

File Sizes: TPC-H SF=10 on NVME
File Size Changes (38 files changed, +0.0% overall, 14↑ 24↓)

Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.089x ➖, 1↑ 3↓)
datafusion / vortex-compact (1.052x ➖, 0↑ 0↓)
datafusion / parquet (0.914x ➖, 4↑ 2↓)
duckdb / vortex-file-compressed (1.006x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.082x ➖, 0↑ 1↓)
duckdb / parquet (1.011x ➖, 0↑ 0↓)
Full attributed analysis

Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.046x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.086x ➖, 0↑ 2↓)
datafusion / parquet (1.061x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.967x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.996x ➖, 0↑ 0↓)
duckdb / parquet (1.023x ➖, 0↑ 0↓)
Full attributed analysis

Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.053x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.053x ➖, 0↑ 1↓)
duckdb / parquet (1.018x ➖, 0↑ 0↓)
Full attributed analysis

File Sizes: Statistical and Population Genetics
File Size Changes (2 files changed, +5.0% overall, 1↑ 1↓)

Benchmarks: Random Access
Vortex (geomean): 0.900x ✅
unknown / unknown (0.990x ➖, 6↑ 0↓)

Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.999x ➖, 0↑ 1↓)
datafusion / parquet (1.003x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.962x ➖, 4↑ 0↓)
duckdb / parquet (0.997x ➖, 0↑ 0↓)
duckdb / duckdb (0.977x ➖, 3↑ 0↓)
Full attributed analysis

File Sizes: Clickbench on NVME
File Size Changes (201 files changed, +0.5% overall, 130↑ 71↓)

Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.988x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.056x ➖, 0↑ 1↓)
datafusion / parquet (1.094x ➖, 0↑ 3↓)
duckdb / vortex-file-compressed (1.076x ➖, 0↑ 2↓)
duckdb / vortex-compact (1.011x ➖, 0↑ 2↓)
duckdb / parquet (1.087x ➖, 0↑ 3↓)
Full attributed analysis

Benchmarks: Compression
Vortex (geomean): 1.003x ➖
unknown / unknown (1.007x ➖, 0↑ 0↓)

Pull request overview
This PR introduces an experimental “patched array” pathway for FastLanes BitPacked data, enabling legacy BitPacked-with-interior-patches payloads to deserialize as a Patched wrapper, and wiring the same mode into compression + file encoding allowlists.
Changes:
- Add a BitPacked deserialization plugin that rewrites interior patches into a `Patched` wrapper when `VORTEX_EXPERIMENTAL_PATCHED_ARRAY` is set (a rough sketch of the flag follows this list).
- Update integer bitpacking compression to emit `Patched` (instead of interior patches) in experimental mode, and update the file allowed-encodings accordingly.
- Enable the experimental mode in the benchmark GitHub Actions workflows via the env var.
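A rough sketch of how such an env-var gate typically looks in Rust follows; the real definition lives in `encodings/fastlanes/src/lib.rs` and may differ, and only the flag and variable names below are taken from this PR.

```rust
use std::sync::LazyLock;

/// Assumed shape of the opt-in flag: true whenever the environment variable is
/// present (regardless of value), evaluated once on first use and then cached.
pub static USE_EXPERIMENTAL_PATCHES: LazyLock<bool> =
    LazyLock::new(|| std::env::var("VORTEX_EXPERIMENTAL_PATCHED_ARRAY").is_ok());
```

Downstream code can then branch on `*USE_EXPERIMENTAL_PATCHES`, which is consistent with how the reviewed deserialization code below consults the flag.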
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| `vortex-file/src/strategy.rs` | Conditionally allows the Patched encoding for file writing when experimental mode is enabled. |
| `vortex-btrblocks/src/schemes/integer.rs` | Switches BitPacking scheme output to Patched in experimental mode; keeps the legacy interior-patches mode otherwise. |
| `encodings/fastlanes/src/lib.rs` | Adds the `USE_EXPERIMENTAL_PATCHES` flag and conditionally registers a custom BitPacked plugin. |
| `encodings/fastlanes/src/bitpacking/plugin.rs` | New plugin to deserialize legacy BitPacked-with-patches as a Patched wrapper around a patchless BitPacked array. |
| `encodings/fastlanes/src/bitpacking/mod.rs` | Wires the new plugin module into the bitpacking module. |
| `encodings/fastlanes/public-api.lock` | Records the new public static `USE_EXPERIMENTAL_PATCHES`. |
| `.github/workflows/sql-benchmarks.yml` | Forces experimental patched-array mode during SQL benchmark data generation. |
| `.github/workflows/bench.yml` | Forces experimental patched-array mode during benchmark runs. |
```rust
let array = if *USE_EXPERIMENTAL_PATCHES {
    let patches = parts.patches.take();
    // Transpose patches into G-ALP style PatchedArray, wrapping an inner BitPackedArray.
    let array = BitPacked::try_new(
        parts.packed,
        ptype,
        parts.validity,
        None,
        parts.bit_width,
        parts.len,
        parts.offset,
    )?
    .into_array();

    match patches {
        None => array,
        Some(p) => Patched::from_array_and_patches(
            array,
            &p,
            &mut LEGACY_SESSION.create_execution_ctx(),
        )?
        .into_array(),
    }
} else {
```
When USE_EXPERIMENTAL_PATCHES is enabled, this branch no longer reapplies packed_stats to the returned array (neither the inner patchless BitPacked nor the outer Patched). This drops statistics that were previously preserved, which can regress downstream optimizations/decisions that rely on stats. Consider propagating packed_stats onto the final returned array in both branches (e.g., set stats on the BitPacked before wrapping, and/or on the resulting Patched array).
```rust
.deserialize(dtype, len, metadata, buffers, children, session)?
.try_downcast::<BitPacked>()
.map_err(|_| {
    vortex_err!("BitPacked plugin should only deserialize vortex.bitpacked")
```

The error message mentions `vortex.bitpacked`, but this encoding's ID is `fastlanes.bitpacked` (see `BitPacked::ID`). Using the correct ID (or formatting the `BitPacked::ID` value) will make failures easier to diagnose.

Suggested change:

```diff
-vortex_err!("BitPacked plugin should only deserialize vortex.bitpacked")
+vortex_err!("BitPacked plugin should only deserialize {}", BitPacked::ID)
```
```rust
let packed = bitpacked.packed().clone();
let ptype = bitpacked.dtype().as_ptype();
let validity = bitpacked.validity()?;
let bw = bitpacked.bit_width;
```

This accesses `bitpacked.bit_width` directly even though `BitPackedArrayExt` is already in scope and provides `bit_width()`. Using the accessor avoids relying on internal struct fields and reduces the chance of breakage if the BitPacked data representation changes.

Suggested change:

```diff
-let bw = bitpacked.bit_width;
+let bw = bitpacked.bit_width();
```
```rust
//! A custom [`ArrayPlugin`] that lets you load in and deserialize a `BitPacked` array with interior
//! patches as a `PatchedArray` that wraps a patchless `BitPacked` array.
//!
//! This enables zero-cost backward compatibility with previously written datasets.
```
This introduces a new deserialization pathway that conditionally rewrites BitPacked (with interior patches) into a Patched wrapper based on runtime configuration, but there’s no test exercising the behavior. Adding a unit/integration test that round-trips a BitPacked with patches through serialization/deserialization and asserts the resulting encoding (BitPacked vs Patched) would help prevent regressions.
Summary
Part of the rollout of the Patched array. This PR adds a new environment variable, `VORTEX_EXPERIMENTAL_PATCHED_ARRAY`. When the env var is set, a `BitPacked` array with interior patches will be decoded as a `Patched` array wrapping a `BitPacked` with no interior patches. This allows us to do some more soak testing of the encoding before we switch it to be the default.
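To make the wrapping concrete, here is a minimal, self-contained sketch of the patching idea: a dense base decode plus sparse (index, value) exceptions applied on top. Plain vectors stand in for the real Vortex arrays, and `apply_patches` is an illustrative helper, not part of the crate's API.

```rust
/// Apply sparse (index, value) patches on top of densely decoded base values.
/// `base` stands in for the patchless BitPacked decode; `patches` for the
/// exception values that the Patched wrapper stores separately.
fn apply_patches(mut base: Vec<u32>, patches: &[(usize, u32)]) -> Vec<u32> {
    for &(idx, value) in patches {
        base[idx] = value;
    }
    base
}

fn main() {
    // Most values fit in a few bits; index 2 held an outlier that was patched
    // out of the packed buffer and recorded as an exception instead.
    let base = vec![3, 1, 0, 7, 2];
    let patches = [(2usize, 1_000_000u32)];
    assert_eq!(apply_patches(base, &patches), vec![3, 1, 1_000_000, 7, 2]);
}
```

Keeping the exceptions outside the `BitPacked` array is what lets the inner encoding stay patchless while previously written files keep reading back correctly.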
Testing
On a previous commit in this PR, I set up the benchmarks to run with the experimental Patched encoding enabled; the benchmark results all seem within the margin of error. The file sizes are generally the same, and in the worst case 2-3% larger.