Use 1024-value pco pages in btrblocks float/integer schemes #7922

joseph-isaacs wants to merge 2 commits into develop from claude/pcodec-1k-page-benchmark-mc8sI
Conversation
Adds a divan benchmark that compares pcodec with the default page size against pcodec configured with a 1024-value page size on f64 and i64 data. The bench reports the compression ratio of each variant on startup and times compression, full decompression, and per-element scalar_at access for both, so we can quantify the size/random-access tradeoff of smaller pco pages. Signed-off-by: Claude <noreply@anthropic.com>
Previously, the btrblocks PcoScheme for floats and integers compressed with an 8192-value page size. Shrinking the page to 1024 keeps the per-row size essentially unchanged (~0.03 bits/value for f64, ~0.07 bits/value for i64 in the new pcodec_page_size bench) while shrinking the decode window 8x for random access via scalar_at. Signed-off-by: Claude <noreply@anthropic.com>
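For reference, the page-size knob in question lives in pco's chunk configuration. Below is a minimal before/after sketch assuming the `pco` crate's `ChunkConfig`/`PagingSpec` API; it is illustrative only, not the actual `PcoScheme` wiring in vortex-btrblocks:

```rust
// Sketch only: demonstrates the pco page-size setting this PR changes.
use pco::standalone::{simple_compress, simple_decompress};
use pco::{ChunkConfig, PagingSpec};

fn main() -> pco::errors::PcoResult<()> {
    let values: Vec<f64> = (0..100_000).map(|i| (i as f64).sin()).collect();

    // Previous behaviour: pages of up to 8192 values.
    let big_pages = ChunkConfig::default()
        .with_compression_level(3)
        .with_paging_spec(PagingSpec::EqualPagesUpTo(8192));

    // This PR: 1024-value pages, so a point lookup decodes a smaller window.
    let small_pages = ChunkConfig::default()
        .with_compression_level(3)
        .with_paging_spec(PagingSpec::EqualPagesUpTo(1024));

    let big = simple_compress(&values, &big_pages)?;
    let small = simple_compress(&values, &small_pages)?;

    // The size delta between the two page sizes is the "overhead" the
    // PR description quotes in bits/value.
    let delta_bits = (small.len() as f64 - big.len() as f64) * 8.0 / values.len() as f64;
    println!("8192-value pages: {} bytes", big.len());
    println!("1024-value pages: {} bytes ({delta_bits:+.3} bits/value)", small.len());

    // pco is lossless, so the round trip must reproduce the input exactly.
    let decoded: Vec<f64> = simple_decompress(&small)?;
    assert_eq!(decoded, values);
    Ok(())
}
```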
Merging this PR will improve performance by 20.33%.

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | Simulation | pcodec_compress_i64_1k_page | N/A | 9.7 ms | N/A |
| 🆕 | Simulation | pcodec_compress_i64_default_page | N/A | 10.6 ms | N/A |
| 🆕 | Simulation | pcodec_decompress_f64_1k_page | N/A | 3.2 ms | N/A |
| 🆕 | Simulation | pcodec_decompress_f64_default_page | N/A | 2.3 ms | N/A |
| 🆕 | Simulation | pcodec_compress_f64_1k_page | N/A | 10.6 ms | N/A |
| 🆕 | Simulation | pcodec_compress_f64_default_page | N/A | 11.2 ms | N/A |
| 🆕 | Simulation | pcodec_scalar_at_i64_default_page | N/A | 1.4 s | N/A |
| 🆕 | Simulation | pcodec_scalar_at_f64_1k_page | N/A | 34.4 ms | N/A |
| 🆕 | Simulation | pcodec_scalar_at_f64_default_page | N/A | 1.6 s | N/A |
| 🆕 | Simulation | pcodec_scalar_at_i64_1k_page | N/A | 32.3 ms | N/A |
| ⚡ | Simulation | new_bp_prim_test_between[i16, 32768] | 134.1 µs | 120.2 µs | +11.64% |
| ⚡ | Simulation | new_bp_prim_test_between[i32, 32768] | 169.9 µs | 141 µs | +20.51% |
| ⚡ | Simulation | new_bp_prim_test_between[i32, 16384] | 109.1 µs | 94.7 µs | +15.21% |
| ⚡ | Simulation | new_bp_prim_test_between[i64, 16384] | 144.4 µs | 115.1 µs | +25.52% |
| ⚡ | Simulation | new_bp_prim_test_between[i64, 32768] | 236.7 µs | 177.9 µs | +33.03% |
| ⚡ | Simulation | new_alp_prim_test_between[f64, 16384] | 148.8 µs | 126.9 µs | +17.31% |
Tip: Curious why this is faster? Comment `@codspeedbot explain why this is faster` on this PR, or use the CodSpeed MCP directly with your agent.
Comparing `claude/pcodec-1k-page-benchmark-mc8sI` (617f690) with `develop` (7349cd6)
Footnotes

- 24 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
Polar Signals Profiling Results (latest run)
Powered by Polar Signals Cloud

Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.008x ➖
- datafusion / vortex-file-compressed (1.008x ➖, 0↑ 0↓)

File Sizes: PolarSignals Profiling
No file size changes detected.

Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
- datafusion / vortex-file-compressed (0.989x ➖, 1↑ 0↓)
- datafusion / vortex-compact (1.018x ➖, 0↑ 1↓)
- datafusion / parquet (1.014x ➖, 0↑ 0↓)
- duckdb / vortex-file-compressed (0.968x ➖, 1↑ 0↓)
- duckdb / vortex-compact (1.004x ➖, 0↑ 0↓)
- duckdb / parquet (0.992x ➖, 0↑ 0↓)

File Sizes: FineWeb NVMe (1 file changed, +0.0% overall, 1↑ 0↓)

Benchmarks: TPC-H SF=1 on NVMe
Verdict: No clear signal (low confidence)
- datafusion / vortex-file-compressed (0.970x ➖, 4↑ 1↓)
- datafusion / vortex-compact (1.142x ❌, 0↑ 15↓)
- datafusion / parquet (1.039x ➖, 0↑ 2↓)
- datafusion / arrow (0.890x ✅, 12↑ 3↓)
- duckdb / vortex-file-compressed (0.991x ➖, 1↑ 1↓)
- duckdb / vortex-compact (1.165x ❌, 0↑ 13↓)
- duckdb / parquet (0.978x ➖, 2↑ 0↓)
- duckdb / duckdb (0.958x ➖, 4↑ 0↓)

File Sizes: TPC-H SF=1 on NVMe (7 files changed, +0.5% overall, 7↑ 0↓)

Benchmarks: TPC-DS SF=1 on NVMe
Verdict: No clear signal (low confidence)
- datafusion / vortex-file-compressed (1.011x ➖, 0↑ 1↓)
- datafusion / vortex-compact (1.005x ➖, 0↑ 2↓)
- datafusion / parquet (1.016x ➖, 1↑ 1↓)
- duckdb / vortex-file-compressed (1.039x ➖, 0↑ 8↓)
- duckdb / vortex-compact (1.070x ➖, 1↑ 30↓)
- duckdb / parquet (1.006x ➖, 0↑ 0↓)
- duckdb / duckdb (1.034x ➖, 0↑ 5↓)

File Sizes: TPC-DS SF=1 on NVMe (15 files changed, +0.8% overall, 15↑ 0↓)

Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy)
- datafusion / vortex-file-compressed (1.156x ➖, 0↑ 2↓)
- datafusion / vortex-compact (0.948x ➖, 1↑ 0↓)
- datafusion / parquet (1.247x ➖, 0↑ 3↓)
- duckdb / vortex-file-compressed (1.042x ➖, 0↑ 0↓)
- duckdb / vortex-compact (1.044x ➖, 0↑ 0↓)
- duckdb / parquet (1.052x ➖, 0↑ 0↓)

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
- duckdb / vortex-file-compressed (0.980x ➖, 1↑ 0↓)
- duckdb / vortex-compact (1.015x ➖, 0↑ 1↓)
- duckdb / parquet (0.994x ➖, 0↑ 0↓)

File Sizes: Statistical and Population Genetics (1 file changed, +1.5% overall, 1↑ 0↓)

Benchmarks: TPC-H SF=10 on NVMe
Verdict: No clear signal (low confidence)
- datafusion / vortex-file-compressed (1.026x ➖, 0↑ 0↓)
- datafusion / vortex-compact (1.104x ❌, 0↑ 9↓)
- datafusion / parquet (1.031x ➖, 0↑ 0↓)
- datafusion / arrow (0.960x ➖, 4↑ 0↓)
- duckdb / vortex-file-compressed (1.026x ➖, 0↑ 0↓)
- duckdb / vortex-compact (1.125x ❌, 0↑ 15↓)
- duckdb / parquet (1.008x ➖, 0↑ 0↓)
- duckdb / duckdb (1.011x ➖, 0↑ 0↓)

File Sizes: TPC-H SF=10 on NVMe (22 files changed, +0.5% overall, 22↑ 0↓)

Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy)
- datafusion / vortex-file-compressed (1.140x ➖, 0↑ 5↓)
- datafusion / vortex-compact (1.053x ➖, 0↑ 1↓)
- datafusion / parquet (1.057x ➖, 0↑ 3↓)
- duckdb / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
- duckdb / vortex-compact (0.993x ➖, 0↑ 0↓)
- duckdb / parquet (1.043x ➖, 0↑ 0↓)

Benchmarks: Random Access
Vortex (geomean): 1.006x ➖
- unknown / unknown (1.003x ➖, 0↑ 1↓)

Benchmarks: Compression
Vortex (geomean): 0.998x ➖
- unknown / unknown (0.999x ➖, 0↑ 0↓)

Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy)
- datafusion / vortex-file-compressed (1.080x ➖, 0↑ 0↓)
- datafusion / vortex-compact (1.222x ➖, 0↑ 6↓)
- datafusion / parquet (1.104x ➖, 0↑ 2↓)
- duckdb / vortex-file-compressed (1.178x ➖, 0↑ 2↓)
- duckdb / vortex-compact (1.145x ➖, 0↑ 1↓)
- duckdb / parquet (1.078x ➖, 0↑ 0↓)
Summary
- `vortex-btrblocks`' `PcoScheme` for floats and integers configured pcodec with an 8192-value page size. Drop that to 1024 so random access through `scalar_at` decodes an 8x smaller window.
- `vortex/benches/pcodec_page_size.rs` — a divan bench that compares pcodec with the default page size against pcodec configured with 1024-value pages, on `f64` and `i64` data. It reports compression ratios at startup and times compression, full decompression, and `scalar_at` for both variants (a skeleton of this setup is sketched after "Why 1024" below).

Why 1024

Sample output of the new bench (100K values, level 3) shows that the per-row overhead of 1k pages is small (~0.03 bits/value `f64`, ~0.07 bits/value `i64`) and that `scalar_at` only has to decode 1024 values instead of a full chunk.
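The bench file itself is not reproduced in this view; a minimal divan skeleton matching the description above might look like the following. It assumes the same `pco` `ChunkConfig`/`PagingSpec` API as the earlier sketch, and the helper names (`make_f64s`, `config`) are illustrative, not the actual contents of `vortex/benches/pcodec_page_size.rs`:

```rust
// Illustrative divan skeleton for comparing pco page sizes.
// The real bench also covers i64 data and per-element scalar_at through
// vortex arrays; those parts are omitted here.
use divan::{black_box, Bencher};
use pco::standalone::{simple_compress, simple_decompress};
use pco::{ChunkConfig, PagingSpec};

const N: usize = 100_000;

fn make_f64s() -> Vec<f64> {
    (0..N).map(|i| (i as f64).sqrt()).collect()
}

fn config(page_size: usize) -> ChunkConfig {
    ChunkConfig::default()
        .with_compression_level(3)
        .with_paging_spec(PagingSpec::EqualPagesUpTo(page_size))
}

#[divan::bench]
fn pcodec_compress_f64_1k_page(bencher: Bencher) {
    let values = make_f64s();
    bencher.bench(|| simple_compress(black_box(&values), &config(1024)).unwrap());
}

#[divan::bench]
fn pcodec_decompress_f64_1k_page(bencher: Bencher) {
    let compressed = simple_compress(&make_f64s(), &config(1024)).unwrap();
    bencher.bench(|| simple_decompress::<f64>(black_box(&compressed)).unwrap());
}

fn main() {
    // Report compression ratios once at startup (as the PR description says
    // the real bench does), then hand control to divan.
    let values = make_f64s();
    for (name, cfg) in [
        ("default", ChunkConfig::default().with_compression_level(3)),
        ("1k", config(1024)),
    ] {
        let bytes = simple_compress(&values, &cfg).unwrap();
        println!("f64 {name} pages: ratio {:.2}", (N * 8) as f64 / bytes.len() as f64);
    }
    divan::main();
}
```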
Test plan
- `cargo clippy -p vortex-btrblocks --all-features --all-targets`
- `cargo test -p vortex-btrblocks` — 35 unit tests + 3 doctests pass
- `cargo bench -p vortex --bench pcodec_page_size --no-run` — bench builds
- `pcodec_page_size --test` — all 10 bench cases execute

https://claude.ai/code/session_01Hr3v3EaG9XjKypGyyq2BWF
Generated by Claude Code