
perf[gpu]: export arrow device validity on the gpu #8440

Merged: 0ax1 merged 10 commits into develop from ad/cuda-validity-export on Jun 17, 2026

Commit: docs (637a07d)
CodSpeed HQ / CodSpeed Performance Analysis failed Jun 17, 2026 in 0s

Performance Regression: -4.18%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 8 improved benchmarks
❌ 12 regressed benchmarks
✅ 1524 untouched benchmarks
🆕 3 new benchmarks
⏩ 11 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode        Benchmark                                                   BASE      HEAD      Efficiency
Simulation  chunked_bool_canonical_into[(1000, 10)]                     20.6 µs   35.7 µs   -42.29%
Simulation  chunked_dict_primitive_into_canonical[u32, (1000, 10, 10)]  120.7 µs  182.9 µs  -34%
Simulation  encode_varbin[(1000, 2)]                                    176.1 µs  236 µs    -25.4%
Simulation  chunked_varbinview_canonical_into[(1000, 10)]               161.8 µs  198.1 µs  -18.29%
Simulation  chunked_varbinview_into_canonical[(1000, 10)]               177.1 µs  214 µs    -17.25%
Simulation  bench_many_codes_few_values[1024]                           393.2 µs  468.7 µs  -16.1%
Simulation  decompress_rd[f64, (100000, 0.0)]                           845.5 µs  982.8 µs  -13.97%
Simulation  varbinview_large                                            112.2 µs  130.3 µs  -13.89%
Simulation  bitwise_not_vortex_buffer_mut[128]                          186.1 ns  215.3 ns  -13.55%
Simulation  chunked_varbinview_canonical_into[(100, 100)]               273.8 µs  308.8 µs  -11.33%
Simulation  bitwise_not_vortex_buffer_mut[1024]                         246.4 ns  275.6 ns  -10.58%
Simulation  chunked_varbinview_into_canonical[(100, 100)]               326.4 µs  364.9 µs  -10.55%
Simulation  sum_i32_nullable_all_valid                                  69.2 µs   35.3 µs   +95.96%
Simulation  null_count_run_end[(10000, 4, 0.01)]                        125.4 µs  91.6 µs   +36.92%
Simulation  encode_varbinview[(1000, 2)]                                189 µs    156.7 µs  +20.57%
Simulation  take_10k_contiguous                                         252.8 µs  218.1 µs  +15.89%
Simulation  and_bool_nullable                                           93.7 µs   82.7 µs   +13.21%
Simulation  baseline_lt[4, 1024]                                        78.5 µs   69.6 µs   +12.76%
Simulation  decompress_rd[f64, (100000, 0.01)]                          981.2 µs  890.4 µs  +10.2%
Simulation  decompress_rd[f64, (100000, 0.1)]                           981.2 µs  890.4 µs  +10.19%
...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
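The rows above are consistent with Efficiency = (BASE / HEAD - 1) × 100, so a negative value means HEAD (this PR) is slower and a positive value means it is faster. This formula is inferred from the displayed numbers, not taken from CodSpeed's documentation, and the helper names `efficiency_pct` and `is_regression` are hypothetical. A minimal Rust sketch under that assumption:

```rust
/// Hypothetical helper: relative efficiency change in percent, assuming the
/// report computes (BASE / HEAD - 1) * 100. Positive means HEAD is faster.
/// `base` and `head` must be in the same time unit (e.g. both in µs).
fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

/// Hypothetical helper: true when HEAD is slower than BASE under the
/// assumed formula above.
fn is_regression(base: f64, head: f64) -> bool {
    efficiency_pct(base, head) < 0.0
}
```

For example, `efficiency_pct(20.6, 35.7)` gives about -42.30, matching the -42.29% shown for `chunked_bool_canonical_into[(1000, 10)]`. Small mismatches, such as about +36.90% computed versus +36.92% shown for `null_count_run_end`, are expected because the displayed timings are rounded.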

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ad/cuda-validity-export (637a07d) with develop (679e2c5)²

Open in CodSpeed

Footnotes

  1. 11 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

  2. No successful run was found on develop (8058097) when this report was generated, so 679e2c5 was used as the comparison base instead. The report may therefore include changes unrelated to this pull request.