More realistic RunEnd take benchmark params #8484
Signed-off-by: Robert Kruszewski <github@robertk.io>
Merging this PR will improve performance by 15.67%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 177.7 µs | 213.9 µs | -16.94% |
| ⚡ | Simulation | take_10k_random | 255.8 µs | 197.8 µs | +29.27% |
| ⚡ | Simulation | take_10k_contiguous | 276.3 µs | 218.5 µs | +26.46% |
| ⚡ | Simulation | patched_take_10k_contiguous_patches | 291 µs | 232.3 µs | +25.26% |
| ⚡ | Simulation | patched_take_10k_random | 303 µs | 244.2 µs | +24.07% |
| ⚡ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 353.5 µs | 301.1 µs | +17.41% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 215.3 ns | 186.1 ns | +15.67% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] | 275.6 ns | 246.4 ns | +11.84% |
| 🆕 | Simulation | take[dense_table_len32768_run4_take2048] | N/A | 263.5 µs | N/A |
| 🆕 | Simulation | take[nullable_dense_table_len32768_run4_take2048] | N/A | 265.4 µs | N/A |
Tip: Investigate this regression by commenting `@codspeedbot fix this regression` on this PR, or use the CodSpeed MCP directly with your agent.
Comparing rk/runendtakeparams (bc9a0fa) with develop (69ce1ed)
Footnotes
1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
An average run length of 1 is not a realistic run-end array.
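To illustrate why the run length matters, here is a minimal, self-contained sketch of run-end encoding and a binary-search `take`. It is not the Vortex implementation or its API: `run_end_encode`, `take`, and `make_data` are hypothetical helpers, and reading `len32768_run4_take2048` as "32768 elements, runs of 4, 2048 indices taken" is an assumption based only on the benchmark name.

```rust
/// Run-end encode a slice. Returns `(ends, values)`, where `ends[i]` is the
/// exclusive end position of run `i` in the logical (decoded) array.
/// Hypothetical helper for illustration only.
fn run_end_encode(data: &[u32]) -> (Vec<usize>, Vec<u32>) {
    let mut ends = Vec::new();
    let mut values = Vec::new();
    for (i, &v) in data.iter().enumerate() {
        if values.last() == Some(&v) {
            // Same value as the current run: extend that run's end.
            *ends.last_mut().unwrap() = i + 1;
        } else {
            // New value: start a new run ending just after position i.
            ends.push(i + 1);
            values.push(v);
        }
    }
    (ends, values)
}

/// Take logical positions from the encoded form by binary-searching the
/// run ends. Panics if an index is past the end of the logical array.
fn take(ends: &[usize], values: &[u32], indices: &[usize]) -> Vec<u32> {
    indices
        .iter()
        .map(|&idx| {
            // Index of the first run whose exclusive end is greater than idx.
            let run = ends.partition_point(|&end| end <= idx);
            values[run]
        })
        .collect()
}

/// Build `len` elements in which every run has exactly `run_len` equal values
/// (the last run may be shorter if `run_len` does not divide `len`).
fn make_data(len: usize, run_len: usize) -> Vec<u32> {
    (0..len).map(|i| (i / run_len) as u32).collect()
}
```

With `make_data(32768, 1)` every element starts its own run, so `ends` is as long as the input and the encoding saves nothing; with `make_data(32768, 4)` the `ends` and `values` arrays shrink to a quarter of the input length, which is closer to the data a run-end array is meant for.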