register_splits to get both offset and relative row_range#7913
Conversation
yea I thought about that, but that would introduce unnecessary mutable state. We would keep changing the btree offset before passing it down to each child.

It would be a stack-allocated wrapper.

yea true, it is &mut now anyway. Still, I prefer this because it is more explicit, and if the projection eval or pruning eval methods in the reader decide to use the global offset in the future, we can use this SplitRange struct to give them that information.
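As a rough sketch of the idea discussed above (the names and fields here are hypothetical, not the PR's actual API), the stack-allocated wrapper could bundle the global offset with the local row range, so nothing shared is mutated while recursing into children:

```rust
use std::ops::Range;

/// Hypothetical stand-in for the `SplitRange` struct discussed above: a small
/// stack-allocated value pairing a layout's global row offset with a row range
/// local to that layout, instead of mutating a shared offset.
#[derive(Debug, Clone)]
struct SplitRange {
    /// Global row offset of the enclosing layout.
    offset: u64,
    /// Row range local to the current layout.
    row_range: Range<u64>,
}

impl SplitRange {
    /// The same range expressed in global coordinates.
    fn global(&self) -> Range<u64> {
        self.offset + self.row_range.start..self.offset + self.row_range.end
    }

    /// A new `SplitRange` for a child that starts `child_start` rows into
    /// this layout and spans `len` rows.
    fn child(&self, child_start: u64, len: u64) -> SplitRange {
        SplitRange {
            offset: self.offset + self.row_range.start + child_start,
            row_range: 0..len,
        }
    }
}

fn main() {
    let parent = SplitRange { offset: 100, row_range: 10..30 };
    assert_eq!(parent.global(), 110..130);

    // A child 5 rows into the parent, 8 rows long, in global coordinates.
    assert_eq!(parent.child(5, 8).global(), 115..123);
}
```

Because the wrapper is cheap to construct per child, each recursive call gets its own value and future readers (e.g. projection or pruning eval) can read the global offset directly from it.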
Signed-off-by: Onur Satici <onur@spiraldb.com>
Merging this PR will improve performance by 20.24%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | new_bp_prim_test_between[i16, 32768] | 134.1 µs | 120.3 µs | +11.54% |
| ⚡ | Simulation | new_bp_prim_test_between[i32, 16384] | 109.1 µs | 94.8 µs | +15.07% |
| ⚡ | Simulation | new_bp_prim_test_between[i64, 32768] | 236.7 µs | 178.1 µs | +32.94% |
| ⚡ | Simulation | new_bp_prim_test_between[i32, 32768] | 169.9 µs | 141.1 µs | +20.42% |
| ⚡ | Simulation | new_bp_prim_test_between[i64, 16384] | 144.4 µs | 115.2 µs | +25.4% |
| ⚡ | Simulation | new_alp_prim_test_between[f64, 16384] | 148.8 µs | 126.9 µs | +17.26% |
Comparing os/row-ranges (e6024a0) with develop (7349cd6)
Footnotes

1. 24 benchmarks were skipped, so the baseline results were used instead. ↩
Summary
register_splits had a row_range that was local to the current layout, so nested chunked layouts failed to register the right global split. Add the global offset and pass it along with the local row_range, so layout readers have all the information they need to register splits at the correct global offset.

I could have done this without changing the API, but then the outer chunked layout would have to create a new local btreemap, pass that down, and then merge it into the global btreemap with the chunk offset applied, which felt expensive.
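The fix can be illustrated with a minimal sketch (the `Layout` enum and signatures below are illustrative, not the codebase's real types): the global offset is threaded down the recursion so every nested chunk inserts its split into the shared btreemap at the correct global position, with no local map to merge afterwards:

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Illustrative layout tree: flat leaves hold rows, chunked nodes
/// concatenate their children.
enum Layout {
    Flat { row_count: u64 },
    Chunked { children: Vec<Layout> },
}

impl Layout {
    fn row_count(&self) -> u64 {
        match self {
            Layout::Flat { row_count } => *row_count,
            Layout::Chunked { children } => children.iter().map(Layout::row_count).sum(),
        }
    }

    /// Register split points into a shared `BTreeMap`, keyed by *global* row
    /// offset. `offset` is the global start of this layout; `row_range` is
    /// local to it. Threading `offset` down is what lets nested chunked
    /// layouts land on the correct global key. (A real implementation would
    /// also intersect `row_range` with each child; omitted for brevity.)
    fn register_splits(&self, offset: u64, row_range: Range<u64>, splits: &mut BTreeMap<u64, ()>) {
        match self {
            Layout::Flat { .. } => {
                splits.insert(offset + row_range.start, ());
            }
            Layout::Chunked { children } => {
                let mut child_start = 0;
                for child in children {
                    let len = child.row_count();
                    child.register_splits(offset + child_start, 0..len, splits);
                    child_start += len;
                }
            }
        }
    }
}

fn main() {
    // A chunked layout nested inside another chunked layout.
    let layout = Layout::Chunked {
        children: vec![
            Layout::Flat { row_count: 10 },
            Layout::Chunked {
                children: vec![Layout::Flat { row_count: 5 }, Layout::Flat { row_count: 5 }],
            },
            Layout::Flat { row_count: 10 },
        ],
    };

    let mut splits = BTreeMap::new();
    let total = layout.row_count();
    layout.register_splits(0, 0..total, &mut splits);

    // Every leaf's global start row is registered, including the nested ones.
    assert_eq!(splits.keys().copied().collect::<Vec<_>>(), vec![0, 10, 15, 20]);
}
```

Without the offset parameter, the nested chunked node would register its leaves at local keys 0 and 5 rather than global keys 10 and 15, which is exactly the bug the PR describes.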