Combine overlapping runs in REE (take kernel)#9865
Conversation
…ir of merging runs
|
@alamb could you take a look at this, thx |
|
run benchmark take_kernels |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing rich-T-kid/optimize-take-REE (780bbe0) to c194e54 (merge-base) diff File an issue against this benchmark runner |
Jefffrey
left a comment
There was a problem hiding this comment.
take changes make sense to me. For interleave, we have a PR which aims to add a specialized path for run arrays which I hope to review soon:
So it might turn out this compact_runs fix might not be needed if we want a specialized path 🤔
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
I removed the interleave portion of the PR, should be good now 👍🏿 thx |
|
run benchmark take_kernels |
Thanks for this. My only concern now is that it seems to have a noticeable impact on the microbenchmark we have 🤔 Not sure if theres a way around the slicing approach used |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing rich-T-kid/optimize-take-REE (58a62c0) to c194e54 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
@Jefffrey I think since we are adding in an extra comparison in a hot loop it make sense that performance would go down. One idea I do have is to pre-compute these booleans all at once and then refer to them in the hot loop instead of doing a comparison. is there any other way to compare |
|
I tried to move the comparisons of physical_indices to their own tight loop but the |
|
This would really just be replicating behavior already provided by |
|
original - this causes the issues that #7710 aims to resolve. First approach - using values.to_data().slice(i,1) in the hot loop Final approach - moved around some code in ord.rs so that arrow-select & arrow-ord dont cause a depency chain. This uses This approach is still slower than the initial approach but this is expected. comparing values takes time. |
|
@Jefffrey could you take another look? thank you! |
asubiotto
left a comment
There was a problem hiding this comment.
Nice, yes this make_comparator approach is what I had in mind. I think this regression is reasonable
|
|
||
| assert_eq!(result.len(), 11); | ||
| assert_eq!(result.run_ends().values(), &[5, 8, 11]); | ||
| assert_eq!(result.values().as_string::<i32>().value(0), "bob"); |
There was a problem hiding this comment.
I think you might be able to keep the same coverage but have fewer assertions if you just compare the values slice by content. On a related note, could you use rstest to write these three new tests as three test cases of the same test? Not sure if arrow uses that crate
There was a problem hiding this comment.
yea that make sense, Ill update the PR
|
I'll try take a look at this soon |
|
While the benchmark results look good (compared to the slicing of arraydata approach), I'm not certain on this movement of comparator code into I wonder if we have more justification for this, like if it might unblock other issues we are facing in how comparison logic is in |
|
Yeah, in general I don't think we should be moving large amounts of code (and the comparator code gets used a lot) into the arrow-array module as that will have the effect of increasing binary sizes / compile times for downstream users who are are not compiling the entire arrow crate and not using REE How about this:
|
|
I think thats reasonable |
…crate gated by run_end_encoded feature Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
im honestly not 100% sure if this was done correctly. It seems to be correct. could you give it another look? @Jefffrey |
|
As an extra data point, I've had a need for comparator code to live somewhere else so that it can be used e.g. for dictionary interning. This is related to what I posted in the discord last week: https://discord.com/channels/885562378132000778/1314936346653102080/1501935260987031572 |
|
ill wait to resolve the git conflicts until we agree that this is a good approach to take |
|
@Jefffrey just wanted to bump this PR |
|
(I do have this PR on my todolist to review 😅) |
There was a problem hiding this comment.
i think moving the cmp code to a separate crate like this is a good idea, especially as @asubiotto highlighted they have a use case for it and i think it could even help out for
if this approach looks good to @alamb as well we'll need to update the release instructions (can do as a followup so long as its before next release). i dont think we need to wait for a major release but i could be wrong 🤔
| //! | ||
| //! The only public surface is [`make_comparator`] (with [`DynComparator`] as the | ||
| //! returned function type). `arrow-ord` re-exports both from here, so its | ||
| //! public API is unchanged. |
There was a problem hiding this comment.
| //! | |
| //! The only public surface is [`make_comparator`] (with [`DynComparator`] as the | |
| //! returned function type). `arrow-ord` re-exports both from here, so its | |
| //! public API is unchanged. |
Probably can omit this part
| #[cfg(test)] | ||
| use arrow_schema::{ArrowError, SortOptions}; | ||
| #[cfg(test)] | ||
| use std::cmp::Ordering; |
There was a problem hiding this comment.
we should probably move all the test code as well, leaving this file to just re-export make_comparator to maintain the API
| arrow-array = { workspace = true } | ||
| arrow-buffer = { workspace = true } | ||
| arrow-data = { workspace = true } | ||
| arrow-ord-basic = { workspace = true } |
There was a problem hiding this comment.
i wonder if we should name the new crate something like arrow-cmp?
| # Enables the run-end-encoded `take` fast path that merges adjacent | ||
| # physical indices whose underlying values compare equal. Pulls in | ||
| # `arrow-ord-basic` for the slot-wise comparator. | ||
| run_end_encoded = ["dep:arrow-ord-basic"] |
There was a problem hiding this comment.
i know @alamb suggested this feature, but i wonder if its entirely necessary? the new arrow-ord-basic crate is minimal enough we could probably just get away with always including it?
There was a problem hiding this comment.
As long as it isn't a lot of code, I think including it always is fine
In this case I think it will be unlikely that anyone will use these features / disable them, so just keeping the code simpler sounds like a good idea



take and interleave on RunEndEncoded arrays now merge adjacent runs with equal values
Which issue does this PR close?
#7710
Rationale for this change
for a logical representation of an ree
when the result should be
runs: [4], values: [1]both answers are correct but ree's should be as compact as possible.
What changes are included in this PR?
I updated the loop in take_run() to check the prev_value before created a new run. if the values are the same continue building the run.
added a compaction function in interleave.rs to join runs that share the same value.
Are these changes tested?
Yes, ive updated past test to be aligned with this behavior as well as added three of my own test.
Are there any user-facing changes?
no