perf: optimize array_remove for scalar needle#22390
Open
lyne7-sc wants to merge 4 commits into
Open
Conversation
neilconway
reviewed
May 20, 2026
Contributor
neilconway
left a comment
There was a problem hiding this comment.
Nice performance win!
| ); | ||
| } | ||
| }; | ||
| let original_data = list_array.values().to_data(); |
Contributor
There was a problem hiding this comment.
This will be inefficient for sliced arrays.
| let list_array = array.as_list::<i64>(); | ||
| general_remove_with_scalar::<i64>(list_array, needle, arr_n) | ||
| } | ||
| array_type => exec_err!("array_remove does not support type '{array_type}'."), |
Contributor
There was a problem hiding this comment.
This is called by more than just array_remove; can we improve the error message?
Comment on lines
+596
to
+607
| for (i, keep) in eq_array.iter().enumerate() { | ||
| if keep == Some(false) && removed < max_removals { | ||
| if let Some(bs) = pending_batch_to_retain { | ||
| mutable.extend(0, start + bs, start + i); | ||
| copied += i - bs; | ||
| pending_batch_to_retain = None; | ||
| } | ||
| removed += 1; | ||
| } else if pending_batch_to_retain.is_none() { | ||
| pending_batch_to_retain = Some(i); | ||
| } | ||
| } |
Contributor
There was a problem hiding this comment.
I wonder if it would be possible to iterate only over the "false" bits, e.g., by negating the buffer and looking at BooleanBuffer::set_indices.
| let mut copied = 0usize; | ||
| let mut pending_batch_to_retain: Option<usize> = None; | ||
| for (i, keep) in eq_array.iter().enumerate() { | ||
| if keep == Some(false) && removed < max_removals { |
Contributor
There was a problem hiding this comment.
Can we break from the loop once we hit max_removals?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
Similar to #22387 (array_replace scalar optimization)
array_remove/array_remove_n/array_remove_allperform element-wise comparison by invokingcompare_element_to_listagainst each row's sub-array individually. When the needle is a scalar, this can be optimized by performing a single vectorizeddistinctcomparison over the entire flattened values buffer.What changes are included in this PR?
general_remove_with_scalar) that usesarrow_ord::cmp::distinctwithScalarwrapper for a single bulk comparison pass over the flat values buffer.nvalues, and LargeList type coverage.Benchmarks
Are these changes tested?
Yes, existing and new SLT edge-case tests in
array_remove.slt.Are there any user-facing changes?
No.