Array has optim backport#151
Open
freakyzoidberg wants to merge 2 commits into
Open
Conversation
`array_has(array, element)` returns, for each row, whether the array contains the element. When the `element` (needle) is an array rather than a scalar -- the needle argument is a column with one value per row, e.g. `array_has(t1.tags, t2.key)` in a join filter -- execution goes through `array_has_dispatch_for_array` (the `ColumnarValue::Array` needle branch), which compared each row by invoking the Arrow `eq` kernel once per row. That kernel allocates a `BooleanArray` and pays downcast and dispatch overhead on every row. (The scalar-needle branch was optimized separately in apache#20374.) Add a fast path for primitive and string element types, preserving the Arrow `eq` kernel semantics (total-order float equality; null elements never match). With all-valid elements each row is a single branchless OR-reduction over the native values. When primitive elements contain nulls -- whose backing values are arbitrary -- the per-element equality bitmap is ANDed with the validity bitmap (one word-parallel op, no per-element branch) before the per-row reduction, so null slots never match regardless of their value. Past a moderate average list length (`NULL_FAST_PATH_MAX_LEN`) this bitmap's extra passes lose to the per-row kernel, so the element-null branch bails to it there (an O(1) check); the all-valid fold has no such crossover. Nested and other element types keep using the per-row `eq` kernel. What this removes is the fixed per-row kernel overhead, not the element comparison itself, so the gain is largest for short lists and shrinks as lists grow. Criterion microbenchmark, array needle on a 100k-row batch vs the per-row `eq` kernel (NOT end-to-end query time): i64, all-valid elements: ~13x (64-elem lists), ~1.9x at 1024-elem i64, ~30% null elements: ~10x (8-elem) ... ~1x (512-elem); falls back beyond string elements: ~1.45x Every shape improves with no regression. End-to-end impact depends on how much of a query `array_has` accounts for. For a query dominated by an array-needle `array_has` join filter (a `NestedLoopJoinExec` with `filter=array_has(tags, key)` over 3000x3000 rows of 8-element lists) total time drops from 0.95s to 0.059s (~16x, identical results). For a workload where `array_has` is a smaller fraction -- e.g. the ~6% of profile that motivated this (see apache#18070 / apache#18161, which fixed the join's deep-copy but left the per-row `array_has` cost) -- the overall speedup is single-digit percent. Part of apache#18727. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
BooleanBuffer::has_true() was added in arrow 59; branch-54 is on arrow 58.3, so the element-null per-row reduction uses count_set_bits() > 0 instead. No behavior change (17 array_has unit tests pass, clippy clean). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
backport of freakyzoidberg#1
see linked PR for details