[branch-53] perf: Optimize array_concat using MutableArrayData (#20620) #118

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into branch-53 from branch-53-cherrypick-20620
May 4, 2026

@pabadrubio

Cherry-pick of apache#20620.

This is primarily a performance improvement, but it also fixes a bug in how `array_concat` handles null list values whose slot length is positive, that is, null rows whose offsets still span child elements.

## Which issue does this PR close?

- Closes apache#20619.

## Rationale for this change

The current implementation of `array_concat` creates an `ArrayRef` for
each row, uses Arrow's `concat` kernel to merge the elements together,
and then uses `concat` again to produce the final results. This does a
lot of unnecessary allocation and copying.

Instead, we can use `MutableArrayData::extend` to copy element ranges in
bulk, which avoids much of this intermediate copying and allocation.
This approach is 5-15x faster on a microbenchmark.
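
The bulk-copy idea can be sketched without Arrow at all. The block below is a simplified, stdlib-only model, not the code in this PR and not the `MutableArrayData` API: `ListColumn` and `concat_rows` are hypothetical names, the element type is fixed to `i32`, and all inputs are assumed to have the same number of rows. It also assumes that a null input row contributes no elements and that an output row is null only when every input row is null. Each row's child range is copied straight into one preallocated output buffer, and null rows are skipped even when their offsets span a non-empty range.

```rust
/// Simplified stand-in for a list array of i32 values.
/// Hypothetical type for illustration; this is not an Arrow type.
#[derive(Debug, Clone, PartialEq)]
struct ListColumn {
    /// `offsets[i]..offsets[i + 1]` is the child range of row `i`.
    offsets: Vec<usize>,
    /// Flattened child values of all rows.
    values: Vec<i32>,
    /// `valid[i] == false` marks row `i` as null.
    valid: Vec<bool>,
}

impl ListColumn {
    fn rows(&self) -> usize {
        self.valid.len()
    }
}

/// Concatenates the inputs row by row, copying each row's child range
/// directly into a single output buffer instead of building one
/// intermediate array per row.
/// Assumption: every input has the same number of rows.
fn concat_rows(inputs: &[ListColumn]) -> ListColumn {
    let rows = inputs.first().map_or(0, |c| c.rows());

    // Allocate once, using the total child length as an upper bound.
    let capacity: usize = inputs.iter().map(|c| c.values.len()).sum();
    let mut values = Vec::with_capacity(capacity);
    let mut offsets = Vec::with_capacity(rows + 1);
    let mut valid = Vec::with_capacity(rows);
    offsets.push(0);

    for row in 0..rows {
        let mut any_valid = false;
        for col in inputs {
            // A null row may still span child slots; skip it so those
            // slots never leak into the output.
            if !col.valid[row] {
                continue;
            }
            any_valid = true;
            let (start, end) = (col.offsets[row], col.offsets[row + 1]);
            // Bulk copy of the whole range for this row.
            values.extend_from_slice(&col.values[start..end]);
        }
        offsets.push(values.len());
        // Assumed semantics: the output row is null only if all inputs were null.
        valid.push(any_valid);
    }

    ListColumn { offsets, values, valid }
}
```

For example, with a first input whose second row is null but still spans two child slots (`offsets = [0, 2, 4]`, `values = [1, 2, 99, 99]`, `valid = [true, false]`) and a second input `[[3], [4, 5]]`, this sketch produces `[[1, 2, 3], [4, 5]]`: the `99` placeholders under the null row are never copied.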

## What changes are included in this PR?

* Add a benchmark for `array_concat`
* Improve SLT test coverage for `array_concat`
* Implement the `MutableArrayData`-based optimization

## Are these changes tested?

Yes, and benchmarked.

## Are there any user-facing changes?

No.

(cherry picked from commit d2df7a5)