
perf: Optimize NULL handling in arrays_zip #21475

Merged
mbutrovich merged 2 commits into apache:main from neilconway:neilc/perf-arrays-zip-nulls
Apr 10, 2026

Contributor

@neilconway neilconway commented Apr 8, 2026

Which issue does this PR close?

Rationale for this change

arrays_zip was building a Vec<bool> and then converting it to a NullBuffer. It is simpler and (a bit) faster to just use NullBuffer / NullBufferBuilder directly.
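
To illustrate the difference between the two shapes, here is a minimal, std-only sketch. It is not the code from this PR and it does not use the arrow-rs types: `Bitmap`, `from_bools`, and `LazyNullBuilder` are hypothetical stand-ins invented for this example. The first function mirrors the "collect a Vec<bool>, then convert" pattern; the builder appends validity bits directly and allocates nothing until the first null is seen, which is the kind of behavior NullBufferBuilder is documented to provide in arrow-rs.

```rust
/// Simplified validity bitmap (hypothetical stand-in, NOT arrow-rs's NullBuffer).
/// Bit i is 1 when row i is valid, 0 when it is null.
#[derive(Debug, PartialEq)]
pub struct Bitmap {
    bits: Vec<u8>,
    len: usize,
}

impl Bitmap {
    pub fn is_valid(&self, i: usize) -> bool {
        assert!(i < self.len);
        self.bits[i / 8] & (1 << (i % 8)) != 0
    }

    pub fn null_count(&self) -> usize {
        (0..self.len).filter(|&i| !self.is_valid(i)).count()
    }
}

/// "Before" shape: one bool per row is collected first, then packed in a second pass.
pub fn from_bools(valid: &[bool]) -> Bitmap {
    let mut bits = vec![0u8; (valid.len() + 7) / 8];
    for (i, &v) in valid.iter().enumerate() {
        if v {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    Bitmap { bits, len: valid.len() }
}

/// "After" shape: bits are appended directly, and no buffer exists until the first null.
#[derive(Default)]
pub struct LazyNullBuilder {
    bits: Option<Vec<u8>>,
    len: usize,
}

impl LazyNullBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn append(&mut self, valid: bool) {
        if !valid && self.bits.is_none() {
            // First null: materialize the bitmap with every earlier row marked valid.
            let mut b = vec![0xFFu8; self.len / 8];
            if self.len % 8 != 0 {
                b.push((1u8 << (self.len % 8)) - 1);
            }
            self.bits = Some(b);
        }
        if let Some(bits) = self.bits.as_mut() {
            if self.len % 8 == 0 {
                bits.push(0); // start a new byte
            }
            if valid {
                bits[self.len / 8] |= 1 << (self.len % 8);
            }
        }
        self.len += 1;
    }

    /// Returns None when no null was ever appended, so callers can skip the bitmap entirely.
    pub fn finish(self) -> Option<Bitmap> {
        let len = self.len;
        self.bits.map(|bits| Bitmap { bits, len })
    }
}
```

In this sketch the all-valid case never allocates or touches a bitmap at all, which would be one plausible reason for a no-nulls benchmark to benefit more than a case with nulls present.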

Benchmarks (ARM64):

  - arrays_zip_no_nulls_8192: 1096.5µs → 1010.7µs, -7.8%
  - arrays_zip_10pct_nulls_8192: 1131.8µs → 1100.1µs, -2.8%
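
For reference, the percentages above are the relative change in time per iteration. A small sketch of that arithmetic, with `percent_change` being a helper name made up for this example:

```rust
/// Relative change from `before_us` to `after_us`, in percent.
/// Negative values mean the "after" measurement is faster.
pub fn percent_change(before_us: f64, after_us: f64) -> f64 {
    (after_us - before_us) / before_us * 100.0
}
```

Applied to the two rows above, this gives roughly -7.8 for the no-nulls case and roughly -2.8 for the 10% nulls case.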

The improvement is modest (roughly 3-8% on these benchmarks), but the resulting code is also cleaner and more idiomatic.

What changes are included in this PR?

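  • Build the null buffer with NullBuffer / NullBufferBuilder directly instead of going through an intermediate Vec<bool>
  • Add benchmarks (arrays_zip_no_nulls_8192, arrays_zip_10pct_nulls_8192)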

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

Contributor

@mbutrovich mbutrovich left a comment

Makes sense to me. This seems more idiomatic for arrow-rs, and we'll take the performance wins! Thanks @neilconway!

@mbutrovich mbutrovich added this pull request to the merge queue Apr 10, 2026
Contributor

@metegenez metegenez left a comment

Could you add a 50% null density case to the benchmark? I am not sure we are firing the all-null-row early exit.

LGTM.

Merged via the queue into apache:main with commit fbb5240 Apr 10, 2026
35 checks passed
Labels

functions Changes to functions implementation

Development

Successfully merging this pull request may close these issues.

Optimize NULL handling in arrays_zip

3 participants