feat: use aligned slice access during bulk append in SparkUnsafeArray by sandugood · Pull Request #4672 · apache/datafusion-comet

sandugood · 2026-06-18T00:32:55Z

Which issue does this PR close?

Rationale for this change

Bulk append in the SparkUnsafeArray implementation (both the macro and the dedicated builder methods for bool, date32 and timestamp) currently has a performance bottleneck. When the array is nullable, there is a hotspot:

each element is checked for nullability individually via Self::is_null_in_bitset(), which is suboptimal (see the benchmark results below).

What changes are included in this PR?

This PR changes the flow of execution for nullable arrays:

read the raw bytes of the null bitmap and invert them, since Spark sets a bit for a null element while Arrow sets a bit for a valid one
construct a BooleanBuffer from those bytes and wrap it in a NullBuffer
create a PrimitiveArray from the slice data (the type is specified in the macro_rules! and in the corresponding SparkUnsafeArray methods for several dtypes)
append this array to the current builder
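The first step above can be sketched with the standard library only. This is a minimal illustration, not the code in this PR: the function name `spark_nulls_to_arrow_validity` is hypothetical, and it assumes the Spark null bitset is stored as little-endian 64-bit words where a set bit means null, with the caller supplying enough words to cover `num_elements`. The resulting bytes are the kind of input a BooleanBuffer would be built from.

```rust
/// Hypothetical helper: converts a Spark-style null bitset (bit set = NULL,
/// stored as 64-bit words) into Arrow-style validity bytes (bit set = valid,
/// least significant bit first within each byte).
fn spark_nulls_to_arrow_validity(null_words: &[u64], num_elements: usize) -> Vec<u8> {
    let num_bytes = (num_elements + 7) / 8;
    // Invert every word, then view it as little-endian bytes.
    let mut validity: Vec<u8> = null_words
        .iter()
        .flat_map(|w| (!*w).to_le_bytes())
        .take(num_bytes)
        .collect();
    // Clear the padding bits past `num_elements`, which the inversion set to 1.
    let rem = num_elements % 8;
    if rem != 0 {
        if let Some(last) = validity.last_mut() {
            *last &= (1u8 << rem) - 1;
        }
    }
    validity
}

fn main() {
    // Ten elements, where elements 1 and 9 are null.
    let null_words = [(1u64 << 1) | (1u64 << 9)];
    let validity = spark_nulls_to_arrow_validity(&null_words, 10);
    // prints "11111101 00000001"
    println!("{:08b} {:08b}", validity[0], validity[1]);
}
```

The inversion works on whole words rather than single elements, which is what removes the per-element branch from the hot path.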

How are these changes tested?

All of the basic tests pass.
Here are the benchmark results (copied from the comment below).
These are from the with_nulls variants of the benches (the no_nulls variants were not affected by this PR).

| array_type | main (current) | incoming PR | time_reduce |
|---|---|---|---|
| i32 | 29.274μs | 2.4256μs | -91.71% |
| i64 | 27.260μs | 4.7887μs | -82.43% |
| f64 | 29.352μs | 4.6621μs | -84.11% |
| date32 | 26.863μs | 2.5345μs | -89.89% |
| timestamp | 45.818μs | 4.3659μs | -90.47% |

sandugood · 2026-06-18T13:42:41Z

The CI pipeline surfaced a few instances of UB.
They were presumably caused by the bit_offset handling and the Spark <-> Arrow null bitmask conversion.

I switched to a less optimal approach that still improves performance: materializing a &[bool] and passing it to the builder's append_values().

| array_type | main (current) | incoming PR | time_reduce |
|---|---|---|---|
| i32 | 30.030μs | 21.250μs | -29.23% |
| i64 | 27.572μs | 21.996μs | -20.22% |
| f64 | 30.059μs | 21.889μs | -27.17% |
| date32 | 28.693μs | 21.041μs | -26.66% |
| timestamp | 46.832μs | 21.888μs | -53.26% |
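A rough sketch of the materialization step described above, using only the standard library. The helper names `bit_is_null` and `materialize_validity` are hypothetical stand-ins rather than the functions in this PR, and the bitset layout (64-bit words, set bit = null) is an assumption. The resulting vector is the shape of argument that would be passed as the validity slice to the builder's append_values().

```rust
/// Hypothetical stand-in for a per-element null check: returns true when bit
/// `idx` is set in a bitset stored as 64-bit words (set bit = NULL).
fn bit_is_null(null_words: &[u64], idx: usize) -> bool {
    (null_words[idx >> 6] >> (idx & 63)) & 1 == 1
}

/// Builds the per-element validity flags in one pass, so the builder can take
/// values and validity in a single bulk call afterwards.
fn materialize_validity(null_words: &[u64], num_elements: usize) -> Vec<bool> {
    let mut is_valid = Vec::with_capacity(num_elements);
    for idx in 0..num_elements {
        is_valid.push(!bit_is_null(null_words, idx));
    }
    is_valid
}

fn main() {
    // Four elements, where elements 0 and 2 are null.
    let is_valid = materialize_validity(&[0b101], 4);
    // prints "[false, true, false, true]"
    println!("{:?}", is_valid);
}
```

This still visits every element once to build the flags, which is consistent with the smaller speedup reported in the table above.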

sandugood · 2026-06-18T13:43:06Z

Could somebody trigger the CI run? Thanks in advance.

andygrove · 2026-06-18T14:34:29Z

Thanks for picking this up. The shape of the change is exactly what #4626 was asking for: keep the runtime alignment check as load-bearing, and replace the per-element null branch with PrimitiveBuilder::append_values(values, &is_valid). The benchmark numbers look great.
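For readers unfamiliar with the pattern, here is a minimal sketch of a runtime alignment check guarding a typed slice view, with a byte-wise fallback. It is not the code in list.rs: the function name `read_i32s` is hypothetical, and it copies the values out only to keep the example small.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: decodes `bytes` as native-endian i32 values, using a
/// typed slice view only when the start address is suitably aligned.
fn read_i32s(bytes: &[u8]) -> Vec<i32> {
    let width = size_of::<i32>();
    let n = bytes.len() / width;
    let ptr = bytes.as_ptr();
    if (ptr as usize) % align_of::<i32>() == 0 {
        // SAFETY: `ptr` is aligned for i32, `bytes` covers n * 4 initialized
        // bytes, and every bit pattern is a valid i32.
        let values = unsafe { std::slice::from_raw_parts(ptr as *const i32, n) };
        values.to_vec()
    } else {
        // Fallback: assemble each value from its bytes, which needs no alignment.
        bytes
            .chunks_exact(width)
            .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}

fn main() {
    // One leading byte so that buf[1..] starts one byte past the allocation start.
    let mut buf = vec![0u8];
    for v in [1i32, -2, 3] {
        buf.extend_from_slice(&v.to_ne_bytes());
    }
    // Whichever branch is taken, the decoded values are the same.
    println!("{:?}", read_i32s(&buf[1..]));
}
```

Without the check, creating the typed slice from a misaligned pointer would be undefined behavior, which is why the check cannot be treated as redundant.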

A few things to look at, the first one is a correctness concern.

1. Boolean nullable path may have UB. In append_booleans_to_builder, the new nullable path casts *const u8 to *const bool and iterates:
   ```
   let slice = unsafe { std::slice::from_raw_parts(self.element_offset as *const bool, num_elements) };
   for (idx, &value) in slice.iter().enumerate() { ... }
   ```
   Rust requires every bool value to be exactly 0 or 1. For null slots, the underlying byte is JVM-uninitialized memory, so the &value materialization in the iterator pulls a bool out of arbitrary bytes, which is UB even when the value is then ignored. The non-nullable branch a few lines down correctly uses *const u8 plus value != 0. Could you mirror that pattern in the nullable branch? cargo +nightly miri test would give a definitive answer, and it might be related to the UB you mentioned hitting earlier.
2. Boolean nullable path still branches per-element. Once the cast is fixed, the loop still does a per-element null check, which is the shape that #4626 (perf: use aligned slice access in SparkUnsafeArray bulk append) specifically called out as the bigger win for booleans. BooleanBuilder::append_values(values: &[bool], is_valid: Option<&[bool]>) would absorb the validity in bulk and remove the branch. Worth taking here if straightforward.
3. Vec<bool> allocation per call. The simpler Vec<bool> form is fine for landing, but #4626 suggested feeding a BooleanBuffer directly to the builder to skip the bool materialization. If you'd rather defer that, a follow-up issue with a link to the previous discussion would be helpful. At minimum, a Vec::with_capacity(num_elements) instead of collect would avoid one growth.
4. Stale assert message in append_dates_to_builder. The new debug_assert! message reads "append_timestamps: element_offset is null". This looks like a copy-paste from the timestamp path; it should say append_dates.
5. Comment phrasing on the alignment fallback. Could you reword the "// Note: alignment is not guaranteed - that is why do this" comments to point at the existing explanation at list.rs:105 and say why (nested-array variable-length region)? The prior PR was bounced because the description left the runtime check looking redundant, and a clear comment here is the easiest way to keep that from happening again.
6. PR description. The benchmark numbers in your comment look like the right shape. Could you pull them into the PR description body and mention they came from cargo bench --bench array_element_append?
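To illustrate the boolean concern, here is a small sketch of the u8-based pattern. The function name `bools_from_bytes` is hypothetical, and the sketch assumes the element region is already available as a byte slice, one byte per element.

```rust
/// Hypothetical helper: interprets one byte per element, treating any non-zero
/// byte as true. No `bool` is ever created from an arbitrary byte, so a slot
/// holding a value such as 0xAB cannot produce an invalid `bool`.
fn bools_from_bytes(bytes: &[u8]) -> Vec<bool> {
    bytes.iter().map(|&b| b != 0).collect()
}

fn main() {
    // The third byte stands in for whatever a null slot happens to contain.
    let raw = [0u8, 1, 0xAB, 0];
    // prints "[false, true, true, false]"
    println!("{:?}", bools_from_bytes(&raw));
}
```

The value decoded for a null slot is arbitrary but well-defined, and it is masked by the validity information when the array is built.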

I'll approve the CI workflows so the matrix actually exercises the change. Out of curiosity, what was the UB you hit earlier? Worth knowing whether item 1 above was the cause or there's a separate hazard.

Disclosure: this review was assisted by an AI tool (Anthropic Claude via Claude Code). I read the diff, the linked issue, and the prior closed PR's review feedback before forming the comments above, and I take responsibility for them.

sandugood · 2026-06-19T10:48:30Z

Thank you for the review @andygrove

I have changed my approach and benchmarked it.
The current approach uses Arrow's internal types and buffers (essentially a NullBuffer built from the underlying BooleanBuffer, which is constructed from the bitmap with its bits flipped), without materializing and looping through a Vec<>. The Vec<> version was still an improvement, but not a drastic one.

Below are the benchmark results from cargo bench --bench array_element_append, for the with_nulls variants of the benches (the no_nulls variants were not affected by this PR).

| array_type | main (current) | incoming PR | time_reduce |
|---|---|---|---|
| i32 | 29.274μs | 2.4256μs | -91.71% |
| i64 | 27.260μs | 4.7887μs | -82.43% |
| f64 | 29.352μs | 4.6621μs | -84.11% |
| date32 | 26.863μs | 2.5345μs | -89.89% |
| timestamp | 45.818μs | 4.3659μs | -90.47% |

cc @mbutrovich, if you could take a quick look at this, please. Thanks!

sandugood added 6 commits June 18, 2026 02:14

Builder appending refactoring

58189a1

Formatting, small refactoring and adding docstrings

7778a84

Added docstring regarding alignment

2bee010

Fixed the fallback in an unaligned but NULLABLE case, that was failing the CI

6abf1f7

Merge remote-tracking branch 'upstream/main' into feat/enable-slice-access

7439706

Changes fixed

74265c7

sandugood added 2 commits June 18, 2026 16:43

Merge remote-tracking branch 'upstream/main' into feat/enable-slice-access

d2c85e4

Fixed cargo formatting

662a7f2

sandugood added 3 commits June 19, 2026 10:40

Merge remote-tracking branch 'upstream/main' into feat/enable-slice-access

7bf6f15

Remade the building process using arrow buffers with no hot-spots in Vec<> loops

157c73f

Formatting fix

0e5fcad

Added docstring about runtime check of alignment

42bd9dd

sandugood wants to merge 12 commits into apache:main from sandugood:feat/enable-slice-access
