feat: use aligned slice access during bulk append in SparkUnsafeArray#4672
feat: use aligned slice access during bulk append in SparkUnsafeArray#4672sandugood wants to merge 12 commits into
Conversation
|
When running CI pipeline, got few UB. Switched to a less optimal way, which still provided boost in performance. By materializing
|
|
Could somebody potentially trigger the CI run? Thanks in advance |
|
Thanks for picking this up. The shape of the change is exactly what #4626 was asking for: keep the runtime alignment check as load-bearing, and replace the per-element null branch with A few things to look at, the first one is a correctness concern.
I'll approve the CI workflows so the matrix actually exercises the change. Out of curiosity, what was the UB you hit earlier? Worth knowing whether item 1 above was the cause or there's a separate hazard. Disclosure: this review was assisted by an AI tool (Anthropic Claude via Claude Code). I read the diff, the linked issue, and the prior closed PR's review feedback before forming the comments above, and I take responsibility for them. |
|
Thank you for the review @andygrove I have changed my approach and bechmarked it. Posting the benchmarking results, which I ran using
cc @mbutrovich, if you could take a quick look at this, please. Thanks! |
Which issue does this PR close?
Closes #4626
Rationale for this change
Currently, there is a bottleneck in performance during bulk append in the
SparkUnsafeArrayimplementation (inmacroand respectiveBuildertypes forbool,date32andtimestamp). If the array isNULLABLEthere is a hotspot:Self::is_null_in_bitset()which is suboptimal (see the benchmarking results below)What changes are included in this PR?
With this PR we change the flow of execution for nullable arrays:
SparkvsArrowlogicBooleanBufferfrom the bytes and then creating aNullBufferPrimitiveArray(with type specified in themacro_rules!and inside the according methods in theSparkUnsafeArrayfor several dtypes) from slice databuilderHow are these changes tested?
All of the basic tests pass.
Here is the benchmarking result (copied from the comment below).
Result from
with_nullsversion of benches (no_nullswasnt affected with this PR)