
perf: use bulk-NULL semantics in split and substring, skip Vec allocation in split#4403

Merged
mbutrovich merged 1 commit into apache:main from mbutrovich:bulk_null on May 22, 2026


@mbutrovich (Contributor)

Which issue does this PR close?

Closes #4397.

Rationale for this change

What changes are included in this PR?

  1. Bulk-NULL in split.rs (both array paths) - the input null buffer is cloned and reused as the output's nulls, so the per-row BooleanBufferBuilder is removed.
  2. Bulk-NULL in substring.rs (Utf8, LargeUtf8, Binary, LargeBinary) - the result array is decomposed with into_parts and rebuilt with the input's nulls reattached.
  3. LargeUtf8 offset-width fix in split.rs - LargeUtf8 input now produces a LargeListArray.
  4. No intermediate Vec in the split.rs hot loop - &str slices are appended straight into the GenericStringBuilder.
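The changes above can be illustrated with a small, std-only sketch. It does not use arrow-rs: `StringColumn`, `ListOfStrings`, `split_bulk_null`, and `substring_bulk_null` are hypothetical stand-ins that model an Arrow string array as a flat value buffer, an offsets buffer, and an optional validity mask (a `Vec<bool>` rather than a packed bitmap). It shows the two ideas in the PR under simplifying assumptions: output nulls are taken from the input in one clone instead of being written row by row, and split pieces go directly into the child buffers without collecting a per-row `Vec<&str>`. A literal delimiter and 0-based, character-based substring positions are assumed for simplicity; neither is claimed to match the real Comet kernels.

```rust
/// Hypothetical std-only stand-in for an Arrow string array: one flat value
/// buffer, an offsets buffer (row i spans values[offsets[i]..offsets[i + 1]]),
/// and an optional validity mask (Vec<bool> instead of a packed bitmap).
#[derive(Debug)]
struct StringColumn {
    values: String,
    offsets: Vec<usize>,
    validity: Option<Vec<bool>>,
}

impl StringColumn {
    /// Builds a column from optional rows; a null row occupies an empty slot.
    fn from_options(rows: &[Option<&str>]) -> Self {
        let mut values = String::new();
        let mut offsets = vec![0];
        let mut validity = Vec::with_capacity(rows.len());
        for &row in rows {
            values.push_str(row.unwrap_or(""));
            offsets.push(values.len());
            validity.push(row.is_some());
        }
        StringColumn { values, offsets, validity: Some(validity) }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn value(&self, i: usize) -> &str {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }

    fn is_valid(&self, i: usize) -> bool {
        self.validity.as_ref().map_or(true, |v| v[i])
    }
}

/// Hypothetical output of split: list row i owns child items
/// list_offsets[i]..list_offsets[i + 1] in `items`.
#[derive(Debug)]
struct ListOfStrings {
    list_offsets: Vec<usize>,
    items: StringColumn,
    validity: Option<Vec<bool>>,
}

/// Splits every valid row on a literal delimiter (an assumption of this sketch).
fn split_bulk_null(input: &StringColumn, delim: &str) -> ListOfStrings {
    let mut child_values = String::new();
    let mut child_offsets = vec![0];
    let mut list_offsets = vec![0];
    for i in 0..input.len() {
        if input.is_valid(i) {
            // Append each piece straight into the child buffers;
            // no per-row Vec<&str> is collected first.
            for piece in input.value(i).split(delim) {
                child_values.push_str(piece);
                child_offsets.push(child_values.len());
            }
        }
        // A null row becomes an empty list slot; no per-row null bit is written.
        list_offsets.push(child_offsets.len() - 1);
    }
    ListOfStrings {
        list_offsets,
        items: StringColumn { values: child_values, offsets: child_offsets, validity: None },
        // Bulk-NULL: the output validity is the input validity, cloned once.
        validity: input.validity.clone(),
    }
}

/// Takes up to `len` chars starting at 0-based char index `start` (assumed
/// semantics), computed for every slot; input validity is reattached at the end.
fn substring_bulk_null(input: &StringColumn, start: usize, len: usize) -> StringColumn {
    let mut values = String::new();
    let mut offsets = vec![0];
    for i in 0..input.len() {
        let v = input.value(i);
        let begin = v.char_indices().nth(start).map_or(v.len(), |(b, _)| b);
        let end = v[begin..].char_indices().nth(len).map_or(v.len(), |(b, _)| begin + b);
        values.push_str(&v[begin..end]);
        offsets.push(values.len());
    }
    // Analogue of into_parts-then-reattach: values and offsets come from the
    // computation, nulls come straight from the input.
    StringColumn { values, offsets, validity: input.validity.clone() }
}
```

For example, splitting `[Some("a,b,c"), None, Some("")]` on `","` keeps the validity `[true, false, true]` unchanged, and the null row simply contributes an empty list slot. In arrow-rs itself, `Utf8` and `List` use 32-bit offsets while `LargeUtf8` and `LargeList` use 64-bit offsets, which is why a LargeUtf8 input is paired with a LargeListArray output (item 3); the sketch sidesteps this by using `usize` offsets throughout.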

How are these changes tested?

Existing tests.

@mbutrovich self-assigned this on May 22, 2026.
@mbutrovich marked this pull request as ready for review on May 22, 2026 at 12:38.
@mbutrovich requested review from andygrove and comphead on May 22, 2026 at 12:38.
@mbutrovich changed the title from "chore: use bulk-NULL semantics in split and substring, skip Vec allocation in split" to "perf: use bulk-NULL semantics in split and substring, skip Vec allocation in split" on May 22, 2026.
@comphead (Contributor) left a comment

Thanks @mbutrovich! Note that substring is expected to be wired through from DataFusion (DF) starting with DF54.

@mbutrovich merged commit 2a266f6 into apache:main on May 22, 2026.
155 of 156 checks passed


Development

Successfully merging this pull request may close these issues.

perf: Use bulk-NULL builders in split and substring

2 participants