perf: apply min-alloc split to ByteViewGroupValueBuilder::take_n (follow-up to #22165)#22205
Closed
RyanJamesStewart wants to merge 1 commit into
Closed
perf: apply min-alloc split to ByteViewGroupValueBuilder::take_n (follow-up to #22165)#22205RyanJamesStewart wants to merge 1 commit into
RyanJamesStewart wants to merge 1 commit into
Conversation
Follow-up to apache#22165, which replaced `drain(0..n).collect()` with the `split_vec_min_alloc` helper in the `bytes.rs` and `primitive.rs` `take_n` paths but did not cover the byte-view builder. `ByteViewGroupValueBuilder::take_n_inner` had the same idiom on `self.views`: `drain(0..n).collect()` always allocates `n` elements and leaves the retained vec at its pre-emit capacity. Under an OOM-triggered `EmitTo::First(n)` emit, `n` is close to `len`, so this copies the largest allocation. Routing through `split_vec_min_alloc` allocates `min(n, len - n)` instead, matching the other builders. Behavior is unchanged; existing `test_byte_view_take_n` / `test_byte_view_take_n_partial_completed_nonzero_index` cover both the drain and split_off branches.
e94d307 to
e4a184a
Compare
ariel-miculas
approved these changes
May 21, 2026
RyanJamesStewart
added a commit
to RyanJamesStewart/datafusion
that referenced
this pull request
May 21, 2026
Addresses the review on apache#22416: - split_vec_min_alloc: shrink_to_fit the emitted prefix in the split_off branch. datafusion accounts memory by capacity rather than length, so the prefix must not retain the pre-split (larger) allocation's capacity. - Remove the duplicate split_vec_min_alloc from physical-plan's multi_group_by/mod.rs (added in apache#22165); bytes.rs and primitive.rs now use the shared datafusion_common::utils version. - Fold in the ByteViewGroupValueBuilder::take_n change previously proposed as apache#22205, so this PR converts every split_vec_min_alloc call site in one place. All split_vec_min_alloc, take_n and min_max tests pass.
Contributor
Author
|
Superseded by #22416, which hoists |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Follow-up to #22165 / #22164.
Rationale for this change
#22165 introduced the
split_vec_min_allochelper and applied it to thetake_npaths inbytes.rsandprimitive.rs, but did not cover the byte-view builder.ByteViewGroupValueBuilder::take_n_innerstill usedself.views.drain(0..n).collect::<Vec<_>>(), which always allocatesnelements and leaves the retained vec at its pre-emit capacity.Under an OOM-triggered
EmitTo::First(n)emit,nis close to the length ofviews, so this copies the largest allocation — exactly the case #22164 was filed to fix. This PR routes the byte-view builder through the samesplit_vec_min_allochelper, which allocatesmin(n, len - n)instead, so all three multi-group-by builders now share one strategy.This PR previously also modified
bytes.rsandprimitive.rs; those changes are now fully covered by #22165 and have been dropped. The PR is rebased onto currentmainand reduced to the one builder #22165 missed.What changes are included in this PR?
ByteViewGroupValueBuilder::take_n_inner: replacedrain(0..n).collect()withsplit_vec_min_alloc(&mut self.views, n).Are these changes tested?
Behavior is unchanged. Existing
test_byte_view_take_nandtest_byte_view_take_n_partial_completed_nonzero_indexexercise both thedrainandsplit_offbranches of the helper; alltake_ntests pass locally.Are there any user-facing changes?
No.