fix array_repeat capacity overflow on constant scalar with large count #22305
Open
xiedeyantu wants to merge 3 commits into
Which issue does this PR close?
Rationale for this change
array_repeat still panics for oversized repeat counts in the constant-scalar path. The simplest reproducer is:
Unlike the previously reported array_repeat overflow cases, this path does not sum counts across rows and does not multiply nested list lengths, but it still reaches an unchecked Vec preallocation and panics with capacity overflow.
This change makes array_repeat reject oversized output lengths up front and return a normal execution error instead of panicking.
What changes are included in this PR?
This PR adds explicit bounds checks in repeat.rs so array_repeat validates requested output sizes before allocating buffers.
The main changes are:
array_repeat: requested length exceeds maximum array size
Are these changes tested?
Yes.
This PR adds a regression test in repeat.rs covering the constant-scalar reproducer with i64::MAX as the repeat count and verifies that array_repeat returns an execution error rather than panicking.
Validated with:
cargo test -p datafusion-functions-nested scalar_count_exceeding_max_array_size_returns_error --lib
Are there any user-facing changes?
Yes.
Previously, oversized array_repeat calls could panic the process. After this change, they return a regular execution error:
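For readers unfamiliar with the failure mode, here is a minimal standalone sketch of the kind of up-front check described above. It is not the code in repeat.rs: the names MAX_ARRAY_LEN, checked_repeat_len, and repeat_scalar are hypothetical, the i32::MAX limit is an assumption made for illustration, and clamping negative counts to zero is likewise an assumption. Only the error text is taken from this PR description.

```rust
/// Hypothetical limit for illustration only (assumption, not the PR's value).
const MAX_ARRAY_LEN: usize = i32::MAX as usize;

/// Error text as quoted in the PR description.
const TOO_LARGE: &str = "array_repeat: requested length exceeds maximum array size";

/// Validate a requested repeat count before any buffer is allocated.
/// Negative counts are clamped to zero here (an assumption for this sketch).
fn checked_repeat_len(count: i64) -> Result<usize, String> {
    // On targets where usize is narrower than i64 the conversion itself can fail.
    let len = usize::try_from(count.max(0)).map_err(|_| TOO_LARGE.to_string())?;
    if len > MAX_ARRAY_LEN {
        return Err(TOO_LARGE.to_string());
    }
    Ok(len)
}

/// Repeat a scalar `count` times, returning an error instead of panicking
/// when the requested length is too large.
fn repeat_scalar(value: i32, count: i64) -> Result<Vec<i32>, String> {
    // The check runs first, so `with_capacity` never sees an oversized length.
    let len = checked_repeat_len(count)?;
    let mut out = Vec::with_capacity(len);
    out.resize(len, value);
    Ok(out)
}
```

Without the check, passing i64::MAX straight to Vec::with_capacity for a multi-byte element type requests more bytes than the allocator can represent, which is what produces the capacity overflow panic. With the check, repeat_scalar(7, i64::MAX) returns Err carrying the message above, and repeat_scalar(7, 3) returns a three-element vector.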