fix array_repeat capacity overflow on constant scalar with large count #22305

Open
xiedeyantu wants to merge 3 commits into apache:main from xiedeyantu:array_repeat2

@xiedeyantu
Member

Which issue does this PR close?

Rationale for this change

array_repeat still panics for oversized repeat counts in the constant-scalar path. The simplest reproducer is:

SELECT array_repeat(1, 9223372036854775807)

Unlike the previously reported array_repeat overflow cases, this path neither sums counts across rows nor multiplies nested list lengths. It still reaches an unchecked Vec preallocation, which panics with "capacity overflow".
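To illustrate the failure mode in isolation, here is a minimal standard-library sketch. It does not reproduce the code in repeat.rs; the function names `with_capacity_panics` and `try_preallocate` are hypothetical, and the `i64` element type is an assumption chosen only to make the byte-size arithmetic concrete.

```rust
use std::collections::TryReserveError;
use std::panic;

/// Returns true if `Vec::<i64>::with_capacity(count)` panics.
/// For very large counts the requested byte size exceeds `isize::MAX`,
/// and the standard library panics with "capacity overflow".
fn with_capacity_panics(count: usize) -> bool {
    panic::catch_unwind(|| Vec::<i64>::with_capacity(count)).is_err()
}

/// Fallible alternative: reports the oversized request as an error
/// instead of panicking.
fn try_preallocate(count: usize) -> Result<Vec<i64>, TryReserveError> {
    let mut values: Vec<i64> = Vec::new();
    values.try_reserve_exact(count)?;
    Ok(values)
}
```

Calling `with_capacity_panics(i64::MAX as usize)` returns `true`, while `try_preallocate(i64::MAX as usize)` returns an `Err` that a caller could map to an execution error.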

This change makes array_repeat reject oversized output lengths up front and return a normal execution error instead of panicking.

What changes are included in this PR?

This PR adds explicit bounds checks in repeat.rs so array_repeat validates requested output sizes before allocating buffers.

The main changes are:

  • Move repeat-length accumulation into shared checked helpers.
  • Reject oversized output lengths with:
    array_repeat: requested length exceeds maximum array size
  • Guard both scalar and list repeat paths so they fail consistently before hitting unchecked allocation or arithmetic overflow.
  • Reuse precomputed outer offsets for the list path instead of rebuilding them from unchecked lengths.
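The changes listed above might look roughly like the following sketch. It is not the code from this PR: the helper names `checked_len` and `checked_offsets`, the `String` error type, the `i32::MAX` cap (assumed here because `List` offsets are 32-bit), and the treatment of non-positive counts as zero are all assumptions made for illustration.

```rust
/// Error text taken from the PR description.
const ERR: &str = "array_repeat: requested length exceeds maximum array size";

/// Assumed cap: the total number of child values must fit in an i32 offset.
const MAX_ARRAY_LEN: i64 = i32::MAX as i64;

fn too_large() -> String {
    ERR.to_string()
}

/// Validates a single repeat count before any buffer is allocated.
/// Non-positive counts are treated as zero repeats (an assumption).
fn checked_len(count: i64) -> Result<i64, String> {
    if count <= 0 {
        return Ok(0);
    }
    if count > MAX_ARRAY_LEN {
        return Err(too_large());
    }
    Ok(count)
}

/// Accumulates per-row counts into list offsets using checked addition.
/// The returned offsets can be reused by later steps instead of being
/// rebuilt from unchecked lengths.
fn checked_offsets(counts: &[i64]) -> Result<Vec<i32>, String> {
    let mut offsets = Vec::with_capacity(counts.len() + 1);
    let mut total: i64 = 0;
    offsets.push(0);
    for &count in counts {
        total = total
            .checked_add(checked_len(count)?)
            .filter(|t| *t <= MAX_ARRAY_LEN)
            .ok_or_else(too_large)?;
        // Safe cast: `total` was just checked against `MAX_ARRAY_LEN`.
        offsets.push(total as i32);
    }
    Ok(offsets)
}
```

With these helpers, `checked_offsets(&[2, 0, 3])` yields the offsets `[0, 2, 2, 5]`, while `checked_offsets(&[i64::MAX])` returns the error message before any output buffer is sized.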

Are these changes tested?

Yes.

This PR adds a regression test in repeat.rs covering the constant-scalar reproducer with i64::MAX as the repeat count and verifies that array_repeat returns an execution error rather than panicking.

Validated with:

cargo test -p datafusion-functions-nested scalar_count_exceeding_max_array_size_returns_error --lib

Are there any user-facing changes?

Yes.

Previously, oversized array_repeat calls could panic the process. After this change, they return a regular execution error:

array_repeat: requested length exceeds maximum array size

github-actions bot added the functions (Changes to functions implementation) label on May 17, 2026
Labels

functions Changes to functions implementation

Development

Successfully merging this pull request may close these issues.

panic: array_repeat capacity overflow on constant scalar with large count
