fix generate_series table function overflows#22250
Conversation
|
|
||
| fn advance(&mut self, step: &Self::StepType) -> Result<()> { | ||
| *self += step; | ||
| *self = self.checked_add(*step).ok_or_else(|| { |
There was a problem hiding this comment.
Interestingly, this is what PostgreSQL returns:
Output:
119 ms
generate_series
---------------------
9223372036854775806
(1 row)
And DuckDB does the same.
┌─────────────────────┐
│ generate_series │
│ int64 │
├─────────────────────┤
│ 9223372036854775806 │
│ (9.22 quintillion) │
└─────────────────────┘
I think it makes sense to implement the same behavior (i.e. find max value during instantiation).
There was a problem hiding this comment.
Agreed! I'll align DataFusion's behavior with these database behaviors.
Dandandan
left a comment
There was a problem hiding this comment.
Can we make it compatible with PostgreSQL / DuckDB?
Yes, please wait a moment. |
|
@Dandandan Please review again. Thanks! |
| include_end: bool, | ||
| name: &'static str, | ||
| /// Set to true when advancing would overflow, so the next call returns None. | ||
| finished: bool, |
There was a problem hiding this comment.
I think instead of adding finished we can just update end and also remove the checked_add (use += again since we don't overflow)
There was a problem hiding this comment.
I modified the implementation.
9898786 to
879c096
Compare
Co-authored-by: Copilot <copilot@github.com>
e716e46 to
c9f3285
Compare
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
Which issue does this PR close?
Closes #22208.
Rationale for this change
The table-valued
generate_series(start, stop, step)function uses an integer iterator that calledcurrent += stepwithout overflow checking. When the series approachesi64::MAXand the step would advance past it, Rust's debug build panics and release build silently wraps, producing incorrect results or a crash visible to end users.What changes are included in this PR?
*self += stepin theSeriesValue for i64advancemethod withchecked_add, returning aDataFusionError::Executionon overflow instead of panicking.SELECT * FROM generate_series(9223372036854775806, 9223372036854775807, 2)returns an execution error rather than panicking.Are these changes tested?
Yes. A
statement errorcase is added to table_functions.slt covering the exact query from the bug report. The timestamp-basedadvanceimplementation already usedOption-returning arithmetic and was not affected.Are there any user-facing changes?
Previously this query would cause a panic (or silent wrap in release mode); it now returns a clean
DataFusion error: Execution error: generate_series: integer overflow while advancing series.