Commit 846376a
committed
[SPARK-56983][SQL] DecimalAggregates widened-Cast peel must preserve query semantics
## What changes were proposed in this pull request?
Two semantic-preserving fixes to the SPARK-56627 widened-Cast peel in
`DecimalAggregates`:
1. **SUM arm**: Replace `Cast(MakeDecimal(_, p + 10, s), bounded(pPrime + 10, s))`
with `MakeDecimal(_, min(pPrime + 10, 38), s)`. The merged form's inner
`MakeDecimal` narrowed the overflow check to `10^(p+10)`, where the
un-rewritten `SUM(Cast(x, dec(pPrime, s)))` accepted up to
`10^min(pPrime+10, 38)`. For `pPrime + 10 > 18`, `Decimal.setOrNull` falls
into the BigDecimal branch and never rejects a Long, so the cleaner form
preserves the original overflow boundary for all Long-fit sums.
2. **AVG arm**: Change the guard from `p <= 7`
(`AVG_PEEL_MAX_INNER_PRECISION`) to `pPrime + 4 <= 15`
(`MAX_DOUBLE_DIGITS`). The merged guard switched `AVG(CAST(x AS DECIMAL(pPrime, s)))`
from full Decimal arithmetic to Double-regime whenever `p <= 7`, including
the `pPrime > 11` band where un-rewritten was Decimal-exact. The new guard
ensures the rewrite only fires inside the existing AVG fast-path's Double
envelope.
## Why are the changes needed?
A query rewrite should preserve the query's observable semantics. The
SPARK-56627 rule changed semantics in two ways:
- **SUM**: Inner `MakeDecimal(_, p + 10, s)` could return null (non-ANSI) or
throw (ANSI) for long-fit sums in \`(10^(p+10), 10^min(pPrime+10, 38))\`.
Example: \`SUM(CAST(x AS DECIMAL(15, 2)))\` where \`x: DECIMAL(5, 2)\`
rejected at \`10^15\` instead of \`10^25\` -- reachable around \`~10^10\`
rows of small-precision input.
- **AVG**: For \`pPrime > 11, p <= 7\`, the rule switched an exact-Decimal
computation into Double-regime, visible as last-digit rounding differences
at any input size. TPC-DS q18 (\`p=7, pPrime=12\`) was affected.
## Does this PR introduce _any_ user-facing change?
Yes -- restores the un-rewritten semantics for affected queries:
- \`SUM(CAST(x AS DECIMAL(pPrime, s)))\` no longer rejects long-fit sums
that the un-rewritten expression would have accepted.
- \`AVG(CAST(x AS DECIMAL(pPrime, s)))\` with \`pPrime + 4 > MAX_DOUBLE_DIGITS\`
is no longer peeled -- the un-rewritten Decimal-exact path is preserved.
TPC-DS q18 AVG aggregations revert to the un-peeled form (visible in the
regenerated \`q18\` plan-stability files).
## How was this patch tested?
- \`DecimalAggregatesSuite\`: 37/37 pass, including updated SUM arm shape
assertions, the property-test invariants for the new MakeDecimal-only
form, and the new AVG bound boundary tests.
- \`DataFrameAggregateSuite\`: SPARK-56627 numerical-equivalence PBTs pass
under the new AVG generator domain (\`pPrime in [p+1, 11]\`).
- TPC-DS q18 plan-stability files regenerated locally.
- \`DecimalAggregatesBenchmark\` AVG cases (B2, B4) updated to \`pPrime = 11\`
(the new bound). Result files left stale relative to the code change --
the \`benchmark.yml\` GHA workflow can regenerate them.
## Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7
Closes #56036 from cloud-fan/SPARK-decimal-aggregates-semantics-fix.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>1 parent 74816d7 commit 846376a
8 files changed
Lines changed: 152 additions & 216 deletions
File tree
- sql
- catalyst/src
- main/scala/org/apache/spark/sql/catalyst/optimizer
- test/scala/org/apache/spark/sql/catalyst/optimizer
- core/src/test
- resources/tpcds-plan-stability/approved-plans-v1_4
- q18.sf100
- q18
- scala/org/apache/spark/sql
- execution/benchmark
Lines changed: 20 additions & 16 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2564 | 2564 | | |
2565 | 2565 | | |
2566 | 2566 | | |
2567 | | - | |
2568 | | - | |
2569 | | - | |
2570 | | - | |
2571 | | - | |
2572 | | - | |
| 2567 | + | |
| 2568 | + | |
2573 | 2569 | | |
2574 | 2570 | | |
2575 | 2571 | | |
| |||
2604 | 2600 | | |
2605 | 2601 | | |
2606 | 2602 | | |
| 2603 | + | |
| 2604 | + | |
| 2605 | + | |
| 2606 | + | |
| 2607 | + | |
2607 | 2608 | | |
2608 | 2609 | | |
2609 | | - | |
2610 | | - | |
2611 | | - | |
2612 | | - | |
2613 | | - | |
2614 | | - | |
| 2610 | + | |
| 2611 | + | |
| 2612 | + | |
| 2613 | + | |
2615 | 2614 | | |
2616 | 2615 | | |
2617 | 2616 | | |
2618 | 2617 | | |
2619 | 2618 | | |
2620 | | - | |
2621 | | - | |
| 2619 | + | |
| 2620 | + | |
| 2621 | + | |
| 2622 | + | |
| 2623 | + | |
| 2624 | + | |
| 2625 | + | |
2622 | 2626 | | |
2623 | | - | |
| 2627 | + | |
2624 | 2628 | | |
2625 | 2629 | | |
2626 | 2630 | | |
2627 | | - | |
| 2631 | + | |
2628 | 2632 | | |
2629 | 2633 | | |
2630 | 2634 | | |
| |||
Lines changed: 78 additions & 140 deletions
Large diffs are not rendered by default.
Lines changed: 4 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
257 | 257 | | |
258 | 258 | | |
259 | 259 | | |
260 | | - | |
| 260 | + | |
261 | 261 | | |
262 | 262 | | |
263 | 263 | | |
| |||
268 | 268 | | |
269 | 269 | | |
270 | 270 | | |
271 | | - | |
272 | | - | |
273 | | - | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
274 | 274 | | |
275 | 275 | | |
276 | 276 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | | - | |
| 3 | + | |
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
| |||
Lines changed: 4 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
227 | 227 | | |
228 | 228 | | |
229 | 229 | | |
230 | | - | |
| 230 | + | |
231 | 231 | | |
232 | 232 | | |
233 | 233 | | |
| |||
238 | 238 | | |
239 | 239 | | |
240 | 240 | | |
241 | | - | |
242 | | - | |
243 | | - | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
244 | 244 | | |
245 | 245 | | |
246 | 246 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | | - | |
| 3 | + | |
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
| |||
0 commit comments