[SPARK-57775][SQL] Return one row for GROUP BY GROUPING SETS (()) over empty input by uros-db · Pull Request #56891 · apache/spark

uros-db · 2026-06-30T06:41:51Z

What changes were proposed in this pull request?

Lower the single-empty-grouping-set case to a global `Aggregate` (no grouping expressions, no `Expand`) in `GroupingAnalyticsTransformer`, so it returns one (grand-total) row over empty input, matching the `GROUP BY`-less form and the SQL standard. `grouping_id()` folds to the constant `0`, and `grouping()`/`grouping_id()` in `HAVING`/`ORDER BY` resolve against that constant. The fix lands in both the legacy fixed-point analyzer and the single-pass resolver, which share `GroupingAnalyticsTransformer`.
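As a rough illustration of the lowering decision, the following Python model (not Spark's Scala code) shows which grouping-set shapes take the new path. `GlobalAggregate`, `ExpandAggregate`, `lower_grouping_sets`, `rollup_sets`, and `cube_sets` are hypothetical names, grouping sets are modeled as tuples of column names, and the model ignores the config flag.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple, Union

# Hypothetical model: a grouping set is a tuple of column names.
GroupingSet = Tuple[str, ...]


@dataclass(frozen=True)
class GlobalAggregate:
    """Stand-in for an Aggregate with no grouping expressions and no Expand."""
    grouping_id: int = 0  # grouping_id() folds to this constant


@dataclass(frozen=True)
class ExpandAggregate:
    """Stand-in for an Aggregate grouped by spark_grouping_id over an Expand."""
    grouping_sets: Tuple[GroupingSet, ...]


def rollup_sets(cols: List[str]) -> List[GroupingSet]:
    # ROLLUP(a, b) -> (a, b), (a,), (); ROLLUP() -> () only.
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols: List[str]) -> List[GroupingSet]:
    # CUBE(a, b) -> every subset of the columns; CUBE() -> () only.
    return [s for n in range(len(cols), -1, -1) for s in combinations(cols, n)]


def lower_grouping_sets(
    grouping_sets: List[GroupingSet],
) -> Union[GlobalAggregate, ExpandAggregate]:
    # A single empty grouping set is a grand total: lower it like a query
    # with no GROUP BY clause.
    if len(grouping_sets) == 1 and grouping_sets[0] == ():
        return GlobalAggregate()
    # Every other shape keeps the Expand-based lowering.
    return ExpandAggregate(tuple(grouping_sets))


print(lower_grouping_sets([()]))                # GROUPING SETS (())
print(lower_grouping_sets(cube_sets([])))       # CUBE()
print(lower_grouping_sets(rollup_sets(["a"])))  # ROLLUP(a)
```

The first two calls print `GlobalAggregate(grouping_id=0)`, since an empty `CUBE()` expands to the single empty set; `ROLLUP(a)` expands to `(a), ()` and keeps the `Expand` form.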

Why are the changes needed?

`GROUP BY GROUPING SETS (())` is a grand total, semantically identical to an aggregation with no `GROUP BY` clause. It was lowered to a grouped `Aggregate` over an `Expand` (grouping by `spark_grouping_id`). A grouped aggregate emits one row per group, and empty input produces no groups, so the query returned zero rows instead of one. The same defect affected the equivalent empty `GROUP BY CUBE()` and `GROUP BY ROLLUP()`, which also lower to a single empty grouping set.
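The zero-row result follows from how the two aggregate forms treat empty input. The Python sketch below models it with plain lists of dicts; `expand`, `grouped_count`, and `global_count` are hypothetical helpers, and tagging each row with its grouping set's index is a simplification of how Spark computes `spark_grouping_id`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, int]


def expand(rows: List[Row], grouping_sets: List[Tuple[str, ...]]) -> List[Row]:
    # Emit one copy of every input row per grouping set, tagged with a
    # grouping id (simplified here to the grouping set's index).
    out: List[Row] = []
    for row in rows:
        for gid, gset in enumerate(grouping_sets):
            tagged = {c: row[c] for c in gset}
            tagged["spark_grouping_id"] = gid
            out.append(tagged)
    return out


def grouped_count(rows: List[Row], key_cols: List[str]) -> List[dict]:
    # Aggregate WITH grouping expressions: one output row per distinct key.
    # No input rows means no keys, which means no output rows.
    counts: Dict[tuple, int] = defaultdict(int)
    for row in rows:
        counts[tuple(row[c] for c in key_cols)] += 1
    return [{"key": k, "count": n} for k, n in counts.items()]


def global_count(rows: List[Row]) -> List[dict]:
    # Aggregate WITHOUT grouping expressions: always exactly one output row.
    return [{"count": len(rows)}]


# Legacy lowering of GROUPING SETS (()) versus the global aggregate.
for rows in ([], [{"x": 1}, {"x": 2}]):
    legacy = grouped_count(expand(rows, [()]), ["spark_grouping_id"])
    fixed = global_count(rows)
    print(len(rows), "input rows ->", len(legacy), "legacy row(s),", len(fixed), "new row(s)")
```

Over empty input the legacy form yields no rows while the global form yields one row with a count of `0`; over the two-row input both forms yield a single grand-total row.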

Does this PR introduce any user-facing change?

Yes. Over empty input, `GROUP BY GROUPING SETS (())` (and the equivalent empty `GROUP BY CUBE()` and `GROUP BY ROLLUP()`) now returns one row instead of zero. The new behavior is gated by an internal SQL config, `spark.sql.analyzer.lowerEmptyGroupingSetToGlobalAggregate.enabled` (default `true`). When set to `false`, lowering reverts to the legacy `Expand`-based form (zero rows over empty input). The flag gates all three decision points (the transformer lowering and the `grouping_id()` resolution in each analyzer), so the off state reproduces pre-fix behavior identically in both analyzers.
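Gating every decision point on the same flag keeps the plan shape and the meaning of `grouping_id()` from disagreeing. A minimal Python sketch of that idea follows; only the config key comes from this PR, while the `conf` dict, `is_grand_total`, `lower`, `resolve_grouping_id`, `Literal`, and `ColumnRef` are hypothetical names, not Spark APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# The key comes from this PR; everything else here is a hypothetical model.
FLAG = "spark.sql.analyzer.lowerEmptyGroupingSetToGlobalAggregate.enabled"


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class ColumnRef:
    name: str


def is_grand_total(grouping_sets: List[Tuple[str, ...]], conf: Dict[str, str]) -> bool:
    # One shared predicate; the flag defaults to true, as in the PR.
    enabled = conf.get(FLAG, "true").strip().lower() == "true"
    return enabled and list(grouping_sets) == [()]


def lower(grouping_sets: List[Tuple[str, ...]], conf: Dict[str, str]) -> str:
    # Decision point 1: the plan shape chosen by the transformer.
    if is_grand_total(grouping_sets, conf):
        return "GlobalAggregate"
    return "Aggregate(Expand)"


def resolve_grouping_id(
    grouping_sets: List[Tuple[str, ...]], conf: Dict[str, str]
) -> Union[Literal, ColumnRef]:
    # Decision points 2 and 3: each analyzer would call this same predicate,
    # so grouping_id() is a constant only when the plan has no Expand.
    if is_grand_total(grouping_sets, conf):
        return Literal(0)
    return ColumnRef("spark_grouping_id")


for conf in ({}, {FLAG: "false"}):
    print(lower([()], conf), resolve_grouping_id([()], conf))
```

With the default configuration this prints `GlobalAggregate Literal(value=0)`; with the flag set to `"false"` it prints `Aggregate(Expand) ColumnRef(name='spark_grouping_id')`, the model's version of the pre-fix plan.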

How was this patch tested?

Tested via golden cases in `grouping_set.sql` (empty-input grand total, `grouping_id()` in SELECT/HAVING/ORDER BY, `grouping()` rejection, non-empty input), flag-off coverage in `grouping_set_grand_total_disabled.sql`, the regenerated `group-analytics` golden file, and `ResolveGroupingAnalyticsSuite`.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Opus 4.8

Co-authored-by: Isaac

uros-db changed the title from "[WIP][SQL] Return one row for GROUP BY GROUPING SETS (()) over empty input" to "[SPARK-57775][SQL] Return one row for GROUP BY GROUPING SETS (()) over empty input" on Jun 30, 2026

uros-db force-pushed the grouping-sets-grand-total branch from 15651e4 to 045f843 on June 30, 2026 08:50

Merge branch 'apache:master' into grouping-sets-grand-total (17d87f5)

uros-db wants to merge 2 commits into apache:master from uros-db:grouping-sets-grand-total