
[SPARK-57775][SQL] Return one row for GROUP BY GROUPING SETS (()) over empty input #56891

Open
uros-db wants to merge 2 commits into apache:master from uros-db:grouping-sets-grand-total

uros-db commented Jun 30, 2026

What changes were proposed in this pull request?

Lower the single-empty-grouping-set case to a global `Aggregate` (no grouping expressions, no `Expand`) in `GroupingAnalyticsTransformer`, so it returns one (grand total) row over empty input, matching the `GROUP BY`-less form and the SQL standard. `grouping_id()` folds to the constant `0`, and `grouping()`/`grouping_id()` in `HAVING`/`ORDER BY` resolve against that constant. The fix lands in both the legacy fixed-point analyzer and the single-pass resolver, which share `GroupingAnalyticsTransformer`.
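As a rough illustration of the lowering rule, here is a minimal pure-Python sketch. It is not Spark code: `cube_sets`, `rollup_sets`, `lower_grouping_sets`, and the plan labels it returns are hypothetical names chosen for this example, and the real decision lives in `GroupingAnalyticsTransformer`. It shows that `CUBE()` and `ROLLUP()` with no columns both expand to the single empty grouping set, which is exactly the case that becomes a global `Aggregate` with `grouping_id()` folded to `0`.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

GroupingSet = Tuple[str, ...]


def cube_sets(cols: Sequence[str]) -> List[GroupingSet]:
    # CUBE(c1, ..., cn) expands to every subset of its columns
    # (this sketch lists larger subsets first). With no columns,
    # the only subset is the empty one.
    return [combo for r in range(len(cols), -1, -1) for combo in combinations(cols, r)]


def rollup_sets(cols: Sequence[str]) -> List[GroupingSet]:
    # ROLLUP(c1, ..., cn) expands to the prefixes (c1..cn), ..., (c1), ().
    # With no columns, that is just ().
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def lower_grouping_sets(grouping_sets: Sequence[GroupingSet]) -> Tuple[str, dict]:
    # Hypothetical stand-in for the decision made in GroupingAnalyticsTransformer.
    sets = [tuple(s) for s in grouping_sets]
    if sets == [()]:
        # Grand total: no grouping expressions, no Expand; grouping_id() is the constant 0.
        return ("GlobalAggregate", {"grouping_id": 0})
    # Every other shape keeps the Expand-based form grouped by spark_grouping_id.
    return ("Aggregate(Expand)", {"group_by": ["spark_grouping_id"], "sets": sets})


if __name__ == "__main__":
    print(lower_grouping_sets(cube_sets([])))
    print(lower_grouping_sets(rollup_sets([])))
```

In this sketch, any other list of grouping sets, including one that merely contains `()` alongside non-empty sets, keeps the `Expand`-based form.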

Why are the changes needed?

`GROUP BY GROUPING SETS (())` is a grand total, semantically identical to an aggregation with no `GROUP BY` clause. It was lowered to a grouped `Aggregate` over an `Expand` (grouping by `spark_grouping_id`), so over empty input it returned zero rows instead of one. The same defect affected the equivalent empty `GROUP BY CUBE()` and `GROUP BY ROLLUP()`, which also lower to a single empty grouping set.
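The row-count difference can be modeled without Spark. The sketch below is a simplified, hypothetical model (`grouped_aggregate`, `global_aggregate`, and `expand_single_empty_set` are illustrative names, and rows are plain dicts): a grouped aggregate emits one row per key it actually sees, so an `Expand` over empty input leaves it with no `spark_grouping_id` keys and therefore no output, while a global aggregate always emits its single grand-total row.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def expand_single_empty_set(rows: Sequence[Row]) -> List[Row]:
    # Expand for GROUPING SETS (()) emits each input row once, tagged with
    # spark_grouping_id = 0 (there are no grouping columns, so no bits are set).
    return [dict(row, spark_grouping_id=0) for row in rows]


def grouped_aggregate(rows: Sequence[Row], key: Callable[[Row], object],
                      agg: Callable[[List[Row]], object]) -> List[object]:
    # A grouped Aggregate emits one output row per distinct key in its input.
    # Empty input means no keys, hence zero output rows.
    groups: Dict[object, List[Row]] = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    return [agg(group) for group in groups.values()]


def global_aggregate(rows: Sequence[Row], agg: Callable[[List[Row]], object]) -> List[object]:
    # A global Aggregate (no grouping expressions) always emits exactly one row.
    return [agg(list(rows))]


def count_star(group: List[Row]) -> int:
    # Stand-in for count(*).
    return len(group)


if __name__ == "__main__":
    for data in ([], [{"x": 1}, {"x": 2}]):
        legacy = grouped_aggregate(expand_single_empty_set(data),
                                   key=lambda r: r["spark_grouping_id"], agg=count_star)
        fixed = global_aggregate(data, agg=count_star)
        print(len(data), legacy, fixed)
```

Running it prints `0 [] [0]` and then `2 [2] [2]`: the two forms agree on non-empty input and differ only when the input is empty.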

Does this PR introduce any user-facing change?

Yes. Over empty input, `GROUP BY GROUPING SETS (())` (and the equivalent empty `GROUP BY CUBE()` and `GROUP BY ROLLUP()`) now returns one grand-total row instead of zero rows. The new behavior is gated by an internal SQL config, `spark.sql.analyzer.lowerEmptyGroupingSetToGlobalAggregate.enabled` (default `true`). When set to `false`, lowering reverts to the legacy `Expand`-based form (zero rows over empty input). The flag gates all three decision points (the transformer lowering and the `grouping_id()` resolution in each analyzer), so the off state reproduces pre-fix behavior identically in both analyzers.
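The gating can be sketched the same way. In this hypothetical model (`flag_enabled`, `analyze`, and `rows_over_empty_input` are illustrative names, and the session conf is a plain dict of strings), the three decision points are collapsed into one function for brevity: a single flag read drives both the plan shape and what `grouping_id()` resolves to, which is why turning the config off restores the legacy result consistently.

```python
from typing import Mapping, Sequence, Tuple

FLAG = "spark.sql.analyzer.lowerEmptyGroupingSetToGlobalAggregate.enabled"


def flag_enabled(conf: Mapping[str, str]) -> bool:
    # Defaults to true when unset, mirroring the documented default.
    return conf.get(FLAG, "true").strip().lower() == "true"


def analyze(grouping_sets: Sequence[Tuple[str, ...]],
            conf: Mapping[str, str]) -> Tuple[str, str]:
    # Both decisions below use the same flag value, so they can never disagree.
    lowered = flag_enabled(conf) and [tuple(s) for s in grouping_sets] == [()]
    plan = "GlobalAggregate" if lowered else "Aggregate(Expand)"
    grouping_id = "0" if lowered else "spark_grouping_id"
    return plan, grouping_id


def rows_over_empty_input(plan: str) -> int:
    # A global Aggregate yields one grand-total row; the Expand-based form yields none.
    return 1 if plan == "GlobalAggregate" else 0


if __name__ == "__main__":
    for conf in ({}, {FLAG: "false"}):
        plan, gid = analyze([()], conf)
        print(conf.get(FLAG, "<default>"), plan, gid, rows_over_empty_input(plan))
```

With the default conf the sketch reports a `GlobalAggregate` with `grouping_id()` as `0` and one row; with the flag set to `false` it reports the `Expand`-based plan, `spark_grouping_id`, and zero rows.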

How was this patch tested?

Tested via golden cases in `grouping_set.sql` (empty-input grand total, `grouping_id()` in SELECT/HAVING/ORDER BY, `grouping()` rejection, non-empty input), flag-off coverage in `grouping_set_grand_total_disabled.sql`, the regenerated `group-analytics` golden file, and `ResolveGroupingAnalyticsSuite`.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Opus 4.8

uros-db changed the title from [WIP][SQL] Return one row for GROUP BY GROUPING SETS (()) over empty input to [SPARK-57775][SQL] Return one row for GROUP BY GROUPING SETS (()) over empty input on Jun 30, 2026

Co-authored-by: Isaac
uros-db force-pushed the grouping-sets-grand-total branch from 15651e4 to 045f843 on June 30, 2026 08:50