Commit 91a1b04

timsaucer and claude committed
docs(aggregations): use .alias() on grouping(), drop obsolete workaround
apache/datafusion#21411 is resolved: `.alias()` now works directly on a `grouping()` expression. Removed the note describing the limitation and the with_column_renamed workaround in the rollup and grouping_sets examples, aliasing the grouping columns inline instead. Verified on the current branch: the aliased aggregates execute and produce the named columns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent ad835cc commit 91a1b04

1 file changed

Lines changed: 4 additions & 22 deletions

docs/source/user-guide/common-operations/aggregations.md

@@ -262,29 +262,15 @@ tell a grand-total `null` apart from a Pokemon that genuinely has no type? The
 {py:func}`~datafusion.functions.grouping` function returns `0` when the column is a grouping key
 for that row and `1` when it is aggregated across.
 
-:::{note}
-Due to an upstream DataFusion limitation
-([apache/datafusion#21411](https://github.com/apache/datafusion/issues/21411)),
-`.alias()` cannot be applied directly to a `grouping()` expression — it will raise an
-error at execution time. Instead, use
-{py:meth}`~datafusion.dataframe.DataFrame.with_column_renamed` on the result DataFrame to
-give the column a readable name. Once the upstream issue is resolved, you will be able to
-use `.alias()` directly and the workaround below will no longer be necessary.
-:::
-
-The raw column name generated by `grouping()` contains internal identifiers, so we use
-{py:meth}`~datafusion.dataframe.DataFrame.with_column_renamed` to clean it up:
+Apply `.alias()` to the `grouping()` expression to give the column a readable name:
 
 ```{code-cell} ipython3
 result = df.aggregate(
     [GroupingSet.rollup(col_type_1)],
     [f.count(col_speed).alias("Count"),
      f.avg(col_speed).alias("Avg Speed"),
-     f.grouping(col_type_1)]
+     f.grouping(col_type_1).alias("Is Total")]
 )
-for field in result.schema():
-    if field.name.startswith("grouping("):
-        result = result.with_column_renamed(field.name, "Is Total")
 result.sort(col_type_1.sort(ascending=True, nulls_first=True))
 ```
 
@@ -357,13 +343,9 @@ result = df.aggregate(
     [GroupingSet.grouping_sets([col_type_1], [col_type_2])],
     [f.count(col_speed).alias("Count"),
      f.avg(col_speed).alias("Avg Speed"),
-     f.grouping(col_type_1),
-     f.grouping(col_type_2)]
+     f.grouping(col_type_1).alias("grouping(Type 1)"),
+     f.grouping(col_type_2).alias("grouping(Type 2)")]
 )
-for field in result.schema():
-    if field.name.startswith("grouping("):
-        clean = field.name.split(".")[-1].rstrip(")")
-        result = result.with_column_renamed(field.name, f"grouping({clean})")
 result.sort(
     col_type_1.sort(ascending=True, nulls_first=True),
     col_type_2.sort(ascending=True, nulls_first=True)
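
For readers unfamiliar with `grouping()`, the semantics described in the unchanged context lines of this diff (`0` when the column is a grouping key for the row, `1` when it is aggregated across) can be sketched in plain Python. This is only an illustration of what the "Is Total" column means, not DataFusion's implementation or API: the helper `rollup_with_grouping` and the sample rows are hypothetical and do not come from the Pokemon dataset used in the docs.

```python
from collections import defaultdict

# Hypothetical (type, speed) rows, invented for illustration only.
rows = [("Fire", 100), ("Fire", 80), ("Water", 60)]


def rollup_with_grouping(rows):
    """Emulate a one-column rollup: one row per type plus a grand-total row.

    The "Is Total" field mimics the grouping() flag: 0 when "Type 1" is a
    grouping key for the row, 1 when it has been aggregated across.
    """
    by_type = defaultdict(list)
    for type_1, speed in rows:
        by_type[type_1].append(speed)

    result = []
    for type_1, speeds in sorted(by_type.items()):
        # "Type 1" is a grouping key here, so the flag is 0.
        result.append({
            "Type 1": type_1,
            "Count": len(speeds),
            "Avg Speed": sum(speeds) / len(speeds),
            "Is Total": 0,
        })

    all_speeds = [speed for _, speed in rows]
    # Grand-total row: "Type 1" is aggregated across, so it is None
    # and the flag is 1. Guard against an empty input.
    result.append({
        "Type 1": None,
        "Count": len(all_speeds),
        "Avg Speed": sum(all_speeds) / len(all_speeds) if all_speeds else None,
        "Is Total": 1,
    })
    return result


result = rollup_with_grouping(rows)
for row in result:
    print(row)
```

The flag is what lets a consumer tell the grand-total row (where "Type 1" is `None` because it was rolled up) apart from a row whose type is genuinely missing, which is why the docs change gives it a readable name.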
