Add missing aggregate functions #1471

Merged
timsaucer merged 14 commits into apache:main from
timsaucer:feat/expose-agg-fns
Apr 7, 2026

Conversation

@timsaucer
Member

Which issue does this PR close?

Closes #861
Closes #925
Closes #1454

Rationale for this change

These aggregate functions exist in upstream DataFusion but were not exposed through the Python API.

What changes are included in this PR?

Expose functions to Python API
Add unit tests

Are there any user-facing changes?

New addition only.

Contributor

Copilot AI left a comment

Pull request overview

This PR exposes several upstream DataFusion aggregate functions that were previously missing from the Python API, and adds unit tests to validate the new bindings.

Changes:

  • Expose grouping and percentile_cont from Rust bindings to the Python API.
  • Add Python-level wrapper for var_population as an alias of var_pop.
  • Add unit tests covering percentile_cont, grouping, and var_population.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| crates/core/src/functions.rs | Enables the grouping aggregate binding and adds a new percentile_cont pyfunction export. |
| python/datafusion/functions.py | Adds public Python wrappers/exports for grouping, percentile_cont, and var_population. |
| python/tests/test_functions.py | Adds unit tests for the newly exposed aggregate functions. |

timsaucer and others added 2 commits April 3, 2026 16:04
…ation

Expose upstream DataFusion aggregate functions that were not yet
available in the Python API. Closes apache#1454.

- grouping: returns grouping set membership indicator (rewritten by
  the ResolveGroupingFunction analyzer rule before physical planning)
- percentile_cont: computes exact percentile using continuous
  interpolation (unlike approx_percentile_cont which uses t-digest)
- var_population: alias for var_pop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
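
To make the distinction in the commit message above concrete, here is a minimal pure-Python sketch of what an exact, continuously interpolated percentile and a population variance compute. It assumes the standard SQL definition of `percentile_cont` (linear interpolation between the two nearest ordered values). The helper names `percentile_cont_sketch` and `var_pop_sketch` are hypothetical and are not the functions exposed by this PR, and the sketch does not reflect how DataFusion implements them internally.

```python
import math
import statistics


def percentile_cont_sketch(values, p):
    """Hypothetical helper: exact percentile with linear interpolation.

    Assumes the standard SQL PERCENTILE_CONT definition; not the
    DataFusion implementation.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        return None
    # Fractional position of the requested percentile in the sorted data.
    pos = p * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    # Interpolate between the two neighbouring values.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def var_pop_sketch(values):
    """Hypothetical helper: population variance (divides by n, not n - 1)."""
    return statistics.pvariance(values)


median = percentile_cont_sketch([1, 2, 3, 4], 0.5)
quartile = percentile_cont_sketch([10, 20, 30], 0.25)
variance = var_pop_sketch([1.0, 2.0, 3.0, 4.0])
```

Because the result is interpolated from the full sorted input, it is exact, whereas a t-digest based approximation trades accuracy for bounded memory.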
@timsaucer timsaucer force-pushed the feat/expose-agg-fns branch from 0c8dc78 to d16cff1 Compare April 3, 2026 20:22
@timsaucer timsaucer marked this pull request as ready for review April 5, 2026 12:38
Contributor

@ntjohnson1 ntjohnson1 left a comment

Grouping examples would make it a lot clearer. Without looking around or thinking about the description more deeply, the usage isn't clear to me.

timsaucer and others added 7 commits April 6, 2026 09:00
Add docstring example to grouping(), parametrize percentile_cont tests,
and add multi-column grouping test case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expose ROLLUP, CUBE, and GROUPING SETS via the DataFrame API by adding
static methods on GroupingSet that construct the corresponding Expr
variants. Update grouping() docstring and tests to use the new API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
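
As a rough illustration of the semantics behind the commit above, the following pure-Python sketch lists the grouping sets that SQL ROLLUP and CUBE expand to, and shows how a `grouping()` indicator relates to them. It assumes standard SQL semantics. The names `rollup_sets`, `cube_sets`, and `grouping_indicator` are hypothetical stand-ins, not the `GroupingSet` methods or the `grouping()` function added in this PR.

```python
from itertools import combinations


def rollup_sets(*cols):
    """Hypothetical helper: ROLLUP(a, b) expands to (a, b), (a), ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(*cols):
    """Hypothetical helper: CUBE(a, b) expands to every subset of the columns."""
    return [
        subset
        for size in range(len(cols), -1, -1)
        for subset in combinations(cols, size)
    ]


def grouping_indicator(col, grouping_set):
    """Hypothetical helper mirroring SQL GROUPING(col).

    Returns 0 when the column is part of the current grouping set and
    1 when it has been aggregated away (a subtotal or grand-total row).
    """
    return 0 if col in grouping_set else 1


rollup_result = rollup_sets("type", "generation")
cube_result = cube_sets("type", "generation")
# Indicator for "generation" on each ROLLUP level, most detailed first.
indicators = [grouping_indicator("generation", s) for s in rollup_result]
```

The indicator is what lets a query tell a subtotal row apart from a detail row whose grouping column happens to be NULL.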
…ctly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add user documentation for GroupingSet.rollup, .cube, and
.grouping_sets with Pokemon dataset examples. Document the upstream
alias limitation (apache/datafusion#21411) in both the grouping()
docstring and the aggregation user guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer requested review from Copilot and ntjohnson1 April 6, 2026 16:31
@timsaucer
Member Author

@ntjohnson1 I expanded the PR quite a bit to properly support grouping sets and I also updated the online documentation to explain it in a lot more detail.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


Contributor

@ntjohnson1 ntjohnson1 left a comment

The second pass helped a lot. Just a nit on some minor formatting on the examples.

timsaucer and others added 5 commits April 7, 2026 08:14
…ples

- Add quantile_cont as alias for percentile_cont (matches upstream)
- Replace pa.concat_arrays batch pattern with collect_column() in docstrings
- Add percentile_cont, quantile_cont, var_population to docs function list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GroupingSet.rollup(), .cube(), and .grouping_sets() now accept both
Expr objects and string column names, consistent with DataFrame.aggregate().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer merged commit 898d73d into apache:main Apr 7, 2026
21 checks passed
@timsaucer timsaucer deleted the feat/expose-agg-fns branch April 7, 2026 13:01

3 participants