Distinguish aggregate and row-level functions at the type level #1699

@dmpetrov

Description

Currently all built-in functions return the same Func class. This includes aggregates (func.count(), func.sum()), row-level transforms (func.path.file_stem(), func.string.length()), and window functions (func.row_number(), func.rank()).

This means type hints like group_by(**kwargs: Func) or mutate(**kwargs: Func) accept any function, even though:

  • Passing an aggregate to mutate without a window is invalid
  • Passing a row-level function to group_by as an aggregation is invalid
  • Window functions must be used with .over() and can't be used directly in group_by or mutate
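A toy model of the problem, for illustration only: the `Func` dataclass, its `is_aggregate` / `is_window` flags, and the `mutate` helper below are hypothetical stand-ins, not DataChain's actual implementation. The point is that the annotation `**kwargs: Func` gives a type checker nothing to reject, so the check has to live in the function body.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Func:
    """Hypothetical single class shared by every kind of function."""
    name: str
    is_aggregate: bool = False
    is_window: bool = False


def count() -> Func:
    # Aggregate, but statically indistinguishable from a row-level function.
    return Func("count", is_aggregate=True)


def file_stem() -> Func:
    # Row-level transform.
    return Func("file_stem")


def mutate(**kwargs: Func) -> dict[str, Func]:
    # The signature accepts any Func, so misuse is only detected when this runs.
    for key, fn in kwargs.items():
        if fn.is_aggregate or fn.is_window:
            raise TypeError(f"{key}: {fn.name}() is not a row-level function")
    return kwargs


mutate(stem=file_stem())   # valid
try:
    mutate(total=count())  # passes type checking, fails only at runtime
except TypeError as exc:
    print(exc)
```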

These misuses are only caught at runtime. Splitting Func into distinct types (e.g., WindowFunc, AggFunc, Func) would:

  • Make API signatures self-documenting: users see which kind of function is expected
  • Enable type checkers to catch misuse before runtime
  • Improve IDE autocomplete by narrowing suggestions to valid functions
  • Help AI coding assistants generate correct code, since the expected function kind is visible in the signature
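One possible shape for the split, as a sketch under stated assumptions: the private `_BaseFunc`, the toy `Window` class, and the choice to make the three types siblings rather than subclasses of `Func` are all illustrative, not a decided design. Making them siblings matters here: if `AggFunc` inherited from `Func`, then `mutate(**kwargs: Func)` would still accept aggregates.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class _BaseFunc:
    """Hypothetical shared base; not exposed in public signatures."""
    name: str


@dataclass(frozen=True)
class Window:
    """Toy window spec (assumption: only a partition column)."""
    partition_by: str


class Func(_BaseFunc):
    """Row-level function, valid in mutate()."""


class AggFunc(_BaseFunc):
    """Aggregate, valid in group_by(); becomes row-level once windowed."""

    def over(self, window: Window) -> Func:
        return Func(f"{self.name} over {window.partition_by}")


class WindowFunc(_BaseFunc):
    """Window function, usable only after .over()."""

    def over(self, window: Window) -> Func:
        return Func(f"{self.name} over {window.partition_by}")


def count() -> AggFunc:
    return AggFunc("count")


def row_number() -> WindowFunc:
    return WindowFunc("row_number")


def file_stem() -> Func:
    return Func("file_stem")


def mutate(**kwargs: Func) -> dict[str, Func]:
    return kwargs


def group_by(**kwargs: AggFunc) -> dict[str, AggFunc]:
    return kwargs


mutate(stem=file_stem())
mutate(rn=row_number().over(Window("dir")))
group_by(total=count())
# mutate(total=count())       # a type checker would flag this: AggFunc is not Func
# group_by(stem=file_stem())  # likewise flagged: Func is not AggFunc
```

With signatures like these, the two commented-out calls would be reported by a static checker such as mypy or pyright instead of failing at runtime.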

One additional step: AggFunc and WindowFunc need to be separated from the dc.func namespace. At the same time, backward compatibility needs to be maintained for some time.
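One way the transition period could work is a module-level `__getattr__` (PEP 562) that forwards moved names and emits a `DeprecationWarning`. The sketch below simulates two modules with `types.ModuleType` so it runs standalone; the `agg` namespace name and the `_MOVED` table are assumptions, since the issue does not say where the functions would move.

```python
import types
import warnings

# Hypothetical new namespace for aggregates (the real name is undecided).
agg = types.ModuleType("agg")


def count():
    return "count()"


agg.count = count

# Stand-in for the existing func namespace, which no longer defines count.
func = types.ModuleType("func")

# Names that moved, mapped to their new home.
_MOVED = {"count": agg}


def _func_getattr(name):
    # Called only when normal attribute lookup on the module fails (PEP 562).
    if name in _MOVED:
        target = _MOVED[name]
        warnings.warn(
            f"func.{name} is deprecated; use {target.__name__}.{name} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(target, name)
    raise AttributeError(f"module 'func' has no attribute {name!r}")


func.__getattr__ = _func_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = func.count  # the old access path still resolves

print(old is agg.count, caught[0].category.__name__)
```

In a real package the same function would simply be named `__getattr__` at the top level of the old module, and the shim could be deleted once the deprecation window closes.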

Metadata

Labels: enhancement (New feature or request)