Description
Currently, all built-in functions return the same Func class. This covers aggregates (func.count(), func.sum()), row-level transforms (func.path.file_stem(), func.string.length()), and window functions (func.row_number(), func.rank()).
This means type hints like group_by(**kwargs: Func) or mutate(**kwargs: Func) accept any function, even though:
- Passing an aggregate to mutate without a window is invalid
- Passing a row-level function to group_by as an aggregation is invalid
- Window functions must be used with .over() and can't be used directly in group_by or mutate
These misuses are only caught at runtime. Splitting Func into distinct types (e.g., WindowFunc, AggFunc, Func) would:
- Make API signatures self-documenting: users see which kind of function is expected
- Enable type checkers to catch misuse before runtime
- Improve IDE autocomplete by narrowing suggestions to valid functions
- Help AI coding assistants generate valid code, since the expected function kind is visible in each signature
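A minimal sketch of what the split could look like. Everything here is hypothetical and only illustrates the idea: the `_BaseFunc` helper, the string-based `window` argument, the simplified `mutate` and `group_by` stand-ins, and the assumption that `.over()` returns a row-level `Func` are not taken from the existing code base.

```python
from __future__ import annotations


class _BaseFunc:
    """Shared plumbing (hypothetical helper, not an existing class)."""

    def __init__(self, name: str) -> None:
        self.name = name


class Func(_BaseFunc):
    """Row-level expression, e.g. string.length(). Accepted by mutate()."""


class AggFunc(_BaseFunc):
    """Aggregate, e.g. count() or sum(). Accepted by group_by()."""

    def over(self, window: str) -> Func:
        # Assumption: a windowed aggregate yields one value per row,
        # so the result is a row-level Func.
        return Func(f"{self.name} over {window}")


class WindowFunc(_BaseFunc):
    """Window-only function, e.g. row_number(). Must go through .over()."""

    def over(self, window: str) -> Func:
        return Func(f"{self.name} over {window}")


# Simplified stand-ins for the built-in functions named above.
def count() -> AggFunc:
    return AggFunc("count")


def length() -> Func:
    return Func("length")


def row_number() -> WindowFunc:
    return WindowFunc("row_number")


# Simplified stand-ins for the chain methods; they only echo their input.
def mutate(**kwargs: Func) -> dict[str, str]:
    return {key: value.name for key, value in kwargs.items()}


def group_by(**kwargs: AggFunc) -> dict[str, str]:
    return {key: value.name for key, value in kwargs.items()}


mutate(size=length())               # OK: row-level function
mutate(idx=row_number().over("w"))  # OK: .over() returns Func
group_by(total=count())             # OK: aggregate

# A type checker would be expected to reject each of these:
# mutate(total=count())     # AggFunc is not a Func
# group_by(size=length())   # Func is not an AggFunc
# mutate(idx=row_number())  # WindowFunc is not a Func
```

Because the three classes are siblings rather than subclasses of one another, the misuses listed above become static type errors instead of runtime failures.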
An extra action: AggFunc and WindowFunc need to be separated from the dc.func namespace. At the same time, backward compatibility needs to be kept for some time.
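One possible way to keep the old entry points working during the transition is a deprecation shim. The sketch below is an assumption, not a decided design: the `agg` namespace name, the `deprecated_alias` helper, and the warning text are all invented for illustration.

```python
import functools
import warnings
from types import SimpleNamespace


class AggFunc:
    """Stand-in for the proposed aggregate type."""

    def __init__(self, name: str) -> None:
        self.name = name


def count() -> AggFunc:
    return AggFunc("count")


def deprecated_alias(new_path: str, target):
    """Wrap target so the legacy call path still works but warns."""

    @functools.wraps(target)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"func.{target.__name__}() is deprecated; use {new_path}() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return target(*args, **kwargs)

    return wrapper


# Hypothetical new home for aggregates.
agg = SimpleNamespace(count=count)

# The legacy namespace keeps the old name and forwards to the new function.
func = SimpleNamespace(count=deprecated_alias("agg.count", count))
```

Calling `func.count()` still returns an `AggFunc` but emits a `DeprecationWarning`, while `agg.count()` is silent. This lets existing code keep running until the aliases are removed.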