Commit 5293212
Add missing DataFrame functions (#1472)
* Add missing DataFrame methods for set operations and query
Expose upstream DataFusion DataFrame methods that were not yet
available in the Python API. Closes #1455.
Set operations:
- except_distinct: set difference with deduplication
- intersect_distinct: set intersection with deduplication
- union_by_name: union matching columns by name instead of position
- union_by_name_distinct: union by name with deduplication
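The difference between the deduplicating set operations and their duplicate-preserving counterparts is easiest to see on small multisets. Below is a pure-Python sketch of the semantics. It assumes standard SQL `EXCEPT ALL` / `INTERSECT ALL` multiset rules for the non-distinct forms. It does not use the datafusion package, and all function names are hypothetical. The by-name union additionally assumes both inputs have the same column names, possibly in a different order.

```python
from collections import Counter

# Rows are tuples; a table is a list of rows (a multiset).
# Hypothetical free functions illustrating semantics, not the datafusion API.

def multiset_except(left, right):
    """EXCEPT ALL: each right row cancels one matching left row."""
    remaining = Counter(right)
    out = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1
        else:
            out.append(row)
    return out

def set_except(left, right):
    """EXCEPT DISTINCT: left rows absent from right, duplicates removed."""
    right_set = set(right)
    return list(dict.fromkeys(r for r in left if r not in right_set))

def multiset_intersect(left, right):
    """INTERSECT ALL: keep min(count_left, count_right) copies of each row."""
    available = Counter(right)
    out = []
    for row in left:
        if available[row] > 0:
            available[row] -= 1
            out.append(row)
    return out

def set_intersect(left, right):
    """INTERSECT DISTINCT: rows present on both sides, duplicates removed."""
    right_set = set(right)
    return list(dict.fromkeys(r for r in left if r in right_set))

def union_rows_by_name(left_cols, left_rows, right_cols, right_rows, distinct=False):
    """Union that aligns right columns to left columns by name, not position."""
    # Position of each left column inside the right input (ValueError if missing).
    positions = [right_cols.index(c) for c in left_cols]
    rows = list(left_rows) + [tuple(r[i] for i in positions) for r in right_rows]
    if distinct:
        rows = list(dict.fromkeys(rows))  # order-preserving deduplication
    return rows
```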
Query:
- distinct_on: deduplicate rows based on specific columns
- sort_by: sort by expressions with ascending order and nulls last
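A pure-Python sketch of the two query operations, assuming Postgres-style `DISTINCT ON` semantics (sort first, then keep the first row per key) and ascending order with nulls last for `sort_by`. Rows are plain dicts, `None` stands in for SQL NULL, and the helper names are hypothetical rather than datafusion calls.

```python
def sort_nulls_last(rows, columns):
    """Ascending sort on `columns`, with None values placed after all others."""
    # False sorts before True, so (True, None) pushes null rows to the end.
    # Two (True, None) pairs compare equal, so None < None is never evaluated.
    return sorted(rows, key=lambda r: tuple((r[c] is None, r[c]) for c in columns))

def distinct_on_sketch(rows, on, order_by):
    """Keep only the first row for each `on` key after sorting by `order_by`."""
    seen = set()
    kept = []
    for row in sort_nulls_last(rows, order_by):
        key = tuple(row[c] for c in on)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```

Putting the `on` columns first in `order_by` makes the choice of "first row" per key deterministic.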
Note: show_limit is already covered by the existing show(num) method.
explain_with_options and with_param_values are deferred as they require
exposing additional types (ExplainOption, ParamValues).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add ExplainFormat enum and format option to DataFrame.explain()
Extend the existing explain() method with an optional format parameter
instead of adding a separate explain_with_options() method. This keeps
the API simple while exposing all upstream ExplainOption functionality.
Available formats: indent (default), tree, pgjson, graphviz.
The ExplainFormat enum is exported from the top-level datafusion module.
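To illustrate how a `format` option can accept either an enum member or a string, here is a minimal stand-in enum. The class and the `resolve_format` helper are hypothetical and only mirror the four format names listed above. Per a later bullet in this message, the real implementation reports invalid strings through upstream DataFusion's parse error instead of a hardcoded list like the one this sketch builds.

```python
from enum import Enum

class ExplainFormat(Enum):
    """Hypothetical stand-in mirroring the four documented explain formats."""
    INDENT = "indent"
    TREE = "tree"
    PGJSON = "pgjson"
    GRAPHVIZ = "graphviz"

def resolve_format(fmt=None):
    """Accept None (default INDENT), an enum member, or a case-insensitive string."""
    if fmt is None:
        return ExplainFormat.INDENT
    if isinstance(fmt, ExplainFormat):
        return fmt
    try:
        return ExplainFormat(str(fmt).lower())
    except ValueError:
        valid = ", ".join(f.value for f in ExplainFormat)
        raise ValueError(f"Unknown explain format {fmt!r}; expected one of: {valid}") from None
```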
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add DataFrame.window() and unnest recursion options
Expose remaining DataFrame methods from upstream DataFusion.
Closes #1456.
- window(*exprs): apply window function expressions and append results
as new columns
- unnest_column/unnest_columns: add optional recursions parameter for
controlling unnest depth via (input_column, output_column, depth)
tuples
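The two additions above can be sketched in plain Python. The window sketch uses a running sum as the window function and shows the key property: every input row is kept and gains one new column, unlike an aggregate. The unnest sketch models a column as a list of cells and a recursion as the `(input_column, output_column, depth)` tuple described above. Both the dict-of-lists table and all function names are hypothetical simplifications, not datafusion's implementation.

```python
def window_running_sum(rows, partition_by, order_by, value, out_name):
    """Append a running sum of `value` per partition, ordered by `order_by`."""
    # Visit row indices in (partition, order) order; output keeps input order.
    order = sorted(range(len(rows)), key=lambda i: (rows[i][partition_by], rows[i][order_by]))
    totals = {}
    result = [dict(r) for r in rows]  # copy so the input rows are untouched
    for i in order:
        part = rows[i][partition_by]
        totals[part] = totals.get(part, 0) + rows[i][value]
        result[i][out_name] = totals[part]
    return result

def unnest(cells, depth=1):
    """Flatten a column of nested lists by `depth` levels."""
    out = list(cells)
    for _ in range(depth):
        flat = []
        for v in out:
            if isinstance(v, list):
                flat.extend(v)  # each element becomes its own row
            else:
                flat.append(v)  # already a scalar: nothing left to expand
        out = flat
    return out

def apply_recursion(table, recursion):
    """Unnest one column as described by an (input, output, depth) tuple."""
    input_column, output_column, depth = recursion
    return {output_column: unnest(table[input_column], depth)}
```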
Note: drop_columns is already exposed as the existing drop() method.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update docstring
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Improve docstrings and test robustness for new DataFrame methods
Clarify except_distinct/intersect_distinct docstrings, add deterministic
sort to test_window, add sort_by ascending verification test, and add
smoke tests for PGJSON and GRAPHVIZ explain formats.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Consolidate new DataFrame tests into parametrized tests
Combine set operation tests (except_distinct, intersect_distinct,
union_by_name, union_by_name_distinct) into a single parametrized
test_set_operations_distinct. Merge sort_by tests and convert
explain format tests to parametrized form.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add doctest examples to new DataFrame method docstrings
Add >>> style usage examples for window, explain, except_distinct,
intersect_distinct, union_by_name, union_by_name_distinct, distinct_on,
sort_by, and unnest_columns to match existing docstring conventions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Improve error messages, tests, and API hygiene from PR review
- Provide actionable error message for invalid explain format strings
- Remove recursions param from deprecated unnest_column (use unnest_columns)
- Add null-handling test case for sort_by to verify nulls-last behavior
- Add format-specific assertions to explain tests (TREE, PGJSON, GRAPHVIZ)
- Add deep recursion test for unnest_columns with depth > 1
- Add multi-expression window test to verify variadic *exprs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Consolidate window and unnest tests into parametrized tests
Combine test_window and test_window_multiple_expressions into a single
parametrized test. Merge unnest recursion tests into one parametrized
test covering basic, explicit depth 1, and deep recursion cases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Address PR review feedback for DataFrame operations
- Use upstream parse error for explain format instead of hardcoded options
- Fix sort_by to use column name resolution consistent with sort()
- Use ExplainFormat enum members directly in tests instead of string lookup
- Merge union_by_name_distinct into union_by_name with a distinct flag
  (default False) for a more Pythonic API
- Update check-upstream skill to note union_by_name_distinct coverage
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add DataFrame.column(), col(), and find_qualified_columns() methods
Expose upstream find_qualified_columns to resolve unqualified column
names into fully qualified column expressions. This is especially
useful for disambiguating columns after joins.
- find_qualified_columns(*names) on the Rust side calls upstream directly
- DataFrame.column(name) on the Python side, with col(name) as an alias
- Update join and join_on docstrings to reference DataFrame.col()
- Add "Disambiguating Columns with DataFrame.col()" section to joins docs
- Add tests for qualified column resolution, ambiguity, and join usage
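A rough sketch of why resolving names against each input's schema helps after a join. A schema is modeled as a list of `(relation, column)` pairs, which is a hypothetical simplification. `resolve_columns` is an illustrative helper, not the real `find_qualified_columns`, and the real error types and messages will differ. Looking up `id` on the joined schema is ambiguous, while looking it up on one input yields a single qualified reference.

```python
def resolve_columns(schema, *names):
    """Map unqualified names to (relation, column) pairs; reject ambiguity."""
    resolved = []
    for name in names:
        matches = [(rel, col) for rel, col in schema if col == name]
        if not matches:
            raise KeyError(f"No field named {name!r}")
        if len(matches) > 1:
            candidates = ", ".join(f"{r}.{c}" for r, c in matches)
            raise ValueError(f"Ambiguous reference {name!r}: {candidates}")
        resolved.append(matches[0])
    return resolved

orders = [("orders", "id"), ("orders", "amount")]
customers = [("customers", "id"), ("customers", "name")]
joined = orders + customers  # both sides contribute an "id" column
```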
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Merge union_by_name and union_by_name_distinct into a single method with distinct flag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Converting to a Python dict loses a column when column names are identical
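The bullet above reflects a general Python constraint: a dict cannot hold two entries with the same key, so building one from column names silently keeps only the last value for a repeated name. The commit does not say which conversion was affected; this sketch uses a plain `dict(zip(...))` purely as an illustration.

```python
names = ["id", "id", "value"]  # e.g. both join inputs contributed "id"
values = [1, 2, 3]

as_dict = dict(zip(names, values))   # the second "id" overwrites the first
as_pairs = list(zip(names, values))  # a list of pairs keeps all three columns
```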
* Consolidate except_all/except_distinct and intersect/intersect_distinct into single methods with distinct flag
Follows the same pattern as union(distinct=) and union_by_name(distinct=).
Also deprecates union_distinct() in favor of union(distinct=True).
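A minimal sketch of the consolidation pattern: one method with a `distinct` flag, plus a deprecated alias that warns and forwards. `Frame` is a hypothetical toy holding row tuples, not datafusion's `DataFrame`, and the warning text is illustrative.

```python
import warnings

class Frame:
    """Toy stand-in for a DataFrame holding a list of row tuples."""

    def __init__(self, rows):
        self.rows = list(rows)

    def union(self, other, distinct=False):
        rows = self.rows + other.rows
        if distinct:
            rows = list(dict.fromkeys(rows))  # order-preserving deduplication
        return Frame(rows)

    def union_distinct(self, other):
        # Deprecated alias: warn at the caller's line, then forward to union().
        warnings.warn(
            "union_distinct() is deprecated; use union(other, distinct=True)",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.union(other, distinct=True)
```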
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

1 parent: 898d73d
6 files changed (+767, -52 lines) under:
- .ai/skills/check-upstream
- crates/core/src
- docs/source/user-guide/common-operations
- python/datafusion
- python/tests