Commit 99bc960

timsaucer and claude authored
Add missing array functions (apache#1468)
* Add missing array/list functions and aliases (apache#1452)

  Add new array functions from upstream DataFusion v53: array_any_value, array_distance, array_max, array_min, array_reverse, arrays_zip, string_to_array, and gen_series. Add corresponding list_* aliases and missing list_* aliases for existing functions (list_empty, list_pop_back, list_pop_front, list_has, list_has_all, list_has_any). Also add array_contains/list_contains as aliases for array_has, generate_series as alias for gen_series, and string_to_list as alias for string_to_array.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add unit tests for new array/list functions and aliases

  Tests cover all functions and aliases added in the previous commit: array_any_value, array_distance, array_max, array_min, array_reverse, arrays_zip, string_to_array, gen_series, generate_series, array_contains, list_contains, list_empty, list_pop_back, list_pop_front, list_has, list_has_all, list_has_any, and list_* aliases for the new functions.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Improve array function APIs: optional params, better naming, restore comment

  - Make null_string optional in string_to_array/string_to_list
  - Make step optional in gen_series/generate_series
  - Rename second_array to element in array_contains/list_has/list_contains
  - Restore # Window Functions section comment in __all__
  - Add tests for optional parameter variants

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Consolidate array/list function tests using pytest parametrize

  Reduce 26 individual tests to 14 test functions with parametrized cases, eliminating boilerplate while maintaining full coverage.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move list alias tests into existing test_array_functions parametrize block

  Merge standalone tests for list_empty, list_pop_back, list_pop_front, list_has, array_contains, list_contains, list_has_all, and list_has_any into the existing parametrized test_array_functions block alongside their array_* counterparts.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Merge test_array_any_value into parametrized test_any_value_aliases

  Use the richer multi-row dataset (including all-nulls case) for both array_any_value and list_any_value via the parametrized test.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add arrays_overlap and list_overlap as aliases for array_has_any

  These aliases match the upstream DataFusion SQL-level aliases, completing the set of missing array functions from issue apache#1452.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add docstring examples for optional params in string_to_array and gen_series

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update AGENTS file to demonstrate preferred method of documenting python functions

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent ff15648 commit 99bc960

4 files changed: +547 -0 lines changed


AGENTS.md

Lines changed: 17 additions & 0 deletions
@@ -25,3 +25,20 @@ Skills follow the [Agent Skills](https://agentskills.io) open standard. Each ski

 - `SKILL.md` — The skill definition with YAML frontmatter (name, description, argument-hint) and detailed instructions.
 - Additional supporting files as needed.
+
+## Python Function Docstrings
+
+Every Python function must include a docstring with usage examples.
+
+- **Examples are required**: Each function needs at least one doctest-style example
+  demonstrating basic usage.
+- **Optional parameters**: If a function has optional parameters, include separate
+  examples that show usage both without and with the optional arguments. Pass
+  optional arguments using their keyword name (e.g., `step=dfn.lit(3)`) so readers
+  can immediately see which parameter is being demonstrated.
+- **Reuse input data**: Use the same input data across examples wherever possible.
+  The examples should demonstrate how different optional arguments change the output
+  for the same input, making the effect of each option easy to understand.
+- **Alias functions**: Functions that are simple aliases (e.g., `list_sort` aliasing
+  `array_sort`) only need a one-line description and a `See Also` reference to the
+  primary function. They do not need their own examples.
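
The docstring rules in this hunk can be illustrated with a small, self-contained sketch. `int_series` and `list_int_series` are hypothetical plain-Python stand-ins invented for this example (the real wrappers take DataFusion expressions such as `dfn.lit(3)`), and the docstring layout is one plausible reading of the guidelines, not a template taken from the repository.

```python
def int_series(start, stop, step=None):
    """Return the integers from ``start`` to ``stop``, inclusive.

    Args:
        start: First value of the series.
        stop: Last value that may appear in the series.
        step: Optional increment between values. Defaults to 1.

    Examples:
        >>> int_series(1, 7)
        [1, 2, 3, 4, 5, 6, 7]

        The same input with the optional ``step`` passed by keyword:

        >>> int_series(1, 7, step=3)
        [1, 4, 7]
    """
    if step is None:
        step = 1
    # Move the end one unit past `stop` in the direction of travel so that
    # `stop` itself can be included (hypothetical stand-in behavior).
    end = stop + (1 if step > 0 else -1)
    return list(range(start, end, step))


def list_int_series(start, stop, step=None):
    """Alias for :func:`int_series`.

    See Also:
        int_series
    """
    return int_series(start, stop, step)
```

Both examples in the primary docstring reuse the input `(1, 7)`, so the only visible difference is the effect of `step`. The alias carries a one-line description and a `See Also` entry, with no examples of its own.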

crates/core/src/functions.rs

Lines changed: 56 additions & 0 deletions
@@ -93,6 +93,50 @@ fn array_cat(exprs: Vec<PyExpr>) -> PyExpr {
     array_concat(exprs)
 }

+#[pyfunction]
+fn array_distance(array1: PyExpr, array2: PyExpr) -> PyExpr {
+    let args = vec![array1.into(), array2.into()];
+    Expr::ScalarFunction(datafusion::logical_expr::expr::ScalarFunction::new_udf(
+        datafusion::functions_nested::distance::array_distance_udf(),
+        args,
+    ))
+    .into()
+}
+
+#[pyfunction]
+fn arrays_zip(exprs: Vec<PyExpr>) -> PyExpr {
+    let exprs = exprs.into_iter().map(|x| x.into()).collect();
+    datafusion::functions_nested::expr_fn::arrays_zip(exprs).into()
+}
+
+#[pyfunction]
+#[pyo3(signature = (string, delimiter, null_string=None))]
+fn string_to_array(string: PyExpr, delimiter: PyExpr, null_string: Option<PyExpr>) -> PyExpr {
+    let mut args = vec![string.into(), delimiter.into()];
+    if let Some(null_string) = null_string {
+        args.push(null_string.into());
+    }
+    Expr::ScalarFunction(datafusion::logical_expr::expr::ScalarFunction::new_udf(
+        datafusion::functions_nested::string::string_to_array_udf(),
+        args,
+    ))
+    .into()
+}
+
+#[pyfunction]
+#[pyo3(signature = (start, stop, step=None))]
+fn gen_series(start: PyExpr, stop: PyExpr, step: Option<PyExpr>) -> PyExpr {
+    let mut args = vec![start.into(), stop.into()];
+    if let Some(step) = step {
+        args.push(step.into());
+    }
+    Expr::ScalarFunction(datafusion::logical_expr::expr::ScalarFunction::new_udf(
+        datafusion::functions_nested::range::gen_series_udf(),
+        args,
+    ))
+    .into()
+}
+
 #[pyfunction]
 fn make_map(keys: Vec<PyExpr>, values: Vec<PyExpr>) -> PyExpr {
     let keys = keys.into_iter().map(|x| x.into()).collect();
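
The wrappers in this hunk only build expressions; evaluation happens inside DataFusion. For intuition, here is a plain-Python sketch of what three of them are expected to compute on ordinary lists and strings. The helper names (`distance`, `zip_arrays`, `split_to_list`) are hypothetical and are not part of the datafusion API. The behavior shown (Euclidean distance, padding shorter inputs with nulls, replacing parts equal to `null_string` with `None`) is assumed from the upstream function documentation, since the commit itself does not state it.

```python
import math
from itertools import zip_longest


def distance(array1, array2):
    """Euclidean distance between two equal-length numeric lists (assumed semantics)."""
    if len(array1) != len(array2):
        raise ValueError("both arrays must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(array1, array2)))


def zip_arrays(*arrays):
    """Combine lists element-wise; shorter lists are padded with None (assumption)."""
    return list(zip_longest(*arrays, fillvalue=None))


def split_to_list(string, delimiter, null_string=None):
    """Split on `delimiter`; parts equal to `null_string` become None.

    `null_string` is optional, mirroring the `null_string=None` signature
    in the Rust wrapper: when it is omitted, no replacement happens.
    """
    parts = string.split(delimiter)
    if null_string is None:
        return parts
    return [None if part == null_string else part for part in parts]
```

With the same input string, `split_to_list("a,b,c", ",")` yields `['a', 'b', 'c']`, while `split_to_list("a,b,c", ",", null_string="b")` yields `['a', None, 'c']`. The optional argument is handled the same way as in the Rust code: it only affects the result when the caller supplies it.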
@@ -681,6 +725,10 @@ array_fn!(array_intersect, first_array second_array);
 array_fn!(array_union, array1 array2);
 array_fn!(array_except, first_array second_array);
 array_fn!(array_resize, array size value);
+array_fn!(array_any_value, array);
+array_fn!(array_max, array);
+array_fn!(array_min, array);
+array_fn!(array_reverse, array);
 array_fn!(cardinality, array);
 flatten and range context lines follow:
 array_fn!(flatten, array);
 array_fn!(range, start stop step);
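
The four `array_fn!` lines added here each take a single array argument. As a rough guide to what they are expected to return, the following plain-Python sketch models them on ordinary lists, with `None` standing in for SQL NULL. The helper names are hypothetical, and the null handling (skipping nulls in max/min, returning `None` when nothing remains) is an assumption based on the upstream function descriptions rather than something this diff confirms.

```python
def first_non_null(array):
    """Model of array_any_value: first non-null element, or None if all are null."""
    return next((item for item in array if item is not None), None)


def max_ignoring_nulls(array):
    """Model of array_max: largest non-null element (null handling assumed)."""
    values = [item for item in array if item is not None]
    return max(values) if values else None


def min_ignoring_nulls(array):
    """Model of array_min: smallest non-null element (null handling assumed)."""
    values = [item for item in array if item is not None]
    return min(values) if values else None


def reversed_copy(array):
    """Model of array_reverse: the same elements in reverse order."""
    return array[::-1]
```

The all-nulls case mentioned in the commit message corresponds to `first_non_null([None, None])`, which returns `None` in this model.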
@@ -1152,6 +1200,14 @@ pub(crate) fn init_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
     m.add_wrapped(wrap_pyfunction!(array_replace_all))?;
     m.add_wrapped(wrap_pyfunction!(array_sort))?;
     m.add_wrapped(wrap_pyfunction!(array_slice))?;
+    m.add_wrapped(wrap_pyfunction!(array_any_value))?;
+    m.add_wrapped(wrap_pyfunction!(array_distance))?;
+    m.add_wrapped(wrap_pyfunction!(array_max))?;
+    m.add_wrapped(wrap_pyfunction!(array_min))?;
+    m.add_wrapped(wrap_pyfunction!(array_reverse))?;
+    m.add_wrapped(wrap_pyfunction!(arrays_zip))?;
+    m.add_wrapped(wrap_pyfunction!(string_to_array))?;
+    m.add_wrapped(wrap_pyfunction!(gen_series))?;
     m.add_wrapped(wrap_pyfunction!(flatten))?;
     m.add_wrapped(wrap_pyfunction!(cardinality))?;
