Skip to content

feat: adds array_add function#22459

Merged
Dandandan merged 5 commits into
apache:mainfrom
SubhamSinghal:array-add-function
May 26, 2026
Merged

feat: adds array_add function#22459
Dandandan merged 5 commits into
apache:mainfrom
SubhamSinghal:array-add-function

Conversation

@SubhamSinghal
Copy link
Copy Markdown
Contributor

@SubhamSinghal SubhamSinghal commented May 22, 2026

Which issue does this PR close?

  • Part of #21536 (array_add — first PR in the vector math series).

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Yes, via SLT only (array_add.slt). Coverage:

  • Happy paths: basic, negative components, single-element, empty, multi-row.
  • NULL propagation: whole-row NULL on each side / both sides; element-level NULL on each side / both sides at same
    and different positions.
  • Type / variant: integer literals, mixed int+float, LargeList×LargeList, mixed List+LargeList,
    FixedSizeListList coercion, Float32 leaf, Int64 leaf.
  • Decimal handling: Decimal128 / Decimal256 rejected at planning; explicit cast to DOUBLE opt-in works.
  • Error paths: per-row length mismatch (exec), unsupported non-list input (plan), non-numeric leaf (plan), boolean
    leaf (plan), nested list (plan), wrong arg count.
  • Aliases: list_add single-row + multi-row.
  • Composition: array_add(array_add(...), ...) chained — single-row, with element NULLs propagating across both
    layers, and multi-row with row-level NULL.

Are there any user-facing changes?

Yes — two new functions:

  • array_add(array1, array2) → List<Float64> / LargeList<Float64>
  • list_add(...) alias

Both exposed via expr_fn and registered in all_default_nested_functions(). Documented inline via #[user_doc]
(description, syntax, SQL example, argument descriptions).

No breaking API changes.

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 22, 2026
@crm26
Copy link
Copy Markdown
Contributor

crm26 commented May 22, 2026

Heads up — the check configs.md and ***_functions.md is up-to-date job is red because docs/source/user-guide/sql/scalar_functions.md needs to be regenerated for the new array_add entry from your #[user_doc] macro.

One-command fix:

./dev/update_function_docs.sh
git add docs/source/user-guide/sql/scalar_functions.md

Then amend or new-commit + push.

Ok(arg_types[0].clone())
}

fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in here seems to differ from what we usually have, e.g.

fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> {
let [_, _] = take_function_args(self.name(), arg_types)?;
let coercion = Some(&ListCoercion::FixedSizedListToList);
for arg_type in arg_types {
if !matches!(arg_type, Null | List(_) | LargeList(_) | FixedSizeList(..)) {
return plan_err!("{} does not support type {arg_type}", self.name());
}
}
// If any input is `LargeList`, both sides must be widened to `LargeList`
// so the runtime dispatch in `cosine_distance_inner` sees a homogeneous
// pair. Follows the pattern in `ArrayConcat::coerce_types`.
let any_large_list = arg_types.iter().any(|t| matches!(t, LargeList(_)));
let coerced = arg_types
.iter()
.map(|arg_type| {
if matches!(arg_type, Null) {
let field = Arc::new(Field::new_list_field(DataType::Float64, true));
return if any_large_list {
LargeList(field)
} else {
List(field)
};
}
let coerced = coerced_type_with_base_type_only(
arg_type,
&DataType::Float64,
coercion,
);
match coerced {
List(field) if any_large_list => LargeList(field),
other => other,
}
})
.collect();
Ok(coerced)
}

Is there a reason for this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jefffrey I have added more checks like

  1. List of List is handled upfront. List<List> throws error as non numeric
  2. Decimal128 and Decimal256 types throws error as downcasting it would loose precision.
  3. throws error if element type is non-numeric to avoid automatic typecasting of boolean or string type to numeric.

Let me know if I should get rid of these checks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @crm26

We should try to ensure these array math functions are consistent with each other on the signature they accept

Copy link
Copy Markdown
Contributor Author

@SubhamSinghal SubhamSinghal May 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jefffrey can we make this a util function?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be useful to keep them aligned, until we have better support in the signature API for these types of behaviour

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jefffrey Create utility function coerce_array_math_arg_types with same implementation as other existing functions. We can add more checks to common function if needed.

Comment thread datafusion/functions-nested/src/array_add.rs Outdated
Comment thread datafusion/functions-nested/src/array_add.rs Outdated
Comment thread datafusion/functions-nested/src/array_add.rs Outdated
Comment thread datafusion/functions-nested/src/array_add.rs Outdated
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 23, 2026
@Dandandan Dandandan added this pull request to the merge queue May 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch May 26, 2026
zhuqi-lucas pushed a commit to zhuqi-lucas/arrow-datafusion that referenced this pull request May 26, 2026
## Which issue does this PR close?

Partial of apache#21536 — `array_scale` (the list+scalar arithmetic function
in the vector math series).

## Rationale for this change

Continues the per-function split requested by @alamb on apache#21536. Three
sibling PRs already merged: `cosine_distance` (apache#21542), `inner_product`
(apache#21861), `array_normalize` (apache#22013). `array_add` is in flight as apache#22459
by @SubhamSinghal.

Adds element-wise scalar multiplication for numeric arrays, returning a
list of the same shape. Aliased as `list_scale` to match the `array_X` /
`list_X` precedent in this crate.

## What changes are included in this PR?

- New scalar UDF `array_scale(array, scalar)` in
`datafusion/functions-nested/src/array_scale.rs`
- Module wire-up + registration in
`datafusion/functions-nested/src/lib.rs`
- SLT tests at `datafusion/sqllogictest/test_files/array_scale.slt`
- Auto-generated function docs entry in
`docs/source/user-guide/sql/scalar_functions.md`

**Signature:** first arg `List/LargeList/FixedSizeList<numeric>`, second
arg numeric scalar. Both coerce to `Float64`. Same list-widening rules
as the binary-op siblings.

**NULL semantics:**
- NULL row in array → NULL row out
- NULL scalar → NULL row out (whole-row, because the scalar applies
uniformly)
- NULL element at position \`i\` → NULL element at \`i\` out
(per-element propagation)
- Empty array → empty array

**Builders:** uses \`OffsetBufferBuilder\` + \`NullBufferBuilder\` per
the pattern adopted in the round-1 review of apache#22013.

## Are these changes tested?

Yes. \`array_scale.slt\` covers:
- Happy paths (positive, negative, zero, fractional, single-element)
- NULL propagation at all three levels (NULL row, NULL scalar, NULL
element)
- All list type variants (\`List\`, \`LargeList\`, \`FixedSizeList\`)
- Numeric inner type coercion (Float32, Int64, integer literals)
- Multi-row queries with both constant-scalar broadcast and per-row
column scalar
- Error paths (non-numeric scalar, non-list first arg, wrong arity)
- Empty array
- \`list_scale\` alias

## Are there any user-facing changes?

Yes — new SQL scalar function \`array_scale(array, scalar)\` and its
alias \`list_scale\`. Documented in
\`docs/source/user-guide/sql/scalar_functions.md\`.
@SubhamSinghal
Copy link
Copy Markdown
Contributor Author

@Jefffrey Resolved merge conflict.

@Dandandan Dandandan added this pull request to the merge queue May 26, 2026
Merged via the queue into apache:main with commit 7d862d6 May 26, 2026
36 checks passed
@SubhamSinghal SubhamSinghal deleted the array-add-function branch May 27, 2026 04:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants