Commit 40b2161
[Analytics Backend / DataFusion] Onboard PPL mvappend via custom Rust UDF
PPL `mvappend(arg1, arg2, …)` flattens a mixed list of array and scalar
arguments into one array, dropping null arguments and null elements within
array arguments. DataFusion's `array_concat` is the closest stdlib match, but
it only accepts arrays (not mixed array+scalar) and preserves nulls, so its
semantics differ. This commit onboards mvappend as a custom Rust ScalarUDF
registered on the analytics-backend-datafusion plugin's session context,
mirroring the mvzip / mvfind pattern.
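The flattening semantics described above can be sketched in plain Rust. This is only an illustration over `i64` values with no Arrow types; the `Arg` enum and `mvappend_row` function are hypothetical names and are not taken from the actual `MvappendUdf` code.

```rust
/// One mvappend argument for a single row: a scalar or an array,
/// either of which may be NULL. (Hypothetical model type.)
#[derive(Debug, Clone)]
enum Arg {
    Scalar(Option<i64>),
    Array(Option<Vec<Option<i64>>>),
}

/// Flatten the arguments of one row into a single list, skipping NULL
/// arguments and NULL elements inside array arguments.
fn mvappend_row(args: &[Arg]) -> Vec<i64> {
    let mut out = Vec::new();
    for arg in args {
        match arg {
            // A non-NULL scalar contributes exactly one element.
            Arg::Scalar(Some(v)) => out.push(*v),
            // A non-NULL array contributes its non-NULL elements, in order.
            Arg::Array(Some(items)) => out.extend(items.iter().flatten().copied()),
            // A NULL scalar or NULL array contributes nothing.
            Arg::Scalar(None) | Arg::Array(None) => {}
        }
    }
    out
}
```

For example, calling `mvappend_row` on the arguments `1`, `[2, NULL, 3]`, `NULL` yields `[1, 2, 3]`, whereas a concat that preserves nulls would keep the inner NULL.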
Templated shape:
  Rust side:
    * udf::mvappend::MvappendUdf (Signature::user_defined): per-row walk over
      operands, skipping NULL args and NULL elements inside array args, with
      explicit Arrow type arms for {Int8/16/32/64, UInt8/16/32/64,
      Float32/64, Boolean, Utf8/LargeUtf8/Utf8View}. The string arms output
      List<Utf8> or List<Utf8View> depending on the inferred element type so
      the result schema matches what `return_type` declared (DataFusion's
      execution-time schema check rejects mismatches). Defensive Null
      element-type arm covers the empty-array shape. 6 unit tests.
    * Registered on each session context via udf::register_all.
  Java side:
    * ScalarFunction.MVAPPEND enum entry (SqlKind.OTHER_FUNCTION; resolves
      through identifier-name valueOf("MVAPPEND")).
    * MvappendAdapter: locally-declared SqlFunction("mvappend") +
      ADDITIONAL_SCALAR_SIGS bridge. Casts every scalar operand to the
      call's array component type and every array operand to
      ARRAY<componentType> before substrait emission, so the UDF sees a
      single uniform element type across all positions.
    * DataFusionAnalyticsBackendPlugin: ARRAY_RETURNING_PROJECT_OPS membership
      (returns ARRAY<commonType>); adapter registration in
      scalarFunctionAdapters().
    * opensearch_array_functions.yaml: variadic min:1 entry with `list<any1?>`
      return type.
# Pass-rate (CalciteMVAppendFunctionIT, force-routed)
* Before: 0/15.
* After: 6/15.
Newly passing:
testMvappendWithMultipleElements, testMvappendWithSingleElement,
testMvappendWithArrayFlattening, testMvappendWithStringValues,
testMvappendWithNestedArrays, testMvappendWithRealFields.
# Remaining failures
* 8 tests fail with "Unable to convert the type ANY". Root cause is
PPL's MVAppendFunctionImpl.updateMostGeneralType using strict
Object.equals on each pair of operand types, returning Calcite's
ANY type when any two don't match, including when they differ only
in their nullability tag (a literal 3 is INTEGER NOT NULL, but the
component type of `array(1, 2)` is INTEGER NULLABLE). Substrait
can't serialize ANY. The fix belongs in the SQL plugin's
MVAppendFunctionImpl (use typeFactory.leastRestrictive instead of
Object.equals) and isn't addressed here.
* testMvappendInWhereClause — uses `where array_length(combined) = 2`
which the analytics-engine planner rejects with "No backend can
evaluate filter predicate [EQUALS] on fields [combined:ARRAY]".
Filter-side capability gap unrelated to mvappend.
* testMvappendWithComplexExpression — fails substrait conversion on
a nested mvappend call ("Unable to convert call mvappend(list, …)"),
likely the same nullability widening pattern flowing through nested
calls. Same upstream fix applies.
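The nullability mismatch behind the "Unable to convert the type ANY" failures can be modelled with a toy sketch. The `Kind`, `SqlType`, `unify_strict`, and `unify_lenient` names below are hypothetical and stand in for neither Calcite nor the SQL plugin; the lenient variant only mimics the nullability-merging part of what a least-restrictive-type computation does, and ignores numeric widening entirely.

```rust
/// Toy type kinds; `Any` stands in for the catch-all type that cannot be serialized.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Integer,
    Varchar,
    Any,
}

/// A type is a kind plus a nullability tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SqlType {
    kind: Kind,
    nullable: bool,
}

/// Strict unification: any difference, including nullability alone,
/// collapses the result to `Any`.
fn unify_strict(a: SqlType, b: SqlType) -> SqlType {
    if a == b {
        a
    } else {
        SqlType { kind: Kind::Any, nullable: true }
    }
}

/// Nullability-tolerant unification: equal kinds unify, and the result
/// is nullable if either input is nullable.
fn unify_lenient(a: SqlType, b: SqlType) -> SqlType {
    if a.kind == b.kind {
        SqlType { kind: a.kind, nullable: a.nullable || b.nullable }
    } else {
        SqlType { kind: Kind::Any, nullable: true }
    }
}
```

Under this model, unifying INTEGER NOT NULL (the literal 3) with INTEGER NULLABLE (the component type of `array(1, 2)`) gives `Any` with the strict rule and INTEGER NULLABLE with the lenient one, which is the behaviour the proposed upstream change is expected to restore.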
# Pass-rate impact on the broader CalciteArrayFunctionIT
Unchanged at 43/60 — mvappend isn't exercised there.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
1 parent 67a97ad commit 40b2161
7 files changed
Lines changed: 613 additions & 4 deletions
File tree
- sandbox
- libs/analytics-framework/src/main/java/org/opensearch/analytics/spi
- plugins/analytics-backend-datafusion
- rust/src/udf
- src/main
- java/org/opensearch/be/datafusion
- resources
Lines changed: 10 additions & 1 deletion
Lines changed: 3 additions & 1 deletion