Commit c059c38
[Analytics Backend / DataFusion] Onboard PPL mvzip via custom Rust UDF
PPL `mvzip(left, right [, sep])` element-wise zips two arrays into a list of
strings, joined per pair by a separator (default `,`). DataFusion has no
stdlib equivalent — `array_concat` is end-to-end concatenation, and Substrait's
lambda support is too thin for a transform/zip rewrite — so this onboards a
custom Rust ScalarUDF on the analytics-backend-datafusion plugin's session
context and wires the Java side to route to it.
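To show why `array_concat` is not a substitute, here is a minimal std-only Rust sketch of the per-row semantics. It is not the plugin's code: `mvzip_row` and `concat_row` are hypothetical helpers that work on plain string slices rather than Arrow arrays.

```rust
/// Hypothetical single-row model of PPL `mvzip(left, right [, sep])`:
/// pair elements by position and join each pair with `sep` (default ",").
fn mvzip_row(left: &[&str], right: &[&str], sep: Option<&str>) -> Vec<String> {
    let sep = sep.unwrap_or(",");
    left.iter()
        .zip(right.iter()) // `zip` stops at the shorter input
        .map(|(l, r)| format!("{l}{sep}{r}"))
        .collect()
}

/// End-to-end concatenation, i.e. what `array_concat` produces for one row.
fn concat_row(left: &[&str], right: &[&str]) -> Vec<String> {
    left.iter().chain(right.iter()).map(|s| s.to_string()).collect()
}
```

For `left = [a, b, c]` and `right = [x, y, z]`, the zip yields three strings (`a,x`, `b,y`, `c,z`), while concatenation yields six separate elements.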
Templated shape (extends the existing pattern from convert_tz):
Rust side:
* udf::mvzip::MvzipUdf: Signature::user_defined; coerce_types pins the first two args to ListArray and the optional third to Utf8; invoke_with_args iterates per row, takes min(len(left), len(right)) elements, stringifies each (matching `Objects.toString(elem, "")` for null elements), and builds a List<Utf8>. A defensive arm for a Null element type handles empty arrays that arrive typed as List<Null>, i.e. when the SQL-plugin VARCHAR default has not been applied.
* Registered on each session context via udf::register_all, alongside convert_tz. Seven unit tests cover the basic, custom-separator, truncation, null-element, null-array, empty-array, and numeric-array shapes.
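A std-only sketch of the two pieces described above. It assumes a toy `DataType` enum in place of Arrow's types and plain slices in place of `ListArray`s; `coerce_types`, `render`, and `zip_row` here are illustrative, not the UDF's actual code. Treating a null array on either side as a null output row is an assumption, since the text only says that shape is tested.

```rust
use std::fmt::Display;

/// Toy stand-in for the Arrow logical types involved (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
    Utf8,
    List(Box<DataType>),
}

/// Sketch of the coerce_types contract: args 1 and 2 must be lists, an
/// optional 3rd (separator) arg is coerced to Utf8; other arities fail.
fn coerce_types(args: &[DataType]) -> Result<Vec<DataType>, String> {
    if !(2..=3).contains(&args.len()) {
        return Err(format!("mvzip expects 2 or 3 arguments, got {}", args.len()));
    }
    args.iter()
        .enumerate()
        .map(|(i, t)| match (i, t) {
            (0 | 1, DataType::List(_)) => Ok(t.clone()),
            (0 | 1, other) => Err(format!("argument {} must be a list, got {other:?}", i + 1)),
            _ => Ok(DataType::Utf8), // separator
        })
        .collect()
}

/// Render one element; a null element becomes "", like Objects.toString(elem, "").
fn render<T: Display>(elem: &Option<T>) -> String {
    elem.as_ref().map(|v| v.to_string()).unwrap_or_default()
}

/// One row of the kernel; `None` models SQL NULL. Assumption: a null array on
/// either side yields a null row. Output length is min(len(left), len(right)).
fn zip_row<A: Display, B: Display>(
    left: Option<&[Option<A>]>,
    right: Option<&[Option<B>]>,
    sep: &str,
) -> Option<Vec<String>> {
    let (left, right) = (left?, right?);
    Some(
        left.iter()
            .zip(right.iter())
            .map(|(a, b)| format!("{}{sep}{}", render(a), render(b)))
            .collect(),
    )
}
```

Making `zip_row` generic over `Display` mirrors the numeric-array case: integers are stringified the same way as strings before joining.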
Java side:
* ScalarFunction.MVZIP enum entry (SqlKind.OTHER_FUNCTION). It resolves through the identifier-name lookup valueOf("MVZIP"), since PPL's MVZipFunctionImpl registers under the function name "mvzip".
* MvzipAdapter: a locally declared SqlFunction("mvzip") plus an ADDITIONAL_SCALAR_SIGS bridge, so that isthmus emits a Substrait scalar function call with the exact name the Rust UDF is registered under.
* DataFusionAnalyticsBackendPlugin: ARRAY_RETURNING_PROJECT_OPS membership (returns ARRAY<VARCHAR>, registered against FieldType.ARRAY) and adapter registration in scalarFunctionAdapters().
* opensearch_array_functions.yaml: two impls, one for arity 2 and one for arity 3.
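The routing depends on two name lookups: the Java enum resolving "mvzip" through its upper-cased identifier, and the session context finding the UDF by the exact name carried in the Substrait call. A toy Rust model of both follows; `ScalarFunction`, `from_function_name`, and `resolve` are hypothetical, and the case-sensitive registry match is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical analogue of the Java `ScalarFunction` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ScalarFunction {
    ConvertTz,
    Mvzip,
}

impl ScalarFunction {
    /// Resolve a function name via its upper-cased identifier, in the spirit
    /// of Java's `ScalarFunction.valueOf("MVZIP")`.
    fn from_function_name(name: &str) -> Option<Self> {
        match name.to_ascii_uppercase().as_str() {
            "CONVERT_TZ" => Some(Self::ConvertTz),
            "MVZIP" => Some(Self::Mvzip),
            _ => None,
        }
    }
}

/// Toy session registry: find a UDF by the exact name carried in the
/// Substrait call (case-sensitive here, an assumption for illustration).
fn resolve<'a>(
    registry: &'a HashMap<String, ScalarFunction>,
    call_name: &str,
) -> Option<&'a ScalarFunction> {
    registry.get(call_name)
}
```

Under these assumptions, an adapter that emitted any name other than the registered one would resolve on the Java side but miss in the registry.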
# Pass-rate (CalciteArrayFunctionIT, force-routed)
* Before: 28/60.
* After: 34/60.
Newly passing — all 5 testMvzip* variants:
testMvzipBasic, testMvzipWithCustomDelimiter, testMvzipNested,
testMvzipWithEmptyArray, testMvzipWithBothEmptyArrays.
(Test count delta is +6 because the test class also exercises mvzip in 1
other test under a different name, picked up by the same fix.)
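The pass-rate numbers reconcile as follows (percentages rounded to one decimal):

```latex
\Delta = 34 - 28 = 6 = \underbrace{5}_{\texttt{testMvzip*}} + \underbrace{1}_{\text{other mvzip test}}
\qquad
\frac{28}{60} \approx 46.7\% \;\longrightarrow\; \frac{34}{60} \approx 56.7\%
```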
# Companion changes
This PR's run also picks up SQL-plugin companion #5421, which defaults an empty `array()` to ARRAY<VARCHAR>. Without that companion, the testMvzipWith*EmptyArray variants would still fail: Substrait conversion would reject the ARRAY<NULL> input type before the call reached the UDF. The Rust UDF's Null-element arm is a defensive backstop in case the call ever reaches it with a null-typed list.
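A toy model of that ordering, assuming a two-variant `ElemType` enum. `empty_array_elem_type` and `plan_mvzip` are hypothetical stand-ins for the SQL plugin's type defaulting and the Substrait conversion step, not real APIs.

```rust
/// Toy element types for an `array(...)` literal (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ElemType {
    Null,
    Varchar,
}

/// Element type given to an empty `array()`: Null models the behavior without
/// the companion change, Varchar the ARRAY<VARCHAR> default it introduces.
fn empty_array_elem_type(varchar_default: bool) -> ElemType {
    if varchar_default {
        ElemType::Varchar
    } else {
        ElemType::Null
    }
}

/// Toy stand-in for Substrait conversion: an ARRAY<NULL> input is rejected
/// here, so the UDF is never invoked for it.
fn plan_mvzip(left: ElemType, right: ElemType) -> Result<&'static str, String> {
    for t in [left, right] {
        if t == ElemType::Null {
            return Err("ARRAY<NULL> input rejected during Substrait conversion".to_string());
        }
    }
    Ok("mvzip UDF invoked")
}
```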
Signed-off-by: Kai Huang <ahkcs@amazon.com>

Parent commit: 9a4587a
7 files changed: 523 additions, 4 deletions
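Directories touched:
* sandbox/libs/analytics-framework/src/main/java/org/opensearch/analytics/spi
* sandbox/plugins/analytics-backend-datafusion/rust/src/udf
* sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion
* sandbox/plugins/analytics-backend-datafusion/src/main/resources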