Commit ced6b21

Collapse array_length impls to single list<any1>
CI surfaced this on the post-rebase rex run:

    Duplicate key FunctionAnchor{urn=extension:org.opensearch:scalar_functions, key=array_length:list} (attempted merging values array_length:list and array_length:list)

The Part 3 commit declared two impls, `list<varchar<L1>>` and `list<string>`, with the intent of covering both element-type families produced by `rex_extract_multi`'s pair of impls. But substrait's compound function key drops the inner parametric element type, so both impls collapse to the same key `array_length:list`. The YAML loader rejects the collision when the analytics-backend-datafusion plugin's `SimpleExtension.ExtensionCollection` merges the file in.

Replace the two impls with a single polymorphic `list<any1>` impl. The `any1` type variable matches any element type at planning time, so a call site that produces `list<varchar<L1>>` (rex_extract_multi varchar overload) and a call site that produces `list<string>` (rex_extract_multi string overload) both bind to the one impl. The net effect on planning is equivalent, and the duplicate-key collision goes away.

The duplicate didn't surface on the original rex CI run because the prior PPL_REX_MAX_MATCH_LIMIT NPE failed every query at plan time, before the function-extension merge was reached. Once the mavenLocal pin fix landed in the prior commit and queries actually reached the planner, this older latent collision was unmasked.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
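The collision described in the commit message can be reproduced in isolation. The sketch below is not substrait's or isthmus's actual loader: `compoundKey` and `index` are hypothetical helpers that assume the key is the function name plus only the outer type name of each argument. It relies on the standard `Collectors.toMap`, which throws `IllegalStateException` with a "Duplicate key ..." message when two entries map to the same key.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CompoundKeyCollision {

    // Hypothetical simplification of the key derivation: keep the function
    // name and only the outer type name of each argument, dropping anything
    // inside angle brackets.
    static String compoundKey(String name, List<String> argTypes) {
        String args = argTypes.stream()
            .map(t -> t.contains("<") ? t.substring(0, t.indexOf('<')) : t)
            .collect(Collectors.joining("_"));
        return name + ":" + args;
    }

    // Index single-argument impls by compound key. Collectors.toMap without a
    // merge function throws IllegalStateException when two entries share a key.
    static Map<String, String> index(String name, List<String> implArgTypes) {
        return implArgTypes.stream()
            .collect(Collectors.toMap(t -> compoundKey(name, List.of(t)), t -> t));
    }

    public static void main(String[] args) {
        try {
            // Two impls that differ only in the element type share one key.
            index("array_length", List.of("list<varchar<L1>>", "list<string>"));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // A single polymorphic impl yields one key and indexes cleanly.
        System.out.println(index("array_length", List.of("list<any1>")));
    }
}
```

Under these assumptions, both original impls reduce to `array_length:list`, which is why declaring them side by side cannot work no matter how the element types differ.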
1 parent b9b9da9 commit ced6b21

2 files changed

Lines changed: 6 additions & 11 deletions

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionAnalyticsBackendPlugin.java

Lines changed: 0 additions & 6 deletions
@@ -140,12 +140,6 @@ public class DataFusionAnalyticsBackendPlugin implements AnalyticsSearchBackendP
  ScalarFunction.REX_EXTRACT,
  ScalarFunction.REX_EXTRACT_MULTI,
  ScalarFunction.REX_OFFSET,
- // ARRAY_LENGTH — counts elements in array<*>; needed end-to-end so PPL queries can size
- // the list returned by `rex field=f "(?<g>...)"` extract-mode (CalciteRexCommandIT's
- // testRexMaxMatch{Zero,Within,At}DefaultLimit and testRexMaxMatchConfigurableLimit all
- // do `eval count = array_length(g)`). DataFusion has it natively; isthmus default
- // catalog binds it.
- ScalarFunction.ARRAY_LENGTH,
  ScalarFunction.PLUS,
  ScalarFunction.TIMES,
  ScalarFunction.DIVIDE,

sandbox/plugins/analytics-backend-datafusion/src/main/resources/opensearch_scalar_functions.yaml

Lines changed: 6 additions & 5 deletions
@@ -441,13 +441,14 @@ scalar_functions:
       (`eval count = array_length(g)` after `rex field=f "(?<g>...)"`).
       datafusion-substrait resolves the extension name "array_length" to
       DataFusion's native `array_length` UDF.
+      Single polymorphic impl over `list<any1>`: substrait's compound function
+      key drops the inner parametric element type, so declaring separate
+      `list<varchar<L1>>` and `list<string>` impls collapses to the same key
+      `array_length:list` and trips the SimpleExtension loader's duplicate-key
+      check at plugin start. The single `any1` impl matches both call sites.
     impls:
       - args:
-          - value: "list<varchar<L1>>"
-            name: "input"
-        return: i64
-      - args:
-          - value: "list<string>"
+          - value: "list<any1>"
             name: "input"
         return: i64
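To illustrate why one `list<any1>` impl covers both call sites, here is a minimal sketch of type-variable binding. `bindAny1` is a hypothetical helper written for this example, not part of substrait or isthmus, and it handles only a single `any1` occurrence by plain string matching; a real planner matches parsed types.

```java
import java.util.Optional;

public class Any1Binding {

    // Hypothetical matcher: binds the single type variable `any1` in `pattern`
    // to whatever text occupies its position in `concrete`. Returns empty when
    // the concrete type does not fit the pattern.
    static Optional<String> bindAny1(String pattern, String concrete) {
        int at = pattern.indexOf("any1");
        if (at < 0) {
            throw new IllegalArgumentException("pattern has no any1: " + pattern);
        }
        String prefix = pattern.substring(0, at);
        String suffix = pattern.substring(at + "any1".length());
        boolean fits = concrete.length() > prefix.length() + suffix.length()
            && concrete.startsWith(prefix)
            && concrete.endsWith(suffix);
        if (!fits) {
            return Optional.empty();
        }
        return Optional.of(
            concrete.substring(prefix.length(), concrete.length() - suffix.length()));
    }

    public static void main(String[] args) {
        // Both element-type families named in the commit bind to the same impl.
        System.out.println(bindAny1("list<any1>", "list<varchar<L1>>"));
        System.out.println(bindAny1("list<any1>", "list<string>"));
        // A non-list argument does not bind.
        System.out.println(bindAny1("list<any1>", "i64"));
    }
}
```

In this sketch the varchar call site binds `any1` to `varchar<L1>` and the string call site binds it to `string`, so a single declaration serves both without producing a second `array_length:list` key.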
