Commit ed78d03
authored
Enable PPL eval string concat on the analytics-engine route via DataFusion CONCAT/CAST (opensearch-project#21498)
* [Analytics Framework] Resolve symbolic operators and add SAFE_CAST
Calcite emits SqlBinaryOperators (e.g. `||`, the lowering target of PPL
string `+`) with SqlKind.OTHER and a non-identifier name. The existing
ScalarFunction.fromSqlKind / fromSqlFunction pair fails to resolve these:
fromSqlKind misses (OTHER is shared), fromSqlFunction throws because
`||` is not a SqlFunction (it's a SqlBinaryOperator). The planner-side
fallout is "No backend supports scalar function [null] among
[datafusion]" with no useful name in the error.
Introduce ScalarFunction.fromSqlOperator(SqlOperator) — the unified entry
point used by OpenSearchProjectRule, OpenSearchFilterRule, and
BackendPlanAdapter in subsequent commits. Resolution order:
1. SqlKind via fromSqlKind (covers PLUS, CAST, COALESCE, etc.)
2. Symbolic-name lookup (handles `||` -> CONCAT)
3. Identifier-name valueOf fallback (covers UPPER, LOWER, etc.)
The symbolic-name table currently has one entry (`||` -> CONCAT) but is
the documented extension point for future SqlBinaryOperators with non-
identifier names.
Also adds SAFE_CAST as a sibling enum constant to CAST. PPL emits
explicit `CAST(... AS ...)` lowered to Calcite's SqlKind.SAFE_CAST when
the source value may be NULL or the conversion may fail. SAFE_CAST and
CAST share the same backend semantics (DataFusion's native cast already
returns NULL on conversion failure) but resolve through distinct
SqlKinds, so they need distinct enum entries.
Unit test pins all three resolution branches plus the unknown-operator
return-null contract — a regression that drops a branch surfaces here
rather than as an opaque "[null]" IT failure.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* [Analytics Engine] Migrate rules and adapter dispatch to fromSqlOperator
Three call sites resolved a RexCall's operator using the same two-step
pattern (SqlKind first, SqlFunction-cast second) and all three failed
identically on `||` (a SqlBinaryOperator with SqlKind.OTHER):
- OpenSearchProjectRule.resolveScalarViableBackends
- OpenSearchFilterRule (predicate operator resolution)
- BackendPlanAdapter.resolveFunction (per-function adapter dispatch)
Migrate all three to ScalarFunction.fromSqlOperator, the unified
resolver added in the previous commit. Behavior for previously-resolved
operators is unchanged — fromSqlOperator delegates to fromSqlKind first,
so anything that resolved through SqlKind continues to. New behavior:
`||` now resolves to CONCAT, and unrecognized operators return null
(catching the IllegalArgumentException that fromSqlFunction's valueOf
threw before; the call sites already handled null and now produce a
better-formed error message that includes the operator name).
Also drop the unused SqlFunction import in OpenSearchFilterRule and
BackendPlanAdapter, and tighten the OpenSearchProjectRule error message
to fall back to operator.getName() when the resolver returns null —
"[null]" was unactionable for triage; "[||]" or "[<unknown_name>]"
points directly at the missing capability.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* [Analytics Backend / DataFusion] Wire CONCAT/CAST/SAFE_CAST + concat null adapter
Three new ScalarFunctions in STANDARD_PROJECT_OPS:
- CONCAT — lowering target of PPL `eval`'s `+` for strings (Calcite
emits `||`, resolved to CONCAT through the symbolic-name
branch of ScalarFunction.fromSqlOperator)
- CAST — covers PPL's explicit `CAST(... AS ...)` over non-null
source types (Calcite emits SqlKind.CAST)
- SAFE_CAST — same surface, but emitted by Calcite when the source
value is nullable (SqlKind.SAFE_CAST)
CONCAT additionally needs a ScalarFunctionAdapter to preserve null
semantics. Calcite's `||` follows the SQL standard: if any operand is
NULL, the result is NULL. Substrait's default `concat` extension is
documented with the same semantics, but DataFusion's substrait reader
maps it to the DataFusion `concat()` function — which deviates from the
standard and treats NULL operands as empty strings. PPL queries like
`'Age: ' + CAST(null AS STRING)` expect NULL, not 'Age: '.
ConcatFunctionAdapter rewrites `||(a, b, ...)` into
CASE WHEN a IS NULL OR b IS NULL OR ... THEN NULL ELSE ||(a, b, ...) END
The inner `||` survives unchanged and serializes through the same
Substrait conversion path; the surrounding CASE/IS_NULL short-circuits
the DataFusion `concat()` call whenever any operand is NULL, restoring
SQL-standard null propagation without a custom DataFusion UDF.
Trade-off: the rewrite double-evaluates each operand (once in IS_NULL,
once in the inner `||`). For RexInputRef and RexLiteral operands —
the only shapes PPL emits today for string concat — this is free; for
nested calls the cost is proportional to operand count, not operand
depth, since each `||` adapter wraps one CASE around its direct call.
A custom null-propagating concat UDF (Bucket-3 work in
sandbox/plugins/analytics-backend-datafusion/rust) is the alternative
but disproportionate for a Bucket-1 surface.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* [QA] Add EvalCommandIT for the analytics-engine REST path
Self-contained integration test for PPL `eval` on the analytics-engine
route. Mirrors CalciteEvalCommandIT in opensearch-project/sql so the
analytics-engine path can be verified inside core without cross-plugin
dependencies on the SQL plugin. Each test sends a PPL query through
POST /_analytics/ppl (exposed by test-ppl-frontend) which runs the
same UnifiedQueryPlanner -> CalciteRelNodeVisitor -> Substrait ->
DataFusion pipeline as the SQL plugin's force-routed analytics path.
Four tests on the calcs dataset cover the eval surface this PR enables:
- testEvalStringConcatLiteralPlusField — `'literal' + str_field`
exercises the symbolic-name resolution for `||` and the CONCAT
capability; null str field rows assert null propagation through
the CASE adapter.
- testEvalStringConcatWithCastIntField — `'literal' + CAST(int AS STRING)`
exercises both CAST/SAFE_CAST and CONCAT in the same projection;
null int rows confirm CAST(NULL) -> NULL propagates through the
surrounding concat.
- testEvalStringConcatMultipleLiteralsAndFields — chained four-arg
concat exercises the recursive AnnotatedProjectExpression strip
for nested project calls.
- testEvalStringConcatTwoFields — pure field-to-field concat with
no literal operands; planner takes the hasFieldRef=true path in
resolveScalarViableBackends.
Reuses the existing calcs dataset (no new fixtures). Once this lands,
the SQL-plugin's CalciteEvalCommandIT is verification-only — this QA
IT is the source of truth for the analytics-engine path.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* [Analytics Framework] Rename fromSqlOperator to fromSqlOperatorWithFallback
Per @expani's PR feedback: the method walks three resolution paths
(SqlKind, symbolic-name table, identifier-name valueOf) before returning
null, so the name should advertise the fallback behavior at the call
site rather than only in the javadoc.
Mechanical rename across all callers — `ScalarFunction.fromSqlOperator`
-> `ScalarFunction.fromSqlOperatorWithFallback` in:
- the resolver itself plus its 7 unit tests
- OpenSearchProjectRule (2 call sites)
- OpenSearchFilterRule (1 call site)
- BackendPlanAdapter.resolveFunction (1 call site)
- EvalCommandIT javadoc cross-reference
No behavioral change.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* [Analytics Framework + Backend] Address @expani review on PR opensearch-project#21498
Three feedback items in one commit:
1. Co-locate symbolic operator name with the enum constant.
The static SYMBOLIC_OPERATOR_NAMES map duplicated a property that
belongs on the enum itself. Moved to a nullable `symbolicOperatorName`
field on each ScalarFunction constant — currently set only on CONCAT
("||"). The reverse-index map is now built from the enum at class-init
time, so adding a new symbolic operator is a single-site edit on the
constant rather than a separate map entry.
2. Inline the OR/IS_NULL fold in ConcatFunctionAdapter.
Drop the temporary List<RexNode> nullChecks and accumulate the
OR-of-IS_NULLs directly in the loop body. Same generated tree, fewer
allocations, less to read.
3. Note the Map.of single-line constraint on scalarFunctionAdapters.
Per-pair formatting is rejected by spotless; left a comment pointing
future contributors at alphabetical ordering instead, and reordered
the entries (CONCAT before TIMESTAMP) to make the convention concrete.
No behavioral change. CalciteEvalCommandIT 4/4 still passes against the
analytics-engine route; sandbox per-module check (excluding the
unrelated commons-text dependencyLicenses task) remains green.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* [Analytics Framework] Resolve symbolic operators by Calcite-operator reference
Per @expani's PR follow-up: the symbolic-name string ("||") was a
runtime-coupled identifier that could silently drift if Calcite renamed
the operator. Replace it with a direct reference to the Calcite operator
constant (SqlStdOperatorTable.CONCAT), so the link is enforced at
compile time and a Calcite-side rename surfaces as a build failure here.
- String symbolicOperatorName -> SqlOperator referenceOperator on the
enum constructor.
- CONCAT now points at SqlStdOperatorTable.CONCAT instead of "||".
- Reverse index switches from Map<String, ScalarFunction> keyed by
operator name to Map<SqlOperator, ScalarFunction> keyed by operator
identity. Calcite's standard operators are singletons, so identity
lookup is exact.
- Unit test renamed (testFromSqlOperatorResolvesPipeConcatViaReferenceOperator)
and its comment updated; the assertions on `getName()` / `getKind()`
are kept as documentation of WHY this branch is needed at all.
No behavioral change in the resolution logic — same three-step chain
(SqlKind, then this branch, then identifier-name valueOf), with the
middle branch now identity-comparing rather than name-comparing.
CalciteEvalCommandIT 4/4 still passes; ScalarFunctionTests 7/7.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* [Analytics Framework + Backend] Address @expani follow-up on PR opensearch-project#21498
Two feedback items in one commit:
1. Drop the redundant Map.copyOf on BY_REFERENCE_OPERATOR. The HashMap
built in the static initializer is private static final and is only
read via the resolver's get() — never returned, never iterated. The
immutability wrapper added an allocation without conferring any
external safety guarantee. Comment explains the reasoning so future
readers don't reintroduce the wrap.
2. Add ConcatFunctionAdapterTests with seven structural assertions on
the CASE rewrite contract:
- testAdaptBinaryConcatProducesCaseWrapper: rewritten root is a
three-operand CASE (condition, then, else).
- testAdaptedCaseElseBranchIsOriginalConcat: else branch is the
original RexCall by reference (assertSame, not assertEquals) —
downstream substrait conversion expects the same object the
resolver annotated.
- testAdaptedCaseThenBranchIsNullLiteralOfMatchingSqlType: then
branch is a NULL literal whose SQL type name matches the original
CONCAT's. Comment explains why we compare type name rather than
full RelDataType (RexBuilder.makeNullLiteral promotes nullability,
so the full types differ harmlessly).
- testAdaptedCaseConditionIsOrOfIsNullChecks: condition is OR with
each disjunct an IS_NULL wrapping the corresponding original
operand at matching index — null-propagation contract is per
operand.
- testAdaptPreservesReturnType: full RelDataType identity between
adapted CASE and original CONCAT — locks the type-preserving
argument of rexBuilder.makeCall(originalType, CASE, ...).
- testAdaptNaryConcatChainsIsNullChecksLeftAssociative: builds a
ternary CONCAT via SqlLibraryOperators.CONCAT_FUNCTION and
verifies the left-fold structure OR(OR(IS_NULL(a), IS_NULL(b)),
IS_NULL(c)) — the binary `||` only ever appears with arity 2 in
production, but the loop's correctness for arbitrary N is now a
test invariant.
- testAdaptSingleOperandConcatPassesThroughUnchanged: 1-operand
call returns input by reference; documents the early-out branch.
Each test pins one structural property in isolation, so a regression
that drops any one piece of the contract surfaces with a focused
failure rather than at IT-level row-mismatch noise.
Signed-off-by: Kai Huang <ahkcs@amazon.com>
---------
Signed-off-by: Kai Huang <ahkcs@amazon.com>1 parent dcb68f9 commit ed78d03
9 files changed
Lines changed: 656 additions & 34 deletions
File tree
- sandbox
- libs/analytics-framework/src
- main/java/org/opensearch/analytics/spi
- test/java/org/opensearch/analytics/spi
- plugins
- analytics-backend-datafusion/src
- main/java/org/opensearch/be/datafusion
- test/java/org/opensearch/be/datafusion
- analytics-engine/src/main/java/org/opensearch/analytics/planner
- dag
- rules
- qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa
Lines changed: 83 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
10 | 10 | | |
11 | 11 | | |
12 | 12 | | |
| 13 | + | |
| 14 | + | |
13 | 15 | | |
| 16 | + | |
14 | 17 | | |
| 18 | + | |
15 | 19 | | |
16 | 20 | | |
17 | 21 | | |
| |||
52 | 56 | | |
53 | 57 | | |
54 | 58 | | |
55 | | - | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
56 | 68 | | |
57 | 69 | | |
58 | 70 | | |
| |||
68 | 80 | | |
69 | 81 | | |
70 | 82 | | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
71 | 91 | | |
72 | 92 | | |
73 | 93 | | |
| |||
98 | 118 | | |
99 | 119 | | |
100 | 120 | | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
101 | 130 | | |
102 | 131 | | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
103 | 136 | | |
104 | 137 | | |
| 138 | + | |
105 | 139 | | |
106 | 140 | | |
107 | 141 | | |
| |||
134 | 168 | | |
135 | 169 | | |
136 | 170 | | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
137 | 219 | | |
Lines changed: 67 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
| 12 | + | |
12 | 13 | | |
13 | 14 | | |
14 | 15 | | |
15 | 16 | | |
16 | 17 | | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
17 | 25 | | |
18 | 26 | | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
19 | 48 | | |
20 | 49 | | |
21 | 50 | | |
| |||
34 | 63 | | |
35 | 64 | | |
36 | 65 | | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
37 | 104 | | |
Lines changed: 69 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
Lines changed: 17 additions & 8 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
85 | 85 | | |
86 | 86 | | |
87 | 87 | | |
88 | | - | |
89 | | - | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
90 | 93 | | |
91 | 94 | | |
92 | 95 | | |
93 | 96 | | |
| 97 | + | |
| 98 | + | |
94 | 99 | | |
95 | 100 | | |
96 | 101 | | |
| |||
180 | 185 | | |
181 | 186 | | |
182 | 187 | | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
183 | 191 | | |
184 | | - | |
185 | | - | |
| 192 | + | |
| 193 | + | |
186 | 194 | | |
187 | | - | |
188 | 195 | | |
189 | | - | |
190 | | - | |
191 | | - | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
192 | 201 | | |
193 | 202 | | |
194 | 203 | | |
| |||
0 commit comments