Commit ed78d03

Enable PPL eval string concat on the analytics-engine route via DataFusion CONCAT/CAST (opensearch-project#21498)
* [Analytics Framework] Resolve symbolic operators and add SAFE_CAST

Calcite emits SqlBinaryOperators (e.g. `||`, the lowering target of PPL string `+`) with SqlKind.OTHER and a non-identifier name. The existing ScalarFunction.fromSqlKind / fromSqlFunction pair fails to resolve these: fromSqlKind misses (OTHER is shared), fromSqlFunction throws because `||` is not a SqlFunction (it's a SqlBinaryOperator). The planner-side fallout is "No backend supports scalar function [null] among [datafusion]" with no useful name in the error.

Introduce ScalarFunction.fromSqlOperator(SqlOperator), the unified entry point used by OpenSearchProjectRule, OpenSearchFilterRule, and BackendPlanAdapter in subsequent commits. Resolution order:

1. SqlKind via fromSqlKind (covers PLUS, CAST, COALESCE, etc.)
2. Symbolic-name lookup (handles `||` -> CONCAT)
3. Identifier-name valueOf fallback (covers UPPER, LOWER, etc.)

The symbolic-name table currently has one entry (`||` -> CONCAT) but is the documented extension point for future SqlBinaryOperators with non-identifier names.

Also adds SAFE_CAST as a sibling enum constant to CAST. PPL emits explicit `CAST(... AS ...)` lowered to Calcite's SqlKind.SAFE_CAST when the source value may be NULL or the conversion may fail. SAFE_CAST and CAST share the same backend semantics (DataFusion's native cast already returns NULL on conversion failure) but resolve through distinct SqlKinds, so they need distinct enum entries.

Unit test pins all three resolution branches plus the unknown-operator return-null contract: a regression that drops a branch surfaces here rather than as an opaque "[null]" IT failure.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* [Analytics Engine] Migrate rules and adapter dispatch to fromSqlOperator

Three call sites resolved a RexCall's operator using the same two-step pattern (SqlKind first, SqlFunction-cast second) and all three failed identically on `||` (a SqlBinaryOperator with SqlKind.OTHER):

- OpenSearchProjectRule.resolveScalarViableBackends
- OpenSearchFilterRule (predicate operator resolution)
- BackendPlanAdapter.resolveFunction (per-function adapter dispatch)

Migrate all three to ScalarFunction.fromSqlOperator, the unified resolver added in the previous commit. Behavior for previously-resolved operators is unchanged: fromSqlOperator delegates to fromSqlKind first, so anything that resolved through SqlKind continues to. New behavior: `||` now resolves to CONCAT, and unrecognized operators return null (catching the IllegalArgumentException that fromSqlFunction's valueOf threw before; the call sites already handled null and now produce a better-formed error message that includes the operator name).

Also drop the unused SqlFunction import in OpenSearchFilterRule and BackendPlanAdapter, and tighten the OpenSearchProjectRule error message to fall back to operator.getName() when the resolver returns null: "[null]" was unactionable for triage; "[||]" or "[<unknown_name>]" points directly at the missing capability.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* [Analytics Backend / DataFusion] Wire CONCAT/CAST/SAFE_CAST + concat null adapter

Three new ScalarFunctions in STANDARD_PROJECT_OPS:

- CONCAT: lowering target of PPL `eval`'s `+` for strings (Calcite emits `||`, resolved to CONCAT through the symbolic-name branch of ScalarFunction.fromSqlOperator)
- CAST: covers PPL's explicit `CAST(... AS ...)` over non-null source types (Calcite emits SqlKind.CAST)
- SAFE_CAST: same surface, but emitted by Calcite when the source value is nullable (SqlKind.SAFE_CAST)

CONCAT additionally needs a ScalarFunctionAdapter to preserve null semantics. Calcite's `||` follows the SQL standard: if any operand is NULL, the result is NULL. Substrait's default `concat` extension is documented with the same semantics, but DataFusion's substrait reader maps it to the DataFusion `concat()` function, which deviates from the standard and treats NULL operands as empty strings. PPL queries like `'Age: ' + CAST(null AS STRING)` expect NULL, not 'Age: '. ConcatFunctionAdapter rewrites `||(a, b, ...)` into

    CASE WHEN a IS NULL OR b IS NULL OR ... THEN NULL ELSE ||(a, b, ...) END

The inner `||` survives unchanged and serializes through the same Substrait conversion path; the surrounding CASE/IS_NULL short-circuits the DataFusion `concat()` call whenever any operand is NULL, restoring SQL-standard null propagation without a custom DataFusion UDF.

Trade-off: the rewrite double-evaluates each operand (once in IS_NULL, once in the inner `||`). For RexInputRef and RexLiteral operands (the only shapes PPL emits today for string concat) this is free; for nested calls the cost is proportional to operand count, not operand depth, since each `||` adapter wraps one CASE around its direct call. A custom null-propagating concat UDF (Bucket-3 work in sandbox/plugins/analytics-backend-datafusion/rust) is the alternative but disproportionate for a Bucket-1 surface.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* [QA] Add EvalCommandIT for the analytics-engine REST path

Self-contained integration test for PPL `eval` on the analytics-engine route. Mirrors CalciteEvalCommandIT in opensearch-project/sql so the analytics-engine path can be verified inside core without cross-plugin dependencies on the SQL plugin. Each test sends a PPL query through POST /_analytics/ppl (exposed by test-ppl-frontend) which runs the same UnifiedQueryPlanner -> CalciteRelNodeVisitor -> Substrait -> DataFusion pipeline as the SQL plugin's force-routed analytics path.

Four tests on the calcs dataset cover the eval surface this PR enables:

- testEvalStringConcatLiteralPlusField: `'literal' + str_field` exercises the symbolic-name resolution for `||` and the CONCAT capability; null str field rows assert null propagation through the CASE adapter.
- testEvalStringConcatWithCastIntField: `'literal' + CAST(int AS STRING)` exercises both CAST/SAFE_CAST and CONCAT in the same projection; null int rows confirm CAST(NULL) -> NULL propagates through the surrounding concat.
- testEvalStringConcatMultipleLiteralsAndFields: chained four-arg concat exercises the recursive AnnotatedProjectExpression strip for nested project calls.
- testEvalStringConcatTwoFields: pure field-to-field concat with no literal operands; planner takes the hasFieldRef=true path in resolveScalarViableBackends.

Reuses the existing calcs dataset (no new fixtures). Once this lands, the SQL-plugin's CalciteEvalCommandIT is verification-only: this QA IT is the source of truth for the analytics-engine path.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* [Analytics Framework] Rename fromSqlOperator to fromSqlOperatorWithFallback

Per @expani's PR feedback: the method walks three resolution paths (SqlKind, symbolic-name table, identifier-name valueOf) before returning null, so the name should advertise the fallback behavior at the call site rather than only in the javadoc.

Mechanical rename across all callers, `ScalarFunction.fromSqlOperator` -> `ScalarFunction.fromSqlOperatorWithFallback`, in:

- the resolver itself plus its 7 unit tests
- OpenSearchProjectRule (2 call sites)
- OpenSearchFilterRule (1 call site)
- BackendPlanAdapter.resolveFunction (1 call site)
- EvalCommandIT javadoc cross-reference

No behavioral change.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* [Analytics Framework + Backend] Address @expani review on PR opensearch-project#21498

Three feedback items in one commit:

1. Co-locate symbolic operator name with the enum constant. The static SYMBOLIC_OPERATOR_NAMES map duplicated a property that belongs on the enum itself. Moved to a nullable `symbolicOperatorName` field on each ScalarFunction constant, currently set only on CONCAT ("||"). The reverse-index map is now built from the enum at class-init time, so adding a new symbolic operator is a single-site edit on the constant rather than a separate map entry.
2. Inline the OR/IS_NULL fold in ConcatFunctionAdapter. Drop the temporary List<RexNode> nullChecks and accumulate the OR-of-IS_NULLs directly in the loop body. Same generated tree, fewer allocations, less to read.
3. Note the Map.of single-line constraint on scalarFunctionAdapters. Per-pair formatting is rejected by spotless; left a comment pointing future contributors at alphabetical ordering instead, and reordered the entries (CONCAT before TIMESTAMP) to make the convention concrete.

No behavioral change. CalciteEvalCommandIT 4/4 still passes against the analytics-engine route; sandbox per-module check (excluding the unrelated commons-text dependencyLicenses task) remains green.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* [Analytics Framework] Resolve symbolic operators by Calcite-operator reference

Per @expani's PR follow-up: the symbolic-name string ("||") was a runtime-coupled identifier that could silently drift if Calcite renamed the operator. Replace it with a direct reference to the Calcite operator constant (SqlStdOperatorTable.CONCAT), so the link is enforced at compile time and a Calcite-side rename surfaces as a build failure here.

- String symbolicOperatorName -> SqlOperator referenceOperator on the enum constructor.
- CONCAT now points at SqlStdOperatorTable.CONCAT instead of "||".
- Reverse index switches from Map<String, ScalarFunction> keyed by operator name to Map<SqlOperator, ScalarFunction> keyed by operator identity. Calcite's standard operators are singletons, so identity lookup is exact.
- Unit test renamed (testFromSqlOperatorResolvesPipeConcatViaReferenceOperator) and its comment updated; the assertions on `getName()` / `getKind()` are kept as documentation of WHY this branch is needed at all.

No behavioral change in the resolution logic: same three-step chain (SqlKind, then this branch, then identifier-name valueOf), with the middle branch now identity-comparing rather than name-comparing. CalciteEvalCommandIT 4/4 still passes; ScalarFunctionTests 7/7.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* [Analytics Framework + Backend] Address @expani follow-up on PR opensearch-project#21498

Two feedback items in one commit:

1. Drop the redundant Map.copyOf on BY_REFERENCE_OPERATOR. The HashMap built in the static initializer is private static final and is only read via the resolver's get(); it is never returned and never iterated. The immutability wrapper added an allocation without conferring any external safety guarantee. Comment explains the reasoning so future readers don't reintroduce the wrap.
2. Add ConcatFunctionAdapterTests with seven structural assertions on the CASE rewrite contract:
   - testAdaptBinaryConcatProducesCaseWrapper: rewritten root is a three-operand CASE (condition, then, else).
   - testAdaptedCaseElseBranchIsOriginalConcat: else branch is the original RexCall by reference (assertSame, not assertEquals); downstream substrait conversion expects the same object the resolver annotated.
   - testAdaptedCaseThenBranchIsNullLiteralOfMatchingSqlType: then branch is a NULL literal whose SQL type name matches the original CONCAT's. Comment explains why we compare type name rather than full RelDataType (RexBuilder.makeNullLiteral promotes nullability, so the full types differ harmlessly).
   - testAdaptedCaseConditionIsOrOfIsNullChecks: condition is OR with each disjunct an IS_NULL wrapping the corresponding original operand at matching index; the null-propagation contract is per operand.
   - testAdaptPreservesReturnType: full RelDataType identity between adapted CASE and original CONCAT; locks the type-preserving argument of rexBuilder.makeCall(originalType, CASE, ...).
   - testAdaptNaryConcatChainsIsNullChecksLeftAssociative: builds a ternary CONCAT via SqlLibraryOperators.CONCAT_FUNCTION and verifies the left-fold structure OR(OR(IS_NULL(a), IS_NULL(b)), IS_NULL(c)). The binary `||` only ever appears with arity 2 in production, but the loop's correctness for arbitrary N is now a test invariant.
   - testAdaptSingleOperandConcatPassesThroughUnchanged: 1-operand call returns input by reference; documents the early-out branch.

Each test pins one structural property in isolation, so a regression that drops any one piece of the contract surfaces with a focused failure rather than at IT-level row-mismatch noise.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
1 parent dcb68f9 commit ed78d03

9 files changed

Lines changed: 656 additions & 34 deletions

sandbox/libs/analytics-framework/src/main/java/org/opensearch/analytics/spi/ScalarFunction.java

Lines changed: 83 additions & 1 deletion
@@ -10,8 +10,12 @@
 
 import org.apache.calcite.sql.SqlFunction;
 import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
 
+import java.util.HashMap;
 import java.util.Locale;
+import java.util.Map;
 
 /**
  * All scalar functions a backend may support — comparisons, full-text search,
@@ -52,7 +56,15 @@ public enum ScalarFunction {
     LOWER(Category.STRING, SqlKind.OTHER_FUNCTION),
     TRIM(Category.STRING, SqlKind.TRIM),
     SUBSTRING(Category.STRING, SqlKind.OTHER_FUNCTION),
-    CONCAT(Category.STRING, SqlKind.OTHER_FUNCTION),
+    /**
+     * String concatenation. Calcite's {@code SqlStdOperatorTable.CONCAT} is a
+     * {@link org.apache.calcite.sql.SqlBinaryOperator} named {@code "||"} (not {@code "CONCAT"})
+     * with {@link SqlKind#OTHER}, so neither {@link #fromSqlKind(SqlKind)} nor identifier-name
+     * {@link #valueOf(String)} resolves it. The {@code referenceOperator} hook below pins the
+     * concrete Calcite operator constant so resolution is a singleton-identity match — a Calcite
+     * rename surfaces as a compile error rather than as a silent string mismatch at runtime.
+     */
+    CONCAT(Category.STRING, SqlKind.OTHER_FUNCTION, SqlStdOperatorTable.CONCAT),
     CHAR_LENGTH(Category.STRING, SqlKind.OTHER_FUNCTION),
 
     // ── Math ─────────────────────────────────────────────────────────
@@ -68,6 +80,14 @@ public enum ScalarFunction {
 
     // ── Cast / type ──────────────────────────────────────────────────
     CAST(Category.SCALAR, SqlKind.CAST),
+    /**
+     * Calcite's {@code SAFE_CAST} — emitted by PPL's explicit {@code CAST(... AS ...)} when the
+     * source value may be NULL or the conversion may fail; returns NULL on failure rather than
+     * throwing. Resolves through {@link SqlKind#SAFE_CAST}, distinct from {@link #CAST} which
+     * uses {@link SqlKind#CAST}. DataFusion's native cast already returns NULL on conversion
+     * failure, so SAFE_CAST and CAST share the same backend semantics.
+     */
+    SAFE_CAST(Category.SCALAR, SqlKind.SAFE_CAST),
 
     // ── Conditional ──────────────────────────────────────────────────
     CASE(Category.SCALAR, SqlKind.CASE),
@@ -98,10 +118,24 @@ public enum Category {
 
     private final Category category;
     private final SqlKind sqlKind;
+    /**
+     * Optional Calcite operator that this constant maps to when the operator cannot be resolved
+     * via {@link SqlKind} or via identifier-name {@link #valueOf(String)} — typically operators
+     * whose {@code getName()} returns a non-identifier token (e.g. {@code SqlStdOperatorTable.CONCAT}
+     * is named {@code "||"}). Null for the common case where SqlKind or name resolution suffices.
+     * Stored as a reference (not a string) so a Calcite-side rename of the operator surfaces as a
+     * compile error here.
+     */
+    private final SqlOperator referenceOperator;
 
     ScalarFunction(Category category, SqlKind sqlKind) {
+        this(category, sqlKind, null);
+    }
+
+    ScalarFunction(Category category, SqlKind sqlKind, SqlOperator referenceOperator) {
         this.category = category;
         this.sqlKind = sqlKind;
+        this.referenceOperator = referenceOperator;
     }
 
     public Category getCategory() {
@@ -134,4 +168,52 @@ public static ScalarFunction fromSqlFunction(SqlFunction function) {
         // valueOf(toUpperCase). This couples enum constant naming to SQL function naming convention.
         return ScalarFunction.valueOf(function.getName().toUpperCase(Locale.ROOT));
     }
+
+    /**
+     * Reverse index from {@link #referenceOperator} to enum constant. Built from the enum itself
+     * at class init — adding a new symbolic operator is a single-site change on the enum constant,
+     * no separate map to maintain. Lookup is identity-keyed because Calcite's standard operators
+     * are singletons (e.g. {@code SqlStdOperatorTable.CONCAT}). Empty in the common case (most
+     * constants resolve by SqlKind or identifier-name valueOf).
+     */
+    private static final Map<SqlOperator, ScalarFunction> BY_REFERENCE_OPERATOR;
+
+    static {
+        Map<SqlOperator, ScalarFunction> byOperator = new HashMap<>();
+        for (ScalarFunction func : values()) {
+            if (func.referenceOperator != null) {
+                byOperator.put(func.referenceOperator, func);
+            }
+        }
+        // The HashMap is private static final and never exposed beyond the get() in the resolver
+        // below — wrapping it in Map.copyOf adds an allocation without any external safety guarantee.
+        BY_REFERENCE_OPERATOR = byOperator;
+    }
+
+    /**
+     * Maps any Calcite {@link SqlOperator} to a {@link ScalarFunction}, or returns null if
+     * unrecognized. Resolution order: {@link SqlKind} match, then {@link #referenceOperator}
+     * identity match (handles {@code SqlStdOperatorTable.CONCAT} a.k.a. {@code ||}), then
+     * identifier-name {@link #valueOf(String)} match.
+     *
+     * <p>Prefer this entry point over {@link #fromSqlKind(SqlKind)} /
+     * {@link #fromSqlFunction(SqlFunction)} when resolving an arbitrary {@code RexCall}'s
+     * operator: a {@code RexCall} may be backed by a {@code SqlBinaryOperator} (e.g. {@code ||})
+     * which is neither covered by {@code OTHER} {@code SqlKind} nor by {@code SqlFunction}.
+     */
+    public static ScalarFunction fromSqlOperatorWithFallback(SqlOperator operator) {
+        ScalarFunction byKind = fromSqlKind(operator.getKind());
+        if (byKind != null) {
+            return byKind;
+        }
+        ScalarFunction byReference = BY_REFERENCE_OPERATOR.get(operator);
+        if (byReference != null) {
+            return byReference;
+        }
+        try {
+            return ScalarFunction.valueOf(operator.getName().toUpperCase(Locale.ROOT));
+        } catch (IllegalArgumentException ignored) {
+            return null;
+        }
+    }
 }
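
The three-step resolution chain in this file (kind match, then reference-operator identity match, then identifier-name `valueOf`) can be tried out without Calcite on the classpath. The following is a minimal sketch under stated assumptions: `ResolverSketch`, `Kind`, `Op`, `Ops`, `Func`, `resolve` and `describe` are hypothetical stand-ins invented for illustration, not Calcite or OpenSearch types, and `describe` only approximates the error-message fallback to the operator name that the commit message describes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, dependency-free model of the resolver; none of these types are Calcite's. */
final class ResolverSketch {

    /** Stand-in for SqlKind, reduced to the values this sketch needs. */
    enum Kind { PLUS, CAST, OTHER, OTHER_FUNCTION, MINUS_PREFIX }

    /** Stand-in for SqlOperator. No equals/hashCode override, so map lookups are by identity. */
    static final class Op {
        final String name;
        final Kind kind;

        Op(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    /** Singleton operator constants, playing the role of SqlStdOperatorTable. */
    static final class Ops {
        static final Op PLUS = new Op("+", Kind.PLUS);
        static final Op PIPE_CONCAT = new Op("||", Kind.OTHER);
        static final Op UPPER = new Op("UPPER", Kind.OTHER_FUNCTION);
        static final Op UNARY_MINUS = new Op("-", Kind.MINUS_PREFIX);
    }

    enum Func {
        PLUS(Kind.PLUS, null),
        CAST(Kind.CAST, null),
        UPPER(Kind.OTHER_FUNCTION, null),
        CONCAT(Kind.OTHER_FUNCTION, Ops.PIPE_CONCAT);

        // Reverse index built from the enum itself, after all constants exist.
        private static final Map<Op, Func> BY_REFERENCE = new HashMap<>();

        static {
            for (Func f : values()) {
                if (f.referenceOperator != null) {
                    BY_REFERENCE.put(f.referenceOperator, f);
                }
            }
        }

        final Kind kind;
        final Op referenceOperator;

        Func(Kind kind, Op referenceOperator) {
            this.kind = kind;
            this.referenceOperator = referenceOperator;
        }

        /** Step 1: dedicated kinds only; the shared OTHER / OTHER_FUNCTION kinds never match. */
        static Func fromKind(Kind kind) {
            if (kind == Kind.OTHER || kind == Kind.OTHER_FUNCTION) {
                return null;
            }
            for (Func f : values()) {
                if (f.kind == kind) {
                    return f;
                }
            }
            return null;
        }
    }

    /** Kind match, then identity match, then name match; null when all three miss. */
    static Func resolve(Op op) {
        Func byKind = Func.fromKind(op.kind);
        if (byKind != null) {
            return byKind;
        }
        Func byReference = Func.BY_REFERENCE.get(op);
        if (byReference != null) {
            return byReference;
        }
        try {
            return Func.valueOf(op.name.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException ignored) {
            return null;
        }
    }

    /** Label for error messages: falls back to the operator name when unresolved. */
    static String describe(Op op) {
        Func resolved = resolve(op);
        return "[" + (resolved != null ? resolved.name() : op.name) + "]";
    }

    public static void main(String[] args) {
        System.out.println(describe(Ops.PIPE_CONCAT)); // prints [CONCAT]
        System.out.println(describe(Ops.UNARY_MINUS)); // prints [-]
    }
}
```

In this sketch the operator constants live in their own holder class so that initializing `Func` never re-enters a class that is still building its reverse index; that layout is a choice made for the illustration, not something taken from the commit.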

sandbox/libs/analytics-framework/src/test/java/org/opensearch/analytics/spi/ScalarFunctionTests.java

Lines changed: 67 additions & 0 deletions
@@ -9,13 +9,42 @@
 package org.opensearch.analytics.spi;
 
 import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
 import org.opensearch.test.OpenSearchTestCase;
 
 import java.util.EnumMap;
 import java.util.Map;
 
+/**
+ * Unit coverage for {@link ScalarFunction}'s three resolution paths used by the analytics-engine
+ * planner ({@code OpenSearchProjectRule}, {@code OpenSearchFilterRule}, {@code BackendPlanAdapter}).
+ *
+ * <p>Each test pins one of the resolver's branches so a regression that drops a branch surfaces
+ * here rather than in IT-level "No backend supports scalar function [null]" errors.
+ */
 public class ScalarFunctionTests extends OpenSearchTestCase {
 
+    // ── fromSqlKind ─────────────────────────────────────────────────────────────
+
+    public void testFromSqlKindResolvesDedicatedKind() {
+        assertEquals(ScalarFunction.EQUALS, ScalarFunction.fromSqlKind(SqlKind.EQUALS));
+        assertEquals(ScalarFunction.PLUS, ScalarFunction.fromSqlKind(SqlKind.PLUS));
+        assertEquals(ScalarFunction.CAST, ScalarFunction.fromSqlKind(SqlKind.CAST));
+        assertEquals(ScalarFunction.SAFE_CAST, ScalarFunction.fromSqlKind(SqlKind.SAFE_CAST));
+        assertEquals(ScalarFunction.COALESCE, ScalarFunction.fromSqlKind(SqlKind.COALESCE));
+    }
+
+    public void testFromSqlKindReturnsNullForOtherKind() {
+        // SqlKind.OTHER is shared by many SqlBinaryOperators — must NOT resolve via SqlKind.
+        assertNull(ScalarFunction.fromSqlKind(SqlKind.OTHER));
+    }
+
+    public void testFromSqlKindReturnsNullForOtherFunctionKind() {
+        // SqlKind.OTHER_FUNCTION is shared by many name-distinguished SqlFunctions — must NOT
+        // resolve via SqlKind even though several enum entries declare it.
+        assertNull(ScalarFunction.fromSqlKind(SqlKind.OTHER_FUNCTION));
+    }
+
     /** Non-OTHER_FUNCTION SqlKinds must be unique: fromSqlKind picks the first match and would shadow later entries. */
     public void testNoDuplicateSqlKindBindings() {
         Map<SqlKind, ScalarFunction> claimedBy = new EnumMap<>(SqlKind.class);
@@ -34,4 +63,42 @@ public void testNoDuplicateSqlKindBindings() {
     public void testSargPredicateIsBoundToSqlKindSearch() {
         assertSame(ScalarFunction.SARG_PREDICATE, ScalarFunction.fromSqlKind(SqlKind.SEARCH));
     }
+
+    // ── fromSqlOperatorWithFallback: SqlKind branch ────────────────────────────────────────
+
+    public void testFromSqlOperatorResolvesViaSqlKind() {
+        // Calcite's CAST has a dedicated SqlKind.CAST — short-circuit before name lookup.
+        assertEquals(ScalarFunction.CAST, ScalarFunction.fromSqlOperatorWithFallback(SqlStdOperatorTable.CAST));
+        assertEquals(ScalarFunction.PLUS, ScalarFunction.fromSqlOperatorWithFallback(SqlStdOperatorTable.PLUS));
+        assertEquals(ScalarFunction.GREATER_THAN, ScalarFunction.fromSqlOperatorWithFallback(SqlStdOperatorTable.GREATER_THAN));
+        assertEquals(ScalarFunction.COALESCE, ScalarFunction.fromSqlOperatorWithFallback(SqlStdOperatorTable.COALESCE));
+    }
+
+    // ── fromSqlOperatorWithFallback: reference-operator branch ─────────────────────────────
+
+    public void testFromSqlOperatorResolvesPipeConcatViaReferenceOperator() {
+        // The original "no backend supports scalar function [null]" symptom for PPL string `+`.
+        // SqlStdOperatorTable.CONCAT is a SqlBinaryOperator named "||" with SqlKind.OTHER —
+        // neither fromSqlKind nor fromSqlFunction(SqlFunction) resolves it. CONCAT's
+        // referenceOperator field points at the singleton, so the resolver matches by identity.
+        assertEquals("||", SqlStdOperatorTable.CONCAT.getName());
+        assertEquals(SqlKind.OTHER, SqlStdOperatorTable.CONCAT.getKind());
+        assertEquals(ScalarFunction.CONCAT, ScalarFunction.fromSqlOperatorWithFallback(SqlStdOperatorTable.CONCAT));
+    }
+
+    // ── fromSqlOperatorWithFallback: identifier-name branch ────────────────────────────────
+
+    public void testFromSqlOperatorResolvesViaIdentifierName() {
+        // SqlStdOperatorTable.UPPER is a SqlFunction named "UPPER" with SqlKind.OTHER_FUNCTION;
+        // resolves through the valueOf(name.toUpperCase()) fallback after SqlKind misses.
+        assertEquals(ScalarFunction.UPPER, ScalarFunction.fromSqlOperatorWithFallback(SqlStdOperatorTable.UPPER));
+        assertEquals(ScalarFunction.LOWER, ScalarFunction.fromSqlOperatorWithFallback(SqlStdOperatorTable.LOWER));
+        assertEquals(ScalarFunction.ABS, ScalarFunction.fromSqlOperatorWithFallback(SqlStdOperatorTable.ABS));
+    }
+
+    public void testFromSqlOperatorReturnsNullForUnknownFunction() {
+        // UNARY_MINUS has SqlKind.MINUS_PREFIX (no enum) and name "-" (not a valid valueOf input);
+        // both resolution paths miss and the resolver returns null instead of throwing.
+        assertNull(ScalarFunction.fromSqlOperatorWithFallback(SqlStdOperatorTable.UNARY_MINUS));
+    }
 }
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+/*
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * The OpenSearch Contributors require contributions made to
+ * this file be licensed under the Apache-2.0 license or a
+ * compatible open source license.
+ */
+
+package org.opensearch.be.datafusion;
+
+import org.apache.calcite.plan.RelOptCluster;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexCall;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.fun.SqlStdOperatorTable;
+import org.opensearch.analytics.spi.FieldStorageInfo;
+import org.opensearch.analytics.spi.ScalarFunctionAdapter;
+
+import java.util.List;
+
+/**
+ * Adapts {@code ||(a, b, ...)} (Calcite {@code SqlStdOperatorTable.CONCAT}) into a
+ * null-propagating form for the DataFusion backend.
+ *
+ * <p>Calcite's {@code ||} operator follows the SQL standard: if any operand is NULL, the result
+ * is NULL. Substrait's default {@code concat} extension is documented with the same semantics,
+ * but DataFusion's substrait reader maps it to the DataFusion {@code concat()} function — which
+ * deviates from the standard and treats NULL operands as empty strings. To preserve Calcite's
+ * semantics on the analytics-engine path, this adapter rewrites
+ *
+ * <pre>{@code
+ * ||(a, b)
+ * →
+ * CASE WHEN a IS NULL OR b IS NULL THEN NULL ELSE ||(a, b) END
+ * }</pre>
+ *
+ * The inner {@code ||} is left intact and serializes through the same Substrait conversion path,
+ * but with the surrounding CASE/IS_NULL the DataFusion {@code concat()} call is short-circuited
+ * for any input that contains a NULL — restoring SQL-standard null-propagation without requiring
+ * a custom DataFusion UDF.
+ *
+ * <p>Single-operand calls fall through unchanged (the result equals the operand, so no
+ * null-handling rewrite is needed).
+ */
+class ConcatFunctionAdapter implements ScalarFunctionAdapter {
+
+    @Override
+    public RexNode adapt(RexCall original, List<FieldStorageInfo> fieldStorage, RelOptCluster cluster) {
+        List<RexNode> operands = original.getOperands();
+        if (operands.size() < 2) {
+            return original;
+        }
+        RexBuilder rexBuilder = cluster.getRexBuilder();
+        // Fold operands into a single OR(IS_NULL(o0), IS_NULL(o1), ...) predicate. IS_NULL on a
+        // non-null literal reduces to constant-false, so the OR collapses cleanly through the
+        // optimizer for cases where some operands are statically non-null.
+        RexNode anyNull = rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, operands.get(0));
+        for (int i = 1; i < operands.size(); i++) {
+            anyNull = rexBuilder.makeCall(
+                SqlStdOperatorTable.OR,
+                anyNull,
+                rexBuilder.makeCall(SqlStdOperatorTable.IS_NULL, operands.get(i))
+            );
+        }
+        // Result type stays the same as the original CONCAT — nullable VARCHAR.
+        RexNode nullLiteral = rexBuilder.makeNullLiteral(original.getType());
+        return rexBuilder.makeCall(original.getType(), SqlStdOperatorTable.CASE, List.of(anyNull, nullLiteral, original));
+    }
+}
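
To see what the CASE wrapper buys, the rewrite can be modelled on plain Java strings. This is a rough sketch, not the adapter itself: `ConcatNullSketch`, `lenientConcat`, `nullPropagatingConcat` and `anyNullCondition` are hypothetical names, `lenientConcat` only imitates the NULL-as-empty-string behavior that this commit attributes to DataFusion's `concat()`, and Java `null` stands in for SQL NULL.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the CASE/IS_NULL wrapper, using Java null for SQL NULL. */
final class ConcatNullSketch {

    /** Imitates a concat() that treats NULL operands as empty strings. */
    static String lenientConcat(List<String> operands) {
        StringBuilder sb = new StringBuilder();
        for (String operand : operands) {
            if (operand != null) {
                sb.append(operand);
            }
        }
        return sb.toString();
    }

    /** CASE WHEN any operand IS NULL THEN NULL ELSE concat(...) END. */
    static String nullPropagatingConcat(List<String> operands) {
        if (operands.size() < 2) {
            // Mirrors the adapter's early-out: fewer than two operands are left as-is.
            return lenientConcat(operands);
        }
        boolean anyNull = operands.get(0) == null;
        for (int i = 1; i < operands.size(); i++) {
            anyNull = anyNull || operands.get(i) == null;
        }
        return anyNull ? null : lenientConcat(operands);
    }

    /** Renders the left-folded condition the loop builds, for inspection. */
    static String anyNullCondition(List<String> operandNames) {
        String condition = "IS_NULL(" + operandNames.get(0) + ")";
        for (int i = 1; i < operandNames.size(); i++) {
            condition = "OR(" + condition + ", IS_NULL(" + operandNames.get(i) + "))";
        }
        return condition;
    }

    public static void main(String[] args) {
        List<String> withNull = Arrays.asList("Age: ", null);
        System.out.println(lenientConcat(withNull));         // prints Age:  (with a trailing space)
        System.out.println(nullPropagatingConcat(withNull)); // prints null
        System.out.println(anyNullCondition(List.of("a", "b", "c")));
        // prints OR(OR(IS_NULL(a), IS_NULL(b)), IS_NULL(c))
    }
}
```

The sketch evaluates each operand once for the null check and once for the concatenation, which matches the double-evaluation trade-off noted in the commit message.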

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionAnalyticsBackendPlugin.java

Lines changed: 17 additions & 8 deletions
@@ -85,12 +85,17 @@ public class DataFusionAnalyticsBackendPlugin implements AnalyticsSearchBackendP
     // path). COALESCE is the lowering target of PPL `fillnull`. CAST is required because
     // ReduceExpressionsRule.ProjectReduceExpressionsRule (in PlannerImpl) constant-folds field
     // references through equality filters into typed literals — e.g. after `where str0 = 'FURNITURE'`,
-    // the projection `fields str0` is rewritten to `CAST('FURNITURE' AS VARCHAR)`. The remaining
-    // comparison / arithmetic / logical operators are project-capable for eval-style projections.
+    // the projection `fields str0` is rewritten to `CAST('FURNITURE' AS VARCHAR)`. CONCAT is the
+    // lowering target of PPL `eval`'s `+` for strings (Calcite emits `||`, resolved to CONCAT in
+    // ScalarFunction); SAFE_CAST covers PPL `eval`'s explicit nullable `CAST(... AS ...)`
+    // expressions. The remaining comparison / arithmetic / logical operators are project-capable
+    // for eval-style projections.
     private static final Set<ScalarFunction> STANDARD_PROJECT_OPS = Set.of(
         ScalarFunction.COALESCE,
         ScalarFunction.CEIL,
         ScalarFunction.CAST,
+        ScalarFunction.CONCAT,
+        ScalarFunction.SAFE_CAST,
         ScalarFunction.SARG_PREDICATE,
         ScalarFunction.EQUALS,
         ScalarFunction.NOT_EQUALS,
@@ -180,15 +185,19 @@ public Set<AggregateCapability> aggregateCapabilities() {
 
             @Override
             public Map<ScalarFunction, ScalarFunctionAdapter> scalarFunctionAdapters() {
+                // Add new (ScalarFunction, ScalarFunctionAdapter) pairs in alphabetical order for
+                // readability — the Map.ofEntries form keeps spotless happy past the 5-pair point
+                // where Map.of becomes single-line and unreadable.
                 return Map.ofEntries(
-                    Map.entry(ScalarFunction.TIMESTAMP, new TimestampFunctionAdapter()),
-                    Map.entry(ScalarFunction.SARG_PREDICATE, new SargAdapter()),
+                    Map.entry(ScalarFunction.CONCAT, new ConcatFunctionAdapter()),
+                    Map.entry(ScalarFunction.CONVERT_TZ, new ConvertTzAdapter()),
                     Map.entry(ScalarFunction.DIVIDE, new StdOperatorRewriteAdapter("DIVIDE", SqlStdOperatorTable.DIVIDE)),
-                    Map.entry(ScalarFunction.MOD, new StdOperatorRewriteAdapter("MOD", SqlStdOperatorTable.MOD)),
                     Map.entry(ScalarFunction.LIKE, new LikeAdapter()),
-                    Map.entry(ScalarFunction.YEAR, new YearAdapter()),
-                    Map.entry(ScalarFunction.CONVERT_TZ, new ConvertTzAdapter()),
-                    Map.entry(ScalarFunction.UNIX_TIMESTAMP, new UnixTimestampAdapter())
+                    Map.entry(ScalarFunction.MOD, new StdOperatorRewriteAdapter("MOD", SqlStdOperatorTable.MOD)),
+                    Map.entry(ScalarFunction.SARG_PREDICATE, new SargAdapter()),
+                    Map.entry(ScalarFunction.TIMESTAMP, new TimestampFunctionAdapter()),
+                    Map.entry(ScalarFunction.UNIX_TIMESTAMP, new UnixTimestampAdapter()),
+                    Map.entry(ScalarFunction.YEAR, new YearAdapter())
                 );
             }
         };
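
The SAFE_CAST capability registered in STANDARD_PROJECT_OPS is described in this commit as returning NULL on conversion failure rather than throwing. A minimal sketch of that distinction, assuming a string-to-integer conversion purely for illustration: `SafeCastSketch`, `strictCast` and `safeCast` are hypothetical names, and the code models the described semantics in plain Java rather than anything DataFusion or Calcite actually executes.

```java
/** Hypothetical illustration of strict vs. safe cast semantics, with Java null as SQL NULL. */
final class SafeCastSketch {

    /** Strict cast: NULL in gives NULL out, but unparseable input throws. */
    static Integer strictCast(String value) {
        return value == null ? null : Integer.valueOf(value);
    }

    /** Safe cast: same as the strict cast, except a conversion failure yields NULL. */
    static Integer safeCast(String value) {
        try {
            return strictCast(value);
        } catch (NumberFormatException ignored) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeCast("42"));      // prints 42
        System.out.println(safeCast("not-int")); // prints null
        System.out.println(safeCast(null));      // prints null
    }
}
```

Under these semantics a NULL produced by the cast flows into a surrounding null-propagating concat and makes the whole expression NULL, which is the interaction the commit's `'literal' + CAST(int AS STRING)` test is said to cover.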
