Skip to content

Commit 8720ca7

Browse files
committed
fix(opensearch): collapse 4-arg REGEXP_REPLACE_PG_4 'g' to 3-arg at script pushdown
The `feat(core)` commit on this branch lowered PPL `patterns` to a 4-arg `REGEXP_REPLACE_PG_4(field, pattern, replacement, 'g')` so DataFusion (which defaults to first-match-only) does global replacement on the analytics-engine route. Calcite's enumerable runtime — which the V2 / Calcite-pushdown path uses to compile the serialized RexCall into Janino bytecode — has no matching `SqlFunctions.regexpReplace(String, String, String, String)` impl (only `(String, String, String, int[, ...])` variants where the 4th arg is start position, not a flags string). Janino codegen failed with `No applicable constructor/method found` for the 4-arg-with-flags call shape, breaking the patterns.md doctest (`source=apache | patterns message method=simple_pattern mode=aggregation`). Two complementary fixes: 1. `RexStandardizer.visitCall`: before serializing for pushdown, collapse `REGEXP_REPLACE_PG_4(field, pattern, replacement, 'g')` to the 3-arg `REGEXP_REPLACE_3` form. Safe because Calcite's 3-arg variant is already replace-all (same semantics as PG_4 with `g`). Only fires when the flags literal is exactly `"g"` so any future `i`/`m`/etc. use cases pass through untouched. 2. `ExtendedRelJson.toOp`: pass operand count when looking up an operator on the deserialization side so multi-arity SQL names (REGEXP_REPLACE_3 vs REGEXP_REPLACE_PG_4 vs REGEXP_REPLACE_5 all share `name="REGEXP_REPLACE"`) resolve to the right overload. Defensive — the standardizer fix above is what actually unblocks the doctest, but the resolver was picking by name alone and would have surfaced the same bug for any other overloaded builtin. Verified locally: - doctest queries (`patterns ... method=simple_pattern mode=aggregation [...]`) now return fully-tokenized output; - `CalcitePPLDashboardPatternsIT` still 1/1 PASS; - `CalcitePPLPatternsIT` still 10/15 with the same five known-pending failures (LogicalCorrelate + `_ShowNumberedToken` BRAIN cases). Signed-off-by: Kai Huang <ahkcs@amazon.com>
1 parent 394cb5a commit 8720ca7

2 files changed

Lines changed: 51 additions & 2 deletions

File tree

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/ExtendedRelJson.java

Lines changed: 29 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -460,7 +460,8 @@ RexNode toRex(RelInput relInput, @PolyNull Object o) {
460460
distinct,
461461
false);
462462
} else {
463-
final SqlOperator operator = requireNonNull(toOp(opMap), "operator");
463+
final SqlOperator operator =
464+
requireNonNull(toOp(opMap, rexOperands.size()), "operator");
464465
final RelDataType type;
465466
if (jsonType != null) {
466467
type = toType(typeFactory, jsonType);
@@ -553,6 +554,22 @@ private static <T extends Object> T get(Map<String, ? extends @Nullable Object>
553554

554555
// Copied from RelJson for the usage of custom operatorTable
555556
@Nullable SqlOperator toOp(Map<String, ? extends @Nullable Object> map) {
557+
return toOp(map, -1);
558+
}
559+
560+
/**
561+
* Look up an operator matching the {@code map}'s {@code name} / {@code kind} / {@code syntax}.
562+
*
563+
* <p>When {@code operandCount >= 0}, prefer an overload whose declared operand-count range
564+
* accepts {@code operandCount} — Calcite registers multiple operators under the same SQL name
565+
* with different arities (e.g. {@code REGEXP_REPLACE} maps to {@code REGEXP_REPLACE_3},
566+
* {@code REGEXP_REPLACE_PG_4}, and {@code REGEXP_REPLACE_5} in PostgreSQL library). Picking by
567+
* name + kind alone returns whichever overload is listed first, so a 4-arg call would bind to
568+
* the 3-arg operator and fail at runtime with {@code IllegalArgumentException: no matching
569+
* method} during {@code RexToLixTranslator} code-gen. Fall back to the first kind match when
570+
* no overload's arity range fits, preserving prior behavior for callers that pass {@code -1}.
571+
*/
572+
@Nullable SqlOperator toOp(Map<String, ? extends @Nullable Object> map, int operandCount) {
556573
// in case different operator has the same kind, check with both name and kind.
557574
String name = get(map, "name");
558575
String kind = get(map, "kind");
@@ -566,11 +583,21 @@ private static <T extends Object> T get(Map<String, ? extends @Nullable Object>
566583
sqlSyntax,
567584
operators,
568585
SqlNameMatchers.liberal());
586+
SqlOperator firstKindMatch = null;
569587
for (SqlOperator operator : operators) {
570-
if (operator.kind == sqlKind) {
588+
if (operator.kind != sqlKind) {
589+
continue;
590+
}
591+
if (firstKindMatch == null) {
592+
firstKindMatch = operator;
593+
}
594+
if (operandCount < 0 || operator.getOperandCountRange().isValidCount(operandCount)) {
571595
return operator;
572596
}
573597
}
598+
if (firstKindMatch != null) {
599+
return firstKindMatch;
600+
}
574601
String class_ = (String) map.get("class");
575602
if (class_ != null) {
576603
return AvaticaUtils.instantiatePlugin(SqlOperator.class, class_);

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/serde/RexStandardizer.java

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@
3232
import org.apache.calcite.rex.RexUtil;
3333
import org.apache.calcite.sql.SqlKind;
3434
import org.apache.calcite.sql.SqlOperator;
35+
import org.apache.calcite.sql.fun.SqlLibraryOperators;
3536
import org.apache.calcite.sql.type.BasicSqlType;
3637
import org.apache.calcite.sql.type.SqlTypeName;
3738
import org.apache.calcite.sql.type.SqlTypeUtil;
@@ -73,6 +74,27 @@ public RexNode visitCall(final RexCall call, ScriptParameterHelper helper) {
7374
// We can downgrade to still use `Sarg` literal instead of replacing it with parameter.
7475
}
7576
}
77+
// PPL's `patterns` lowering emits 4-arg `REGEXP_REPLACE_PG_4(field, pattern, replacement,
78+
// 'g')` so the analytics-engine/DataFusion path gets global replacement (DataFusion's
79+
// `regexp_replace` defaults to first-match-only without an explicit flag). Calcite's
80+
// 3-arg `REGEXP_REPLACE_3` is already replace-all in its enumerable runtime, but the PG_4
81+
// form has no matching `SqlFunctions.regexpReplace(String, String, String, String)` impl
82+
// — Calcite's runtime only ships the `(String, String, String, int[, ...])` shapes (the
83+
// 4-arg variant treats the 4th arg as start-position, not a flags string). The script
84+
// pushdown path codegen would fail with `No applicable constructor/method found`. Collapse
85+
// the 4-arg call to the 3-arg form whenever the flags literal is exactly `"g"`, which
86+
// preserves replace-all semantics on the V2 / Calcite-pushdown side.
87+
if (call.op == SqlLibraryOperators.REGEXP_REPLACE_PG_4
88+
&& call.operands.size() == 4
89+
&& call.operands.get(3) instanceof RexLiteral flagsLit
90+
&& "g".equals(flagsLit.getValueAs(String.class))) {
91+
return helper.rexBuilder
92+
.makeCall(
93+
call.getType(),
94+
SqlLibraryOperators.REGEXP_REPLACE_3,
95+
call.operands.subList(0, 3))
96+
.accept(this, helper);
97+
}
7698
// Some functions only support limited numeric type. Keep conservative here.
7799
boolean allowNumericTypeWiden =
78100
call.op.kind.belongsTo(SqlKind.BINARY_ARITHMETIC)

0 commit comments

Comments
 (0)