Skip to content

Register LENGTH, REGEXP_REPLACE, DATE_TRUNC in unified function spec#5419

Merged
dai-chen merged 2 commits into
opensearch-project:mainfrom
dai-chen:feature/register-scalar-functions
May 7, 2026
Merged

Register LENGTH, REGEXP_REPLACE, DATE_TRUNC in unified function spec#5419
dai-chen merged 2 commits into
opensearch-project:mainfrom
dai-chen:feature/register-scalar-functions

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen commented May 7, 2026

Description

Register LENGTH, REGEXP_REPLACE, and DATE_TRUNC in the unified function spec, unblocking ClickBench q28, q29, and q43 at the unified SQL parsing/validation layer. Additionally, a preCompilationRules() extension point enables late-binding function implementations (e.g., DATE_TRUNC → FLOOR) applied only at compilation time, keeping the logical plan clean for external consumers.

Note: These functions are safe to register: LENGTH and REGEXP_REPLACE have consistent signatures across all execution paths (SparkSQL, DataFusion, SQL V2, PPL), and DATE_TRUNC aligns with SparkSQL and DataFusion semantics with no conflicting definition in existing engines. See the criteria in #5346 (comment).

Related Issues

Part of #5248

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

dai-chen added 2 commits May 7, 2026 12:42
Add FunctionSpecBuilder DSL with three construction paths: delegateTo()
for existing Calcite operators, vararg() for pushdown-only UDFs, and
operands() for typed functions with optional late-binding impl.

Register LENGTH, REGEXP_REPLACE, and DATE_TRUNC in UnifiedFunctionSpec
LIBRARY category. Contribute via CoreExtension registered in
UnifiedSqlSpec.extended().

This unblocks ClickBench q28 (LENGTH), q29 (REGEXP_REPLACE), and q43
(DATE_TRUNC) at the SQL Plugin parsing/validation layer.

Signed-off-by: Chen Dai <daichen@amazon.com>
Add preCompilationRules() extension point to LanguageSpec that allows
extensions to transform the logical plan before in-memory execution
only. The plan remains clean for external consumers (Analytics Engine).

CoreExtension registers FunctionImplBindingRule which fetches impl
bindings from UnifiedFunctionSpec and rewrites custom function calls
into executable Calcite expressions at compilation time.

DATE_TRUNC now has an impl that rewrites to FLOOR(ts TO unit), making
it executable in-memory while preserving DATE_TRUNC in the logical plan
for the Analytics Engine path.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen self-assigned this May 7, 2026
@dai-chen dai-chen added enhancement New feature or request SQL labels May 7, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The visit method creates a new RexBuilder from node.getCluster().getRexBuilder() but then applies the shuttle to visited instead of node. If the shuttle modifies the node structure, the RexBuilder may reference stale cluster metadata. This can cause inconsistencies when the rewritten expressions reference types or operators from a different cluster state.

public RelNode visit(RelNode node) {
  RelNode visited = super.visit(node);
  RexBuilder rexBuilder = node.getCluster().getRexBuilder();
  return visited.accept(
      new RexShuttle() {
        @Override
        public RexNode visitCall(RexCall call) {
          RexCall visited = (RexCall) super.visitCall(call);
          return Optional.ofNullable(bindings.get(visited.getOperator()))
              .map(impl -> impl.apply(rexBuilder, visited))
              .orElse(visited);
        }
      });
Possible Issue

The impl lambda for date_trunc casts call.operands.get(0) to RexLiteral without null or type checking. If the first operand is not a literal (e.g., a column reference or expression), this throws ClassCastException at compilation time. The function should either validate the operand type or handle non-literal cases gracefully.

RexLiteral unitLiteral = (RexLiteral) call.operands.get(0);
String unit = unitLiteral.getValueAs(String.class);

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle potential duplicate operator keys

The Collectors.toMap will throw IllegalStateException if duplicate keys (operators)
exist. Since multiple function specs could theoretically share the same operator,
use toMap with a merge function to handle collisions explicitly, or validate
uniqueness upfront.

api/src/main/java/org/opensearch/sql/api/spec/core/LateBindingFunctionRule.java [28-32]

 private final Map<SqlOperator, BiFunction<RexBuilder, RexCall, RexNode>> bindings =
     UnifiedFunctionSpec.ALL_SPECS.values().stream()
         .filter(spec -> spec.getImpl() != null)
         .collect(
-            Collectors.toMap(UnifiedFunctionSpec::getOperator, UnifiedFunctionSpec::getImpl));
+            Collectors.toMap(
+                UnifiedFunctionSpec::getOperator,
+                UnifiedFunctionSpec::getImpl,
+                (existing, replacement) -> existing));
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that Collectors.toMap throws IllegalStateException on duplicate keys. Adding a merge function prevents runtime exceptions if multiple UnifiedFunctionSpec instances share the same SqlOperator. This is a valid defensive programming practice that improves robustness.

Medium

@dai-chen dai-chen merged commit 0ff1eec into opensearch-project:main May 7, 2026
46 of 48 checks passed
ahkcs added a commit to ahkcs/sql that referenced this pull request May 8, 2026
Bring the analytics-engine catch-up PR up to current upstream/main by
resolving conflicts introduced by 4 main commits since 2026-04-30:

- opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- opensearch-project#5408 (datetime type normalization)
- opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion)
- opensearch-project#5399 (FGAC-scoped SQL cursor continuation)

Resolutions:

api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec,
UnifiedSqlSpec): took main. Main is a strict superset — adds
postAnalysisRules and preCompilationRules extension points, the new
FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/
date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension
wiring on SQL spec. PR's RELEVANCE category is preserved unchanged.

api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java:
took main. Both adopt the new postAnalysisRules / preCompilationRules
hooks introduced in opensearch-project#5408 / opensearch-project#5419.

core/executor/QueryService.java: composed both sides — kept HEAD's
CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's
StageErrorHandler stage tracking. Same pattern as the original PR
resolution; both improvements are orthogonal.

plugin/SQLPlugin.java: kept HEAD imports for ExplainResponse and
QueryType (referenced by createSqlAnalyticsRouter, which only exists
on the feature branch).

plugin/transport/TransportPPLQueryAction.java: kept HEAD's
queryPlanExecutor parameter alongside main's extensionsHolder
parameter, since both are referenced in the constructor body.

legacy/plugin/RestSqlAction.java: took HEAD. The 3-way merge produced
a duplicated handleException/getRawErrorCode block; HEAD already
contained both the delegateToV2Engine refactor and the ErrorReport
unwrap from main, so HEAD is the correct superset.

Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava and
unit tests pass; spotlessCheck passes.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit to ahkcs/sql that referenced this pull request May 8, 2026
…ation

Brings the catch-up branch up to current upstream/main (4 commits since
this PR was opened) and current feature/mustang-ppl-integration (9
commits since this PR was opened), so the PR is mergeable into
feature/mustang-ppl-integration without conflicts.

Squashed (rather than two real merge commits) for the same DCO reason
the original commit was squashed: upstream commits authored by many
contributors with inconsistent or missing Signed-off-by trailers would
otherwise be brought into this PR's history.

Newer main commits absorbed (4):
- opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- opensearch-project#5408 (datetime type normalization)
- opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion)
- opensearch-project#5399 (FGAC-scoped SQL cursor continuation)

Newer feature commits absorbed (9):
- opensearch-project#5403 (analytics-engine optional dependency — major rewiring)
- opensearch-project#5407 (Carry CalciteEvalCommandIT through helper-managed index path)
- opensearch-project#5413 (Default plugins.calcite.enabled=true on unified path)
- opensearch-project#5415, opensearch-project#5416, opensearch-project#5417, opensearch-project#5409, opensearch-project#5400, opensearch-project#5406 (smaller carryovers + bumps)

Conflict resolutions (10 from main side, 3 from feature side):

api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec,
UnifiedSqlSpec): took main. Main is a strict superset — adds
postAnalysisRules and preCompilationRules extension points, the new
FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/
date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension
wiring on SQL spec. PR's RELEVANCE category is preserved unchanged.

api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java:
took main. Both adopt the new postAnalysisRules / preCompilationRules
hooks introduced in opensearch-project#5408 / opensearch-project#5419.

core/executor/QueryService.java: composed both sides — kept HEAD's
CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's
StageErrorHandler stage tracking. Same pattern as the original PR
resolution; both improvements are orthogonal.

legacy/plugin/RestSqlAction.java: took HEAD. The 3-way merge produced
a duplicated handleException/getRawErrorCode block; HEAD already
contained both the delegateToV2Engine refactor and the ErrorReport
unwrap from main, so HEAD is the correct superset.

integ-test/build.gradle: took feature. Both sides added the same
@ignore exclusion block; feature has alphabetical ordering and a more
detailed comment explaining the Gradle 9.4.1 TestEventReporterAsListener
cast bug.

integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took
feature's helper-managed test_eval provisioning (createIndexByRestClient
+ isIndexExist guard, from opensearch-project#5407) so analytics-engine compatibility runs
get a parquet-backed index. Added back PR HEAD's test_eval_agent setup
(needed by the dotted-path eval tests for opensearch-project#5351) wrapped in its own
isIndexExist guard for the same parquet-aware idempotency.

plugin/.../TransportPPLQueryAction.java: took feature. PR opensearch-project#5403 made
analytics-engine an optional dependency by moving QueryPlanExecutor
from a required constructor parameter to an @Inject(optional=true)
setter. Feature's design supersedes our prior wiring.

plugin/.../SQLPlugin.java: took feature. The same opensearch-project#5403 simplification
removed loadExtensions/EngineExtensionsHolder/executionEngineExtensions
plumbing (no longer needed once analytics-engine is optionally bound).
Feature retains the createSqlAnalyticsRouter method this PR introduced.

plugin/.../config/EngineExtensionsHolder.java: deleted. Unreferenced
after taking feature's SQLPlugin/TransportPPLQueryAction; not present
on feature branch.

Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava +
:integ-test compileTestJava all pass; unit tests pass; spotlessCheck
clean.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit to ahkcs/sql that referenced this pull request May 8, 2026
Single squashed commit on top of feature/mustang-ppl-integration that
absorbs upstream/main's commits not yet on the feature branch. Replaces
the prior catch-up squash (opensearch-project#5396 base + the original af831d3 rebase
commit) so this PR is a fast-forward into feature/mustang-ppl-integration.

Squashed (rather than a merge commit) because upstream main commits
were authored by many contributors with inconsistent or missing
Signed-off-by trailers; DCO would otherwise reject those commits.

Main commits absorbed (54 since divergence; 4 since the original
catch-up squash was made on 2026-04-30):
- opensearch-project#5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- opensearch-project#5408 (datetime type normalization)
- opensearch-project#5414 (Gradle wrapper bump + @ignore exclusion)
- opensearch-project#5399 (FGAC-scoped SQL cursor continuation)
- opensearch-project#5394 (SQL Vector Search), opensearch-project#5361 (OpenSearch 3.7), opensearch-project#5360 (unified SQL
  language spec), opensearch-project#5240 (PPL Union), and 46 others.

Conflict resolutions:

api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec,
UnifiedSqlSpec): took main. Main is a strict superset — adds
postAnalysisRules and preCompilationRules extension points, the new
FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/
date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension
wiring on SQL spec. RELEVANCE category is preserved unchanged.

api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java:
took main. Both adopt the new postAnalysisRules / preCompilationRules
hooks introduced in opensearch-project#5408 / opensearch-project#5419.

core/executor/QueryService.java: composed both sides — kept feature's
CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's
StageErrorHandler stage tracking; both improvements are orthogonal.

legacy/plugin/RestSqlAction.java: took feature. The 3-way merge
produced a duplicated handleException/getRawErrorCode block; feature
already contained both the delegateToV2Engine refactor and the
ErrorReport unwrap from main, so feature is the correct superset.

CLAUDE.md, docs/user/ppl/functions/condition.md: took main.

explain_streamstats_global{,_null_bucket}.yaml: took main (post-opensearch-project#5359
shape).

core/CalciteRelNodeVisitor + utils/PlanUtils: took main (collation
utility hoisted from CalciteRelNodeVisitor.backtrackForCollation into
PlanUtils.findInputCollation).

integ-test/CalciteNoPushdownIT.java: added CalciteMixedFieldTypeIT.

ppl/antlr/OpenSearchPPLParser.g4: added unionCommand.

ppl/calcite/CalcitePPLStreamstatsTest.java: added
testMultipleStreamstatsWithWindow.

integ-test/build.gradle: took feature. Both sides added the same
@ignore exclusion block; feature has alphabetical ordering and a more
detailed comment explaining the Gradle 9.4.1 cast bug.

integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took
feature's helper-managed test_eval provisioning (createIndexByRestClient
+ isIndexExist guard, from opensearch-project#5407) so analytics-engine compatibility runs
get a parquet-backed index. Added the test_eval_agent setup (needed by
the dotted-path eval tests for opensearch-project#5351) wrapped in its own isIndexExist
guard for the same parquet-aware idempotency.

plugin/.../TransportPPLQueryAction.java, plugin/.../SQLPlugin.java:
took feature. PR opensearch-project#5403 made analytics-engine an optional dependency by
moving QueryPlanExecutor from a required constructor parameter to an
@Inject(optional=true) setter, and removed the loadExtensions /
EngineExtensionsHolder / executionEngineExtensions plumbing. Feature
retains the createSqlAnalyticsRouter method this catch-up introduced.

plugin/.../config/EngineExtensionsHolder.java: deleted (unreferenced
post-opensearch-project#5403; not present on feature).

Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava +
:integ-test compileTestJava all pass; unit tests pass; spotlessCheck
clean.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit that referenced this pull request May 8, 2026
…5397)

Single squashed commit on top of feature/mustang-ppl-integration that
absorbs upstream/main's commits not yet on the feature branch. Replaces
the prior catch-up squash (#5396 base + the original af831d3 rebase
commit) so this PR is a fast-forward into feature/mustang-ppl-integration.

Squashed (rather than a merge commit) because upstream main commits
were authored by many contributors with inconsistent or missing
Signed-off-by trailers; DCO would otherwise reject those commits.

Main commits absorbed (54 since divergence; 4 since the original
catch-up squash was made on 2026-04-30):
- #5419 (LENGTH/REGEXP_REPLACE/DATE_TRUNC unified function spec)
- #5408 (datetime type normalization)
- #5414 (Gradle wrapper bump + @ignore exclusion)
- #5399 (FGAC-scoped SQL cursor continuation)
- #5394 (SQL Vector Search), #5361 (OpenSearch 3.7), #5360 (unified SQL
  language spec), #5240 (PPL Union), and 46 others.

Conflict resolutions:

api/spec/* (LanguageSpec, UnifiedFunctionSpec, UnifiedPplSpec,
UnifiedSqlSpec): took main. Main is a strict superset — adds
postAnalysisRules and preCompilationRules extension points, the new
FunctionSpecBuilder DSL, SCALAR category for length/regexp_replace/
date_trunc, the DatetimeExtension on PPL spec, and the CoreExtension
wiring on SQL spec. RELEVANCE category is preserved unchanged.

api/UnifiedQueryPlanner.java, api/compiler/UnifiedQueryCompiler.java:
took main. Both adopt the new postAnalysisRules / preCompilationRules
hooks introduced in #5408 / #5419.

core/executor/QueryService.java: composed both sides — kept feature's
CalciteClassLoaderHelper.withCalciteClassLoader wrapper around main's
StageErrorHandler stage tracking; both improvements are orthogonal.

legacy/plugin/RestSqlAction.java: took feature. The 3-way merge
produced a duplicated handleException/getRawErrorCode block; feature
already contained both the delegateToV2Engine refactor and the
ErrorReport unwrap from main, so feature is the correct superset.

CLAUDE.md, docs/user/ppl/functions/condition.md: took main.

explain_streamstats_global{,_null_bucket}.yaml: took main (post-#5359
shape).

core/CalciteRelNodeVisitor + utils/PlanUtils: took main (collation
utility hoisted from CalciteRelNodeVisitor.backtrackForCollation into
PlanUtils.findInputCollation).

integ-test/CalciteNoPushdownIT.java: added CalciteMixedFieldTypeIT.

ppl/antlr/OpenSearchPPLParser.g4: added unionCommand.

ppl/calcite/CalcitePPLStreamstatsTest.java: added
testMultipleStreamstatsWithWindow.

integ-test/build.gradle: took feature. Both sides added the same
@ignore exclusion block; feature has alphabetical ordering and a more
detailed comment explaining the Gradle 9.4.1 cast bug.

integ-test/.../CalciteEvalCommandIT.java: composed both sides. Took
feature's helper-managed test_eval provisioning (createIndexByRestClient
+ isIndexExist guard, from #5407) so analytics-engine compatibility runs
get a parquet-backed index. Added the test_eval_agent setup (needed by
the dotted-path eval tests for #5351) wrapped in its own isIndexExist
guard for the same parquet-aware idempotency.

plugin/.../TransportPPLQueryAction.java, plugin/.../SQLPlugin.java:
took feature. PR #5403 made analytics-engine an optional dependency by
moving QueryPlanExecutor from a required constructor parameter to an
@Inject(optional=true) setter, and removed the loadExtensions /
EngineExtensionsHolder / executionEngineExtensions plumbing. Feature
retains the createSqlAnalyticsRouter method this catch-up introduced.

plugin/.../config/EngineExtensionsHolder.java: deleted (unreferenced
post-#5403; not present on feature).

Build: :api, :core, :opensearch-sql-plugin, :legacy compileJava +
:integ-test compileTestJava all pass; unit tests pass; spotlessCheck
clean.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit to ahkcs/sql that referenced this pull request May 11, 2026
Brings the branch up to current upstream/main (54 commits since the last
catch-up at opensearch-project#5397, divergence point 513e1b2). Preserves both parents
so commit lineage from main is retained on the feature branch.

### Main commits absorbed (54 since divergence)

Highlights:
- opensearch-project#5419 Register LENGTH/REGEXP_REPLACE/DATE_TRUNC in unified function spec
- opensearch-project#5408 Normalize datetime types for unified query API
- opensearch-project#5414 Gradle wrapper 9.4.1 bump + exclude @ignore classes
- opensearch-project#5399 [BugFix] Scope SQL cursor continuation to original query indices under FGAC
- opensearch-project#5394 [Feature] Support SQL Vector Search
- opensearch-project#5361 Version bump to OpenSearch 3.7 (Jackson 2 → 3 parser API)
- opensearch-project#5360 Define unified SQL language spec with composable extensions
- opensearch-project#5240 [FEATURE] Union command in PPL
- opensearch-project#5266 Initial implementation of report-builder interface
- opensearch-project#5278 isnotnull condition support
- opensearch-project#3922, opensearch-project#4659, opensearch-project#4800, opensearch-project#5099, opensearch-project#5169, opensearch-project#5172, opensearch-project#5175, opensearch-project#5185, opensearch-project#5347, opensearch-project#5370
  and other bugfixes
- 34 others (bugfixes, doc updates, infra)

### Conflict resolutions

Seven content conflicts resolved during the merge. Resolution kept the
feature branch's analytics-engine-path semantics where main's changes
would have regressed them.

- api/.../UnifiedQueryContext.java
  Blank-line-only conflict in Builder.build(); took main's tighter
  formatting (no extra blank).

- core/.../executor/QueryService.java
  Main reformatted executeWithCalcite without the
  CalciteClassLoaderHelper.withCalciteClassLoader() wrapper. Kept the
  feature-branch wrapping — required for analytics-engine classloader
  isolation — and the matching import. Same shape applies to
  explainWithCalcite which git auto-merged correctly.

- integ-test/build.gradle
  Both sides added @Ignore-class exclusions to work around the Gradle
  9.4.1 TestEventReporterAsListener ClassCastException. Took the
  feature-branch's detailed root-cause comment, kept ASCII ordering of
  the JSONRequestIT/JoinIT and SQLFunctionsIT/ShowIT/SourceFieldIT
  entries.

- integ-test/.../CalciteEvalCommandIT.java
  Feature branch wraps test-data PUTs in
  if (!TestUtils.isIndexExist(...)) for idempotency on the
  helper-managed-index analytics-engine compatibility run; main has the
  unguarded PUTs. Kept feature's idempotency guards for both test_eval
  and test_eval_agent.

- legacy/.../RestSqlAction.java
  Feature branch added delegateToV2Engine(...) (extracted from the
  analytics-engine routing path) and ALSO has handleException /
  getRestStatus / getRawErrorCode; main added the latter three in the
  same position. Kept feature's delegateToV2Engine, kept one copy of
  the helpers, removed the duplicate set that git produced.

- plugin/.../SQLPlugin.java
  Both sides expanded the executor import block. Kept the union:
  ExecutionEngine + ExecutionEngine.ExplainResponse + QueryType (all
  three are referenced in this file).

- plugin/.../transport/TransportPPLQueryAction.java
  Main now passes engine extensions to OpenSearchPluginModule via
  extensionsHolder.engines(). Feature creates a local pluginSettings
  for the unified-query handler. Combined both — kept the
  extensions-aware module construction *and* the pluginSettings /
  pluginSettingsRef wiring.

EngineExtensionsHolder is a new file from main (opensearch-project#5298) preserved as-is.

### Verification

- ./gradlew :api:compileJava :core:compileJava :legacy:compileJava
  :opensearch-sql-plugin:compileJava :integ-test:compileTestJava
  passes locally

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit to ahkcs/sql that referenced this pull request May 11, 2026
Single squashed delivery of the long-running
feature/mustang-ppl-integration branch into main, consolidating 22
feature-branch PRs plus the conflict-resolved merge of current main.
Squashed because the feature branch's history includes commits with
missing or mismatched Signed-off-by trailers that block DCO at this
scope — the equivalent issue documented for the catch-up squashes
(opensearch-project#5397).

The feature branch f006e29 is retained for individual-commit lineage.

### What this delivers

Analytics-engine PPL integration — a new execution path that routes
Parquet-backed (non-Lucene) indices through an analytics engine while
keeping Lucene-backed indices on the existing v2 / Calcite paths.

Headline pieces:
- Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices
  hand off to the analytics-engine execution path; Lucene-backed indices
  continue through the legacy path
- Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path
- Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the
  unified parser and wires profiling metrics through the analytics path
- extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an
  OpenSearch extension via SPI
- SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork
  applied to the SQL transport, plus delegateToV2Engine extraction in
  RestSqlAction
- Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine
  plans + version bump to OpenSearch 3.7
- Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional
  runtime dep so the SQL bundle is shippable without it
- Index-setting-based routing (opensearch-project#5429) — replaces the earlier
  table-name-prefix heuristic with an authoritative index-setting check

Supporting infrastructure:
- Gradle wrapper bump to 9.4.1 (opensearch-project#5406)
- Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 /
  httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409)
- IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT
  carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417);
  CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd
  Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416)
- plugins.calcite.enabled=true defaulted on the unified query path
  (opensearch-project#5413)
- PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418)
- Calcite tolerance fixes: array() default type (opensearch-project#5421),
  containsNestedAggregator flat-leaf schemas (opensearch-project#5423)
- Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426)

### Feature-branch commits squashed (22)

opensearch-project#5429, opensearch-project#5426, opensearch-project#5423, opensearch-project#5421, opensearch-project#5418, opensearch-project#5403, opensearch-project#5417, opensearch-project#5415, opensearch-project#5416, opensearch-project#5413,
opensearch-project#5407, opensearch-project#5409, opensearch-project#5406, opensearch-project#5400, opensearch-project#5396, opensearch-project#5317, opensearch-project#5302, opensearch-project#5285, opensearch-project#5275, opensearch-project#5267,
opensearch-project#5397, opensearch-project#5286

### Main commits absorbed via the merge (54)

Brings the branch up to current upstream/main (54 commits since the
last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419,
opensearch-project#5408, opensearch-project#5414, opensearch-project#5399, opensearch-project#5394, opensearch-project#5361, opensearch-project#5360, opensearch-project#5240, opensearch-project#5266, opensearch-project#5278, plus 44
others (bugfixes, doc updates, infra).

### Conflict resolutions (7)

Resolved during the merge of main into the feature branch. Resolution
kept the feature branch's analytics-engine-path semantics where main's
changes would have regressed them.

- api/.../UnifiedQueryContext.java
  Blank-line-only conflict; took main's tighter formatting.

- core/.../executor/QueryService.java
  Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...)
  wrapping (required for analytics-engine classloader isolation) and
  the matching import.

- integ-test/build.gradle
  Kept feature's detailed root-cause comment on the Gradle 9.4.1
  TestEventReporterAsListener workaround; kept ASCII ordering of
  JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT
  entries.

- integ-test/.../CalciteEvalCommandIT.java
  Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards
  on test_eval and test_eval_agent setup (needed for the helper-managed
  index analytics-engine compatibility run).

- legacy/.../RestSqlAction.java
  Kept feature's delegateToV2Engine(...) (extracted from the
  analytics-engine routing path). Both sides added handleException /
  getRestStatus / getRawErrorCode; removed the duplicate set git
  produced.

- plugin/.../SQLPlugin.java
  Took the union of imports: ExecutionEngine +
  ExecutionEngine.ExplainResponse + QueryType.

- plugin/.../transport/TransportPPLQueryAction.java
  Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and
  feature's local pluginSettings / pluginSettingsRef wiring.

EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved
as-is.

### Compatibility / opt-in

The analytics-engine path is gated by the extendedPlugins extension
being installed (opensearch-project#5403 makes the dep optional). Clusters without
analytics-engine installed see no behavior change. Clusters with
analytics-engine installed route only Parquet-backed indices through
the new path (opensearch-project#5429 — by index setting).

### Verification

- ./gradlew :api:compileJava :core:compileJava :legacy:compileJava
  :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes
  locally

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs added a commit to ahkcs/sql that referenced this pull request May 11, 2026
Single squashed delivery of the long-running
feature/mustang-ppl-integration branch into main, consolidating 22
feature-branch PRs plus the conflict-resolved merge of current main.
Squashed because the feature branch's history includes commits with
missing or mismatched Signed-off-by trailers that block DCO at this
scope — the equivalent issue documented for the catch-up squashes
(opensearch-project#5397).

The feature branch f006e29 is retained for individual-commit lineage.

### What this delivers

Analytics-engine PPL integration — a new execution path that routes
Parquet-backed (non-Lucene) indices through an analytics engine while
keeping Lucene-backed indices on the existing v2 / Calcite paths.

Headline pieces:
- Query routing (opensearch-project#5267) — PPL queries against Parquet-backed indices
  hand off to the analytics-engine execution path; Lucene-backed indices
  continue through the legacy path
- Explain support (opensearch-project#5275) — EXPLAIN covers the analytics-engine path
- Profiling + UnifiedQueryParser (opensearch-project#5285) — migrates PPL parsing to the
  unified parser and wires profiling metrics through the analytics path
- extendedPlugins wiring (opensearch-project#5302) — analytics-engine attaches as an
  OpenSearch extension via SPI
- SQL REST endpoint integration (opensearch-project#5317) — same analytics-route fork
  applied to the SQL transport, plus delegateToV2Engine extraction in
  RestSqlAction
- Async QueryPlanExecutor (opensearch-project#5396) — async execution for analytics-engine
  plans + version bump to OpenSearch 3.7
- Optional dependency (opensearch-project#5403) — analytics-engine becomes an optional
  runtime dep so the SQL bundle is shippable without it
- Index-setting-based routing (opensearch-project#5429, opensearch-project#5432) — replaces the earlier
  table-name-prefix heuristic with an authoritative index-setting check

Supporting infrastructure:
- Gradle wrapper bump to 9.4.1 (opensearch-project#5406)
- Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 /
  httpcore5-reactive / httpclient5 (opensearch-project#5400, opensearch-project#5409)
- IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT
  carried through the helper-managed index path (opensearch-project#5407, opensearch-project#5417);
  CalciteReplaceCommandIT column-order-agnostic (opensearch-project#5415); @ignore'd
  Calcite ITs dropped from CalciteNoPushdownIT (opensearch-project#5416)
- plugins.calcite.enabled=true defaulted on the unified query path
  (opensearch-project#5413)
- PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (opensearch-project#5418)
- Calcite tolerance fixes: array() default type (opensearch-project#5421),
  containsNestedAggregator flat-leaf schemas (opensearch-project#5423)
- Sandbox deps switched to analytics-api JDK 21 surface (opensearch-project#5426)

### Feature-branch commits squashed (22)

opensearch-project#5432, opensearch-project#5429, opensearch-project#5426, opensearch-project#5423, opensearch-project#5421, opensearch-project#5418, opensearch-project#5403, opensearch-project#5417, opensearch-project#5415, opensearch-project#5416, opensearch-project#5413,
opensearch-project#5407, opensearch-project#5409, opensearch-project#5406, opensearch-project#5400, opensearch-project#5396, opensearch-project#5317, opensearch-project#5302, opensearch-project#5285, opensearch-project#5275, opensearch-project#5267,
opensearch-project#5397, opensearch-project#5286

### Main commits absorbed via the merge (54)

Brings the branch up to current upstream/main (54 commits since the
last catch-up at opensearch-project#5397, divergence point 513e1b2). Highlights: opensearch-project#5419,
opensearch-project#5408, opensearch-project#5414, opensearch-project#5399, opensearch-project#5394, opensearch-project#5361, opensearch-project#5360, opensearch-project#5240, opensearch-project#5266, opensearch-project#5278, plus 44
others (bugfixes, doc updates, infra).

### Conflict resolutions (7)

Resolved during the merge of main into the feature branch. Resolution
kept the feature branch's analytics-engine-path semantics where main's
changes would have regressed them.

- api/.../UnifiedQueryContext.java
  Blank-line-only conflict; took main's tighter formatting.

- core/.../executor/QueryService.java
  Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...)
  wrapping (required for analytics-engine classloader isolation) and
  the matching import.

- integ-test/build.gradle
  Kept feature's detailed root-cause comment on the Gradle 9.4.1
  TestEventReporterAsListener workaround; kept ASCII ordering of
  JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT
  entries.

- integ-test/.../CalciteEvalCommandIT.java
  Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards
  on test_eval and test_eval_agent setup (needed for the helper-managed
  index analytics-engine compatibility run).

- legacy/.../RestSqlAction.java
  Kept feature's delegateToV2Engine(...) (extracted from the
  analytics-engine routing path). Both sides added handleException /
  getRestStatus / getRawErrorCode; removed the duplicate set git
  produced.

- plugin/.../SQLPlugin.java
  Took the union of imports: ExecutionEngine +
  ExecutionEngine.ExplainResponse + QueryType.

- plugin/.../transport/TransportPPLQueryAction.java
  Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and
  feature's local pluginSettings / pluginSettingsRef wiring.

EngineExtensionsHolder.java is a new file from main (opensearch-project#5298) preserved
as-is.

### Compatibility / opt-in

The analytics-engine path is gated by the extendedPlugins extension
being installed (opensearch-project#5403 makes the dep optional). Clusters without
analytics-engine installed see no behavior change. Clusters with
analytics-engine installed route only Parquet-backed indices through
the new path (opensearch-project#5429 — by index setting).

### Verification

- ./gradlew :api:compileJava :core:compileJava :legacy:compileJava
  :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes
  locally

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>
ahkcs added a commit that referenced this pull request May 11, 2026
* Land analytics-engine PPL integration into main

Single squashed delivery of the long-running
feature/mustang-ppl-integration branch into main, consolidating 22
feature-branch PRs plus the conflict-resolved merge of current main.
Squashed because the feature branch's history includes commits with
missing or mismatched Signed-off-by trailers that block DCO at this
scope — the equivalent issue documented for the catch-up squashes
(#5397).

The feature branch f006e29 is retained for individual-commit lineage.

### What this delivers

Analytics-engine PPL integration — a new execution path that routes
Parquet-backed (non-Lucene) indices through an analytics engine while
keeping Lucene-backed indices on the existing v2 / Calcite paths.

Headline pieces:
- Query routing (#5267) — PPL queries against Parquet-backed indices
  hand off to the analytics-engine execution path; Lucene-backed indices
  continue through the legacy path
- Explain support (#5275) — EXPLAIN covers the analytics-engine path
- Profiling + UnifiedQueryParser (#5285) — migrates PPL parsing to the
  unified parser and wires profiling metrics through the analytics path
- extendedPlugins wiring (#5302) — analytics-engine attaches as an
  OpenSearch extension via SPI
- SQL REST endpoint integration (#5317) — same analytics-route fork
  applied to the SQL transport, plus delegateToV2Engine extraction in
  RestSqlAction
- Async QueryPlanExecutor (#5396) — async execution for analytics-engine
  plans + version bump to OpenSearch 3.7
- Optional dependency (#5403) — analytics-engine becomes an optional
  runtime dep so the SQL bundle is shippable without it
- Index-setting-based routing (#5429, #5432) — replaces the earlier
  table-name-prefix heuristic with an authoritative index-setting check

Supporting infrastructure:
- Gradle wrapper bump to 9.4.1 (#5406)
- Jar-hell exclusions for arrow-flight-rpc / httpcore5-h2 /
  httpcore5-reactive / httpclient5 (#5400, #5409)
- IT plumbing: CalciteEvalCommandIT / CalciteFieldFormatCommandIT
  carried through the helper-managed index path (#5407, #5417);
  CalciteReplaceCommandIT column-order-agnostic (#5415); @ignore'd
  Calcite ITs dropped from CalciteNoPushdownIT (#5416)
- plugins.calcite.enabled=true defaulted on the unified query path
  (#5413)
- PPL_REX_MAX_MATCH_LIMIT bridged into UnifiedQueryContext (#5418)
- Calcite tolerance fixes: array() default type (#5421),
  containsNestedAggregator flat-leaf schemas (#5423)
- Sandbox deps switched to analytics-api JDK 21 surface (#5426)

### Feature-branch commits squashed (22)

#5432, #5429, #5426, #5423, #5421, #5418, #5403, #5417, #5415, #5416, #5413,
#5407, #5409, #5406, #5400, #5396, #5317, #5302, #5285, #5275, #5267,
#5397, #5286

### Main commits absorbed via the merge (54)

Brings the branch up to current upstream/main (54 commits since the
last catch-up at #5397, divergence point 513e1b2). Highlights: #5419,
#5408, #5414, #5399, #5394, #5361, #5360, #5240, #5266, #5278, plus 44
others (bugfixes, doc updates, infra).

### Conflict resolutions (7)

Resolved during the merge of main into the feature branch. Resolution
kept the feature branch's analytics-engine-path semantics where main's
changes would have regressed them.

- api/.../UnifiedQueryContext.java
  Blank-line-only conflict; took main's tighter formatting.

- core/.../executor/QueryService.java
  Kept feature's CalciteClassLoaderHelper.withCalciteClassLoader(...)
  wrapping (required for analytics-engine classloader isolation) and
  the matching import.

- integ-test/build.gradle
  Kept feature's detailed root-cause comment on the Gradle 9.4.1
  TestEventReporterAsListener workaround; kept ASCII ordering of
  JSONRequestIT / JoinIT and SQLFunctionsIT / ShowIT / SourceFieldIT
  entries.

- integ-test/.../CalciteEvalCommandIT.java
  Kept feature's if (!TestUtils.isIndexExist(...)) idempotency guards
  on test_eval and test_eval_agent setup (needed for the helper-managed
  index analytics-engine compatibility run).

- legacy/.../RestSqlAction.java
  Kept feature's delegateToV2Engine(...) (extracted from the
  analytics-engine routing path). Both sides added handleException /
  getRestStatus / getRawErrorCode; removed the duplicate set git
  produced.

- plugin/.../SQLPlugin.java
  Took the union of imports: ExecutionEngine +
  ExecutionEngine.ExplainResponse + QueryType.

- plugin/.../transport/TransportPPLQueryAction.java
  Combined main's OpenSearchPluginModule(extensionsHolder.engines()) and
  feature's local pluginSettings / pluginSettingsRef wiring.

EngineExtensionsHolder.java is a new file from main (#5298) preserved
as-is.

### Compatibility / opt-in

The analytics-engine path is gated by the extendedPlugins extension
being installed (#5403 makes the dep optional). Clusters without
analytics-engine installed see no behavior change. Clusters with
analytics-engine installed route only Parquet-backed indices through
the new path (#5429 — by index setting).

### Verification

- ./gradlew :api:compileJava :core:compileJava :legacy:compileJava
  :opensearch-sql-plugin:compileJava :integ-test:compileTestJava passes
  locally

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>

* Address @penghuo: revert stray blank line in doctest/build.gradle

After 'apply plugin: opensearch.testclusters', one blank line is enough
— restoring the single-blank spacing to match upstream/main.

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Co-authored-by: bowenlan-amzn <bowenlan23@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants