Extend V2 SQL parser with IN/EXISTS subqueries for unified query path#5448
Conversation
PR Reviewer Guide 🔍(Review updated until commit bd3074d)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to bd3074d Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 62a146b
Suggestions up to commit cc181cd
|
cc181cd to
62a146b
Compare
|
Persistent review updated to latest commit 62a146b |
Add grammar rules (inSubqueryPredicate, existsSubqueryExpressionAtom) and wire them through ExtendedAstExpressionBuilder to produce InSubquery and ExistsSubquery AST nodes for the Calcite-based unified query path. Base AstExpressionBuilder throws SyntaxCheckException to preserve legacy engine fallback. AstBuilder now uses createExpressionBuilder() factory method to allow subclass customization. Also add Alias handling in CalciteRelNodeVisitor.expandProjectFields required for any non-SELECT * query in the unified path. Signed-off-by: Chen Dai <daichen@amazon.com>
62a146b to
bd3074d
Compare
|
Persistent review updated to latest commit bd3074d |
| public UnresolvedExpression visitExistsSubqueryExpressionAtom( | ||
| ExistsSubqueryExpressionAtomContext ctx) { | ||
| throw new SyntaxCheckException( | ||
| "EXISTS subquery is not supported in the V2 SQL engine. Falling back to legacy engine."); |
There was a problem hiding this comment.
suggestion: Change these to NotSupportedException or similar to make downstream handling more precise.
It's not technically a syntax error to use an unsupported operation and I think it'd be good to distinguish "the engine can't handle your query" from "your query is bad."
Might need to make a new exception type if we don't have one. If we use the design from #5448 (comment), a clearer option might be NotEnabledException.
There was a problem hiding this comment.
yeah, it make senses but that introduces many changes for this temporary workaround. Probably we can defer this refactor?
| public UnresolvedExpression visitExistsSubqueryExpressionAtom( | ||
| ExistsSubqueryExpressionAtomContext ctx) { | ||
| throw new SyntaxCheckException( | ||
| "EXISTS subquery is not supported in the V2 SQL engine. Falling back to legacy engine."); |
There was a problem hiding this comment.
suggestion: I would prefer we don't explicitly state "Falling back to legacy engine" in the exception since that's really the exception handler's concern
There was a problem hiding this comment.
I thought this is internal and won't be returned. Let me double check. Thanks!
| * visit subquery plan nodes. Must be created via {@link #createExpressionBuilder()} because the | ||
| * enclosing {@code this} reference is not available during {@code super()} construction. | ||
| */ | ||
| private class ExtendedAstExpressionBuilder extends AstExpressionBuilder { |
There was a problem hiding this comment.
issue: If this is only relevant to legacy we should call this LegacyExpressionBuilder.
We hope that eventually the primary expression builder will have more features than legacy. I also suspect without proof that if we start splitting out the builder, we'll eventually have yet another builder with a different set of features.
There was a problem hiding this comment.
V2's expression builder is AstExpressionBuilder so this one is ExtendedAstExpressionBuilder which extends V2's?
There was a problem hiding this comment.
I was going off the exceptions saying 'falling back to legacy'
There was a problem hiding this comment.
- Execution path 1: V2 grammar/AstBuilder + Legacy (will be migrating to path 2)
- Execution path 2: V2 extended grammar/AstBuilder + CalciteRelNodeVisitor
| * visit subquery plan nodes. Must be created via {@link #createExpressionBuilder()} because the | ||
| * enclosing {@code this} reference is not available during {@code super()} construction. | ||
| */ | ||
| private class ExtendedAstExpressionBuilder extends AstExpressionBuilder { |
There was a problem hiding this comment.
suggestion (architecture): I would prefer if we put these additional predicates behind configured flags instead of making a whole new builder for them.
This makes it easier to find where AST visit code is defined since it's all in the same place, and makes it easy to see at a glance under what conditions different features are enabled. It also means that we can easily have multiple types of builders share functionality: if we eventually need another subclass of AstExpressionBuilder we may end up needing to duplicate visit code too.
One way might be to have AstExpressionBuilder take a enableSubqueryPredicates flag as part of its construction (or enableExtendedFeatures, or enableLegacyFeatures...). Another could be supplying an Engine enum and using that.
There was a problem hiding this comment.
To give you some context: this design is a temporary workaround to enable SQL V2 + Calcite on the Analytics Engine path with minimal impact on existing SQL. Given that, I'd like to keep the change surface as small as possible and avoid introducing a flag-based mechanism.
| .assertPlan( | ||
| """ | ||
| LogicalProject(name=[$1]) | ||
| LogicalFilter(condition=[IN($2, { |
There was a problem hiding this comment.
Does this logical plan can be rewrite as Join? PPL has similar execution path?
There was a problem hiding this comment.
Yes, confirmed that PPL generates similar RelNode plan: https://github.com/opensearch-project/sql/blob/main/ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLInSubqueryTest.java#L38
PR opensearch-project#5448 (parent commit b10d541) introduced SQL IN/EXISTS subquery support that produces a Project AST whose project list contains Alias expressions. The 'Lower structured search predicates' commit (c18b11e) on this PR accidentally dropped the case Alias branch from CalciteRelNodeVisitor.expandProjectFields, so the subquery's project list hit the default arm and threw 'Unexpected expression type in project list: Alias'. Restoring the one-line branch fixes UnifiedQueryPlannerSqlV2Test testInSubquery / testNotInSubquery / testExistsSubquery / testNotExistsSubquery (the four macOS unit-test failures on this PR). Signed-off-by: Kai Huang <ahkcs@amazon.com>
…search-project#5448) Add grammar rules (inSubqueryPredicate, existsSubqueryExpressionAtom) and wire them through ExtendedAstExpressionBuilder to produce InSubquery and ExistsSubquery AST nodes for the Calcite-based unified query path. Base AstExpressionBuilder throws SyntaxCheckException to preserve legacy engine fallback. AstBuilder now uses createExpressionBuilder() factory method to allow subclass customization. Also add Alias handling in CalciteRelNodeVisitor.expandProjectFields required for any non-SELECT * query in the unified path. Signed-off-by: Chen Dai <daichen@amazon.com>
Description
Extend the V2 SQL ANTLR grammar and add
ExtendedAstExpressionBuilderto supportINandEXISTSsubqueries in the unified Calcite-based query path. The baseAstExpressionBuildercontinues to throwSyntaxCheckExceptionfor these statements, preserving the legacy engine fallback in production.Notes
CalciteRelNodeVisitorhere to unblock tests using SELECT with specific fieldsEXISTSsubquery only works for PartiQL nested fields in the legacy engine, thus no IT added herePlanned PRs
JOINfor unified query path #5446CalciteRelNodeVisitorissues identified during V2 AST integrationRelated Issues
Part of #5248
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.