-
Notifications
You must be signed in to change notification settings - Fork 208
Extend V2 SQL parser with IN/EXISTS subqueries for unified query path
#5448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,19 +5,25 @@ | |
|
|
||
| package org.opensearch.sql.api.parser; | ||
|
|
||
| import static org.opensearch.sql.ast.dsl.AstDSL.existsSubquery; | ||
| import static org.opensearch.sql.ast.dsl.AstDSL.inSubquery; | ||
| import static org.opensearch.sql.ast.dsl.AstDSL.join; | ||
|
|
||
| import java.util.Optional; | ||
| import org.antlr.v4.runtime.tree.ParseTree; | ||
| import org.opensearch.sql.ast.expression.Not; | ||
| import org.opensearch.sql.ast.expression.UnresolvedExpression; | ||
| import org.opensearch.sql.ast.statement.Query; | ||
| import org.opensearch.sql.ast.statement.Statement; | ||
| import org.opensearch.sql.ast.tree.Join.JoinType; | ||
| import org.opensearch.sql.ast.tree.UnresolvedPlan; | ||
| import org.opensearch.sql.sql.antlr.SQLSyntaxParser; | ||
| import org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser; | ||
| import org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.ExistsSubqueryExpressionAtomContext; | ||
| import org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.InSubqueryPredicateContext; | ||
| import org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.JoinClauseContext; | ||
| import org.opensearch.sql.sql.parser.AstBuilder; | ||
| import org.opensearch.sql.sql.parser.AstExpressionBuilder; | ||
| import org.opensearch.sql.sql.parser.AstStatementBuilder; | ||
|
|
||
| /** SQL query parser that produces {@link UnresolvedPlan} using the V2 ANTLR grammar. */ | ||
|
|
@@ -52,6 +58,11 @@ private static class ExtendedAstBuilder extends AstBuilder { | |
| super(query); | ||
| } | ||
|
|
||
| @Override | ||
| protected AstExpressionBuilder createExpressionBuilder() { | ||
| return new ExtendedAstExpressionBuilder(); | ||
| } | ||
|
|
||
| @Override | ||
| public UnresolvedPlan visitJoinClause(JoinClauseContext ctx) { | ||
| JoinType joinType = toJoinType(ctx); | ||
|
|
@@ -69,5 +80,27 @@ private JoinType toJoinType(JoinClauseContext ctx) { | |
| default -> JoinType.INNER; | ||
| }; | ||
| } | ||
|
|
||
| /** | ||
| * Expression builder with IN/EXISTS subquery support. Accesses the enclosing AstBuilder to | ||
| * visit subquery plan nodes. Must be created via {@link #createExpressionBuilder()} because the | ||
| * enclosing {@code this} reference is not available during {@code super()} construction. | ||
| */ | ||
| private class ExtendedAstExpressionBuilder extends AstExpressionBuilder { | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. suggestion (architecture): I would prefer if we put these additional predicates behind configured flags instead of making a whole new builder for them. This makes it easier to find where AST visit code is defined since it's all in the same place, and makes it easy to see at a glance under what conditions different features are enabled. It also means that we can easily have multiple types of builders share functionality: if we eventually need another subclass of One way might be to have
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To give you some context: this design is a temporary workaround to enable SQL V2 + Calcite on the Analytics Engine path with minimal impact on existing SQL. Given that, I'd like to keep the change surface as small as possible and avoid introducing a flag-based mechanism. |
||
|
|
||
| @Override | ||
| public UnresolvedExpression visitInSubqueryPredicate(InSubqueryPredicateContext ctx) { | ||
| UnresolvedPlan subquery = ExtendedAstBuilder.this.visit(ctx.querySpecification()); | ||
| UnresolvedExpression inExpr = inSubquery(subquery, visit(ctx.predicate())); | ||
| return (ctx.NOT() != null) ? new Not(inExpr) : inExpr; | ||
| } | ||
|
|
||
| @Override | ||
| public UnresolvedExpression visitExistsSubqueryExpressionAtom( | ||
| ExistsSubqueryExpressionAtomContext ctx) { | ||
| UnresolvedPlan subquery = ExtendedAstBuilder.this.visit(ctx.querySpecification()); | ||
| return existsSubquery(subquery); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -142,4 +142,80 @@ public void testJoinWithFilterAndOrderBy() { | |
| LogicalTableScan(table=[[catalog, departments]]) | ||
| """); | ||
| } | ||
|
|
||
| @Test | ||
| public void testInSubquery() { | ||
| givenQuery( | ||
| """ | ||
| SELECT name FROM catalog.employees | ||
| WHERE age IN (SELECT age FROM catalog.departments WHERE dept_name = 'Engineering') | ||
| """) | ||
| .assertPlan( | ||
| """ | ||
| LogicalProject(name=[$1]) | ||
| LogicalFilter(condition=[IN($2, { | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this logical plan can be rewrite as Join? PPL has similar execution path?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, confirmed that PPL generates similar RelNode plan: https://github.com/opensearch-project/sql/blob/main/ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLInSubqueryTest.java#L38 |
||
| LogicalProject(age=[$cor0.age]) | ||
| LogicalFilter(condition=[=($1, 'Engineering')]) | ||
| LogicalTableScan(table=[[catalog, departments]]) | ||
| })], variablesSet=[[$cor0]]) | ||
| LogicalTableScan(table=[[catalog, employees]]) | ||
| """); | ||
| } | ||
|
|
||
| @Test | ||
| public void testExistsSubquery() { | ||
| givenQuery( | ||
| """ | ||
| SELECT name FROM catalog.employees | ||
| WHERE EXISTS (SELECT 1 FROM catalog.departments WHERE dept_id = age) | ||
| """) | ||
| .assertPlan( | ||
| """ | ||
| LogicalProject(name=[$1]) | ||
| LogicalFilter(condition=[EXISTS({ | ||
| LogicalProject(1=[1]) | ||
| LogicalFilter(condition=[=($0, $cor0.age)]) | ||
| LogicalTableScan(table=[[catalog, departments]]) | ||
| })], variablesSet=[[$cor0]]) | ||
| LogicalTableScan(table=[[catalog, employees]]) | ||
| """); | ||
| } | ||
|
|
||
| @Test | ||
| public void testNotInSubquery() { | ||
| givenQuery( | ||
| """ | ||
| SELECT name FROM catalog.employees | ||
| WHERE age NOT IN (SELECT age FROM catalog.departments WHERE dept_name = 'Engineering') | ||
| """) | ||
| .assertPlan( | ||
| """ | ||
| LogicalProject(name=[$1]) | ||
| LogicalFilter(condition=[NOT(IN($2, { | ||
| LogicalProject(age=[$cor0.age]) | ||
| LogicalFilter(condition=[=($1, 'Engineering')]) | ||
| LogicalTableScan(table=[[catalog, departments]]) | ||
| }))], variablesSet=[[$cor0]]) | ||
| LogicalTableScan(table=[[catalog, employees]]) | ||
| """); | ||
| } | ||
|
|
||
| @Test | ||
| public void testNotExistsSubquery() { | ||
| givenQuery( | ||
| """ | ||
| SELECT name FROM catalog.employees | ||
| WHERE NOT EXISTS (SELECT 1 FROM catalog.departments WHERE dept_id = age) | ||
| """) | ||
| .assertPlan( | ||
| """ | ||
| LogicalProject(name=[$1]) | ||
| LogicalFilter(condition=[NOT(EXISTS({ | ||
| LogicalProject(1=[1]) | ||
| LogicalFilter(condition=[=($0, $cor0.age)]) | ||
| LogicalTableScan(table=[[catalog, departments]]) | ||
| }))], variablesSet=[[$cor0]]) | ||
| LogicalTableScan(table=[[catalog, employees]]) | ||
| """); | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -28,13 +28,15 @@ | |
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.DataTypeFunctionCallContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.DateLiteralContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.DistinctCountFunctionCallContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.ExistsSubqueryExpressionAtomContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.ExtractFunctionCallContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.FilterClauseContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.FilteredAggregationFunctionCallContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.FunctionArgContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.GetFormatFunctionCallContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.HighlightFunctionCallContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.InPredicateContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.InSubqueryPredicateContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.IsNullPredicateContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.LikePredicateContext; | ||
| import static org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser.MathExpressionAtomContext; | ||
|
|
@@ -82,6 +84,7 @@ | |
| import org.opensearch.sql.ast.dsl.AstDSL; | ||
| import org.opensearch.sql.ast.expression.*; | ||
| import org.opensearch.sql.ast.tree.Sort.SortOption; | ||
| import org.opensearch.sql.common.antlr.SyntaxCheckException; | ||
| import org.opensearch.sql.common.utils.StringUtils; | ||
| import org.opensearch.sql.expression.function.BuiltinFunctionName; | ||
| import org.opensearch.sql.sql.antlr.parser.OpenSearchSQLParser; | ||
|
|
@@ -668,4 +671,17 @@ private List<UnresolvedExpression> getExtractFunctionArguments(ExtractFunctionCa | |
| visitFunctionArg(ctx.extractFunction().functionArg())); | ||
| return args; | ||
| } | ||
|
|
||
| @Override | ||
| public UnresolvedExpression visitInSubqueryPredicate(InSubqueryPredicateContext ctx) { | ||
| throw new SyntaxCheckException( | ||
| "IN subquery is not supported in the V2 SQL engine. Falling back to legacy engine."); | ||
| } | ||
|
|
||
| @Override | ||
| public UnresolvedExpression visitExistsSubqueryExpressionAtom( | ||
| ExistsSubqueryExpressionAtomContext ctx) { | ||
| throw new SyntaxCheckException( | ||
| "EXISTS subquery is not supported in the V2 SQL engine. Falling back to legacy engine."); | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. suggestion: Change these to It's not technically a syntax error to use an unsupported operation and I think it'd be good to distinguish "the engine can't handle your query" from "your query is bad." Might need to make a new exception type if we don't have one. If we use the design from #5448 (comment), a clearer option might be
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah, it make senses but that introduces many changes for this temporary workaround. Probably we can defer this refactor?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. suggestion: I would prefer we don't explicitly state "Falling back to legacy engine" in the exception since that's really the exception handler's concern
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought this is internal and won't be returned. Let me double check. Thanks! |
||
| } | ||
| } | ||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: If this is only relevant to legacy we should call this
LegacyExpressionBuilder.We hope that eventually the primary expression builder will have more features than legacy. I also suspect without proof that if we start splitting out the builder, we'll eventually have yet another builder with a different set of features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
V2's expression builder is
AstExpressionBuilderso this one isExtendedAstExpressionBuilderwhich extends V2's?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going off the exceptions saying 'falling back to legacy'
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.