feat: support InSubquery and Exists in Projection expressions#21363
feat: support InSubquery and Exists in Projection expressions#21363crm26 wants to merge 5 commits intoapache:mainfrom
Conversation
Previously, `DecorrelatePredicateSubquery` only handled InSubquery and Exists nodes in Filter (WHERE/HAVING). When they appeared in Projection expressions (CASE WHEN, COALESCE, bare boolean SELECT), the physical planner errored with "Physical plan does not support logical expression InSubquery(...)". This adds a Projection match arm following the same pattern as ScalarSubqueryToJoin's Projection handler. InSubquery and Exists in Projection expressions are decorrelated into LeftMark joins, with the mark column aliased to preserve the original schema. When decorrelation fails (e.g., correlated subquery with LIMIT), the plan is returned unchanged — matching ScalarSubqueryToJoin's bail-out semantics. Patterns now supported: - CASE WHEN id IN (SELECT ...) THEN ... ELSE ... END - CASE WHEN EXISTS (SELECT ...) THEN ... END - NOT IN / NOT EXISTS in CASE - id IN (SELECT ...) as bare boolean in SELECT - Correlated IN/EXISTS in expressions - Multiple subqueries in one expression - COALESCE with IN subquery Includes sqllogictest coverage (8 test cases) and plan-shape unit test for the bail-out path.
|
@crm26 Thanks for this, I'll take a look shortly! Can you fix the test and fmt CI failures? |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
neilconway
left a comment
There was a problem hiding this comment.
Overall this looks great!
One minor quibble: I notice the name of the file and various identifiers refer to "Predicate", which will no longer be accurate. The common thread is that we are rewriting only IN and EXISTS subqueries, so maybe some name involving that -- like ExistenceSubqueryToJoin or ExistsInSubqueryToJoin. But if we're going to change that, let's do it in a followup PR.
- Per-expression early bail-out: check has_subquery(&rewritten) immediately after each rewrite instead of processing all exprs then checking at end. Preserves all-or-nothing semantics with earlier termination. - Add "Optimization:" prefix to alias preservation comment to clarify it is not required for correctness. - Add test projection_mixed_decorrelatable_and_non_bails_out covering one decorrelatable + one non-decorrelatable subquery in the same projection (exercises the partial-rewrite-then-bail path). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks for the thorough review, @neilconway! Addressed all three points in the latest push (commit 12a12bc):
Validation: 59 decorrelate tests pass (was 58), full optimizer suite (665 + 26 + 5) passes, fmt + clippy -D warnings clean, subquery.slt passes. |
The test at predicates.slt:845 asserted that `NULL IN (SELECT ...)` in a projection would fail with "Physical plan does not support logical expression InSubquery". With decorrelation of InSubquery in projections now supported, the query correctly returns (false, true) instead of erroring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Found and fixed the CI failure, pushed as 9288448. The My earlier validation gap: I only ran
CI on the new commit is currently in On the rename suggestion ("Predicate" → "ExistsInSubquery") — happy to do that, agreed a followup PR makes sense to keep this one focused. |
|
@crm26 Thanks for iterating on this! The comment I was suggesting you add "Optimization:" to was actually a different one :) My suggestion was attached to the "// Skip if no predicate subqueries in any projection expression" check at the top of the "Projection" match arm. I noticed you didn't address my suggestion to simplify the cloning / early-return code path; let me know if what I'm suggesting isn't clear. I don't have permissions to run CI myself, but perhaps @alamb or someone else can help. BTW I personally find it helpful to "Resolve comment" on review comments that you believe have been addressed, so we can more easily see what remains open. |
Previous push addressed the per-expression bail-out correctly but got two items wrong: - Added "Optimization:" prefix to the alias preservation comment instead of the skip check at the top of the Projection arm (the one neilconway actually pointed at). - Did not apply the ScalarSubqueryToJoin-style cloning simplification. This commit fixes both: - Moved the "Optimization:" prefix to the correct line and removed the stray prefix on the alias comment. - Replaced Arc::clone + Arc::unwrap_or_clone with projection.input.as_ref().clone(), mirroring ScalarSubqueryToJoin. - Dropped the original_input variable and the Projection::try_new_with_schema reconstruction on bail-out; the bail-out now returns Ok(Transformed::no(LogicalPlan::Projection( projection))) directly since projection is still fully owned. - Removed the now-unused Projection import. The loop clones projection.expr upfront so it can consume owned expressions without holding a borrow on projection, which lets the bail-out move projection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@neilconway You were right — I got two of the three wrong in the last push. Apologies for the noise. Pushed fixes in
Validation I ran locally this time (full suite, not just the targeted file):
Will resolve the review threads in the UI once CI confirms. The "Predicate" → "ExistsInSubquery" rename is noted for a followup PR as you suggested. |
Summary
DecorrelatePredicateSubqueryonly handledInSubqueryandExistsinFilternodes. When they appeared in Projection expressions (CASE, COALESCE, bare boolean SELECT), the physical planner errored:This adds a
Projectionmatch arm following the same pattern asScalarSubqueryToJoin's Projection handler.What's fixed
All of these now work:
Design
ScalarSubqueryToJoin's established Projection handler patternrewrite_inner_subqueries+mark_join— no new decorrelation logicTests
in_subquery_projection.slt)