[BugFix] Reject outer WHERE on vectorSearch() subqueries to prevent silently dropped filters#1
Closed
mengweieric wants to merge 1 commit into
Closed
Conversation
When vectorSearch() is wrapped in a subquery and the outer query has its
own WHERE clause, for example:
SELECT * FROM (SELECT v.firstname, v.state
FROM vectorSearch(table='idx', field='emb',
vector='[1,2]', option='k=5') AS v) t
WHERE t.state = 'TX'
the outer predicate never reaches the push-down contract. The plan shape
after analysis is LogicalFilter over LogicalProject over the scan
builder, so the PUSH_DOWN_FILTER rule (which matches filter directly
above scanBuilder) does not fire. The filter is then applied in memory
after k-NN has already returned top-k documents ranked by vector
distance, which can silently yield zero rows with no explanation.
This commit adds a post-optimization validation hook on TableScanBuilder
(default no-op) and invokes it from Planner.plan() once all push-down
rules have settled. VectorSearchIndexScanBuilder overrides the hook to
walk the fully optimized plan and reject the outer-WHERE-over-subquery
shape: a LogicalFilter whose descendant chain reaches the scan builder
through one or more LogicalProject nodes. A filter directly above the
scan builder (WHERE on vectorSearch() itself) is explicitly allowed
because that path has already been exercised by push-down.
The error message names the shape and tells the user to move the
predicate inside the subquery so it is applied during the k-NN search.
Tests:
- VectorSearchIndexScanBuilderTest adds six cases covering the bad
shape, nested subqueries, and the three positive controls (filter
directly on scanBuilder, inner filter wrapped in outer project, no
filter at all) plus a bare-scanBuilder defensive baseline.
- VectorSearchSubqueryIT adds integration tests for the exact tracker
query shape, the with-LIMIT variant, the _explain path, and positive
controls for inner WHERE and no-WHERE subqueries.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Owner
Author
|
Closing and reopening against upstream opensearch-project/sql. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the P0 tracker item Outer WHERE on subquery silently dropped from DSL: when
vectorSearch()is wrapped in a subquery and the outer query has its ownWHEREclause, the outer predicate never reaches the push-down contract. The plan shape after analysis isLogicalFilteroverLogicalProjectover the scan builder, so thePUSH_DOWN_FILTERrule (which matches filter directly abovescanBuilder) does not fire. The filter is then applied in memory after k-NN has already returned top-k documents ranked by vector distance, which can silently yield zero rows with no explanation.Failing shape
Fix
Per the tracker (either reject vectorSearch() inside subqueries with outer WHERE, or document clearly) this PR chooses rejection as the safer preview behavior.
validatePlan(LogicalPlan root)hook onTableScanBuilder(default no-op). Scan builders can now inspect their ancestors once all push-down rules have settled.Planner.plan()so every scan builder in the optimized plan gets a chance to validate.VectorSearchIndexScanBuilderoverrides the hook to walk the plan and reject the outer-WHERE-over-subquery shape: aLogicalFilterwhose descendant chain reaches this scan builder through one or moreLogicalProjectnodes (the subquery-boundary marker). A filter directly above the scan builder (WHERE onvectorSearch()itself) is explicitly allowed because that path has already been exercised by push-down.Test plan
VectorSearchIndexScanBuilderTestcovering the bad shape, nested subqueries, and three positive controls (filter directly on scanBuilder, inner filter wrapped in outer project, no filter at all) plus a bare-scanBuilder defensive baseline.VectorSearchSubqueryITcovering the exact tracker query shape, the with-LIMIT variant, the_explainpath, and positive controls for inner WHERE and no-WHERE subqueries.VectorSearchIT(18/18) andVectorSearchExplainIT(11/11) still green../gradlew spotlessCheckclean../gradlew :core:testand:opensearch:testgreen.