Skip to content

Commit 154df52

Browse files
committed
Reject OFFSET, WHERE on _score, and script subtrees under filter_type=efficient
Three bug fixes in VectorSearchQueryBuilder that close silent-data-loss paths on vectorSearch() relations: 1. OFFSET on vectorSearch(): pushDownLimit now rejects any LogicalLimit with a non-zero offset. The parent path would push `from: <offset>` into the OpenSearch request, silently shifting the top-k window and dropping the highest-scoring results. LogicalSort carries only a count (no offset), so no change is needed on the sort path. 2. WHERE on _score: pushDownFilter walks the condition with an ExpressionNodeVisitor before compiling to DSL and rejects any ReferenceExpression whose attr matches "_score" case-insensitively. Previously the synthetic _score column compiled to a range query on a non-existent stored field, which returned zero rows at the cluster with no error. Users who want a score floor should use option='min_score=...'. 3. Script subtrees under filter_type=efficient: after building the WHERE QueryBuilder, the EFFICIENT branch recursively scans for any ScriptQueryBuilder (traversing BoolQueryBuilder's must/filter/should/ mustNot lists and ConstantScoreQueryBuilder wrappers). If found, refuse to embed it under knn.filter. AOSS and serverless vector collections reject script queries inside knn.filter; without this guard the user saw an opaque cluster-side failure instead of a clear plan-time error. Unit tests cover each rejection (including the compound predicate path for _score, the uppercase _SCORE bypass attempt, and the POST-mode negative case for script predicates). Integration tests assert 400 status and the user-facing message from the SQL endpoint, including the combined sort + limit + offset shape that exercises the pushDownSort() path. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
1 parent a5102b7 commit 154df52

3 files changed

Lines changed: 339 additions & 0 deletions

File tree

integ-test/src/test/java/org/opensearch/sql/sql/VectorSearchIT.java

Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -263,6 +263,86 @@ public void testBareAggregateRejects() throws IOException {
263263
containsString("Aggregations are not supported on vectorSearch() relations"));
264264
}
265265

266+
// ── OFFSET / WHERE _score / filter_type=efficient script rejection ───
267+
268+
@Test
269+
public void testOffsetRejected() throws IOException {
270+
ResponseException ex =
271+
expectThrows(
272+
ResponseException.class,
273+
() ->
274+
executeQuery(
275+
"SELECT v._id FROM vectorSearch(table='"
276+
+ TEST_INDEX
277+
+ "', field='embedding', "
278+
+ "vector='[1.0, 2.0]', option='k=5') AS v "
279+
+ "LIMIT 5 OFFSET 2"));
280+
281+
assertThat(ex.getMessage(), containsString("OFFSET is not supported on vectorSearch()"));
282+
assertThat(ex.getMessage(), containsString("LIMIT only"));
283+
}
284+
285+
@Test
286+
public void testScoreInWhereRejected() throws IOException {
287+
ResponseException ex =
288+
expectThrows(
289+
ResponseException.class,
290+
() ->
291+
executeQuery(
292+
"SELECT v._id FROM vectorSearch(table='"
293+
+ TEST_INDEX
294+
+ "', field='embedding', "
295+
+ "vector='[1.0, 2.0]', option='k=5') AS v "
296+
+ "WHERE v._score > 0.5 "
297+
+ "LIMIT 5"));
298+
299+
assertThat(ex.getMessage(), containsString("WHERE on _score is not supported"));
300+
assertThat(ex.getMessage(), containsString("min_score"));
301+
}
302+
303+
@Test
304+
public void testOrderByScoreDescLimitOffsetRejected() throws IOException {
305+
// The natural user shape pairs sort with pagination: ORDER BY _score DESC LIMIT N OFFSET M.
306+
// The planner's pushDownSort() path can collapse the sort+limit into a top-k size, so OFFSET
307+
// must still be rejected by pushDownLimit when the combined form is used. Without this guard
308+
// the parent builder would push `from: <offset>` and silently shift the top-k window.
309+
ResponseException ex =
310+
expectThrows(
311+
ResponseException.class,
312+
() ->
313+
executeQuery(
314+
"SELECT v._id FROM vectorSearch(table='"
315+
+ TEST_INDEX
316+
+ "', field='embedding', "
317+
+ "vector='[1.0, 2.0]', option='k=5') AS v "
318+
+ "ORDER BY v._score DESC "
319+
+ "LIMIT 5 OFFSET 2"));
320+
321+
assertThat(ex.getMessage(), containsString("OFFSET is not supported on vectorSearch()"));
322+
}
323+
324+
@Test
325+
public void testEfficientModeRejectsScriptPredicate() throws IOException {
326+
// WHERE age + 1 > 30 compiles to a ScriptQueryBuilder under the hood because the outer >
327+
// is applied to an arithmetic expression, not a direct field reference. Efficient mode
328+
// cannot embed script queries under knn.filter, so this must be rejected up front with a
329+
// clear remediation hint instead of a cluster-side failure.
330+
ResponseException ex =
331+
expectThrows(
332+
ResponseException.class,
333+
() ->
334+
executeQuery(
335+
"SELECT v._id FROM vectorSearch(table='"
336+
+ TEST_INDEX
337+
+ "', field='embedding', "
338+
+ "vector='[1.0, 2.0]', option='k=5,filter_type=efficient') AS v "
339+
+ "WHERE v.age + 1 > 30 "
340+
+ "LIMIT 5"));
341+
342+
assertThat(ex.getMessage(), containsString("filter_type=efficient does not support"));
343+
assertThat(ex.getMessage(), containsString("script queries"));
344+
}
345+
266346
// ── k-NN plugin capability check ──────────────────────────────────────
267347
// The default integ-test cluster does not have the k-NN plugin installed. Execution-path
268348
// queries against vectorSearch() should therefore fail with the clear "k-NN plugin missing"

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/VectorSearchQueryBuilder.java

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,15 +6,19 @@
66
package org.opensearch.sql.opensearch.storage.scan;
77

88
import java.util.Map;
9+
import java.util.concurrent.atomic.AtomicBoolean;
910
import java.util.function.Function;
1011
import org.apache.commons.lang3.tuple.Pair;
1112
import org.opensearch.index.query.BoolQueryBuilder;
13+
import org.opensearch.index.query.ConstantScoreQueryBuilder;
1214
import org.opensearch.index.query.QueryBuilder;
1315
import org.opensearch.index.query.QueryBuilders;
16+
import org.opensearch.index.query.ScriptQueryBuilder;
1417
import org.opensearch.sql.ast.tree.Sort;
1518
import org.opensearch.sql.ast.tree.Sort.SortOption;
1619
import org.opensearch.sql.exception.ExpressionEvaluationException;
1720
import org.opensearch.sql.expression.Expression;
21+
import org.opensearch.sql.expression.ExpressionNodeVisitor;
1822
import org.opensearch.sql.expression.ReferenceExpression;
1923
import org.opensearch.sql.opensearch.request.OpenSearchRequestBuilder;
2024
import org.opensearch.sql.opensearch.storage.FilterType;
@@ -78,6 +82,18 @@ public VectorSearchQueryBuilder(
7882
public boolean pushDownFilter(LogicalFilter filter) {
7983
FilterQueryBuilder queryBuilder = new FilterQueryBuilder(new DefaultExpressionSerializer());
8084
Expression queryCondition = filter.getCondition();
85+
86+
// Reject WHERE predicates that reference the synthetic _score column. The planner surfaces
87+
// _score as a projectable/filterable field, but it is not a stored document field in
88+
// OpenSearch. If we let this pass through, FilterQueryBuilder produces a range query on a
89+
// non-existent field and the cluster silently returns 0 rows. Users who want a score floor
90+
// should use option='min_score=...' instead.
91+
if (containsScoreReference(queryCondition)) {
92+
throw new ExpressionEvaluationException(
93+
"WHERE on _score is not supported on vectorSearch()."
94+
+ " Use option='min_score=...' for score-floor filtering.");
95+
}
96+
8197
QueryBuilder whereQuery;
8298
try {
8399
whereQuery = queryBuilder.build(queryCondition);
@@ -93,6 +109,16 @@ public boolean pushDownFilter(LogicalFilter filter) {
93109
filterPushed = true;
94110

95111
if (filterType == FilterType.EFFICIENT) {
112+
// knn.filter on AOSS/serverless vector collections rejects script queries. If any WHERE
113+
// subtree compiled to a ScriptQueryBuilder (arithmetic, function calls, CASE, date math),
114+
// refuse to embed it under knn.filter instead of shipping a request that will fail at the
115+
// cluster with an opaque error.
116+
if (containsScriptQuery(whereQuery)) {
117+
throw new ExpressionEvaluationException(
118+
"filter_type=efficient does not support predicates that compile to script queries"
119+
+ " (arithmetic, function calls, CASE, date math). Rewrite the WHERE clause to"
120+
+ " use comparable/term/range predicates, or omit filter_type.");
121+
}
96122
QueryBuilder rebuiltKnn = rebuildKnnWithFilter.apply(whereQuery);
97123
requestBuilder.getSourceBuilder().query(rebuiltKnn);
98124
} else {
@@ -105,6 +131,14 @@ public boolean pushDownFilter(LogicalFilter filter) {
105131

106132
@Override
107133
public boolean pushDownLimit(LogicalLimit limit) {
134+
// OFFSET on vectorSearch() silently rewrites the search window and drops top results, which
135+
// defeats the entire point of a relevance-ranked top-k query. The parent path would push
136+
// `from: <offset>` into the OpenSearch request; reject it explicitly so users get a clear
137+
// error instead of surprising result shifts.
138+
if (limit.getOffset() != null && limit.getOffset() != 0) {
139+
throw new ExpressionEvaluationException(
140+
"OFFSET is not supported on vectorSearch(). Remove OFFSET and use LIMIT only.");
141+
}
108142
validateLimitWithinK(limit.getLimit());
109143
limitPushed = true;
110144
return super.pushDownLimit(limit);
@@ -151,6 +185,74 @@ private void validateLimitWithinK(int limit) {
151185
}
152186
}
153187

188+
/**
189+
* Returns true if any subexpression is a ReferenceExpression whose attr is "_score". Uses the
190+
* standard ExpressionNodeVisitor so compound predicates (AND/OR/NOT, function calls, CASE) are
191+
* walked uniformly.
192+
*/
193+
private static boolean containsScoreReference(Expression expr) {
194+
AtomicBoolean found = new AtomicBoolean(false);
195+
expr.accept(
196+
new ExpressionNodeVisitor<Void, Void>() {
197+
@Override
198+
public Void visitReference(ReferenceExpression node, Void context) {
199+
// Case-insensitive match so _SCORE, _Score, and any quoted/backticked variant that
200+
// preserves original casing cannot bypass the guard and reach the cluster as a range
201+
// query on a non-existent field.
202+
if (node.getAttr() != null && "_score".equalsIgnoreCase(node.getAttr())) {
203+
found.set(true);
204+
}
205+
return null;
206+
}
207+
},
208+
null);
209+
return found.get();
210+
}
211+
212+
/**
213+
* Recursively scans a QueryBuilder tree for any ScriptQueryBuilder. Handles the common wrappers
214+
* that FilterQueryBuilder produces: BoolQueryBuilder (must/should/mustNot/filter) and
215+
* ConstantScoreQueryBuilder. Other QueryBuilder subtypes are leaves for our purposes: if the
216+
* top-level builder itself is a ScriptQueryBuilder we catch it, otherwise we treat it as
217+
* script-free.
218+
*/
219+
private static boolean containsScriptQuery(QueryBuilder qb) {
220+
if (qb == null) {
221+
return false;
222+
}
223+
if (qb instanceof ScriptQueryBuilder) {
224+
return true;
225+
}
226+
if (qb instanceof BoolQueryBuilder) {
227+
BoolQueryBuilder bool = (BoolQueryBuilder) qb;
228+
for (QueryBuilder child : bool.must()) {
229+
if (containsScriptQuery(child)) {
230+
return true;
231+
}
232+
}
233+
for (QueryBuilder child : bool.filter()) {
234+
if (containsScriptQuery(child)) {
235+
return true;
236+
}
237+
}
238+
for (QueryBuilder child : bool.should()) {
239+
if (containsScriptQuery(child)) {
240+
return true;
241+
}
242+
}
243+
for (QueryBuilder child : bool.mustNot()) {
244+
if (containsScriptQuery(child)) {
245+
return true;
246+
}
247+
}
248+
return false;
249+
}
250+
if (qb instanceof ConstantScoreQueryBuilder) {
251+
return containsScriptQuery(((ConstantScoreQueryBuilder) qb).innerQuery());
252+
}
253+
return false;
254+
}
255+
154256
@Override
155257
public OpenSearchRequestBuilder build() {
156258
if (filterTypeExplicit && !filterPushed) {

opensearch/src/test/java/org/opensearch/sql/opensearch/storage/scan/VectorSearchQueryBuilderTest.java

Lines changed: 157 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -594,6 +594,163 @@ void pushDownFilterNonPushdownableWithoutExplicitFilterTypeFallsBack() {
594594
assertFalse(pushed, "Non-pushdownable filter should return false for in-memory fallback");
595595
}
596596

597+
// ── OFFSET rejection ────────────────────────────────────────────────
598+
599+
@Test
600+
void pushDownLimit_rejectsNonZeroOffset() {
601+
var requestBuilder = createRequestBuilder();
602+
var knnQuery = new WrapperQueryBuilder("{\"knn\":{}}");
603+
var builder = new VectorSearchQueryBuilder(requestBuilder, knnQuery, Map.of("k", "5"));
604+
605+
var dummyChild = new LogicalValues(Collections.emptyList());
606+
// LIMIT 3 OFFSET 2: the planner passes both through LogicalLimit
607+
var limit = new LogicalLimit(dummyChild, 3, 2);
608+
609+
ExpressionEvaluationException ex =
610+
assertThrows(ExpressionEvaluationException.class, () -> builder.pushDownLimit(limit));
611+
assertTrue(
612+
ex.getMessage().contains("OFFSET is not supported on vectorSearch()"),
613+
"Expected OFFSET rejection message, got: " + ex.getMessage());
614+
assertTrue(
615+
ex.getMessage().contains("LIMIT only"),
616+
"Expected remediation guidance in message, got: " + ex.getMessage());
617+
}
618+
619+
@Test
620+
void pushDownLimit_acceptsZeroOffset() {
621+
var requestBuilder = createRequestBuilder();
622+
var knnQuery = new WrapperQueryBuilder("{\"knn\":{}}");
623+
var builder = new VectorSearchQueryBuilder(requestBuilder, knnQuery, Map.of("k", "5"));
624+
625+
var dummyChild = new LogicalValues(Collections.emptyList());
626+
var limit = new LogicalLimit(dummyChild, 3, 0);
627+
628+
// Zero offset is the normal case; must continue to succeed.
629+
assertTrue(builder.pushDownLimit(limit));
630+
}
631+
632+
// ── WHERE on _score rejection ────────────────────────────────────────
633+
634+
@Test
635+
void pushDownFilter_rejectsScoreReferenceInWhere() {
636+
var requestBuilder = createRequestBuilder();
637+
var knnQuery = new WrapperQueryBuilder("{\"knn\":{}}");
638+
var builder = new VectorSearchQueryBuilder(requestBuilder, knnQuery, Map.of("k", "5"));
639+
640+
// WHERE _score > 0.5 (note: _score is a synthetic column, not a stored field)
641+
var condition =
642+
DSL.greater(new ReferenceExpression("_score", ExprCoreType.FLOAT), DSL.literal(0.5));
643+
var dummyChild = new LogicalValues(Collections.emptyList());
644+
var filter = new LogicalFilter(dummyChild, condition);
645+
646+
ExpressionEvaluationException ex =
647+
assertThrows(ExpressionEvaluationException.class, () -> builder.pushDownFilter(filter));
648+
assertTrue(
649+
ex.getMessage().contains("WHERE on _score is not supported"),
650+
"Expected _score rejection message, got: " + ex.getMessage());
651+
assertTrue(
652+
ex.getMessage().contains("min_score"),
653+
"Expected remediation guidance pointing at option='min_score=...', got: "
654+
+ ex.getMessage());
655+
}
656+
657+
@Test
658+
void pushDownFilter_rejectsScoreReferenceInsideCompound() {
659+
var requestBuilder = createRequestBuilder();
660+
var knnQuery = new WrapperQueryBuilder("{\"knn\":{}}");
661+
var builder = new VectorSearchQueryBuilder(requestBuilder, knnQuery, Map.of("k", "5"));
662+
663+
// WHERE state = 'TX' AND _score > 0.5: rejection must walk compound predicates
664+
var condition =
665+
DSL.and(
666+
DSL.equal(new ReferenceExpression("state", STRING), DSL.literal("TX")),
667+
DSL.greater(new ReferenceExpression("_score", ExprCoreType.FLOAT), DSL.literal(0.5)));
668+
var dummyChild = new LogicalValues(Collections.emptyList());
669+
var filter = new LogicalFilter(dummyChild, condition);
670+
671+
ExpressionEvaluationException ex =
672+
assertThrows(ExpressionEvaluationException.class, () -> builder.pushDownFilter(filter));
673+
assertTrue(
674+
ex.getMessage().contains("WHERE on _score is not supported"),
675+
"Expected _score rejection message, got: " + ex.getMessage());
676+
}
677+
678+
@Test
679+
void pushDownFilter_rejectsUppercaseScoreReference() {
680+
var requestBuilder = createRequestBuilder();
681+
var knnQuery = new WrapperQueryBuilder("{\"knn\":{}}");
682+
var builder = new VectorSearchQueryBuilder(requestBuilder, knnQuery, Map.of("k", "5"));
683+
684+
// WHERE _SCORE > 0.5 must be rejected the same way as _score; the check is case-insensitive
685+
// so variants that preserve original casing cannot bypass the guard.
686+
var condition =
687+
DSL.greater(new ReferenceExpression("_SCORE", ExprCoreType.FLOAT), DSL.literal(0.5));
688+
var dummyChild = new LogicalValues(Collections.emptyList());
689+
var filter = new LogicalFilter(dummyChild, condition);
690+
691+
ExpressionEvaluationException ex =
692+
assertThrows(ExpressionEvaluationException.class, () -> builder.pushDownFilter(filter));
693+
assertTrue(
694+
ex.getMessage().contains("WHERE on _score is not supported"),
695+
"Expected _score rejection message, got: " + ex.getMessage());
696+
}
697+
698+
// ── filter_type=efficient rejects script subtrees ───────────────────
699+
700+
@Test
701+
void pushDownFilter_efficient_rejectsScriptSubtree() {
702+
var requestBuilder = createRequestBuilder();
703+
var knnQuery = new WrapperQueryBuilder("{\"knn\":{}}");
704+
Function<QueryBuilder, QueryBuilder> rebuildWithFilter =
705+
whereQuery -> new WrapperQueryBuilder("{\"knn\":{\"filter\":\"embedded\"}}");
706+
var builder =
707+
new VectorSearchQueryBuilder(
708+
requestBuilder,
709+
knnQuery,
710+
Map.of("k", "5"),
711+
FilterType.EFFICIENT,
712+
true,
713+
rebuildWithFilter);
714+
715+
// WHERE price + 1 > 100: the outer > is not a direct field ref, so FilterQueryBuilder
716+
// falls through to buildScriptQuery() and produces a ScriptQueryBuilder. Under filter_type=
717+
// efficient that would be embedded under knn.filter, which AOSS rejects.
718+
var condition =
719+
DSL.greater(
720+
DSL.add(new ReferenceExpression("price", ExprCoreType.INTEGER), DSL.literal(1)),
721+
DSL.literal(100));
722+
var dummyChild = new LogicalValues(Collections.emptyList());
723+
var filter = new LogicalFilter(dummyChild, condition);
724+
725+
ExpressionEvaluationException ex =
726+
assertThrows(ExpressionEvaluationException.class, () -> builder.pushDownFilter(filter));
727+
assertTrue(
728+
ex.getMessage().contains("filter_type=efficient does not support"),
729+
"Expected script rejection message, got: " + ex.getMessage());
730+
assertTrue(
731+
ex.getMessage().contains("script queries"),
732+
"Expected script queries guidance in message, got: " + ex.getMessage());
733+
}
734+
735+
@Test
736+
void pushDownFilter_post_allowsScriptSubtree() {
737+
// Under POST mode, script queries are fine: the WHERE lives in an outer bool.filter
738+
// that OpenSearch evaluates against hits, not inside knn.filter. Verify we do not
739+
// false-positive on the POST branch.
740+
var requestBuilder = createRequestBuilder();
741+
var knnQuery = new WrapperQueryBuilder("{\"knn\":{}}");
742+
var builder = new VectorSearchQueryBuilder(requestBuilder, knnQuery, Map.of("k", "5"));
743+
744+
var condition =
745+
DSL.greater(
746+
DSL.add(new ReferenceExpression("price", ExprCoreType.INTEGER), DSL.literal(1)),
747+
DSL.literal(100));
748+
var dummyChild = new LogicalValues(Collections.emptyList());
749+
var filter = new LogicalFilter(dummyChild, condition);
750+
751+
assertTrue(builder.pushDownFilter(filter), "POST mode must still accept script predicates");
752+
}
753+
597754
@Test
598755
void buildSucceedsRadialWithSortEmbeddedLimit() {
599756
var requestBuilder = createRequestBuilder();

0 commit comments

Comments
 (0)