Commit c8bea4b

[Feature] Mark vectorSearch() experimental and default to efficient filtering
Mark the vectorSearch() table function as experimental in the user doc following the repo convention (title [Experimental] suffix), and flip the default WHERE filter placement from post-filtering to efficient pre-filtering so a query without filter_type embeds the predicate under knn.filter for ANN-time pruning.

Production code: filterType=null in VectorSearchIndex now resolves to FilterType.EFFICIENT, and VectorSearchQueryBuilder's full constructor defaults to EFFICIENT when passed null. The test-only 3-arg constructor stays pinned to POST because it does not wire a rebuildKnnWithFilter callback and EFFICIENT mode requires one.

Allow-list error messages are reworded to neutral wording ("vectorSearch WHERE pre-filtering does not support...") so default-path users never see internal filter_type=efficient terminology and get a clear "set filter_type=post" fallback hint.

Doc updates the Filtering section to describe Omitted=efficient as the default, with post framed as the opt-in fallback for predicates outside the efficient allow-list. Example 4 shows the default knn.filter shape; Example 5 shows filter_type=post for arithmetic predicates.

Tests: BETWEEN / NOT IN regression guards pin filter_type=post explicitly so they continue to assert the post-filter DSL shape. testPostFilterReturnsOnlyMatchingDocs pins filter_type=post so the test name still reflects what it exercises. New default-shape IT coverage asserts knn.filter embeds the predicate and there is no outer bool wrapping.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
1 parent c06c472 commit c8bea4b
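
To make the behavioral change described in the commit message concrete, here is a minimal, self-contained Java sketch of the null-to-default resolution. It is not the production code: `FilterPlacementDefaults` and its helper methods are hypothetical names, and the nested enum is only a stand-in for the repository's `FilterType`. The one part taken from the diff is the `filterType != null ? filterType : FilterType.EFFICIENT` expression.

```java
/** Illustrative stand-in; class and method names are hypothetical. */
final class FilterPlacementDefaults {

  /** Stand-in for the repository's FilterType; the real enum may differ. */
  enum FilterType {
    POST,
    EFFICIENT
  }

  /** Resolves an omitted (null) filter type to the new default, EFFICIENT. */
  static FilterType resolve(FilterType requested) {
    return requested != null ? requested : FilterType.EFFICIENT;
  }

  /** Mirrors the filterTypeExplicit flag: true only when the user set filter_type. */
  static boolean isExplicit(FilterType requested) {
    return requested != null;
  }

  public static void main(String[] args) {
    System.out.println(resolve(null)); // prints EFFICIENT
    System.out.println(resolve(FilterType.POST)); // prints POST
    System.out.println(isExplicit(null)); // prints false
  }
}
```

Keeping the "was it explicit" flag separate from the resolved value lets the caller distinguish a user who asked for `efficient` from one who omitted the option, even though both resolve to the same placement.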

7 files changed

Lines changed: 131 additions & 120 deletions

docs/user/dql/vector-search.rst

Lines changed: 43 additions & 33 deletions
@@ -1,7 +1,7 @@
 
-=============
-Vector Search
-=============
+==============================
+Vector Search [Experimental]
+==============================
 
 .. rubric:: Table of contents
 
@@ -12,6 +12,9 @@ Vector Search
 Introduction
 ============
 
+``vectorSearch()`` is an experimental feature. Syntax, options, and
+pushdown behavior may change in future releases based on feedback.
+
 The ``vectorSearch()`` table function runs a k-NN query against a ``knn_vector``
 field and exposes the matching documents as a relation in the ``FROM`` clause.
 It relies on the OpenSearch `k-NN plugin
@@ -170,42 +173,45 @@ A ``WHERE`` clause on non-vector fields of the ``vectorSearch()`` alias is
 pushed down to OpenSearch when it can be translated to an OpenSearch filter.
 Two placement strategies are available via the ``filter_type`` option:
 
-- ``post`` — the ``WHERE`` predicate is applied as a non-scoring
+- ``efficient`` (default): the ``WHERE`` predicate is embedded directly
+  inside the k-NN query (``knn.filter``), enabling pre-filtering during
+  the ANN search. See the `k-NN filtering guide
+  <https://docs.opensearch.org/latest/vector-search/filter-search-knn/efficient-knn-filtering/>`_
+  for engine and method requirements.
+- ``post``: the ``WHERE`` predicate is applied as a non-scoring
   ``bool.filter`` alongside the k-NN query. The k-NN query runs first and
   its results are then filtered.
-- ``efficient`` — the ``WHERE`` predicate is embedded directly inside the
-  k-NN query (``knn.filter``), enabling pre-filtering during the ANN search.
-  See the `k-NN filtering guide <https://docs.opensearch.org/latest/vector-search/filter-search-knn/efficient-knn-filtering/>`_
-  for engine and method requirements.
 
 Behavior depends on whether ``filter_type`` is specified:
 
-- **Omitted** — pushdown is attempted using the ``post`` placement.
-  Predicates that translate to native OpenSearch queries are pushed down as a
-  ``bool.filter`` alongside the k-NN query. Predicates that do not have a
-  native equivalent (for example, arithmetic or function calls on indexed
-  fields) are pushed down as an OpenSearch script query and evaluated
-  server-side. Only when predicate translation itself fails does the engine
-  fall back to evaluating the ``WHERE`` clause in memory after the k-NN
-  results are returned. A query with no ``WHERE`` clause is valid.
-- **Explicit ``post``** — a ``WHERE`` clause is required and must be
-  translatable to an OpenSearch filter query. If the ``WHERE`` clause is
-  missing or cannot be translated, the query fails with an error.
-  Specifying ``filter_type=post`` explicitly is useful when the query
-  should fail with an error instead of silently falling back to
-  in-memory filtering.
-- **Explicit ``efficient``** — a ``WHERE`` clause is required and must
-  compile to a filter shape that can be embedded under ``knn.filter``.
+- **Omitted (default, ``efficient``)**: the ``WHERE`` predicate is
+  embedded under ``knn.filter`` so the k-NN engine pre-filters candidates
+  during the ANN search. A query with no ``WHERE`` clause is valid.
   ``efficient`` supports simple native filters: ``term``, ``range``,
   ``wildcard``, ``exists``, full-text family (``match``, ``match_phrase``,
   ``match_phrase_prefix``, ``match_bool_prefix``, ``multi_match``,
   ``query_string``, ``simple_query_string``), and boolean combinations of
   those filters. Predicates that compile to script queries (arithmetic,
-  function calls, ``CASE``, date math), nested predicates, and other
-  query shapes are not supported in this mode and return an error.
+  function calls on indexed fields, ``CASE``, date math), nested
+  predicates, and other query shapes are not supported under
+  ``knn.filter`` and return an error. Set ``filter_type=post`` to apply
+  such predicates after the k-NN search.
+- **Explicit ``efficient``**: same contract as the default. Specifying
+  it is equivalent and is useful when a query should be explicit about
+  the placement strategy.
+- **Explicit ``post``**: a ``WHERE`` clause is required and must be
+  translatable to an OpenSearch filter query. Predicates that translate
+  to native OpenSearch queries are pushed down as a ``bool.filter``
+  alongside the k-NN query. Predicates that do not have a native
+  equivalent (for example, arithmetic or function calls on indexed
+  fields) are pushed down as an OpenSearch script query and evaluated
+  server-side. Only when predicate translation itself fails does the
+  engine fall back to evaluating the ``WHERE`` clause in memory after
+  the k-NN results are returned. Use ``filter_type=post`` when the
+  predicate shape is not supported by ``efficient`` pre-filtering.
 
-Example 4: Implicit pushdown (no ``filter_type``)
--------------------------------------------------
+Example 4: Default pre-filtering (no ``filter_type``)
+-----------------------------------------------------
 
 ::
 
@@ -223,10 +229,14 @@ Example 4: Implicit pushdown (no ``filter_type``)
       """
     }
 
-Example 5: Efficient (pre-)filtering
-------------------------------------
+The predicate is embedded under ``knn.filter`` so the k-NN engine
+pre-filters candidates during the ANN search.
 
-::
+Example 5: Post-filtering fallback
+----------------------------------
+
+Use ``filter_type=post`` for predicates that do not fit the ``efficient``
+allow-list, such as arithmetic or function calls on indexed fields::
 
     POST /_plugins/_sql
     {
@@ -236,9 +246,9 @@ Example 5: Efficient (pre-)filtering
           table='my_vectors',
           field='embedding',
           vector='[0.1, 0.2, 0.3]',
-          option='k=10,filter_type=efficient'
+          option='k=10,filter_type=post'
         ) AS v
-        WHERE v.category = 'books'
+        WHERE v.price * 1.1 < 100
       """
     }

integ-test/src/test/java/org/opensearch/sql/sql/VectorSearchExecutionIT.java

Lines changed: 4 additions & 2 deletions
@@ -135,14 +135,16 @@ public void testTopKReturnsNearestSortedByScore() throws IOException {
   public void testPostFilterReturnsOnlyMatchingDocs() throws IOException {
     // Query from cluster B with WHERE state='TX' forces POST filtering to surface TX docs
     // (cluster A) even though the vector is closer to cluster B. k=10 covers all 6 docs so
-    // post-filtering to state='TX' deterministically yields exactly {1,2,3}.
+    // post-filtering to state='TX' deterministically yields exactly {1,2,3}. filter_type=post
+    // is specified explicitly because the default placement is EFFICIENT — this test
+    // guarantees POST continues to work when the user opts into it.
     JSONObject result =
         executeJdbcRequest(
             "SELECT v._id, v._score "
                 + "FROM vectorSearch(table='"
                 + TEST_INDEX
                 + "', field='embedding', "
-                + "vector='[9.0, 9.0]', option='k=10') AS v "
+                + "vector='[9.0, 9.0]', option='k=10,filter_type=post') AS v "
                 + "WHERE v.state = 'TX' "
                 + "LIMIT 10");
148150

integ-test/src/test/java/org/opensearch/sql/sql/VectorSearchExplainIT.java

Lines changed: 47 additions & 65 deletions
@@ -164,10 +164,10 @@ public void testExplainRadialMinScoreProducesKnnQuery() throws IOException {
         "Radial without WHERE should not embed a filter:\n" + knnJson, knnJson.contains("filter"));
   }
 
-  // ── Post-filter DSL shape ────────────────────────────────────────────
+  // ── Default (EFFICIENT) pre-filter DSL shape ────────────────────────
 
   @Test
-  public void testExplainPostFilterProducesBoolQuery() throws IOException {
+  public void testExplainDefaultFilterProducesKnnWithFilter() throws IOException {
     String explain =
         explainQuery(
             "SELECT v._id, v._score "
@@ -178,39 +178,32 @@ public void testExplainPostFilterProducesBoolQuery() throws IOException {
                 + "WHERE v.state = 'TX' "
                 + "LIMIT 10");
 
-    // Post-filter shape: outer bool.must=[knn], bool.filter=[term] — WHERE lives OUTSIDE the knn
-    // payload. Verify by decoding the wrapper and asserting the predicate field is NOT embedded.
+    // Default (EFFICIENT) shape: WHERE is embedded inside knn.filter, the knn JSON is base64-
+    // encoded inside a WrapperQueryBuilder, and there is no outer bool/must wrapping.
     String sourceBuilderJson = extractSourceBuilderJson(explain);
-    assertTrue(
-        "Explain should contain bool query:\n" + sourceBuilderJson,
+    assertFalse(
+        "Default EFFICIENT mode should not produce bool query:\n" + sourceBuilderJson,
         sourceBuilderJson.contains("\"bool\""));
-    assertTrue(
-        "Explain should contain must clause (knn in scoring context):\n" + sourceBuilderJson,
+    assertFalse(
+        "Default EFFICIENT mode should not contain must clause:\n" + sourceBuilderJson,
         sourceBuilderJson.contains("\"must\""));
-    assertTrue(
-        "Explain should contain filter clause (WHERE in non-scoring context):\n"
-            + sourceBuilderJson,
-        sourceBuilderJson.contains("\"filter\""));
-    assertTrue(
-        "Explain should contain the outer state predicate:\n" + sourceBuilderJson,
-        sourceBuilderJson.contains("\"state.keyword\""));
 
     String knnJson = decodeSoleKnnJson(explain);
     assertTrue("knn JSON should contain knn key:\n" + knnJson, knnJson.contains("\"knn\""));
     assertTrue(
         "knn JSON should target the embedding field:\n" + knnJson,
         knnJson.contains("\"embedding\""));
     assertTrue("knn JSON should contain k=10:\n" + knnJson, knnJson.contains("\"k\":10"));
-    assertFalse(
-        "Post-filter mode must not embed the WHERE predicate inside knn:\n" + knnJson,
-        knnJson.contains("state"));
-    assertFalse(
-        "Post-filter mode must not embed a filter inside knn:\n" + knnJson,
+    assertTrue(
+        "Default EFFICIENT mode must embed filter inside knn:\n" + knnJson,
         knnJson.contains("filter"));
+    assertTrue(
+        "Default EFFICIENT mode must embed the WHERE predicate inside knn:\n" + knnJson,
+        knnJson.contains("state"));
   }
 
   @Test
-  public void testExplainCompoundPredicateProducesBoolQuery() throws IOException {
+  public void testExplainDefaultCompoundPredicateProducesKnnWithFilter() throws IOException {
     String explain =
         explainQuery(
             "SELECT v._id, v._score "
@@ -221,45 +214,35 @@ public void testExplainCompoundPredicateProducesBoolQuery() throws IOException {
                 + "WHERE v.state = 'TX' AND v.age > 30 "
                 + "LIMIT 10");
 
-    // Compound post-filter still uses outer bool.must=[knn]/bool.filter=[predicates]. Both WHERE
-    // predicates must stay outside the knn payload; otherwise efficient mode could false-positive.
+    // Compound default-mode WHERE must also route through knn.filter: no outer bool/must, and
+    // both predicate fields embedded inside the knn payload.
     String sourceBuilderJson = extractSourceBuilderJson(explain);
-    assertTrue(
-        "Explain should contain bool query:\n" + sourceBuilderJson,
+    assertFalse(
+        "Default EFFICIENT mode should not produce bool query:\n" + sourceBuilderJson,
         sourceBuilderJson.contains("\"bool\""));
-    assertTrue(
-        "Explain should contain must clause (knn in scoring context):\n" + sourceBuilderJson,
+    assertFalse(
+        "Default EFFICIENT mode should not contain must clause:\n" + sourceBuilderJson,
         sourceBuilderJson.contains("\"must\""));
-    assertTrue(
-        "Explain should contain filter clause (compound WHERE in non-scoring context):\n"
-            + sourceBuilderJson,
-        sourceBuilderJson.contains("\"filter\""));
-    assertTrue(
-        "Explain should contain the outer state predicate:\n" + sourceBuilderJson,
-        sourceBuilderJson.contains("\"state.keyword\""));
-    assertTrue(
-        "Explain should contain the outer age predicate:\n" + sourceBuilderJson,
-        sourceBuilderJson.contains("\"age\""));
 
     String knnJson = decodeSoleKnnJson(explain);
     assertTrue("knn JSON should contain knn key:\n" + knnJson, knnJson.contains("\"knn\""));
     assertTrue(
         "knn JSON should target the embedding field:\n" + knnJson,
         knnJson.contains("\"embedding\""));
     assertTrue("knn JSON should contain k=10:\n" + knnJson, knnJson.contains("\"k\":10"));
-    assertFalse(
-        "Compound post-filter must not embed the state predicate inside knn:\n" + knnJson,
+    assertTrue(
+        "Compound default EFFICIENT must embed filter inside knn:\n" + knnJson,
+        knnJson.contains("filter"));
+    assertTrue(
+        "Compound default EFFICIENT must embed the state predicate inside knn:\n" + knnJson,
         knnJson.contains("state"));
-    assertFalse(
-        "Compound post-filter must not embed the age predicate inside knn:\n" + knnJson,
+    assertTrue(
+        "Compound default EFFICIENT must embed the age predicate inside knn:\n" + knnJson,
         knnJson.contains("age"));
-    assertFalse(
-        "Compound post-filter must not embed a filter inside knn:\n" + knnJson,
-        knnJson.contains("filter"));
   }
 
   @Test
-  public void testExplainRadialWithWhereProducesBoolQuery() throws IOException {
+  public void testExplainDefaultRadialWithWhereProducesKnnWithFilter() throws IOException {
     String explain =
         explainQuery(
             "SELECT v._id, v._score "
@@ -270,22 +253,15 @@ public void testExplainRadialWithWhereProducesBoolQuery() throws IOException {
                 + "WHERE v.state = 'TX' "
                 + "LIMIT 100");
 
-    // Radial + WHERE should also keep the WHERE predicate in the outer bool.filter rather than
-    // embedding it into the radial knn payload.
+    // Radial + default WHERE must also use the EFFICIENT shape: no outer bool/must, radial
+    // parameters preserved inside the knn payload, and the WHERE predicate embedded alongside.
     String sourceBuilderJson = extractSourceBuilderJson(explain);
-    assertTrue(
-        "Explain should contain bool query:\n" + sourceBuilderJson,
+    assertFalse(
+        "Default EFFICIENT mode should not produce bool query:\n" + sourceBuilderJson,
         sourceBuilderJson.contains("\"bool\""));
-    assertTrue(
-        "Explain should contain must clause (knn in scoring context):\n" + sourceBuilderJson,
+    assertFalse(
+        "Default EFFICIENT mode should not contain must clause:\n" + sourceBuilderJson,
         sourceBuilderJson.contains("\"must\""));
-    assertTrue(
-        "Explain should contain filter clause (WHERE in non-scoring context):\n"
-            + sourceBuilderJson,
-        sourceBuilderJson.contains("\"filter\""));
-    assertTrue(
-        "Explain should contain the outer state predicate:\n" + sourceBuilderJson,
-        sourceBuilderJson.contains("\"state.keyword\""));
 
     String knnJson = decodeSoleKnnJson(explain);
     assertTrue("knn JSON should contain knn key:\n" + knnJson, knnJson.contains("\"knn\""));
@@ -295,12 +271,12 @@ public void testExplainRadialWithWhereProducesBoolQuery() throws IOException {
     assertTrue(
         "knn JSON should contain max_distance=10.5:\n" + knnJson,
         knnJson.contains("\"max_distance\":10.5"));
-    assertFalse(
-        "Radial post-filter must not embed the WHERE predicate inside knn:\n" + knnJson,
-        knnJson.contains("state"));
-    assertFalse(
-        "Radial post-filter must not embed a filter inside knn:\n" + knnJson,
+    assertTrue(
+        "Radial default EFFICIENT must embed filter inside knn:\n" + knnJson,
         knnJson.contains("filter"));
+    assertTrue(
+        "Radial default EFFICIENT must embed the WHERE predicate inside knn:\n" + knnJson,
+        knnJson.contains("state"));
   }
 
   // ── Sort + LIMIT explain ─────────────────────────────────────────────
@@ -456,13 +432,16 @@ public void testEfficientFilterWithOrderByScoreDescSucceeds() throws IOException
 
   @Test
   public void testBetweenPushesAsRange() throws IOException {
+    // Pin filter_type=post to keep the regression guard aimed at the post-filter serialization
+    // path: these assertions lock in the outer bool/must/filter shape that only appears when
+    // WHERE is applied alongside knn rather than embedded under knn.filter.
     String explain =
         explainQuery(
             "SELECT v._id, v._score "
                 + "FROM vectorSearch(table='"
                 + TEST_INDEX
                 + "', field='embedding', "
-                + "vector='[1.0, 2.0, 3.0]', option='k=10') AS v "
+                + "vector='[1.0, 2.0, 3.0]', option='k=10,filter_type=post') AS v "
                 + "WHERE v.balance BETWEEN 50 AND 200 "
                 + "LIMIT 10");
 
@@ -513,13 +492,16 @@ public void testBetweenPushesAsRange() throws IOException {
 
   @Test
   public void testNotInPushesAsMustNotTerms() throws IOException {
+    // Pin filter_type=post to keep the regression guard aimed at the post-filter serialization
+    // path: these assertions lock in the outer bool/must/filter shape that only appears when
+    // WHERE is applied alongside knn rather than embedded under knn.filter.
     String explain =
         explainQuery(
             "SELECT v._id, v._score "
                 + "FROM vectorSearch(table='"
                 + TEST_INDEX
                 + "', field='embedding', "
-                + "vector='[1.0, 2.0, 3.0]', option='k=10') AS v "
+                + "vector='[1.0, 2.0, 3.0]', option='k=10,filter_type=post') AS v "
                 + "WHERE v.gender NOT IN ('M', 'F') "
                 + "LIMIT 10");

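The explain assertions above inspect the decoded knn payload, not only the outer source builder JSON. The sketch below illustrates that idea in isolation, assuming (as the test comments state) that the wrapper query carries the knn JSON as a base64 string. `KnnShapeCheck`, `decodeWrapperPayload`, and `embedsFilterOn` are hypothetical names, not the helpers used in `VectorSearchExplainIT`, and the sample JSON is illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only; not the helpers used by the integration tests. */
final class KnnShapeCheck {

  /** Decodes a base64 wrapper payload back into the knn JSON text. */
  static String decodeWrapperPayload(String base64Payload) {
    byte[] raw = Base64.getDecoder().decode(base64Payload);
    return new String(raw, StandardCharsets.UTF_8);
  }

  /** True when the knn JSON has a filter key and mentions a field starting with the given name. */
  static boolean embedsFilterOn(String knnJson, String fieldPrefix) {
    return knnJson.contains("\"filter\"") && knnJson.contains("\"" + fieldPrefix);
  }

  public static void main(String[] args) {
    // Assumed efficient-mode shape: the predicate sits under knn.<field>.filter.
    String efficient =
        "{\"knn\":{\"embedding\":{\"vector\":[9.0,9.0],\"k\":10,"
            + "\"filter\":{\"term\":{\"state.keyword\":\"TX\"}}}}}";
    // Assumed post-mode knn payload: no embedded filter.
    String post = "{\"knn\":{\"embedding\":{\"vector\":[9.0,9.0],\"k\":10}}}";

    String encoded =
        Base64.getEncoder().encodeToString(efficient.getBytes(StandardCharsets.UTF_8));
    System.out.println(embedsFilterOn(decodeWrapperPayload(encoded), "state")); // prints true
    System.out.println(embedsFilterOn(post, "state")); // prints false
  }
}
```

Substring checks like these are coarse, which is why the tests pair them with negative assertions on the outer JSON (no `"bool"`, no `"must"`) to tell the two placements apart.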
integ-test/src/test/java/org/opensearch/sql/sql/VectorSearchIT.java

Lines changed: 2 additions & 1 deletion
@@ -340,7 +340,8 @@ public void testEfficientModeRejectsScriptPredicate() throws IOException {
                         + "WHERE v.age + 1 > 30 "
                         + "LIMIT 5"));
 
-    assertThat(ex.getMessage(), containsString("filter_type=efficient does not support"));
+    assertThat(
+        ex.getMessage(), containsString("vectorSearch WHERE pre-filtering does not support"));
     assertThat(ex.getMessage(), containsString("script queries"));
   }

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/VectorSearchIndex.java

Lines changed: 6 additions & 3 deletions
@@ -28,7 +28,7 @@ public class VectorSearchIndex extends OpenSearchIndex {
   private final String field;
   private final float[] vector;
   private final Map<String, String> options;
-  private final FilterType filterType; // null means default (POST)
+  private final FilterType filterType; // null means default (EFFICIENT)
   // Nullable for back-compat with existing tests and the non-vector-search constructor. When
   // present, the scan defers a lazy k-NN plugin probe to open() so execution fails fast with a
   // clear SQL error if the plugin is missing.
@@ -62,7 +62,10 @@ public VectorSearchIndex(
     this(client, settings, indexName, field, vector, options, filterType, null);
   }
 
-  /** Default constructor — preserves existing call sites; uses no explicit filter type. */
+  /**
+   * Default constructor — preserves existing call sites; uses no explicit filter type, so the scan
+   * falls back to the default placement ({@link FilterType#EFFICIENT}).
+   */
   public VectorSearchIndex(
       OpenSearchClient client,
       Settings settings,
@@ -97,7 +100,7 @@ public TableScanBuilder createScanBuilder() {
         whereQuery -> new WrapperQueryBuilder(buildKnnQueryJson(whereQuery.toString()));
 
     boolean filterTypeExplicit = filterType != null;
-    FilterType effectiveFilterType = filterType != null ? filterType : FilterType.POST;
+    FilterType effectiveFilterType = filterType != null ? filterType : FilterType.EFFICIENT;
 
     var queryBuilder =
         new VectorSearchQueryBuilder(
