| 1 | +
| 2 | +==============================
| 3 | +Vector Search [Experimental]
| 4 | +==============================
| 5 | + |
| 6 | +.. rubric:: Table of contents |
| 7 | + |
| 8 | +.. contents:: |
| 9 | + :local: |
| 10 | + :depth: 2 |
| 11 | + |
| 12 | +Introduction |
| 13 | +============ |
| 14 | + |
| 15 | +``vectorSearch()`` is an experimental feature. Syntax, options, and |
| 16 | +pushdown behavior may change in future releases based on feedback. |
| 17 | + |
| 18 | +The ``vectorSearch()`` table function runs a k-NN query against a ``knn_vector`` |
| 19 | +field and exposes the matching documents as a relation in the ``FROM`` clause. |
| 20 | +It relies on the OpenSearch `k-NN plugin |
| 21 | +<https://docs.opensearch.org/latest/vector-search/>`_. The target index must
| 22 | +be created with ``index.knn: true`` and must map the vector field as
| 23 | +``knn_vector``.
| 24 | + |
| 25 | +The SQL layer translates ``vectorSearch()`` into an OpenSearch search |
| 26 | +request whose body is native k-NN query DSL; the query vector is parsed |
| 27 | +into a numeric array before that DSL is emitted. |
| 28 | + |
| 29 | +Relevance is expressed through the OpenSearch ``_score`` metadata field, and |
| 30 | +results are returned ordered by ``_score DESC`` by default. |
| 31 | + |
| 32 | +vectorSearch |
| 33 | +============ |
| 34 | + |
| 35 | +Description |
| 36 | +----------- |
| 37 | + |
| 38 | +``vectorSearch(table='<index>', field='<vector-field>', vector='<array>', option='<key=value[,key=value]*>')`` |
| 39 | + |
| 40 | +All four arguments are required and must be passed by name as string
| 41 | +literals. Positional and mixed positional/named calls are not supported.
| 42 | +The following is invalid because ``table`` is passed positionally::
| 43 | + |
| 44 | + FROM vectorSearch('my_vectors', field='embedding', |
| 45 | + vector='[0.1,0.2]', option='k=5') AS v |
| 46 | + |
| 47 | +A table alias is required. Projected fields are referenced through the |
| 48 | +alias (``v._id``, ``v._score``, ``v.category``). |
| 49 | + |
| 50 | +If the ``opensearch-knn`` plugin is not installed on the target cluster, |
| 51 | +query execution fails with a ``vectorSearch() requires the k-NN plugin`` |
| 52 | +error. ``_explain`` continues to work without the plugin. |
| 53 | + |
| 54 | +Arguments |
| 55 | +--------- |
| 56 | + |
| 57 | +- ``table``: single concrete index or alias to search. Wildcards |
| 58 | + (``*``), comma-separated multi-index targets, ``_all``, ``.``, and |
| 59 | + ``..`` are not supported. The target index must have |
| 60 | + ``index.knn: true`` and map the target field as ``knn_vector``. A |
| 61 | + normal alias name is accepted. If the alias resolves to multiple |
| 62 | + backing indices, the SQL layer does not prevalidate that every |
| 63 | + backing index has a compatible ``knn_vector`` mapping, dimension, or |
| 64 | + engine; OpenSearch execution remains the source of truth for those |
| 65 | + checks. |
| 66 | +- ``field``: name of the ``knn_vector`` field. |
| 67 | +- ``vector``: query vector as a JSON-style array of numbers, passed as a |
| 68 | + string (for example, ``'[0.1, 0.2, 0.3]'``). Components must be |
| 69 | + comma-separated finite numbers. Semicolon, colon, and pipe separators |
| 70 | + are not supported, and empty components (for example, ``'[1.0,,2.0]'`` |
| 71 | + or ``'[1.0,]'``) return an error. The vector dimension must match the |
| 72 | + ``knn_vector`` mapping on the target index. |
| 73 | +- ``option``: comma-separated ``key=value`` pairs. Exactly one of ``k``, |
| 74 | + ``max_distance``, or ``min_score`` is required. ``filter_type`` is |
| 75 | + optional. |
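
The ``vector`` rules above can be checked on the client before a query is sent. The following is a minimal Python sketch, not the plugin's parser: ``parse_query_vector`` is a hypothetical helper, and rejecting an empty array is an assumption based on the dimension-match requirement. It cannot verify the dimension itself, because that depends on the index mapping.

```python
import json
import math


def _reject_constant(token: str) -> float:
    # json.loads calls this for the non-standard tokens NaN, Infinity, -Infinity.
    raise ValueError(f"non-finite component: {token}")


def parse_query_vector(text: str) -> list:
    """Hypothetical client-side pre-check for the ``vector`` argument."""
    try:
        # Strict JSON already rejects empty components such as '[1.0,,2.0]'
        # and '[1.0,]', as well as ';', ':' and '|' separators.
        values = json.loads(text, parse_constant=_reject_constant)
    except ValueError as exc:
        raise ValueError(f"invalid vector literal {text!r}: {exc}") from exc

    # Assumption: an empty array is rejected because it cannot match any
    # knn_vector dimension.
    if not isinstance(values, list) or not values:
        raise ValueError("vector must be a non-empty array of numbers")

    components = []
    for value in values:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric component: {value!r}")
        try:
            number = float(value)
        except OverflowError as exc:
            raise ValueError(f"component out of range: {value!r}") from exc
        if not math.isfinite(number):
            raise ValueError(f"non-finite component: {value!r}")
        components.append(number)
    return components
```

A caller would pass the same string it intends to use as the ``vector`` argument, for example ``parse_query_vector('[0.1, 0.2, 0.3]')``, and treat a ``ValueError`` as a sign that the query would be rejected.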
| 76 | + |
| 77 | +Supported option keys |
| 78 | +--------------------- |
| 79 | + |
| 80 | +Option keys are lower-case and case-sensitive. ``K=5`` or |
| 81 | +``Filter_Type=post`` returns an "Unknown option key" error. |
| 82 | + |
| 83 | +- ``k``: top-k mode. Integer between 1 and 10000. The query returns up to |
| 84 | + ``k`` nearest neighbors. |
| 85 | +- ``max_distance``: radial mode. Non-negative number. Matches documents |
| 86 | + within the given distance of the query vector. ``LIMIT`` is required and |
| 87 | + caps the returned rows. |
| 88 | +- ``min_score``: radial mode. Non-negative number. Matches documents with |
| 89 | + score at or above the given threshold. ``LIMIT`` is required and caps |
| 90 | + the returned rows. |
| 91 | +- ``filter_type``: ``post`` or ``efficient``. Controls how a ``WHERE`` |
| 92 | + clause is applied. See `Filtering`_. |
| 93 | + |
| 94 | +``k``, ``max_distance``, and ``min_score`` are mutually exclusive; specify |
| 95 | +exactly one. |
| 96 | + |
| 97 | +Native k-NN tuning options (for example, ``method_parameters.ef_search``, |
| 98 | +``method_parameters.nprobes``, ``rescore.oversample_factor``) are not |
| 99 | +supported through ``vectorSearch()`` and return an "Unknown option |
| 100 | +key" error. |
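
The option rules in this section can be summarized as a small validator. This Python sketch is illustrative only: ``parse_options`` is a hypothetical name, and the whitespace trimming, the duplicate-key check, and the inclusive reading of the 1 to 10000 range are assumptions that the text above does not confirm. Python's ``float()`` may also accept spellings that the plugin rejects.

```python
import math

MODE_KEYS = ("k", "max_distance", "min_score")
ALLOWED_KEYS = frozenset(MODE_KEYS) | {"filter_type"}


def parse_options(option: str) -> dict:
    """Hypothetical pre-check for the ``option`` argument."""
    raw = {}
    for pair in option.split(","):
        key, sep, value = pair.partition("=")
        # Assumption: whitespace around keys and values is trimmed.
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"malformed option pair: {pair!r}")
        # Keys are compared as written, so 'K' and 'Filter_Type' are unknown.
        if key not in ALLOWED_KEYS:
            raise ValueError(f"Unknown option key: {key}")
        # Assumption: repeating a key is an error.
        if key in raw:
            raise ValueError(f"duplicate option key: {key}")
        raw[key] = value

    # Exactly one search mode must be selected.
    modes = [key for key in MODE_KEYS if key in raw]
    if len(modes) != 1:
        raise ValueError("exactly one of k, max_distance, min_score is required")

    parsed = {}
    mode = modes[0]
    if mode == "k":
        text = raw["k"]
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"k must be an integer: {text!r}")
        k = int(text)
        # Assumption: both bounds of the documented range are inclusive.
        if not 1 <= k <= 10000:
            raise ValueError(f"k out of range: {k}")
        parsed["k"] = k
    else:
        try:
            number = float(raw[mode])
        except ValueError as exc:
            raise ValueError(f"{mode} must be a number: {raw[mode]!r}") from exc
        if not math.isfinite(number) or number < 0:
            raise ValueError(f"{mode} must be a non-negative number")
        parsed[mode] = number

    filter_type = raw.get("filter_type")
    if filter_type is not None:
        if filter_type not in ("post", "efficient"):
            raise ValueError(f"filter_type must be post or efficient: {filter_type!r}")
        parsed["filter_type"] = filter_type
    return parsed
```

With this sketch, ``parse_options("k=5")`` yields ``{"k": 5}``, while ``"K=5"``, ``"k=5,min_score=0.8"``, and ``"k=5,method_parameters.ef_search=100"`` all raise ``ValueError``.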
| 101 | + |
| 102 | +Syntax |
| 103 | +------ |
| 104 | + |
| 105 | +:: |
| 106 | + |
| 107 | + SELECT <projection> |
| 108 | + FROM vectorSearch( |
| 109 | + table='<index>', |
| 110 | + field='<vector-field>', |
| 111 | + vector='<array>', |
| 112 | + option='<key=value[,key=value]*>' |
| 113 | + ) AS <alias> |
| 114 | + [WHERE <predicate on alias non-vector fields>] |
| 115 | + [ORDER BY <alias>._score DESC] |
| 116 | + [LIMIT <n>] |
| 117 | + |
| 118 | +Example 1: Top-k |
| 119 | +---------------- |
| 120 | + |
| 121 | +Return the five nearest neighbors of a query vector:: |
| 122 | + |
| 123 | + POST /_plugins/_sql |
| 124 | + { |
| 125 | + "query" : """ |
| 126 | + SELECT v._id, v._score |
| 127 | + FROM vectorSearch( |
| 128 | + table='my_vectors', |
| 129 | + field='embedding', |
| 130 | + vector='[0.1, 0.2, 0.3]', |
| 131 | + option='k=5' |
| 132 | + ) AS v |
| 133 | + """ |
| 134 | + } |
| 135 | + |
| 136 | +In top-k mode, the request size defaults to ``k``; adding ``LIMIT n`` further |
| 137 | +reduces the row count, but ``n`` must not exceed ``k``. |
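
For clients that build the request programmatically, the query string can be assembled from the four named arguments. The Python sketch below is one possible approach under stated assumptions: ``build_vector_search_request`` and its parameters are hypothetical, and it refuses values containing a single quote instead of attempting SQL escaping. It emits an ordinary JSON string for ``query``; the triple-quoted form in the examples appears to be console syntax rather than strict JSON.

```python
import json


def build_vector_search_request(table, field, vector, option,
                                columns=("_id", "_score"), alias="v", limit=None):
    """Return a JSON body for POST /_plugins/_sql (hypothetical helper)."""
    for name, value in (("table", table), ("field", field), ("option", option)):
        # Keep the sketch simple: refuse values that would need SQL escaping.
        if "'" in value:
            raise ValueError(f"{name} must not contain a single quote")

    # The vector is passed as a string literal holding a JSON-style array.
    vector_literal = json.dumps([float(x) for x in vector])
    projection = ", ".join(f"{alias}.{column}" for column in columns)
    # All four arguments are passed by name, as the function requires.
    sql = (
        f"SELECT {projection} "
        f"FROM vectorSearch(table='{table}', field='{field}', "
        f"vector='{vector_literal}', option='{option}') AS {alias}"
    )
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return json.dumps({"query": sql})
```

Calling it with ``option="k=5"`` reproduces the query from Example 1 on a single line; passing ``limit=100`` with a radial option appends the ``LIMIT`` clause that radial mode requires.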
| 138 | + |
| 139 | +Example 2: Radial search (``max_distance``) |
| 140 | +------------------------------------------- |
| 141 | + |
| 142 | +Return at most ``LIMIT`` documents that lie within a maximum distance
| 143 | +of the query vector. ``LIMIT`` is required for radial searches; without
| 144 | +it the result set would be unbounded::
| 145 | + |
| 146 | + POST /_plugins/_sql |
| 147 | + { |
| 148 | + "query" : """ |
| 149 | + SELECT v._id, v._score |
| 150 | + FROM vectorSearch( |
| 151 | + table='my_vectors', |
| 152 | + field='embedding', |
| 153 | + vector='[0.1, 0.2, 0.3]', |
| 154 | + option='max_distance=0.5' |
| 155 | + ) AS v |
| 156 | + LIMIT 100 |
| 157 | + """ |
| 158 | + } |
| 159 | + |
| 160 | +Example 3: Radial search (``min_score``) |
| 161 | +---------------------------------------- |
| 162 | + |
| 163 | +Return at most ``LIMIT`` documents whose score is at or above the
| 164 | +given threshold. ``LIMIT`` is required for radial searches; without
| 165 | +it the result set would be unbounded::
| 166 | + |
| 167 | + POST /_plugins/_sql |
| 168 | + { |
| 169 | + "query" : """ |
| 170 | + SELECT v._id, v._score |
| 171 | + FROM vectorSearch( |
| 172 | + table='my_vectors', |
| 173 | + field='embedding', |
| 174 | + vector='[0.1, 0.2, 0.3]', |
| 175 | + option='min_score=0.8' |
| 176 | + ) AS v |
| 177 | + LIMIT 100 |
| 178 | + """ |
| 179 | + } |
| 180 | + |
| 181 | +Filtering |
| 182 | +========= |
| 183 | + |
| 184 | +A ``WHERE`` clause on non-vector fields of the ``vectorSearch()`` alias is |
| 185 | +pushed down to OpenSearch when it can be translated to an OpenSearch filter. |
| 186 | +Two placement strategies are available via the ``filter_type`` option: |
| 187 | + |
| 188 | +- ``efficient`` (default): the ``WHERE`` predicate is embedded directly |
| 189 | + inside the k-NN query (``knn.filter``), enabling native efficient |
| 190 | + k-NN filtering during vector search. Efficient filtering depends on |
| 191 | + native k-NN engine and method support; if the target index does not |
| 192 | + support ``knn.filter`` for the configured engine and method, set |
| 193 | + ``filter_type=post``. See the `k-NN filtering guide |
| 194 | + <https://docs.opensearch.org/latest/vector-search/filter-search-knn/efficient-knn-filtering/>`_ |
| 195 | + for engine and method requirements. |
| 196 | +- ``post``: the k-NN query is placed in a scoring (``bool.must``) |
| 197 | + context and the ``WHERE`` predicate is placed as a non-scoring |
| 198 | + ``bool.filter`` outside the k-NN clause. This is Boolean filter |
| 199 | + placement, not the REST ``post_filter`` parameter, and may return |
| 200 | + fewer than ``k`` rows when the filter is selective. |
| 201 | + |
| 202 | +Full-text predicates (``match``, ``match_phrase``, ``multi_match``, and |
| 203 | +the rest of the full-text family) under a ``WHERE`` clause are used as |
| 204 | +filters, not as hybrid keyword-vector score fusion. Their placement |
| 205 | +follows ``filter_type``: the default (``efficient``) embeds supported |
| 206 | +full-text predicates under ``knn.filter``, while ``post`` places them |
| 207 | +in ``bool.filter`` outside the k-NN clause. In both cases they restrict |
| 208 | +which candidates are retained but their text relevance score does not |
| 209 | +combine with the vector ``_score``. ``vectorSearch()`` is not a hybrid |
| 210 | +vector + text relevance scorer. |
| 211 | + |
| 212 | +Behavior depends on whether ``filter_type`` is specified: |
| 213 | + |
| 214 | +- **Omitted (default, ``efficient``)**: the ``WHERE`` predicate is |
| 215 | + embedded under ``knn.filter`` so the k-NN engine applies native |
| 216 | + efficient filtering during vector search. A query with no ``WHERE`` |
| 217 | + clause is valid. ``efficient`` supports simple native filters: |
| 218 | + ``term``, ``range``, ``wildcard``, ``exists``, full-text family |
| 219 | + (``match``, ``match_phrase``, ``match_phrase_prefix``, |
| 220 | + ``match_bool_prefix``, ``multi_match``, ``query_string``, |
| 221 | + ``simple_query_string``), and boolean combinations of those filters. |
| 222 | + Predicates that compile to script queries (arithmetic, function calls |
| 223 | + on indexed fields, ``CASE``, date math), nested predicates, and other |
| 224 | + query shapes are not supported under ``knn.filter`` and return an |
| 225 | + error. Set ``filter_type=post`` to apply such predicates after the |
| 226 | + k-NN search. If the predicate cannot be translated to an OpenSearch |
| 227 | + filter query at all (a distinct translation failure from the |
| 228 | + unsupported-shape cases above), the default path falls back to |
| 229 | + evaluating the ``WHERE`` clause in memory after the k-NN results are |
| 230 | + returned. |
| 231 | +- **Explicit ``efficient``**: same contract as the default. Specifying |
| 232 | + it is useful when a query should be explicit about the placement |
| 233 | + strategy and should fail if the predicate cannot be safely embedded |
| 234 | + under ``knn.filter``. |
| 235 | +- **Explicit ``post``**: a ``WHERE`` clause is required and must be |
| 236 | + translatable to an OpenSearch filter query. Predicates that translate |
| 237 | + to native OpenSearch queries are pushed down as a ``bool.filter`` |
| 238 | + alongside the k-NN query. Predicates that do not have a native |
| 239 | + equivalent (for example, arithmetic or function calls on indexed |
| 240 | + fields) are pushed down as an OpenSearch script query and evaluated |
| 241 | + server-side. If predicate translation itself fails, the query returns |
| 242 | + an error; there is no silent in-memory fallback under explicit |
| 243 | + ``post``. Use ``filter_type=post`` when the predicate shape is not |
| 244 | + supported by efficient filtering. |
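
The difference between the two placements is easiest to see in the shape of the search body. The Python sketch below builds both shapes as plain dictionaries for a top-k query. It is an approximation of what the description above implies, not the exact DSL the SQL layer emits: the helper names are hypothetical, and translating ``category = 'books'`` to a ``term`` query on ``category`` is an assumption.

```python
def knn_clause(field, vector, k, filter_query=None):
    """Native k-NN query clause, optionally carrying an embedded filter."""
    body = {"vector": list(vector), "k": k}
    if filter_query is not None:
        # Efficient placement: the predicate travels inside the k-NN clause.
        body["filter"] = filter_query
    return {"knn": {field: body}}


def efficient_request(field, vector, k, predicate):
    # The predicate is embedded under knn.filter.
    return {"size": k, "query": knn_clause(field, vector, k, predicate)}


def post_request(field, vector, k, predicate):
    # The k-NN clause scores in bool.must; the predicate is a non-scoring
    # bool.filter outside the k-NN clause.
    return {
        "size": k,
        "query": {
            "bool": {
                "must": [knn_clause(field, vector, k)],
                "filter": [predicate],
            }
        },
    }


# Assumed translation of: WHERE v.category = 'books'
predicate = {"term": {"category": "books"}}
efficient = efficient_request("embedding", [0.1, 0.2, 0.3], 10, predicate)
post = post_request("embedding", [0.1, 0.2, 0.3], 10, predicate)
```

In the ``post`` shape the k-NN clause selects its ``k`` candidates without seeing the predicate, which is why a selective filter can leave fewer than ``k`` rows.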
| 245 | + |
| 246 | +Example 4: Default efficient filtering (no ``filter_type``) |
| 247 | +----------------------------------------------------------- |
| 248 | + |
| 249 | +:: |
| 250 | + |
| 251 | + POST /_plugins/_sql |
| 252 | + { |
| 253 | + "query" : """ |
| 254 | + SELECT v._id, v._score, v.category |
| 255 | + FROM vectorSearch( |
| 256 | + table='my_vectors', |
| 257 | + field='embedding', |
| 258 | + vector='[0.1, 0.2, 0.3]', |
| 259 | + option='k=10' |
| 260 | + ) AS v |
| 261 | + WHERE v.category = 'books' |
| 262 | + """ |
| 263 | + } |
| 264 | + |
| 265 | +The predicate is embedded under ``knn.filter`` so the k-NN engine |
| 266 | +applies native efficient filtering during vector search. |
| 267 | + |
| 268 | +Example 5: Post-filtering for predicates not supported by efficient mode |
| 269 | +------------------------------------------------------------------------ |
| 270 | + |
| 271 | +Use ``filter_type=post`` for predicates that do not fit the ``efficient`` |
| 272 | +allow-list, such as arithmetic or function calls on indexed fields:: |
| 273 | + |
| 274 | + POST /_plugins/_sql |
| 275 | + { |
| 276 | + "query" : """ |
| 277 | + SELECT v._id, v._score, v.category |
| 278 | + FROM vectorSearch( |
| 279 | + table='my_vectors', |
| 280 | + field='embedding', |
| 281 | + vector='[0.1, 0.2, 0.3]', |
| 282 | + option='k=10,filter_type=post' |
| 283 | + ) AS v |
| 284 | + WHERE v.price * 1.1 < 100 |
| 285 | + """ |
| 286 | + } |
| 287 | + |
| 288 | +Scoring, sorting, and limits |
| 289 | +============================ |
| 290 | + |
| 291 | +- ``vectorSearch()`` exposes the OpenSearch ``_score`` metadata field on the |
| 292 | + alias. For an alias ``v``, select it as ``v._score``. |
| 293 | +- ``_score`` can be selected and referenced in ``ORDER BY``, but it cannot |
| 294 | + appear in ``WHERE``. Use ``option='min_score=...'`` for score-threshold |
| 295 | + vector search. |
| 296 | +- Results are returned in ``_score DESC`` order by default. The only |
| 297 | + supported ``ORDER BY`` expression is ``<alias>._score DESC`` (for |
| 298 | + example, ``v._score DESC``). |
| 299 | +- In top-k mode (``k=N``), ``LIMIT n`` is optional; when present, ``n`` must |
| 300 | + be ``≤ k``. |
| 301 | +- In radial mode (``max_distance`` or ``min_score``), ``LIMIT`` is required. |
| 302 | +- ``OFFSET`` is not supported on ``vectorSearch()``. Use ``LIMIT`` only. |
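
The ``LIMIT`` and ``OFFSET`` rules above reduce to a short decision procedure. The Python sketch below restates them; ``resolve_row_cap`` is a hypothetical helper, and ``options`` is assumed to be a dictionary that holds exactly one of ``k``, ``max_distance``, or ``min_score``.

```python
from typing import Optional


def resolve_row_cap(options: dict, limit: Optional[int] = None,
                    offset: Optional[int] = None) -> int:
    """Return the maximum number of rows a query may produce."""
    if offset is not None:
        raise ValueError("OFFSET is not supported on vectorSearch()")

    if "k" in options:
        # Top-k mode: LIMIT is optional and must not exceed k.
        k = options["k"]
        if limit is None:
            return k
        if limit > k:
            raise ValueError(f"LIMIT {limit} exceeds k={k}")
        return limit

    # Radial mode (max_distance or min_score): LIMIT is required.
    if limit is None:
        raise ValueError("LIMIT is required in radial mode")
    return limit
```

For example, ``resolve_row_cap({"k": 10}, limit=3)`` returns ``3``, while ``resolve_row_cap({"min_score": 0.8})`` raises because radial mode has no implicit cap.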
| 303 | + |
| 304 | +Limitations |
| 305 | +=========== |
| 306 | + |
| 307 | +The following limitations apply to ``vectorSearch()``:
| 308 | + |
| 309 | +- ``GROUP BY`` and aggregations directly over a ``vectorSearch()`` |
| 310 | + relation are not supported and return an error. |
| 311 | +- Operators wrapped around a ``vectorSearch()`` subquery are rejected |
| 312 | + when they would run after ``vectorSearch()`` has already produced a |
| 313 | + finite result set, because they can silently yield zero, skipped, or |
| 314 | + incorrectly ordered rows. Specifically, an outer ``WHERE``, |
| 315 | + ``ORDER BY``, ``OFFSET`` (non-zero), ``GROUP BY``, aggregation, or |
| 316 | + ``DISTINCT`` applied to a ``vectorSearch()`` subquery returns an |
| 317 | + error. Place ``WHERE`` predicates inside the subquery, directly on |
| 318 | + the ``vectorSearch()`` alias, so that they participate in ``WHERE`` |
| 319 | + pushdown. A plain outer ``LIMIT`` (without ``OFFSET``) wrapping a |
| 320 | + ``vectorSearch()`` subquery is allowed and caps the returned rows. |
| 321 | +- ``JOIN`` between a ``vectorSearch()`` relation and another relation is |
| 322 | + not supported. |
| 323 | +- ``UNION`` / ``INTERSECT`` / ``EXCEPT`` combining a ``vectorSearch()`` |
| 324 | + relation with another relation is not supported. |
| 325 | +- Multiple ``vectorSearch()`` calls in the same query are not supported. |
| 326 | +- The query vector must be supplied as a literal. Parameterized vectors |
| 327 | + (for example, values bound from another column) are not supported. |
| 328 | +- Indexes that define a user field named ``_score`` cannot be queried |
| 329 | + with ``vectorSearch()`` because ``_score`` is reserved for the |
| 330 | + synthetic vector score exposed on the alias. Rename the field or query |
| 331 | + the index with a plain ``SELECT``. |