Commit 1465de8

Add docvalue_fields support documentation for knn_vector fields
Add documentation for retrieving knn_vector fields using docvalue_fields, introduced in OpenSearch 3.7. This feature reads vectors directly from doc values instead of _source, providing up to 2x throughput improvement for JSON transport. Supports binary (base64, default) and array formats across all k-NN engines, data types, and compression levels with no reindexing required.

- Add "Retrieve vectors using doc values" section to performance tuning
- Add "Retrieving vector fields using docvalue_fields" section to retrieve specific fields guide
- Update search API and gRPC search API references with knn_vector format details

Signed-off-by: Navneet Verma <navneev@amazon.com>
1 parent e3e6897 commit 1465de8

4 files changed

Lines changed: 275 additions & 2 deletions

_api-reference/grpc-apis/search.md

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ The [`SearchRequestBody`](https://github.com/opensearch-project/opensearch-proto
| `highlight` | [`Highlight`](https://github.com/opensearch-project/opensearch-protobufs/blob/1.4.0/protos/schemas/common.proto#L1727) | Highlights matched terms in the result snippets. |
| `track_total_hits` | [`TrackHits`](https://github.com/opensearch-project/opensearch-protobufs/blob/1.4.0/protos/schemas/common.proto#L252) | Whether to return the total hit count. |
| `indices_boost` | `map<string, float>` | **Deprecated.** Use `indices_boost_2` instead. |
Removed (line 88):
| `docvalue_fields` | `repeated` [`FieldAndFormat`](https://github.com/opensearch-project/opensearch-protobufs/blob/1.4.0/protos/schemas/common.proto#L1964) | The fields returned using doc values. Optionally, this field can be formatted for readability. |
Added (line 88):
| `docvalue_fields` | `repeated` [`FieldAndFormat`](https://github.com/opensearch-project/opensearch-protobufs/blob/1.4.0/protos/schemas/common.proto#L1964) | The fields returned using `doc_values`. Optionally, this field can be formatted for readability. For `knn_vector` fields, the supported formats are `binary` (default, returns base64-encoded vectors) and `array` (returns JSON numeric arrays). For more information, see [Retrieving vector fields using `docvalue_fields`]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/retrieve-specific-fields/#retrieving-vector-fields-using-docvalue_fields). |
| `min_score` | `float` | The minimum score required in order for a document to be included in the results. |
| `post_filter` | [`QueryContainer`](#querycontainer-fields) | Filters hits after aggregations are applied. |
| `profile` | `bool` | Enables profiling to analyze query performance. |

_api-reference/search-apis/search.md

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ All fields are optional.
Field | Type | Description
:--- | :--- | :---
`aggs` | Object | In the optional `aggs` parameter, you can define any number of aggregations. Each aggregation is defined by its name and one of the types of aggregations that OpenSearch supports. For more information, see [Aggregations]({{site.url}}{{site.baseurl}}/aggregations/).
Removed (line 99):
`docvalue_fields` | Array of objects | The fields that OpenSearch should return using their docvalue forms. Specify a format to return results in a certain format, such as date and time.
Added (line 99):
`docvalue_fields` | Array of objects | The fields that OpenSearch should return using their `doc_values` forms. Specify a format to return results in a certain format, such as date and time. For `knn_vector` fields, the supported formats are `binary` (default, returns base64-encoded vectors) and `array` (returns JSON numeric arrays). For more information, see [Retrieving vector fields using `docvalue_fields`]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/retrieve-specific-fields/#retrieving-vector-fields-using-docvalue_fields).
`fields` | Array | The fields to search for in the request. Specify a format to return results in a certain format, such as date and time.
`explain` | String | Whether to return details about how OpenSearch computed the document's score. Default is `false`.
`from` | Integer | The starting index to search from. Default is 0.

_search-plugins/searching-data/retrieve-specific-fields.md

Lines changed: 151 additions & 0 deletions
@@ -255,6 +255,157 @@ The response contains the `author` and `publication_date` fields:
}
}
```
<!-- vale off -->
### Retrieving vector fields using docvalue_fields
<!-- vale on -->
**Introduced 3.7**
{: .label .label-purple }

Use `docvalue_fields` to retrieve `knn_vector` fields directly from doc values, which avoids decompressing and deserializing the full `_source`. This is significantly faster than reading vectors from `_source` when a single search request retrieves a large number of vectors.

This feature supports all vector data types (`float`, `byte`, and `binary`), all compression levels, and all k-NN engines (Lucene, Faiss, and NMSLIB). You can use it on existing indexes without reindexing.
{: .note}

For performance tuning guidance, see [Retrieve vectors using doc values]({{site.url}}{{site.baseurl}}/vector-search/performance-tuning-search/#retrieve-vectors-using-doc-values).

The following output formats are supported:

| Format | Description |
| :--- | :--- |
| `binary` (default) | Returns vectors as base64-encoded little-endian byte strings. Provides up to 2x higher throughput than the `array` format for JSON transport and reduces response payload size by 30–40%. |
| `array` | Returns vectors as JSON numeric arrays. |

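
To illustrate the `binary` format described in the preceding table, the following minimal Python sketch uses only the standard library. It packs each component of a `float` vector as a 4-byte little-endian IEEE 754 value and base64-encodes the result. The sketch assumes the `float` data type. `byte` and `binary` vectors may use a different per-component layout, which it does not cover. The `encode_float_vector` name is hypothetical. For the vector indexed in the following example, the sketch produces the same string that appears in the step 3 response.

```python
import base64
import struct


def encode_float_vector(vector):
    """Hypothetical helper: pack floats as little-endian float32, then base64-encode.

    Assumes the `float` vector data type (4 bytes per component).
    """
    # "<" selects little-endian byte order; "f" is a 4-byte IEEE 754 float.
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # The vector indexed in the example that follows.
    print(encode_float_vector([1.0, 2.0, 3.0, 4.0]))
    # prints AACAPwAAAEAAAEBAAACAQA==
```
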
The following example demonstrates how to retrieve a vector field using `docvalue_fields`.

1. Create an index with a `knn_vector` field:

```json
PUT /my_vector_index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 4
      },
      "title": {
        "type": "text"
      }
    }
  }
}
```
{% include copy-curl.html %}

2. Index a document:

```json
POST /my_vector_index/_doc/1
{
  "my_vector": [1.0, 2.0, 3.0, 4.0],
  "title": "Sample document"
}
```
{% include copy-curl.html %}

3. Retrieve the vector using `docvalue_fields` with the default `binary` format:

```json
POST /my_vector_index/_search
{
  "_source": false,
  "docvalue_fields": ["my_vector"],
  "query": {
    "knn": {
      "my_vector": {
        "vector": [1.0, 2.0, 3.0, 4.0],
        "k": 5
      }
    }
  }
}
```
{% include copy-curl.html %}

The response returns the vector as a base64-encoded string:

```json
{
  "hits": {
    "hits": [
      {
        "_id": "1",
        "_score": 1.0,
        "fields": {
          "my_vector": ["AACAPwAAAEAAAEBAAACAQA=="]
        }
      }
    ]
  }
}
```

4. To retrieve the vector as a JSON numeric array, specify the `array` format:

```json
POST /my_vector_index/_search
{
  "_source": false,
  "docvalue_fields": [{"field": "my_vector", "format": "array"}],
  "query": {
    "knn": {
      "my_vector": {
        "vector": [1.0, 2.0, 3.0, 4.0],
        "k": 5
      }
    }
  }
}
```
{% include copy-curl.html %}

The response returns the vector as a numeric array:

```json
{
  "hits": {
    "hits": [
      {
        "_id": "1",
        "_score": 1.0,
        "fields": {
          "my_vector": [[1.0, 2.0, 3.0, 4.0]]
        }
      }
    ]
  }
}
```
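
To use a `binary` value on the client, base64-decode the string and unpack the resulting bytes as little-endian values. The following minimal Python sketch assumes that the search response from step 3 has already been parsed into a dictionary (for example, with `json.loads`) and that the field uses the `float` data type. The helper names `decode_float_vector` and `vectors_by_id` are hypothetical. Because `fields` maps each field name to an array of values, the sketch decodes every element of that array.

```python
import base64
import json
import struct


def decode_float_vector(encoded):
    """Hypothetical helper: base64 string -> list of floats.

    Assumes little-endian float32 components (the `float` data type).
    """
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError(f"expected a multiple of 4 bytes, got {len(raw)}")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def vectors_by_id(response, field):
    """Hypothetical helper: map each hit's _id to its decoded vector values."""
    result = {}
    for hit in response["hits"]["hits"]:
        # Hits without the field (or without a `fields` object) map to [].
        values = hit.get("fields", {}).get(field, [])
        result[hit["_id"]] = [decode_float_vector(v) for v in values]
    return result


if __name__ == "__main__":
    # Abbreviated response from step 3.
    body = (
        '{"hits": {"hits": [{"_id": "1", "_score": 1.0, '
        '"fields": {"my_vector": ["AACAPwAAAEAAAEBAAACAQA=="]}}]}}'
    )
    print(vectors_by_id(json.loads(body), "my_vector"))
    # prints {'1': [[1.0, 2.0, 3.0, 4.0]]}
```
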

To retrieve other document fields from `_source` while getting vectors through doc values, exclude the vector field from `_source`:

```json
POST /my_vector_index/_search
{
  "_source": {
    "excludes": ["my_vector"]
  },
  "docvalue_fields": [{"field": "my_vector", "format": "array"}],
  "query": {
    "knn": {
      "my_vector": {
        "vector": [1.0, 2.0, 3.0, 4.0],
        "k": 5
      }
    }
  }
}
```
{% include copy-curl.html %}
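
With this request, each hit returns the remaining document fields in `_source` and the vector in `fields`. If your application needs a single record per document, it can merge the two. The following Python sketch assumes the `array` format used in the preceding request, so each entry under `fields` is already a list of numbers. The `merge_hit` helper and the sample hit are hypothetical and only illustrate the response shape.

```python
def merge_hit(hit, vector_field):
    """Hypothetical helper: combine _source with a doc-value vector.

    Assumes the `array` format, so hit["fields"][vector_field] is a list
    that contains one numeric list per doc value.
    """
    record = dict(hit.get("_source", {}))  # copy so the hit is not mutated
    values = hit.get("fields", {}).get(vector_field)
    if values:
        # Keep only the first doc value for the field.
        record[vector_field] = values[0]
    return record


if __name__ == "__main__":
    # Hypothetical hit shaped like the response to the preceding request.
    hit = {
        "_id": "1",
        "_source": {"title": "Sample document"},
        "fields": {"my_vector": [[1.0, 2.0, 3.0, 4.0]]},
    }
    print(merge_hit(hit, "my_vector"))
    # prints {'title': 'Sample document', 'my_vector': [1.0, 2.0, 3.0, 4.0]}
```
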

<!-- vale off -->
### Using docvalue_fields with nested objects
<!-- vale on -->

_vector-search/performance-tuning-search.md

Lines changed: 122 additions & 0 deletions
@@ -92,3 +92,125 @@ GET /my-index/_search
{% include copy-curl.html %}

For more information, see [Retrieve specific fields]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/retrieve-specific-fields/).

## Retrieve vectors using doc values
**Introduced 3.7**
{: .label .label-purple }

Use `docvalue_fields` to retrieve vector fields directly from the on-disk columnar storage, which avoids decompressing and deserializing the full `_source`. This approach is significantly faster than reading vectors from `_source` when a single search request retrieves a large number of vectors.

This feature works with all k-NN engines (Lucene, Faiss, and NMSLIB), all vector data types (`float`, `byte`, and `binary`), and all compression levels. You can use it on existing indexes without reindexing.
{: .note}

For best performance, exclude the vector field from `_source` by using `_source.excludes` or by setting `_source` to `false`. This ensures that OpenSearch reads vectors only from doc values and does not redundantly decompress them from the stored source.
{: .tip}

### Supported formats

The following table describes the available output formats for vector doc values.

| Format | Description |
| :--- | :--- |
| `binary` (default) | Returns vectors as base64-encoded little-endian byte strings. Provides up to 2x higher throughput than the `array` format for JSON transport and reduces response payload size by 30–40%. |
| `array` | Returns vectors as JSON numeric arrays. |

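
The size difference comes from the serialization itself. For the `float` data type, `binary` always uses 4 bytes per component, and base64 expands those bytes by a factor of 4/3. A JSON array, by contrast, uses one character per printed digit plus a separator. The following Python sketch compares the two encodings for one hypothetical 768-dimensional vector of random float32 values. It assumes that each array component is printed with up to 9 significant digits, which is enough to round-trip a float32. That assumption only approximates the text OpenSearch emits. The sketch also measures the vector value alone, not a full search response, so the ratio it prints is no substitute for measuring your own workload.

```python
import base64
import random
import struct


def binary_payload(vector):
    """Base64 of little-endian float32 values, as a quoted JSON string."""
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return '"' + base64.b64encode(raw).decode("ascii") + '"'


def array_payload(vector):
    """JSON numeric array; assumes up to 9 significant digits per component.

    Nine significant digits are enough to round-trip a float32, but the
    exact text OpenSearch emits may be shorter or longer.
    """
    return "[" + ",".join(format(x, ".9g") for x in vector) + "]"


def to_float32(values):
    """Round Python floats (float64) to the nearest float32 value."""
    fmt = f"<{len(values)}f"
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    vector = to_float32([rng.uniform(-1.0, 1.0) for _ in range(768)])
    b = len(binary_payload(vector))
    a = len(array_payload(vector))
    print(f"binary: {b} chars, array: {a} chars, binary/array = {b / a:.2f}")
```
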
### Examples

The following example retrieves vectors using the default `binary` format:

```json
GET /my-index/_search
{
  "_source": false,
  "docvalue_fields": ["vector_field"],
  "query": {
    "knn": {
      "vector_field": {
        "vector": [0.1, 0.2, 0.3],
        "k": 10
      }
    }
  }
}
```
{% include copy-curl.html %}

The response contains the vector as a base64-encoded string in the `fields` object:

```json
{
  "hits": {
    "hits": [
      {
        "_index": "my-index",
        "_id": "1",
        "_score": 1.0,
        "fields": {
          "vector_field": ["zczMPc3MTD6amZk+"]
        }
      }
    ]
  }
}
```
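
The `binary` format carries the stored single-precision (float32) values. As a result, decoding it on the client yields numbers that are close to, but not exactly equal to, decimal literals such as `0.1`. The following minimal Python sketch assumes the `float` data type and uses a hypothetical `decode_float_vector` helper. It decodes the string from the preceding response and compares the result with `[0.1, 0.2, 0.3]` using a tolerance instead of exact equality.

```python
import base64
import math
import struct


def decode_float_vector(encoded):
    """Hypothetical helper: base64 little-endian float32 -> list of floats.

    Assumes the `float` vector data type (4 bytes per component).
    """
    raw = base64.b64decode(encoded)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


if __name__ == "__main__":
    decoded = decode_float_vector("zczMPc3MTD6amZk+")  # value from the response above
    expected = [0.1, 0.2, 0.3]
    print(decoded)
    # float32 keeps about 7 significant decimal digits, so compare with a tolerance.
    print(all(math.isclose(d, e, rel_tol=1e-6) for d, e in zip(decoded, expected)))
```
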

To return vectors as a JSON numeric array, specify the `array` format:

```json
GET /my-index/_search
{
  "_source": false,
  "docvalue_fields": [{"field": "vector_field", "format": "array"}],
  "query": {
    "knn": {
      "vector_field": {
        "vector": [0.1, 0.2, 0.3],
        "k": 10
      }
    }
  }
}
```
{% include copy-curl.html %}

The response contains the vector as a numeric array:

```json
{
  "hits": {
    "hits": [
      {
        "_index": "my-index",
        "_id": "1",
        "_score": 1.0,
        "fields": {
          "vector_field": [[0.1, 0.2, 0.3]]
        }
      }
    ]
  }
}
```

To retrieve other document fields from `_source` while getting vectors through doc values, exclude the vector field from `_source`:

```json
GET /my-index/_search
{
  "_source": {
    "excludes": ["vector_field"]
  },
  "docvalue_fields": [{"field": "vector_field", "format": "array"}],
  "query": {
    "knn": {
      "vector_field": {
        "vector": [0.1, 0.2, 0.3],
        "k": 10
      }
    }
  }
}
```
{% include copy-curl.html %}

For more information, see [Retrieving vector fields using docvalue_fields]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/retrieve-specific-fields/#retrieving-vector-fields-using-docvalue_fields).
