Merged
2 changes: 1 addition & 1 deletion _api-reference/grpc-apis/search.md
@@ -85,7 +85,7 @@ The [`SearchRequestBody`](https://github.com/opensearch-project/opensearch-proto
| `highlight` | [`Highlight`](https://github.com/opensearch-project/opensearch-protobufs/blob/1.4.0/protos/schemas/common.proto#L1727) | Highlights matched terms in the result snippets. |
| `track_total_hits` | [`TrackHits`](https://github.com/opensearch-project/opensearch-protobufs/blob/1.4.0/protos/schemas/common.proto#L252) | Whether to return the total hit count. |
| `indices_boost` | `map<string, float>` | **Deprecated.** Use `indices_boost_2` instead. |
| `docvalue_fields` | `repeated` [`FieldAndFormat`](https://github.com/opensearch-project/opensearch-protobufs/blob/1.4.0/protos/schemas/common.proto#L1964) | The fields returned using doc values. Optionally, this field can be formatted for readability. |
| `docvalue_fields` | `repeated` [`FieldAndFormat`](https://github.com/opensearch-project/opensearch-protobufs/blob/1.4.0/protos/schemas/common.proto#L1964) | The fields to return in their `doc_values` form. You can include a format for the returned values (for example, a date format). For `knn_vector` fields, supported formats are `binary` (default, Base64-encoded) and `array` (JSON numeric arrays). For more information, see [Retrieving vector fields using `docvalue_fields`]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/retrieve-specific-fields/#retrieving-vector-fields-using-docvalue_fields). |
| `min_score` | `float` | The minimum score required in order for a document to be included in the results. |
| `post_filter` | [`QueryContainer`](#querycontainer-fields) | Filters hits after aggregations are applied. |
| `profile` | `bool` | Enables profiling to analyze query performance. |
2 changes: 1 addition & 1 deletion _api-reference/search-apis/search.md
@@ -96,7 +96,7 @@ All fields are optional.
Field | Type | Description
:--- | :--- | :---
`aggs` | Object | In the optional `aggs` parameter, you can define any number of aggregations. Each aggregation is defined by its name and one of the types of aggregations that OpenSearch supports. For more information, see [Aggregations]({{site.url}}{{site.baseurl}}/aggregations/).
`docvalue_fields` | Array of objects | The fields that OpenSearch should return using their docvalue forms. Specify a format to return results in a certain format, such as date and time.
`docvalue_fields` | Array of objects | The fields to return in their `doc_values` form. You can include a format for the returned values (for example, a date format). For `knn_vector` fields, supported formats are `binary` (default, Base64-encoded) and `array` (JSON numeric arrays). For more information, see [Retrieving vector fields using `docvalue_fields`]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/retrieve-specific-fields/#retrieving-vector-fields-using-docvalue_fields).
`fields` | Array | The fields to search for in the request. Specify a format to return results in a certain format, such as date and time.
`explain` | String | Whether to return details about how OpenSearch computed the document's score. Default is `false`.
`from` | Integer | The starting index to search from. Default is 0.
153 changes: 153 additions & 0 deletions _search-plugins/searching-data/retrieve-specific-fields.md
@@ -159,9 +159,11 @@ GET /my_index/_search
{% include copy-curl.html %}

Additionally, you can use [most fields]({{site.url}}{{site.baseurl}}/query-dsl/full-text/multi-match/#most-fields) and [field aliases]({{site.url}}{{site.baseurl}}/mappings/supported-field-types/alias/) in the `fields` parameter because it queries both the document `_source` and `_mappings` of the index.

<!-- vale off -->
## Searching with docvalue_fields
<!-- vale on -->

To retrieve specific fields from the index, you can also use the `docvalue_fields` parameter. This parameter works slightly differently from the `fields` parameter: it retrieves values from doc values rather than from the `_source` field, which is more efficient for fields that are not analyzed, such as keyword, date, and numeric fields. Doc values use a columnar, on-disk storage format optimized for sorting and aggregations. When you use `docvalue_fields`, OpenSearch reads the values directly from this storage format. This is useful for retrieving values of fields that are primarily used for sorting, aggregations, and scripts.

The following example demonstrates how to use the `docvalue_fields` parameter.
@@ -255,9 +257,160 @@ The response contains the `author` and `publication_date` fields:
}
}
```
<!-- vale off -->
### Retrieving vector fields using docvalue_fields
<!-- vale on -->
**Introduced 3.7**
{: .label .label-purple }

You can retrieve `knn_vector` fields using `docvalue_fields` instead of the `_source`. This is faster because OpenSearch reads the vector directly from `doc_values` rather than parsing the full `_source` document.

Retrieving `knn_vector` fields from `doc_values` supports all vector data types (`float`, `byte`, and `binary`), all compression levels, and all k-NN engines (Lucene, Faiss, and NMSLIB). You can use it on existing indexes without reindexing.

For performance tuning guidance, see [Retrieve vectors using doc values]({{site.url}}{{site.baseurl}}/vector-search/performance-tuning-search/#retrieve-vectors-using-doc-values).

The following output formats are supported.

| Format | Description |
| :--- | :--- |
| `binary` (Default) | Returns vectors as Base64-encoded little-endian byte strings. Provides approximately 2x throughput improvement over the `array` format for JSON transport and reduces response payload size by 30--40%. |
| `array` | Returns vectors as JSON numeric arrays. |

To retrieve a vector field using `docvalue_fields`, follow these steps:

1. Create an index with a `knn_vector` field:

```json
PUT /my_vector_index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 4
      },
      "title": {
        "type": "text"
      }
    }
  }
}
```
{% include copy-curl.html %}

2. Index a document:

```json
POST /my_vector_index/_doc/1
{
  "my_vector": [1.0, 2.0, 3.0, 4.0],
  "title": "Sample document"
}
```
{% include copy-curl.html %}

3. Retrieve the vector using `docvalue_fields` with the default `binary` format:

```json
POST /my_vector_index/_search
{
  "_source": false,
  "docvalue_fields": ["my_vector"],
  "query": {
    "knn": {
      "my_vector": {
        "vector": [1.0, 2.0, 3.0, 4.0],
        "k": 5
      }
    }
  }
}
```
{% include copy-curl.html %}

The response returns the vector as a Base64-encoded string:

```json
{
  "hits": {
    "hits": [
      {
        "_id": "1",
        "_score": 1.0,
        "fields": {
          "my_vector": ["AACAPwAAAEAAAEBAAACAQA=="]
        }
      }
    ]
  }
}
```

4. To retrieve the vector as a JSON numeric array, specify the `array` format:

```json
POST /my_vector_index/_search
{
  "_source": false,
  "docvalue_fields": [{"field": "my_vector", "format": "array"}],
  "query": {
    "knn": {
      "my_vector": {
        "vector": [1.0, 2.0, 3.0, 4.0],
        "k": 5
      }
    }
  }
}
```
{% include copy-curl.html %}

The response returns the vector as a numeric array:

```json
{
  "hits": {
    "hits": [
      {
        "_id": "1",
        "_score": 1.0,
        "fields": {
          "my_vector": [[1.0, 2.0, 3.0, 4.0]]
        }
      }
    ]
  }
}
```
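
As a hedged illustration of the `binary` format, the following Python sketch decodes the Base64 string from the step 3 response on the client side. It assumes the default `float` data type, so each dimension is read as a 4-byte little-endian IEEE 754 value. `decode_binary_vector` is a hypothetical helper name, not part of any OpenSearch client, and `byte` or `binary` vectors would need a different unpacking format.

```python
import base64
import struct


def decode_binary_vector(encoded: str) -> list[float]:
    """Decode a Base64 string of little-endian float32 values into floats.

    Assumption: the field uses the default `float` data type (4 bytes per dimension).
    """
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("expected a whole number of 4-byte float32 values")
    count = len(raw) // 4
    # "<" selects little-endian byte order; "f" is a 4-byte IEEE 754 float.
    return list(struct.unpack(f"<{count}f", raw))


# Value copied from the `binary` response shown in step 3.
vector = decode_binary_vector("AACAPwAAAEAAAEBAAACAQA==")
print(vector)  # [1.0, 2.0, 3.0, 4.0]
```

The decoded values match the vector indexed in step 2 and the `array` response in step 4.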

To retrieve other document fields from the `_source` while retrieving vectors using `doc_values`, exclude the vector field from the `_source`:

```json
POST /my_vector_index/_search
{
  "_source": {
    "excludes": ["my_vector"]
  },
  "docvalue_fields": [{"field": "my_vector", "format": "array"}],
  "query": {
    "knn": {
      "my_vector": {
        "vector": [1.0, 2.0, 3.0, 4.0],
        "k": 5
      }
    }
  }
}
```
{% include copy-curl.html %}
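
When the same request shape is needed for several fields or query vectors, the body can be assembled programmatically. The following Python sketch builds the request body from the preceding example. `build_vector_search_body` is a hypothetical helper introduced only for illustration, and the sketch stops at producing the JSON body rather than assuming any particular client library.

```python
import json


def build_vector_search_body(vector_field, query_vector, k, vector_format="array"):
    """Build a k-NN search body that reads the vector from doc values only."""
    return {
        # Keep the rest of `_source` but leave out the vector field.
        "_source": {"excludes": [vector_field]},
        # Request the vector from doc values in the chosen format.
        "docvalue_fields": [{"field": vector_field, "format": vector_format}],
        "query": {"knn": {vector_field: {"vector": query_vector, "k": k}}},
    }


body = build_vector_search_body("my_vector", [1.0, 2.0, 3.0, 4.0], k=5)
print(json.dumps(body, indent=2))
```

Passing `vector_format="binary"` requests the Base64-encoded form instead.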

<!-- vale off -->
### Using docvalue_fields with nested objects
<!-- vale on -->

In OpenSearch, if you want to retrieve doc values for nested objects, you cannot directly use the `docvalue_fields` parameter because it will return an empty array. Instead, you should use the `inner_hits` parameter with its own `docvalue_fields` property, as shown in the following example.

1. Define the index mappings:
11 changes: 11 additions & 0 deletions _vector-search/performance-tuning-search.md
@@ -92,3 +92,14 @@ GET /my-index/_search
{% include copy-curl.html %}

For more information, see [Retrieve specific fields]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/retrieve-specific-fields/).

## Retrieve vectors using doc values
**Introduced 3.7**
{: .label .label-purple }

Use `docvalue_fields` to retrieve vector fields directly from on-disk columnar storage, which avoids reading and parsing the full `_source` document. This approach is significantly faster when retrieving a large number of vectors in a single search request.

For best performance, exclude the vector field from the `_source` by using `_source.excludes` or by setting `_source` to `false`. This ensures that OpenSearch reads vectors only from `doc_values` and does not redundantly decompress them from the stored `_source`.
{: .tip}

For supported formats and examples, see [Retrieving vector fields using `docvalue_fields`]({{site.url}}{{site.baseurl}}/search-plugins/searching-data/retrieve-specific-fields/#retrieving-vector-fields-using-docvalue_fields).
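
To get a rough feel for why the Base64 form tends to be more compact than a JSON numeric array, the following Python sketch serializes one vector both ways and compares the character counts. It is only an illustration under stated assumptions: the vector is a made-up 128-dimensional sample, `encoded_sizes` is a hypothetical helper, and the array size depends on how many digits are printed for each value, so the sketch does not reproduce any measured OpenSearch figures.

```python
import base64
import json
import struct


def encoded_sizes(vector):
    """Return (array_chars, binary_chars) for one vector serialized as JSON."""
    # `array` style: decimal text, whose length varies with each value.
    array_json = json.dumps(vector)
    # `binary` style: 4 little-endian bytes per float32 dimension, then Base64.
    packed = struct.pack(f"<{len(vector)}f", *vector)
    binary_json = json.dumps(base64.b64encode(packed).decode("ascii"))
    return len(array_json), len(binary_json)


# Hypothetical 128-dimensional vector used only for illustration.
sample = [i / 128 for i in range(128)]
array_size, binary_size = encoded_sizes(sample)
print(f"array: {array_size} characters, binary: {binary_size} characters")
```

The Base64 size is fixed by the dimension count (4 bytes per dimension, expanded by Base64's 4:3 ratio), whereas the array size grows with the number of digits in each value.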