Merged
`docs-website/reference/haystack-api/document_stores_api.md`: 116 changes (102 additions, 14 deletions)
@@ -105,7 +105,7 @@ Deserializes the component from a dictionary.
```python
save_to_disk(path: str) -> None
```

Write the database and its' data to disk as a JSON file.
Write the database and its data to disk as a JSON file.

**Parameters:**

@@ -117,7 +117,7 @@ Write the database and its' data to disk as a JSON file.
```python
load_from_disk(path: str) -> InMemoryDocumentStore
```

Load the database and its' data from disk as a JSON file.
Load the database and its data from disk as a JSON file.

**Parameters:**

@@ -133,7 +133,7 @@ Load the database and its' data from disk as a JSON file.
```python
count_documents() -> int
```

Returns the number of how many documents are present in the DocumentStore.
Returns the number of documents present in the DocumentStore.

#### filter_documents

@@ -143,12 +143,10 @@ filter_documents(filters: dict[str, Any] | None = None) -> list[Document]

Returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the DocumentStore.filter_documents() protocol
documentation.

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply to the document list.
- **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply. For a detailed specification of the filters, refer to the
[documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering).

**Returns:**

@@ -176,7 +174,7 @@ Deletes all documents with matching document_ids from the DocumentStore.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – The object_ids to delete.
- **document_ids** (<code>list\[str\]</code>) – The document_ids to delete.

#### delete_all_documents

@@ -229,6 +227,98 @@ Deletes all documents that match the provided filters.

- <code>ValueError</code> – if filters have invalid syntax.

#### count_documents_by_filter

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Returns the number of documents that match the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply.
For a detailed specification of the filters, refer to the
[documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering).

**Returns:**

- <code>int</code> – The number of documents that match the filters.
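
The `filters` argument uses the comparison and logical dictionaries described in the linked metadata-filtering documentation. The sketch below counts matches over plain dicts standing in for stored documents. `field_value`, `matches`, and `count_documents_by_filter_sketch` are hypothetical helpers that support only `==`, `>=`, `<`, `AND`, and `OR`; they illustrate the filter shape, not Haystack's implementation.

```python
from typing import Any

# Hypothetical stand-ins for stored documents: content plus a meta dict.
DOCS = [
    {"content": "Berlin guide", "meta": {"city": "Berlin", "year": 2021}},
    {"content": "Paris guide", "meta": {"city": "Paris", "year": 2023}},
    {"content": "Berlin food", "meta": {"city": "Berlin", "year": 2024}},
]


def field_value(doc: dict[str, Any], field: str) -> Any:
    # "meta.city" reads doc["meta"]["city"]; other names are read from the top level.
    if field.startswith("meta."):
        return doc["meta"].get(field[len("meta."):])
    return doc.get(field)


def matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    op = flt["operator"]
    if op in ("AND", "OR"):
        # Logical filter: combine the results of its nested conditions.
        results = [matches(doc, cond) for cond in flt["conditions"]]
        return all(results) if op == "AND" else any(results)
    value = field_value(doc, flt["field"])
    if op == "==":
        return value == flt["value"]
    if value is None:
        return False  # in this sketch, a missing field never satisfies an ordering comparison
    if op == ">=":
        return value >= flt["value"]
    if op == "<":
        return value < flt["value"]
    raise ValueError(f"operator not supported by this sketch: {op}")


def count_documents_by_filter_sketch(docs: list[dict[str, Any]], filters: dict[str, Any]) -> int:
    return sum(1 for doc in docs if matches(doc, filters))


filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.city", "operator": "==", "value": "Berlin"},
        {"field": "meta.year", "operator": ">=", "value": 2022},
    ],
}
print(count_documents_by_filter_sketch(DOCS, filters))  # 1
```

With a real `InMemoryDocumentStore`, the same `filters` dictionary is what `count_documents_by_filter(filters)` takes.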

#### count_unique_metadata_by_filter

```python
count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
```

Returns the number of unique values for each specified metadata field from documents matching the filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – The filters to apply.
For a detailed specification of the filters, refer to the
[documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering).
- **metadata_fields** (<code>list\[str\]</code>) – List of field names to count unique values for.
Field names can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, int\]</code> – A dictionary mapping each metadata field name (without "meta." prefix)
to the count of its unique values among the filtered documents.
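
A sketch of the documented return shape, computed over plain dicts standing in for stored documents. The `keep` predicate is a hypothetical stand-in for the `filters` dict, and skipping documents that lack a field is an assumption; this section does not say how the library treats missing values.

```python
from typing import Any, Callable


def strip_meta_prefix(field: str) -> str:
    # "meta.lang" and "lang" both name the same metadata field.
    return field[len("meta."):] if field.startswith("meta.") else field


def count_unique_metadata_sketch(
    docs: list[dict[str, Any]],
    keep: Callable[[dict[str, Any]], bool],
    metadata_fields: list[str],
) -> dict[str, int]:
    # Apply the (hypothetical) filter first, then count distinct values per field.
    filtered = [doc for doc in docs if keep(doc)]
    counts: dict[str, int] = {}
    for field in metadata_fields:
        key = strip_meta_prefix(field)
        # Assumption: documents without this field are ignored.
        counts[key] = len({doc["meta"][key] for doc in filtered if key in doc["meta"]})
    return counts


DOCS = [
    {"content": "a", "meta": {"lang": "en", "source": "wiki"}},
    {"content": "b", "meta": {"lang": "de", "source": "wiki"}},
    {"content": "c", "meta": {"lang": "en", "source": "news"}},
    {"content": "d", "meta": {"lang": "fr"}},
]

print(count_unique_metadata_sketch(DOCS, lambda d: True, ["meta.lang", "source"]))
# {'lang': 3, 'source': 2}
print(count_unique_metadata_sketch(DOCS, lambda d: d["meta"]["lang"] == "en", ["meta.lang", "source"]))
# {'lang': 1, 'source': 2}
```

Note that the keys of the result never carry the "meta." prefix, whichever form was passed in.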

#### get_metadata_fields_info

```python
get_metadata_fields_info() -> dict[str, dict[str, str]]
```

Returns information about the metadata fields present in the stored documents.

Types are inferred from the stored values (keyword, int, float, boolean).

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – A dictionary mapping each metadata field name to a dict with a "type" key.
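
The type inference described above can be pictured as follows. `infer_type` and `metadata_fields_info_sketch` are hypothetical; letting the first value seen decide a field's type, and using bare field names (no "meta." prefix) as keys, are assumptions, since this section does not say how mixed types or key prefixes are handled.

```python
from typing import Any


def infer_type(value: Any) -> str:
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    return "keyword"  # strings and anything else


def metadata_fields_info_sketch(docs: list[dict[str, Any]]) -> dict[str, dict[str, str]]:
    info: dict[str, dict[str, str]] = {}
    for doc in docs:
        for field, value in doc["meta"].items():
            # Assumption: the first value seen for a field decides its type.
            info.setdefault(field, {"type": infer_type(value)})
    return info


DOCS = [
    {"content": "a", "meta": {"lang": "en", "year": 2024}},
    {"content": "b", "meta": {"score": 0.7, "draft": True}},
]
print(metadata_fields_info_sketch(DOCS))
# {'lang': {'type': 'keyword'}, 'year': {'type': 'int'}, 'score': {'type': 'float'}, 'draft': {'type': 'boolean'}}
```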

#### get_metadata_field_min_max

```python
get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]
```

Returns the minimum and maximum values for the given metadata field across all documents.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field name. Can include or omit the "meta." prefix.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with "min" and "max" keys. Returns `{"min": None, "max": None}`
if the field is missing or has no values.
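
The documented contract, including the `{"min": None, "max": None}` fallback, can be sketched over plain dicts standing in for stored documents. `metadata_field_min_max_sketch` is hypothetical, and treating `None` values as absent is an assumption.

```python
from typing import Any


def metadata_field_min_max_sketch(docs: list[dict[str, Any]], metadata_field: str) -> dict[str, Any]:
    # Accept the field name with or without the "meta." prefix.
    key = metadata_field[len("meta."):] if metadata_field.startswith("meta.") else metadata_field
    values = [doc["meta"][key] for doc in docs if doc["meta"].get(key) is not None]
    if not values:
        # Field missing everywhere or never set: the documented fallback.
        return {"min": None, "max": None}
    return {"min": min(values), "max": max(values)}


DOCS = [
    {"content": "a", "meta": {"year": 2021}},
    {"content": "b", "meta": {"year": 2024}},
    {"content": "c", "meta": {}},
]
print(metadata_field_min_max_sketch(DOCS, "meta.year"))  # {'min': 2021, 'max': 2024}
print(metadata_field_min_max_sketch(DOCS, "rating"))  # {'min': None, 'max': None}
```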

#### get_metadata_field_unique_values

```python
get_metadata_field_unique_values(
    metadata_field: str, search_term: str | None = None
) -> tuple[list[str], int]
```

Returns unique values for a metadata field, optionally filtered by a search term in content.

**Parameters:**

- **metadata_field** (<code>str</code>) – The metadata field name. Can include or omit the "meta." prefix.
- **search_term** (<code>str | None</code>) – If set, only documents whose content contains this term (case-insensitive)
are considered.

**Returns:**

- <code>tuple\[list\[str\], int\]</code> – A tuple of (list of unique values, total count of unique values).
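
A sketch of how `search_term` narrows the documents before values are collected, again over plain dicts. `metadata_field_unique_values_sketch` is hypothetical; converting values to `str` mirrors the `list[str]` return type, while the sorted order is an assumption, as this section does not specify one.

```python
from typing import Any, Optional


def metadata_field_unique_values_sketch(
    docs: list[dict[str, Any]], metadata_field: str, search_term: Optional[str] = None
) -> tuple[list[str], int]:
    key = metadata_field[len("meta."):] if metadata_field.startswith("meta.") else metadata_field
    if search_term is not None:
        # Case-insensitive substring match on the document content.
        needle = search_term.lower()
        docs = [doc for doc in docs if needle in doc["content"].lower()]
    # Deduplicate, stringify, and sort (ordering is an assumption of this sketch).
    values = sorted({str(doc["meta"][key]) for doc in docs if key in doc["meta"]})
    return values, len(values)


DOCS = [
    {"content": "Berlin guide", "meta": {"year": 2021}},
    {"content": "Paris guide", "meta": {"year": 2023}},
    {"content": "BERLIN food", "meta": {"year": 2024}},
]
print(metadata_field_unique_values_sketch(DOCS, "meta.year"))  # (['2021', '2023', '2024'], 3)
print(metadata_field_unique_values_sketch(DOCS, "year", search_term="berlin"))  # (['2021', '2024'], 2)
```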

#### bm25_retrieval

```python
```
@@ -291,7 +381,7 @@ Retrieves documents that are most similar to the query embedding using a vector
```python
count_documents_async() -> int
```

Returns the number of how many documents are present in the DocumentStore.
Returns the number of documents present in the DocumentStore.
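
Assuming the `_async` methods are coroutine functions, as their names suggest, they are awaited inside an event loop. To stay self-contained, the sketch uses a hypothetical `FakeAsyncStore` that exposes only a method with the same name and return type as the documented `count_documents_async()`; it shows the calling pattern, not the library's behavior.

```python
import asyncio


class FakeAsyncStore:
    """Hypothetical stand-in; only the method name mirrors the documented API."""

    def __init__(self, docs: list[str]) -> None:
        self._docs = list(docs)

    async def count_documents_async(self) -> int:
        await asyncio.sleep(0)  # yield to the event loop, as real async I/O would
        return len(self._docs)


async def main() -> list[int]:
    stores = [FakeAsyncStore(["a", "b"]), FakeAsyncStore(["c"])]
    # Await several counts concurrently; gather preserves the input order.
    return await asyncio.gather(*(store.count_documents_async() for store in stores))


counts = asyncio.run(main())
print(counts)  # [2, 1]
```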

#### filter_documents_async

@@ -301,12 +391,10 @@ filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document]

Returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the DocumentStore.filter_documents() protocol
documentation.

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply to the document list.
- **filters** (<code>dict\[str, Any\] | None</code>) – The filters to apply. For a detailed specification of the filters, refer to the
[documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering).

**Returns:**

@@ -334,7 +422,7 @@ Deletes all documents with matching document_ids from the DocumentStore.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – The object_ids to delete.
- **document_ids** (<code>list\[str\]</code>) – The document_ids to delete.

#### bm25_retrieval_async
