diff --git a/docs-website/reference/integrations-api/mongodb_atlas.md b/docs-website/reference/integrations-api/mongodb_atlas.md index 552ebfacce..3aa6e43140 100644 --- a/docs-website/reference/integrations-api/mongodb_atlas.md +++ b/docs-website/reference/integrations-api/mongodb_atlas.md @@ -5,11 +5,8 @@ description: "MongoDB Atlas integration for Haystack" slug: "/integrations-mongodb-atlas" --- - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.embedding\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever ### MongoDBAtlasEmbeddingRetriever @@ -20,6 +17,7 @@ during the creation of the index (i.e. cosine, dot product, or euclidean). See M information. Usage example: + ```python import numpy as np from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -39,125 +37,113 @@ The example above retrieves the 10 most similar documents to a random query embe MongoDBAtlasDocumentStore. Note that dimensions of the query_embedding must match the dimensions of the embeddings stored in the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasEmbeddingRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` -Create the MongoDBAtlasDocumentStore component. +Create the MongoDBAtlasEmbeddingRetriever component. -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. -- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `vector_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `vector_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return.
-- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `vector_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. +**Raises:** - +- ValueError – If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. -#### MongoDBAtlasEmbeddingRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. - - -#### MongoDBAtlasEmbeddingRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasEmbeddingRetriever ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasEmbeddingRetriever – Deserialized component. 
-#### MongoDBAtlasEmbeddingRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -#### MongoDBAtlasEmbeddingRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run_async( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding - similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.full\_text\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever ### MongoDBAtlasFullTextRetriever @@ -167,6 +153,7 @@ The full-text search is dependent on the full_text_search_index used in the Mong See MongoDBAtlasDocumentStore for more information. Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasFullTextRetriever @@ -184,155 +171,144 @@ print(results["documents"]) The example above retrieves the 10 most similar documents to the query "Lorem ipsum" from the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasFullTextRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. -- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `full_text_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. 
+- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `full_text_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of MongoDBAtlasDocumentStore. +**Raises:** - +- ValueError – If `document_store` is not an instance of MongoDBAtlasDocumentStore. -#### MongoDBAtlasFullTextRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: - -Dictionary with serialized data. +**Returns:** - +- dict\[str, Any\] – Dictionary with serialized data. -#### MongoDBAtlasFullTextRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasFullTextRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasFullTextRetriever ``` Deserializes the component from a dictionary. -**Arguments**: +**Parameters:** -- `data`: Dictionary to deserialize from. +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -**Returns**: +**Returns:** -Deserialized component. +- MongoDBAtlasFullTextRetriever – Deserialized component. 
- - -#### MongoDBAtlasFullTextRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. 
The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note: `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note: `synonyms` cannot be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization.
+ +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -#### MongoDBAtlasFullTextRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run_async( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). 
-- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note: `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note: `synonyms` cannot be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization.
See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. + +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -## Module haystack\_integrations.document\_stores.mongodb\_atlas.document\_store - - +## haystack_integrations.document_stores.mongodb_atlas.document_store ### MongoDBAtlasDocumentStore @@ -360,6 +336,7 @@ For more details on MongoDB Atlas, see the official MongoDB Atlas [documentation](https://www.mongodb.com/docs/atlas/getting-started/). Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -370,329 +347,291 @@ store = MongoDBAtlasDocumentStore(database_name="your_existing_db", print(store.count_documents()) ``` - - -#### MongoDBAtlasDocumentStore.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - mongo_connection_string: Secret = Secret.from_env_var( - "MONGO_CONNECTION_STRING"), - database_name: str, - collection_name: str, - vector_search_index: str, - full_text_search_index: str, - embedding_field: str = "embedding", - content_field: str = "content") +__init__( + *, + mongo_connection_string: Secret = Secret.from_env_var( + "MONGO_CONNECTION_STRING" + ), + database_name: str, + collection_name: str, + vector_search_index: str, + full_text_search_index: str, + embedding_field: str = "embedding", + content_field: str = "content" +) ``` Creates a new MongoDBAtlasDocumentStore instance. -**Arguments**: - -- `mongo_connection_string`: MongoDB Atlas connection string in the format: -`"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. -This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. -This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
-- `database_name`: Name of the database to use. -- `collection_name`: Name of the collection to use. To use this document store for embedding retrieval, -this collection needs to have a vector search index set up on the `embedding` field. -- `vector_search_index`: The name of the vector search index to use for vector search operations. -Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB -Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/`std`-label-avs-create-index). -- `full_text_search_index`: The name of the search index to use for full-text search operations. -Create a full_text_search_index in the Atlas web UI and specify the init params of -MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). -- `embedding_field`: The name of the field containing document embeddings. Default is "embedding". -- `content_field`: The name of the field containing the document content. Default is "content". -This field allows defining which field to load into the Haystack Document object as content. -It can be particularly useful when integrating with an existing collection for retrieval. We discourage -using this parameter when working with collections created by Haystack. - -**Raises**: - -- `ValueError`: If the collection name contains invalid characters. - - +**Parameters:** -#### MongoDBAtlasDocumentStore.\_\_del\_\_ - -```python -def __del__() -> None -``` +- **mongo_connection_string** (Secret) – MongoDB Atlas connection string in the format: + `"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. + This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. + This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
+- **database_name** (str) – Name of the database to use. +- **collection_name** (str) – Name of the collection to use. To use this document store for embedding retrieval, + this collection needs to have a vector search index set up on the `embedding` field. +- **vector_search_index** (str) – The name of the vector search index to use for vector search operations. + Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB + Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/#std-label-avs-create-index). +- **full_text_search_index** (str) – The name of the search index to use for full-text search operations. + Create a full_text_search_index in the Atlas web UI and specify the init params of + MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). +- **embedding_field** (str) – The name of the field containing document embeddings. Default is "embedding". +- **content_field** (str) – The name of the field containing the document content. Default is "content". + This field allows defining which field to load into the Haystack Document object as content. + It can be particularly useful when integrating with an existing collection for retrieval. We discourage + using this parameter when working with collections created by Haystack. -Destructor method to close MongoDB connections when the instance is destroyed. +**Raises:** - +- ValueError – If the collection name contains invalid characters. -#### MongoDBAtlasDocumentStore.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. 
- - -#### MongoDBAtlasDocumentStore.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasDocumentStore" +from_dict(data: dict[str, Any]) -> MongoDBAtlasDocumentStore ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasDocumentStore – Deserialized component. -#### MongoDBAtlasDocumentStore.count\_documents +#### count_documents ```python -def count_documents() -> int +count_documents() -> int ``` Returns how many documents are present in the document store. -**Returns**: +**Returns:** -The number of documents in the document store. +- int – The number of documents in the document store. - - -#### MongoDBAtlasDocumentStore.count\_documents\_async +#### count_documents_async ```python -async def count_documents_async() -> int +count_documents_async() -> int ``` Asynchronously returns how many documents are present in the document store. -**Returns**: - -The number of documents in the document store. +**Returns:** - +- int – The number of documents in the document store. -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter +#### count_documents_by_filter ```python -def count_documents_by_filter(filters: dict[str, Any]) -> int +count_documents_by_filter(filters: dict[str, Any]) -> int ``` Applies a filter and counts the documents that matched it. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to the document list. +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -**Returns**: +**Returns:** -The number of documents that match the filter. +- int – The number of documents that match the filter. 
- - -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter\_async +#### count_documents_by_filter_async ```python -async def count_documents_by_filter_async(filters: dict[str, Any]) -> int +count_documents_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously applies a filter and counts the documents that matched it. -**Arguments**: - -- `filters`: The filters to apply to the document list. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -The number of documents that match the filter. +**Returns:** - +- int – The number of documents that match the filter. -#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter +#### count_unique_metadata_by_filter ```python -def count_unique_metadata_by_filter( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Applies a filter selecting documents and counts the unique values for each meta field of the matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. 
-#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter\_async +#### count_unique_metadata_by_filter_async ```python -async def count_unique_metadata_by_filter_async( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter_async( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Asynchronously applies a filter selecting documents and counts the unique values for each meta field of the - matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info +#### get_metadata_fields_info ```python -def get_metadata_fields_info() -> dict[str, dict] +get_metadata_fields_info() -> dict[str, dict] ``` Returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: +**Returns:** -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key.
- - -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info\_async +#### get_metadata_fields_info_async ```python -async def get_metadata_fields_info_async() -> dict[str, dict] +get_metadata_fields_info_async() -> dict[str, dict] ``` Asynchronously returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: - -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +**Returns:** - +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max +#### get_metadata_field_min_max ```python -def get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] ``` For a given metadata field, find its max and min value. -**Arguments**: +**Parameters:** -- `metadata_field`: The metadata field to get the min and max values for. +- **metadata_field** (str) – The metadata field to get the min and max values for. -**Returns**: +**Returns:** -A dictionary with 'min' and 'max' keys. +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. - - -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max\_async +#### get_metadata_field_min_max_async ```python -async def get_metadata_field_min_max_async( - metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any] ``` -Asynchronously for a given metadata field, find its max and min value. +Asynchronously finds the min and max values of a given metadata field. -**Arguments**: - -- `metadata_field`: The metadata field to get the min and max values for. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to get the min and max values for. -A dictionary with 'min' and 'max' keys. +**Returns:** - +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys.
-#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values +#### get_metadata_field_unique_values ```python -def get_metadata_field_unique_values(metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Retrieves unique values for a field matching a search_term or all possible values if no search term is given. -**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values\_async +#### get_metadata_field_unique_values_async ```python -async def get_metadata_field_unique_values_async( - metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values_async( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Asynchronously retrieves unique values for a field matching a search_term or all possible values if no search - term is given. 
-**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.filter\_documents +#### filter_documents ```python -def filter_documents(filters: dict[str, Any] | None = None) -> list[Document] +filter_documents(filters: dict[str, Any] | None = None) -> list[Document] ``` Returns the documents that match the filters provided. @@ -700,21 +639,18 @@ Returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: +**Parameters:** -- `filters`: The filters to apply. It returns only the documents that match the filters. +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -**Returns**: +**Returns:** -A list of Documents that match the given filters. +- list\[Document\] – A list of Documents that match the given filters. 
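Because `get_metadata_field_unique_values` returns one page of values plus the total count, collecting every value means advancing `from_` until that total is reached. The sketch below shows one way to do this. It assumes only the documented `(values, total)` return shape; `iter_unique_values` and `fake_fetch` are hypothetical helpers, and `fake_fetch` is an in-memory stand-in for the real store call, not part of the integration.

```python
from typing import Callable, Iterator, Optional

# Hypothetical callable type with the same shape as
# MongoDBAtlasDocumentStore.get_metadata_field_unique_values:
# (metadata_field, search_term, from_, size) -> (page_of_values, total_count)
FetchPage = Callable[[str, Optional[str], int, int], tuple[list[str], int]]


def iter_unique_values(
    fetch: FetchPage,
    metadata_field: str,
    search_term: Optional[str] = None,
    page_size: int = 10,
) -> Iterator[str]:
    """Yield every unique value by advancing `from_` one page at a time."""
    offset = 0
    while True:
        values, total = fetch(metadata_field, search_term, offset, page_size)
        yield from values
        offset += len(values)
        # Stop when all `total` values were seen or the store returns an empty page.
        if not values or offset >= total:
            break


def fake_fetch(field: str, term: Optional[str], from_: int, size: int) -> tuple[list[str], int]:
    """In-memory stand-in: case-insensitive substring match, then slice one page."""
    data = ["news", "science", "sports", "tech", "travel"]
    matched = [v for v in data if term is None or term.lower() in v.lower()]
    return matched[from_:from_ + size], len(matched)
```

Against a real store, the adapter would be something like `lambda f, t, o, s: store.get_metadata_field_unique_values(f, search_term=t, from_=o, size=s)`.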
- - -#### MongoDBAtlasDocumentStore.filter\_documents\_async +#### filter_documents_async ```python -async def filter_documents_async( - filters: dict[str, Any] | None = None) -> list[Document] +filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document] ``` Asynchronously returns the documents that match the filters provided. @@ -722,205 +658,184 @@ Asynchronously returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: - -- `filters`: The filters to apply. It returns only the documents that match the filters. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -A list of Documents that match the given filters. +**Returns:** - +- list\[Document\] – A list of Documents that match the given filters. -#### MongoDBAtlasDocumentStore.write\_documents +#### write_documents ```python -def write_documents(documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: +**Parameters:** -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -**Raises**: +**Returns:** -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +- int – The number of documents written to the document store. 
-**Returns**: +**Raises:** -The number of documents written to the document store. +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. - - -#### MongoDBAtlasDocumentStore.write\_documents\_async +#### write_documents_async ```python -async def write_documents_async( - documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents_async( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: - -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +**Parameters:** -**Raises**: +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +**Returns:** -**Returns**: +- int – The number of documents written to the document store. -The number of documents written to the document store. +**Raises:** - +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. -#### MongoDBAtlasDocumentStore.delete\_documents +#### delete_documents ```python -def delete_documents(document_ids: list[str]) -> None +delete_documents(document_ids: list[str]) -> None ``` Deletes all documents with a matching document_ids from the document store. 
-**Arguments**: - -- `document_ids`: the document ids to delete +**Parameters:** - +- **document_ids** (list\[str\]) – The IDs of the documents to delete. -#### MongoDBAtlasDocumentStore.delete\_documents\_async +#### delete_documents_async ```python -async def delete_documents_async(document_ids: list[str]) -> None +delete_documents_async(document_ids: list[str]) -> None ``` Asynchronously deletes all documents with a matching document_ids from the document store. -**Arguments**: +**Parameters:** -- `document_ids`: the document ids to delete +- **document_ids** (list\[str\]) – The IDs of the documents to delete. - - -#### MongoDBAtlasDocumentStore.delete\_by\_filter +#### delete_by_filter ```python -def delete_by_filter(filters: dict[str, Any]) -> int +delete_by_filter(filters: dict[str, Any]) -> int ``` Deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.delete\_by\_filter\_async +#### delete_by_filter_async ```python -async def delete_by_filter_async(filters: dict[str, Any]) -> int +delete_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion.
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.update\_by\_filter +#### update_by_filter ```python -def update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int +update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Updates the metadata of all documents that match the provided filters. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -**Returns**: +**Returns:** -The number of documents updated. +- int – The number of documents updated. - - -#### MongoDBAtlasDocumentStore.update\_by\_filter\_async +#### update_by_filter_async ```python -async def update_by_filter_async(filters: dict[str, Any], - meta: dict[str, Any]) -> int +update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Asynchronously updates the metadata of all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -The number of documents updated. +**Returns:** - +- int – The number of documents updated. -#### MongoDBAtlasDocumentStore.delete\_all\_documents +#### delete_all_documents ```python -def delete_all_documents(*, recreate_collection: bool = False) -> None +delete_all_documents(*, recreate_collection: bool = False) -> None ``` Deletes all documents in the document store. -**Arguments**: - -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +**Parameters:** - +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. + Recreating the collection is faster for very large collections. -#### MongoDBAtlasDocumentStore.delete\_all\_documents\_async +#### delete_all_documents_async ```python -async def delete_all_documents_async(*, - recreate_collection: bool = False - ) -> None +delete_all_documents_async(*, recreate_collection: bool = False) -> None ``` Asynchronously deletes all documents in the document store. -**Arguments**: +**Parameters:** -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. 
+ Recreating the collection is faster for very large collections. +## haystack_integrations.document_stores.mongodb_atlas.filters diff --git a/docs-website/reference_versioned_docs/version-2.19/integrations-api/mongodb_atlas.md b/docs-website/reference_versioned_docs/version-2.19/integrations-api/mongodb_atlas.md index 552ebfacce..3aa6e43140 100644 --- a/docs-website/reference_versioned_docs/version-2.19/integrations-api/mongodb_atlas.md +++ b/docs-website/reference_versioned_docs/version-2.19/integrations-api/mongodb_atlas.md @@ -5,11 +5,8 @@ description: "MongoDB Atlas integration for Haystack" slug: "/integrations-mongodb-atlas" --- - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.embedding\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever ### MongoDBAtlasEmbeddingRetriever @@ -20,6 +17,7 @@ during the creation of the index (i.e. cosine, dot product, or euclidean). See M information. Usage example: + ```python import numpy as np from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -39,125 +37,113 @@ The example above retrieves the 10 most similar documents to a random query embe MongoDBAtlasDocumentStore. Note that dimensions of the query_embedding must match the dimensions of the embeddings stored in the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasEmbeddingRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` Create the MongoDBAtlasDocumentStore component. -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. 
-- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `vector_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `vector_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. +**Raises:** - +- ValueError – If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. -#### MongoDBAtlasEmbeddingRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. - - -#### MongoDBAtlasEmbeddingRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasEmbeddingRetriever ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasEmbeddingRetriever – Deserialized component. 
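The `filter_policy` parameter decides how filters passed to `run` interact with the `filters` given at initialization. The following is a simplified sketch of the two documented options. `FilterPolicySketch` and `resolve_filters` are hypothetical names, and Haystack's own filter-policy handling covers more cases, such as flattening filters that already use the same logical operator. Under `REPLACE`, runtime filters win when present. Under `MERGE`, both sets are combined with a logical `AND`.

```python
from enum import Enum
from typing import Any, Optional

Filters = Optional[dict[str, Any]]


class FilterPolicySketch(Enum):
    # Hypothetical stand-in that mirrors the two documented policy names.
    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(policy: FilterPolicySketch, init_filters: Filters, runtime_filters: Filters) -> Filters:
    """Simplified illustration of how init-time and runtime filters are combined."""
    if policy is FilterPolicySketch.MERGE and init_filters and runtime_filters:
        # MERGE: both filter sets must hold (simplified logical AND).
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE (or MERGE with one side missing): prefer runtime filters, else fall back.
    return runtime_filters or init_filters
```

With `REPLACE`, calling `run` without filters still applies the filters from initialization. Only an explicit runtime filter overrides them.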
-#### MongoDBAtlasEmbeddingRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -#### MongoDBAtlasEmbeddingRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run_async( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding - similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.full\_text\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever ### MongoDBAtlasFullTextRetriever @@ -167,6 +153,7 @@ The full-text search is dependent on the full_text_search_index used in the Mong See MongoDBAtlasDocumentStore for more information. Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasFullTextRetriever @@ -184,155 +171,144 @@ print(results["documents"]) The example above retrieves the 10 most similar documents to the query "Lorem ipsum" from the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasFullTextRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. -- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `full_text_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. 
+- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `full_text_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of MongoDBAtlasDocumentStore. +**Raises:** - +- ValueError – If `document_store` is not an instance of MongoDBAtlasDocumentStore. -#### MongoDBAtlasFullTextRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: - -Dictionary with serialized data. +**Returns:** - +- dict\[str, Any\] – Dictionary with serialized data. -#### MongoDBAtlasFullTextRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasFullTextRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasFullTextRetriever ``` Deserializes the component from a dictionary. -**Arguments**: +**Parameters:** -- `data`: Dictionary to deserialize from. +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -**Returns**: +**Returns:** -Deserialized component. +- MongoDBAtlasFullTextRetriever – Deserialized component. 
- - -#### MongoDBAtlasFullTextRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. 
The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note that `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note that `synonyms` cannot be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization.
+ +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -#### MongoDBAtlasFullTextRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run_async( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). 
-- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note that `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note that `synonyms` cannot be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization.
See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. + +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -## Module haystack\_integrations.document\_stores.mongodb\_atlas.document\_store - - +## haystack_integrations.document_stores.mongodb_atlas.document_store ### MongoDBAtlasDocumentStore @@ -360,6 +336,7 @@ For more details on MongoDB Atlas, see the official MongoDB Atlas [documentation](https://www.mongodb.com/docs/atlas/getting-started/). Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -370,329 +347,291 @@ store = MongoDBAtlasDocumentStore(database_name="your_existing_db", print(store.count_documents()) ``` - - -#### MongoDBAtlasDocumentStore.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - mongo_connection_string: Secret = Secret.from_env_var( - "MONGO_CONNECTION_STRING"), - database_name: str, - collection_name: str, - vector_search_index: str, - full_text_search_index: str, - embedding_field: str = "embedding", - content_field: str = "content") +__init__( + *, + mongo_connection_string: Secret = Secret.from_env_var( + "MONGO_CONNECTION_STRING" + ), + database_name: str, + collection_name: str, + vector_search_index: str, + full_text_search_index: str, + embedding_field: str = "embedding", + content_field: str = "content" +) ``` Creates a new MongoDBAtlasDocumentStore instance. -**Arguments**: - -- `mongo_connection_string`: MongoDB Atlas connection string in the format: -`"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. -This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. -This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
-- `database_name`: Name of the database to use. -- `collection_name`: Name of the collection to use. To use this document store for embedding retrieval, -this collection needs to have a vector search index set up on the `embedding` field. -- `vector_search_index`: The name of the vector search index to use for vector search operations. -Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB -Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/`std`-label-avs-create-index). -- `full_text_search_index`: The name of the search index to use for full-text search operations. -Create a full_text_search_index in the Atlas web UI and specify the init params of -MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). -- `embedding_field`: The name of the field containing document embeddings. Default is "embedding". -- `content_field`: The name of the field containing the document content. Default is "content". -This field allows defining which field to load into the Haystack Document object as content. -It can be particularly useful when integrating with an existing collection for retrieval. We discourage -using this parameter when working with collections created by Haystack. - -**Raises**: - -- `ValueError`: If the collection name contains invalid characters. - - +**Parameters:** -#### MongoDBAtlasDocumentStore.\_\_del\_\_ - -```python -def __del__() -> None -``` +- **mongo_connection_string** (Secret) – MongoDB Atlas connection string in the format: + `"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. + This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. + This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
+- **database_name** (str) – Name of the database to use. +- **collection_name** (str) – Name of the collection to use. To use this document store for embedding retrieval, + this collection needs to have a vector search index set up on the `embedding` field. +- **vector_search_index** (str) – The name of the vector search index to use for vector search operations. + Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB + Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/#std-label-avs-create-index). +- **full_text_search_index** (str) – The name of the search index to use for full-text search operations. + Create a full_text_search_index in the Atlas web UI and specify the init params of + MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). +- **embedding_field** (str) – The name of the field containing document embeddings. Default is "embedding". +- **content_field** (str) – The name of the field containing the document content. Default is "content". + This field allows defining which field to load into the Haystack Document object as content. + It can be particularly useful when integrating with an existing collection for retrieval. We discourage + using this parameter when working with collections created by Haystack. -Destructor method to close MongoDB connections when the instance is destroyed. +**Raises:** - +- ValueError – If the collection name contains invalid characters. -#### MongoDBAtlasDocumentStore.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. 
- - -#### MongoDBAtlasDocumentStore.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasDocumentStore" +from_dict(data: dict[str, Any]) -> MongoDBAtlasDocumentStore ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasDocumentStore – Deserialized component. -#### MongoDBAtlasDocumentStore.count\_documents +#### count_documents ```python -def count_documents() -> int +count_documents() -> int ``` Returns how many documents are present in the document store. -**Returns**: +**Returns:** -The number of documents in the document store. +- int – The number of documents in the document store. - - -#### MongoDBAtlasDocumentStore.count\_documents\_async +#### count_documents_async ```python -async def count_documents_async() -> int +count_documents_async() -> int ``` Asynchronously returns how many documents are present in the document store. -**Returns**: - -The number of documents in the document store. +**Returns:** - +- int – The number of documents in the document store. -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter +#### count_documents_by_filter ```python -def count_documents_by_filter(filters: dict[str, Any]) -> int +count_documents_by_filter(filters: dict[str, Any]) -> int ``` Applies a filter and counts the documents that matched it. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to the document list. +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -**Returns**: +**Returns:** -The number of documents that match the filter. +- int – The number of documents that match the filter. 
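The `filters` argument of `count_documents_by_filter` (and of the other `*_by_filter` methods) uses Haystack's metadata-filtering syntax. The sketch below is a hypothetical, in-memory illustration of what "counts the documents that matched" means. It evaluates a small subset of that syntax (comparison operators plus `AND`/`OR`/`NOT`) against plain Python dicts. The helpers `matches` and `count_by_filter` are invented for this example. The real document store translates filters into a MongoDB query instead of evaluating them in Python.

```python
from typing import Any, Callable

# Subset of Haystack comparison operators supported by this sketch.
COMPARISONS: dict[str, Callable[[Any, Any], bool]] = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a is not None and a > b,
    ">=": lambda a, b: a is not None and a >= b,
    "<": lambda a, b: a is not None and a < b,
    "<=": lambda a, b: a is not None and a <= b,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def resolve(doc: dict[str, Any], field: str) -> Any:
    """Look up a dotted field such as "meta.year"; missing fields resolve to None."""
    value: Any = doc
    for part in field.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def matches(doc: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Hypothetical in-memory evaluation of a Haystack-style filter dict."""
    op = filters["operator"]
    if "conditions" in filters:  # logical node: AND / OR / NOT
        results = [matches(doc, cond) for cond in filters["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        if op == "NOT":
            return not all(results)
        raise ValueError(f"Unknown logical operator: {op}")
    # comparison node: {"field": ..., "operator": ..., "value": ...}
    return COMPARISONS[op](resolve(doc, filters["field"]), filters["value"])


def count_by_filter(docs: list[dict[str, Any]], filters: dict[str, Any]) -> int:
    return sum(1 for doc in docs if matches(doc, filters))


docs = [
    {"content": "a", "meta": {"type": "article", "year": 2021}},
    {"content": "b", "meta": {"type": "article", "year": 2019}},
    {"content": "c", "meta": {"type": "blog", "year": 2022}},
]
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
print(count_by_filter(docs, filters))  # → 1
```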
- - -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter\_async +#### count_documents_by_filter_async ```python -async def count_documents_by_filter_async(filters: dict[str, Any]) -> int +count_documents_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously applies a filter and counts the documents that matched it. -**Arguments**: - -- `filters`: The filters to apply to the document list. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -The number of documents that match the filter. +**Returns:** - +- int – The number of documents that match the filter. -#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter +#### count_unique_metadata_by_filter ```python -def count_unique_metadata_by_filter( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Applies a filter selecting documents and counts the unique values for each meta field of the matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. 
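`count_unique_metadata_by_filter` returns, for each requested metadata field, how many distinct values occur among the matched documents. The hypothetical sketch below reproduces only that return shape over plain dicts. To keep it short, it takes a Python predicate in place of a Haystack filter dict. It also assumes that documents lacking a field contribute no value, because the docstring does not specify how the real store handles missing fields.

```python
from typing import Any, Callable


def count_unique_metadata(
    docs: list[dict[str, Any]],
    predicate: Callable[[dict[str, Any]], bool],
    metadata_fields: list[str],
) -> dict[str, int]:
    """Count distinct values per metadata field among documents matching `predicate`."""
    seen: dict[str, set[Any]] = {field: set() for field in metadata_fields}
    for doc in docs:
        if not predicate(doc):
            continue
        meta = doc.get("meta", {})
        for field in metadata_fields:
            if field in meta:  # assumption: missing fields are skipped
                seen[field].add(meta[field])  # values must be hashable here
    return {field: len(values) for field, values in seen.items()}


def only_en_de(doc: dict[str, Any]) -> bool:
    return doc["meta"]["lang"] in ("en", "de")


docs = [
    {"meta": {"lang": "en", "author": "ann"}},
    {"meta": {"lang": "en", "author": "bob"}},
    {"meta": {"lang": "de", "author": "ann"}},
    {"meta": {"lang": "fr"}},
]
print(count_unique_metadata(docs, only_en_de, ["lang", "author"]))
# → {'lang': 2, 'author': 2}
```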
-#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter\_async +#### count_unique_metadata_by_filter_async ```python -async def count_unique_metadata_by_filter_async( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter_async( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Asynchronously applies a filter selecting documents and counts the unique values for each meta field of the - matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info +#### get_metadata_fields_info ```python -def get_metadata_fields_info() -> dict[str, dict] +get_metadata_fields_info() -> dict[str, dict] ``` Returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: +**Returns:** -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. 
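Because MongoDB is schemaless, `get_metadata_fields_info` infers field types by sampling the latest 50 documents. The sketch below is a hypothetical approximation of that idea over plain dicts. It assumes that the input list is already ordered newest first and that the first type observed for a field wins. It also reports Python type names, which may differ from the type labels the real store returns.

```python
from typing import Any

SAMPLE_SIZE = 50  # the docstring states that the latest 50 documents are sampled


def infer_metadata_fields(docs_newest_first: list[dict[str, Any]]) -> dict[str, dict]:
    """Infer {field: {"type": ...}} from a sample of the newest documents.

    Sketch assumptions: input is sorted newest first, the first type seen for
    a field wins, and types are reported as Python type names.
    """
    info: dict[str, dict] = {}
    for doc in docs_newest_first[:SAMPLE_SIZE]:
        for field, value in doc.get("meta", {}).items():
            info.setdefault(field, {"type": type(value).__name__})
    return info


docs = [
    {"meta": {"year": 2024, "title": "x"}},
    {"meta": {"year": 2023, "draft": True}},
]
print(infer_metadata_fields(docs))
# → {'year': {'type': 'int'}, 'title': {'type': 'str'}, 'draft': {'type': 'bool'}}
```

A consequence of sampling is that a field present only in older documents is not reported. The test below checks this behavior of the sketch.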
- - -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info\_async +#### get_metadata_fields_info_async ```python -async def get_metadata_fields_info_async() -> dict[str, dict] +get_metadata_fields_info_async() -> dict[str, dict] ``` Asynchronously returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: - -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +**Returns:** - +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max +#### get_metadata_field_min_max ```python -def get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] ``` For a given metadata field, find its max and min value. -**Arguments**: +**Parameters:** -- `metadata_field`: The metadata field to get the min and max values for. +- **metadata_field** (str) – The metadata field to get the min and max values for. -**Returns**: +**Returns:** -A dictionary with 'min' and 'max' keys. +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. - - -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max\_async +#### get_metadata_field_min_max_async ```python -async def get_metadata_field_min_max_async( - metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any] ``` Asynchronously for a given metadata field, find its max and min value. -**Arguments**: - -- `metadata_field`: The metadata field to get the min and max values for. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to get the min and max values for. -A dictionary with 'min' and 'max' keys. +**Returns:** - +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. 
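`get_metadata_field_min_max` and its async variant return a dictionary with `min` and `max` keys. The following is a minimal in-memory sketch of that contract. It assumes, hypothetically, that documents without the field are ignored and that both keys are `None` when no document has the field. The docstring does not state the real behavior for these cases.

```python
from typing import Any


def metadata_field_min_max(docs: list[dict[str, Any]], metadata_field: str) -> dict[str, Any]:
    """Return {"min": ..., "max": ...} for one metadata field across `docs`."""
    values = [
        doc["meta"][metadata_field]
        for doc in docs
        if metadata_field in doc.get("meta", {})  # assumption: skip docs without the field
    ]
    if not values:
        return {"min": None, "max": None}  # assumption: result for an absent field
    return {"min": min(values), "max": max(values)}


docs = [{"meta": {"year": 2019}}, {"meta": {"year": 2024}}, {"meta": {"year": 2021}}, {"meta": {}}]
print(metadata_field_min_max(docs, "year"))  # → {'min': 2019, 'max': 2024}
```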
-#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values +#### get_metadata_field_unique_values ```python -def get_metadata_field_unique_values(metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Retrieves unique values for a field matching a search_term or all possible values if no search term is given. -**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values\_async +#### get_metadata_field_unique_values_async ```python -async def get_metadata_field_unique_values_async( - metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values_async( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Asynchronously retrieves unique values for a field matching a search_term or all possible values if no search - term is given. 
-**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.filter\_documents +#### filter_documents ```python -def filter_documents(filters: dict[str, Any] | None = None) -> list[Document] +filter_documents(filters: dict[str, Any] | None = None) -> list[Document] ``` Returns the documents that match the filters provided. @@ -700,21 +639,18 @@ Returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: +**Parameters:** -- `filters`: The filters to apply. It returns only the documents that match the filters. +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -**Returns**: +**Returns:** -A list of Documents that match the given filters. +- list\[Document\] – A list of Documents that match the given filters. 
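The parameters of `get_metadata_field_unique_values` and its async twin combine a case-insensitive substring match (`search_term`) with offset-based pagination (`from_`, `size`). The result pairs one page of values with the total number of matching unique values. The sketch below reproduces that contract over an in-memory list. It is a hypothetical illustration, and the sorted ordering is an assumption, because the docstring does not say how values are ordered.

```python
from __future__ import annotations


def unique_values_page(
    values: list[str],
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]:
    """Return (page_of_unique_values, total_matching_unique_values)."""
    unique = sorted(set(values))  # assumption: deterministic sorted order
    if search_term is not None:
        needle = search_term.lower()
        # case-insensitive substring match, as described in the docstring
        unique = [value for value in unique if needle in value.lower()]
    return unique[from_ : from_ + size], len(unique)


values = ["Berlin", "bern", "Paris", "Bergen", "berlin", "Berlin"]
print(unique_values_page(values, "ber", from_=1, size=2))  # → (['Berlin', 'berlin'], 4)
```

The total count is returned next to the page so that a caller can tell when to stop requesting further pages.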
- - -#### MongoDBAtlasDocumentStore.filter\_documents\_async +#### filter_documents_async ```python -async def filter_documents_async( - filters: dict[str, Any] | None = None) -> list[Document] +filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document] ``` Asynchronously returns the documents that match the filters provided. @@ -722,205 +658,184 @@ Asynchronously returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: - -- `filters`: The filters to apply. It returns only the documents that match the filters. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -A list of Documents that match the given filters. +**Returns:** - +- list\[Document\] – A list of Documents that match the given filters. -#### MongoDBAtlasDocumentStore.write\_documents +#### write_documents ```python -def write_documents(documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: +**Parameters:** -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -**Raises**: +**Returns:** -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +- int – The number of documents written to the document store. 
-**Returns**: +**Raises:** -The number of documents written to the document store. +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. - - -#### MongoDBAtlasDocumentStore.write\_documents\_async +#### write_documents_async ```python -async def write_documents_async( - documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents_async( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: - -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +**Parameters:** -**Raises**: +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +**Returns:** -**Returns**: +- int – The number of documents written to the document store. -The number of documents written to the document store. +**Raises:** - +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. -#### MongoDBAtlasDocumentStore.delete\_documents +#### delete_documents ```python -def delete_documents(document_ids: list[str]) -> None +delete_documents(document_ids: list[str]) -> None ``` Deletes all documents with a matching document_ids from the document store. 
-**Arguments**: - -- `document_ids`: the document ids to delete +**Parameters:** - +- **document_ids** (list\[str\]) – the document ids to delete -#### MongoDBAtlasDocumentStore.delete\_documents\_async +#### delete_documents_async ```python -async def delete_documents_async(document_ids: list[str]) -> None +delete_documents_async(document_ids: list[str]) -> None ``` Asynchronously deletes all documents with a matching document_ids from the document store. -**Arguments**: +**Parameters:** -- `document_ids`: the document ids to delete +- **document_ids** (list\[str\]) – the document ids to delete - - -#### MongoDBAtlasDocumentStore.delete\_by\_filter +#### delete_by_filter ```python -def delete_by_filter(filters: dict[str, Any]) -> int +delete_by_filter(filters: dict[str, Any]) -> int ``` Deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.delete\_by\_filter\_async +#### delete_by_filter_async ```python -async def delete_by_filter_async(filters: dict[str, Any]) -> int +delete_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.update\_by\_filter +#### update_by_filter ```python -def update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int +update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Updates the metadata of all documents that match the provided filters. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -**Returns**: +**Returns:** -The number of documents updated. +- int – The number of documents updated. - - -#### MongoDBAtlasDocumentStore.update\_by\_filter\_async +#### update_by_filter_async ```python -async def update_by_filter_async(filters: dict[str, Any], - meta: dict[str, Any]) -> int +update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Asynchronously updates the metadata of all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -The number of documents updated. +**Returns:** - +- int – The number of documents updated. -#### MongoDBAtlasDocumentStore.delete\_all\_documents +#### delete_all_documents ```python -def delete_all_documents(*, recreate_collection: bool = False) -> None +delete_all_documents(*, recreate_collection: bool = False) -> None ``` Deletes all documents in the document store. -**Arguments**: - -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +**Parameters:** - +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. + Recreating the collection is faster for very large collections. -#### MongoDBAtlasDocumentStore.delete\_all\_documents\_async +#### delete_all_documents_async ```python -async def delete_all_documents_async(*, - recreate_collection: bool = False - ) -> None +delete_all_documents_async(*, recreate_collection: bool = False) -> None ``` Asynchronously deletes all documents in the document store. -**Arguments**: +**Parameters:** -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. 
+ Recreating the collection is faster for very large collections. +## haystack_integrations.document_stores.mongodb_atlas.filters diff --git a/docs-website/reference_versioned_docs/version-2.20/integrations-api/mongodb_atlas.md b/docs-website/reference_versioned_docs/version-2.20/integrations-api/mongodb_atlas.md index 552ebfacce..3aa6e43140 100644 --- a/docs-website/reference_versioned_docs/version-2.20/integrations-api/mongodb_atlas.md +++ b/docs-website/reference_versioned_docs/version-2.20/integrations-api/mongodb_atlas.md @@ -5,11 +5,8 @@ description: "MongoDB Atlas integration for Haystack" slug: "/integrations-mongodb-atlas" --- - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.embedding\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever ### MongoDBAtlasEmbeddingRetriever @@ -20,6 +17,7 @@ during the creation of the index (i.e. cosine, dot product, or euclidean). See M information. Usage example: + ```python import numpy as np from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -39,125 +37,113 @@ The example above retrieves the 10 most similar documents to a random query embe MongoDBAtlasDocumentStore. Note that dimensions of the query_embedding must match the dimensions of the embeddings stored in the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasEmbeddingRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` Create the MongoDBAtlasDocumentStore component. -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. 
-- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `vector_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `vector_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. +**Raises:** - +- ValueError – If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. -#### MongoDBAtlasEmbeddingRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. - - -#### MongoDBAtlasEmbeddingRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasEmbeddingRetriever ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasEmbeddingRetriever – Deserialized component. 
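The `filter_policy` parameter determines how filters passed to `run` interact with the filters given at initialization. The following is a hedged, simplified sketch. `Policy` is a stand-in for Haystack's `FilterPolicy`, and its values `"replace"` and `"merge"` are assumed. With REPLACE, runtime filters take the place of the init filters when they are provided. With MERGE, this sketch only wraps both filter trees in a top-level `AND`. Haystack's own merge logic covers more cases, so treat `resolve_filters` as a hypothetical helper, not the library's implementation.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class Policy(Enum):
    """Stand-in for Haystack's FilterPolicy (values assumed for this sketch)."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: Policy,
    init_filters: dict[str, Any] | None,
    runtime_filters: dict[str, Any] | None,
) -> dict[str, Any] | None:
    if policy is Policy.MERGE and init_filters and runtime_filters:
        # Simplified merge: a document must satisfy both filter trees.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # REPLACE, or only one side given: runtime filters win when present.
    return runtime_filters or init_filters


init = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime = {"field": "meta.year", "operator": ">=", "value": 2020}
print(resolve_filters(Policy.REPLACE, init, runtime) == runtime)  # → True
print(resolve_filters(Policy.MERGE, init, runtime)["operator"])  # → AND
```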
-#### MongoDBAtlasEmbeddingRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -#### MongoDBAtlasEmbeddingRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run_async( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding - similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.full\_text\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever ### MongoDBAtlasFullTextRetriever @@ -167,6 +153,7 @@ The full-text search is dependent on the full_text_search_index used in the Mong See MongoDBAtlasDocumentStore for more information. Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasFullTextRetriever @@ -184,155 +171,144 @@ print(results["documents"]) The example above retrieves the 10 most similar documents to the query "Lorem ipsum" from the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasFullTextRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. -- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `full_text_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. 
+- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `full_text_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of MongoDBAtlasDocumentStore. +**Raises:** - +- ValueError – If `document_store` is not an instance of MongoDBAtlasDocumentStore. -#### MongoDBAtlasFullTextRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: - -Dictionary with serialized data. +**Returns:** - +- dict\[str, Any\] – Dictionary with serialized data. -#### MongoDBAtlasFullTextRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasFullTextRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasFullTextRetriever ``` Deserializes the component from a dictionary. -**Arguments**: +**Parameters:** -- `data`: Dictionary to deserialize from. +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -**Returns**: +**Returns:** -Deserialized component. +- MongoDBAtlasFullTextRetriever – Deserialized component. 
- - -#### MongoDBAtlasFullTextRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. 
The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note that `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note that `synonyms` cannot be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization.
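The `fuzzy` and `synonyms` rules above can be made concrete with a small pre-flight check. This is a sketch under assumptions: `build_text_options` is a hypothetical helper that is not part of the integration, and the parameter descriptions do not say whether the retriever or Atlas itself rejects an invalid combination. The `maxEdits` and `prefixLength` keys are the Atlas fuzzy options named above. The dictionary values would be passed to `run` as its `fuzzy=` or `synonyms=` arguments.

```python
from __future__ import annotations

from typing import Any


def build_text_options(
    fuzzy: dict[str, int] | None = None,
    synonyms: str | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: collect the optional fuzzy/synonyms settings for one
    full-text query while enforcing the documented constraints."""
    if fuzzy is not None and synonyms is not None:
        # Documented rule: `fuzzy` and `synonyms` are mutually exclusive.
        raise ValueError("`fuzzy` cannot be combined with `synonyms`")
    if synonyms == "":
        # Documented rule: the synonym mapping name must not be an empty string.
        raise ValueError("`synonyms` must be a non-empty mapping name")
    options: dict[str, Any] = {}
    if fuzzy is not None:
        options["fuzzy"] = fuzzy
    if synonyms is not None:
        options["synonyms"] = synonyms
    return options


# Allow one edit per term, but require the first two characters to match exactly.
print(build_text_options(fuzzy={"maxEdits": 1, "prefixLength": 2}))
# "my_synonyms" is a hypothetical synonym mapping name defined in the search index.
print(build_text_options(synonyms="my_synonyms"))
```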
+ +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -#### MongoDBAtlasFullTextRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run_async( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). 
-- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note that `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note that `synonyms` cannot be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization.
See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. + +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -## Module haystack\_integrations.document\_stores.mongodb\_atlas.document\_store - - +## haystack_integrations.document_stores.mongodb_atlas.document_store ### MongoDBAtlasDocumentStore @@ -360,6 +336,7 @@ For more details on MongoDB Atlas, see the official MongoDB Atlas [documentation](https://www.mongodb.com/docs/atlas/getting-started/). Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -370,329 +347,291 @@ store = MongoDBAtlasDocumentStore(database_name="your_existing_db", print(store.count_documents()) ``` - - -#### MongoDBAtlasDocumentStore.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - mongo_connection_string: Secret = Secret.from_env_var( - "MONGO_CONNECTION_STRING"), - database_name: str, - collection_name: str, - vector_search_index: str, - full_text_search_index: str, - embedding_field: str = "embedding", - content_field: str = "content") +__init__( + *, + mongo_connection_string: Secret = Secret.from_env_var( + "MONGO_CONNECTION_STRING" + ), + database_name: str, + collection_name: str, + vector_search_index: str, + full_text_search_index: str, + embedding_field: str = "embedding", + content_field: str = "content" +) ``` Creates a new MongoDBAtlasDocumentStore instance. -**Arguments**: - -- `mongo_connection_string`: MongoDB Atlas connection string in the format: -`"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. -This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. -This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
-- `database_name`: Name of the database to use. -- `collection_name`: Name of the collection to use. To use this document store for embedding retrieval, -this collection needs to have a vector search index set up on the `embedding` field. -- `vector_search_index`: The name of the vector search index to use for vector search operations. -Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB -Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/`std`-label-avs-create-index). -- `full_text_search_index`: The name of the search index to use for full-text search operations. -Create a full_text_search_index in the Atlas web UI and specify the init params of -MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). -- `embedding_field`: The name of the field containing document embeddings. Default is "embedding". -- `content_field`: The name of the field containing the document content. Default is "content". -This field allows defining which field to load into the Haystack Document object as content. -It can be particularly useful when integrating with an existing collection for retrieval. We discourage -using this parameter when working with collections created by Haystack. - -**Raises**: - -- `ValueError`: If the collection name contains invalid characters. - - +**Parameters:** -#### MongoDBAtlasDocumentStore.\_\_del\_\_ - -```python -def __del__() -> None -``` +- **mongo_connection_string** (Secret) – MongoDB Atlas connection string in the format: + `"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. + This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. + This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
+- **database_name** (str) – Name of the database to use. +- **collection_name** (str) – Name of the collection to use. To use this document store for embedding retrieval, + this collection needs to have a vector search index set up on the `embedding` field. +- **vector_search_index** (str) – The name of the vector search index to use for vector search operations. + Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB + Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/#std-label-avs-create-index). +- **full_text_search_index** (str) – The name of the search index to use for full-text search operations. + Create a full_text_search_index in the Atlas web UI and specify the init params of + MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). +- **embedding_field** (str) – The name of the field containing document embeddings. Default is "embedding". +- **content_field** (str) – The name of the field containing the document content. Default is "content". + This field allows defining which field to load into the Haystack Document object as content. + It can be particularly useful when integrating with an existing collection for retrieval. We discourage + using this parameter when working with collections created by Haystack. -Destructor method to close MongoDB connections when the instance is destroyed. +**Raises:** - +- ValueError – If the collection name contains invalid characters. -#### MongoDBAtlasDocumentStore.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. 
- - -#### MongoDBAtlasDocumentStore.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasDocumentStore" +from_dict(data: dict[str, Any]) -> MongoDBAtlasDocumentStore ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasDocumentStore – Deserialized component. -#### MongoDBAtlasDocumentStore.count\_documents +#### count_documents ```python -def count_documents() -> int +count_documents() -> int ``` Returns how many documents are present in the document store. -**Returns**: +**Returns:** -The number of documents in the document store. +- int – The number of documents in the document store. - - -#### MongoDBAtlasDocumentStore.count\_documents\_async +#### count_documents_async ```python -async def count_documents_async() -> int +count_documents_async() -> int ``` Asynchronously returns how many documents are present in the document store. -**Returns**: - -The number of documents in the document store. +**Returns:** - +- int – The number of documents in the document store. -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter +#### count_documents_by_filter ```python -def count_documents_by_filter(filters: dict[str, Any]) -> int +count_documents_by_filter(filters: dict[str, Any]) -> int ``` Applies a filter and counts the documents that matched it. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to the document list. +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -**Returns**: +**Returns:** -The number of documents that match the filter. +- int – The number of documents that match the filter. 
- - -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter\_async +#### count_documents_by_filter_async ```python -async def count_documents_by_filter_async(filters: dict[str, Any]) -> int +count_documents_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously applies a filter and counts the documents that matched it. -**Arguments**: - -- `filters`: The filters to apply to the document list. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -The number of documents that match the filter. +**Returns:** - +- int – The number of documents that match the filter. -#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter +#### count_unique_metadata_by_filter ```python -def count_unique_metadata_by_filter( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Applies a filter selecting documents and counts the unique values for each meta field of the matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. 
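To make the return shape of `count_unique_metadata_by_filter` concrete, the sketch below computes the same kind of result in memory. It is an illustration under assumptions, not the integration's code. `count_unique_metadata` is a hypothetical helper, and its input stands in for the `meta` dictionaries of documents that already matched `filters`. Skipping documents that lack a field is also an assumed behavior. The document store itself performs the count against the Atlas collection.

```python
from __future__ import annotations

from typing import Any


def count_unique_metadata(metas: list[dict[str, Any]], metadata_fields: list[str]) -> dict[str, int]:
    """Hypothetical helper: count distinct values per metadata field.

    `metas` stands in for the `meta` dicts of documents that already matched the filters.
    """
    counts: dict[str, int] = {}
    for field in metadata_fields:
        # Assumption: documents that lack the field contribute no value.
        distinct = {m[field] for m in metas if field in m}
        counts[field] = len(distinct)
    return counts


metas = [
    {"category": "news", "lang": "en"},
    {"category": "news", "lang": "de"},
    {"category": "blog", "lang": "en"},
    {"category": "blog"},
]
print(count_unique_metadata(metas, ["category", "lang"]))  # {'category': 2, 'lang': 2}
```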
-#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter\_async +#### count_unique_metadata_by_filter_async ```python -async def count_unique_metadata_by_filter_async( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter_async( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Asynchronously applies a filter selecting documents and counts the unique values for each meta field of the - matched documents. +matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info +#### get_metadata_fields_info ```python -def get_metadata_fields_info() -> dict[str, dict] +get_metadata_fields_info() -> dict[str, dict] ``` Returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: +**Returns:** -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key.
- - -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info\_async +#### get_metadata_fields_info_async ```python -async def get_metadata_fields_info_async() -> dict[str, dict] +get_metadata_fields_info_async() -> dict[str, dict] ``` Asynchronously returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: - -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +**Returns:** - +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max +#### get_metadata_field_min_max ```python -def get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] ``` For a given metadata field, find its max and min value. -**Arguments**: +**Parameters:** -- `metadata_field`: The metadata field to get the min and max values for. +- **metadata_field** (str) – The metadata field to get the min and max values for. -**Returns**: +**Returns:** -A dictionary with 'min' and 'max' keys. +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. - - -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max\_async +#### get_metadata_field_min_max_async ```python -async def get_metadata_field_min_max_async( - metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any] ``` Asynchronously for a given metadata field, find its max and min value. -**Arguments**: - -- `metadata_field`: The metadata field to get the min and max values for. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to get the min and max values for. -A dictionary with 'min' and 'max' keys. +**Returns:** - +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys.
-#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values +#### get_metadata_field_unique_values ```python -def get_metadata_field_unique_values(metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Retrieves unique values for a field matching a search_term or all possible values if no search term is given. -**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values\_async +#### get_metadata_field_unique_values_async ```python -async def get_metadata_field_unique_values_async( - metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values_async( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Asynchronously retrieves unique values for a field matching a search_term or all possible values if no search - term is given. +term is given.
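The `search_term`, `from_`, and `size` parameters of `get_metadata_field_unique_values` can be illustrated with an in-memory sketch of the documented contract. `unique_values_page` is a hypothetical helper. The alphabetical ordering is an assumption made only so that pages are stable, and the store itself queries the collection rather than a Python list. The second element of the returned tuple counts every matching value, not only those on the current page.

```python
from __future__ import annotations


def unique_values_page(
    values: list[str],
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]:
    """Hypothetical helper returning (page of unique values, total matching count)."""
    unique = sorted(set(values))  # assumption: stable alphabetical order for paging
    if search_term is not None:
        needle = search_term.lower()
        # Case-insensitive substring match, as the `search_term` description states.
        unique = [v for v in unique if needle in v.lower()]
    return unique[from_ : from_ + size], len(unique)


names = ["Alice", "Bob", "alan", "Malik", "Alice", "Carol"]
print(unique_values_page(names, search_term="al", from_=0, size=2))  # (['Alice', 'Malik'], 3)
print(unique_values_page(names, search_term="al", from_=2, size=2))  # (['alan'], 3)
```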
-**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.filter\_documents +#### filter_documents ```python -def filter_documents(filters: dict[str, Any] | None = None) -> list[Document] +filter_documents(filters: dict[str, Any] | None = None) -> list[Document] ``` Returns the documents that match the filters provided. @@ -700,21 +639,18 @@ Returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: +**Parameters:** -- `filters`: The filters to apply. It returns only the documents that match the filters. +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -**Returns**: +**Returns:** -A list of Documents that match the given filters. +- list\[Document\] – A list of Documents that match the given filters. 
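For reference, here is a filter in the Haystack filter syntax linked above, followed by a toy in-memory evaluator for a small subset of operators. The evaluator and the metadata field names (`type`, `year`) are illustrative assumptions. With a live store, you would pass the same dict as `filters=` to `filter_documents`, and the store would evaluate it against the Atlas collection instead of in Python.

```python
from __future__ import annotations

from typing import Any

# A filter in Haystack's nested "operator"/"conditions" syntax;
# metadata fields are addressed as "meta.<name>".
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.type", "operator": "==", "value": "article"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}


def matches(meta: dict[str, Any], node: dict[str, Any]) -> bool:
    """Toy evaluator for a tiny subset (AND, OR, ==, >=) of the filter syntax.

    Only looks at metadata; it is not how the document store evaluates filters.
    """
    op = node["operator"]
    if op in ("AND", "OR"):
        results = (matches(meta, child) for child in node["conditions"])
        return all(results) if op == "AND" else any(results)
    key = node["field"].removeprefix("meta.")
    if key not in meta:
        return False  # a missing field never matches in this sketch
    if op == "==":
        return meta[key] == node["value"]
    if op == ">=":
        return meta[key] >= node["value"]
    raise ValueError(f"operator not covered by this sketch: {op}")


docs = [
    {"type": "article", "year": 2021},
    {"type": "article", "year": 2018},
    {"type": "blog", "year": 2023},
]
print([d for d in docs if matches(d, filters)])  # [{'type': 'article', 'year': 2021}]
```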
- - -#### MongoDBAtlasDocumentStore.filter\_documents\_async +#### filter_documents_async ```python -async def filter_documents_async( - filters: dict[str, Any] | None = None) -> list[Document] +filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document] ``` Asynchronously returns the documents that match the filters provided. @@ -722,205 +658,184 @@ Asynchronously returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: - -- `filters`: The filters to apply. It returns only the documents that match the filters. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -A list of Documents that match the given filters. +**Returns:** - +- list\[Document\] – A list of Documents that match the given filters. -#### MongoDBAtlasDocumentStore.write\_documents +#### write_documents ```python -def write_documents(documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: +**Parameters:** -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -**Raises**: +**Returns:** -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +- int – The number of documents written to the document store. 
-**Returns**: +**Raises:** -The number of documents written to the document store. +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. - - -#### MongoDBAtlasDocumentStore.write\_documents\_async +#### write_documents_async ```python -async def write_documents_async( - documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents_async( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: - -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +**Parameters:** -**Raises**: +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +**Returns:** -**Returns**: +- int – The number of documents written to the document store. -The number of documents written to the document store. +**Raises:** - +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. -#### MongoDBAtlasDocumentStore.delete\_documents +#### delete_documents ```python -def delete_documents(document_ids: list[str]) -> None +delete_documents(document_ids: list[str]) -> None ``` Deletes all documents with a matching document_ids from the document store. 
-**Arguments**: - -- `document_ids`: the document ids to delete +**Parameters:** - +- **document_ids** (list\[str\]) – The IDs of the documents to delete. -#### MongoDBAtlasDocumentStore.delete\_documents\_async +#### delete_documents_async ```python -async def delete_documents_async(document_ids: list[str]) -> None +delete_documents_async(document_ids: list[str]) -> None ``` Asynchronously deletes all documents with a matching document_ids from the document store. -**Arguments**: +**Parameters:** -- `document_ids`: the document ids to delete +- **document_ids** (list\[str\]) – The IDs of the documents to delete. - - -#### MongoDBAtlasDocumentStore.delete\_by\_filter +#### delete_by_filter ```python -def delete_by_filter(filters: dict[str, Any]) -> int +delete_by_filter(filters: dict[str, Any]) -> int ``` Deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.delete\_by\_filter\_async +#### delete_by_filter_async ```python -async def delete_by_filter_async(filters: dict[str, Any]) -> int +delete_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion.
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.update\_by\_filter +#### update_by_filter ```python -def update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int +update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Updates the metadata of all documents that match the provided filters. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -**Returns**: +**Returns:** -The number of documents updated. +- int – The number of documents updated. - - -#### MongoDBAtlasDocumentStore.update\_by\_filter\_async +#### update_by_filter_async ```python -async def update_by_filter_async(filters: dict[str, Any], - meta: dict[str, Any]) -> int +update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Asynchronously updates the metadata of all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -The number of documents updated. +**Returns:** - +- int – The number of documents updated. -#### MongoDBAtlasDocumentStore.delete\_all\_documents +#### delete_all_documents ```python -def delete_all_documents(*, recreate_collection: bool = False) -> None +delete_all_documents(*, recreate_collection: bool = False) -> None ``` Deletes all documents in the document store. -**Arguments**: - -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +**Parameters:** - +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. + Recreating the collection is faster for very large collections. -#### MongoDBAtlasDocumentStore.delete\_all\_documents\_async +#### delete_all_documents_async ```python -async def delete_all_documents_async(*, - recreate_collection: bool = False - ) -> None +delete_all_documents_async(*, recreate_collection: bool = False) -> None ``` Asynchronously deletes all documents in the document store. -**Arguments**: +**Parameters:** -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. 
+ Recreating the collection is faster for very large collections. +## haystack_integrations.document_stores.mongodb_atlas.filters diff --git a/docs-website/reference_versioned_docs/version-2.21/integrations-api/mongodb_atlas.md b/docs-website/reference_versioned_docs/version-2.21/integrations-api/mongodb_atlas.md index 552ebfacce..3aa6e43140 100644 --- a/docs-website/reference_versioned_docs/version-2.21/integrations-api/mongodb_atlas.md +++ b/docs-website/reference_versioned_docs/version-2.21/integrations-api/mongodb_atlas.md @@ -5,11 +5,8 @@ description: "MongoDB Atlas integration for Haystack" slug: "/integrations-mongodb-atlas" --- - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.embedding\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever ### MongoDBAtlasEmbeddingRetriever @@ -20,6 +17,7 @@ during the creation of the index (i.e. cosine, dot product, or euclidean). See M information. Usage example: + ```python import numpy as np from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -39,125 +37,113 @@ The example above retrieves the 10 most similar documents to a random query embe MongoDBAtlasDocumentStore. Note that dimensions of the query_embedding must match the dimensions of the embeddings stored in the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasEmbeddingRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` Create the MongoDBAtlasDocumentStore component. -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. 
-- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `vector_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `vector_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. +**Raises:** - +- ValueError – If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. -#### MongoDBAtlasEmbeddingRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. - - -#### MongoDBAtlasEmbeddingRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasEmbeddingRetriever ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasEmbeddingRetriever – Deserialized component. 
-#### MongoDBAtlasEmbeddingRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -#### MongoDBAtlasEmbeddingRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run_async( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding - similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.full\_text\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever ### MongoDBAtlasFullTextRetriever @@ -167,6 +153,7 @@ The full-text search is dependent on the full_text_search_index used in the Mong See MongoDBAtlasDocumentStore for more information. Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasFullTextRetriever @@ -184,155 +171,144 @@ print(results["documents"]) The example above retrieves the 10 most similar documents to the query "Lorem ipsum" from the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasFullTextRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. -- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `full_text_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. 
+- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `full_text_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of MongoDBAtlasDocumentStore. +**Raises:** - +- ValueError – If `document_store` is not an instance of MongoDBAtlasDocumentStore. -#### MongoDBAtlasFullTextRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: - -Dictionary with serialized data. +**Returns:** - +- dict\[str, Any\] – Dictionary with serialized data. -#### MongoDBAtlasFullTextRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasFullTextRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasFullTextRetriever ``` Deserializes the component from a dictionary. -**Arguments**: +**Parameters:** -- `data`: Dictionary to deserialize from. +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -**Returns**: +**Returns:** -Deserialized component. +- MongoDBAtlasFullTextRetriever – Deserialized component. 
- - -#### MongoDBAtlasFullTextRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. 
The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` can not be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. 
+ +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -#### MongoDBAtlasFullTextRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run_async( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). 
-- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` can not be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. 
See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. + +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -## Module haystack\_integrations.document\_stores.mongodb\_atlas.document\_store - - +## haystack_integrations.document_stores.mongodb_atlas.document_store ### MongoDBAtlasDocumentStore @@ -360,6 +336,7 @@ For more details on MongoDB Atlas, see the official MongoDB Atlas [documentation](https://www.mongodb.com/docs/atlas/getting-started/). Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -370,329 +347,291 @@ store = MongoDBAtlasDocumentStore(database_name="your_existing_db", print(store.count_documents()) ``` - - -#### MongoDBAtlasDocumentStore.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - mongo_connection_string: Secret = Secret.from_env_var( - "MONGO_CONNECTION_STRING"), - database_name: str, - collection_name: str, - vector_search_index: str, - full_text_search_index: str, - embedding_field: str = "embedding", - content_field: str = "content") +__init__( + *, + mongo_connection_string: Secret = Secret.from_env_var( + "MONGO_CONNECTION_STRING" + ), + database_name: str, + collection_name: str, + vector_search_index: str, + full_text_search_index: str, + embedding_field: str = "embedding", + content_field: str = "content" +) ``` Creates a new MongoDBAtlasDocumentStore instance. -**Arguments**: - -- `mongo_connection_string`: MongoDB Atlas connection string in the format: -`"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. -This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. -This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
-- `database_name`: Name of the database to use. -- `collection_name`: Name of the collection to use. To use this document store for embedding retrieval, -this collection needs to have a vector search index set up on the `embedding` field. -- `vector_search_index`: The name of the vector search index to use for vector search operations. -Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB -Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/`std`-label-avs-create-index). -- `full_text_search_index`: The name of the search index to use for full-text search operations. -Create a full_text_search_index in the Atlas web UI and specify the init params of -MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). -- `embedding_field`: The name of the field containing document embeddings. Default is "embedding". -- `content_field`: The name of the field containing the document content. Default is "content". -This field allows defining which field to load into the Haystack Document object as content. -It can be particularly useful when integrating with an existing collection for retrieval. We discourage -using this parameter when working with collections created by Haystack. - -**Raises**: - -- `ValueError`: If the collection name contains invalid characters. - - +**Parameters:** -#### MongoDBAtlasDocumentStore.\_\_del\_\_ - -```python -def __del__() -> None -``` +- **mongo_connection_string** (Secret) – MongoDB Atlas connection string in the format: + `"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. + This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. + This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
+- **database_name** (str) – Name of the database to use. +- **collection_name** (str) – Name of the collection to use. To use this document store for embedding retrieval, + this collection needs to have a vector search index set up on the `embedding` field. +- **vector_search_index** (str) – The name of the vector search index to use for vector search operations. + Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB + Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/#std-label-avs-create-index). +- **full_text_search_index** (str) – The name of the search index to use for full-text search operations. + Create a full_text_search_index in the Atlas web UI and specify the init params of + MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). +- **embedding_field** (str) – The name of the field containing document embeddings. Default is "embedding". +- **content_field** (str) – The name of the field containing the document content. Default is "content". + This field allows defining which field to load into the Haystack Document object as content. + It can be particularly useful when integrating with an existing collection for retrieval. We discourage + using this parameter when working with collections created by Haystack. -Destructor method to close MongoDB connections when the instance is destroyed. +**Raises:** - +- ValueError – If the collection name contains invalid characters. -#### MongoDBAtlasDocumentStore.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. 
- - -#### MongoDBAtlasDocumentStore.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasDocumentStore" +from_dict(data: dict[str, Any]) -> MongoDBAtlasDocumentStore ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasDocumentStore – Deserialized component. -#### MongoDBAtlasDocumentStore.count\_documents +#### count_documents ```python -def count_documents() -> int +count_documents() -> int ``` Returns how many documents are present in the document store. -**Returns**: +**Returns:** -The number of documents in the document store. +- int – The number of documents in the document store. - - -#### MongoDBAtlasDocumentStore.count\_documents\_async +#### count_documents_async ```python -async def count_documents_async() -> int +count_documents_async() -> int ``` Asynchronously returns how many documents are present in the document store. -**Returns**: - -The number of documents in the document store. +**Returns:** - +- int – The number of documents in the document store. -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter +#### count_documents_by_filter ```python -def count_documents_by_filter(filters: dict[str, Any]) -> int +count_documents_by_filter(filters: dict[str, Any]) -> int ``` Applies a filter and counts the documents that matched it. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to the document list. +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -**Returns**: +**Returns:** -The number of documents that match the filter. +- int – The number of documents that match the filter. 
- - -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter\_async +#### count_documents_by_filter_async ```python -async def count_documents_by_filter_async(filters: dict[str, Any]) -> int +count_documents_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously applies a filter and counts the documents that matched it. -**Arguments**: - -- `filters`: The filters to apply to the document list. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -The number of documents that match the filter. +**Returns:** - +- int – The number of documents that match the filter. -#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter +#### count_unique_metadata_by_filter ```python -def count_unique_metadata_by_filter( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Applies a filter selecting documents and counts the unique values for each meta field of the matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. 
-#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter\_async +#### count_unique_metadata_by_filter_async ```python -async def count_unique_metadata_by_filter_async( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter_async( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Asynchronously applies a filter selecting documents and counts the unique values for each meta field of the - matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info +#### get_metadata_fields_info ```python -def get_metadata_fields_info() -> dict[str, dict] +get_metadata_fields_info() -> dict[str, dict] ``` Returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: +**Returns:** -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. 
- - -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info\_async +#### get_metadata_fields_info_async ```python -async def get_metadata_fields_info_async() -> dict[str, dict] +get_metadata_fields_info_async() -> dict[str, dict] ``` Asynchronously returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: - -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +**Returns:** - +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max +#### get_metadata_field_min_max ```python -def get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] ``` For a given metadata field, find its max and min value. -**Arguments**: +**Parameters:** -- `metadata_field`: The metadata field to get the min and max values for. +- **metadata_field** (str) – The metadata field to get the min and max values for. -**Returns**: +**Returns:** -A dictionary with 'min' and 'max' keys. +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. - - -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max\_async +#### get_metadata_field_min_max_async ```python -async def get_metadata_field_min_max_async( - metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any] ``` Asynchronously for a given metadata field, find its max and min value. -**Arguments**: - -- `metadata_field`: The metadata field to get the min and max values for. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to get the min and max values for. -A dictionary with 'min' and 'max' keys. +**Returns:** - +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. 
-#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values +#### get_metadata_field_unique_values ```python -def get_metadata_field_unique_values(metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Retrieves unique values for a field matching a search_term or all possible values if no search term is given. -**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values\_async +#### get_metadata_field_unique_values_async ```python -async def get_metadata_field_unique_values_async( - metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values_async( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Asynchronously retrieves unique values for a field matching a search_term or all possible values if no search - term is given. 
-**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.filter\_documents +#### filter_documents ```python -def filter_documents(filters: dict[str, Any] | None = None) -> list[Document] +filter_documents(filters: dict[str, Any] | None = None) -> list[Document] ``` Returns the documents that match the filters provided. @@ -700,21 +639,18 @@ Returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: +**Parameters:** -- `filters`: The filters to apply. It returns only the documents that match the filters. +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -**Returns**: +**Returns:** -A list of Documents that match the given filters. +- list\[Document\] – A list of Documents that match the given filters. 
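For `get_metadata_field_unique_values` (and its async variant) above, `search_term` is a case-insensitive substring match. `from_` and `size` page through the results, and the second tuple element is the total number of matches rather than the page length. A minimal in-memory sketch follows. `unique_values` is a hypothetical name, and sorting the values alphabetically is an assumption made so that pages are stable.

```python
from typing import Any, Optional


def unique_values(
    metas: list[dict[str, Any]],
    metadata_field: str,
    search_term: Optional[str] = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]:
    """Hypothetical in-memory analogue of get_metadata_field_unique_values."""
    # Distinct values as strings, sorted so that pages are stable (ordering is an assumption)
    distinct = sorted({str(m[metadata_field]) for m in metas if metadata_field in m})
    if search_term is not None:
        # Case-insensitive substring match, as described for `search_term`
        needle = search_term.lower()
        distinct = [v for v in distinct if needle in v.lower()]
    # The total counts all matches; the page is the slice [from_, from_ + size)
    return distinct[from_ : from_ + size], len(distinct)


authors = [{"author": a} for a in ["Ann", "bob", "Annie", "Ann", "carl"]]
print(unique_values(authors, "author", search_term="AN"))
```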
- - -#### MongoDBAtlasDocumentStore.filter\_documents\_async +#### filter_documents_async ```python -async def filter_documents_async( - filters: dict[str, Any] | None = None) -> list[Document] +filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document] ``` Asynchronously returns the documents that match the filters provided. @@ -722,205 +658,184 @@ Asynchronously returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: - -- `filters`: The filters to apply. It returns only the documents that match the filters. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -A list of Documents that match the given filters. +**Returns:** - +- list\[Document\] – A list of Documents that match the given filters. -#### MongoDBAtlasDocumentStore.write\_documents +#### write_documents ```python -def write_documents(documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: +**Parameters:** -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -**Raises**: +**Returns:** -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +- int – The number of documents written to the document store. 
-**Returns**: +**Raises:** -The number of documents written to the document store. +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. - - -#### MongoDBAtlasDocumentStore.write\_documents\_async +#### write_documents_async ```python -async def write_documents_async( - documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents_async( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: - -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +**Parameters:** -**Raises**: +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +**Returns:** -**Returns**: +- int – The number of documents written to the document store. -The number of documents written to the document store. +**Raises:** - +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. -#### MongoDBAtlasDocumentStore.delete\_documents +#### delete_documents ```python -def delete_documents(document_ids: list[str]) -> None +delete_documents(document_ids: list[str]) -> None ``` Deletes all documents with a matching document_ids from the document store. 
-**Arguments**: - -- `document_ids`: the document ids to delete +**Parameters:** - +- **document_ids** (list\[str\]) – the document ids to delete -#### MongoDBAtlasDocumentStore.delete\_documents\_async +#### delete_documents_async ```python -async def delete_documents_async(document_ids: list[str]) -> None +delete_documents_async(document_ids: list[str]) -> None ``` Asynchronously deletes all documents with a matching document_ids from the document store. -**Arguments**: +**Parameters:** -- `document_ids`: the document ids to delete +- **document_ids** (list\[str\]) – the document ids to delete - - -#### MongoDBAtlasDocumentStore.delete\_by\_filter +#### delete_by_filter ```python -def delete_by_filter(filters: dict[str, Any]) -> int +delete_by_filter(filters: dict[str, Any]) -> int ``` Deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.delete\_by\_filter\_async +#### delete_by_filter_async ```python -async def delete_by_filter_async(filters: dict[str, Any]) -> int +delete_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.update\_by\_filter +#### update_by_filter ```python -def update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int +update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Updates the metadata of all documents that match the provided filters. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -**Returns**: +**Returns:** -The number of documents updated. +- int – The number of documents updated. - - -#### MongoDBAtlasDocumentStore.update\_by\_filter\_async +#### update_by_filter_async ```python -async def update_by_filter_async(filters: dict[str, Any], - meta: dict[str, Any]) -> int +update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Asynchronously updates the metadata of all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -The number of documents updated. +**Returns:** - +- int – The number of documents updated. -#### MongoDBAtlasDocumentStore.delete\_all\_documents +#### delete_all_documents ```python -def delete_all_documents(*, recreate_collection: bool = False) -> None +delete_all_documents(*, recreate_collection: bool = False) -> None ``` Deletes all documents in the document store. -**Arguments**: - -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +**Parameters:** - +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. + Recreating the collection is faster for very large collections. -#### MongoDBAtlasDocumentStore.delete\_all\_documents\_async +#### delete_all_documents_async ```python -async def delete_all_documents_async(*, - recreate_collection: bool = False - ) -> None +delete_all_documents_async(*, recreate_collection: bool = False) -> None ``` Asynchronously deletes all documents in the document store. -**Arguments**: +**Parameters:** -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. 
+ Recreating the collection is faster for very large collections. +## haystack_integrations.document_stores.mongodb_atlas.filters diff --git a/docs-website/reference_versioned_docs/version-2.22/integrations-api/mongodb_atlas.md b/docs-website/reference_versioned_docs/version-2.22/integrations-api/mongodb_atlas.md index 552ebfacce..3aa6e43140 100644 --- a/docs-website/reference_versioned_docs/version-2.22/integrations-api/mongodb_atlas.md +++ b/docs-website/reference_versioned_docs/version-2.22/integrations-api/mongodb_atlas.md @@ -5,11 +5,8 @@ description: "MongoDB Atlas integration for Haystack" slug: "/integrations-mongodb-atlas" --- - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.embedding\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever ### MongoDBAtlasEmbeddingRetriever @@ -20,6 +17,7 @@ during the creation of the index (i.e. cosine, dot product, or euclidean). See M information. Usage example: + ```python import numpy as np from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -39,125 +37,113 @@ The example above retrieves the 10 most similar documents to a random query embe MongoDBAtlasDocumentStore. Note that dimensions of the query_embedding must match the dimensions of the embeddings stored in the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasEmbeddingRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` Create the MongoDBAtlasDocumentStore component. -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. 
-- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `vector_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `vector_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. +**Raises:** - +- ValueError – If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. -#### MongoDBAtlasEmbeddingRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. - - -#### MongoDBAtlasEmbeddingRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasEmbeddingRetriever ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasEmbeddingRetriever – Deserialized component. 
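The `filter_policy` init parameter decides how `filters` passed to `run` interact with the filters set at initialization. The sketch below is a simplified model and not Haystack's own merge logic, which handles more cases (for example, combining filters that are already logical expressions). It assumes the policy strings `"replace"` and `"merge"` correspond to `FilterPolicy.REPLACE` and `FilterPolicy.MERGE`. `resolve_filters` is a hypothetical helper.

```python
from typing import Any, Optional

Filters = Optional[dict[str, Any]]


def resolve_filters(policy: str, init_filters: Filters, runtime_filters: Filters) -> Filters:
    """Simplified sketch of how init-time and run-time filters could be combined."""
    if policy == "replace":
        # Runtime filters, when given, take the place of the init filters entirely
        return runtime_filters if runtime_filters is not None else init_filters
    if policy == "merge":
        if init_filters is None or runtime_filters is None:
            # Nothing to merge: use whichever side is present
            return runtime_filters if runtime_filters is not None else init_filters
        # Both present: documents must satisfy both sets of conditions
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    raise ValueError(f"unknown filter policy: {policy!r}")


init = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime = {"field": "meta.year", "operator": ">=", "value": 2023}
print(resolve_filters("replace", init, runtime))
print(resolve_filters("merge", init, runtime))
```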
-#### MongoDBAtlasEmbeddingRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -#### MongoDBAtlasEmbeddingRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run_async( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding - similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
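Conceptually, `run` ranks stored embeddings by similarity to `query_embedding` and keeps the best `top_k`. The exact, brute-force sketch below uses cosine similarity to show that ranking and the requirement that dimensions match. Atlas Vector Search performs this on the server with the similarity configured in the index, typically as an approximate search. `cosine` and `top_k_by_cosine` are hypothetical helpers, not part of the integration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(
    query_embedding: list[float], stored: dict[str, list[float]], top_k: int = 10
) -> list[str]:
    """Exact (brute-force) ranking of stored document IDs by cosine similarity."""
    # Mirrors the documented requirement that query and stored dimensions match
    if any(len(e) != len(query_embedding) for e in stored.values()):
        raise ValueError("query_embedding dimensions must match the stored embeddings")
    ranked = sorted(stored, key=lambda doc_id: cosine(query_embedding, stored[doc_id]), reverse=True)
    return ranked[:top_k]


stored = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(top_k_by_cosine([1.0, 0.0], stored, top_k=2))
```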
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.full\_text\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever ### MongoDBAtlasFullTextRetriever @@ -167,6 +153,7 @@ The full-text search is dependent on the full_text_search_index used in the Mong See MongoDBAtlasDocumentStore for more information. Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasFullTextRetriever @@ -184,155 +171,144 @@ print(results["documents"]) The example above retrieves the 10 most similar documents to the query "Lorem ipsum" from the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasFullTextRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. -- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `full_text_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. 
+- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `full_text_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of MongoDBAtlasDocumentStore. +**Raises:** - +- ValueError – If `document_store` is not an instance of MongoDBAtlasDocumentStore. -#### MongoDBAtlasFullTextRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: - -Dictionary with serialized data. +**Returns:** - +- dict\[str, Any\] – Dictionary with serialized data. -#### MongoDBAtlasFullTextRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasFullTextRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasFullTextRetriever ``` Deserializes the component from a dictionary. -**Arguments**: +**Parameters:** -- `data`: Dictionary to deserialize from. +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -**Returns**: +**Returns:** -Deserialized component. +- MongoDBAtlasFullTextRetriever – Deserialized component. 
- - -#### MongoDBAtlasFullTextRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. 
The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` can not be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. 
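The parameter list above places several constraints on the full-text options:

* `fuzzy` and `synonyms` are mutually exclusive.
* `synonyms` must not be an empty string.
* `match_criteria` is `"any"` or `"all"`.
* `fuzzy` accepts `maxEdits`, `prefixLength`, and `maxExpansions`.

The reference does not say whether the retriever checks these itself or leaves them to Atlas Search. The sketch below is therefore a hypothetical client-side check (`check_full_text_options` is an invented name) that enforces only these documented rules.

```python
from typing import Optional

# Option names listed in the `fuzzy` parameter description
FUZZY_OPTIONS = {"maxEdits", "prefixLength", "maxExpansions"}


def check_full_text_options(
    fuzzy: Optional[dict[str, int]] = None,
    match_criteria: Optional[str] = None,
    synonyms: Optional[str] = None,
) -> None:
    """Hypothetical client-side check of the constraints documented for `run`."""
    if fuzzy is not None and synonyms is not None:
        raise ValueError("`fuzzy` cannot be combined with `synonyms`")
    if synonyms == "":
        raise ValueError("`synonyms` must name a synonym mapping, not be an empty string")
    if match_criteria not in (None, "any", "all"):
        raise ValueError("`match_criteria` must be 'any' or 'all'")
    unknown = set(fuzzy or {}) - FUZZY_OPTIONS
    if unknown:
        raise ValueError(f"unsupported fuzzy options: {sorted(unknown)}")


# Passes: a typo-tolerant query that requires all terms to match
check_full_text_options(fuzzy={"maxEdits": 1, "prefixLength": 2}, match_criteria="all")
```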
+ +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -#### MongoDBAtlasFullTextRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run_async( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). 
-- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` can not be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. 
See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. + +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -## Module haystack\_integrations.document\_stores.mongodb\_atlas.document\_store - - +## haystack_integrations.document_stores.mongodb_atlas.document_store ### MongoDBAtlasDocumentStore @@ -360,6 +336,7 @@ For more details on MongoDB Atlas, see the official MongoDB Atlas [documentation](https://www.mongodb.com/docs/atlas/getting-started/). Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -370,329 +347,291 @@ store = MongoDBAtlasDocumentStore(database_name="your_existing_db", print(store.count_documents()) ``` - - -#### MongoDBAtlasDocumentStore.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - mongo_connection_string: Secret = Secret.from_env_var( - "MONGO_CONNECTION_STRING"), - database_name: str, - collection_name: str, - vector_search_index: str, - full_text_search_index: str, - embedding_field: str = "embedding", - content_field: str = "content") +__init__( + *, + mongo_connection_string: Secret = Secret.from_env_var( + "MONGO_CONNECTION_STRING" + ), + database_name: str, + collection_name: str, + vector_search_index: str, + full_text_search_index: str, + embedding_field: str = "embedding", + content_field: str = "content" +) ``` Creates a new MongoDBAtlasDocumentStore instance. -**Arguments**: - -- `mongo_connection_string`: MongoDB Atlas connection string in the format: -`"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. -This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. -This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
-- `database_name`: Name of the database to use. -- `collection_name`: Name of the collection to use. To use this document store for embedding retrieval, -this collection needs to have a vector search index set up on the `embedding` field. -- `vector_search_index`: The name of the vector search index to use for vector search operations. -Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB -Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/`std`-label-avs-create-index). -- `full_text_search_index`: The name of the search index to use for full-text search operations. -Create a full_text_search_index in the Atlas web UI and specify the init params of -MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). -- `embedding_field`: The name of the field containing document embeddings. Default is "embedding". -- `content_field`: The name of the field containing the document content. Default is "content". -This field allows defining which field to load into the Haystack Document object as content. -It can be particularly useful when integrating with an existing collection for retrieval. We discourage -using this parameter when working with collections created by Haystack. - -**Raises**: - -- `ValueError`: If the collection name contains invalid characters. - - +**Parameters:** -#### MongoDBAtlasDocumentStore.\_\_del\_\_ - -```python -def __del__() -> None -``` +- **mongo_connection_string** (Secret) – MongoDB Atlas connection string in the format: + `"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. + This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. + This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
+- **database_name** (str) – Name of the database to use. +- **collection_name** (str) – Name of the collection to use. To use this document store for embedding retrieval, + this collection needs to have a vector search index set up on the `embedding` field. +- **vector_search_index** (str) – The name of the vector search index to use for vector search operations. + Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB + Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/#std-label-avs-create-index). +- **full_text_search_index** (str) – The name of the search index to use for full-text search operations. + Create a full_text_search_index in the Atlas web UI and specify the init params of + MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). +- **embedding_field** (str) – The name of the field containing document embeddings. Default is "embedding". +- **content_field** (str) – The name of the field containing the document content. Default is "content". + This field allows defining which field to load into the Haystack Document object as content. + It can be particularly useful when integrating with an existing collection for retrieval. We discourage + using this parameter when working with collections created by Haystack. -Destructor method to close MongoDB connections when the instance is destroyed. +**Raises:** - +- ValueError – If the collection name contains invalid characters. -#### MongoDBAtlasDocumentStore.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. 
- - -#### MongoDBAtlasDocumentStore.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasDocumentStore" +from_dict(data: dict[str, Any]) -> MongoDBAtlasDocumentStore ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasDocumentStore – Deserialized component. -#### MongoDBAtlasDocumentStore.count\_documents +#### count_documents ```python -def count_documents() -> int +count_documents() -> int ``` Returns how many documents are present in the document store. -**Returns**: +**Returns:** -The number of documents in the document store. +- int – The number of documents in the document store. - - -#### MongoDBAtlasDocumentStore.count\_documents\_async +#### count_documents_async ```python -async def count_documents_async() -> int +count_documents_async() -> int ``` Asynchronously returns how many documents are present in the document store. -**Returns**: - -The number of documents in the document store. +**Returns:** - +- int – The number of documents in the document store. -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter +#### count_documents_by_filter ```python -def count_documents_by_filter(filters: dict[str, Any]) -> int +count_documents_by_filter(filters: dict[str, Any]) -> int ``` Applies a filter and counts the documents that matched it. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to the document list. +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -**Returns**: +**Returns:** -The number of documents that match the filter. +- int – The number of documents that match the filter. 
- - -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter\_async +#### count_documents_by_filter_async ```python -async def count_documents_by_filter_async(filters: dict[str, Any]) -> int +count_documents_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously applies a filter and counts the documents that matched it. -**Arguments**: - -- `filters`: The filters to apply to the document list. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -The number of documents that match the filter. +**Returns:** - +- int – The number of documents that match the filter. -#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter +#### count_unique_metadata_by_filter ```python -def count_unique_metadata_by_filter( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Applies a filter selecting documents and counts the unique values for each meta field of the matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. 
-#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter\_async +#### count_unique_metadata_by_filter_async ```python -async def count_unique_metadata_by_filter_async( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter_async( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Asynchronously applies a filter selecting documents and counts the unique values for each meta field of the - matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info +#### get_metadata_fields_info ```python -def get_metadata_fields_info() -> dict[str, dict] +get_metadata_fields_info() -> dict[str, dict] ``` Returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: +**Returns:** -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key.
- - -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info\_async +#### get_metadata_fields_info_async ```python -async def get_metadata_fields_info_async() -> dict[str, dict] +get_metadata_fields_info_async() -> dict[str, dict] ``` Asynchronously returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: - -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +**Returns:** - +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max +#### get_metadata_field_min_max ```python -def get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] ``` For a given metadata field, find its max and min value. -**Arguments**: +**Parameters:** -- `metadata_field`: The metadata field to get the min and max values for. +- **metadata_field** (str) – The metadata field to get the min and max values for. -**Returns**: +**Returns:** -A dictionary with 'min' and 'max' keys. +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. - - -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max\_async +#### get_metadata_field_min_max_async ```python -async def get_metadata_field_min_max_async( - metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any] ``` Asynchronously for a given metadata field, find its max and min value. -**Arguments**: - -- `metadata_field`: The metadata field to get the min and max values for. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to get the min and max values for. -A dictionary with 'min' and 'max' keys. +**Returns:** - +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys.
-#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values +#### get_metadata_field_unique_values ```python -def get_metadata_field_unique_values(metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Retrieves unique values for a field matching a search_term or all possible values if no search term is given. -**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values\_async +#### get_metadata_field_unique_values_async ```python -async def get_metadata_field_unique_values_async( - metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values_async( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Asynchronously retrieves unique values for a field matching a search_term or all possible values if no search - term is given. 
-**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.filter\_documents +#### filter_documents ```python -def filter_documents(filters: dict[str, Any] | None = None) -> list[Document] +filter_documents(filters: dict[str, Any] | None = None) -> list[Document] ``` Returns the documents that match the filters provided. @@ -700,21 +639,18 @@ Returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: +**Parameters:** -- `filters`: The filters to apply. It returns only the documents that match the filters. +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -**Returns**: +**Returns:** -A list of Documents that match the given filters. +- list\[Document\] – A list of Documents that match the given filters. 
- - -#### MongoDBAtlasDocumentStore.filter\_documents\_async +#### filter_documents_async ```python -async def filter_documents_async( - filters: dict[str, Any] | None = None) -> list[Document] +filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document] ``` Asynchronously returns the documents that match the filters provided. @@ -722,205 +658,184 @@ Asynchronously returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: - -- `filters`: The filters to apply. It returns only the documents that match the filters. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -A list of Documents that match the given filters. +**Returns:** - +- list\[Document\] – A list of Documents that match the given filters. -#### MongoDBAtlasDocumentStore.write\_documents +#### write_documents ```python -def write_documents(documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: +**Parameters:** -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -**Raises**: +**Returns:** -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +- int – The number of documents written to the document store. 
-**Returns**: +**Raises:** -The number of documents written to the document store. +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. - - -#### MongoDBAtlasDocumentStore.write\_documents\_async +#### write_documents_async ```python -async def write_documents_async( - documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents_async( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: - -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +**Parameters:** -**Raises**: +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +**Returns:** -**Returns**: +- int – The number of documents written to the document store. -The number of documents written to the document store. +**Raises:** - +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. -#### MongoDBAtlasDocumentStore.delete\_documents +#### delete_documents ```python -def delete_documents(document_ids: list[str]) -> None +delete_documents(document_ids: list[str]) -> None ``` Deletes all documents with a matching document_ids from the document store. 
-**Arguments**: - -- `document_ids`: the document ids to delete +**Parameters:** - +- **document_ids** (list\[str\]) – The IDs of the documents to delete. -#### MongoDBAtlasDocumentStore.delete\_documents\_async +#### delete_documents_async ```python -async def delete_documents_async(document_ids: list[str]) -> None +delete_documents_async(document_ids: list[str]) -> None ``` Asynchronously deletes all documents with a matching document_ids from the document store. -**Arguments**: +**Parameters:** -- `document_ids`: the document ids to delete +- **document_ids** (list\[str\]) – The IDs of the documents to delete. - - -#### MongoDBAtlasDocumentStore.delete\_by\_filter +#### delete_by_filter ```python -def delete_by_filter(filters: dict[str, Any]) -> int +delete_by_filter(filters: dict[str, Any]) -> int ``` Deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.delete\_by\_filter\_async +#### delete_by_filter_async ```python -async def delete_by_filter_async(filters: dict[str, Any]) -> int +delete_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion.
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.update\_by\_filter +#### update_by_filter ```python -def update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int +update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Updates the metadata of all documents that match the provided filters. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -**Returns**: +**Returns:** -The number of documents updated. +- int – The number of documents updated. - - -#### MongoDBAtlasDocumentStore.update\_by\_filter\_async +#### update_by_filter_async ```python -async def update_by_filter_async(filters: dict[str, Any], - meta: dict[str, Any]) -> int +update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Asynchronously updates the metadata of all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -The number of documents updated. +**Returns:** - +- int – The number of documents updated. -#### MongoDBAtlasDocumentStore.delete\_all\_documents +#### delete_all_documents ```python -def delete_all_documents(*, recreate_collection: bool = False) -> None +delete_all_documents(*, recreate_collection: bool = False) -> None ``` Deletes all documents in the document store. -**Arguments**: - -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +**Parameters:** - +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. + Recreating the collection is faster for very large collections. -#### MongoDBAtlasDocumentStore.delete\_all\_documents\_async +#### delete_all_documents_async ```python -async def delete_all_documents_async(*, - recreate_collection: bool = False - ) -> None +delete_all_documents_async(*, recreate_collection: bool = False) -> None ``` Asynchronously deletes all documents in the document store. -**Arguments**: +**Parameters:** -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. 
+ Recreating the collection is faster for very large collections. +## haystack_integrations.document_stores.mongodb_atlas.filters diff --git a/docs-website/reference_versioned_docs/version-2.23/integrations-api/mongodb_atlas.md b/docs-website/reference_versioned_docs/version-2.23/integrations-api/mongodb_atlas.md index 552ebfacce..3aa6e43140 100644 --- a/docs-website/reference_versioned_docs/version-2.23/integrations-api/mongodb_atlas.md +++ b/docs-website/reference_versioned_docs/version-2.23/integrations-api/mongodb_atlas.md @@ -5,11 +5,8 @@ description: "MongoDB Atlas integration for Haystack" slug: "/integrations-mongodb-atlas" --- - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.embedding\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever ### MongoDBAtlasEmbeddingRetriever @@ -20,6 +17,7 @@ during the creation of the index (i.e. cosine, dot product, or euclidean). See M information. Usage example: + ```python import numpy as np from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -39,125 +37,113 @@ The example above retrieves the 10 most similar documents to a random query embe MongoDBAtlasDocumentStore. Note that dimensions of the query_embedding must match the dimensions of the embeddings stored in the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasEmbeddingRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` Create the MongoDBAtlasDocumentStore component. -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. 
-- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `vector_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `vector_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. +**Raises:** - +- ValueError – If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. -#### MongoDBAtlasEmbeddingRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. - - -#### MongoDBAtlasEmbeddingRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasEmbeddingRetriever ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasEmbeddingRetriever – Deserialized component. 
-#### MongoDBAtlasEmbeddingRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -#### MongoDBAtlasEmbeddingRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run_async( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding - similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.full\_text\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever ### MongoDBAtlasFullTextRetriever @@ -167,6 +153,7 @@ The full-text search is dependent on the full_text_search_index used in the Mong See MongoDBAtlasDocumentStore for more information. Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasFullTextRetriever @@ -184,155 +171,144 @@ print(results["documents"]) The example above retrieves the 10 most similar documents to the query "Lorem ipsum" from the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasFullTextRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. -- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `full_text_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. 
+- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `full_text_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of MongoDBAtlasDocumentStore. +**Raises:** - +- ValueError – If `document_store` is not an instance of MongoDBAtlasDocumentStore. -#### MongoDBAtlasFullTextRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: - -Dictionary with serialized data. +**Returns:** - +- dict\[str, Any\] – Dictionary with serialized data. -#### MongoDBAtlasFullTextRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasFullTextRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasFullTextRetriever ``` Deserializes the component from a dictionary. -**Arguments**: +**Parameters:** -- `data`: Dictionary to deserialize from. +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -**Returns**: +**Returns:** -Deserialized component. +- MongoDBAtlasFullTextRetriever – Deserialized component. 
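Both retrievers accept `filters` at initialization and again at `run` time. The `filter_policy` parameter decides how the two are combined. The sketch below is a simplified mental model, not the library's implementation. It assumes REPLACE means the runtime filters win when present, and MERGE means both are joined under an `AND`. `Policy` and `resolve_filters` are hypothetical names, and Haystack's real merge logic handles more filter shapes than this.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class Policy(Enum):
    """Hypothetical stand-in for FilterPolicy."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: Policy,
    init_filters: dict[str, Any] | None,
    runtime_filters: dict[str, Any] | None,
) -> dict[str, Any] | None:
    """Simplified model of combining init-time and run-time filters (hypothetical helper)."""
    # With only one side present, both policies fall back to whichever filters exist.
    if init_filters is None or runtime_filters is None:
        return runtime_filters if runtime_filters is not None else init_filters
    if policy is Policy.REPLACE:
        # REPLACE: the runtime filters win outright.
        return runtime_filters
    # MERGE: keep both sets of conditions by AND-ing them together.
    return {"operator": "AND", "conditions": [init_filters, runtime_filters]}


init_f = {"field": "meta.lang", "operator": "==", "value": "en"}
run_f = {"field": "meta.year", "operator": ">=", "value": 2020}
print(resolve_filters(Policy.REPLACE, init_f, run_f))  # only the runtime filter
print(resolve_filters(Policy.MERGE, init_f, run_f))  # both filters under "AND"
```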
- - -#### MongoDBAtlasFullTextRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. 
The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` can not be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. 
+ +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -#### MongoDBAtlasFullTextRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run_async( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). 
-- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` can not be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. 
See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. + +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -## Module haystack\_integrations.document\_stores.mongodb\_atlas.document\_store - - +## haystack_integrations.document_stores.mongodb_atlas.document_store ### MongoDBAtlasDocumentStore @@ -360,6 +336,7 @@ For more details on MongoDB Atlas, see the official MongoDB Atlas [documentation](https://www.mongodb.com/docs/atlas/getting-started/). Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -370,329 +347,291 @@ store = MongoDBAtlasDocumentStore(database_name="your_existing_db", print(store.count_documents()) ``` - - -#### MongoDBAtlasDocumentStore.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - mongo_connection_string: Secret = Secret.from_env_var( - "MONGO_CONNECTION_STRING"), - database_name: str, - collection_name: str, - vector_search_index: str, - full_text_search_index: str, - embedding_field: str = "embedding", - content_field: str = "content") +__init__( + *, + mongo_connection_string: Secret = Secret.from_env_var( + "MONGO_CONNECTION_STRING" + ), + database_name: str, + collection_name: str, + vector_search_index: str, + full_text_search_index: str, + embedding_field: str = "embedding", + content_field: str = "content" +) ``` Creates a new MongoDBAtlasDocumentStore instance. -**Arguments**: - -- `mongo_connection_string`: MongoDB Atlas connection string in the format: -`"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. -This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. -This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
-- `database_name`: Name of the database to use. -- `collection_name`: Name of the collection to use. To use this document store for embedding retrieval, -this collection needs to have a vector search index set up on the `embedding` field. -- `vector_search_index`: The name of the vector search index to use for vector search operations. -Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB -Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/`std`-label-avs-create-index). -- `full_text_search_index`: The name of the search index to use for full-text search operations. -Create a full_text_search_index in the Atlas web UI and specify the init params of -MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). -- `embedding_field`: The name of the field containing document embeddings. Default is "embedding". -- `content_field`: The name of the field containing the document content. Default is "content". -This field allows defining which field to load into the Haystack Document object as content. -It can be particularly useful when integrating with an existing collection for retrieval. We discourage -using this parameter when working with collections created by Haystack. - -**Raises**: - -- `ValueError`: If the collection name contains invalid characters. - - +**Parameters:** -#### MongoDBAtlasDocumentStore.\_\_del\_\_ - -```python -def __del__() -> None -``` +- **mongo_connection_string** (Secret) – MongoDB Atlas connection string in the format: + `"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. + This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. + This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
+- **database_name** (str) – Name of the database to use. +- **collection_name** (str) – Name of the collection to use. To use this document store for embedding retrieval, + this collection needs to have a vector search index set up on the `embedding` field. +- **vector_search_index** (str) – The name of the vector search index to use for vector search operations. + Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB + Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/#std-label-avs-create-index). +- **full_text_search_index** (str) – The name of the search index to use for full-text search operations. + Create a full_text_search_index in the Atlas web UI and specify the init params of + MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). +- **embedding_field** (str) – The name of the field containing document embeddings. Default is "embedding". +- **content_field** (str) – The name of the field containing the document content. Default is "content". + This field allows defining which field to load into the Haystack Document object as content. + It can be particularly useful when integrating with an existing collection for retrieval. We discourage + using this parameter when working with collections created by Haystack. -Destructor method to close MongoDB connections when the instance is destroyed. +**Raises:** - +- ValueError – If the collection name contains invalid characters. -#### MongoDBAtlasDocumentStore.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. 
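The `mongo_connection_string` format above places the username and password inside the URI. Characters such as `@` or `:` in a password therefore have to be percent-encoded first. The following is a minimal sketch that uses a hypothetical `build_atlas_uri` helper and a made-up host. By default, the store then reads the result from the `MONGO_CONNECTION_STRING` environment variable.

```python
import os
from urllib.parse import quote_plus, urlencode


def build_atlas_uri(username: str, password: str, host: str, params: dict[str, str]) -> str:
    """Assemble a mongodb+srv URI in the documented format (hypothetical helper)."""
    # Credentials containing characters such as '@', ':' or '/' must be percent-encoded.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?{urlencode(params)}"


uri = build_atlas_uri(
    "alice", "p@ss:word", "cluster0.example.mongodb.net", {"retryWrites": "true", "w": "majority"}
)
# By default the store reads this variable via Secret.from_env_var("MONGO_CONNECTION_STRING").
os.environ["MONGO_CONNECTION_STRING"] = uri
print(uri)  # mongodb+srv://alice:p%40ss%3Aword@cluster0.example.mongodb.net/?retryWrites=true&w=majority
```

In practice, you would usually export the variable in the shell or from a secrets manager rather than set it from Python.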
- - -#### MongoDBAtlasDocumentStore.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasDocumentStore" +from_dict(data: dict[str, Any]) -> MongoDBAtlasDocumentStore ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasDocumentStore – Deserialized component. -#### MongoDBAtlasDocumentStore.count\_documents +#### count_documents ```python -def count_documents() -> int +count_documents() -> int ``` Returns how many documents are present in the document store. -**Returns**: +**Returns:** -The number of documents in the document store. +- int – The number of documents in the document store. - - -#### MongoDBAtlasDocumentStore.count\_documents\_async +#### count_documents_async ```python -async def count_documents_async() -> int +count_documents_async() -> int ``` Asynchronously returns how many documents are present in the document store. -**Returns**: - -The number of documents in the document store. +**Returns:** - +- int – The number of documents in the document store. -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter +#### count_documents_by_filter ```python -def count_documents_by_filter(filters: dict[str, Any]) -> int +count_documents_by_filter(filters: dict[str, Any]) -> int ``` Applies a filter and counts the documents that matched it. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to the document list. +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -**Returns**: +**Returns:** -The number of documents that match the filter. +- int – The number of documents that match the filter. 
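`count_documents_by_filter` expects filters in Haystack's metadata-filtering syntax. These are comparison dicts with `field`, `operator` and `value`, nested inside logical dicts with `operator` and `conditions`. The store evaluates the filter on the MongoDB side. The hypothetical in-memory evaluator below (`matches`, which covers only a subset of operators) is just a way to see which documents a given filter would count.

```python
from __future__ import annotations

import operator
from typing import Any, Callable

# Subset of the Haystack comparison operators ("not in" and NOT are omitted here).
COMPARISONS: dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
}


def matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Evaluate a Haystack-style filter dict against a plain dict (hypothetical helper)."""
    if "conditions" in flt:
        results = [matches(doc, cond) for cond in flt["conditions"]]
        if flt["operator"] == "AND":
            return all(results)
        if flt["operator"] == "OR":
            return any(results)
        raise ValueError(f"unsupported logical operator: {flt['operator']}")
    # Walk a dotted path such as "meta.year"; a missing field never matches.
    value: Any = doc
    for key in flt["field"].split("."):
        if not isinstance(value, dict) or key not in value:
            return False
        value = value[key]
    return COMPARISONS[flt["operator"]](value, flt["value"])


docs = [
    {"content": "a", "meta": {"lang": "en", "year": 2021}},
    {"content": "b", "meta": {"lang": "de", "year": 2022}},
    {"content": "c", "meta": {"lang": "en", "year": 2019}},
]
recent_english = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.lang", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
print(sum(matches(d, recent_english) for d in docs))  # 1
```

With a configured store, you would pass the same dict as `store.count_documents_by_filter(recent_english)`.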
- - -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter\_async +#### count_documents_by_filter_async ```python -async def count_documents_by_filter_async(filters: dict[str, Any]) -> int +count_documents_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously applies a filter and counts the documents that matched it. -**Arguments**: - -- `filters`: The filters to apply to the document list. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -The number of documents that match the filter. +**Returns:** - +- int – The number of documents that match the filter. -#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter +#### count_unique_metadata_by_filter ```python -def count_unique_metadata_by_filter( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Applies a filter selecting documents and counts the unique values for each meta field of the matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. 
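`count_unique_metadata_by_filter` first narrows the collection with `filters`. It then counts the distinct values for each requested meta field. The hypothetical `count_unique_values` helper below models only the second step, on documents assumed to have already matched. Two further assumptions apply to this sketch: documents that lack a field are skipped rather than counted as a missing value, and values must be hashable.

```python
from __future__ import annotations

from typing import Any


def count_unique_values(docs: list[dict[str, Any]], metadata_fields: list[str]) -> dict[str, int]:
    """Count distinct values per meta field across already-filtered docs (hypothetical helper)."""
    counts: dict[str, int] = {}
    for field in metadata_fields:
        # Documents without the field are skipped instead of contributing a None value.
        seen = {doc["meta"][field] for doc in docs if field in doc.get("meta", {})}
        counts[field] = len(seen)
    return counts


matched = [
    {"meta": {"lang": "en", "author": "kim"}},
    {"meta": {"lang": "en", "author": "lee"}},
    {"meta": {"lang": "de"}},
]
print(count_unique_values(matched, ["lang", "author"]))  # {'lang': 2, 'author': 2}
```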
-#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter\_async +#### count_unique_metadata_by_filter_async ```python -async def count_unique_metadata_by_filter_async( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter_async( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Asynchronously applies a filter selecting documents and counts the unique values for each meta field of the - matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info +#### get_metadata_fields_info ```python -def get_metadata_fields_info() -> dict[str, dict] +get_metadata_fields_info() -> dict[str, dict] ``` Returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: +**Returns:** -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key.
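`get_metadata_fields_info` builds its field inventory from a sample of the latest 50 documents. As a result, a field that appears only in older documents can be missing from the output. The hypothetical `infer_meta_fields` helper below illustrates that effect on an in-memory list, which it assumes is in insertion order. It reports Python type names, which are unlikely to match the labels the real store returns.

```python
from __future__ import annotations

from typing import Any

SAMPLE_SIZE = 50  # the docstring says the latest 50 documents are sampled


def infer_meta_fields(docs: list[dict[str, Any]]) -> dict[str, dict[str, str]]:
    """Infer meta field types from the most recent documents (hypothetical helper)."""
    fields: dict[str, dict[str, str]] = {}
    # Assumes `docs` is in insertion order, so the slice keeps the newest documents.
    for doc in docs[-SAMPLE_SIZE:]:
        for name, value in doc.get("meta", {}).items():
            # The first type seen wins; Python type names stand in for the store's real labels.
            fields.setdefault(name, {"type": type(value).__name__})
    return fields


docs = [{"meta": {"legacy_id": 1}}]  # oldest document, outside the 50-document sample
docs += [{"meta": {"year": 2000 + i, "lang": "en"}} for i in range(50)]
info = infer_meta_fields(docs)
print(info)  # {'year': {'type': 'int'}, 'lang': {'type': 'str'}}
```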
- - -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info\_async +#### get_metadata_fields_info_async ```python -async def get_metadata_fields_info_async() -> dict[str, dict] +get_metadata_fields_info_async() -> dict[str, dict] ``` Asynchronously returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: - -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +**Returns:** - +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max +#### get_metadata_field_min_max ```python -def get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] ``` For a given metadata field, find its max and min value. -**Arguments**: +**Parameters:** -- `metadata_field`: The metadata field to get the min and max values for. +- **metadata_field** (str) – The metadata field to get the min and max values for. -**Returns**: +**Returns:** -A dictionary with 'min' and 'max' keys. +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. - - -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max\_async +#### get_metadata_field_min_max_async ```python -async def get_metadata_field_min_max_async( - metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any] ``` Asynchronously for a given metadata field, find its max and min value. -**Arguments**: - -- `metadata_field`: The metadata field to get the min and max values for. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to get the min and max values for. -A dictionary with 'min' and 'max' keys. +**Returns:** - +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys.
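Each `*_async` method is a coroutine, so independent lookups can be awaited together with `asyncio.gather`. For example, you could fetch min/max values for several fields concurrently. The `InMemoryStub` class below is hypothetical. It only mimics the documented `{'min': ..., 'max': ...}` return shape, and returning `None` bounds for an absent field is an assumption. With the real store, concurrency also depends on the underlying async MongoDB client.

```python
from __future__ import annotations

import asyncio
from typing import Any


class InMemoryStub:
    """Hypothetical stand-in that mimics the documented return shape; not the real store."""

    def __init__(self, metas: list[dict[str, Any]]) -> None:
        self._metas = metas

    async def get_metadata_field_min_max_async(self, metadata_field: str) -> dict[str, Any]:
        values = [m[metadata_field] for m in self._metas if metadata_field in m]
        # Return None for both bounds when no document has the field (an assumption).
        if not values:
            return {"min": None, "max": None}
        return {"min": min(values), "max": max(values)}


async def main() -> list[dict[str, Any]]:
    store = InMemoryStub([{"year": 2019, "pages": 10}, {"year": 2023}, {"pages": 3}])
    # The *_async variants are coroutines, so independent lookups can run concurrently.
    return await asyncio.gather(
        store.get_metadata_field_min_max_async("year"),
        store.get_metadata_field_min_max_async("pages"),
    )


results = asyncio.run(main())
print(results)  # [{'min': 2019, 'max': 2023}, {'min': 3, 'max': 10}]
```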
-#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values +#### get_metadata_field_unique_values ```python -def get_metadata_field_unique_values(metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Retrieves unique values for a field matching a search_term or all possible values if no search term is given. -**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values\_async +#### get_metadata_field_unique_values_async ```python -async def get_metadata_field_unique_values_async( - metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values_async( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Asynchronously retrieves unique values for a field matching a search_term or all possible values if no search - term is given. 
-**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.filter\_documents +#### filter_documents ```python -def filter_documents(filters: dict[str, Any] | None = None) -> list[Document] +filter_documents(filters: dict[str, Any] | None = None) -> list[Document] ``` Returns the documents that match the filters provided. @@ -700,21 +639,18 @@ Returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: +**Parameters:** -- `filters`: The filters to apply. It returns only the documents that match the filters. +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -**Returns**: +**Returns:** -A list of Documents that match the given filters. +- list\[Document\] – A list of Documents that match the given filters. 
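`get_metadata_field_unique_values` returns one page of values together with the total number of matches, which is what a paginated UI needs. The hypothetical `unique_values_page` helper below sketches those semantics in memory. It covers case-insensitive substring matching, `from_`/`size` slicing, and a total that ignores the page bounds. Sorting the values for a stable page order and converting them to strings are assumptions of this sketch.

```python
from __future__ import annotations

from typing import Any


def unique_values_page(
    metas: list[dict[str, Any]],
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]:
    """Page through the distinct values of a meta field (hypothetical helper)."""
    distinct = {str(m[metadata_field]) for m in metas if metadata_field in m}
    if search_term is not None:
        # Case-insensitive substring match, as described for `search_term`.
        needle = search_term.lower()
        distinct = {v for v in distinct if needle in v.lower()}
    ordered = sorted(distinct)  # pagination needs a stable order; sorting is an assumption
    # The total counts every match, while the list holds only the requested page.
    return ordered[from_ : from_ + size], len(ordered)


metas = [{"author": a} for a in ["Ana", "anders", "Bob", "Diana", "ana", "Bob"]]
page, total = unique_values_page(metas, "author", search_term="AN", from_=1, size=2)
print(page, total)  # ['Diana', 'ana'] 4
```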
- - -#### MongoDBAtlasDocumentStore.filter\_documents\_async +#### filter_documents_async ```python -async def filter_documents_async( - filters: dict[str, Any] | None = None) -> list[Document] +filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document] ``` Asynchronously returns the documents that match the filters provided. @@ -722,205 +658,184 @@ Asynchronously returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: - -- `filters`: The filters to apply. It returns only the documents that match the filters. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -A list of Documents that match the given filters. +**Returns:** - +- list\[Document\] – A list of Documents that match the given filters. -#### MongoDBAtlasDocumentStore.write\_documents +#### write_documents ```python -def write_documents(documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: +**Parameters:** -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -**Raises**: +**Returns:** -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +- int – The number of documents written to the document store. 
-**Returns**: +**Raises:** -The number of documents written to the document store. +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. - - -#### MongoDBAtlasDocumentStore.write\_documents\_async +#### write_documents_async ```python -async def write_documents_async( - documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents_async( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: - -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +**Parameters:** -**Raises**: +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +**Returns:** -**Returns**: +- int – The number of documents written to the document store. -The number of documents written to the document store. +**Raises:** - +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. -#### MongoDBAtlasDocumentStore.delete\_documents +#### delete_documents ```python -def delete_documents(document_ids: list[str]) -> None +delete_documents(document_ids: list[str]) -> None ``` Deletes all documents with a matching document_ids from the document store. 
-**Arguments**: - -- `document_ids`: the document ids to delete +**Parameters:** - +- **document_ids** (list\[str\]) – the document ids to delete -#### MongoDBAtlasDocumentStore.delete\_documents\_async +#### delete_documents_async ```python -async def delete_documents_async(document_ids: list[str]) -> None +delete_documents_async(document_ids: list[str]) -> None ``` Asynchronously deletes all documents with a matching document_ids from the document store. -**Arguments**: +**Parameters:** -- `document_ids`: the document ids to delete +- **document_ids** (list\[str\]) – the document ids to delete - - -#### MongoDBAtlasDocumentStore.delete\_by\_filter +#### delete_by_filter ```python -def delete_by_filter(filters: dict[str, Any]) -> int +delete_by_filter(filters: dict[str, Any]) -> int ``` Deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.delete\_by\_filter\_async +#### delete_by_filter_async ```python -async def delete_by_filter_async(filters: dict[str, Any]) -> int +delete_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.update\_by\_filter +#### update_by_filter ```python -def update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int +update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Updates the metadata of all documents that match the provided filters. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -**Returns**: +**Returns:** -The number of documents updated. +- int – The number of documents updated. - - -#### MongoDBAtlasDocumentStore.update\_by\_filter\_async +#### update_by_filter_async ```python -async def update_by_filter_async(filters: dict[str, Any], - meta: dict[str, Any]) -> int +update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Asynchronously updates the metadata of all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -The number of documents updated. +**Returns:** - +- int – The number of documents updated. -#### MongoDBAtlasDocumentStore.delete\_all\_documents +#### delete_all_documents ```python -def delete_all_documents(*, recreate_collection: bool = False) -> None +delete_all_documents(*, recreate_collection: bool = False) -> None ``` Deletes all documents in the document store. -**Arguments**: - -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +**Parameters:** - +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. + Recreating the collection is faster for very large collections. -#### MongoDBAtlasDocumentStore.delete\_all\_documents\_async +#### delete_all_documents_async ```python -async def delete_all_documents_async(*, - recreate_collection: bool = False - ) -> None +delete_all_documents_async(*, recreate_collection: bool = False) -> None ``` Asynchronously deletes all documents in the document store. -**Arguments**: +**Parameters:** -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. 
+ Recreating the collection is faster for very large collections. +## haystack_integrations.document_stores.mongodb_atlas.filters diff --git a/docs-website/reference_versioned_docs/version-2.24/integrations-api/mongodb_atlas.md b/docs-website/reference_versioned_docs/version-2.24/integrations-api/mongodb_atlas.md index 552ebfacce..3aa6e43140 100644 --- a/docs-website/reference_versioned_docs/version-2.24/integrations-api/mongodb_atlas.md +++ b/docs-website/reference_versioned_docs/version-2.24/integrations-api/mongodb_atlas.md @@ -5,11 +5,8 @@ description: "MongoDB Atlas integration for Haystack" slug: "/integrations-mongodb-atlas" --- - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.embedding\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever ### MongoDBAtlasEmbeddingRetriever @@ -20,6 +17,7 @@ during the creation of the index (i.e. cosine, dot product, or euclidean). See M information. Usage example: + ```python import numpy as np from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -39,125 +37,113 @@ The example above retrieves the 10 most similar documents to a random query embe MongoDBAtlasDocumentStore. Note that dimensions of the query_embedding must match the dimensions of the embeddings stored in the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasEmbeddingRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` Create the MongoDBAtlasDocumentStore component. -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. 
-- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `vector_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `vector_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. +**Raises:** - +- ValueError – If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. -#### MongoDBAtlasEmbeddingRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. - - -#### MongoDBAtlasEmbeddingRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasEmbeddingRetriever ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasEmbeddingRetriever – Deserialized component. 
-#### MongoDBAtlasEmbeddingRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -#### MongoDBAtlasEmbeddingRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run_async( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding - similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.full\_text\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever ### MongoDBAtlasFullTextRetriever @@ -167,6 +153,7 @@ The full-text search is dependent on the full_text_search_index used in the Mong See MongoDBAtlasDocumentStore for more information. Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasFullTextRetriever @@ -184,155 +171,144 @@ print(results["documents"]) The example above retrieves the 10 most similar documents to the query "Lorem ipsum" from the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasFullTextRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. -- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `full_text_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. 
+- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `full_text_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of MongoDBAtlasDocumentStore. +**Raises:** - +- ValueError – If `document_store` is not an instance of MongoDBAtlasDocumentStore. -#### MongoDBAtlasFullTextRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: - -Dictionary with serialized data. +**Returns:** - +- dict\[str, Any\] – Dictionary with serialized data. -#### MongoDBAtlasFullTextRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasFullTextRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasFullTextRetriever ``` Deserializes the component from a dictionary. -**Arguments**: +**Parameters:** -- `data`: Dictionary to deserialize from. +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -**Returns**: +**Returns:** -Deserialized component. +- MongoDBAtlasFullTextRetriever – Deserialized component. 
- - -#### MongoDBAtlasFullTextRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. 
The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` can not be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. 
+ +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -#### MongoDBAtlasFullTextRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run_async( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). 
-- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` can not be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. 
See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. + +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -## Module haystack\_integrations.document\_stores.mongodb\_atlas.document\_store - - +## haystack_integrations.document_stores.mongodb_atlas.document_store ### MongoDBAtlasDocumentStore @@ -360,6 +336,7 @@ For more details on MongoDB Atlas, see the official MongoDB Atlas [documentation](https://www.mongodb.com/docs/atlas/getting-started/). Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -370,329 +347,291 @@ store = MongoDBAtlasDocumentStore(database_name="your_existing_db", print(store.count_documents()) ``` - - -#### MongoDBAtlasDocumentStore.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - mongo_connection_string: Secret = Secret.from_env_var( - "MONGO_CONNECTION_STRING"), - database_name: str, - collection_name: str, - vector_search_index: str, - full_text_search_index: str, - embedding_field: str = "embedding", - content_field: str = "content") +__init__( + *, + mongo_connection_string: Secret = Secret.from_env_var( + "MONGO_CONNECTION_STRING" + ), + database_name: str, + collection_name: str, + vector_search_index: str, + full_text_search_index: str, + embedding_field: str = "embedding", + content_field: str = "content" +) ``` Creates a new MongoDBAtlasDocumentStore instance. -**Arguments**: - -- `mongo_connection_string`: MongoDB Atlas connection string in the format: -`"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. -This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. -This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
-- `database_name`: Name of the database to use. -- `collection_name`: Name of the collection to use. To use this document store for embedding retrieval, -this collection needs to have a vector search index set up on the `embedding` field. -- `vector_search_index`: The name of the vector search index to use for vector search operations. -Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB -Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/`std`-label-avs-create-index). -- `full_text_search_index`: The name of the search index to use for full-text search operations. -Create a full_text_search_index in the Atlas web UI and specify the init params of -MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). -- `embedding_field`: The name of the field containing document embeddings. Default is "embedding". -- `content_field`: The name of the field containing the document content. Default is "content". -This field allows defining which field to load into the Haystack Document object as content. -It can be particularly useful when integrating with an existing collection for retrieval. We discourage -using this parameter when working with collections created by Haystack. - -**Raises**: - -- `ValueError`: If the collection name contains invalid characters. - - +**Parameters:** -#### MongoDBAtlasDocumentStore.\_\_del\_\_ - -```python -def __del__() -> None -``` +- **mongo_connection_string** (Secret) – MongoDB Atlas connection string in the format: + `"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. + This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. + This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
+- **database_name** (str) – Name of the database to use. +- **collection_name** (str) – Name of the collection to use. To use this document store for embedding retrieval, + this collection needs to have a vector search index set up on the `embedding` field. +- **vector_search_index** (str) – The name of the vector search index to use for vector search operations. + Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB + Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/#std-label-avs-create-index). +- **full_text_search_index** (str) – The name of the search index to use for full-text search operations. + Create a full_text_search_index in the Atlas web UI and specify the init params of + MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). +- **embedding_field** (str) – The name of the field containing document embeddings. Default is "embedding". +- **content_field** (str) – The name of the field containing the document content. Default is "content". + This field allows defining which field to load into the Haystack Document object as content. + It can be particularly useful when integrating with an existing collection for retrieval. We discourage + using this parameter when working with collections created by Haystack. -Destructor method to close MongoDB connections when the instance is destroyed. +**Raises:** - +- ValueError – If the collection name contains invalid characters. -#### MongoDBAtlasDocumentStore.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. 
- - -#### MongoDBAtlasDocumentStore.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasDocumentStore" +from_dict(data: dict[str, Any]) -> MongoDBAtlasDocumentStore ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasDocumentStore – Deserialized component. -#### MongoDBAtlasDocumentStore.count\_documents +#### count_documents ```python -def count_documents() -> int +count_documents() -> int ``` Returns how many documents are present in the document store. -**Returns**: +**Returns:** -The number of documents in the document store. +- int – The number of documents in the document store. - - -#### MongoDBAtlasDocumentStore.count\_documents\_async +#### count_documents_async ```python -async def count_documents_async() -> int +count_documents_async() -> int ``` Asynchronously returns how many documents are present in the document store. -**Returns**: - -The number of documents in the document store. +**Returns:** - +- int – The number of documents in the document store. -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter +#### count_documents_by_filter ```python -def count_documents_by_filter(filters: dict[str, Any]) -> int +count_documents_by_filter(filters: dict[str, Any]) -> int ``` Applies a filter and counts the documents that matched it. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to the document list. +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -**Returns**: +**Returns:** -The number of documents that match the filter. +- int – The number of documents that match the filter. 
- - -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter\_async +#### count_documents_by_filter_async ```python -async def count_documents_by_filter_async(filters: dict[str, Any]) -> int +count_documents_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously applies a filter and counts the documents that matched it. -**Arguments**: - -- `filters`: The filters to apply to the document list. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -The number of documents that match the filter. +**Returns:** - +- int – The number of documents that match the filter. -#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter +#### count_unique_metadata_by_filter ```python -def count_unique_metadata_by_filter( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Applies a filter selecting documents and counts the unique values for each meta field of the matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. 
-#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter\_async +#### count_unique_metadata_by_filter_async ```python -async def count_unique_metadata_by_filter_async( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter_async( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Asynchronously applies a filter selecting documents and counts the unique values for each meta field of the - matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info +#### get_metadata_fields_info ```python -def get_metadata_fields_info() -> dict[str, dict] +get_metadata_fields_info() -> dict[str, dict] ``` Returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: +**Returns:** -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. 
- - -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info\_async +#### get_metadata_fields_info_async ```python -async def get_metadata_fields_info_async() -> dict[str, dict] +get_metadata_fields_info_async() -> dict[str, dict] ``` Asynchronously returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: - -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +**Returns:** - +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max +#### get_metadata_field_min_max ```python -def get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] ``` For a given metadata field, find its max and min value. -**Arguments**: +**Parameters:** -- `metadata_field`: The metadata field to get the min and max values for. +- **metadata_field** (str) – The metadata field to get the min and max values for. -**Returns**: +**Returns:** -A dictionary with 'min' and 'max' keys. +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. - - -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max\_async +#### get_metadata_field_min_max_async ```python -async def get_metadata_field_min_max_async( - metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any] ``` Asynchronously for a given metadata field, find its max and min value. -**Arguments**: - -- `metadata_field`: The metadata field to get the min and max values for. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to get the min and max values for. -A dictionary with 'min' and 'max' keys. +**Returns:** - +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys.
-#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values +#### get_metadata_field_unique_values ```python -def get_metadata_field_unique_values(metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Retrieves unique values for a field matching a search_term or all possible values if no search term is given. -**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values\_async +#### get_metadata_field_unique_values_async ```python -async def get_metadata_field_unique_values_async( - metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values_async( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Asynchronously retrieves unique values for a field matching a search_term or all possible values if no search - term is given. 
-**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.filter\_documents +#### filter_documents ```python -def filter_documents(filters: dict[str, Any] | None = None) -> list[Document] +filter_documents(filters: dict[str, Any] | None = None) -> list[Document] ``` Returns the documents that match the filters provided. @@ -700,21 +639,18 @@ Returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: +**Parameters:** -- `filters`: The filters to apply. It returns only the documents that match the filters. +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -**Returns**: +**Returns:** -A list of Documents that match the given filters. +- list\[Document\] – A list of Documents that match the given filters. 
- - -#### MongoDBAtlasDocumentStore.filter\_documents\_async +#### filter_documents_async ```python -async def filter_documents_async( - filters: dict[str, Any] | None = None) -> list[Document] +filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document] ``` Asynchronously returns the documents that match the filters provided. @@ -722,205 +658,184 @@ Asynchronously returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: - -- `filters`: The filters to apply. It returns only the documents that match the filters. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -A list of Documents that match the given filters. +**Returns:** - +- list\[Document\] – A list of Documents that match the given filters. -#### MongoDBAtlasDocumentStore.write\_documents +#### write_documents ```python -def write_documents(documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: +**Parameters:** -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -**Raises**: +**Returns:** -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +- int – The number of documents written to the document store. 
-**Returns**: +**Raises:** -The number of documents written to the document store. +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. - - -#### MongoDBAtlasDocumentStore.write\_documents\_async +#### write_documents_async ```python -async def write_documents_async( - documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents_async( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: - -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +**Parameters:** -**Raises**: +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +**Returns:** -**Returns**: +- int – The number of documents written to the document store. -The number of documents written to the document store. +**Raises:** - +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. -#### MongoDBAtlasDocumentStore.delete\_documents +#### delete_documents ```python -def delete_documents(document_ids: list[str]) -> None +delete_documents(document_ids: list[str]) -> None ``` Deletes all documents with a matching document_ids from the document store. 
-**Arguments**: - -- `document_ids`: the document ids to delete +**Parameters:** - +- **document_ids** (list\[str\]) – The IDs of the documents to delete. -#### MongoDBAtlasDocumentStore.delete\_documents\_async +#### delete_documents_async ```python -async def delete_documents_async(document_ids: list[str]) -> None +delete_documents_async(document_ids: list[str]) -> None ``` Asynchronously deletes all documents with a matching document_ids from the document store. -**Arguments**: +**Parameters:** -- `document_ids`: the document ids to delete +- **document_ids** (list\[str\]) – The IDs of the documents to delete. - - -#### MongoDBAtlasDocumentStore.delete\_by\_filter +#### delete_by_filter ```python -def delete_by_filter(filters: dict[str, Any]) -> int +delete_by_filter(filters: dict[str, Any]) -> int ``` Deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.delete\_by\_filter\_async +#### delete_by_filter_async ```python -async def delete_by_filter_async(filters: dict[str, Any]) -> int +delete_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion.
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.update\_by\_filter +#### update_by_filter ```python -def update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int +update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Updates the metadata of all documents that match the provided filters. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -**Returns**: +**Returns:** -The number of documents updated. +- int – The number of documents updated. - - -#### MongoDBAtlasDocumentStore.update\_by\_filter\_async +#### update_by_filter_async ```python -async def update_by_filter_async(filters: dict[str, Any], - meta: dict[str, Any]) -> int +update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Asynchronously updates the metadata of all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -The number of documents updated. +**Returns:** - +- int – The number of documents updated. -#### MongoDBAtlasDocumentStore.delete\_all\_documents +#### delete_all_documents ```python -def delete_all_documents(*, recreate_collection: bool = False) -> None +delete_all_documents(*, recreate_collection: bool = False) -> None ``` Deletes all documents in the document store. -**Arguments**: - -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +**Parameters:** - +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. + Recreating the collection is faster for very large collections. -#### MongoDBAtlasDocumentStore.delete\_all\_documents\_async +#### delete_all_documents_async ```python -async def delete_all_documents_async(*, - recreate_collection: bool = False - ) -> None +delete_all_documents_async(*, recreate_collection: bool = False) -> None ``` Asynchronously deletes all documents in the document store. -**Arguments**: +**Parameters:** -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. 
+ Recreating the collection is faster for very large collections. +## haystack_integrations.document_stores.mongodb_atlas.filters diff --git a/docs-website/reference_versioned_docs/version-2.25/integrations-api/mongodb_atlas.md b/docs-website/reference_versioned_docs/version-2.25/integrations-api/mongodb_atlas.md index 552ebfacce..3aa6e43140 100644 --- a/docs-website/reference_versioned_docs/version-2.25/integrations-api/mongodb_atlas.md +++ b/docs-website/reference_versioned_docs/version-2.25/integrations-api/mongodb_atlas.md @@ -5,11 +5,8 @@ description: "MongoDB Atlas integration for Haystack" slug: "/integrations-mongodb-atlas" --- - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.embedding\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever ### MongoDBAtlasEmbeddingRetriever @@ -20,6 +17,7 @@ during the creation of the index (i.e. cosine, dot product, or euclidean). See M information. Usage example: + ```python import numpy as np from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -39,125 +37,113 @@ The example above retrieves the 10 most similar documents to a random query embe MongoDBAtlasDocumentStore. Note that dimensions of the query_embedding must match the dimensions of the embeddings stored in the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasEmbeddingRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` Create the MongoDBAtlasDocumentStore component. -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. 
-- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `vector_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `vector_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. +**Raises:** - +- ValueError – If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. -#### MongoDBAtlasEmbeddingRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. - - -#### MongoDBAtlasEmbeddingRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasEmbeddingRetriever ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasEmbeddingRetriever – Deserialized component. 
-#### MongoDBAtlasEmbeddingRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -#### MongoDBAtlasEmbeddingRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run_async( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding - similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.full\_text\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever ### MongoDBAtlasFullTextRetriever @@ -167,6 +153,7 @@ The full-text search is dependent on the full_text_search_index used in the Mong See MongoDBAtlasDocumentStore for more information. Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasFullTextRetriever @@ -184,155 +171,144 @@ print(results["documents"]) The example above retrieves the 10 most similar documents to the query "Lorem ipsum" from the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasFullTextRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. -- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `full_text_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. 
+- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `full_text_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of MongoDBAtlasDocumentStore. +**Raises:** - +- ValueError – If `document_store` is not an instance of MongoDBAtlasDocumentStore. -#### MongoDBAtlasFullTextRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: - -Dictionary with serialized data. +**Returns:** - +- dict\[str, Any\] – Dictionary with serialized data. -#### MongoDBAtlasFullTextRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasFullTextRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasFullTextRetriever ``` Deserializes the component from a dictionary. -**Arguments**: +**Parameters:** -- `data`: Dictionary to deserialize from. +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -**Returns**: +**Returns:** -Deserialized component. +- MongoDBAtlasFullTextRetriever – Deserialized component. 
- - -#### MongoDBAtlasFullTextRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. 
The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` cannot be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization.
+ +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -#### MongoDBAtlasFullTextRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run_async( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). 
-- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` cannot be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization.
See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. + +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -## Module haystack\_integrations.document\_stores.mongodb\_atlas.document\_store - - +## haystack_integrations.document_stores.mongodb_atlas.document_store ### MongoDBAtlasDocumentStore @@ -360,6 +336,7 @@ For more details on MongoDB Atlas, see the official MongoDB Atlas [documentation](https://www.mongodb.com/docs/atlas/getting-started/). Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -370,329 +347,291 @@ store = MongoDBAtlasDocumentStore(database_name="your_existing_db", print(store.count_documents()) ``` - - -#### MongoDBAtlasDocumentStore.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - mongo_connection_string: Secret = Secret.from_env_var( - "MONGO_CONNECTION_STRING"), - database_name: str, - collection_name: str, - vector_search_index: str, - full_text_search_index: str, - embedding_field: str = "embedding", - content_field: str = "content") +__init__( + *, + mongo_connection_string: Secret = Secret.from_env_var( + "MONGO_CONNECTION_STRING" + ), + database_name: str, + collection_name: str, + vector_search_index: str, + full_text_search_index: str, + embedding_field: str = "embedding", + content_field: str = "content" +) ``` Creates a new MongoDBAtlasDocumentStore instance. -**Arguments**: - -- `mongo_connection_string`: MongoDB Atlas connection string in the format: -`"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. -This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. -This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
-- `database_name`: Name of the database to use. -- `collection_name`: Name of the collection to use. To use this document store for embedding retrieval, -this collection needs to have a vector search index set up on the `embedding` field. -- `vector_search_index`: The name of the vector search index to use for vector search operations. -Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB -Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/`std`-label-avs-create-index). -- `full_text_search_index`: The name of the search index to use for full-text search operations. -Create a full_text_search_index in the Atlas web UI and specify the init params of -MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). -- `embedding_field`: The name of the field containing document embeddings. Default is "embedding". -- `content_field`: The name of the field containing the document content. Default is "content". -This field allows defining which field to load into the Haystack Document object as content. -It can be particularly useful when integrating with an existing collection for retrieval. We discourage -using this parameter when working with collections created by Haystack. - -**Raises**: - -- `ValueError`: If the collection name contains invalid characters. - - +**Parameters:** -#### MongoDBAtlasDocumentStore.\_\_del\_\_ - -```python -def __del__() -> None -``` +- **mongo_connection_string** (Secret) – MongoDB Atlas connection string in the format: + `"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. + This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. + This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
+- **database_name** (str) – Name of the database to use. +- **collection_name** (str) – Name of the collection to use. To use this document store for embedding retrieval, + this collection needs to have a vector search index set up on the `embedding` field. +- **vector_search_index** (str) – The name of the vector search index to use for vector search operations. + Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB + Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/#std-label-avs-create-index). +- **full_text_search_index** (str) – The name of the search index to use for full-text search operations. + Create a full_text_search_index in the Atlas web UI and specify the init params of + MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). +- **embedding_field** (str) – The name of the field containing document embeddings. Default is "embedding". +- **content_field** (str) – The name of the field containing the document content. Default is "content". + This field allows defining which field to load into the Haystack Document object as content. + It can be particularly useful when integrating with an existing collection for retrieval. We discourage + using this parameter when working with collections created by Haystack. -Destructor method to close MongoDB connections when the instance is destroyed. +**Raises:** - +- ValueError – If the collection name contains invalid characters. -#### MongoDBAtlasDocumentStore.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. 
- - -#### MongoDBAtlasDocumentStore.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasDocumentStore" +from_dict(data: dict[str, Any]) -> MongoDBAtlasDocumentStore ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasDocumentStore – Deserialized component. -#### MongoDBAtlasDocumentStore.count\_documents +#### count_documents ```python -def count_documents() -> int +count_documents() -> int ``` Returns how many documents are present in the document store. -**Returns**: +**Returns:** -The number of documents in the document store. +- int – The number of documents in the document store. - - -#### MongoDBAtlasDocumentStore.count\_documents\_async +#### count_documents_async ```python -async def count_documents_async() -> int +count_documents_async() -> int ``` Asynchronously returns how many documents are present in the document store. -**Returns**: - -The number of documents in the document store. +**Returns:** - +- int – The number of documents in the document store. -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter +#### count_documents_by_filter ```python -def count_documents_by_filter(filters: dict[str, Any]) -> int +count_documents_by_filter(filters: dict[str, Any]) -> int ``` Applies a filter and counts the documents that matched it. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to the document list. +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -**Returns**: +**Returns:** -The number of documents that match the filter. +- int – The number of documents that match the filter. 
- - -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter\_async +#### count_documents_by_filter_async ```python -async def count_documents_by_filter_async(filters: dict[str, Any]) -> int +count_documents_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously applies a filter and counts the documents that matched it. -**Arguments**: - -- `filters`: The filters to apply to the document list. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -The number of documents that match the filter. +**Returns:** - +- int – The number of documents that match the filter. -#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter +#### count_unique_metadata_by_filter ```python -def count_unique_metadata_by_filter( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Applies a filter selecting documents and counts the unique values for each meta field of the matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. 
-#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter\_async +#### count_unique_metadata_by_filter_async ```python -async def count_unique_metadata_by_filter_async( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter_async( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Asynchronously applies a filter selecting documents and counts the unique values for each meta field of the - matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info +#### get_metadata_fields_info ```python -def get_metadata_fields_info() -> dict[str, dict] +get_metadata_fields_info() -> dict[str, dict] ``` Returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: +**Returns:** -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. 
- - -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info\_async +#### get_metadata_fields_info_async ```python -async def get_metadata_fields_info_async() -> dict[str, dict] +get_metadata_fields_info_async() -> dict[str, dict] ``` Asynchronously returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: - -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +**Returns:** - +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max +#### get_metadata_field_min_max ```python -def get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] ``` For a given metadata field, find its max and min value. -**Arguments**: +**Parameters:** -- `metadata_field`: The metadata field to get the min and max values for. +- **metadata_field** (str) – The metadata field to get the min and max values for. -**Returns**: +**Returns:** -A dictionary with 'min' and 'max' keys. +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. - - -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max\_async +#### get_metadata_field_min_max_async ```python -async def get_metadata_field_min_max_async( - metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any] ``` Asynchronously for a given metadata field, find its max and min value. -**Arguments**: - -- `metadata_field`: The metadata field to get the min and max values for. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to get the min and max values for. -A dictionary with 'min' and 'max' keys. +**Returns:** - +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. 
-#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values +#### get_metadata_field_unique_values ```python -def get_metadata_field_unique_values(metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Retrieves unique values for a field matching a search_term or all possible values if no search term is given. -**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values\_async +#### get_metadata_field_unique_values_async ```python -async def get_metadata_field_unique_values_async( - metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values_async( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Asynchronously retrieves unique values for a field matching a search_term or all possible values if no search - term is given. 
-**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.filter\_documents +#### filter_documents ```python -def filter_documents(filters: dict[str, Any] | None = None) -> list[Document] +filter_documents(filters: dict[str, Any] | None = None) -> list[Document] ``` Returns the documents that match the filters provided. @@ -700,21 +639,18 @@ Returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: +**Parameters:** -- `filters`: The filters to apply. It returns only the documents that match the filters. +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -**Returns**: +**Returns:** -A list of Documents that match the given filters. +- list\[Document\] – A list of Documents that match the given filters. 
- - -#### MongoDBAtlasDocumentStore.filter\_documents\_async +#### filter_documents_async ```python -async def filter_documents_async( - filters: dict[str, Any] | None = None) -> list[Document] +filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document] ``` Asynchronously returns the documents that match the filters provided. @@ -722,205 +658,184 @@ Asynchronously returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: - -- `filters`: The filters to apply. It returns only the documents that match the filters. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -A list of Documents that match the given filters. +**Returns:** - +- list\[Document\] – A list of Documents that match the given filters. -#### MongoDBAtlasDocumentStore.write\_documents +#### write_documents ```python -def write_documents(documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: +**Parameters:** -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -**Raises**: +**Returns:** -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +- int – The number of documents written to the document store. 
-**Returns**: +**Raises:** -The number of documents written to the document store. +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. - - -#### MongoDBAtlasDocumentStore.write\_documents\_async +#### write_documents_async ```python -async def write_documents_async( - documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents_async( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: - -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +**Parameters:** -**Raises**: +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +**Returns:** -**Returns**: +- int – The number of documents written to the document store. -The number of documents written to the document store. +**Raises:** - +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. -#### MongoDBAtlasDocumentStore.delete\_documents +#### delete_documents ```python -def delete_documents(document_ids: list[str]) -> None +delete_documents(document_ids: list[str]) -> None ``` Deletes all documents with a matching document_ids from the document store. 
-**Arguments**: - -- `document_ids`: the document ids to delete +**Parameters:** - +- **document_ids** (list\[str\]) – the document ids to delete -#### MongoDBAtlasDocumentStore.delete\_documents\_async +#### delete_documents_async ```python -async def delete_documents_async(document_ids: list[str]) -> None +delete_documents_async(document_ids: list[str]) -> None ``` Asynchronously deletes all documents with a matching document_ids from the document store. -**Arguments**: +**Parameters:** -- `document_ids`: the document ids to delete +- **document_ids** (list\[str\]) – the document ids to delete - - -#### MongoDBAtlasDocumentStore.delete\_by\_filter +#### delete_by_filter ```python -def delete_by_filter(filters: dict[str, Any]) -> int +delete_by_filter(filters: dict[str, Any]) -> int ``` Deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.delete\_by\_filter\_async +#### delete_by_filter_async ```python -async def delete_by_filter_async(filters: dict[str, Any]) -> int +delete_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously deletes all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for deletion. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -The number of documents deleted. +**Returns:** - +- int – The number of documents deleted. -#### MongoDBAtlasDocumentStore.update\_by\_filter +#### update_by_filter ```python -def update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int +update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Updates the metadata of all documents that match the provided filters. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. + For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -**Returns**: +**Returns:** -The number of documents updated. +- int – The number of documents updated. - - -#### MongoDBAtlasDocumentStore.update\_by\_filter\_async +#### update_by_filter_async ```python -async def update_by_filter_async(filters: dict[str, Any], - meta: dict[str, Any]) -> int +update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int ``` Asynchronously updates the metadata of all documents that match the provided filters. -**Arguments**: - -- `filters`: The filters to apply to select documents for updating. -For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) -- `meta`: The metadata fields to update. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating. 
+ For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering) +- **meta** (dict\[str, Any\]) – The metadata fields to update. -The number of documents updated. +**Returns:** - +- int – The number of documents updated. -#### MongoDBAtlasDocumentStore.delete\_all\_documents +#### delete_all_documents ```python -def delete_all_documents(*, recreate_collection: bool = False) -> None +delete_all_documents(*, recreate_collection: bool = False) -> None ``` Deletes all documents in the document store. -**Arguments**: - -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +**Parameters:** - +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. + Recreating the collection is faster for very large collections. -#### MongoDBAtlasDocumentStore.delete\_all\_documents\_async +#### delete_all_documents_async ```python -async def delete_all_documents_async(*, - recreate_collection: bool = False - ) -> None +delete_all_documents_async(*, recreate_collection: bool = False) -> None ``` Asynchronously deletes all documents in the document store. -**Arguments**: +**Parameters:** -- `recreate_collection`: If True, the collection will be dropped and recreated with the original -configuration and indexes. If False, all documents will be deleted while preserving the collection. -Recreating the collection is faster for very large collections. +- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original + configuration and indexes. If False, all documents will be deleted while preserving the collection. 
+ Recreating the collection is faster for very large collections. +## haystack_integrations.document_stores.mongodb_atlas.filters diff --git a/docs-website/reference_versioned_docs/version-2.26/integrations-api/mongodb_atlas.md b/docs-website/reference_versioned_docs/version-2.26/integrations-api/mongodb_atlas.md index 552ebfacce..3aa6e43140 100644 --- a/docs-website/reference_versioned_docs/version-2.26/integrations-api/mongodb_atlas.md +++ b/docs-website/reference_versioned_docs/version-2.26/integrations-api/mongodb_atlas.md @@ -5,11 +5,8 @@ description: "MongoDB Atlas integration for Haystack" slug: "/integrations-mongodb-atlas" --- - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.embedding\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.embedding_retriever ### MongoDBAtlasEmbeddingRetriever @@ -20,6 +17,7 @@ during the creation of the index (i.e. cosine, dot product, or euclidean). See M information. Usage example: + ```python import numpy as np from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -39,125 +37,113 @@ The example above retrieves the 10 most similar documents to a random query embe MongoDBAtlasDocumentStore. Note that dimensions of the query_embedding must match the dimensions of the embeddings stored in the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasEmbeddingRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` Create the MongoDBAtlasDocumentStore component. -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. 
-- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `vector_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `vector_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. +**Raises:** - +- ValueError – If `document_store` is not an instance of `MongoDBAtlasDocumentStore`. -#### MongoDBAtlasEmbeddingRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. - - -#### MongoDBAtlasEmbeddingRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasEmbeddingRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasEmbeddingRetriever ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasEmbeddingRetriever – Deserialized component. 
-#### MongoDBAtlasEmbeddingRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -#### MongoDBAtlasEmbeddingRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query_embedding: list[float], - filters: dict[str, Any] | None = None, - top_k: int | None = None) -> dict[str, list[Document]] +run_async( + query_embedding: list[float], + filters: dict[str, Any] | None = None, + top_k: int | None = None, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore, based on the provided embedding - similarity. -**Arguments**: +**Parameters:** -- `query_embedding`: Embedding of the query. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. +- **query_embedding** (list\[float\]) – Embedding of the query. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int | None) – Maximum number of Documents to return. Overrides the value specified at initialization. 
-**Returns**: +**Returns:** -A dictionary with the following keys: +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query_embedding` - - -## Module haystack\_integrations.components.retrievers.mongodb\_atlas.full\_text\_retriever - - +## haystack_integrations.components.retrievers.mongodb_atlas.full_text_retriever ### MongoDBAtlasFullTextRetriever @@ -167,6 +153,7 @@ The full-text search is dependent on the full_text_search_index used in the Mong See MongoDBAtlasDocumentStore for more information. Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasFullTextRetriever @@ -184,155 +171,144 @@ print(results["documents"]) The example above retrieves the 10 most similar documents to the query "Lorem ipsum" from the MongoDBAtlasDocumentStore. - - -#### MongoDBAtlasFullTextRetriever.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - document_store: MongoDBAtlasDocumentStore, - filters: dict[str, Any] | None = None, - top_k: int = 10, - filter_policy: str | FilterPolicy = FilterPolicy.REPLACE) +__init__( + *, + document_store: MongoDBAtlasDocumentStore, + filters: dict[str, Any] | None = None, + top_k: int = 10, + filter_policy: str | FilterPolicy = FilterPolicy.REPLACE +) ``` -**Arguments**: - -- `document_store`: An instance of MongoDBAtlasDocumentStore. -- `filters`: Filters applied to the retrieved Documents. Make sure that the fields used in the filters are -included in the configuration of the `full_text_search_index`. The configuration must be done manually -in the Web UI of MongoDB Atlas. -- `top_k`: Maximum number of Documents to return. -- `filter_policy`: Policy to determine how filters are applied. +**Parameters:** -**Raises**: +- **document_store** (MongoDBAtlasDocumentStore) – An instance of MongoDBAtlasDocumentStore. 
+- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. Make sure that the fields used in the filters are + included in the configuration of the `full_text_search_index`. The configuration must be done manually + in the Web UI of MongoDB Atlas. +- **top_k** (int) – Maximum number of Documents to return. +- **filter_policy** (str | FilterPolicy) – Policy to determine how filters are applied. -- `ValueError`: If `document_store` is not an instance of MongoDBAtlasDocumentStore. +**Raises:** - +- ValueError – If `document_store` is not an instance of MongoDBAtlasDocumentStore. -#### MongoDBAtlasFullTextRetriever.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: - -Dictionary with serialized data. +**Returns:** - +- dict\[str, Any\] – Dictionary with serialized data. -#### MongoDBAtlasFullTextRetriever.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasFullTextRetriever" +from_dict(data: dict[str, Any]) -> MongoDBAtlasFullTextRetriever ``` Deserializes the component from a dictionary. -**Arguments**: +**Parameters:** -- `data`: Dictionary to deserialize from. +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -**Returns**: +**Returns:** -Deserialized component. +- MongoDBAtlasFullTextRetriever – Deserialized component. 
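Both retrievers accept a `filter_policy` in `__init__`. This policy decides how filters given at initialization combine with filters passed to `run`. The sketch below is a simplified model of the two Haystack policies, REPLACE and MERGE. The `Policy` enum and `resolve_filters` function are hypothetical stand-ins, not Haystack's `FilterPolicy` implementation. In particular, Haystack's real MERGE logic handles nested logical filters more carefully than a plain AND wrapper does.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class Policy(Enum):
    """Hypothetical stand-in for Haystack's FilterPolicy values."""

    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: Policy,
    init_filters: dict[str, Any] | None,
    runtime_filters: dict[str, Any] | None,
) -> dict[str, Any] | None:
    """Simplified sketch of how init-time and runtime filters might combine."""
    if policy is Policy.REPLACE:
        # Runtime filters, when given, win outright; otherwise fall back to init filters.
        return runtime_filters if runtime_filters else init_filters
    # MERGE (simplified): documents must satisfy both filter sets.
    if init_filters and runtime_filters:
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    return runtime_filters or init_filters


init = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime = {"field": "meta.year", "operator": ">=", "value": 2020}
merged = resolve_filters(Policy.MERGE, init, runtime)
```

With REPLACE, the retriever sees only `runtime`. With MERGE, it sees both conditions joined by `AND`.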
- - -#### MongoDBAtlasFullTextRetriever.run +#### run ```python -@component.output_types(documents=list[Document]) -def run(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. 
The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` cannot be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization. See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization.
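The parameters above correspond to options of the Atlas Search `text` operator (`query`, `path`, `fuzzy`, `matchCriteria`, `score`, `synonyms`). The sketch below builds such a `$search` stage and enforces the two documented constraints: `fuzzy` and `synonyms` are mutually exclusive, and `synonyms` cannot be empty. The helper name `build_text_search_stage` and the default `path="content"` are assumptions made for illustration. This is not the retriever's own implementation.

```python
from __future__ import annotations

from typing import Any, Literal


def build_text_search_stage(
    query: str | list[str],
    index_name: str,
    path: str = "content",  # assumed searchable field
    fuzzy: dict[str, int] | None = None,
    match_criteria: Literal["any", "all"] | None = None,
    score: dict[str, dict] | None = None,
    synonyms: str | None = None,
) -> dict[str, Any]:
    """Build an Atlas Search `$search` stage using the `text` operator (sketch)."""
    if fuzzy is not None and synonyms is not None:
        raise ValueError("`fuzzy` and `synonyms` cannot be used together")
    if synonyms == "":
        raise ValueError("`synonyms` cannot be an empty string")
    text: dict[str, Any] = {"query": query, "path": path}
    # Include only the optional settings that were actually provided.
    if fuzzy is not None:
        text["fuzzy"] = fuzzy
    if match_criteria is not None:
        text["matchCriteria"] = match_criteria
    if score is not None:
        text["score"] = score
    if synonyms is not None:
        text["synonyms"] = synonyms
    return {"$search": {"index": index_name, "text": text}}


stage = build_text_search_stage(
    ["Lorem", "ipsum"],
    index_name="full_text_index",
    fuzzy={"maxEdits": 1},
    match_criteria="all",
    score={"boost": {"value": 2}},
)
```

Checking the mutual-exclusion rule on the client side surfaces the error before any request reaches the cluster.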
+ +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -#### MongoDBAtlasFullTextRetriever.run\_async +#### run_async ```python -@component.output_types(documents=list[Document]) -async def run_async(query: str | list[str], - fuzzy: dict[str, int] | None = None, - match_criteria: Literal["any", "all"] | None = None, - score: dict[str, dict] | None = None, - synonyms: str | None = None, - filters: dict[str, Any] | None = None, - top_k: int = 10) -> dict[str, list[Document]] +run_async( + query: str | list[str], + fuzzy: dict[str, int] | None = None, + match_criteria: Literal["any", "all"] | None = None, + score: dict[str, dict] | None = None, + synonyms: str | None = None, + filters: dict[str, Any] | None = None, + top_k: int = 10, +) -> dict[str, list[Document]] ``` Asynchronously retrieve documents from the MongoDBAtlasDocumentStore by full-text search. -**Arguments**: - -- `query`: The query string or a list of query strings to search for. -If the query contains multiple terms, Atlas Search evaluates each term separately for matches. -- `fuzzy`: Enables finding strings similar to the search term(s). -Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, -and `maxExpansions`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `match_criteria`: Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. -For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). -- `score`: Specifies the scoring method for matching results. Supported options include `boost`, `constant`, -and `function`. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/`fields`). 
-- `synonyms`: The name of the synonym mapping definition in the index. This value cannot be an empty string. -Note, `synonyms` can not be used with `fuzzy`. -- `filters`: Filters applied to the retrieved Documents. The way runtime filters are applied depends on -the `filter_policy` chosen at retriever initialization. See init method docstring for more -details. -- `top_k`: Maximum number of Documents to return. Overrides the value specified at initialization. - -**Returns**: - -A dictionary with the following keys: +**Parameters:** + +- **query** (str | list\[str\]) – The query string or a list of query strings to search for. + If the query contains multiple terms, Atlas Search evaluates each term separately for matches. +- **fuzzy** (dict\[str, int\] | None) – Enables finding strings similar to the search term(s). + Note, `fuzzy` cannot be used with `synonyms`. Configurable options include `maxEdits`, `prefixLength`, + and `maxExpansions`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **match_criteria** (Literal['any', 'all'] | None) – Defines how terms in the query are matched. Supported options are `"any"` and `"all"`. + For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **score** (dict\[str, dict\] | None) – Specifies the scoring method for matching results. Supported options include `boost`, `constant`, + and `function`. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/text/#fields). +- **synonyms** (str | None) – The name of the synonym mapping definition in the index. This value cannot be an empty string. + Note, `synonyms` cannot be used with `fuzzy`. +- **filters** (dict\[str, Any\] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on + the `filter_policy` chosen at retriever initialization.
See init method docstring for more + details. +- **top_k** (int) – Maximum number of Documents to return. Overrides the value specified at initialization. + +**Returns:** + +- dict\[str, list\[Document\]\] – A dictionary with the following keys: - `documents`: List of Documents most similar to the given `query` - - -## Module haystack\_integrations.document\_stores.mongodb\_atlas.document\_store - - +## haystack_integrations.document_stores.mongodb_atlas.document_store ### MongoDBAtlasDocumentStore @@ -360,6 +336,7 @@ For more details on MongoDB Atlas, see the official MongoDB Atlas [documentation](https://www.mongodb.com/docs/atlas/getting-started/). Usage example: + ```python from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore @@ -370,329 +347,291 @@ store = MongoDBAtlasDocumentStore(database_name="your_existing_db", print(store.count_documents()) ``` - - -#### MongoDBAtlasDocumentStore.\_\_init\_\_ +#### __init__ ```python -def __init__(*, - mongo_connection_string: Secret = Secret.from_env_var( - "MONGO_CONNECTION_STRING"), - database_name: str, - collection_name: str, - vector_search_index: str, - full_text_search_index: str, - embedding_field: str = "embedding", - content_field: str = "content") +__init__( + *, + mongo_connection_string: Secret = Secret.from_env_var( + "MONGO_CONNECTION_STRING" + ), + database_name: str, + collection_name: str, + vector_search_index: str, + full_text_search_index: str, + embedding_field: str = "embedding", + content_field: str = "content" +) ``` Creates a new MongoDBAtlasDocumentStore instance. -**Arguments**: - -- `mongo_connection_string`: MongoDB Atlas connection string in the format: -`"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. -This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. -This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
-- `database_name`: Name of the database to use. -- `collection_name`: Name of the collection to use. To use this document store for embedding retrieval, -this collection needs to have a vector search index set up on the `embedding` field. -- `vector_search_index`: The name of the vector search index to use for vector search operations. -Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB -Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/`std`-label-avs-create-index). -- `full_text_search_index`: The name of the search index to use for full-text search operations. -Create a full_text_search_index in the Atlas web UI and specify the init params of -MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas -[documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). -- `embedding_field`: The name of the field containing document embeddings. Default is "embedding". -- `content_field`: The name of the field containing the document content. Default is "content". -This field allows defining which field to load into the Haystack Document object as content. -It can be particularly useful when integrating with an existing collection for retrieval. We discourage -using this parameter when working with collections created by Haystack. - -**Raises**: - -- `ValueError`: If the collection name contains invalid characters. - - +**Parameters:** -#### MongoDBAtlasDocumentStore.\_\_del\_\_ - -```python -def __del__() -> None -``` +- **mongo_connection_string** (Secret) – MongoDB Atlas connection string in the format: + `"mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"`. + This can be obtained on the MongoDB Atlas Dashboard by clicking on the `CONNECT` button. + This value will be read automatically from the env var "MONGO_CONNECTION_STRING". 
+- **database_name** (str) – Name of the database to use. +- **collection_name** (str) – Name of the collection to use. To use this document store for embedding retrieval, + this collection needs to have a vector search index set up on the `embedding` field. +- **vector_search_index** (str) – The name of the vector search index to use for vector search operations. + Create a vector_search_index in the Atlas web UI and specify the init params of MongoDBAtlasDocumentStore. For more details refer to MongoDB + Atlas [documentation](https://www.mongodb.com/docs/atlas/atlas-vector-search/create-index/#std-label-avs-create-index). +- **full_text_search_index** (str) – The name of the search index to use for full-text search operations. + Create a full_text_search_index in the Atlas web UI and specify the init params of + MongoDBAtlasDocumentStore. For more details refer to MongoDB Atlas + [documentation](https://www.mongodb.com/docs/atlas/atlas-search/create-index/). +- **embedding_field** (str) – The name of the field containing document embeddings. Default is "embedding". +- **content_field** (str) – The name of the field containing the document content. Default is "content". + This field allows defining which field to load into the Haystack Document object as content. + It can be particularly useful when integrating with an existing collection for retrieval. We discourage + using this parameter when working with collections created by Haystack. -Destructor method to close MongoDB connections when the instance is destroyed. +**Raises:** - +- ValueError – If the collection name contains invalid characters. -#### MongoDBAtlasDocumentStore.to\_dict +#### to_dict ```python -def to_dict() -> dict[str, Any] +to_dict() -> dict[str, Any] ``` Serializes the component to a dictionary. -**Returns**: +**Returns:** -Dictionary with serialized data. +- dict\[str, Any\] – Dictionary with serialized data. 
- - -#### MongoDBAtlasDocumentStore.from\_dict +#### from_dict ```python -@classmethod -def from_dict(cls, data: dict[str, Any]) -> "MongoDBAtlasDocumentStore" +from_dict(data: dict[str, Any]) -> MongoDBAtlasDocumentStore ``` Deserializes the component from a dictionary. -**Arguments**: - -- `data`: Dictionary to deserialize from. +**Parameters:** -**Returns**: +- **data** (dict\[str, Any\]) – Dictionary to deserialize from. -Deserialized component. +**Returns:** - +- MongoDBAtlasDocumentStore – Deserialized component. -#### MongoDBAtlasDocumentStore.count\_documents +#### count_documents ```python -def count_documents() -> int +count_documents() -> int ``` Returns how many documents are present in the document store. -**Returns**: +**Returns:** -The number of documents in the document store. +- int – The number of documents in the document store. - - -#### MongoDBAtlasDocumentStore.count\_documents\_async +#### count_documents_async ```python -async def count_documents_async() -> int +count_documents_async() -> int ``` Asynchronously returns how many documents are present in the document store. -**Returns**: - -The number of documents in the document store. +**Returns:** - +- int – The number of documents in the document store. -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter +#### count_documents_by_filter ```python -def count_documents_by_filter(filters: dict[str, Any]) -> int +count_documents_by_filter(filters: dict[str, Any]) -> int ``` Applies a filter and counts the documents that matched it. -**Arguments**: +**Parameters:** -- `filters`: The filters to apply to the document list. +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -**Returns**: +**Returns:** -The number of documents that match the filter. +- int – The number of documents that match the filter. 
- - -#### MongoDBAtlasDocumentStore.count\_documents\_by\_filter\_async +#### count_documents_by_filter_async ```python -async def count_documents_by_filter_async(filters: dict[str, Any]) -> int +count_documents_by_filter_async(filters: dict[str, Any]) -> int ``` Asynchronously applies a filter and counts the documents that matched it. -**Arguments**: - -- `filters`: The filters to apply to the document list. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. -The number of documents that match the filter. +**Returns:** - +- int – The number of documents that match the filter. -#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter +#### count_unique_metadata_by_filter ```python -def count_unique_metadata_by_filter( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Applies a filter selecting documents and counts the unique values for each meta field of the matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. 
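The return value of `count_unique_metadata_by_filter` maps each requested field to the number of distinct values it takes among the documents that matched the filter. The in-memory sketch below shows only that semantics. It assumes the documents have already been filtered, that metadata lives under a `meta` key, that field names are given without a `meta.` prefix, and that values are hashable. A server-side version would more likely use an aggregation such as `$group` with `$addToSet`, but that is an assumption, not something this page confirms.

```python
from __future__ import annotations

from typing import Any


def count_unique_metadata(
    matched_docs: list[dict[str, Any]], metadata_fields: list[str]
) -> dict[str, int]:
    """Count distinct values per metadata field over already-filtered documents (sketch)."""
    counts: dict[str, int] = {}
    for field in metadata_fields:
        # Assumption: documents that lack the field are skipped rather than counted as None.
        values = {doc["meta"][field] for doc in matched_docs if field in doc.get("meta", {})}
        counts[field] = len(values)
    return counts


docs = [
    {"meta": {"lang": "en", "year": 2021}},
    {"meta": {"lang": "de", "year": 2021}},
    {"meta": {"lang": "en"}},
]
result = count_unique_metadata(docs, ["lang", "year"])
```

In this example, `lang` takes two distinct values (`en`, `de`) and `year` takes one (`2021`).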
-#### MongoDBAtlasDocumentStore.count\_unique\_metadata\_by\_filter\_async +#### count_unique_metadata_by_filter_async ```python -async def count_unique_metadata_by_filter_async( - filters: dict[str, Any], metadata_fields: list[str]) -> dict[str, int] +count_unique_metadata_by_filter_async( + filters: dict[str, Any], metadata_fields: list[str] +) -> dict[str, int] ``` Asynchronously applies a filter selecting documents and counts the unique values for each meta field of the - matched documents. -**Arguments**: - -- `filters`: The filters to apply to the document list. -- `metadata_fields`: The metadata fields to count unique values for. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\]) – The filters to apply to the document list. +- **metadata_fields** (list\[str\]) – The metadata fields to count unique values for. -A dictionary where the keys are the metadata field names and the values are the count of unique -values. +**Returns:** - +- dict\[str, int\] – A dictionary where the keys are the metadata field names and the values are the count of unique + values. -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info +#### get_metadata_fields_info ```python -def get_metadata_fields_info() -> dict[str, dict] +get_metadata_fields_info() -> dict[str, dict] ``` Returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: +**Returns:** -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key.
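MongoDB does not enforce a schema, so field types can only be inferred from the data. The in-memory sketch below mimics the sampling idea: it looks only at the newest `sample_size` documents and records a type for each metadata key. The `TYPE_NAMES` mapping and its type labels are hypothetical and may differ from what the document store actually reports. The sketch also assumes that `docs` is ordered from oldest to newest.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping from Python value types to reported type names.
TYPE_NAMES = {bool: "boolean", int: "long", float: "double", str: "keyword"}


def infer_metadata_fields(docs: list[dict[str, Any]], sample_size: int = 50) -> dict[str, dict]:
    """Infer metadata field types from the newest `sample_size` documents (sketch)."""
    fields: dict[str, dict] = {}
    # Assumption: `docs` is ordered oldest to newest, so the tail is the latest sample.
    for doc in reversed(docs[-sample_size:]):
        for name, value in doc.get("meta", {}).items():
            # Exact type lookup keeps bool distinct from int; unknown types become "object".
            # The newest observation of each field wins.
            fields.setdefault(name, {"type": TYPE_NAMES.get(type(value), "object")})
    return fields


docs = [{"meta": {"year": "old"}}] + [{"meta": {"year": 2020, "lang": "en", "draft": False}}] * 50
info = infer_metadata_fields(docs)
```

The first document falls outside the 50-document sample here, so its string-valued `year` does not affect the result. This illustrates why sampling can miss fields or types that appear only in older documents.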
- - -#### MongoDBAtlasDocumentStore.get\_metadata\_fields\_info\_async +#### get_metadata_fields_info_async ```python -async def get_metadata_fields_info_async() -> dict[str, dict] +get_metadata_fields_info_async() -> dict[str, dict] ``` Asynchronously returns the metadata fields and their corresponding types. Since MongoDB is schemaless, this method samples the latest 50 documents to infer the fields and their types. -**Returns**: - -A dictionary where the keys are the metadata field names and the values are dictionary with 'type'. +**Returns:** - +- dict\[str, dict\] – A dictionary where the keys are the metadata field names and the values are dictionaries with a 'type' key. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max +#### get_metadata_field_min_max ```python -def get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max(metadata_field: str) -> dict[str, Any] ``` For a given metadata field, find its max and min value. -**Arguments**: +**Parameters:** -- `metadata_field`: The metadata field to get the min and max values for. +- **metadata_field** (str) – The metadata field to get the min and max values for. -**Returns**: +**Returns:** -A dictionary with 'min' and 'max' keys. +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys. - - -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_min\_max\_async +#### get_metadata_field_min_max_async ```python -async def get_metadata_field_min_max_async( - metadata_field: str) -> dict[str, Any] +get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any] ``` Asynchronously for a given metadata field, find its max and min value. -**Arguments**: - -- `metadata_field`: The metadata field to get the min and max values for. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to get the min and max values for. -A dictionary with 'min' and 'max' keys. +**Returns:** - +- dict\[str, Any\] – A dictionary with 'min' and 'max' keys.
-#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values +#### get_metadata_field_unique_values ```python -def get_metadata_field_unique_values(metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Retrieves unique values for a field matching a search_term or all possible values if no search term is given. -**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.get\_metadata\_field\_unique\_values\_async +#### get_metadata_field_unique_values_async ```python -async def get_metadata_field_unique_values_async( - metadata_field: str, - search_term: str | None = None, - from_: int = 0, - size: int = 10) -> tuple[list[str], int] +get_metadata_field_unique_values_async( + metadata_field: str, + search_term: str | None = None, + from_: int = 0, + size: int = 10, +) -> tuple[list[str], int] ``` Asynchronously retrieves unique values for a field matching a search_term or all possible values if no search - term is given. 
-**Arguments**: - -- `metadata_field`: The metadata field to retrieve unique values for. -- `search_term`: The search term to filter values. Matches as a case-insensitive substring. -- `from_`: The starting index for pagination. -- `size`: The number of values to return. +**Parameters:** -**Returns**: +- **metadata_field** (str) – The metadata field to retrieve unique values for. +- **search_term** (str | None) – The search term to filter values. Matches as a case-insensitive substring. +- **from\_** (int) – The starting index for pagination. +- **size** (int) – The number of values to return. -A tuple containing a list of unique values and the total count of unique values matching the -search term. +**Returns:** - +- tuple\[list\[str\], int\] – A tuple containing a list of unique values and the total count of unique values matching the + search term. -#### MongoDBAtlasDocumentStore.filter\_documents +#### filter_documents ```python -def filter_documents(filters: dict[str, Any] | None = None) -> list[Document] +filter_documents(filters: dict[str, Any] | None = None) -> list[Document] ``` Returns the documents that match the filters provided. @@ -700,21 +639,18 @@ Returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: +**Parameters:** -- `filters`: The filters to apply. It returns only the documents that match the filters. +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -**Returns**: +**Returns:** -A list of Documents that match the given filters. +- list\[Document\] – A list of Documents that match the given filters. 
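`filter_documents` and the other filter-taking methods accept Haystack-style filter dictionaries. These are comparison nodes (`field`, `operator`, `value`) that can be nested under logical nodes (`AND`, `OR`, `NOT`). The in-memory evaluator below supports a subset of the comparison operators and illustrates which documents a filter selects. The function `matches` is hypothetical. It does not reproduce how the document store translates filters into MongoDB queries, and it simplifies the handling of missing fields.

```python
from __future__ import annotations

import operator
from typing import Any

COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Evaluate a Haystack-style filter against an in-memory document (subset only)."""
    op = flt["operator"]
    if op in ("AND", "OR", "NOT"):
        results = [matches(doc, cond) for cond in flt["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        return not all(results)  # NOT negates the conjunction of its conditions
    # Resolve dotted paths such as "meta.year" against nested dicts.
    value: Any = doc
    for part in flt["field"].split("."):
        if not isinstance(value, dict) or part not in value:
            return False  # simplification: a missing field never matches
        value = value[part]
    return COMPARISONS[op](value, flt["value"])


docs = [
    {"content": "a", "meta": {"lang": "en", "year": 2021}},
    {"content": "b", "meta": {"lang": "de", "year": 2019}},
    {"content": "c", "meta": {"lang": "en", "year": 2018}},
]
recent_english = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.lang", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
selected = [d["content"] for d in docs if matches(d, recent_english)]
```

Only document `a` satisfies both conditions. Counting the same selection is what a method such as `count_documents_by_filter` reports.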
- - -#### MongoDBAtlasDocumentStore.filter\_documents\_async +#### filter_documents_async ```python -async def filter_documents_async( - filters: dict[str, Any] | None = None) -> list[Document] +filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document] ``` Asynchronously returns the documents that match the filters provided. @@ -722,205 +658,184 @@ Asynchronously returns the documents that match the filters provided. For a detailed specification of the filters, refer to the Haystack [documentation](https://docs.haystack.deepset.ai/docs/metadata-filtering). -**Arguments**: - -- `filters`: The filters to apply. It returns only the documents that match the filters. +**Parameters:** -**Returns**: +- **filters** (dict\[str, Any\] | None) – The filters to apply. It returns only the documents that match the filters. -A list of Documents that match the given filters. +**Returns:** - +- list\[Document\] – A list of Documents that match the given filters. -#### MongoDBAtlasDocumentStore.write\_documents +#### write_documents ```python -def write_documents(documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: +**Parameters:** -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -**Raises**: +**Returns:** -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +- int – The number of documents written to the document store. 
-**Returns**: +**Raises:** -The number of documents written to the document store. +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. - - -#### MongoDBAtlasDocumentStore.write\_documents\_async +#### write_documents_async ```python -async def write_documents_async( - documents: list[Document], - policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int +write_documents_async( + documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE +) -> int ``` Writes documents into the MongoDB Atlas collection. -**Arguments**: - -- `documents`: A list of Documents to write to the document store. -- `policy`: The duplicate policy to use when writing documents. +**Parameters:** -**Raises**: +- **documents** (list\[Document\]) – A list of Documents to write to the document store. +- **policy** (DuplicatePolicy) – The duplicate policy to use when writing documents. -- `DuplicateDocumentError`: If a document with the same ID already exists in the document store -and the policy is set to DuplicatePolicy.FAIL (or not specified). -- `ValueError`: If the documents are not of type Document. +**Returns:** -**Returns**: +- int – The number of documents written to the document store. -The number of documents written to the document store. +**Raises:** - +- DuplicateDocumentError – If a document with the same ID already exists in the document store + and the policy is set to DuplicatePolicy.FAIL (or not specified). +- ValueError – If the documents are not of type Document. -#### MongoDBAtlasDocumentStore.delete\_documents +#### delete_documents ```python -def delete_documents(document_ids: list[str]) -> None +delete_documents(document_ids: list[str]) -> None ``` Deletes all documents with a matching document_ids from the document store. 
-**Arguments**:
-
-- `document_ids`: the document ids to delete
+**Parameters:**

-
+- **document_ids** (list\[str\]) – the document ids to delete

-#### MongoDBAtlasDocumentStore.delete\_documents\_async
+#### delete_documents_async

 ```python
-async def delete_documents_async(document_ids: list[str]) -> None
+delete_documents_async(document_ids: list[str]) -> None
 ```

 Asynchronously deletes all documents with a matching document_ids from the document store.

-**Arguments**:
+**Parameters:**

-- `document_ids`: the document ids to delete
+- **document_ids** (list\[str\]) – the document ids to delete

-
-
-#### MongoDBAtlasDocumentStore.delete\_by\_filter
+#### delete_by_filter

 ```python
-def delete_by_filter(filters: dict[str, Any]) -> int
+delete_by_filter(filters: dict[str, Any]) -> int
 ```

 Deletes all documents that match the provided filters.

-**Arguments**:
-
-- `filters`: The filters to apply to select documents for deletion.
-For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
+**Parameters:**

-**Returns**:
+- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion.
+  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

-The number of documents deleted.
+**Returns:**

-
+- int – The number of documents deleted.

-#### MongoDBAtlasDocumentStore.delete\_by\_filter\_async
+#### delete_by_filter_async

 ```python
-async def delete_by_filter_async(filters: dict[str, Any]) -> int
+delete_by_filter_async(filters: dict[str, Any]) -> int
 ```

 Asynchronously deletes all documents that match the provided filters.

-**Arguments**:
-
-- `filters`: The filters to apply to select documents for deletion.
-For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
+**Parameters:**

-**Returns**:
+- **filters** (dict\[str, Any\]) – The filters to apply to select documents for deletion.
+  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)

-The number of documents deleted.
+**Returns:**

-
+- int – The number of documents deleted.

-#### MongoDBAtlasDocumentStore.update\_by\_filter
+#### update_by_filter

 ```python
-def update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
+update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
 ```

 Updates the metadata of all documents that match the provided filters.

-**Arguments**:
+**Parameters:**

-- `filters`: The filters to apply to select documents for updating.
-For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
-- `meta`: The metadata fields to update.
+- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating.
+  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
+- **meta** (dict\[str, Any\]) – The metadata fields to update.

-**Returns**:
+**Returns:**

-The number of documents updated.
+- int – The number of documents updated.

-
-
-#### MongoDBAtlasDocumentStore.update\_by\_filter\_async
+#### update_by_filter_async

 ```python
-async def update_by_filter_async(filters: dict[str, Any],
-                                 meta: dict[str, Any]) -> int
+update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int
 ```

 Asynchronously updates the metadata of all documents that match the provided filters.

-**Arguments**:
-
-- `filters`: The filters to apply to select documents for updating.
-For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
-- `meta`: The metadata fields to update.
+**Parameters:**

-**Returns**:
+- **filters** (dict\[str, Any\]) – The filters to apply to select documents for updating.
+  For filter syntax, see [Haystack metadata filtering](https://docs.haystack.deepset.ai/docs/metadata-filtering)
+- **meta** (dict\[str, Any\]) – The metadata fields to update.

-The number of documents updated.
+**Returns:**

-
+- int – The number of documents updated.

-#### MongoDBAtlasDocumentStore.delete\_all\_documents
+#### delete_all_documents

 ```python
-def delete_all_documents(*, recreate_collection: bool = False) -> None
+delete_all_documents(*, recreate_collection: bool = False) -> None
 ```

 Deletes all documents in the document store.

-**Arguments**:
-
-- `recreate_collection`: If True, the collection will be dropped and recreated with the original
-configuration and indexes. If False, all documents will be deleted while preserving the collection.
-Recreating the collection is faster for very large collections.
+**Parameters:**

-
+- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original
+  configuration and indexes. If False, all documents will be deleted while preserving the collection.
+  Recreating the collection is faster for very large collections.

-#### MongoDBAtlasDocumentStore.delete\_all\_documents\_async
+#### delete_all_documents_async

 ```python
-async def delete_all_documents_async(*,
-                                     recreate_collection: bool = False
-                                     ) -> None
+delete_all_documents_async(*, recreate_collection: bool = False) -> None
 ```

 Asynchronously deletes all documents in the document store.

-**Arguments**:
+**Parameters:**

-- `recreate_collection`: If True, the collection will be dropped and recreated with the original
-configuration and indexes. If False, all documents will be deleted while preserving the collection.
-Recreating the collection is faster for very large collections.
+- **recreate_collection** (bool) – If True, the collection will be dropped and recreated with the original
+  configuration and indexes. If False, all documents will be deleted while preserving the collection.
+  Recreating the collection is faster for very large collections.

+## haystack_integrations.document_stores.mongodb_atlas.filters