| title | Azure AI Search |
|---|---|
| id | integrations-azure_ai_search |
| description | Azure AI Search integration for Haystack |
| slug | /integrations-azure_ai_search |
Retrieves documents from the AzureAISearchDocumentStore using a vector similarity metric.
Must be connected to the AzureAISearchDocumentStore to run.
__init__(
*,
document_store: AzureAISearchDocumentStore,
filters: dict[str, Any] | None = None,
top_k: int = 10,
filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
**kwargs: Any
) -> None
Create the AzureAISearchEmbeddingRetriever component.
Parameters:
- document_store (AzureAISearchDocumentStore) – An instance of AzureAISearchDocumentStore to use with the Retriever.
- filters (dict[str, Any] | None) – Filters applied when fetching documents from the Document Store.
- top_k (int) – Maximum number of documents to return.
- filter_policy (str | FilterPolicy) – Policy to determine how filters are applied.
- kwargs (Any) – Additional keyword arguments to pass to the Azure AI Search endpoint. Some of the supported parameters:
  - query_type: A string indicating the type of query to perform. Possible values are 'simple', 'full', and 'semantic'.
  - semantic_configuration_name: The name of the semantic configuration to use when processing semantic queries.

  For more information on parameters, see the official Azure AI Search documentation.
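A minimal sketch of constructing the retriever with the parameters described above. It assumes the Azure AI Search integration package is installed; the import path is taken from the integration's usual package layout and should be checked against the installed version. The name "my-semantic-config" is hypothetical and would have to match a semantic configuration defined on the index.

```python
from typing import Any

# Options forwarded to the retriever. "my-semantic-config" is a hypothetical
# name used for illustration only.
RETRIEVER_OPTIONS: dict[str, Any] = {
    "top_k": 5,
    "query_type": "semantic",
    "semantic_configuration_name": "my-semantic-config",
}


def build_retriever(document_store: Any) -> Any:
    """Create an AzureAISearchEmbeddingRetriever bound to an existing document store."""
    # Imported lazily so this module loads even when the integration is absent.
    # The import path is an assumption based on the integration's package layout.
    from haystack_integrations.components.retrievers.azure_ai_search import (
        AzureAISearchEmbeddingRetriever,
    )

    return AzureAISearchEmbeddingRetriever(
        document_store=document_store, **RETRIEVER_OPTIONS
    )
```

Calling `build_retriever(store)` requires an AzureAISearchDocumentStore instance, since the retriever must be connected to the store to run.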
to_dict() -> dict[str, Any]
Serializes the component to a dictionary.
Returns:
- dict[str, Any] – Dictionary with serialized data.
from_dict(data: dict[str, Any]) -> AzureAISearchEmbeddingRetriever
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.

Returns:
- AzureAISearchEmbeddingRetriever – Deserialized component.
run(
query_embedding: list[float],
filters: dict[str, Any] | None = None,
top_k: int | None = None,
) -> dict[str, list[Document]]
Retrieve documents from the AzureAISearchDocumentStore.
Parameters:
- query_embedding (list[float]) – A list of floats representing the query embedding.
- filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See the __init__ method docstring for more details.
- top_k (int | None) – The maximum number of documents to retrieve.

Returns:
- dict[str, list[Document]] – Dictionary with the following keys:
  - documents: A list of documents retrieved from the AzureAISearchDocumentStore.
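How runtime filters interact with init-time filters can be sketched as follows. This is a simplified illustration, not the library's implementation: `resolve_filters` is a hypothetical helper, and the "replace" and "merge" semantics shown (runtime filters override the init-time ones, or are AND-combined with them) are assumed from Haystack's general FilterPolicy behavior. The `meta.` field prefix follows the general Haystack filter convention.

```python
from typing import Any, Optional

Filters = dict[str, Any]


def resolve_filters(
    policy: str,
    init_filters: Optional[Filters],
    runtime_filters: Optional[Filters],
) -> Optional[Filters]:
    """Hypothetical helper showing how a filter policy could combine filters."""
    if policy == "replace":
        # Runtime filters win when given; otherwise fall back to init-time filters.
        return runtime_filters if runtime_filters else init_filters
    if policy == "merge":
        if init_filters and runtime_filters:
            # Both present: documents must satisfy both sets of conditions.
            return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
        return runtime_filters or init_filters
    raise ValueError(f"Unknown filter policy: {policy!r}")


init = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime = {"field": "meta.year", "operator": ">=", "value": 2023}

replaced = resolve_filters("replace", init, runtime)
merged = resolve_filters("merge", init, runtime)
```

Under these assumptions, `replaced` contains only the runtime condition, while `merged` wraps both conditions in a single AND filter.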
Document store using Azure AI Search as the backend.
__init__(
*,
api_key: Secret = Secret.from_env_var(
"AZURE_AI_SEARCH_API_KEY", strict=False
),
azure_endpoint: Secret = Secret.from_env_var(
"AZURE_AI_SEARCH_ENDPOINT", strict=True
),
index_name: str = "default",
embedding_dimension: int = 768,
metadata_fields: dict[str, SearchField | type] | None = None,
vector_search_configuration: VectorSearch | None = None,
include_search_metadata: bool = False,
azure_token_credential: TokenCredential | None = None,
**index_creation_kwargs: Any
) -> None
Creates a new instance of AzureAISearchDocumentStore.
Parameters:
- azure_endpoint (Secret) – The URL endpoint of an Azure AI Search service.
- api_key (Secret) – The API key to use for authentication.
- index_name (str) – Name of the index in Azure AI Search. If it doesn't exist, it will be created.
- embedding_dimension (int) – Dimension of the embeddings.
- metadata_fields (dict[str, SearchField | type] | None) – A dictionary mapping metadata field names to their corresponding field definitions. Each field can be defined either as:
  - A SearchField object to specify detailed field configuration like type, searchability, and filterability
  - A Python type (str, bool, int, float, or datetime) to create a simple filterable field

  These fields are automatically added when creating the search index. Example:

      metadata_fields={
          "Title": SearchField(
              name="Title",
              type="Edm.String",
              searchable=True,
              filterable=True
          ),
          "Pages": int
      }

- vector_search_configuration (VectorSearch | None) – Configuration option related to vector search. The default configuration uses the HNSW algorithm with cosine similarity to handle vector searches.
- include_search_metadata (bool) – Whether to include Azure AI Search metadata fields in the returned documents. When set to True, the meta field of the returned documents will contain the @search.score, @search.reranker_score, @search.highlights, @search.captions, and other fields returned by Azure AI Search.
- azure_token_credential (TokenCredential | None) – An Azure TokenCredential instance used to authenticate requests. When provided, this takes priority over api_key.
- index_creation_kwargs (Any) – Optional keyword parameters to be passed to the SearchIndex class during index creation. Some of the supported parameters:
  - semantic_search: Defines the semantic configuration of the search index. This parameter is needed to enable semantic search capabilities in the index.
  - similarity: The type of similarity algorithm to be used when scoring and ranking the documents matching a search query. The similarity algorithm can only be defined at index creation time and cannot be modified on existing indexes.

  For more information on parameters, see the official Azure AI Search documentation.
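A minimal sketch of creating the document store with simple, type-based metadata fields. It assumes the integration package is installed and that AZURE_AI_SEARCH_ENDPOINT (and optionally AZURE_AI_SEARCH_API_KEY) are set in the environment, as the default Secret values in the signature suggest. The import path, the field names, and the `build_document_store` helper are assumptions made for illustration.

```python
from typing import Any

# Simple filterable metadata fields declared with plain Python types.
# The field names are illustrative.
METADATA_FIELDS: dict[str, type] = {"Title": str, "Pages": int, "Rating": float}


def build_document_store(index_name: str = "default") -> Any:
    """Create an AzureAISearchDocumentStore; credentials come from the environment."""
    # Imported lazily so this module loads even when the integration is absent.
    # The import path is an assumption based on the integration's package layout.
    from haystack_integrations.document_stores.azure_ai_search import (
        AzureAISearchDocumentStore,
    )

    return AzureAISearchDocumentStore(
        index_name=index_name,
        embedding_dimension=768,  # should match the embedding model's output size
        metadata_fields=METADATA_FIELDS,
    )
```

For fields that need searchability or other detailed settings, a SearchField object would replace the plain type, as in the example in the parameter description.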
client: SearchClient
Return the Azure SearchClient, creating the index if it does not exist.
to_dict() -> dict[str, Any]
Serializes the component to a dictionary.
Returns:
- dict[str, Any] – Dictionary with serialized data.
from_dict(data: dict[str, Any]) -> AzureAISearchDocumentStore
Deserializes the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.

Returns:
- AzureAISearchDocumentStore – Deserialized component.
count_documents() -> int
Returns how many documents are present in the search index.
Returns:
- int – The number of documents in the search index.
count_documents_by_filter(filters: dict[str, Any]) -> int
Returns the count of documents that match the provided filters.
Parameters:
- filters (dict[str, Any]) – The filters to apply to the document list. For filter syntax, see Haystack metadata filtering.

Returns:
- int – The number of documents that match the filters.
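The filter dictionaries accepted by these methods follow the general Haystack structure: a comparison has `field`, `operator`, and `value` keys, and a logical filter has an `operator` plus a list of `conditions`. The sketch below builds such a filter and evaluates it locally. The `matches` function is a hypothetical in-memory stand-in covering only a subset of operators; the real store translates filters for Azure AI Search instead. Whether this store expects the `meta.` prefix on field names is an assumption to verify.

```python
import operator
from typing import Any

# Comparison operators used in this sketch (a subset of what Haystack defines).
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
}


def matches(meta: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Evaluate a filter dictionary against one document's metadata."""
    if "conditions" in filters:
        results = [matches(meta, cond) for cond in filters["conditions"]]
        if filters["operator"] == "AND":
            return all(results)
        if filters["operator"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported logical operator: {filters['operator']}")
    # Comparison condition: strip the optional "meta." prefix to find the key.
    key = filters["field"].removeprefix("meta.")
    if key not in meta:
        # In this sketch, a missing field never matches.
        return False
    return COMPARATORS[filters["operator"]](meta[key], filters["value"])


filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.year", "operator": ">=", "value": 2023},
    ],
}
docs_meta = [
    {"category": "news", "year": 2024},
    {"category": "news", "year": 2021},
    {"category": "blog", "year": 2024},
]
match_count = sum(matches(meta, filters) for meta in docs_meta)
```

With a real store, the same `filters` dictionary would be passed as `count_documents_by_filter(filters=filters)`.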
count_unique_metadata_by_filter(
filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]
Counts unique values for each specified metadata field in documents matching the filters.
Parameters:
- filters (dict[str, Any]) – The filters to apply to select documents.
- metadata_fields (list[str]) – List of field names to count unique values for.

Returns:
- dict[str, int] – Dictionary mapping field names to counts of unique values.
get_metadata_fields_info() -> dict[str, dict[str, str]]
Returns the information about metadata fields in the index.
Returns:
- dict[str, dict[str, str]] – Dictionary mapping field names to type information.
get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]
Returns the minimum and maximum values for the given metadata field.
Parameters:
- metadata_field (str) – The metadata field to get the minimum and maximum values for.

Returns:
- dict[str, Any] – A dictionary with the keys "min" and "max".
get_metadata_field_unique_values(
metadata_field: str,
search_term: str | None = None,
from_: int = 0,
size: int = 10,
) -> tuple[list[str], int]
Retrieves unique values for a metadata field with optional search and pagination.
Parameters:
- metadata_field (str) – The metadata field to get unique values for.
- search_term (str | None) – Optional search term to filter unique values.
- from_ (int) – Starting offset for pagination.
- size (int) – Number of values to return.

Returns:
- tuple[list[str], int] – Tuple of (list of unique values, total count of matching values).
query_sql(query: str) -> Any
Executes an SQL query if supported by the document store backend.
Azure AI Search does not support SQL queries.
write_documents(
documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
Writes the provided documents to the search index.
Parameters:
- documents (list[Document]) – Documents to write to the index.
- policy (DuplicatePolicy) – Policy to determine how duplicates are handled.

Returns:
- int – The number of documents added to the index.

Raises:
- ValueError – If the documents are not of type Document.
- TypeError – If the document ids are not strings.
delete_documents(document_ids: list[str]) -> None
Deletes all documents whose IDs match the given document_ids from the search index.
Parameters:
- document_ids (list[str]) – IDs of the documents to be deleted.
delete_all_documents(recreate_index: bool = False) -> None
Deletes all documents in the document store.
Parameters:
- recreate_index (bool) – If True, the index will be deleted and recreated with the original schema. If False, all documents will be deleted while preserving the index.
delete_by_filter(filters: dict[str, Any]) -> int
Deletes all documents that match the provided filters.
Azure AI Search does not support server-side delete by query, so this method first searches for matching documents, then deletes them in a batch operation.
Parameters:
- filters (dict[str, Any]) – The filters to apply to select documents for deletion. For filter syntax, see Haystack metadata filtering.

Returns:
- int – The number of documents deleted.
update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
Updates the fields of all documents that match the provided filters.
Azure AI Search does not support server-side update by query, so this method first searches for matching documents, then updates them using merge operations.
Parameters:
- filters (dict[str, Any]) – The filters to apply to select documents for updating. For filter syntax, see Haystack metadata filtering.
- meta (dict[str, Any]) – The fields to update. These fields must exist in the index schema.

Returns:
- int – The number of documents updated.
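The search-then-merge pattern described for update_by_filter can be illustrated in memory. This is a sketch of the idea only, not the store's implementation: `update_matching` is a hypothetical helper, a plain dictionary stands in for the index, and a predicate function stands in for the filters.

```python
from typing import Any, Callable

Fields = dict[str, Any]


def update_matching(
    index: dict[str, Fields],
    predicate: Callable[[Fields], bool],
    meta: Fields,
) -> int:
    """Find matching documents first, then merge new field values into each one."""
    # Step 1: search for the ids of matching documents.
    matching_ids = [doc_id for doc_id, fields in index.items() if predicate(fields)]
    # Step 2: merge the new values; fields not named in meta are preserved.
    for doc_id in matching_ids:
        index[doc_id] = {**index[doc_id], **meta}
    return len(matching_ids)


index = {
    "a": {"status": "draft", "author": "kim"},
    "b": {"status": "draft", "author": "lee"},
    "c": {"status": "final", "author": "kim"},
}
updated = update_matching(
    index, lambda fields: fields["status"] == "draft", {"status": "review"}
)
```

Because the update is a merge, only the fields named in `meta` change; in this sketch the `author` values are left untouched.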
get_documents_by_id(document_ids: list[str]) -> list[Document]
Retrieves documents by their IDs.
Parameters:
- document_ids (list[str]) – IDs of the documents to retrieve.

Returns:
- list[Document] – List of documents with the given IDs.
search_documents(search_text: str = '*', top_k: int = 10) -> list[Document]
Returns all documents that match the provided search_text.
If search_text is '*' (the default), all documents match.
Parameters:
- search_text (str) – The text to search for in the Document list.
- top_k (int) – Maximum number of documents to return.

Returns:
- list[Document] – A list of Documents that match the given search_text.
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
Returns the documents that match the provided filters.
Filters should be given as a dictionary supporting filtering by metadata. For details on filters, see the metadata filtering documentation.
Parameters:
- filters (dict[str, Any] | None) – The filters to apply to the document list.

Returns:
- list[Document] – A list of Documents that match the given filters.