haystack/docs-website/reference_versioned_docs/version-2.29/integrations-api/azure_ai_search.md at f04ba18c50942524bdf6eff5b31b697a3e6866ed · deepset-ai/haystack

title	Azure AI Search
id	integrations-azure_ai_search
description	Azure AI Search integration for Haystack
slug	/integrations-azure_ai_search

haystack_integrations.components.retrievers.azure_ai_search.embedding_retriever

AzureAISearchEmbeddingRetriever

Retrieves documents from the AzureAISearchDocumentStore using a vector similarity metric.

Must be connected to the AzureAISearchDocumentStore to run.

__init__

__init__(
    *,
    document_store: AzureAISearchDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE,
    **kwargs: Any
) -> None

Create the AzureAISearchEmbeddingRetriever component.

Parameters:

document_store (AzureAISearchDocumentStore) – An instance of AzureAISearchDocumentStore to use with the Retriever.
filters (dict[str, Any] | None) – Filters applied when fetching documents from the Document Store.
top_k (int) – Maximum number of documents to return.
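filter_policy (str | FilterPolicy) – Policy that determines how filters passed at run time are combined with the filters set at initialization. With FilterPolicy.REPLACE (the default), run-time filters replace the initialization filters; with FilterPolicy.MERGE, the two sets are merged.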
kwargs (Any) – Additional keyword arguments to pass to the Azure AI Search endpoint. Some of the supported parameters:
- query_type: A string indicating the type of query to perform. Possible values are 'simple', 'full', and 'semantic'.
- semantic_configuration_name: The name of the semantic configuration to use when processing semantic queries.

For more information on parameters, see the official Azure AI Search documentation.

to_dict

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

from_dict(data: dict[str, Any]) -> AzureAISearchEmbeddingRetriever

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

AzureAISearchEmbeddingRetriever – Deserialized component.

run

run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]

Retrieve documents from the AzureAISearchDocumentStore.

Parameters:

query_embedding (list[float]) – A list of floats representing the query embedding.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See __init__ method docstring for more details.
top_k (int | None) – The maximum number of documents to retrieve.

Returns:

dict[str, list[Document]] – Dictionary with the following keys:
documents: A list of documents retrieved from the AzureAISearchDocumentStore.
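
How run-time filters interact with the filters given at initialization can be pictured with a small stand-alone sketch. It is not the library's implementation: `resolve_filters` is a hypothetical helper, the policy strings "replace" and "merge" are assumed to mirror FilterPolicy.REPLACE and FilterPolicy.MERGE, and merging is simplified to wrapping both filters in a single AND condition.

```python
from typing import Any, Optional

Filters = dict[str, Any]


def resolve_filters(
    policy: str,
    init_filters: Optional[Filters],
    runtime_filters: Optional[Filters],
) -> Optional[Filters]:
    """Simplified illustration of combining init-time and run-time filters."""
    if policy == "merge" and init_filters and runtime_filters:
        # Assumption: merging means both filters must hold at the same time.
        return {"operator": "AND", "conditions": [init_filters, runtime_filters]}
    # "replace", or only one side is set: run-time filters win when present.
    return runtime_filters or init_filters


init_filters = {"field": "meta.language", "operator": "==", "value": "en"}
runtime_filters = {"field": "meta.year", "operator": ">=", "value": 2023}

replaced = resolve_filters("replace", init_filters, runtime_filters)
merged = resolve_filters("merge", init_filters, runtime_filters)
```

The actual merge logic may treat two filters on the same field differently, so it is worth checking the behaviour against the installed version before relying on it.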

haystack_integrations.document_stores.azure_ai_search.document_store

AzureAISearchDocumentStore

Document store using Azure AI Search as the backend.

__init__

__init__(
    *,
    api_key: Secret = Secret.from_env_var(
        "AZURE_AI_SEARCH_API_KEY", strict=False
    ),
    azure_endpoint: Secret = Secret.from_env_var(
        "AZURE_AI_SEARCH_ENDPOINT", strict=True
    ),
    index_name: str = "default",
    embedding_dimension: int = 768,
    metadata_fields: dict[str, SearchField | type] | None = None,
    vector_search_configuration: VectorSearch | None = None,
    include_search_metadata: bool = False,
    azure_token_credential: TokenCredential | None = None,
    **index_creation_kwargs: Any
) -> None

Creates a new instance of AzureAISearchDocumentStore.

Parameters:

azure_endpoint (Secret) – The URL endpoint of an Azure AI Search service.
api_key (Secret) – The API key to use for authentication.
index_name (str) – Name of the index in Azure AI Search. If the index doesn't exist, it will be created.
embedding_dimension (int) – Dimension of the embeddings.
metadata_fields (dict[str, SearchField | type] | None) – A dictionary mapping metadata field names to their corresponding field definitions. Each field can be defined either as:
- A SearchField object to specify detailed field configuration like type, searchability, and filterability
- A Python type (str, bool, int, float, or datetime) to create a simple filterable field

These fields are automatically added when creating the search index. Example:

metadata_fields={
    "Title": SearchField(
        name="Title",
        type="Edm.String",
        searchable=True,
        filterable=True
    ),
    "Pages": int
}

vector_search_configuration (VectorSearch | None) – Configuration options for vector search. The default configuration uses the HNSW algorithm with cosine similarity.
include_search_metadata (bool) – Whether to include Azure AI Search metadata fields in the returned documents. When set to True, the meta field of the returned documents will contain the @search.score, @search.reranker_score, @search.highlights, @search.captions, and other fields returned by Azure AI Search.
azure_token_credential (TokenCredential | None) – An Azure TokenCredential instance used to authenticate requests. When provided, this takes priority over api_key.
index_creation_kwargs (Any) – Optional keyword parameters to pass to the SearchIndex class during index creation. Some of the supported parameters:
- semantic_search: Defines the semantic configuration of the search index. This parameter is needed to enable semantic search capabilities in the index.
- similarity: The type of similarity algorithm to use when scoring and ranking the documents matching a search query. The similarity algorithm can only be defined at index creation time and cannot be modified on existing indexes.

For more information on parameters, see the official Azure AI Search documentation.
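
As a rough illustration of the constructor parameters above, the sketch below collects a few of them in a dictionary and defers the actual construction to a function. The index name, field names, and values are made up for the example, and `build_store` is a hypothetical helper. Calling it requires the integration package to be installed and the AZURE_AI_SEARCH_ENDPOINT environment variable (plus either AZURE_AI_SEARCH_API_KEY or a token credential) to be available.

```python
from typing import Any

# Keyword arguments taken from the signature above; the values are illustrative.
store_kwargs: dict[str, Any] = {
    "index_name": "articles",
    "embedding_dimension": 768,
    # Simple Python types create plain filterable fields.
    "metadata_fields": {"Title": str, "Pages": int},
    "include_search_metadata": True,
}


def build_store() -> Any:
    """Create the document store from store_kwargs.

    The import happens inside the function so that this sketch can be loaded
    even where the integration package is not installed.
    """
    from haystack_integrations.document_stores.azure_ai_search.document_store import (
        AzureAISearchDocumentStore,
    )

    return AzureAISearchDocumentStore(**store_kwargs)


# With credentials configured, the store would be created with:
# document_store = build_store()
```

Keeping the arguments in one dictionary makes it easy to reuse the same settings when the index needs to be recreated.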

client

client: SearchClient

Return the Azure SearchClient, creating the index if it does not exist.

to_dict

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

from_dict(data: dict[str, Any]) -> AzureAISearchDocumentStore

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

AzureAISearchDocumentStore – Deserialized component.

count_documents

count_documents() -> int

Returns how many documents are present in the search index.

Returns:

int – The number of documents in the search index.

count_documents_by_filter

count_documents_by_filter(filters: dict[str, Any]) -> int

Returns the count of documents that match the provided filters.

Parameters:

filters (dict[str, Any]) – The filters to apply to the document list. For filter syntax, see the Haystack metadata filtering documentation.

Returns:

int – The number of documents that match the filters.
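
A filter dictionary in the general Haystack metadata filtering syntax might be assembled as in the sketch below. The helpers `comparison` and `all_of` are hypothetical conveniences, the field names are invented, and the `meta.` prefix is carried over from the general Haystack syntax on the assumption that this store accepts it.

```python
from typing import Any


def comparison(field: str, operator: str, value: Any) -> dict[str, Any]:
    """Build a single comparison condition."""
    return {"field": field, "operator": operator, "value": value}


def all_of(*conditions: dict[str, Any]) -> dict[str, Any]:
    """Combine conditions so that every one of them must match."""
    return {"operator": "AND", "conditions": list(conditions)}


# Documents whose metadata has category == "news" and year >= 2023.
filters = all_of(
    comparison("meta.category", "==", "news"),
    comparison("meta.year", ">=", 2023),
)

# With a configured store, the count would then be requested as:
# count = document_store.count_documents_by_filter(filters=filters)
```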

count_unique_metadata_by_filter

count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]

Counts unique values for each specified metadata field in documents matching the filters.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents.
metadata_fields (list[str]) – List of field names to count unique values for.

Returns:

dict[str, int] – Dictionary mapping field names to counts of unique values.
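
To make the shape of the return value concrete, the following sketch computes the same kind of mapping over plain metadata dictionaries in memory. It is only an illustration of the semantics: `count_unique_values` is a hypothetical helper, the sample data is invented, and skipping documents that lack a field is an assumption rather than documented behaviour.

```python
from typing import Any


def count_unique_values(
    metas: list[dict[str, Any]], metadata_fields: list[str]
) -> dict[str, int]:
    """Count distinct values per field across already-filtered metadata dicts."""
    return {
        # A set keeps each value once; values therefore need to be hashable.
        field: len({meta[field] for meta in metas if field in meta})
        for field in metadata_fields
    }


metas = [
    {"language": "en", "year": 2023},
    {"language": "en", "year": 2024},
    {"language": "de"},
]
counts = count_unique_values(metas, ["language", "year"])
```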

get_metadata_fields_info

get_metadata_fields_info() -> dict[str, dict[str, str]]

Returns the information about metadata fields in the index.

Returns:

dict[str, dict[str, str]] – Dictionary mapping field names to type information.

get_metadata_field_min_max

get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]

Returns the minimum and maximum values for the given metadata field.

Parameters:

metadata_field (str) – The metadata field to get the minimum and maximum values for.

Returns:

dict[str, Any] – A dictionary with the keys "min" and "max".

get_metadata_field_unique_values

get_metadata_field_unique_values(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10,
) -> tuple[list[str], int]

Retrieves unique values for a metadata field with optional search and pagination.

Parameters:

metadata_field (str) – The metadata field to get unique values for.
search_term (str | None) – Optional search term to filter unique values.
from_ (int) – Starting offset for pagination.
size (int) – Number of values to return.

Returns:

tuple[list[str], int] – Tuple of (list of unique values, total count of matching values).
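
The from_ and size parameters suggest a simple paging loop. The sketch below walks through all pages using only the signature documented above. `iter_unique_values` is a hypothetical helper, and `FakeStore` is an in-memory stand-in that exists only so the loop can be run without an Azure service.

```python
from typing import Any, Iterator, Optional


def iter_unique_values(
    store: Any, metadata_field: str, page_size: int = 10
) -> Iterator[str]:
    """Yield every unique value of a metadata field, one page at a time."""
    offset = 0
    while True:
        values, total = store.get_metadata_field_unique_values(
            metadata_field, from_=offset, size=page_size
        )
        yield from values
        offset += len(values)
        # Stop when a page comes back empty or every value has been seen.
        if not values or offset >= total:
            break


class FakeStore:
    """Hypothetical stand-in that mimics the documented return shape."""

    def __init__(self, values: list[str]) -> None:
        self._values = sorted(set(values))

    def get_metadata_field_unique_values(
        self,
        metadata_field: str,
        search_term: Optional[str] = None,
        from_: int = 0,
        size: int = 10,
    ) -> tuple[list[str], int]:
        matches = [v for v in self._values if search_term is None or search_term in v]
        return matches[from_ : from_ + size], len(matches)


all_values = list(iter_unique_values(FakeStore(["a", "b", "c", "a"]), "category", page_size=2))
```

Checking for an empty page as well as the total guards against looping forever if the reported total and the returned pages disagree.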

query_sql

query_sql(query: str) -> Any

Executes an SQL query if supported by the document store backend.

Azure AI Search does not support SQL queries.

write_documents

write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int

Writes the provided documents to the search index.

Parameters:

documents (list[Document]) – Documents to write to the index.
policy (DuplicatePolicy) – Policy to determine how duplicates are handled.

Returns:

int – The number of documents added to the index.

Raises:

ValueError – If the documents are not of type Document.
TypeError – If the document ids are not strings.

delete_documents

delete_documents(document_ids: list[str]) -> None

Deletes all documents with matching document_ids from the search index.

Parameters:

document_ids (list[str]) – IDs of the documents to delete.

delete_all_documents

delete_all_documents(recreate_index: bool = False) -> None

Deletes all documents in the document store.

Parameters:

recreate_index (bool) – If True, the index will be deleted and recreated with the original schema. If False, all documents will be deleted while preserving the index.

delete_by_filter

delete_by_filter(filters: dict[str, Any]) -> int

Deletes all documents that match the provided filters.

Azure AI Search does not support server-side delete by query, so this method first searches for matching documents, then deletes them in a batch operation.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents for deletion. For filter syntax, see the Haystack metadata filtering documentation.

Returns:

int – The number of documents deleted.
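
The search-then-delete behaviour described above can be approximated with two public methods from this page, filter_documents and delete_documents. This is a sketch of the pattern, not the method's internal implementation; `delete_matching` is a hypothetical helper that accepts any object exposing those two methods.

```python
from typing import Any


def delete_matching(store: Any, filters: dict[str, Any]) -> int:
    """Find the documents matching the filters, then delete them by ID."""
    matching = store.filter_documents(filters=filters)
    document_ids = [doc.id for doc in matching]
    if document_ids:
        # One delete call for the whole batch of matching IDs.
        store.delete_documents(document_ids=document_ids)
    return len(document_ids)
```

Because the two steps are separate requests, documents written between the search and the delete are not affected.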

update_by_filter

update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int

Updates the fields of all documents that match the provided filters.

Azure AI Search does not support server-side update by query, so this method first searches for matching documents, then updates them using merge operations.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents for updating. For filter syntax, see the Haystack metadata filtering documentation.
meta (dict[str, Any]) – The fields to update. These fields must exist in the index schema.

Returns:

int – The number of documents updated.
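
Since the updated fields must already exist in the index schema, one option is to check them before issuing the update. The sketch below does this with get_metadata_fields_info. `checked_update` is a hypothetical helper, and it assumes that the keys returned by get_metadata_fields_info use the same names as the keys passed in meta.

```python
from typing import Any


def checked_update(store: Any, filters: dict[str, Any], meta: dict[str, Any]) -> int:
    """Update matching documents after verifying the fields exist in the schema."""
    known_fields = store.get_metadata_fields_info()
    missing = sorted(name for name in meta if name not in known_fields)
    if missing:
        raise ValueError(f"Fields not present in the index schema: {missing}")
    return store.update_by_filter(filters=filters, meta=meta)
```

Failing early with the list of unknown fields is usually easier to debug than an error returned by the search service.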

get_documents_by_id

get_documents_by_id(document_ids: list[str]) -> list[Document]

Retrieves documents by their IDs.

Parameters:

document_ids (list[str]) – IDs of the documents to retrieve.

Returns:

list[Document] – List of documents with the given IDs.

search_documents

search_documents(search_text: str = '*', top_k: int = 10) -> list[Document]

Returns all documents that match the provided search_text.

With the default search_text of '*', all documents match, and up to top_k of them are returned.

Parameters:

search_text (str) – The text to search for in the Document list.
top_k (int) – Maximum number of documents to return.

Returns:

list[Document] – A list of Documents that match the given search_text.

filter_documents

filter_documents(filters: dict[str, Any] | None = None) -> list[Document]

Returns the documents that match the provided filters.

Filters should be given as a dictionary supporting filtering by metadata. For details on filters, see the metadata filtering documentation.

Parameters:

filters (dict[str, Any] | None) – The filters to apply to the document list.

Returns:

list[Document] – A list of Documents that match the given filters.
