haystack/docs-website/reference_versioned_docs/version-2.29/integrations-api/weaviate.md at f04ba18c50942524bdf6eff5b31b697a3e6866ed · deepset-ai/haystack

title	Weaviate
id	integrations-weaviate
description	Weaviate integration for Haystack
slug	/integrations-weaviate

haystack_integrations.components.retrievers.weaviate.bm25_retriever

WeaviateBM25Retriever

A component for retrieving documents from Weaviate using the BM25 algorithm.

Example usage:

from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)
from haystack_integrations.components.retrievers.weaviate.bm25_retriever import (
    WeaviateBM25Retriever,
)

document_store = WeaviateDocumentStore(url="http://localhost:8080")
retriever = WeaviateBM25Retriever(document_store=document_store)
retriever.run(query="How to make a pizza", top_k=3)
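The retriever hands BM25 scoring off to Weaviate. To illustrate what BM25 ranking does, here is a minimal, self-contained sketch of the classic Okapi BM25 formula with the common defaults k1=1.2 and b=0.75. It is not Weaviate's implementation: Weaviate uses a BM25F variant with its own tokenization. The whitespace tokenizer and the `bm25_scores` helper are simplifying assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Toy Okapi BM25: score each document against the query (hypothetical helper)."""
    # Naive lowercase whitespace tokenization, for illustration only.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "how to make a pizza at home",
    "pizza dough recipe",
    "history of the roman empire",
]
scores = bm25_scores("make pizza", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The first document matches both query terms, the second matches only the more common term "pizza", and the third matches neither.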

init

__init__(
    *,
    document_store: WeaviateDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None

Create a new instance of WeaviateBM25Retriever.

Parameters:

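document_store (WeaviateDocumentStore) – Instance of WeaviateDocumentStore that will be used by this retriever.
filters (dict[str, Any] | None) – Custom filters applied when running the retriever.
top_k (int) – Maximum number of documents to return.
filter_policy (str | FilterPolicy) – Policy that determines how filters are applied. With FilterPolicy.REPLACE (the default), filters passed at runtime replace the ones set at initialization. With FilterPolicy.MERGE, runtime filters are merged with the initialization filters.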
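Here is a simplified sketch of how the two policies decide which filters reach Weaviate. The `resolve_filters` helper and the local `FilterPolicy` stand-in are hypothetical. Haystack's own merge logic covers more cases, such as merging the conditions of logical filters that share an operator. This version simply combines both filters under a top-level AND.

```python
from enum import Enum
from typing import Any, Optional


class FilterPolicy(Enum):
    # Local stand-in mirroring the two Haystack policy names.
    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: FilterPolicy,
    init_filters: Optional[dict[str, Any]],
    runtime_filters: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Hypothetical helper: choose the filters used for one retrieval call."""
    if policy is FilterPolicy.REPLACE:
        # Runtime filters win whenever they are given.
        return runtime_filters or init_filters
    # MERGE (simplified): combine whatever is present under a top-level AND.
    present = [f for f in (init_filters, runtime_filters) if f]
    if not present:
        return None
    if len(present) == 1:
        return present[0]
    return {"operator": "AND", "conditions": present}


init = {"field": "meta.lang", "operator": "==", "value": "en"}
runtime = {"field": "meta.year", "operator": ">=", "value": 2020}

print(resolve_filters(FilterPolicy.REPLACE, init, runtime))  # {'field': 'meta.year', 'operator': '>=', 'value': 2020}
print(resolve_filters(FilterPolicy.MERGE, init, runtime))
```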

to_dict

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

from_dict(data: dict[str, Any]) -> WeaviateBM25Retriever

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

WeaviateBM25Retriever – Deserialized component.

run

run(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]

Retrieves documents from Weaviate using the BM25 algorithm.

Parameters:

query (str) – The query text.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
top_k (int | None) – The maximum number of documents to return.

Returns:

dict[str, list[Document]] – A dictionary with the following keys:
documents: List of documents returned by the search engine.

run_async

run_async(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]

Asynchronously retrieves documents from Weaviate using the BM25 algorithm.

Parameters:

query (str) – The query text.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
top_k (int | None) – The maximum number of documents to return.

Returns:

dict[str, list[Document]] – A dictionary with the following keys:
documents: List of documents returned by the search engine.

haystack_integrations.components.retrievers.weaviate.embedding_retriever

WeaviateEmbeddingRetriever

A retriever that uses Weaviate's vector search to find documents similar to the query embedding.

init

__init__(
    *,
    document_store: WeaviateDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    distance: float | None = None,
    certainty: float | None = None,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None

Creates a new instance of WeaviateEmbeddingRetriever.

Parameters:

document_store (WeaviateDocumentStore) – Instance of WeaviateDocumentStore that will be used by this retriever.
filters (dict[str, Any] | None) – Custom filters applied when running the retriever.
top_k (int) – Maximum number of documents to return.
distance (float | None) – The maximum allowed distance between the query embedding and a Document's embedding. Documents farther away are not returned.
certainty (float | None) – The minimum certainty a Document must have to be returned. Certainty is Weaviate's normalized similarity between the result item and the search vector, ranging from 0 to 1.
filter_policy (str | FilterPolicy) – Policy that determines how filters are applied: FilterPolicy.REPLACE (the default) or FilterPolicy.MERGE.

Raises:

ValueError – If both distance and certainty are provided. See https://weaviate.io/developers/weaviate/api/graphql/search-operators#variables to learn more about distance and certainty parameters.
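This minimal sketch shows the documented rule that only one of distance and certainty may be set. The conversion function assumes Weaviate's cosine distance, which ranges from 0 to 2 and for which, as we understand Weaviate's definition, certainty = 1 - distance / 2. Certainty is not meaningful for other distance metrics. Both helper names and the error message are hypothetical.

```python
from typing import Optional


def check_threshold(distance: Optional[float], certainty: Optional[float]) -> None:
    """Hypothetical validation mirroring the documented ValueError."""
    if distance is not None and certainty is not None:
        raise ValueError("Set either 'distance' or 'certainty', not both")


def cosine_distance_to_certainty(distance: float) -> float:
    # Assumes cosine distance in [0, 2], where 0 means identical vectors.
    return 1 - distance / 2


check_threshold(distance=0.3, certainty=None)  # fine: only one threshold is set
print(cosine_distance_to_certainty(0.3))
try:
    check_threshold(distance=0.3, certainty=0.85)
except ValueError as err:
    print(err)
```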

to_dict

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

from_dict(data: dict[str, Any]) -> WeaviateEmbeddingRetriever

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

WeaviateEmbeddingRetriever – Deserialized component.

run

run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    distance: float | None = None,
    certainty: float | None = None,
) -> dict[str, list[Document]]

Retrieves documents from Weaviate using the vector search.

Parameters:

query_embedding (list[float]) – Embedding of the query.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
top_k (int | None) – The maximum number of documents to return.
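distance (float | None) – The maximum allowed distance between the query embedding and a Document's embedding. Documents farther away are not returned.
certainty (float | None) – The minimum certainty a Document must have to be returned. Certainty is Weaviate's normalized similarity between the result item and the search vector, ranging from 0 to 1.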

Returns:

dict[str, list[Document]] – A dictionary with the following keys:
documents: List of documents returned by the search engine.

Raises:

ValueError – If both distance and certainty are provided. See https://weaviate.io/developers/weaviate/api/graphql/search-operators#variables to learn more about distance and certainty parameters.

run_async

run_async(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    distance: float | None = None,
    certainty: float | None = None,
) -> dict[str, list[Document]]

Asynchronously retrieves documents from Weaviate using the vector search.

Parameters:

query_embedding (list[float]) – Embedding of the query.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
top_k (int | None) – The maximum number of documents to return.
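distance (float | None) – The maximum allowed distance between the query embedding and a Document's embedding. Documents farther away are not returned.
certainty (float | None) – The minimum certainty a Document must have to be returned. Certainty is Weaviate's normalized similarity between the result item and the search vector, ranging from 0 to 1.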

Returns:

dict[str, list[Document]] – A dictionary with the following keys:
documents: List of documents returned by the search engine.

Raises:

ValueError – If both distance and certainty are provided. See https://weaviate.io/developers/weaviate/api/graphql/search-operators#variables to learn more about distance and certainty parameters.

haystack_integrations.components.retrievers.weaviate.hybrid_retriever

WeaviateHybridRetriever

A retriever that uses Weaviate's hybrid search to find similar documents by combining keyword (BM25) matching on the query text with vector search on the query embedding.

init

__init__(
    *,
    document_store: WeaviateDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    alpha: float = 0.7,
    max_vector_distance: float | None = None,
    filter_policy: str | FilterPolicy = FilterPolicy.REPLACE
) -> None

Creates a new instance of WeaviateHybridRetriever.

Parameters:

document_store (WeaviateDocumentStore) – Instance of WeaviateDocumentStore that will be used by this retriever.
filters (dict[str, Any] | None) – Custom filters applied when running the retriever.
top_k (int) – Maximum number of documents to return.
alpha (float) – Blending factor for hybrid retrieval in Weaviate. Must be in the range [0.0, 1.0].

Weaviate hybrid search combines keyword (BM25) and vector scores into a single ranking. alpha controls how much each part contributes to the final score:

alpha = 0.0: only keyword (BM25) scoring is used.
alpha = 1.0: only vector similarity scoring is used.
Values in between blend the two; higher values favor the vector score, lower values favor BM25.

By default, 0.7 is used which is the Weaviate server default.

See the official Weaviate docs on Hybrid Search parameters for more details:

Hybrid search parameters
Hybrid Search
max_vector_distance (float | None) – Optional threshold that restricts the vector part of the hybrid search to candidates within a maximum vector distance. Candidates with a distance larger than this threshold are excluded from the vector portion before blending.

Use this to prune low-quality vector matches while still benefiting from keyword recall. Leave it as None to use Weaviate's default behavior without an explicit cutoff.

See the official Weaviate docs on Hybrid Search parameters for more details:

Hybrid search parameters
Hybrid Search
filter_policy (str | FilterPolicy) – Policy to determine how filters are applied.
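The effect of alpha and max_vector_distance can be shown with a toy, score-based fusion sketch. Each score set is min-max normalized to [0, 1]. Vector candidates beyond max_vector_distance are dropped before blending. The normalized scores are then combined as alpha * vector + (1 - alpha) * keyword. This follows the general idea of Weaviate's relative score fusion but is a simplification, not Weaviate's code. The input dictionaries and function names are hypothetical.

```python
from typing import Optional


def _min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale to [0, 1]; a single (or all-equal) score maps to 1.0 (a choice of this sketch)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(
    bm25_scores: dict[str, float],
    vector_distances: dict[str, float],
    alpha: float = 0.7,
    max_vector_distance: Optional[float] = None,
) -> list[tuple[str, float]]:
    """Hypothetical fusion: blend normalized keyword and vector scores per document ID."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in the range [0.0, 1.0]")
    # Drop vector candidates that are too far away before blending.
    if max_vector_distance is not None:
        vector_distances = {k: d for k, d in vector_distances.items() if d <= max_vector_distance}
    # A smaller distance means more similar, so negate before normalizing.
    vector_norm = _min_max({k: -d for k, d in vector_distances.items()})
    keyword_norm = _min_max(bm25_scores)
    fused = {
        doc_id: alpha * vector_norm.get(doc_id, 0.0) + (1 - alpha) * keyword_norm.get(doc_id, 0.0)
        for doc_id in set(vector_norm) | set(keyword_norm)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


bm25 = {"a": 3.0, "b": 1.0, "c": 0.5}
distances = {"a": 0.6, "b": 0.2, "d": 0.9}

print(hybrid_rank(bm25, distances, alpha=0.0))  # keyword scores only
print(hybrid_rank(bm25, distances, alpha=1.0))  # vector scores only
print(hybrid_rank(bm25, distances, alpha=0.7, max_vector_distance=0.5))
```

With the 0.5 cutoff, only "b" survives the vector side. It therefore outranks "a", and "d" drops out entirely because it has no keyword score either.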

to_dict

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

from_dict(data: dict[str, Any]) -> WeaviateHybridRetriever

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – Dictionary to deserialize from.

Returns:

WeaviateHybridRetriever – Deserialized component.

run

run(
    query: str,
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    alpha: float | None = None,
    max_vector_distance: float | None = None,
) -> dict[str, list[Document]]

Retrieves documents from Weaviate using hybrid search.

Parameters:

query (str) – The query text.
query_embedding (list[float]) – Embedding of the query.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
top_k (int | None) – The maximum number of documents to return.
alpha (float | None) – Blending factor for hybrid retrieval in Weaviate. Must be in the range [0.0, 1.0].

Weaviate hybrid search combines keyword (BM25) and vector scores into a single ranking. alpha controls how much each part contributes to the final score:

alpha = 0.0: only keyword (BM25) scoring is used.
alpha = 1.0: only vector similarity scoring is used.
Values in between blend the two; higher values favor the vector score, lower values favor BM25.

If None, the Weaviate server default is used.

See the official Weaviate docs on Hybrid Search parameters for more details:

Hybrid search parameters
Hybrid Search
max_vector_distance (float | None) – Optional threshold that restricts the vector part of the hybrid search to candidates within a maximum vector distance. Candidates with a distance larger than this threshold are excluded from the vector portion before blending.

Use this to prune low-quality vector matches while still benefiting from keyword recall. Leave it as None to use Weaviate's default behavior without an explicit cutoff.

See the official Weaviate docs on Hybrid Search parameters for more details.

Returns:

dict[str, list[Document]] – A dictionary with the following keys:
documents: List of documents returned by the search engine.

run_async

run_async(
    query: str,
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
    alpha: float | None = None,
    max_vector_distance: float | None = None,
) -> dict[str, list[Document]]

Asynchronously retrieves documents from Weaviate using hybrid search.

Parameters:

query (str) – The query text.
query_embedding (list[float]) – Embedding of the query.
filters (dict[str, Any] | None) – Filters applied to the retrieved Documents. The way runtime filters are applied depends on the filter_policy chosen at retriever initialization. See init method docstring for more details.
top_k (int | None) – The maximum number of documents to return.
alpha (float | None) – Blending factor for hybrid retrieval in Weaviate. Must be in the range [0.0, 1.0].

Weaviate hybrid search combines keyword (BM25) and vector scores into a single ranking. alpha controls how much each part contributes to the final score:

alpha = 0.0: only keyword (BM25) scoring is used.
alpha = 1.0: only vector similarity scoring is used.
Values in between blend the two; higher values favor the vector score, lower values favor BM25.

If None, the Weaviate server default is used.

See the official Weaviate docs on Hybrid Search parameters for more details:

Hybrid search parameters
Hybrid Search
max_vector_distance (float | None) – Optional threshold that restricts the vector part of the hybrid search to candidates within a maximum vector distance. Candidates with a distance larger than this threshold are excluded from the vector portion before blending.

Use this to prune low-quality vector matches while still benefiting from keyword recall. Leave it as None to use Weaviate's default behavior without an explicit cutoff.

See the official Weaviate docs on Hybrid Search parameters for more details.

Returns:

dict[str, list[Document]] – A dictionary with the following keys:
documents: List of documents returned by the search engine.

haystack_integrations.document_stores.weaviate.auth

SupportedAuthTypes

Bases: Enum

Supported auth credentials for WeaviateDocumentStore.

from_class

from_class(auth_class: type[AuthCredentials]) -> SupportedAuthTypes

Return the SupportedAuthTypes enum value corresponding to the given auth credentials class.

AuthCredentials

Bases: ABC

Base class for all auth credentials supported by WeaviateDocumentStore.

Can be used to deserialize any of the supported auth credentials from a dictionary.

to_dict

to_dict() -> dict[str, Any]

Converts the object to a dictionary representation for serialization.

from_dict

from_dict(data: dict[str, Any]) -> AuthCredentials

Converts a dictionary representation to an auth credentials object.

resolve_value

resolve_value() -> (
    WeaviateAuthApiKey
    | WeaviateAuthBearerToken
    | WeaviateAuthClientCredentials
    | WeaviateAuthClientPassword
)

Resolves all the secrets in the auth credentials object and returns the corresponding Weaviate object.

All subclasses must implement this method.

AuthApiKey

Bases: AuthCredentials

AuthCredentials for API key authentication.

By default it will load api_key from the environment variable WEAVIATE_API_KEY.

resolve_value

resolve_value() -> WeaviateAuthApiKey

Resolve the API key secret and return the corresponding Weaviate auth object.

AuthBearerToken

Bases: AuthCredentials

AuthCredentials for Bearer token authentication.

By default, it loads access_token from the environment variable WEAVIATE_ACCESS_TOKEN and refresh_token from the environment variable WEAVIATE_REFRESH_TOKEN. The WEAVIATE_REFRESH_TOKEN environment variable is optional.

resolve_value

resolve_value() -> WeaviateAuthBearerToken

Resolve the bearer token secrets and return the corresponding Weaviate auth object.

AuthClientCredentials

Bases: AuthCredentials

AuthCredentials for client credentials authentication.

By default, it loads client_secret from the environment variable WEAVIATE_CLIENT_SECRET and scope from the environment variable WEAVIATE_SCOPE. The WEAVIATE_SCOPE environment variable is optional. If set, it can hold a single scope or several space-separated scopes, e.g. "scope1" or "scope1 scope2".
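The convention for the optional scope value can be illustrated with a small sketch. The `read_scope` helper is hypothetical and only mirrors the documented rule (a single scope string, or several scopes separated by spaces). It is not the integration's own code.

```python
import os
from typing import Optional, Union


def read_scope(env_var: str = "WEAVIATE_SCOPE") -> Optional[Union[str, list[str]]]:
    """Hypothetical helper: interpret the variable as one scope or a list of scopes."""
    raw = os.environ.get(env_var)
    if not raw:
        return None  # the variable is optional
    scopes = raw.split()
    return scopes[0] if len(scopes) == 1 else scopes


os.environ["WEAVIATE_SCOPE"] = "scope1 scope2"
print(read_scope())  # ['scope1', 'scope2']
os.environ["WEAVIATE_SCOPE"] = "scope1"
print(read_scope())  # scope1
```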

resolve_value

resolve_value() -> WeaviateAuthClientCredentials

Resolve the client credentials secrets and return the corresponding Weaviate auth object.

AuthClientPassword

Bases: AuthCredentials

AuthCredentials for username and password authentication.

By default, it loads username from the environment variable WEAVIATE_USERNAME, password from the environment variable WEAVIATE_PASSWORD, and scope from the environment variable WEAVIATE_SCOPE. The WEAVIATE_SCOPE environment variable is optional. If set, it can hold a single scope or several space-separated scopes, e.g. "scope1" or "scope1 scope2".

resolve_value

resolve_value() -> WeaviateAuthClientPassword

Resolve the username and password secrets and return the corresponding Weaviate auth object.

haystack_integrations.document_stores.weaviate.document_store

WeaviateDocumentStore

A WeaviateDocumentStore instance you can use with Weaviate Cloud Services or self-hosted instances.

Usage example with Weaviate Cloud Services:

import os
from haystack_integrations.document_stores.weaviate.auth import AuthApiKey
from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)

os.environ["WEAVIATE_API_KEY"] = "MY_API_KEY"

document_store = WeaviateDocumentStore(
    url="rAnD0mD1g1t5.something.weaviate.cloud",
    auth_client_secret=AuthApiKey(),
)

Usage example with self-hosted Weaviate:

from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)

document_store = WeaviateDocumentStore(url="http://localhost:8080")

init

__init__(
    *,
    url: str | None = None,
    collection_settings: dict[str, Any] | None = None,
    auth_client_secret: AuthCredentials | None = None,
    additional_headers: dict | None = None,
    embedded_options: EmbeddedOptions | None = None,
    additional_config: AdditionalConfig | None = None,
    grpc_port: int = 50051,
    grpc_secure: bool = False
) -> None

Creates a new instance of WeaviateDocumentStore and connects to the Weaviate instance.

Parameters:

url (str | None) – The URL of the Weaviate instance.
collection_settings (dict[str, Any] | None) – The collection settings to use. If None, it will use a collection named default with the following properties:
_original_id: text
content: text
blob_data: blob
blob_mime_type: text
score: number

The Document meta fields are omitted from the default collection settings because we can't make assumptions about the structure of the meta field. We strongly recommend creating a custom collection with the correct meta properties for your use case. Another option is to rely on automatic schema generation, but that's not recommended for production use. See the official Weaviate documentation for more information on collections and their properties.
auth_client_secret (AuthCredentials | None) – Authentication credentials. Can be one of the following types, depending on the authentication mode:
AuthBearerToken to use existing access and (optionally, but recommended) refresh tokens
AuthClientPassword to use a username and password for the OIDC Resource Owner Password flow
AuthClientCredentials to use a client secret for the OIDC client credentials flow
AuthApiKey to use an API key
additional_headers (dict | None) – Additional headers to include in the requests, for example to pass OpenAI or Hugging Face API keys:

{"X-OpenAI-Api-Key": "<THE-KEY>"}, {"X-HuggingFace-Api-Key": "<THE-KEY>"}

embedded_options (EmbeddedOptions | None) – If set, creates an embedded Weaviate cluster inside the client. For a full list of options, see weaviate.embedded.EmbeddedOptions.
additional_config (AdditionalConfig | None) – Additional and advanced configuration options for Weaviate.
grpc_port (int) – The port to use for the gRPC connection.
grpc_secure (bool) – Whether to use a secure channel for the underlying gRPC API.
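Here is a sketch of a custom collection_settings dictionary that keeps the default properties listed above and adds explicit meta properties. The collection name "MyDocuments" and the meta properties "category" and "year" are hypothetical. The key names ("class", "properties", "name", "dataType") assume the Weaviate class-schema format, so check them against your integration version before relying on them.

```python
from typing import Any

# The default properties described above, in Weaviate class-schema form (assumed format).
DEFAULT_PROPERTIES: list[dict[str, Any]] = [
    {"name": "_original_id", "dataType": ["text"]},
    {"name": "content", "dataType": ["text"]},
    {"name": "blob_data", "dataType": ["blob"]},
    {"name": "blob_mime_type", "dataType": ["text"]},
    {"name": "score", "dataType": ["number"]},
]

collection_settings: dict[str, Any] = {
    "class": "MyDocuments",  # hypothetical collection name
    "properties": DEFAULT_PROPERTIES
    + [
        # Hypothetical meta fields for this use case.
        {"name": "category", "dataType": ["text"]},
        {"name": "year", "dataType": ["int"]},
    ],
}

# Usage (requires a running Weaviate instance):
# document_store = WeaviateDocumentStore(
#     url="http://localhost:8080", collection_settings=collection_settings
# )
print([prop["name"] for prop in collection_settings["properties"]])
```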

client

client: weaviate.WeaviateClient

Return the synchronous Weaviate client, creating and connecting it if necessary.

async_client

async_client: weaviate.WeaviateAsyncClient

Return the asynchronous Weaviate client, creating and connecting it if necessary.

collection

collection: Collection[dict[str, Any], None]

Return the synchronous Weaviate collection, initializing it via the client if necessary.

async_collection

async_collection: CollectionAsync[dict[str, Any], None]

Return the asynchronous Weaviate collection, initializing it via the async client if necessary.

close

close() -> None

Close the synchronous Weaviate client connection.

close_async

close_async() -> None

Close the asynchronous Weaviate client connection.

to_dict

to_dict() -> dict[str, Any]

Serializes the component to a dictionary.

Returns:

dict[str, Any] – Dictionary with serialized data.

from_dict

from_dict(data: dict[str, Any]) -> WeaviateDocumentStore

Deserializes the component from a dictionary.

Parameters:

data (dict[str, Any]) – The dictionary to deserialize from.

Returns:

WeaviateDocumentStore – The deserialized component.

count_documents

count_documents() -> int

Returns the number of documents present in the DocumentStore.

count_documents_async

count_documents_async() -> int

Asynchronously returns the number of documents present in the DocumentStore.

count_documents_by_filter

count_documents_by_filter(filters: dict[str, Any]) -> int

Returns the number of documents that match the provided filters.

Parameters:

filters (dict[str, Any]) – The filters to apply to count documents. For filter syntax, see Haystack metadata filtering.

Returns:

int – The number of documents that match the filters.

count_documents_by_filter_async

count_documents_by_filter_async(filters: dict[str, Any]) -> int

Asynchronously returns the number of documents that match the provided filters.

Parameters:

filters (dict[str, Any]) – The filters to apply to count documents. For filter syntax, see Haystack metadata filtering.

Returns:

int – The number of documents that match the filters.

get_metadata_fields_info

get_metadata_fields_info() -> dict[str, dict[str, str]]

Returns metadata field names and their types, excluding special fields.

Special fields (content, blob_data, blob_mime_type, _original_id, score) are excluded as they are not user metadata fields.

Returns:

dict[str, dict[str, str]] – A dictionary where keys are field names and values are dictionaries containing type information, e.g.:

{
    'number': {'type': 'int'},
    'date': {'type': 'date'},
    'category': {'type': 'text'},
    'status': {'type': 'text'}
}
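This sketch shows how the special fields are excluded, working over a plain list of schema properties. The input format (a name plus a Weaviate data type) and the `metadata_fields_info` helper are assumptions for illustration. The real method reads the schema from the Weaviate collection.

```python
from typing import Any

# Fields the method treats as special rather than user metadata.
SPECIAL_FIELDS = {"content", "blob_data", "blob_mime_type", "_original_id", "score"}


def metadata_fields_info(properties: list[dict[str, Any]]) -> dict[str, dict[str, str]]:
    """Hypothetical helper: map user metadata fields to their data types."""
    return {
        prop["name"]: {"type": prop["data_type"]}
        for prop in properties
        if prop["name"] not in SPECIAL_FIELDS
    }


schema = [
    {"name": "content", "data_type": "text"},
    {"name": "score", "data_type": "number"},
    {"name": "number", "data_type": "int"},
    {"name": "date", "data_type": "date"},
    {"name": "category", "data_type": "text"},
]
print(metadata_fields_info(schema))
# {'number': {'type': 'int'}, 'date': {'type': 'date'}, 'category': {'type': 'text'}}
```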

get_metadata_fields_info_async

get_metadata_fields_info_async() -> dict[str, dict[str, str]]

Asynchronously returns metadata field names and their types, excluding special fields.

Special fields (content, blob_data, blob_mime_type, _original_id, score) are excluded as they are not user metadata fields.

Returns:

dict[str, dict[str, str]] – A dictionary where keys are field names and values are dictionaries containing type information, e.g.:

{
    'number': {'type': 'int'},
    'date': {'type': 'date'},
    'category': {'type': 'text'},
    'status': {'type': 'text'}
}

get_metadata_field_min_max

get_metadata_field_min_max(metadata_field: str) -> dict[str, Any]

Returns the minimum and maximum values for a numeric or date metadata field.

Parameters:

metadata_field (str) – The metadata field name to get min/max for. Can be prefixed with 'meta.' (e.g., 'meta.year' or 'year').

Returns:

dict[str, Any] – A dictionary with 'min' and 'max' keys containing the respective values.

Raises:

ValueError – If the field is not found or doesn't support min/max operations.
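Here is a minimal in-memory sketch of the documented behavior. The optional 'meta.' prefix is stripped, and a ValueError is raised when the field is missing or holds values that are not numbers or dates. Documents are modeled as plain meta dictionaries, and the helper names are hypothetical.

```python
from datetime import date
from typing import Any


def normalize_field(name: str) -> str:
    # 'meta.year' and 'year' refer to the same metadata field.
    return name[len("meta."):] if name.startswith("meta.") else name


def field_min_max(metas: list[dict[str, Any]], metadata_field: str) -> dict[str, Any]:
    """Hypothetical helper: min/max of a numeric or date metadata field."""
    field = normalize_field(metadata_field)
    values = [m[field] for m in metas if m.get(field) is not None]
    if not values:
        raise ValueError(f"Field '{field}' not found")
    # bool is a subclass of int, so exclude it explicitly.
    supported = all(
        isinstance(v, (int, float, date)) and not isinstance(v, bool) for v in values
    )
    if not supported:
        raise ValueError(f"Field '{field}' doesn't support min/max operations")
    return {"min": min(values), "max": max(values)}


metas = [
    {"year": 2019, "published": date(2020, 1, 1), "category": "news"},
    {"year": 2023, "published": date(2021, 5, 1)},
    {"category": "blog"},
]
print(field_min_max(metas, "meta.year"))  # {'min': 2019, 'max': 2023}
print(field_min_max(metas, "published"))
```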

get_metadata_field_min_max_async

get_metadata_field_min_max_async(metadata_field: str) -> dict[str, Any]

Asynchronously returns the minimum and maximum values for a numeric or date metadata field.

Parameters:

metadata_field (str) – The metadata field name to get min/max for. Can be prefixed with 'meta.' (e.g., 'meta.year' or 'year').

Returns:

dict[str, Any] – A dictionary with 'min' and 'max' keys containing the respective values.

Raises:

ValueError – If the field is not found or doesn't support min/max operations.

count_unique_metadata_by_filter

count_unique_metadata_by_filter(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]

Returns the count of unique values for each specified metadata field.

Parameters:

filters (dict[str, Any]) – The filters to apply when counting unique values. For filter syntax, see Haystack metadata filtering.
metadata_fields (list[str]) – List of metadata field names to count unique values for. Field names can be prefixed with 'meta.' (e.g., 'meta.category' or 'category').

Returns:

dict[str, int] – A dictionary mapping field names to counts of unique values.

Raises:

ValueError – If any of the requested fields don't exist in the collection schema.

count_unique_metadata_by_filter_async

count_unique_metadata_by_filter_async(
    filters: dict[str, Any], metadata_fields: list[str]
) -> dict[str, int]

Asynchronously returns the count of unique values for each specified metadata field.

Parameters:

filters (dict[str, Any]) – The filters to apply when counting unique values. For filter syntax, see Haystack metadata filtering.
metadata_fields (list[str]) – List of metadata field names to count unique values for. Field names can be prefixed with 'meta.' (e.g., 'meta.category' or 'category').

Returns:

dict[str, int] – A dictionary mapping field names to counts of unique values.

Raises:

ValueError – If any of the requested fields don't exist in the collection schema.

get_metadata_field_unique_values

get_metadata_field_unique_values(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10000,
) -> tuple[list[str], int]

Returns unique values for a metadata field with pagination support.

Parameters:

metadata_field (str) – The metadata field name to get unique values for. Can be prefixed with 'meta.' (e.g., 'meta.category' or 'category').
search_term (str | None) – Optional term to filter documents by content before extracting unique values. If provided, only documents whose content contains this term will be considered. Note: Uses substring matching (case-sensitive, no stemming).
from_ (int) – The starting offset for pagination (0-indexed). Defaults to 0.
size (int) – The maximum number of unique values to return. Defaults to 10000.

Returns:

tuple[list[str], int] – A tuple of (list of unique values, total count of unique values).

Raises:

ValueError – If the field is not found in the collection schema.
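The pagination contract (from_ as the offset, size as the page length, plus the total count of unique values) can be sketched in memory. This is a hypothetical helper over plain dictionaries, not the store's query logic. Like the documented search_term, it uses case-sensitive substring matching on content. Sorting the values is an assumption of this sketch, made to get a stable page order.

```python
from typing import Any, Optional


def unique_values(
    docs: list[dict[str, Any]],
    metadata_field: str,
    search_term: Optional[str] = None,
    from_: int = 0,
    size: int = 10000,
) -> tuple[list[str], int]:
    """Hypothetical helper: one page of unique values plus the total count."""
    field = metadata_field.removeprefix("meta.")
    if not any(field in d.get("meta", {}) for d in docs):
        raise ValueError(f"Field '{field}' not found")
    # Optional case-sensitive substring filter on the document content.
    if search_term is not None:
        docs = [d for d in docs if search_term in (d.get("content") or "")]
    values = sorted({str(d["meta"][field]) for d in docs if field in d.get("meta", {})})
    return values[from_ : from_ + size], len(values)


docs = [
    {"content": "Pizza recipe", "meta": {"category": "food"}},
    {"content": "pizza history", "meta": {"category": "history"}},
    {"content": "Pasta recipe", "meta": {"category": "food"}},
    {"content": "Rome travel guide", "meta": {"category": "travel"}},
]
print(unique_values(docs, "meta.category", from_=0, size=2))  # (['food', 'history'], 3)
print(unique_values(docs, "category", search_term="Pizza"))  # (['food'], 1)
```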

get_metadata_field_unique_values_async

get_metadata_field_unique_values_async(
    metadata_field: str,
    search_term: str | None = None,
    from_: int = 0,
    size: int = 10000,
) -> tuple[list[str], int]

Asynchronously returns unique values for a metadata field with pagination support.

Parameters:

metadata_field (str) – The metadata field name to get unique values for. Can be prefixed with 'meta.' (e.g., 'meta.category' or 'category').
search_term (str | None) – Optional term to filter documents by content before extracting unique values. If provided, only documents whose content contains this term will be considered. Note: Uses substring matching (case-sensitive, no stemming).
from_ (int) – The starting offset for pagination (0-indexed). Defaults to 0.
size (int) – The maximum number of unique values to return. Defaults to 10000.

Returns:

tuple[list[str], int] – A tuple of (list of unique values, total count of unique values).

Raises:

ValueError – If the field is not found in the collection schema.

filter_documents

filter_documents(filters: dict[str, Any] | None = None) -> list[Document]

Returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the DocumentStore.filter_documents() protocol documentation.

Note: The contains filter operator is case-sensitive (substring matching). For case-insensitive matching, normalize the value before building the filter.

Parameters:

filters (dict[str, Any] | None) – The filters to apply to the document list.

Returns:

list[Document] – A list of Documents that match the given filters.
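For readers new to the Haystack filter syntax, here is a tiny in-memory evaluator for a subset of it: comparison operators plus AND, OR, and NOT. It only illustrates how a filter dictionary is structured and read. Weaviate evaluates filters server-side, and the `matches` function is hypothetical. Treating a missing field as "no match" is a simplification of this sketch.

```python
import operator
from typing import Any

COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Hypothetical evaluator for a subset of Haystack filters on a plain dict."""
    op = flt["operator"]
    if op in ("AND", "OR", "NOT"):
        results = [matches(doc, cond) for cond in flt["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        return not all(results)  # NOT negates the AND of its conditions
    # Comparison filter: resolve "meta.x" or a top-level field such as "content".
    value: Any = doc
    for key in flt["field"].split("."):
        if not isinstance(value, dict) or key not in value:
            return False  # simplification: a missing field never matches
        value = value[key]
    return COMPARISONS[op](value, flt["value"])


docs = [
    {"content": "A", "meta": {"category": "news", "year": 2021}},
    {"content": "B", "meta": {"category": "blog", "year": 2019}},
    {"content": "C", "meta": {"category": "news", "year": 2018}},
]
news_since_2020 = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
print([d["content"] for d in docs if matches(d, news_since_2020)])  # ['A']
```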

filter_documents_async

filter_documents_async(filters: dict[str, Any] | None = None) -> list[Document]

Asynchronously returns the documents that match the filters provided.

For a detailed specification of the filters, refer to the DocumentStore.filter_documents() protocol documentation.

Note: The contains filter operator is case-sensitive (substring matching). For case-insensitive matching, normalize the value before building the filter.

Parameters:

filters (dict[str, Any] | None) – The filters to apply to the document list.

Returns:

list[Document] – A list of Documents that match the given filters.

write_documents

write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int

Writes documents to Weaviate using the specified policy.

We recommend using the OVERWRITE policy because, for Weaviate, it's faster than the other policies: it uses the batch API. We can't use the batch API for the other policies because it doesn't report whether a document already exists. Without that information, we can't raise errors under the FAIL policy or skip Documents under the SKIP policy.

Parameters:

documents (list[Document]) – A list of documents to write into the document store.
policy (DuplicatePolicy) – DuplicatePolicy to apply when a document with the same ID already exists in the document store.

Returns:

int – The number of documents written.

Raises:

ValueError – When input is not valid.
DuplicateDocumentError – When duplicate documents are found and using a FAIL policy.
DocumentStoreError – When documents have failed to be batch written.
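The duplicate policies can be summarized with an in-memory sketch keyed by Document ID. It is hypothetical and doesn't model the batch API. It only shows the outcome each policy produces: OVERWRITE replaces, SKIP keeps the stored Document, and FAIL raises. DuplicatePolicy.NONE is not modeled here.

```python
from enum import Enum


class DuplicatePolicy(Enum):
    # Local stand-in for the Haystack enum members used in this sketch.
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class DuplicateDocumentError(Exception):
    """Local stand-in for Haystack's DuplicateDocumentError."""


def write(store: dict[str, str], docs: dict[str, str], policy: DuplicatePolicy) -> int:
    """Hypothetical in-memory writer keyed by Document ID; returns the number written."""
    duplicates = [doc_id for doc_id in docs if doc_id in store]
    # This sketch rejects the whole call up front; the real store may behave differently.
    if duplicates and policy is DuplicatePolicy.FAIL:
        raise DuplicateDocumentError(f"IDs already exist: {duplicates}")
    written = 0
    for doc_id, content in docs.items():
        if doc_id in store and policy is DuplicatePolicy.SKIP:
            continue  # keep the stored version
        store[doc_id] = content  # a new ID, or an OVERWRITE of an existing one
        written += 1
    return written


store = {"1": "old"}
print(write(store, {"1": "new", "2": "x"}, DuplicatePolicy.SKIP))  # 1
print(write(store, {"1": "new"}, DuplicatePolicy.OVERWRITE))  # 1
print(store)  # {'1': 'new', '2': 'x'}
```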

write_documents_async

write_documents_async(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int

Asynchronously writes documents to Weaviate using the specified policy.

Parameters:

documents (list[Document]) – A list of documents to write into the document store.
policy (DuplicatePolicy) – DuplicatePolicy to apply when a document with the same ID already exists in the document store.

Returns:

int – The number of documents written.

Raises:

ValueError – When input is not valid.
DuplicateDocumentError – When duplicate documents are found and using a FAIL policy.
DocumentStoreError – When documents have failed to be batch written.

delete_documents

delete_documents(document_ids: list[str]) -> None

Deletes all documents with matching document_ids from the DocumentStore.

Parameters:

document_ids (list[str]) – The IDs of the Documents to delete.

delete_documents_async

delete_documents_async(document_ids: list[str]) -> None

Asynchronously deletes all documents with matching document_ids from the DocumentStore.

Parameters:

document_ids (list[str]) – The IDs of the Documents to delete.

delete_all_documents

delete_all_documents(
    *, recreate_index: bool = False, batch_size: int = 1000
) -> None

Deletes all documents in a collection.

If recreate_index is False, it keeps the collection but deletes documents iteratively. If recreate_index is True, the collection is dropped and recreated with the same settings, which is recommended for performance reasons.

Parameters:

recreate_index (bool) – Whether to drop and recreate the collection instead of deleting documents in batches. Recommended for performance.
batch_size (int) – Only relevant if recreate_index is False. Defines the deletion batch size. This value must be less than or equal to the QUERY_MAXIMUM_RESULTS variable configured for the Weaviate deployment (default: 10000). Reference: https://docs.weaviate.io/weaviate/manage-objects/delete#delete-all-objects
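The iterative path (recreate_index=False) works by repeatedly fetching at most batch_size object IDs and deleting them until the collection is empty. The sketch below simulates that loop in memory. The batch_size guard is this sketch's own check, reflecting the QUERY_MAXIMUM_RESULTS constraint described above. All names are hypothetical.

```python
QUERY_MAXIMUM_RESULTS = 10000  # Weaviate's default, per the note above


def delete_all_iteratively(collection: list[str], batch_size: int = 1000) -> int:
    """Hypothetical simulation: delete in batches and return the number of batches."""
    if not 0 < batch_size <= QUERY_MAXIMUM_RESULTS:
        raise ValueError("batch_size must be between 1 and QUERY_MAXIMUM_RESULTS")
    batches = 0
    while collection:
        # Fetch up to batch_size object IDs, then delete exactly those.
        ids = set(collection[:batch_size])
        collection[:] = [obj_id for obj_id in collection if obj_id not in ids]
        batches += 1
    return batches


objects = [f"id-{i}" for i in range(2500)]
print(delete_all_iteratively(objects, batch_size=1000))  # 3
print(objects)  # []
```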

delete_all_documents_async

delete_all_documents_async(
    *, recreate_index: bool = False, batch_size: int = 1000
) -> None

Asynchronously deletes all documents in a collection.

Parameters:

recreate_index (bool) – Whether to drop and recreate the collection instead of deleting documents in batches. Recommended for performance.
batch_size (int) – Only relevant if recreate_index is False. Defines the deletion batch size. This value must be less than or equal to the QUERY_MAXIMUM_RESULTS variable configured for the Weaviate deployment (default: 10000). Reference: https://docs.weaviate.io/weaviate/manage-objects/delete#delete-all-objects

delete_by_filter

delete_by_filter(filters: dict[str, Any]) -> int

Deletes all documents that match the provided filters.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents for deletion. For filter syntax, see Haystack metadata filtering.

Returns:

int – The number of documents deleted.

delete_by_filter_async

delete_by_filter_async(filters: dict[str, Any]) -> int

Asynchronously deletes all documents that match the provided filters.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents for deletion. For filter syntax, see Haystack metadata filtering.

Returns:

int – The number of documents deleted.

update_by_filter

update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int

Updates the metadata of all documents that match the provided filters.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents for updating. For filter syntax, see Haystack metadata filtering.
meta (dict[str, Any]) – The metadata fields to update. These are merged with the existing metadata.

Returns:

int – The number of documents updated.
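Merging with the existing metadata means keys in meta overwrite matching keys, and all other existing keys are kept. Here is a minimal in-memory sketch of that behavior. A plain predicate stands in for the Haystack filter (a simplification), and the `update_by_predicate` helper is hypothetical.

```python
from typing import Any, Callable


def update_by_predicate(
    docs: list[dict[str, Any]],
    predicate: Callable[[dict[str, Any]], bool],
    meta: dict[str, Any],
) -> int:
    """Hypothetical helper: merge `meta` into each matching document's metadata."""
    updated = 0
    for doc in docs:
        if predicate(doc):
            # New keys are added, matching keys are overwritten, others are kept.
            doc["meta"] = {**doc.get("meta", {}), **meta}
            updated += 1
    return updated


docs = [
    {"id": "1", "meta": {"category": "news", "status": "draft"}},
    {"id": "2", "meta": {"category": "blog", "status": "draft"}},
]
n = update_by_predicate(docs, lambda d: d["meta"].get("category") == "news", {"status": "published"})
print(n, docs[0]["meta"])  # 1 {'category': 'news', 'status': 'published'}
```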

update_by_filter_async

update_by_filter_async(filters: dict[str, Any], meta: dict[str, Any]) -> int

Asynchronously updates the metadata of all documents that match the provided filters.

Parameters:

filters (dict[str, Any]) – The filters to apply to select documents for updating. For filter syntax, see Haystack metadata filtering.
meta (dict[str, Any]) – The metadata fields to update. These are merged with the existing metadata.

Returns:

int – The number of documents updated.
