haystack/docs-website/reference/integrations-api/vespa.md at 364bc68340f0200ee66f49fa8ea6d62d28edce36 · deepset-ai/haystack

title	Vespa
id	integrations-vespa
description	Vespa integration for Haystack
slug	/integrations-vespa

haystack_integrations.components.retrievers.vespa.embedding_retriever

VespaEmbeddingRetriever

Retrieve documents from Vespa using dense vector similarity.

__init__

__init__(
    *,
    document_store: VespaDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    ranking: str | None = DEFAULT_SEMANTIC_RANKING,
    query_tensor_name: str = "query_embedding",
    target_hits: int | None = None
) -> None

Create a Vespa embedding retriever.

Parameters:

document_store (VespaDocumentStore) – Configured VespaDocumentStore for your application, for example VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc") aligned with your Vespa schema. See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
filters (dict[str, Any] | None) – Optional static Haystack metadata filters applied on each retrieval unless overridden in run, for example {"field": "meta.category", "operator": "==", "value": "news"}. See https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
top_k (int) – Default maximum number of documents to return per query (for example 10).
ranking (str | None) – Vespa rank profile used after nearest-neighbor retrieval, for example semantic for a profile that scores with closeness(field, embedding). Defaults to semantic. Pass None to use the schema default profile. See https://docs.vespa.ai/en/basics/ranking.html.
query_tensor_name (str) – Name of the query tensor in YQL and in input.query(...) in your rank profile. For example query_embedding matches the default semantic profile. See https://docs.vespa.ai/en/nearest-neighbor-search.html.
target_hits (int | None) – Optional nearest-neighbor targetHits value, for example 10 or 100: how many neighbors are considered per content node before first-phase ranking. See https://docs.vespa.ai/en/nearest-neighbor-search.html.
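
To make ranking, query_tensor_name, and target_hits more concrete, the following is a minimal sketch of the kind of query request a nearest-neighbor retriever could send to Vespa. It is not the integration's actual code: the helper name build_nn_query, the fallback from target_hits to top_k, and the exact request layout are assumptions. The request keys (yql, hits, ranking.profile, input.query(...)) come from the public Vespa query API, and the schema and field names mirror the defaults documented for VespaDocumentStore.

```python
from __future__ import annotations

from typing import Any


def build_nn_query(
    query_embedding: list[float],
    *,
    schema: str = "doc",
    embedding_field: str = "embedding",
    query_tensor_name: str = "query_embedding",
    ranking: str | None = "semantic",
    top_k: int = 10,
    target_hits: int | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a Vespa query body for nearest-neighbor search."""
    # Assumption of this sketch: fall back to top_k when target_hits is not given.
    hits_per_node = target_hits if target_hits is not None else top_k
    yql = (
        f"select * from {schema} where "
        f"{{targetHits: {hits_per_node}}}"
        f"nearestNeighbor({embedding_field}, {query_tensor_name})"
    )
    body: dict[str, Any] = {
        "yql": yql,
        "hits": top_k,
        # The tensor name must match input.query(<name>) in the rank profile.
        f"input.query({query_tensor_name})": query_embedding,
    }
    if ranking is not None:
        # Leaving the key out lets Vespa pick the schema's default rank profile.
        body["ranking.profile"] = ranking
    return body


body = build_nn_query([0.1, 0.2, 0.3], target_hits=100)
print(body["yql"])
```

If query_tensor_name does not match the name declared in the rank profile's inputs, Vespa cannot bind the query vector, so the two need to be changed together.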

Raises:

ValueError – If document_store is not an instance of VespaDocumentStore.

run

run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]

Retrieve documents from Vespa.

Parameters:

query_embedding (list[float]) – Dense query embedding.
filters (dict[str, Any] | None) – Filters for this query. When provided, they override the filters set at initialization.
top_k (int | None) – Maximum number of documents to return. When None, the top_k set at initialization applies.

Returns:

dict[str, list[Document]] – Retrieved documents.

haystack_integrations.components.retrievers.vespa.keyword_retriever

VespaKeywordRetriever

Retrieve documents from Vespa using lexical search.

__init__

__init__(
    *,
    document_store: VespaDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    ranking: str | None = DEFAULT_BM25_RANKING
) -> None

Create a Vespa keyword retriever.

Parameters:

document_store (VespaDocumentStore) – Configured VespaDocumentStore for your application, for example VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc") so it matches the deployed schema and endpoint. See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
filters (dict[str, Any] | None) – Optional static Haystack metadata filters applied on each retrieval unless overridden in run, for example {"field": "meta.category", "operator": "==", "value": "news"}. See https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
top_k (int) – Default maximum number of documents to return per query (for example 10).
ranking (str | None) – Vespa rank profile for lexical matches, for example bm25 for a profile that uses bm25(content). Defaults to bm25. Pass None to use the schema default. See https://docs.vespa.ai/en/basics/ranking.html.
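
As an illustration of how the ranking and top_k parameters could map onto a lexical Vespa query, here is a minimal sketch. The helper build_keyword_query is hypothetical and is not taken from the integration; it assumes the query text is passed through Vespa's userQuery() operator and the query request parameter, which is one common way to run a text search in Vespa.

```python
from __future__ import annotations

from typing import Any


def build_keyword_query(
    query: str,
    *,
    schema: str = "doc",
    ranking: str | None = "bm25",
    top_k: int = 10,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a Vespa query body for lexical search."""
    body: dict[str, Any] = {
        # userQuery() tells Vespa to parse the text given in the "query" parameter.
        "yql": f"select * from {schema} where userQuery()",
        "query": query,
        "hits": top_k,
    }
    if ranking is not None:
        # Leaving the key out lets Vespa pick the schema's default rank profile.
        body["ranking.profile"] = ranking
    return body


keyword_body = build_keyword_query("vector databases", top_k=5)
print(keyword_body)
```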

Raises:

ValueError – If document_store is not an instance of VespaDocumentStore.

run

run(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]

Retrieve documents from Vespa.

Parameters:

query (str) – Query text.
filters (dict[str, Any] | None) – Filters for this query. When provided, they override the filters set at initialization.
top_k (int | None) – Maximum number of documents to return. When None, the top_k set at initialization applies.

Returns:

dict[str, list[Document]] – Retrieved documents.

haystack_integrations.document_stores.vespa.document_store

VespaDocumentStore

Document store backed by an existing Vespa application.

__init__

__init__(
    *,
    url: str | None = None,
    port: int = 8080,
    cert: Secret | None = None,
    key: Secret | None = None,
    vespa_cloud_secret_token: Secret | None = None,
    additional_headers: dict[str, str] | None = None,
    content_cluster_name: str = "content",
    schema: str = "doc",
    namespace: str | None = None,
    groupname: str | None = None,
    content_field: str = "content",
    embedding_field: str = "embedding",
    id_field: str = "id",
    metadata_fields: list[str] | None = None,
    query_limit: int = DEFAULT_QUERY_LIMIT
) -> None

Create a new Vespa document store.

Parameters:

url (str | None) – Vespa endpoint base URL. If omitted, the VESPA_URL environment variable is used.
port (int) – Vespa HTTP port.
cert (Secret | None) – Secret resolving to the data plane certificate file path for mTLS authentication.
key (Secret | None) – Secret resolving to the data plane key file path for mTLS authentication.
vespa_cloud_secret_token (Secret | None) – Vespa Cloud data plane secret token for token authentication. If omitted, the VESPA_CLOUD_SECRET_TOKEN environment variable is used when set, matching pyvespa.
additional_headers (dict[str, str] | None) – Additional headers to send to the Vespa application.
content_cluster_name (str) – Vespa content cluster name.
schema (str) – Vespa schema name to read from and write to.
namespace (str | None) – Vespa namespace. Defaults to the schema name when omitted.
groupname (str | None) – Optional Vespa group name.
content_field (str) – Vespa field containing the document text.
embedding_field (str) – Vespa field containing the dense embedding.
id_field (str) – Optional Vespa field containing the document id in query responses. Vespa document IDs are always written via data_id. If this field is missing in the schema or summaries, the integration falls back to parsing the Vespa document path.
metadata_fields (list[str] | None) – Optional allowlist of metadata fields to feed and return.
query_limit (int) – Maximum number of documents returned by bulk queries. Defaults to 400 to stay within Vespa's common query hit limit unless explicitly overridden.
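
Two of the fallbacks described above can be sketched in a few lines: resolving url from the VESPA_URL environment variable and defaulting namespace to the schema name, and recovering a document id from a Vespa document id string when the id field is not available. Both helpers are hypothetical and only illustrate the documented behavior. Raising ValueError when no URL can be found is an assumption, and so is the choice to parse the id:<namespace>:<document-type>:<key/value-pairs>:<user-specified> form, which is Vespa's standard document id layout.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_store_settings(
    url: str | None = None,
    schema: str = "doc",
    namespace: str | None = None,
    environ: Mapping[str, str] | None = None,
) -> dict[str, str]:
    """Hypothetical helper: apply the documented url and namespace fallbacks."""
    env = os.environ if environ is None else environ
    resolved_url = url or env.get("VESPA_URL")
    if not resolved_url:
        # Assumption: the real store may signal a missing URL differently.
        raise ValueError("Provide url or set the VESPA_URL environment variable.")
    # namespace defaults to the schema name when omitted.
    return {"url": resolved_url, "schema": schema, "namespace": namespace or schema}


def id_from_vespa_document_id(vespa_id: str) -> str:
    """Hypothetical helper: return the user-specified part of a Vespa document id."""
    # Split at most four times so that colons inside the user id are preserved.
    parts = vespa_id.split(":", 4)
    if len(parts) != 5 or parts[0] != "id":
        raise ValueError(f"Not a Vespa document id: {vespa_id!r}")
    return parts[4]


settings = resolve_store_settings(environ={"VESPA_URL": "http://localhost"})
print(settings)
print(id_from_vespa_document_id("id:doc:doc::abc:1"))
```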

app

app: Any

Return the underlying pyvespa Vespa HTTP client.

It is built from this store's url, port, and authentication settings (cert, key, vespa_cloud_secret_token, additional_headers) so mTLS, bearer token, and custom headers from the constructor (or environment) are applied.

to_dict

to_dict() -> dict[str, Any]

Serialize the document store to a dictionary.

Uses the same init-parameter names as __init__ and default_to_dict, so nested serialization stays aligned with Haystack's default component serialization.

Returns:

dict[str, Any] – Serialized document store data.
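
For orientation, Haystack's default serialization produces a dictionary with a type entry (the qualified class name) and an init_parameters entry. The sketch below reproduces that shape with a stand-in class; FakeStore is hypothetical, carries only three of the real constructor arguments, and its from_dict is included only to show the round trip, not to describe the real store's API.

```python
from __future__ import annotations

from typing import Any


class FakeStore:
    """Stand-in for illustration only; not the real VespaDocumentStore."""

    def __init__(
        self, *, url: str | None = None, schema: str = "doc", namespace: str | None = None
    ) -> None:
        self.url = url
        self.schema = schema
        self.namespace = namespace

    def to_dict(self) -> dict[str, Any]:
        # Keys under init_parameters match the __init__ argument names,
        # so the dictionary can be passed straight back to the constructor.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {
                "url": self.url,
                "schema": self.schema,
                "namespace": self.namespace,
            },
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "FakeStore":
        return cls(**data["init_parameters"])


data = FakeStore(url="http://localhost", schema="doc").to_dict()
restored = FakeStore.from_dict(data)
print(data["init_parameters"])
```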

count_documents

count_documents() -> int

Return the total number of documents in Vespa.

Returns:

int – Document count.

count_documents_by_filter

count_documents_by_filter(filters: dict[str, Any]) -> int

Return the number of documents matching the provided filters.

Parameters:

filters (dict[str, Any]) – Haystack metadata filters.

Returns:

int – Count of matching documents.
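
The filters argument uses Haystack's metadata filter syntax: comparison conditions with field, operator, and value, optionally nested under a logical operator with a conditions list. The sketch below evaluates such filters against plain metadata dictionaries to show how a count is determined. It is an in-memory illustration only; the matches and count_by_filter helpers are hypothetical, the real store evaluates filters inside Vespa, and this sketch simplifies edge cases (a missing field never matches here).

```python
from __future__ import annotations

import operator
from typing import Any, Callable

COMPARATORS: dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda left, right: left in right,
    "not in": lambda left, right: left not in right,
}


def matches(meta: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Hypothetical helper: evaluate a Haystack-style filter against one metadata dict."""
    op = filters["operator"]
    if op in ("AND", "OR", "NOT"):
        results = [matches(meta, condition) for condition in filters["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        # NOT is treated here as "not all conditions hold".
        return not all(results)
    name = filters["field"].removeprefix("meta.")
    if name not in meta:
        return False  # simplification of this sketch
    return COMPARATORS[op](meta[name], filters["value"])


def count_by_filter(metas: list[dict[str, Any]], filters: dict[str, Any]) -> int:
    return sum(1 for meta in metas if matches(meta, filters))


metas = [
    {"category": "news", "year": 2023},
    {"category": "news", "year": 2021},
    {"category": "blog", "year": 2024},
]
recent_news = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.year", "operator": ">=", "value": 2022},
    ],
}
print(count_by_filter(metas, recent_news))
```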

write_documents

write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int

Write documents to Vespa.

Parameters:

documents (list[Document]) – Documents to store.
policy (DuplicatePolicy) – Duplicate handling policy.

Returns:

int – Number of documents written.
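
The policy argument decides what happens when a document with the same id already exists. The sketch below models the four Haystack DuplicatePolicy values against a plain dictionary so the effect on the returned count is visible. Everything here is a stand-in: the enum is redefined locally, ValueError replaces Haystack's duplicate-document error, and treating NONE like OVERWRITE is purely an assumption of this sketch, since this page does not say which behavior the Vespa store picks for NONE.

```python
from __future__ import annotations

from enum import Enum


class DuplicatePolicy(Enum):
    """Local stand-in mirroring the names of Haystack's DuplicatePolicy."""

    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def write(
    store: dict[str, str],
    docs: list[tuple[str, str]],
    policy: DuplicatePolicy = DuplicatePolicy.NONE,
) -> int:
    """Hypothetical in-memory writer: returns the number of documents written."""
    if policy is DuplicatePolicy.NONE:
        # Assumption of this sketch only: fall back to overwriting.
        policy = DuplicatePolicy.OVERWRITE
    written = 0
    for doc_id, content in docs:
        if doc_id in store:
            if policy is DuplicatePolicy.SKIP:
                continue  # keep the existing document, do not count it
            if policy is DuplicatePolicy.FAIL:
                raise ValueError(f"Document {doc_id!r} already exists.")
        store[doc_id] = content
        written += 1
    return written


store = {"a": "old"}
skipped = write(store, [("a", "new"), ("b", "first")], DuplicatePolicy.SKIP)
overwritten = write(store, [("a", "new")], DuplicatePolicy.OVERWRITE)
print(skipped, overwritten, store)
```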

delete_documents

delete_documents(document_ids: list[str]) -> None

Delete documents by id.

Parameters:

document_ids (list[str]) – Document ids to delete.

delete_all_documents

delete_all_documents() -> None

Delete all documents for this store's schema, namespace, and content cluster.

Implemented with pyvespa Vespa.delete_all_docs (Document V1 bulk delete).

delete_by_filter

delete_by_filter(filters: dict[str, Any]) -> int

Delete all documents matching the provided filters.

Parameters:

filters (dict[str, Any]) – Haystack metadata filters.

Returns:

int – Number of deleted documents.

update_by_filter

update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int

Update metadata fields for documents matching the provided filters.

Parameters:

filters (dict[str, Any]) – Haystack metadata filters.
meta (dict[str, Any]) – Metadata values to merge into the matched documents.

Returns:

int – Number of updated documents.

get_documents_by_id

get_documents_by_id(document_ids: list[str]) -> list[Document]

Retrieve documents by their ids.

Parameters:

document_ids (list[str]) – Document ids to fetch.

Returns:

list[Document] – Matching documents.

filter_documents

filter_documents(filters: dict[str, Any] | None = None) -> list[Document]

Retrieve documents matching the provided filters.

Parameters:

filters (dict[str, Any] | None) – Haystack metadata filters.

Returns:

list[Document] – Matching documents.

get_metadata_fields_info

get_metadata_fields_info() -> dict[str, dict[str, str]]

Return best-effort metadata field information based on configured fields.

Returns:

dict[str, dict[str, str]] – Field metadata information.
