| title | Vespa |
|---|---|
| id | integrations-vespa |
| description | Vespa integration for Haystack |
| slug | /integrations-vespa |
Retrieve documents from Vespa using dense vector similarity.
__init__(
*,
document_store: VespaDocumentStore,
filters: dict[str, Any] | None = None,
top_k: int = 10,
ranking: str | None = DEFAULT_SEMANTIC_RANKING,
query_tensor_name: str = "query_embedding",
target_hits: int | None = None
) -> None

Create a Vespa embedding retriever.

Parameters:
- document_store (`VespaDocumentStore`) – Configured `VespaDocumentStore` for your application, for example `VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc")`, aligned with your Vespa schema. See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
- filters (`dict[str, Any] | None`) – Optional static Haystack metadata filters applied on each retrieval unless overridden in `run()`, for example `{"field": "meta.category", "operator": "==", "value": "news"}`. See https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
- top_k (`int`) – Default maximum number of documents to return per query (for example `10`).
- ranking (`str | None`) – Vespa rank profile used after nearest-neighbor retrieval, for example `semantic` for a profile that scores with `closeness(field, embedding)`. Defaults to `semantic`. Pass `None` to use the schema default profile. See https://docs.vespa.ai/en/basics/ranking.html.
- query_tensor_name (`str`) – Name of the query tensor in YQL and in `input.query(...)` in your rank profile. For example, `query_embedding` matches the default `semantic` profile. See https://docs.vespa.ai/en/nearest-neighbor-search.html.
- target_hits (`int | None`) – Optional nearest-neighbor `targetHits` value, for example `10` or `100`: how many neighbors are considered per content node before first-phase ranking. See https://docs.vespa.ai/en/nearest-neighbor-search.html.

Raises:
- `ValueError` – If `document_store` is not an instance of `VespaDocumentStore`.
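
To make the roles of `ranking`, `query_tensor_name`, and `target_hits` more concrete, the following sketch builds the kind of Vespa query body these parameters plausibly map onto. It is an illustration only: the helper `build_nn_query_body` is hypothetical and is not the integration's code, and the fallback of `targetHits` to `top_k` is an assumption. The request keys (`yql`, `hits`, `ranking`, `input.query(...)`) follow Vespa's query API.

```python
from typing import Any, Optional


def build_nn_query_body(
    query_embedding: list[float],
    *,
    schema: str = "doc",
    embedding_field: str = "embedding",
    query_tensor_name: str = "query_embedding",
    ranking: Optional[str] = "semantic",
    top_k: int = 10,
    target_hits: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a Vespa nearest-neighbor query body."""
    # Assumption: when target_hits is not given, reuse top_k for targetHits.
    hits_per_node = target_hits if target_hits is not None else top_k
    yql = (
        f"select * from {schema} where "
        f"{{targetHits:{hits_per_node}}}"
        f"nearestNeighbor({embedding_field}, {query_tensor_name})"
    )
    body: dict[str, Any] = {
        "yql": yql,
        "hits": top_k,
        # The tensor name must match input.query(...) in the rank profile.
        f"input.query({query_tensor_name})": query_embedding,
    }
    # ranking=None leaves the key out so the schema default profile applies.
    if ranking is not None:
        body["ranking"] = ranking
    return body


body = build_nn_query_body([0.1, 0.2], top_k=5, target_hits=100)
```

Renaming `query_tensor_name` changes both the YQL operand and the `input.query(...)` key, which is why it has to agree with the rank profile in the deployed schema.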
run(
query_embedding: list[float],
filters: dict[str, Any] | None = None,
top_k: int | None = None,
) -> dict[str, list[Document]]

Retrieve documents from Vespa.

Parameters:
- query_embedding (`list[float]`) – Dense query embedding.
- filters (`dict[str, Any] | None`) – Filters applied when fetching documents from the Document Store. When set, they override the filters passed at initialization.
- top_k (`int | None`) – Maximum number of documents to return. If `None`, the `top_k` set at initialization is used.

Returns:
- `dict[str, list[Document]]` – Retrieved documents.
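
The relationship between the static settings from `__init__` and the per-call arguments of `run` can be sketched as a simple fallback. The helper `resolve_run_args` below is hypothetical, and it assumes that run-time filters replace the static filters instead of being merged with them.

```python
from typing import Any, Optional


def resolve_run_args(
    init_filters: Optional[dict[str, Any]],
    init_top_k: int,
    run_filters: Optional[dict[str, Any]] = None,
    run_top_k: Optional[int] = None,
) -> tuple[Optional[dict[str, Any]], int]:
    """Hypothetical helper: pick the effective filters and top_k for one call."""
    # Assumption: run-time filters fully replace the static ones.
    filters = run_filters if run_filters is not None else init_filters
    # top_k=None at run time means "use the default from __init__".
    top_k = run_top_k if run_top_k is not None else init_top_k
    return filters, top_k


static_filters = {"field": "meta.category", "operator": "==", "value": "news"}
per_call_filters = {"field": "meta.category", "operator": "==", "value": "sports"}

defaults = resolve_run_args(static_filters, 10)
overridden = resolve_run_args(static_filters, 10, per_call_filters, 3)
```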
Retrieve documents from Vespa using lexical search.
__init__(
*,
document_store: VespaDocumentStore,
filters: dict[str, Any] | None = None,
top_k: int = 10,
ranking: str | None = DEFAULT_BM25_RANKING
) -> None

Create a Vespa keyword retriever.

Parameters:
- document_store (`VespaDocumentStore`) – Configured `VespaDocumentStore` for your application, for example `VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc")`, so it matches the deployed schema and endpoint. See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
- filters (`dict[str, Any] | None`) – Optional static Haystack metadata filters applied on each retrieval unless overridden in `run()`, for example `{"field": "meta.category", "operator": "==", "value": "news"}`. See https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
- top_k (`int`) – Default maximum number of documents to return per query (for example `10`).
- ranking (`str | None`) – Vespa rank profile for lexical matches, for example `bm25` for a profile that uses `bm25(content)`. Defaults to `bm25`. Pass `None` to use the schema default. See https://docs.vespa.ai/en/basics/ranking.html.

Raises:
- `ValueError` – If `document_store` is not an instance of `VespaDocumentStore`.
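
For comparison with the embedding retriever, here is a rough sketch of a lexical query body using Vespa's `userQuery()` operator together with the `query` request parameter. The helper `build_keyword_query_body` is hypothetical; the integration may construct its YQL differently.

```python
from typing import Any, Optional


def build_keyword_query_body(
    query: str,
    *,
    schema: str = "doc",
    ranking: Optional[str] = "bm25",
    top_k: int = 10,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a Vespa lexical (text) query body."""
    body: dict[str, Any] = {
        # userQuery() tells Vespa to parse the text from the "query" parameter.
        "yql": f"select * from {schema} where userQuery()",
        "query": query,
        "hits": top_k,
    }
    # ranking=None leaves the key out so the schema default profile applies.
    if ranking is not None:
        body["ranking"] = ranking
    return body


keyword_body = build_keyword_query_body("vespa ranking", top_k=3)
```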
run(
query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]

Retrieve documents from Vespa.

Parameters:
- query (`str`) – Query text.
- filters (`dict[str, Any] | None`) – Filters applied when fetching documents from the Document Store. When set, they override the filters passed at initialization.
- top_k (`int | None`) – Maximum number of documents to return. If `None`, the `top_k` set at initialization is used.

Returns:
- `dict[str, list[Document]]` – Retrieved documents.
Document store backed by an existing Vespa application.
__init__(
*,
url: str | None = None,
port: int = 8080,
cert: Secret | None = None,
key: Secret | None = None,
vespa_cloud_secret_token: Secret | None = None,
additional_headers: dict[str, str] | None = None,
content_cluster_name: str = "content",
schema: str = "doc",
namespace: str | None = None,
groupname: str | None = None,
content_field: str = "content",
embedding_field: str = "embedding",
id_field: str = "id",
metadata_fields: list[str] | None = None,
query_limit: int = DEFAULT_QUERY_LIMIT
) -> None

Create a new Vespa document store.

Parameters:
- url (`str | None`) – Vespa endpoint base URL. If omitted, the `VESPA_URL` environment variable is used.
- port (`int`) – Vespa HTTP port.
- cert (`Secret | None`) – Secret resolving to the data plane certificate file path for mTLS authentication.
- key (`Secret | None`) – Secret resolving to the data plane key file path for mTLS authentication.
- vespa_cloud_secret_token (`Secret | None`) – Vespa Cloud data plane secret token for token authentication. If omitted, the `VESPA_CLOUD_SECRET_TOKEN` environment variable is used when set, matching pyvespa.
- additional_headers (`dict[str, str] | None`) – Additional headers to send to the Vespa application.
- content_cluster_name (`str`) – Vespa content cluster name.
- schema (`str`) – Vespa schema name to read from and write to.
- namespace (`str | None`) – Vespa namespace. Defaults to the schema name when omitted.
- groupname (`str | None`) – Optional Vespa group name.
- content_field (`str`) – Vespa field containing the document text.
- embedding_field (`str`) – Vespa field containing the dense embedding.
- id_field (`str`) – Optional Vespa field containing the document id in query responses. Vespa document IDs are always written via `data_id`. If this field is missing in the schema or summaries, the integration falls back to parsing the Vespa document path.
- metadata_fields (`list[str] | None`) – Optional allowlist of metadata fields to feed and return.
- query_limit (`int`) – Maximum number of documents returned by bulk queries. Defaults to 400 to stay within Vespa's common query hit limit unless explicitly overridden.
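
Two of the defaults described above, the `VESPA_URL` fallback for `url` and the schema-name fallback for `namespace`, can be sketched as follows. The helper `resolve_store_settings` and its `env` parameter are hypothetical, and raising `ValueError` when no URL can be found is an assumption, not documented behavior.

```python
import os
from typing import Mapping, Optional


def resolve_store_settings(
    url: Optional[str] = None,
    schema: str = "doc",
    namespace: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Hypothetical helper: apply the url and namespace defaults."""
    # env is injectable for testing; by default read the process environment.
    env = os.environ if env is None else env
    resolved_url = url if url is not None else env.get("VESPA_URL")
    if resolved_url is None:
        # Assumption: a missing URL is treated as a configuration error.
        raise ValueError("No Vespa URL: pass url=... or set VESPA_URL")
    return {
        "url": resolved_url,
        "schema": schema,
        # The namespace defaults to the schema name when omitted.
        "namespace": namespace if namespace is not None else schema,
    }


from_env = resolve_store_settings(env={"VESPA_URL": "http://localhost"})
explicit = resolve_store_settings(
    url="http://vespa.internal", namespace="articles", env={"VESPA_URL": "http://localhost"}
)
```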
app: Any

Return the underlying pyvespa `Vespa` HTTP client.
It is built from this store's `url`, `port`, and authentication settings
(`cert`, `key`, `vespa_cloud_secret_token`, `additional_headers`) so mTLS, bearer token,
and custom headers from the constructor (or environment) are applied.
to_dict() -> dict[str, Any]

Serialize the document store to a dictionary.
Uses the same init-parameter names as `__init__` and `default_to_dict` so nested serialization stays
aligned with Haystack's default component serialization.

Returns:
- `dict[str, Any]` – Serialized document store data.
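
As a rough picture of the output shape, Haystack's `default_to_dict` produces a dictionary with a qualified `type` name and an `init_parameters` mapping. The standard-library sketch below mimics that shape; `default_to_dict_sketch` and `FakeStore` are hypothetical stand-ins, not the real helper or store, and only a few init parameters are shown.

```python
from typing import Any


def default_to_dict_sketch(obj: object, **init_parameters: Any) -> dict[str, Any]:
    """Stand-in that mimics the shape produced by Haystack's default_to_dict."""
    cls = type(obj)
    return {
        "type": f"{cls.__module__}.{cls.__name__}",
        "init_parameters": init_parameters,
    }


class FakeStore:
    """Hypothetical store with a small subset of the documented init parameters."""

    def __init__(self, url: str, schema: str = "doc", query_limit: int = 400) -> None:
        self.url = url
        self.schema = schema
        self.query_limit = query_limit

    def to_dict(self) -> dict[str, Any]:
        # Keys mirror the __init__ parameter names so the dict can rebuild the object.
        return default_to_dict_sketch(
            self, url=self.url, schema=self.schema, query_limit=self.query_limit
        )


data = FakeStore("http://localhost").to_dict()
restored = FakeStore(**data["init_parameters"])
```

Keeping the keys identical to the `__init__` parameter names is what lets the dictionary be passed straight back to the constructor.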
count_documents() -> int

Return the total number of documents in Vespa.

Returns:
- `int` – Document count.
count_documents_by_filter(filters: dict[str, Any]) -> int

Return the number of documents matching the provided filters.

Parameters:
- filters (`dict[str, Any]`) – Haystack metadata filters.

Returns:
- `int` – Count of matching documents.
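
The filter-based methods all take Haystack metadata filters, which are plain nested dictionaries. The sketch below composes a comparison filter and a logical `AND` filter in that syntax; the helpers `comparison` and `logical` and the `meta.year` field are hypothetical and used only for illustration.

```python
from typing import Any


def comparison(field: str, operator: str, value: Any) -> dict[str, Any]:
    """Hypothetical helper: build a single Haystack comparison filter."""
    return {"field": field, "operator": operator, "value": value}


def logical(operator: str, *conditions: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: combine filters with AND, OR, or NOT."""
    return {"operator": operator, "conditions": list(conditions)}


# Matches documents in the "news" category from 2023 onward.
news_since_2023 = logical(
    "AND",
    comparison("meta.category", "==", "news"),
    comparison("meta.year", ">=", 2023),
)
```

A dictionary like `news_since_2023` is the kind of value expected by the `filters` parameters documented in this reference.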
write_documents(
documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int

Write documents to Vespa.

Parameters:
- documents (`list[Document]`) – Documents to store.
- policy (`DuplicatePolicy`) – Duplicate handling policy.

Returns:
- `int` – Number of documents written.
delete_documents(document_ids: list[str]) -> None

Delete documents by id.

Parameters:
- document_ids (`list[str]`) – Document ids to delete.
delete_all_documents() -> None

Delete all documents for this store's schema, namespace, and content cluster.
Implemented with pyvespa `Vespa.delete_all_docs` (Document V1 bulk delete).
delete_by_filter(filters: dict[str, Any]) -> int

Delete all documents matching the provided filters.

Parameters:
- filters (`dict[str, Any]`) – Haystack metadata filters.

Returns:
- `int` – Number of deleted documents.
update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int

Update metadata fields for documents matching the provided filters.

Parameters:
- filters (`dict[str, Any]`) – Haystack metadata filters.
- meta (`dict[str, Any]`) – Metadata values to merge into the matched documents.

Returns:
- `int` – Number of updated documents.
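
The phrase "merge into the matched documents" can be read as a key-by-key update of each document's metadata. The sketch below shows that reading under two assumptions that the reference does not confirm: the merge is shallow, and values in `meta` win over existing values. The helper `merge_meta` and the example fields are hypothetical.

```python
from typing import Any


def merge_meta(existing: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: shallow-merge updates into existing metadata."""
    merged = dict(existing)  # copy so the original metadata is left untouched
    merged.update(updates)   # assumption: new values overwrite existing keys
    return merged


before = {"category": "news", "reviewed": False}
after = merge_meta(before, {"reviewed": True, "reviewer": "editor-1"})
```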
get_documents_by_id(document_ids: list[str]) -> list[Document]

Retrieve documents by their ids.

Parameters:
- document_ids (`list[str]`) – Document ids to fetch.

Returns:
- `list[Document]` – Matching documents.
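
The `id_field` description above mentions a fallback that recovers the document id from Vespa's own identifier. As a minimal sketch, assuming the fallback reads a full Vespa document ID of the form `id:<namespace>:<document-type>:<key-values>:<user-id>`, the user-specified part can be extracted like this. The function `parse_vespa_document_id` is hypothetical and may not match how the integration does it.

```python
def parse_vespa_document_id(full_id: str) -> str:
    """Hypothetical helper: return the user-specified part of a Vespa document ID."""
    # Split into at most 5 parts so colons inside the user id are preserved.
    parts = full_id.split(":", 4)
    if len(parts) != 5 or parts[0] != "id":
        raise ValueError(f"Not a Vespa document ID: {full_id!r}")
    # parts: ["id", namespace, document type, key-values (may be empty), user id]
    return parts[4]


plain_id = parse_vespa_document_id("id:doc:doc::abc-123")
grouped_id = parse_vespa_document_id("id:doc:doc:g=tenant1:a:b")
```

The fourth segment is empty for ordinary documents and carries `g=<group>` when a group name is used, which is where the `groupname` setting would appear.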
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]

Retrieve documents matching the provided filters.

Parameters:
- filters (`dict[str, Any] | None`) – Haystack metadata filters.

Returns:
- `list[Document]` – Matching documents.
get_metadata_fields_info() -> dict[str, dict[str, str]]

Return best-effort metadata field information based on configured fields.

Returns:
- `dict[str, dict[str, str]]` – Field metadata information.