---
title: "Vespa"
id: integrations-vespa
description: "Vespa integration for Haystack"
slug: "/integrations-vespa"
---

## haystack_integrations.components.retrievers.vespa.embedding_retriever

### VespaEmbeddingRetriever

Retrieve documents from Vespa using dense vector similarity.

#### __init__

```python
__init__(
    *,
    document_store: VespaDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    ranking: str | None = DEFAULT_SEMANTIC_RANKING,
    query_tensor_name: str = "query_embedding",
    target_hits: int | None = None
) -> None
```

Create a Vespa embedding retriever.

**Parameters:**

- **document_store** (<code>VespaDocumentStore</code>) – Configured `VespaDocumentStore` for your application, for example
  `VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc")`, aligned with your
  Vespa schema. See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
- **filters** (<code>dict\[str, Any\] | None</code>) – Optional static Haystack metadata filters applied to each retrieval unless overridden in
  `run`, for example `{"field": "meta.category", "operator": "==", "value": "news"}`. See
  https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
- **top_k** (<code>int</code>) – Default maximum number of documents to return per query (for example `10`).
- **ranking** (<code>str | None</code>) – Vespa rank profile used after nearest-neighbor retrieval, for example `semantic` for a
  profile that scores with `closeness(field, embedding)`. Defaults to `semantic`. Pass `None` to use the
  schema's default rank profile. See https://docs.vespa.ai/en/basics/ranking.html.
- **query_tensor_name** (<code>str</code>) – Name of the query tensor used in the YQL `nearestNeighbor` operator, sent as
  `input.query(...)` and declared as `query(...)` in your rank profile's inputs. The default, `query_embedding`,
  matches the default `semantic` profile. See https://docs.vespa.ai/en/nearest-neighbor-search.html.
- **target_hits** (<code>int | None</code>) – Optional nearest-neighbor `targetHits` value, for example `10` or `100`: the number of
  nearest neighbors each content node should expose to first-phase ranking. See
  https://docs.vespa.ai/en/nearest-neighbor-search.html.
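
To illustrate how `ranking`, `query_tensor_name`, and `target_hits` relate to a Vespa request, here is a minimal sketch that assembles a query API body for a nearest-neighbor search. The helper `build_nearest_neighbor_body` is hypothetical and not part of this integration. It assumes the store's default `schema="doc"` and `embedding_field="embedding"`, and it assumes `targetHits` falls back to `top_k` when `target_hits` is `None`. The retriever's actual request may differ.

```python
from typing import Any, Optional


def build_nearest_neighbor_body(
    query_embedding: list[float],
    *,
    schema: str = "doc",
    embedding_field: str = "embedding",
    query_tensor_name: str = "query_embedding",
    ranking: Optional[str] = "semantic",
    top_k: int = 10,
    target_hits: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a Vespa query API body for dense retrieval."""
    # Assumption of this sketch: use top_k when no explicit targetHits is given.
    hits_per_node = target_hits if target_hits is not None else top_k
    yql = (
        f"select * from {schema} where "
        f"{{targetHits: {hits_per_node}}}"
        f"nearestNeighbor({embedding_field}, {query_tensor_name})"
    )
    body: dict[str, Any] = {
        "yql": yql,
        "hits": top_k,
        # The query vector is sent as a query input; the rank profile
        # reads it as query(<query_tensor_name>).
        f"input.query({query_tensor_name})": query_embedding,
    }
    if ranking is not None:
        # Leaving "ranking" out lets Vespa use the schema's default profile.
        body["ranking"] = ranking
    return body
```

Posting a body like this to Vespa's `/search/` endpoint runs the nearest-neighbor query with the selected rank profile.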

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of `VespaDocumentStore`.

#### run

```python
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
```

Retrieve the documents closest to `query_embedding` from Vespa.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – Dense query embedding. Its dimension must match the query tensor type
  declared in the rank profile.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied when fetching documents from the Document Store. If provided,
  they override the filters set at initialization.
- **top_k** (<code>int | None</code>) – Maximum number of documents to return. If `None`, the `top_k` set at initialization is used.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – Retrieved documents.
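
The override rule described for `filters` and `top_k` can be sketched as follows. `resolve_run_params` is a hypothetical helper. It assumes that `None` means "fall back to the initialization value" and that runtime filters replace the static ones rather than being merged with them.

```python
from typing import Any, Optional


def resolve_run_params(
    init_filters: Optional[dict[str, Any]],
    init_top_k: int,
    filters: Optional[dict[str, Any]] = None,
    top_k: Optional[int] = None,
) -> tuple[Optional[dict[str, Any]], int]:
    """Hypothetical helper: runtime values win over init values when provided."""
    # Assumption: runtime filters replace (not merge with) the static filters.
    effective_filters = filters if filters is not None else init_filters
    effective_top_k = top_k if top_k is not None else init_top_k
    return effective_filters, effective_top_k
```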

## haystack_integrations.components.retrievers.vespa.keyword_retriever

### VespaKeywordRetriever

Retrieve documents from Vespa using lexical search.

#### __init__

```python
__init__(
    *,
    document_store: VespaDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    ranking: str | None = DEFAULT_BM25_RANKING
) -> None
```

Create a Vespa keyword retriever.

**Parameters:**

- **document_store** (<code>VespaDocumentStore</code>) – Configured `VespaDocumentStore` for your application, for example
  `VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc")`, so that it matches the deployed
  schema and endpoint. See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
- **filters** (<code>dict\[str, Any\] | None</code>) – Optional static Haystack metadata filters applied to each retrieval unless overridden in
  `run`, for example `{"field": "meta.category", "operator": "==", "value": "news"}`. See
  https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
- **top_k** (<code>int</code>) – Default maximum number of documents to return per query (for example `10`).
- **ranking** (<code>str | None</code>) – Vespa rank profile for lexical matches, for example `bm25` for a profile that uses
  `bm25(content)`, which requires the field to be indexed with `enable-bm25`. Defaults to `bm25`. Pass `None`
  to use the schema's default rank profile. See https://docs.vespa.ai/en/basics/ranking.html.
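
The sketch below builds a Vespa query API body for lexical search with a hypothetical helper, `build_lexical_body`. It assumes the query text is passed through Vespa's `userQuery()` operator, which by default matches against the schema's `default` fieldset. The integration may instead target `content_field` directly.

```python
from typing import Any, Optional


def build_lexical_body(
    query: str,
    *,
    schema: str = "doc",
    ranking: Optional[str] = "bm25",
    top_k: int = 10,
) -> dict[str, Any]:
    """Hypothetical helper: build a Vespa query API body for keyword search."""
    body: dict[str, Any] = {
        # userQuery() parses the "query" parameter with Vespa's simple query
        # syntax and matches it against the "default" fieldset.
        "yql": f"select * from {schema} where userQuery()",
        "query": query,
        "hits": top_k,
    }
    if ranking is not None:
        # Without "ranking", Vespa uses the schema's default rank profile.
        body["ranking"] = ranking
    return body
```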

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of `VespaDocumentStore`.

#### run

```python
run(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]
```

Retrieve the documents that best match `query` lexically.

**Parameters:**

- **query** (<code>str</code>) – Query text.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied when fetching documents from the Document Store. If provided,
  they override the filters set at initialization.
- **top_k** (<code>int | None</code>) – Maximum number of documents to return. If `None`, the `top_k` set at initialization is used.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – Retrieved documents.

## haystack_integrations.document_stores.vespa.document_store

### VespaDocumentStore

Document store backed by an existing [Vespa](https://vespa.ai/) application.

#### __init__

```python
__init__(
    *,
    url: str | None = None,
    port: int = 8080,
    cert: Secret | None = None,
    key: Secret | None = None,
    vespa_cloud_secret_token: Secret | None = None,
    additional_headers: dict[str, str] | None = None,
    content_cluster_name: str = "content",
    schema: str = "doc",
    namespace: str | None = None,
    groupname: str | None = None,
    content_field: str = "content",
    embedding_field: str = "embedding",
    id_field: str = "id",
    metadata_fields: list[str] | None = None,
    query_limit: int = DEFAULT_QUERY_LIMIT
) -> None
```

Create a new Vespa document store.

**Parameters:**

- **url** (<code>str | None</code>) – Vespa endpoint base URL. If omitted, the `VESPA_URL` environment variable is used.
- **port** (<code>int</code>) – Vespa HTTP port.
- **cert** (<code>Secret | None</code>) – Secret resolving to the data plane certificate file path for mTLS authentication.
- **key** (<code>Secret | None</code>) – Secret resolving to the data plane private key file path for mTLS authentication.
- **vespa_cloud_secret_token** (<code>Secret | None</code>) – Vespa Cloud data plane secret token for token authentication.
  If omitted, the `VESPA_CLOUD_SECRET_TOKEN` environment variable is used when set, matching pyvespa's behavior.
- **additional_headers** (<code>dict\[str, str\] | None</code>) – Additional HTTP headers to send with each request to the Vespa application.
- **content_cluster_name** (<code>str</code>) – Vespa content cluster name.
- **schema** (<code>str</code>) – Vespa schema name to read from and write to.
- **namespace** (<code>str | None</code>) – Vespa document namespace. Defaults to the schema name when omitted.
- **groupname** (<code>str | None</code>) – Optional Vespa group name.
- **content_field** (<code>str</code>) – Vespa field containing the document text.
- **embedding_field** (<code>str</code>) – Vespa field containing the dense embedding.
- **id_field** (<code>str</code>) – Optional Vespa field containing the document ID in query responses.
  Vespa document IDs are always written via `data_id`. If this field is missing from the
  schema or document summaries, the integration falls back to parsing the Vespa document path.
- **metadata_fields** (<code>list\[str\] | None</code>) – Optional allowlist of metadata fields to feed and return.
- **query_limit** (<code>int</code>) – Maximum number of documents returned by bulk queries. Defaults to 400, which stays
  within Vespa's default per-query hit limit; raise it only if your Vespa application allows more hits.
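
The documented fallbacks for `url`, `namespace`, and `vespa_cloud_secret_token` can be sketched like this. `resolve_store_config` and `ResolvedConfig` are hypothetical, secrets are simplified to plain strings, and raising when no URL is available is a choice of this sketch rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResolvedConfig:
    url: str
    namespace: str
    token: Optional[str]


def resolve_store_config(
    url: Optional[str] = None,
    schema: str = "doc",
    namespace: Optional[str] = None,
    token: Optional[str] = None,
) -> ResolvedConfig:
    """Hypothetical helper mirroring the documented constructor fallbacks."""
    # Use VESPA_URL when no URL is passed (raising otherwise is this sketch's choice).
    resolved_url = url or os.environ.get("VESPA_URL")
    if not resolved_url:
        raise ValueError("Pass url or set the VESPA_URL environment variable.")
    return ResolvedConfig(
        url=resolved_url,
        # The namespace defaults to the schema name.
        namespace=namespace or schema,
        # Use VESPA_CLOUD_SECRET_TOKEN when it is set and no token is passed.
        token=token or os.environ.get("VESPA_CLOUD_SECRET_TOKEN"),
    )
```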

#### app

```python
app: Any
```

Return the underlying `pyvespa` `Vespa` HTTP client.

It is built from this store's `url`, `port`, and authentication settings
(`cert`, `key`, `vespa_cloud_secret_token`, `additional_headers`), so mTLS, bearer token,
and custom headers from the constructor (or environment) are applied.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the document store to a dictionary.

Uses `default_to_dict` with the same parameter names as `__init__`, so nested serialization stays
aligned with Haystack's default component serialization.

**Returns:**

- <code>dict\[str, Any\]</code> – Serialized document store data.
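
Because `to_dict` follows Haystack's `default_to_dict` layout, the serialized form is a qualified type name plus the `__init__` arguments, which a `from_dict` can pass straight back to the constructor. The stand-in class below is hypothetical, keeps only three of the real parameters, and leaves out `Secret` handling.

```python
from typing import Any, Optional


class StoreStandIn:
    """Hypothetical stand-in with a few of VespaDocumentStore's init parameters."""

    def __init__(self, *, url: Optional[str] = None, port: int = 8080, schema: str = "doc") -> None:
        self.url = url
        self.port = port
        self.schema = schema

    def to_dict(self) -> dict[str, Any]:
        # Same layout as default_to_dict: qualified type name plus init
        # parameters keyed exactly as in __init__.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"url": self.url, "port": self.port, "schema": self.schema},
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "StoreStandIn":
        # The keys match __init__, so they can be passed straight back.
        return cls(**data["init_parameters"])
```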

#### count_documents

```python
count_documents() -> int
```

Return the total number of documents in Vespa.

**Returns:**

- <code>int</code> – Document count.

#### count_documents_by_filter

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Return the number of documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – Haystack metadata filters.

**Returns:**

- <code>int</code> – Count of matching documents.
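
One common way to count matches in Vespa is to request zero hits and read `totalCount` from the response root, as sketched below. Whether this store counts documents exactly this way is an assumption. `build_count_body` and `parse_total_count` are hypothetical helpers; the `where` clause would come from translating the Haystack filters, and `where true` would be the unfiltered case that `count_documents` needs.

```python
from typing import Any


def build_count_body(where_clause: str, schema: str = "doc") -> dict[str, Any]:
    """Hypothetical helper: ask Vespa for the match count only."""
    # hits=0 returns no documents, but Vespa still reports the total count.
    return {"yql": f"select * from {schema} where {where_clause}", "hits": 0}


def parse_total_count(response_json: dict[str, Any]) -> int:
    """Read the number of matches from a Vespa query response."""
    return int(response_json["root"]["fields"]["totalCount"])
```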

#### write_documents

```python
write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
```

Write documents to Vespa.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – Documents to store.
- **policy** (<code>DuplicatePolicy</code>) – Policy for handling documents whose IDs already exist in the store.

**Returns:**

- <code>int</code> – Number of documents written.
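
The sketch below shows how a Haystack `Document` might map onto Vespa fields under the default `content_field`, `embedding_field`, and `metadata_fields` settings. `DocStandIn` and `to_feed_fields` are hypothetical. The sketch assumes metadata keys become top-level Vespa fields with the same name, and it passes the document ID separately as `data_id`, as the `id_field` description states.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DocStandIn:
    """Minimal stand-in for haystack.Document with only the fields used here."""

    id: str
    content: Optional[str] = None
    embedding: Optional[list[float]] = None
    meta: dict[str, Any] = field(default_factory=dict)


def to_feed_fields(
    doc: DocStandIn,
    content_field: str = "content",
    embedding_field: str = "embedding",
    metadata_fields: Optional[list[str]] = None,
) -> tuple[str, dict[str, Any]]:
    """Hypothetical mapping from a document to (data_id, Vespa fields)."""
    vespa_fields: dict[str, Any] = {}
    if doc.content is not None:
        vespa_fields[content_field] = doc.content
    if doc.embedding is not None:
        vespa_fields[embedding_field] = doc.embedding
    for key, value in doc.meta.items():
        # With an allowlist, metadata keys outside it are dropped.
        if metadata_fields is None or key in metadata_fields:
            vespa_fields[key] = value
    # The document ID travels separately as data_id, not as a regular field.
    return doc.id, vespa_fields
```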

#### delete_documents

```python
delete_documents(document_ids: list[str]) -> None
```

Delete documents by ID.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – IDs of the documents to delete.

#### delete_all_documents

```python
delete_all_documents() -> None
```

Delete all documents for this store's schema, namespace, and content cluster.

Implemented with pyvespa `Vespa.delete_all_docs` (Document V1 bulk delete).
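
This sketch does not call pyvespa. It only shows the Document V1 request paths that a single-document delete and a selection-based bulk delete address, assuming the default `namespace="doc"`, `schema="doc"`, and `content_cluster_name="content"`. Both helpers are hypothetical; pyvespa issues comparable requests internally, but its exact URLs may differ.

```python
from urllib.parse import quote, urlencode


def document_v1_path(namespace: str, schema: str, doc_id: str) -> str:
    """Path of a single document in Vespa's /document/v1 API."""
    # The user-specified ID is percent-encoded because it may contain '/'.
    return f"/document/v1/{namespace}/{schema}/docid/{quote(doc_id, safe='')}"


def bulk_delete_path(namespace: str, schema: str, cluster: str) -> str:
    """Path for a selection-based delete of every document of one type."""
    # selection=true matches every document; cluster names the content
    # cluster to delete from.
    query = urlencode({"selection": "true", "cluster": cluster})
    return f"/document/v1/{namespace}/{schema}/docid?{query}"
```

An HTTP `DELETE` against the first path removes one document; against the second it removes every matching document in the cluster.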

#### delete_by_filter

```python
delete_by_filter(filters: dict[str, Any]) -> int
```

Delete all documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – Haystack metadata filters.

**Returns:**

- <code>int</code> – Number of deleted documents.

#### update_by_filter

```python
update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
```

Update metadata fields for documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – Haystack metadata filters.
- **meta** (<code>dict\[str, Any\]</code>) – Metadata values to merge into the matched documents.

**Returns:**

- <code>int</code> – Number of updated documents.
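
A plausible approach for `update_by_filter` is to look up the matching document IDs and send each one a Vespa partial update. The sketch below builds only the update body. `build_partial_update` is hypothetical, and it assumes each `meta` key maps to a Vespa field with the same name. Fields not named in an `assign` update keep their current values, which is what gives the merge behavior described for `meta`.

```python
from typing import Any


def build_partial_update(meta: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: Vespa JSON partial update assigning new values."""
    # Each metadata key becomes an "assign" operation on the field of the same
    # name; fields that are not mentioned are left untouched.
    return {"fields": {key: {"assign": value} for key, value in meta.items()}}
```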

#### get_documents_by_id

```python
get_documents_by_id(document_ids: list[str]) -> list[Document]
```

Retrieve documents by their IDs.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – IDs of the documents to fetch.

**Returns:**

- <code>list\[Document\]</code> – Matching documents.

#### filter_documents

```python
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
```

Retrieve documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – Haystack metadata filters.

**Returns:**

- <code>list\[Document\]</code> – Matching documents.
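
Haystack filters must be translated into a YQL `where` clause before Vespa can evaluate them in a query. The sketch below shows one possible translation for a subset of operators (`==`, `>`, `>=`, `<`, `<=`, `in`, `AND`, `OR`). `to_yql` is hypothetical and is not the integration's own translator. It assumes `meta.<name>` maps to a Vespa field called `<name>`, and it expands `in` into an `or` of equality checks for simplicity.

```python
import json
from typing import Any

_RANGE_OPERATORS = {">", ">=", "<", "<="}


def _literal(value: Any) -> str:
    """Render a Python value as a YQL literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Double-quoted string; json.dumps escapes quotes and backslashes.
    return json.dumps(str(value), ensure_ascii=False)


def to_yql(filters: dict[str, Any]) -> str:
    """Hypothetical translation of a Haystack filter dict into a YQL where clause."""
    if "conditions" in filters:
        op = filters["operator"]
        if op not in ("AND", "OR"):
            raise ValueError(f"Unsupported logical operator in this sketch: {op}")
        parts = [to_yql(condition) for condition in filters["conditions"]]
        return "(" + f" {op.lower()} ".join(parts) + ")"
    # Assumption: "meta.category" maps to a Vespa field named "category".
    field_name = filters["field"].removeprefix("meta.")
    op, value = filters["operator"], filters["value"]
    if op == "==":
        # Strings match with "contains"; numbers and booleans use "=".
        if isinstance(value, str):
            return f"{field_name} contains {_literal(value)}"
        return f"{field_name} = {_literal(value)}"
    if op in _RANGE_OPERATORS:
        return f"{field_name} {op} {_literal(value)}"
    if op == "in":
        equalities = [to_yql({"field": filters["field"], "operator": "==", "value": v}) for v in value]
        return "(" + " or ".join(equalities) + ")"
    raise ValueError(f"Unsupported comparison operator in this sketch: {op}")
```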

#### get_metadata_fields_info

```python
get_metadata_fields_info() -> dict[str, dict[str, str]]
```

Return best-effort metadata field information based on configured fields.

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – Field metadata information.

## haystack_integrations.document_stores.vespa.filters