Commit 364bc68

bogdankostic authored and github-actions[bot] committed
Sync Core Integrations API reference (vespa) on Docusaurus
1 parent d230b15 commit 364bc68

13 files changed

Lines changed: 4667 additions & 0 deletions

Lines changed: 359 additions & 0 deletions
@@ -0,0 +1,359 @@
---
title: "Vespa"
id: integrations-vespa
description: "Vespa integration for Haystack"
slug: "/integrations-vespa"
---

## haystack_integrations.components.retrievers.vespa.embedding_retriever

### VespaEmbeddingRetriever

Retrieve documents from Vespa using dense vector similarity.

#### __init__

```python
__init__(
    *,
    document_store: VespaDocumentStore,
    filters: dict[str, Any] | None = None,
    top_k: int = 10,
    ranking: str | None = DEFAULT_SEMANTIC_RANKING,
    query_tensor_name: str = "query_embedding",
    target_hits: int | None = None
) -> None
```

Create a Vespa embedding retriever.

**Parameters:**

- **document_store** (<code>VespaDocumentStore</code>) – Configured `VespaDocumentStore` for your application, for example
  `VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc")`, aligned with your
  Vespa schema. See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
- **filters** (<code>dict\[str, Any\] | None</code>) – Optional static Haystack metadata filters applied on each retrieval unless overridden in
  `run`, for example `{"field": "meta.category", "operator": "==", "value": "news"}`. See
  https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
- **top_k** (<code>int</code>) – Default maximum number of documents to return per query (for example `10`).
- **ranking** (<code>str | None</code>) – Vespa rank profile used after nearest-neighbor retrieval, for example `semantic` for a
  profile that scores with `closeness(field, embedding)`. Defaults to `semantic`. Pass `None` to use the
  schema default profile. See https://docs.vespa.ai/en/basics/ranking.html.
- **query_tensor_name** (<code>str</code>) – Name of the query tensor in YQL and in `input.query(...)` in your rank profile.
  For example, `query_embedding` matches the default `semantic` profile. See
  https://docs.vespa.ai/en/nearest-neighbor-search.html.
- **target_hits** (<code>int | None</code>) – Optional nearest-neighbor `targetHits` value, for example `10` or `100`. It sets how many
  neighbors are considered per content node before first-phase ranking. See
  https://docs.vespa.ai/en/nearest-neighbor-search.html.

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of `VespaDocumentStore`.
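
To make the relationship between `query_tensor_name`, `ranking`, `top_k`, and `target_hits` concrete, here is a minimal sketch of the kind of Vespa query body these parameters describe. The helper `build_nn_request` is hypothetical and is not part of the integration. The fallback from `target_hits` to `top_k` is an assumption made for illustration; the retriever may assemble its request differently.

```python
from __future__ import annotations

import json
from typing import Any


def build_nn_request(
    query_embedding: list[float],
    *,
    schema: str = "doc",
    embedding_field: str = "embedding",
    query_tensor_name: str = "query_embedding",
    ranking: str | None = "semantic",
    top_k: int = 10,
    target_hits: int | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a Vespa nearest-neighbor query body."""
    # Assumption for this sketch: use top_k when target_hits is not given.
    hits_to_consider = target_hits if target_hits is not None else top_k
    yql = (
        f"select * from {schema} where "
        f"{{targetHits:{hits_to_consider}}}"
        f"nearestNeighbor({embedding_field}, {query_tensor_name})"
    )
    body: dict[str, Any] = {
        "yql": yql,
        "hits": top_k,
        # The tensor name must match input.query(...) in the rank profile.
        f"input.query({query_tensor_name})": query_embedding,
    }
    if ranking is not None:
        # Omitting the key leaves the choice to the schema default profile.
        body["ranking"] = ranking
    return body


request = build_nn_request([0.1, 0.2, 0.3], target_hits=100)
print(json.dumps(request, indent=2))
```

The tensor name appears twice, once in the YQL operator and once in the `input.query(...)` key, which is why it has to agree with the rank profile in the deployed schema.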

#### run

```python
run(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list[Document]]
```

Retrieve documents from Vespa.

**Parameters:**

- **query_embedding** (<code>list\[float\]</code>) – Dense query embedding.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied when fetching documents from the Document Store.
- **top_k** (<code>int | None</code>) – Maximum number of documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – Retrieved documents.
76+
## haystack_integrations.components.retrievers.vespa.keyword_retriever
77+
78+
### VespaKeywordRetriever
79+
80+
Retrieve documents from Vespa using lexical search.
81+
82+
#### __init__
83+
84+
```python
85+
__init__(
86+
*,
87+
document_store: VespaDocumentStore,
88+
filters: dict[str, Any] | None = None,
89+
top_k: int = 10,
90+
ranking: str | None = DEFAULT_BM25_RANKING
91+
) -> None
92+
```

Create a Vespa keyword retriever.

**Parameters:**

- **document_store** (<code>VespaDocumentStore</code>) – Configured `VespaDocumentStore` for your application, for example
  `VespaDocumentStore(url="http://localhost", schema="doc", namespace="doc")`, so that it matches the deployed
  schema and endpoint. See https://docs.vespa.ai/en/basics/documents.html and the integration package README.
- **filters** (<code>dict\[str, Any\] | None</code>) – Optional static Haystack metadata filters applied on each retrieval unless overridden in
  `run`, for example `{"field": "meta.category", "operator": "==", "value": "news"}`. See
  https://docs.haystack.deepset.ai/docs/metadata-filtering and https://docs.vespa.ai/en/query-language.html.
- **top_k** (<code>int</code>) – Default maximum number of documents to return per query (for example `10`).
- **ranking** (<code>str | None</code>) – Vespa rank profile for lexical matches, for example `bm25` for a profile that uses
  `bm25(content)`. Defaults to `bm25`. Pass `None` to use the schema default. See
  https://docs.vespa.ai/en/basics/ranking.html.

**Raises:**

- <code>ValueError</code> – If `document_store` is not an instance of `VespaDocumentStore`.
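
For comparison with the embedding retriever, the sketch below shows one common way to express a lexical query against Vespa: a `userQuery()` clause plus a `query` string and a rank profile. The helper `build_keyword_request` is hypothetical, and whether the integration builds its query this way is an assumption.

```python
from __future__ import annotations

from typing import Any


def build_keyword_request(
    query: str,
    *,
    schema: str = "doc",
    ranking: str | None = "bm25",
    top_k: int = 10,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a Vespa lexical query body."""
    body: dict[str, Any] = {
        # userQuery() tells Vespa to parse the text in the "query" parameter.
        "yql": f"select * from {schema} where userQuery()",
        "query": query,
        "hits": top_k,
    }
    if ranking is not None:
        # Leaving the key out falls back to the schema default profile.
        body["ranking"] = ranking
    return body


keyword_request = build_keyword_request("vespa nearest neighbor", top_k=5)
print(keyword_request)
```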

#### run

```python
run(
    query: str, filters: dict[str, Any] | None = None, top_k: int | None = None
) -> dict[str, list[Document]]
```

Retrieve documents from Vespa.

**Parameters:**

- **query** (<code>str</code>) – Query text.
- **filters** (<code>dict\[str, Any\] | None</code>) – Filters applied when fetching documents from the Document Store.
- **top_k** (<code>int | None</code>) – Maximum number of documents to return.

**Returns:**

- <code>dict\[str, list\[Document\]\]</code> – Retrieved documents.

## haystack_integrations.document_stores.vespa.document_store

### VespaDocumentStore

Document store backed by an existing [Vespa](https://vespa.ai/) application.

#### __init__

```python
__init__(
    *,
    url: str | None = None,
    port: int = 8080,
    cert: Secret | None = None,
    key: Secret | None = None,
    vespa_cloud_secret_token: Secret | None = None,
    additional_headers: dict[str, str] | None = None,
    content_cluster_name: str = "content",
    schema: str = "doc",
    namespace: str | None = None,
    groupname: str | None = None,
    content_field: str = "content",
    embedding_field: str = "embedding",
    id_field: str = "id",
    metadata_fields: list[str] | None = None,
    query_limit: int = DEFAULT_QUERY_LIMIT
) -> None
```

Create a new Vespa document store.

**Parameters:**

- **url** (<code>str | None</code>) – Vespa endpoint base URL. If omitted, the `VESPA_URL` environment variable is used.
- **port** (<code>int</code>) – Vespa HTTP port.
- **cert** (<code>Secret | None</code>) – Secret resolving to the data plane certificate file path for mTLS authentication.
- **key** (<code>Secret | None</code>) – Secret resolving to the data plane key file path for mTLS authentication.
- **vespa_cloud_secret_token** (<code>Secret | None</code>) – Vespa Cloud data plane secret token for token authentication.
  If omitted, the `VESPA_CLOUD_SECRET_TOKEN` environment variable is used when set, matching pyvespa.
- **additional_headers** (<code>dict\[str, str\] | None</code>) – Additional headers to send to the Vespa application.
- **content_cluster_name** (<code>str</code>) – Vespa content cluster name.
- **schema** (<code>str</code>) – Vespa schema name to read from and write to.
- **namespace** (<code>str | None</code>) – Vespa namespace. Defaults to the schema name when omitted.
- **groupname** (<code>str | None</code>) – Optional Vespa group name.
- **content_field** (<code>str</code>) – Vespa field containing the document text.
- **embedding_field** (<code>str</code>) – Vespa field containing the dense embedding.
- **id_field** (<code>str</code>) – Optional Vespa field containing the document id in query responses.
  Vespa document IDs are always written via `data_id`. If this field is missing in the
  schema or summaries, the integration falls back to parsing the Vespa document path.
- **metadata_fields** (<code>list\[str\] | None</code>) – Optional allowlist of metadata fields to feed and return.
- **query_limit** (<code>int</code>) – Maximum number of documents returned by bulk queries. Defaults to 400 to
  stay within Vespa's common query hit limit unless explicitly overridden.
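
The defaulting rules for `url` and `namespace` can be sketched as follows. The function `resolve_store_settings` is hypothetical and only mirrors the two rules stated above. Raising `ValueError` when neither `url` nor `VESPA_URL` is available is an assumption for this sketch, as the reference does not say how the store reacts in that case.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_store_settings(
    url: str | None = None,
    schema: str = "doc",
    namespace: str | None = None,
    env: Mapping[str, str] | None = None,
) -> dict[str, str]:
    """Hypothetical helper mirroring the documented url/namespace defaults."""
    env = os.environ if env is None else env
    # An explicit url wins; otherwise fall back to the VESPA_URL variable.
    resolved_url = url if url is not None else env.get("VESPA_URL")
    if resolved_url is None:
        # Assumption: the real store may handle a missing URL differently.
        raise ValueError("Pass url or set the VESPA_URL environment variable.")
    return {
        "url": resolved_url,
        "schema": schema,
        # The namespace defaults to the schema name when omitted.
        "namespace": namespace if namespace is not None else schema,
    }


settings = resolve_store_settings(env={"VESPA_URL": "http://localhost"})
print(settings)
```

Passing `env` explicitly keeps the sketch testable without touching the process environment.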

#### app

```python
app: Any
```

Return the underlying `pyvespa` `Vespa` HTTP client.

It is built from this store's `url`, `port`, and authentication settings
(`cert`, `key`, `vespa_cloud_secret_token`, `additional_headers`) so mTLS, bearer token,
and custom headers from the constructor (or environment) are applied.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serialize the document store to a dictionary.

Uses the same init-parameter names as `__init__` and `default_to_dict`, so nested serialization stays
aligned with Haystack's default component serialization.

**Returns:**

- <code>dict\[str, Any\]</code> – Serialized document store data.
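
For orientation, Haystack's `default_to_dict` produces a dictionary with a `type` key holding the import path of the class and an `init_parameters` key holding the constructor arguments. The sketch below imitates that shape with a hypothetical `sketch_to_dict` function. The parameter values are illustrative, and `Secret`-valued parameters, which serialize differently, are left out.

```python
from __future__ import annotations

from typing import Any


def sketch_to_dict(type_path: str, **init_parameters: Any) -> dict[str, Any]:
    """Hypothetical stand-in that imitates the shape of default_to_dict output."""
    return {"type": type_path, "init_parameters": init_parameters}


serialized = sketch_to_dict(
    "haystack_integrations.document_stores.vespa.document_store.VespaDocumentStore",
    url="http://localhost",
    port=8080,
    schema="doc",
    namespace="doc",
)
print(serialized["type"])
print(sorted(serialized["init_parameters"]))
```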

#### count_documents

```python
count_documents() -> int
```

Return the total number of documents in Vespa.

**Returns:**

- <code>int</code> – Document count.

#### count_documents_by_filter

```python
count_documents_by_filter(filters: dict[str, Any]) -> int
```

Return the number of documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – Haystack metadata filters.

**Returns:**

- <code>int</code> – Count of matching documents.
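
The `filters` argument uses Haystack's metadata filter syntax: a comparison is a dictionary with `field`, `operator`, and `value`, and comparisons are combined with a logical `operator` and a `conditions` list. The helpers `comparison` and `all_of` below are hypothetical conveniences for building such dictionaries. Which operators this integration translates to Vespa is not stated in this reference, so treat the choice of `>=` as an assumption.

```python
from __future__ import annotations

from typing import Any


def comparison(field: str, operator: str, value: Any) -> dict[str, Any]:
    """Hypothetical helper: build one Haystack comparison filter."""
    return {"field": field, "operator": operator, "value": value}


def all_of(*conditions: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: combine filters with a logical AND."""
    return {"operator": "AND", "conditions": list(conditions)}


news_since_2024 = all_of(
    comparison("meta.category", "==", "news"),
    comparison("meta.year", ">=", 2024),
)
print(news_since_2024)
```

A dictionary built this way has the shape expected by the `filters` parameters of `count_documents_by_filter`, `filter_documents`, `delete_by_filter`, and `update_by_filter`.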

#### write_documents

```python
write_documents(
    documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE
) -> int
```

Write documents to Vespa.

**Parameters:**

- **documents** (<code>list\[Document\]</code>) – Documents to store.
- **policy** (<code>DuplicatePolicy</code>) – Duplicate handling policy.

**Returns:**

- <code>int</code> – Number of documents written.
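
The effect of a duplicate policy can be illustrated with an in-memory dictionary in place of Vespa. The enum below is a local mirror of Haystack's `DuplicatePolicy` values so that the sketch runs standalone, and `sketch_write` is a hypothetical function. How this integration resolves `DuplicatePolicy.NONE` is not stated here; the sketch delegates it to a `store_default` argument and uses `OVERWRITE` purely as an assumption. It also raises `ValueError` where Haystack stores typically raise a dedicated duplicate-document error.

```python
from __future__ import annotations

from enum import Enum


class DuplicatePolicy(Enum):
    """Local mirror of Haystack's DuplicatePolicy values."""

    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


def sketch_write(
    store: dict[str, str],
    docs: list[tuple[str, str]],
    policy: DuplicatePolicy = DuplicatePolicy.NONE,
    store_default: DuplicatePolicy = DuplicatePolicy.OVERWRITE,
) -> int:
    """Hypothetical in-memory write that returns the number of documents written."""
    # Assumption: NONE defers to a store-specific default.
    effective = store_default if policy is DuplicatePolicy.NONE else policy
    written = 0
    for doc_id, content in docs:
        if doc_id in store:
            if effective is DuplicatePolicy.SKIP:
                continue  # keep the existing document untouched
            if effective is DuplicatePolicy.FAIL:
                raise ValueError(f"Document {doc_id!r} already exists.")
        store[doc_id] = content
        written += 1
    return written


store = {"a": "old"}
skipped_run = sketch_write(store, [("a", "new"), ("b", "x")], DuplicatePolicy.SKIP)
print(skipped_run, store)
```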

#### delete_documents

```python
delete_documents(document_ids: list[str]) -> None
```

Delete documents by id.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – Document ids to delete.

#### delete_all_documents

```python
delete_all_documents() -> None
```

Delete all documents for this store's schema, namespace, and content cluster.

Implemented with pyvespa `Vespa.delete_all_docs` (Document V1 bulk delete).

#### delete_by_filter

```python
delete_by_filter(filters: dict[str, Any]) -> int
```

Delete all documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – Haystack metadata filters.

**Returns:**

- <code>int</code> – Number of deleted documents.

#### update_by_filter

```python
update_by_filter(filters: dict[str, Any], meta: dict[str, Any]) -> int
```

Update metadata fields for documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\]</code>) – Haystack metadata filters.
- **meta** (<code>dict\[str, Any\]</code>) – Metadata values to merge into the matched documents.

**Returns:**

- <code>int</code> – Number of updated documents.
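
The "merge" behavior of `meta` can be pictured with plain dictionaries: keys supplied in `meta` are added or overwritten, and other existing metadata keys are kept. The function `sketch_update_by_filter` is hypothetical and works on an in-memory list with a single equality condition. Reading "merge" as a dictionary update is an assumption about the integration's behavior.

```python
from __future__ import annotations

from typing import Any


def sketch_update_by_filter(
    docs: list[dict[str, Any]],
    field: str,
    value: Any,
    meta: dict[str, Any],
) -> int:
    """Hypothetical in-memory update: merge meta into docs where field == value."""
    updated = 0
    for doc in docs:
        if doc["meta"].get(field) == value:
            # Supplied keys win; untouched keys stay as they were.
            doc["meta"].update(meta)
            updated += 1
    return updated


docs = [
    {"id": "1", "meta": {"category": "news", "year": 2023}},
    {"id": "2", "meta": {"category": "blog", "year": 2023}},
]
updated_count = sketch_update_by_filter(docs, "category", "news", {"year": 2024, "reviewed": True})
print(updated_count, docs[0]["meta"])
```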

#### get_documents_by_id

```python
get_documents_by_id(document_ids: list[str]) -> list[Document]
```

Retrieve documents by their ids.

**Parameters:**

- **document_ids** (<code>list\[str\]</code>) – Document ids to fetch.

**Returns:**

- <code>list\[Document\]</code> – Matching documents.

#### filter_documents

```python
filter_documents(filters: dict[str, Any] | None = None) -> list[Document]
```

Retrieve documents matching the provided filters.

**Parameters:**

- **filters** (<code>dict\[str, Any\] | None</code>) – Haystack metadata filters.

**Returns:**

- <code>list\[Document\]</code> – Matching documents.

#### get_metadata_fields_info

```python
get_metadata_fields_info() -> dict[str, dict[str, str]]
```

Return best-effort metadata field information based on configured fields.

**Returns:**

- <code>dict\[str, dict\[str, str\]\]</code> – Field metadata information.

## haystack_integrations.document_stores.vespa.filters
