Commit d12a8b5

docs: add service limits to component docstrings
- Class docstring: top_k cap, dimension limit, metadata limits, float32 only
- write_documents: embedding required, 40KB metadata limit
- _embedding_retrieval: top_k=100 cap, no embeddings in response
- Retriever run: top_k=100, server-side filters, no embeddings returned
1 parent 6f82443 commit d12a8b5

2 files changed

Lines changed: 23 additions & 8 deletions

integrations/amazon_s3_vectors/src/haystack_integrations/components/retrievers/amazon_s3_vectors/embedding_retriever.py

Lines changed: 4 additions & 2 deletions
@@ -113,9 +113,11 @@ def run(
 
  :param query_embedding: Embedding of the query.
  :param filters: Filters applied to the retrieved Documents. The way runtime filters are applied depends on
- the ``filter_policy`` chosen at retriever initialization.
- :param top_k: Maximum number of Documents to return.
+ the ``filter_policy`` chosen at retriever initialization. Filters are applied server-side during
+ the vector search.
+ :param top_k: Maximum number of Documents to return. S3 Vectors caps this at 100.
  :returns: A dictionary with key ``"documents"`` containing the retrieved Documents.
+ Returned documents will not contain embeddings.
  """
  filters = apply_filter_policy(self.filter_policy, self.filters, filters)
  top_k = top_k or self.top_k
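
The `top_k` cap documented above can also be guarded on the caller side. The following is a minimal sketch, not the integration's code: `effective_top_k` and `S3_VECTORS_MAX_TOP_K` are hypothetical names, and the default of 10 is an assumption. It mirrors the `top_k or self.top_k` fallback visible in the diff and then clamps to the documented cap of 100. Whether the integration itself clamps or the service rejects larger values is not stated in this commit.

```python
from typing import Optional

# Cap taken from the docstring in this commit ("S3 Vectors caps this at 100").
S3_VECTORS_MAX_TOP_K = 100


def effective_top_k(requested: Optional[int], default: int = 10) -> int:
    """Resolve top_k with the same `requested or default` fallback, then clamp.

    Hypothetical helper; `default=10` is an assumed value for illustration.
    """
    top_k = requested or default  # None or 0 falls back to the default
    if top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return min(top_k, S3_VECTORS_MAX_TOP_K)
```

Clamping silently is one design choice; raising an error for values above the cap would make the limit more visible to callers.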

integrations/amazon_s3_vectors/src/haystack_integrations/document_stores/amazon_s3_vectors/document_store.py

Lines changed: 19 additions & 6 deletions
@@ -48,10 +48,19 @@ class S3VectorsDocumentStore:
  """
  A Document Store using [Amazon S3 Vectors](https://aws.amazon.com/s3/features/vectors/).
 
- Amazon S3 Vectors provides native vector storage and similarity search within Amazon S3.
+ Amazon S3 Vectors provides serverless vector storage and similarity search within Amazon S3.
  This document store stores Haystack `Document` objects as vectors with associated metadata
  in an S3 vector bucket and index.
 
+ **Service limits:**
+
+ - Maximum ``top_k``: 100 results per query
+ - Maximum vector dimension: 4,096
+ - Metadata per vector: 40 KB total, 2 KB filterable
+ - All documents must have embeddings (``float32`` only)
+ - Distance metrics: ``cosine`` or ``euclidean`` (set at index creation, immutable)
+ - ``filter_documents()`` is client-side — prefer ``S3VectorsEmbeddingRetriever`` with filters
+
  Usage example:
  ```python
  from haystack_integrations.document_stores.amazon_s3_vectors import S3VectorsDocumentStore
@@ -229,11 +238,14 @@ def write_documents(self, documents: list[Document], policy: DuplicatePolicy = D
  """
  Write Documents to the S3 Vectors index.
 
- S3 Vectors ``put_vectors`` is an upsert operation by default, so ``DuplicatePolicy.OVERWRITE`` is the
- natural behavior. ``DuplicatePolicy.SKIP`` will check for existing documents first (slower).
+ All documents must have an embedding set. S3 Vectors ``put_vectors`` is an upsert operation
+ by default, so ``DuplicatePolicy.OVERWRITE`` is the natural behavior.
+ ``DuplicatePolicy.SKIP`` will check for existing documents first (slower).
  ``DuplicatePolicy.NONE`` will raise an error if a document already exists.
 
- :param documents: A list of Documents to write.
+ Metadata per vector is limited to 40 KB total (2 KB filterable).
+
+ :param documents: A list of Documents to write. Each document must have an embedding.
  :param policy: The duplicate policy. Defaults to ``DuplicatePolicy.OVERWRITE``.
  :returns: The number of documents written.
  """
@@ -372,8 +384,9 @@ def _embedding_retrieval(
 
  :param query_embedding: The query embedding vector.
  :param filters: Optional Haystack-format metadata filters.
- :param top_k: Maximum number of results to return.
- :returns: List of Documents sorted by similarity.
+ :param top_k: Maximum number of results to return. S3 Vectors caps this at 100.
+ :returns: List of Documents sorted by similarity. Returned documents will not contain
+ embeddings (S3 Vectors ``query_vectors`` does not return vector data).
  """
  if not query_embedding:
  msg = "query_embedding must be a non-empty list of floats"
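
The write-side limits documented in this file (embedding required, at most 4,096 dimensions, 40 KB of metadata per vector) could be checked before calling `write_documents`. The sketch below is illustrative only and assumes several things not confirmed by the commit: `Doc` is a hypothetical stand-in for a Haystack `Document`, `find_write_problems` is a hypothetical helper, 40 KB is taken as 40 * 1024 bytes, and metadata size is approximated as the UTF-8 length of its JSON encoding. How the service actually measures metadata size is not stated here.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Limits copied from the docstrings in this commit; the byte interpretation
# of "40 KB" (40 * 1024) is an assumption.
MAX_DIMENSION = 4096
MAX_METADATA_BYTES = 40 * 1024


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document (id, embedding, meta only)."""

    id: str
    embedding: Optional[List[float]] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def find_write_problems(docs: List[Doc]) -> Dict[str, str]:
    """Return {doc_id: reason} for documents likely to violate the documented limits."""
    problems: Dict[str, str] = {}
    for doc in docs:
        if not doc.embedding:
            problems[doc.id] = "missing embedding"
            continue
        if len(doc.embedding) > MAX_DIMENSION:
            problems[doc.id] = (
                f"embedding dimension {len(doc.embedding)} exceeds {MAX_DIMENSION}"
            )
            continue
        # Approximate metadata size as UTF-8 encoded JSON (an assumption).
        meta_bytes = len(json.dumps(doc.meta, default=str).encode("utf-8"))
        if meta_bytes > MAX_METADATA_BYTES:
            problems[doc.id] = "metadata exceeds 40 KB"
    return problems
```

A caller could run this over a batch and write only the documents that are absent from the returned mapping. The 2 KB filterable-metadata limit is not checked because this commit does not say which keys count as filterable.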
