Commit d10dd46

feat: add Amazon S3 Vectors document store integration
Implements issue #2110 - Amazon S3 Vectors document store integration with:
- S3VectorsDocumentStore: full DocumentStore protocol (count, write, filter, delete)
- S3VectorsEmbeddingRetriever: embedding-based retrieval with metadata filtering
- Filter conversion from Haystack format to S3 Vectors filter syntax
- Auto-creation of vector buckets and indexes
- AWS credential support via Secret (or default credential chain)
- 49 unit tests covering store, retriever, filters, and serialization
- README with usage examples and known limitations
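The filter conversion mentioned in the commit message could look roughly like the sketch below. It is not the code from this commit: `convert_filter` and the two mapping tables are hypothetical names, and the sketch assumes the nested Haystack filter format (`operator` plus `conditions`, or `field`/`operator`/`value`) and MongoDB-style operators such as `$eq` and `$and` on the S3 Vectors side.

```python
from typing import Any

# Assumed mapping from Haystack comparison operators to S3 Vectors-style operators.
_COMPARISON = {
    "==": "$eq",
    "!=": "$ne",
    ">": "$gt",
    ">=": "$gte",
    "<": "$lt",
    "<=": "$lte",
    "in": "$in",
    "not in": "$nin",
}

# Assumed mapping for logical operators; NOT is deliberately left unsupported here.
_LOGICAL = {"AND": "$and", "OR": "$or"}


def convert_filter(node: dict[str, Any]) -> dict[str, Any]:
    """Recursively convert a Haystack-style filter into a MongoDB-style filter (hypothetical helper)."""
    operator = node["operator"]
    if "conditions" in node:
        # Logical node: convert every child condition and wrap the results.
        if operator not in _LOGICAL:
            raise ValueError(f"Unsupported logical operator: {operator}")
        return {_LOGICAL[operator]: [convert_filter(child) for child in node["conditions"]]}
    # Comparison node: drop the "meta." prefix Haystack uses for metadata fields.
    if operator not in _COMPARISON:
        raise ValueError(f"Unsupported comparison operator: {operator}")
    field = node["field"].removeprefix("meta.")
    return {field: {_COMPARISON[operator]: node["value"]}}


haystack_filter = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "news"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}
converted = convert_filter(haystack_filter)
```

Raising on unknown operators, rather than silently dropping them, keeps an unsupported filter from widening the result set without the caller noticing.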
1 parent 2d259b9 commit d10dd46

21 files changed

Lines changed: 2469 additions & 0 deletions

.github/labeler.yml

Lines changed: 5 additions & 0 deletions
@@ -9,6 +9,11 @@ integration:amazon-bedrock:
       - any-glob-to-any-file: "integrations/amazon_bedrock/**/*"
       - any-glob-to-any-file: ".github/workflows/amazon_bedrock.yml"
 
+integration:amazon-s3-vectors:
+  - changed-files:
+      - any-glob-to-any-file: "integrations/amazon_s3_vectors/**/*"
+      - any-glob-to-any-file: ".github/workflows/amazon_s3_vectors.yml"
+
 integration:amazon-sagemaker:
   - changed-files:
       - any-glob-to-any-file: "integrations/amazon_sagemaker/**/*"

.github/workflows/CI_coverage_comment.yml

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ on:
       - "Test / aimlapi"
       - "Test / amazon-bedrock"
       - "Test / amazon-sagemaker"
+      - "Test / amazon-s3-vectors"
       - "Test / anthropic"
       - "Test / arcadedb"
       - "Test / astra"
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
+# This workflow comes from https://github.com/ofek/hatch-mypyc
+# https://github.com/ofek/hatch-mypyc/blob/5a198c0ba8660494d02716cfc9d79ce4adfb1442/.github/workflows/test.yml
+name: Test / amazon-s3-vectors
+
+on:
+  schedule:
+    - cron: "0 0 * * *"
+  pull_request:
+    paths:
+      - "integrations/amazon_s3_vectors/**"
+      - "!integrations/amazon_s3_vectors/*.md"
+      - ".github/workflows/amazon_s3_vectors.yml"
+  push:
+    branches:
+      - main
+    paths:
+      - "integrations/amazon_s3_vectors/**"
+      - "!integrations/amazon_s3_vectors/*.md"
+      - ".github/workflows/amazon_s3_vectors.yml"
+
+defaults:
+  run:
+    working-directory: integrations/amazon_s3_vectors
+
+concurrency:
+  group: amazon_s3_vectors-${{ github.head_ref || github.sha }}
+  cancel-in-progress: true
+
+env:
+  PYTHONUNBUFFERED: "1"
+  FORCE_COLOR: "1"
+  TEST_MATRIX_OS: '["ubuntu-latest", "windows-latest", "macos-latest"]'
+  TEST_MATRIX_PYTHON: '["3.10", "3.14"]'
+
+jobs:
+  compute-test-matrix:
+    runs-on: ubuntu-slim
+    defaults:
+      run:
+        working-directory: .
+    outputs:
+      os: ${{ steps.set.outputs.os }}
+      python-version: ${{ steps.set.outputs.python-version }}
+    steps:
+      - id: set
+        run: |
+          echo 'os=${{ github.event_name == 'push' && '["ubuntu-latest"]' || env.TEST_MATRIX_OS }}' >> $GITHUB_OUTPUT
+          echo 'python-version=${{ github.event_name == 'push' && '["3.10"]' || env.TEST_MATRIX_PYTHON }}' >> $GITHUB_OUTPUT
+
+  run:
+    name: Python ${{ matrix.python-version }} on ${{ startsWith(matrix.os, 'macos-') && 'macOS' || startsWith(matrix.os, 'windows-') && 'Windows' || 'Linux' }}
+    needs: compute-test-matrix
+    permissions:
+      contents: write
+      pull-requests: write
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: ${{ fromJSON(needs.compute-test-matrix.outputs.os) }}
+        python-version: ${{ fromJSON(needs.compute-test-matrix.outputs.python-version) }}
+
+    steps:
+      - name: Support longpaths
+        if: matrix.os == 'windows-latest'
+        working-directory: .
+        run: git config --system core.longpaths true
+
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install Hatch
+        run: pip install --upgrade hatch
+      - name: Lint
+        if: matrix.python-version == '3.10' && runner.os == 'Linux'
+        run: hatch run fmt-check && hatch run test:types
+
+      - name: Run unit tests
+        run: hatch run test:unit-cov-retry
+
+      # On PR: posts coverage comment (directly on same-repo PRs; via artifact for fork PRs). On push to main: stores coverage baseline on data branch.
+      - name: Store unit tests coverage
+        id: coverage_comment
+        if: matrix.python-version == '3.10' && runner.os == 'Linux' && github.event_name != 'schedule'
+        uses: py-cov-action/python-coverage-comment-action@7188638f871f721a365d644f505d1ff3df20d683 # v3.40
+        with:
+          GITHUB_TOKEN: ${{ github.token }}
+          COVERAGE_PATH: integrations/amazon_s3_vectors
+          SUBPROJECT_ID: amazon_s3_vectors
+          MINIMUM_GREEN: 90
+          MINIMUM_ORANGE: 60
+
+      - name: Upload coverage comment to be posted
+        if: matrix.python-version == '3.10' && runner.os == 'Linux' && github.event_name == 'pull_request' && steps.coverage_comment.outputs.COMMENT_FILE_WRITTEN == 'true'
+        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
+        with:
+          name: coverage-comment-amazon_s3_vectors
+          path: python-coverage-comment-action-amazon_s3_vectors.txt
+
+      - name: Run integration tests
+        run: hatch run test:integration-cov-append-retry
+
+      - name: Store combined coverage
+        if: github.event_name == 'push'
+        uses: py-cov-action/python-coverage-comment-action@7188638f871f721a365d644f505d1ff3df20d683 # v3.40
+        with:
+          GITHUB_TOKEN: ${{ github.token }}
+          COVERAGE_PATH: integrations/amazon_s3_vectors
+          SUBPROJECT_ID: amazon_s3_vectors-combined
+          MINIMUM_GREEN: 90
+          MINIMUM_ORANGE: 60
+
+      - name: Run unit tests with lowest direct dependencies
+        if: github.event_name != 'push'
+        run: |
+          hatch run uv pip compile pyproject.toml --resolution lowest-direct --output-file requirements_lowest_direct.txt
+          hatch -e test env run -- uv pip install -r requirements_lowest_direct.txt
+          hatch run test:unit
+
+      - name: Nightly - run unit tests with Haystack main branch
+        if: github.event_name == 'schedule'
+        run: |
+          hatch env prune
+          hatch -e test env run -- uv pip install git+https://github.com/deepset-ai/haystack.git@main
+          hatch run test:unit
+
+
+  notify-slack-on-failure:
+    needs: run
+    if: failure() && github.event_name == 'schedule'
+    runs-on: ubuntu-slim
+    steps:
+      - uses: deepset-ai/notify-slack-action@3cda73b77a148f16f703274198e7771340cf862b # v1
+        with:
+          slack-webhook-url: ${{ secrets.SLACK_WEBHOOK_URL_NOTIFICATIONS }}
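The matrix selection in the `compute-test-matrix` job relies on the GitHub Actions `cond && a || b` idiom, which behaves like a ternary as long as `a` is truthy. As a rough paraphrase of that logic, not part of the commit, the Python sketch below reproduces the two outputs; `compute_test_matrix` is a hypothetical name, and the constants mirror the `TEST_MATRIX_OS` and `TEST_MATRIX_PYTHON` values from the workflow's `env` block.

```python
import json

# Values copied from the workflow's env block.
TEST_MATRIX_OS = ["ubuntu-latest", "windows-latest", "macos-latest"]
TEST_MATRIX_PYTHON = ["3.10", "3.14"]


def compute_test_matrix(event_name: str) -> dict[str, str]:
    """Return the JSON strings the job would write to GITHUB_OUTPUT (hypothetical helper)."""
    is_push = event_name == "push"
    # Pushes to main run a single OS and Python version; other events run the full matrix.
    os_list = ["ubuntu-latest"] if is_push else TEST_MATRIX_OS
    python_list = ["3.10"] if is_push else TEST_MATRIX_PYTHON
    return {"os": json.dumps(os_list), "python-version": json.dumps(python_list)}


push_matrix = compute_test_matrix("push")
pr_matrix = compute_test_matrix("pull_request")
```

Under this reading, a push to `main` runs one job, while pull requests and the nightly schedule run six (three operating systems times two Python versions).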

README.md

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ Please check out our [Contribution Guidelines](CONTRIBUTING.md) for all the deta
|-------------------------------------------------------------------------|-----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|---------------------|
| [aimlapi-haystack](integrations/aimlapi/) | Generator | [![PyPI - Version](https://img.shields.io/pypi/v/aimlapi-haystack.svg)](https://pypi.org/project/aimlapi-haystack) | [![Test / aimlapi](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/aimlapi.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/aimlapi.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-aimlapi/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-aimlapi/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-aimlapi-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-aimlapi-combined/htmlcov/index.html) |
| [amazon-bedrock-haystack](integrations/amazon_bedrock/) | Embedder, Generator, Ranker, Downloader | [![PyPI - Version](https://img.shields.io/pypi/v/amazon-bedrock-haystack.svg)](https://pypi.org/project/amazon-bedrock-haystack) | [![Test / amazon_bedrock](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/amazon_bedrock.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/amazon_bedrock.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-amazon_bedrock/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-amazon_bedrock/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-amazon_bedrock-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-amazon_bedrock-combined/htmlcov/index.html) |
+| [amazon-s3-vectors-haystack](integrations/amazon_s3_vectors/) | Document Store | [![PyPI - Version](https://img.shields.io/pypi/v/amazon-s3-vectors-haystack.svg)](https://pypi.org/project/amazon-s3-vectors-haystack) | [![Test / amazon_s3_vectors](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/amazon_s3_vectors.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/amazon_s3_vectors.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-amazon_s3_vectors/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-amazon_s3_vectors/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-amazon_s3_vectors-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-amazon_s3_vectors-combined/htmlcov/index.html) |
| [amazon-sagemaker-haystack](integrations/amazon_sagemaker/) | Generator | [![PyPI - Version](https://img.shields.io/pypi/v/amazon-sagemaker-haystack.svg)](https://pypi.org/project/amazon-sagemaker-haystack) | [![Test / amazon_sagemaker](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/amazon_sagemaker.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/amazon_sagemaker.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-amazon_sagemaker/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-amazon_sagemaker/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-amazon_sagemaker-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-amazon_sagemaker-combined/htmlcov/index.html) |
| [anthropic-haystack](integrations/anthropic/) | Generator | [![PyPI - Version](https://img.shields.io/pypi/v/anthropic-haystack.svg)](https://pypi.org/project/anthropic-haystack) | [![Test / anthropic](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/anthropic.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/anthropic.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-anthropic/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-anthropic/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-anthropic-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-anthropic-combined/htmlcov/index.html) |
| [arcadedb-haystack](integrations/arcadedb/) | Document Store | [![PyPI - Version](https://img.shields.io/pypi/v/arcadedb-haystack.svg)](https://pypi.org/project/arcadedb-haystack) | [![Test / arcadedb](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/arcadedb.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/arcadedb.yml) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-arcadedb/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-arcadedb/htmlcov/index.html) | [![Coverage badge](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/deepset-ai/haystack-core-integrations/python-coverage-comment-action-data-arcadedb-combined/endpoint.json&label=)](https://htmlpreview.github.io/?https://github.com/deepset-ai/haystack-core-integrations/blob/python-coverage-comment-action-data-arcadedb-combined/htmlcov/index.html) |
