
Commit 1762a72

fix: AmazonBedrockDocumentEmbedder to not modify Documents in place (#2174)
- Add dataclasses.replace import
- Update _embed_cohere to create new document instances
- Update _embed_titan to create new document instances
- Add immutability tests for both Cohere and Titan paths
- Update CHANGELOG.md with bug fix entry

Follows the pattern from the FastEmbed, Optimum, and Nvidia integrations.
1 parent 1017c72 commit 1762a72

3 files changed

Lines changed: 87 additions & 39 deletions


integrations/amazon_bedrock/CHANGELOG.md

Lines changed: 12 additions & 35 deletions
@@ -1,12 +1,17 @@
 # Changelog
 
+## [unreleased]
+
+### 🐛 Bug Fixes
+
+- Fix `AmazonBedrockDocumentEmbedder` to not modify Documents in place (#2174)
+
 ## [integrations/amazon_bedrock-v6.0.0] - 2026-01-09
 
 ### 🧹 Chores
 
 - [**breaking**] Amazon_bedrock - drop Python 3.9 and use X|Y typing (#2685)
 
-
 ## [integrations/amazon_bedrock-v5.4.0] - 2026-01-08
 
 ### 🚀 Features
@@ -27,7 +32,6 @@
 
 - Relax model name validation for Bedrock Embedders (#2625)
 
-
 ## [integrations/amazon_bedrock-v5.3.0] - 2025-12-17
 
 ### 🚀 Features
@@ -57,7 +61,6 @@
 
 - S3Downloader - add `s3_key_generation_function` param to customize S3 key generation (#2343)
 
-
 ## [integrations/amazon_bedrock-v5.0.0] - 2025-09-22
 
 ### 🚀 Features
@@ -67,7 +70,6 @@
 
 ### 📚 Documentation
 
-
 ### 🧹 Chores
 
 - Bedrock - remove unused `stop_words` init parameter (#2275)
@@ -84,7 +86,6 @@
 - [**breaking**] Update AmazonBedrockChatGenerator to use the new fields in `StreamingChunk` (#2216)
 - [**breaking**] Use `ReasoningContent` to store reasoning content in `ChatMessage` instead of `ChatMessage.meta` (#2226)
 
-
 ### 🧹 Chores
 
 - Standardize readmes - part 2 (#2205)
@@ -100,7 +101,6 @@
 - Add framework name into UserAgent header for bedrock integration (#2168)
 - Standardize readmes - part 1 (#2202)
 
-
 ## [integrations/amazon_bedrock-v3.10.0] - 2025-08-06
 
 ### 🚀 Features
@@ -117,14 +117,12 @@
 
 - `AmazonBedrockChatGenerator` - fix bug with streaming + tool calls with no arguments (#2121)
 
-
 ## [integrations/amazon_bedrock-v3.9.0] - 2025-07-29
 
 ### 🚀 Features
 
 - Amazon Bedrock - multimodal support (#2114)
 
-
 ## [integrations/amazon_bedrock-v3.8.0] - 2025-07-04
 
 ### 🚀 Features
@@ -136,7 +134,6 @@
 - Remove black (#1985)
 - Improve typing for select_streaming_callback (#2008)
 
-
 ## [integrations/amazon_bedrock-v3.7.0] - 2025-06-11
 
 ### 🐛 Bug Fixes
@@ -154,21 +151,18 @@
 - Align core-integrations Hatch scripts (#1898)
 - Update md files for new hatch scripts (#1911)
 
-
 ## [integrations/amazon_bedrock-v3.6.2] - 2025-05-13
 
 ### 🧹 Chores
 
 - Extend error message for unknown model family in AmazonBedrockGenerator (#1733)
 
-
 ## [integrations/amazon_bedrock-v3.6.1] - 2025-05-13
 
 ### 🚜 Refactor
 
 - Add AmazonBedrockRanker and keep BedrockRanker as alias (#1732)
 
-
 ## [integrations/amazon_bedrock-v3.6.0] - 2025-05-09
 
 ### 🚀 Features
@@ -183,7 +177,6 @@
 
 - Refactor tests of AmazonBedrock Integration (#1671)
 
-
 ### 🧹 Chores
 
 - Update ChatGenerators with `deserialize_tools_or_toolset_inplace` (#1623)
@@ -205,7 +198,6 @@
 
 - AmazonBedrockGenerator - return response metadata (#1584)
 
-
 ### 🧪 Testing
 
 - Update tool serialization in tests to include `inputs_from_state` and `outputs_to_state` (#1581)
@@ -224,7 +216,6 @@
 
 - Review testing workflows (#1541)
 
-
 ## [integrations/amazon_bedrock-v3.2.1] - 2025-03-13
 
 ### 🐛 Bug Fixes
@@ -235,7 +226,6 @@
 
 - Update AWS Bedrock with improved docstrings and warning message (#1532)
 
-
 ### 🧹 Chores
 
 - Use Haystack logging across integrations (#1484)
@@ -246,14 +236,12 @@
 
 - Adding async to `AmazonChatGenerator` (#1445)
 
-
 ## [integrations/amazon_bedrock-v3.1.1] - 2025-02-26
 
 ### 🐛 Bug Fixes
 
 - Avoid thinking end tag on first content block (#1442)
 
-
 ## [integrations/amazon_bedrock-v3.1.0] - 2025-02-26
 
 ### 🚀 Features
@@ -275,26 +263,23 @@
 - Chore: Bedrock - manually fix changelog (#1319)
 - Fix error when empty document list (#1325)
 
-
 ## [integrations/amazon_bedrock-v3.0.0] - 2025-01-23
 
 ### 🚀 Features
 
-- *(AWS Bedrock)* Add Cohere Reranker (#1291)
+- _(AWS Bedrock)_ Add Cohere Reranker (#1291)
 - AmazonBedrockChatGenerator - add tools support (#1304)
 
 ### 🚜 Refactor
 
-- [**breaking**] AmazonBedrockGenerator - remove truncation (#1314)
-
+- [**breaking**] AmazonBedrockGenerator - remove truncation (#1314)
 
 ## [integrations/amazon_bedrock-v2.1.3] - 2025-01-21
 
 ### 🧹 Chores
 
 - Bedrock - pin `transformers!=4.48.*` (#1306)
 
-
 ## [integrations/amazon_bedrock-v2.1.2] - 2025-01-20
 
 ### 🌀 Miscellaneous
@@ -307,27 +292,24 @@
 
 - Fixes to Bedrock Chat Generator for compatibility with the new ChatMessage (#1250)
 
-
 ## [integrations/amazon_bedrock-v2.1.0] - 2024-12-11
 
 ### 🚀 Features
 
 - Support model_arn in AmazonBedrockGenerator (#1244)
 
-
 ## [integrations/amazon_bedrock-v2.0.0] - 2024-12-10
 
 ### 🚀 Features
 
 - Update AmazonBedrockChatGenerator to use Converse API (BREAKING CHANGE) (#1219)
 
-
 ## [integrations/amazon_bedrock-v1.1.1] - 2024-12-03
 
 ### 🐛 Bug Fixes
 
 - AmazonBedrockChatGenerator with Claude raises moot warning for stream… (#1205)
-- Allow passing boto3 config to all AWS Bedrock classes (#1166)
+- Allow passing boto3 config to all AWS Bedrock classes (#1166)
 
 ### 🧹 Chores
 
@@ -347,26 +329,23 @@
 
 - Adopt uv as installer (#1142)
 
-
 ## [integrations/amazon_bedrock-v1.0.5] - 2024-10-17
 
 ### 🚀 Features
 
 - Add prefixes to supported model patterns to allow cross region model ids (#1127)
 
-
 ## [integrations/amazon_bedrock-v1.0.4] - 2024-10-16
 
 ### 🐛 Bug Fixes
 
 - Avoid bedrock read timeout (add boto3_config param) (#1135)
 
-
 ## [integrations/amazon_bedrock-v1.0.3] - 2024-10-04
 
 ### 🐛 Bug Fixes
 
-- *(Bedrock)* Allow tools kwargs for AWS Bedrock Claude model (#976)
+- _(Bedrock)_ Allow tools kwargs for AWS Bedrock Claude model (#976)
 - Chat roles for model responses in chat generators (#1030)
 
 ### 🚜 Refactor
@@ -379,7 +358,7 @@
 
 ### 🌀 Miscellaneous
 
-- Modify regex to allow cross-region inference in bedrock (#1120)
+- Modify regex to allow cross-region inference in bedrock (#1120)
 
 ## [integrations/amazon_bedrock-v1.0.1] - 2024-08-19
 
@@ -391,7 +370,6 @@
 
 - Normalising ChatGenerators output (#973)
 
-
 ## [integrations/amazon_bedrock-v1.0.0] - 2024-08-12
 
 ### 🚜 Refactor
@@ -402,7 +380,6 @@
 
 - Do not retry tests in `hatch run test` command (#954)
 
-
 ## [integrations/amazon_bedrock-v0.10.0] - 2024-08-12
 
 ### 🐛 Bug Fixes
@@ -466,7 +443,7 @@
 
 ### 🚀 Features
 
-- Add Mistral Amazon Bedrock support (#632)
+- Add Mistral Amazon Bedrock support (#632)
 
 ### 📚 Documentation
 
integrations/amazon_bedrock/src/haystack_integrations/components/embedders/amazon_bedrock/document_embedder.py

Lines changed: 7 additions & 4 deletions
@@ -1,4 +1,5 @@
 import json
+from dataclasses import replace
 from typing import Any
 
 from botocore.config import Config
@@ -186,10 +187,11 @@ def _embed_cohere(self, documents: list[Document]) -> list[Document]:
             )
             all_embeddings.extend(embeddings_list)
 
+        new_documents = []
         for doc, emb in zip(documents, all_embeddings, strict=True):
-            doc.embedding = emb
+            new_documents.append(replace(doc, embedding=emb))
 
-        return documents
+        return new_documents
 
     def _embed_titan(self, documents: list[Document]) -> list[Document]:
         """
@@ -214,10 +216,11 @@ def _embed_titan(self, documents: list[Document]) -> list[Document]:
             embedding = response_body["embedding"]
             all_embeddings.append(embedding)
 
+        new_documents = []
         for doc, emb in zip(documents, all_embeddings, strict=True):
-            doc.embedding = emb
+            new_documents.append(replace(doc, embedding=emb))
 
-        return documents
+        return new_documents
 
     @component.output_types(documents=list[Document])
     def run(self, documents: list[Document]) -> dict[str, list[Document]]:

integrations/amazon_bedrock/tests/test_document_embedder.py

Lines changed: 68 additions & 0 deletions
@@ -257,6 +257,74 @@ def mock_invoke_model(*args, **kwargs):
             assert doc.content == docs[i].content
             assert doc.embedding == [0.1, 0.2, 0.3]
 
+    def test_run_cohere_does_not_modify_original_documents(self, mock_boto3_session):
+        embedder = AmazonBedrockDocumentEmbedder(model="cohere.embed-english-v3")
+
+        original_docs = [
+            Document(content="test 1", id="doc1"),
+            Document(content="test 2", id="doc2"),
+        ]
+
+        # Store original IDs to verify they're the same objects
+        original_doc_ids = [id(doc) for doc in original_docs]
+        original_embeddings = [doc.embedding for doc in original_docs]
+
+        with patch.object(embedder, "_client") as mock_client:
+            mock_client.invoke_model.return_value = {
+                "body": io.StringIO('{"embeddings": [[0.1, 0.2], [0.3, 0.4]]}'),
+            }
+
+            result = embedder.run(documents=original_docs)
+
+        # Verify originals are unchanged
+        assert all(doc.embedding is None for doc in original_docs)
+        assert original_embeddings == [None, None]
+
+        # Verify returned documents are NEW instances
+        returned_doc_ids = [id(doc) for doc in result["documents"]]
+        assert original_doc_ids != returned_doc_ids
+
+        # Verify returned documents have embeddings
+        assert result["documents"][0].embedding == [0.1, 0.2]
+        assert result["documents"][1].embedding == [0.3, 0.4]
+        assert result["documents"][0].content == "test 1"
+        assert result["documents"][1].content == "test 2"
+
+    def test_run_titan_does_not_modify_original_documents(self, mock_boto3_session):
+        embedder = AmazonBedrockDocumentEmbedder(model="amazon.titan-embed-text-v1")
+
+        original_docs = [
+            Document(content="test 1", id="doc1"),
+            Document(content="test 2", id="doc2"),
+        ]
+
+        # Store original IDs to verify they're the same objects
+        original_doc_ids = [id(doc) for doc in original_docs]
+        original_embeddings = [doc.embedding for doc in original_docs]
+
+        with patch.object(embedder, "_client") as mock_client:
+            # Titan returns one embedding at a time
+            mock_client.invoke_model.side_effect = [
+                {"body": io.StringIO('{"embedding": [0.1, 0.2]}')},
+                {"body": io.StringIO('{"embedding": [0.3, 0.4]}')},
+            ]
+
+            result = embedder.run(documents=original_docs)
+
+        # Verify originals are unchanged
+        assert all(doc.embedding is None for doc in original_docs)
+        assert original_embeddings == [None, None]
+
+        # Verify returned documents are NEW instances
+        returned_doc_ids = [id(doc) for doc in result["documents"]]
+        assert original_doc_ids != returned_doc_ids
+
+        # Verify returned documents have embeddings
+        assert result["documents"][0].embedding == [0.1, 0.2]
+        assert result["documents"][1].embedding == [0.3, 0.4]
+        assert result["documents"][0].content == "test 1"
+        assert result["documents"][1].content == "test 2"
+
     @pytest.mark.integration
     @pytest.mark.skipif(
         not os.getenv("AWS_ACCESS_KEY_ID")
