Commit d1915ee

fix: AmazonBedrockDocumentEmbedder to not modify Documents in place (#2174)
- Add dataclasses.replace import
- Update _embed_cohere to create new document instances
- Update _embed_titan to create new document instances
- Add immutability tests for both Cohere and Titan paths
- Update CHANGELOG.md with bug fix entry

Follows the pattern from FastEmbed, Optimum, and Nvidia integrations.
1 parent 1017c72 commit d1915ee

4 files changed

Lines changed: 83 additions & 41 deletions

integrations/amazon_bedrock/CHANGELOG.md

Lines changed: 6 additions & 35 deletions
@@ -6,7 +6,6 @@

 - [**breaking**] Amazon_bedrock - drop Python 3.9 and use X|Y typing (#2685)

-
 ## [integrations/amazon_bedrock-v5.4.0] - 2026-01-08

 ### 🚀 Features
@@ -27,7 +26,6 @@

 - Relax model name validation for Bedrock Embedders (#2625)

-
 ## [integrations/amazon_bedrock-v5.3.0] - 2025-12-17

 ### 🚀 Features
@@ -57,7 +55,6 @@

 - S3Downloader - add `s3_key_generation_function` param to customize S3 key generation (#2343)

-
 ## [integrations/amazon_bedrock-v5.0.0] - 2025-09-22

 ### 🚀 Features
@@ -67,7 +64,6 @@

 ### 📚 Documentation

-
 ### 🧹 Chores

 - Bedrock - remove unused `stop_words` init parameter (#2275)
@@ -84,7 +80,6 @@
 - [**breaking**] Update AmazonBedrockChatGenerator to use the new fields in `StreamingChunk` (#2216)
 - [**breaking**] Use `ReasoningContent` to store reasoning content in `ChatMessage` instead of `ChatMessage.meta` (#2226)

-
 ### 🧹 Chores

 - Standardize readmes - part 2 (#2205)
@@ -100,7 +95,6 @@
 - Add framework name into UserAgent header for bedrock integration (#2168)
 - Standardize readmes - part 1 (#2202)

-
 ## [integrations/amazon_bedrock-v3.10.0] - 2025-08-06

 ### 🚀 Features
@@ -117,14 +111,12 @@

 - `AmazonBedrockChatGenerator` - fix bug with streaming + tool calls with no arguments (#2121)

-
 ## [integrations/amazon_bedrock-v3.9.0] - 2025-07-29

 ### 🚀 Features

 - Amazon Bedrock - multimodal support (#2114)

-
 ## [integrations/amazon_bedrock-v3.8.0] - 2025-07-04

 ### 🚀 Features
@@ -136,7 +128,6 @@
 - Remove black (#1985)
 - Improve typing for select_streaming_callback (#2008)

-
 ## [integrations/amazon_bedrock-v3.7.0] - 2025-06-11

 ### 🐛 Bug Fixes
@@ -154,21 +145,18 @@
 - Align core-integrations Hatch scripts (#1898)
 - Update md files for new hatch scripts (#1911)

-
 ## [integrations/amazon_bedrock-v3.6.2] - 2025-05-13

 ### 🧹 Chores

 - Extend error message for unknown model family in AmazonBedrockGenerator (#1733)

-
 ## [integrations/amazon_bedrock-v3.6.1] - 2025-05-13

 ### 🚜 Refactor

 - Add AmazonBedrockRanker and keep BedrockRanker as alias (#1732)

-
 ## [integrations/amazon_bedrock-v3.6.0] - 2025-05-09

 ### 🚀 Features
@@ -183,7 +171,6 @@

 - Refactor tests of AmazonBedrock Integration (#1671)

-
 ### 🧹 Chores

 - Update ChatGenerators with `deserialize_tools_or_toolset_inplace` (#1623)
@@ -205,7 +192,6 @@

 - AmazonBedrockGenerator - return response metadata (#1584)

-
 ### 🧪 Testing

 - Update tool serialization in tests to include `inputs_from_state` and `outputs_to_state` (#1581)
@@ -224,7 +210,6 @@

 - Review testing workflows (#1541)

-
 ## [integrations/amazon_bedrock-v3.2.1] - 2025-03-13

 ### 🐛 Bug Fixes
@@ -235,7 +220,6 @@

 - Update AWS Bedrock with improved docstrings and warning message (#1532)

-
 ### 🧹 Chores

 - Use Haystack logging across integrations (#1484)
@@ -246,14 +230,12 @@

 - Adding async to `AmazonChatGenerator` (#1445)

-
 ## [integrations/amazon_bedrock-v3.1.1] - 2025-02-26

 ### 🐛 Bug Fixes

 - Avoid thinking end tag on first content block (#1442)

-
 ## [integrations/amazon_bedrock-v3.1.0] - 2025-02-26

 ### 🚀 Features
@@ -275,26 +257,23 @@
 - Chore: Bedrock - manually fix changelog (#1319)
 - Fix error when empty document list (#1325)

-
 ## [integrations/amazon_bedrock-v3.0.0] - 2025-01-23

 ### 🚀 Features

-- *(AWS Bedrock)* Add Cohere Reranker (#1291)
+- _(AWS Bedrock)_ Add Cohere Reranker (#1291)
 - AmazonBedrockChatGenerator - add tools support (#1304)

 ### 🚜 Refactor

-- [**breaking**] AmazonBedrockGenerator - remove truncation (#1314)
-
+- [**breaking**] AmazonBedrockGenerator - remove truncation (#1314)

 ## [integrations/amazon_bedrock-v2.1.3] - 2025-01-21

 ### 🧹 Chores

 - Bedrock - pin `transformers!=4.48.*` (#1306)

-
 ## [integrations/amazon_bedrock-v2.1.2] - 2025-01-20

 ### 🌀 Miscellaneous
@@ -307,27 +286,24 @@

 - Fixes to Bedrock Chat Generator for compatibility with the new ChatMessage (#1250)

-
 ## [integrations/amazon_bedrock-v2.1.0] - 2024-12-11

 ### 🚀 Features

 - Support model_arn in AmazonBedrockGenerator (#1244)

-
 ## [integrations/amazon_bedrock-v2.0.0] - 2024-12-10

 ### 🚀 Features

 - Update AmazonBedrockChatGenerator to use Converse API (BREAKING CHANGE) (#1219)

-
 ## [integrations/amazon_bedrock-v1.1.1] - 2024-12-03

 ### 🐛 Bug Fixes

 - AmazonBedrockChatGenerator with Claude raises moot warning for stream… (#1205)
-- Allow passing boto3 config to all AWS Bedrock classes (#1166)
+- Allow passing boto3 config to all AWS Bedrock classes (#1166)

 ### 🧹 Chores

@@ -347,26 +323,23 @@

 - Adopt uv as installer (#1142)

-
 ## [integrations/amazon_bedrock-v1.0.5] - 2024-10-17

 ### 🚀 Features

 - Add prefixes to supported model patterns to allow cross region model ids (#1127)

-
 ## [integrations/amazon_bedrock-v1.0.4] - 2024-10-16

 ### 🐛 Bug Fixes

 - Avoid bedrock read timeout (add boto3_config param) (#1135)

-
 ## [integrations/amazon_bedrock-v1.0.3] - 2024-10-04

 ### 🐛 Bug Fixes

-- *(Bedrock)* Allow tools kwargs for AWS Bedrock Claude model (#976)
+- _(Bedrock)_ Allow tools kwargs for AWS Bedrock Claude model (#976)
 - Chat roles for model responses in chat generators (#1030)

 ### 🚜 Refactor
@@ -379,7 +352,7 @@

 ### 🌀 Miscellaneous

-- Modify regex to allow cross-region inference in bedrock (#1120)
+- Modify regex to allow cross-region inference in bedrock (#1120)

 ## [integrations/amazon_bedrock-v1.0.1] - 2024-08-19

@@ -391,7 +364,6 @@

 - Normalising ChatGenerators output (#973)

-
 ## [integrations/amazon_bedrock-v1.0.0] - 2024-08-12

 ### 🚜 Refactor
@@ -402,7 +374,6 @@

 - Do not retry tests in `hatch run test` command (#954)

-
 ## [integrations/amazon_bedrock-v0.10.0] - 2024-08-12

 ### 🐛 Bug Fixes
@@ -466,7 +437,7 @@

 ### 🚀 Features

-- Add Mistral Amazon Bedrock support (#632)
+- Add Mistral Amazon Bedrock support (#632)

 ### 📚 Documentation

integrations/amazon_bedrock/src/haystack_integrations/components/embedders/amazon_bedrock/document_embedder.py

Lines changed: 7 additions & 4 deletions
@@ -1,4 +1,5 @@
 import json
+from dataclasses import replace
 from typing import Any

 from botocore.config import Config
@@ -186,10 +187,11 @@ def _embed_cohere(self, documents: list[Document]) -> list[Document]:
             )
             all_embeddings.extend(embeddings_list)

+        new_documents = []
         for doc, emb in zip(documents, all_embeddings, strict=True):
-            doc.embedding = emb
+            new_documents.append(replace(doc, embedding=emb))

-        return documents
+        return new_documents

     def _embed_titan(self, documents: list[Document]) -> list[Document]:
         """
@@ -214,10 +216,11 @@ def _embed_titan(self, documents: list[Document]) -> list[Document]:
             embedding = response_body["embedding"]
             all_embeddings.append(embedding)

+        new_documents = []
         for doc, emb in zip(documents, all_embeddings, strict=True):
-            doc.embedding = emb
+            new_documents.append(replace(doc, embedding=emb))

-        return documents
+        return new_documents

     @component.output_types(documents=list[Document])
     def run(self, documents: list[Document]) -> dict[str, list[Document]]:

integrations/amazon_bedrock/src/haystack_integrations/components/rankers/amazon_bedrock/ranker.py

Lines changed: 2 additions & 2 deletions
@@ -1,3 +1,4 @@
+from dataclasses import replace
 from typing import Any

 from botocore.exceptions import ClientError
@@ -251,8 +252,7 @@ def resolve_secret(secret: Secret | None) -> str | None:
                 idx = result["index"]
                 score = result["relevanceScore"]
                 doc = documents[idx]
-                doc.score = score
-                sorted_docs.append(doc)
+                sorted_docs.append(replace(doc, score=score))

             return {"documents": sorted_docs}
         except ClientError as client_error:

integrations/amazon_bedrock/tests/test_document_embedder.py

Lines changed: 68 additions & 0 deletions
@@ -257,6 +257,74 @@ def mock_invoke_model(*args, **kwargs):
             assert doc.content == docs[i].content
             assert doc.embedding == [0.1, 0.2, 0.3]

+    def test_run_cohere_does_not_modify_original_documents(self, mock_boto3_session):
+        embedder = AmazonBedrockDocumentEmbedder(model="cohere.embed-english-v3")
+
+        original_docs = [
+            Document(content="test 1", id="doc1"),
+            Document(content="test 2", id="doc2"),
+        ]
+
+        # Store original IDs to verify they're the same objects
+        original_doc_ids = [id(doc) for doc in original_docs]
+        original_embeddings = [doc.embedding for doc in original_docs]
+
+        with patch.object(embedder, "_client") as mock_client:
+            mock_client.invoke_model.return_value = {
+                "body": io.StringIO('{"embeddings": [[0.1, 0.2], [0.3, 0.4]]}'),
+            }
+
+            result = embedder.run(documents=original_docs)
+
+        # Verify originals are unchanged
+        assert all(doc.embedding is None for doc in original_docs)
+        assert original_embeddings == [None, None]
+
+        # Verify returned documents are NEW instances
+        returned_doc_ids = [id(doc) for doc in result["documents"]]
+        assert original_doc_ids != returned_doc_ids
+
+        # Verify returned documents have embeddings
+        assert result["documents"][0].embedding == [0.1, 0.2]
+        assert result["documents"][1].embedding == [0.3, 0.4]
+        assert result["documents"][0].content == "test 1"
+        assert result["documents"][1].content == "test 2"
+
+    def test_run_titan_does_not_modify_original_documents(self, mock_boto3_session):
+        embedder = AmazonBedrockDocumentEmbedder(model="amazon.titan-embed-text-v1")
+
+        original_docs = [
+            Document(content="test 1", id="doc1"),
+            Document(content="test 2", id="doc2"),
+        ]
+
+        # Store original IDs to verify they're the same objects
+        original_doc_ids = [id(doc) for doc in original_docs]
+        original_embeddings = [doc.embedding for doc in original_docs]
+
+        with patch.object(embedder, "_client") as mock_client:
+            # Titan returns one embedding at a time
+            mock_client.invoke_model.side_effect = [
+                {"body": io.StringIO('{"embedding": [0.1, 0.2]}')},
+                {"body": io.StringIO('{"embedding": [0.3, 0.4]}')},
+            ]
+
+            result = embedder.run(documents=original_docs)
+
+        # Verify originals are unchanged
+        assert all(doc.embedding is None for doc in original_docs)
+        assert original_embeddings == [None, None]
+
+        # Verify returned documents are NEW instances
+        returned_doc_ids = [id(doc) for doc in result["documents"]]
+        assert original_doc_ids != returned_doc_ids
+
+        # Verify returned documents have embeddings
+        assert result["documents"][0].embedding == [0.1, 0.2]
+        assert result["documents"][1].embedding == [0.3, 0.4]
+        assert result["documents"][0].content == "test 1"
+        assert result["documents"][1].content == "test 2"
+
     @pytest.mark.integration
     @pytest.mark.skipif(
         not os.getenv("AWS_ACCESS_KEY_ID")
