Commit c348b4f

Merge branch 'main' into feat/firecrawl-websearch
2 parents a279d5b + ecf3ba6 commit c348b4f

32 files changed: +1181 / -90 lines changed

.github/labeler.yml

Lines changed: 9 additions & 9 deletions
@@ -223,25 +223,25 @@ integration:unstructured-fileconverter:
       - any-glob-to-any-file: "integrations/unstructured/**/*"
       - any-glob-to-any-file: ".github/workflows/unstructured.yml"
 
-integration:watsonx:
+integration:valkey:
   - changed-files:
-      - any-glob-to-any-file: "integrations/watsonx/**/*"
-      - any-glob-to-any-file: ".github/workflows/watsonx.yml"
+      - any-glob-to-any-file: "integrations/valkey/**/*"
+      - any-glob-to-any-file: ".github/workflows/valkey.yml"
 
-integration:weaviate:
+integration:watsonx:
   - changed-files:
-      - any-glob-to-any-file: "integrations/weaviate/**/*"
-      - any-glob-to-any-file: ".github/workflows/weaviate.yml"
+      - any-glob-to-any-file: "integrations/watsonx/**/*"
+      - any-glob-to-any-file: ".github/workflows/watsonx.yml"
 
 integration:weave:
   - changed-files:
       - any-glob-to-any-file: "integrations/weave/**/*"
       - any-glob-to-any-file: ".github/workflows/weave.yml"
 
-integration:valkey:
+integration:weaviate:
   - changed-files:
-      - any-glob-to-any-file: "integrations/valkey/**/*"
-      - any-glob-to-any-file: ".github/workflows/valkey.yml"
+      - any-glob-to-any-file: "integrations/weaviate/**/*"
+      - any-glob-to-any-file: ".github/workflows/weaviate.yml"
 
 # Topics
 topic:CI:

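The labeler hunk only moves existing entries: `integration:valkey` now comes before `integration:watsonx`, and `integration:weaviate` now follows `integration:weave`, so the integration labels are in alphabetical order again. A minimal sketch of a check that could catch this kind of drift, assuming a hypothetical helper `unsorted_integration_labels` (not part of this commit) that scans the raw file text with a regular expression instead of a YAML parser. The `before` excerpt is a trimmed, illustrative version of the old ordering:

```python
import re

# Matches top-level keys such as "integration:valkey:" at the start of a line.
# Hypothetical helper for illustration; this commit does not add such a check.
LABEL_KEY = re.compile(r"^(integration:[^:\s]+):\s*$", re.MULTILINE)


def unsorted_integration_labels(labeler_text: str) -> list[tuple[str, str]]:
    """Return (previous, current) pairs where the labels are not in alphabetical order."""
    keys = LABEL_KEY.findall(labeler_text)
    return [(prev, cur) for prev, cur in zip(keys, keys[1:]) if prev > cur]


# Trimmed excerpt of the old ordering (nested globs omitted).
before = """\
integration:watsonx:
  - changed-files:
integration:weaviate:
  - changed-files:
integration:weave:
  - changed-files:
integration:valkey:
  - changed-files:
"""

# Trimmed excerpt of the new ordering.
after = """\
integration:valkey:
integration:watsonx:
integration:weave:
integration:weaviate:
"""

print(unsorted_integration_labels(before))
print(unsorted_integration_labels(after))
```

Plain string comparison is enough here because all label keys share the `integration:` prefix; "weave" sorts before "weaviate" because `e` < `i` at the first differing character.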
.github/workflows/CI_check_api_ref.yml

Lines changed: 5 additions & 0 deletions
@@ -109,6 +109,11 @@ jobs:
         if: steps.changed.outputs.integrations != '[]'
         working-directory: website
         run: |
+          # docusaurus-mdx-checker is a package that is not frequently updated. Its dependency katex sometimes ships a
+          # broken ESM build, where a __VERSION__ placeholder is left unresolved, causing a ReferenceError at import time.
+          # Node 22+ prefers ESM when available. We force CJS (CommonJS) resolution to use the working katex build.
+          # This should be safe because docusaurus-mdx-checker and its dependencies provide CJS builds.
+          export NODE_OPTIONS="--conditions=require"
           npx docusaurus-mdx-checker -v || {
             echo ""
             echo "For common MDX problems, see https://docusaurus.io/blog/preparing-your-site-for-docusaurus-v3#common-mdx-problems"

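The `--conditions=require` flag acts through Node's conditional `exports` resolution: when a package's `exports` entry lists several conditions, Node takes the first key, in the order the package lists them, that matches an active condition, and a custom `--conditions` value is added on top of the default conditions. The sketch below models only that first-match rule in Python. The `exports_entry` map is a made-up example of a package shipping both a CJS and an ESM build; it is not katex's actual `package.json`, and real resolution also handles nesting, subpaths and patterns, which this model ignores.

```python
from typing import Optional


def resolve_export(conditions_map: dict[str, str], active: set[str]) -> Optional[str]:
    """Return the target of the first listed key that is an active condition.

    Simplified model: a flat {condition: path} object only, no nested
    conditions, subpath exports, or pattern matching.
    """
    for condition, target in conditions_map.items():  # dicts keep insertion order
        if condition in active:
            return target
    return None


# Hypothetical exports entry, with "require" listed before "import".
exports_entry = {
    "require": "./dist/lib.cjs",
    "import": "./dist/lib.mjs",
    "default": "./dist/lib.mjs",
}

# An ESM `import` normally has "node", "import" and "default" active.
esm_import = {"node", "import", "default"}
# NODE_OPTIONS="--conditions=require" adds "require" as an extra active condition.
forced = esm_import | {"require"}

print(resolve_export(exports_entry, esm_import))
print(resolve_export(exports_entry, forced))
```

Because key order decides the match, adding `require` only changes the result in this model when the package lists `require` ahead of `import`.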
.github/workflows/mcp.yml

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ jobs:
 
       - name: Set up Docker
         if: runner.os == 'Linux'
-        uses: docker/setup-buildx-action@v3
+        uses: docker/setup-buildx-action@v4
 
       # we need to pull the mcp/brave-search image to run the test
       # on the actual mcp server as an example of real life usage

CONTRIBUTING.md

Lines changed: 17 additions & 17 deletions
@@ -236,31 +236,31 @@ $ hatch run test:integration
 > Core integrations follow the naming convention `PREFIX-haystack`, where `PREFIX` can be the name of the technology
 > you're integrating Haystack with. For example, a deepset integration would be named as `deepset-haystack`.
 
-To create a new integration, from the root of the repo change directory into `integrations`:
+To create a new integration, run the scaffold script from the root of the repository:
 
 ```sh
-cd integrations
+python scripts/create_new_integration.py
 ```
 
-From there, use `hatch` to create the scaffold of the new integration:
+The script will interactively ask you for the integration **name** (e.g. `opensearch`, `amazon_bedrock`) and
+**component type** (e.g. `document_stores`, `generators`, `embedders`). You can also pass these as command-line
+arguments to skip the prompts:
 
 ```sh
-$ hatch --config hatch.toml new -i
-Project name: deepset-haystack
-Description []: An example integration, this text can be edited later
-
-deepset-haystack
-├── src
-│ └── deepset_haystack
-│ ├── __about__.py
-│ └── __init__.py
-├── tests
-│ └── __init__.py
-├── LICENSE.txt
-├── README.md
-└── pyproject.toml
+python scripts/create_new_integration.py --name YOUR_INTEGRATION_NAME --type YOUR_COMPONENT_TYPE
 ```
 
+The script takes care of the full setup in one step:
+
+- Scaffolds the integration folder under `integrations/` with the correct project structure (`pyproject.toml`,
+  source package, tests, pydoc config, example components, and a README).
+- Creates a GitHub Actions CI workflow at `.github/workflows/<name>.yml`.
+- Adds label rules to `.github/labeler.yml`.
+- Adds the new integration to the table in the root `README.md`.
+
+Once the script finishes, follow the printed next-steps to fill in your component code, add dependencies, and
+write tests.
+
 ### Improving The Documentation
 
 There are two types of documentation for this project: Python API docs, and Documentation pages

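The updated CONTRIBUTING text describes a script that prompts for the integration name and component type unless both are passed as `--name` and `--type`. The sketch below is a hypothetical illustration of that pattern, not the contents of `scripts/create_new_integration.py`: it parses the two options with `argparse`, prompts only for a missing value, and derives the folder and workflow paths named in the text (`integrations/<name>/`, `.github/workflows/<name>.yml`) plus a `PREFIX-haystack` package name. Turning underscores into hyphens for the package name is an assumption, based on README rows such as `google-genai-haystack` living in `integrations/google_genai/`.

```python
import argparse
from typing import Callable, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None, ask: Callable[[str], str] = input) -> tuple[str, str]:
    """Read --name/--type, prompting only for values that were not passed (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Scaffold a new integration (illustrative only).")
    parser.add_argument("--name", help="integration name, e.g. opensearch or amazon_bedrock")
    parser.add_argument("--type", dest="component_type", help="component type, e.g. generators")
    args = parser.parse_args(argv)
    name = args.name or ask("Integration name: ").strip()
    component_type = args.component_type or ask("Component type: ").strip()
    return name, component_type


def planned_paths(name: str) -> dict[str, str]:
    """Derive names and paths, assuming the conventions visible elsewhere in this commit."""
    return {
        # Assumed rule: underscores become hyphens, e.g. google_genai -> google-genai-haystack.
        "package": f"{name.replace('_', '-')}-haystack",
        "folder": f"integrations/{name}",
        "workflow": f".github/workflows/{name}.yml",
    }


def no_prompt(message: str) -> str:
    # Used below to show that passing both flags never triggers a prompt.
    raise AssertionError(f"unexpected prompt: {message}")


name, component_type = parse_args(["--name", "google_genai", "--type", "generators"], ask=no_prompt)
print(name, component_type)
print(planned_paths(name))
```

Injecting `ask` instead of calling `input` directly keeps the prompt fallback testable without a terminal.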
README.md

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ Please check out our [Contribution Guidelines](CONTRIBUTING.md) for all the deta
 | [google-ai-haystack](integrations/google_ai/) | Generator | [![PyPI - Version](https://img.shields.io/pypi/v/google-ai-haystack.svg)](https://pypi.org/project/google-ai-haystack) | **Archived** - use [google-genai-haystack](https://pypi.org/project/google-genai-haystack) instead |
 | [google-genai-haystack](integrations/google_genai/) | Embedder, Generator | [![PyPI - Version](https://img.shields.io/pypi/v/google-genai-haystack.svg)](https://pypi.org/project/google-genai-haystack) | [![Test / google-genai](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/google_genai.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/google_genai.yml) |
 | [google-vertex-haystack](integrations/google_vertex/) | Embedder, Generator | [![PyPI - Version](https://img.shields.io/pypi/v/google-vertex-haystack.svg)](https://pypi.org/project/google-vertex-haystack) | **Archived** - use [google-genai-haystack](https://pypi.org/project/google-genai-haystack) instead |
+| [hallo-haystack](integrations/hallo/) | Embedder | [![PyPI - Version](https://img.shields.io/pypi/v/hallo-haystack.svg)](https://pypi.org/project/hallo-haystack) | [![Test / hallo](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/hallo.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/hallo.yml) |
 | [hanlp-haystack](integrations/hanlp/) | Preprocessor | [![PyPI - Version](https://img.shields.io/pypi/v/hanlp-haystack.svg)](https://pypi.org/project/hanlp-haystack) | [![Test / hanlp](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/hanlp.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/hanlp.yml) |
 | [jina-haystack](integrations/jina/) | Connector, Embedder, Ranker | [![PyPI - Version](https://img.shields.io/pypi/v/jina-haystack.svg)](https://pypi.org/project/jina-haystack) | [![Test / jina](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/jina.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/jina.yml) |
 | [langfuse-haystack](integrations/langfuse/) | Tracer | [![PyPI - Version](https://img.shields.io/pypi/v/langfuse-haystack.svg?color=orange)](https://pypi.org/project/langfuse-haystack) | [![Test / langfuse](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/langfuse.yml/badge.svg)](https://github.com/deepset-ai/haystack-core-integrations/actions/workflows/langfuse.yml) |

integrations/amazon_bedrock/CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -1,5 +1,17 @@
 # Changelog
 
+## [integrations/amazon_bedrock-v6.5.0] - 2026-03-03
+
+### 🚀 Features
+
+- Bedrock - support for FileContent + citations (#2883)
+
+### 📚 Documentation
+
+- Fix docstring for AmazonBedrockChatGenerator (#2813)
+- Simplify pydoc configs (#2855)
+
+
 ## [integrations/amazon_bedrock-v6.4.0] - 2026-02-05
 
 ### 🚀 Features

integrations/amazon_bedrock/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ classifiers = [
   "Programming Language :: Python :: Implementation :: CPython",
   "Programming Language :: Python :: Implementation :: PyPy",
 ]
-dependencies = ["haystack-ai>=2.23.0", "boto3>=1.28.57", "aioboto3>=14.0.0"]
+dependencies = ["haystack-ai>=2.24.1", "boto3>=1.28.57", "aioboto3>=14.0.0"]
 
 [project.urls]
 Documentation = "https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/amazon_bedrock#readme"

integrations/amazon_bedrock/src/haystack_integrations/components/generators/amazon_bedrock/chat/utils.py

Lines changed: 122 additions & 17 deletions
@@ -1,5 +1,7 @@
 import base64
 import json
+import os
+import re
 from datetime import datetime, timezone
 from typing import Any
 
@@ -11,6 +13,7 @@
     ChatMessage,
     ChatRole,
     ComponentInfo,
+    FileContent,
     FinishReason,
     ImageContent,
     ReasoningContent,
@@ -26,7 +29,37 @@
 
 
 # see https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ImageBlock.html for supported formats
-IMAGE_SUPPORTED_FORMATS = ["png", "jpeg", "gif", "webp"]
+IMAGE_MIME_TYPE_TO_FORMAT: dict[str, str] = {
+    "image/png": "png",
+    "image/jpeg": "jpeg",
+    "image/jpg": "jpeg",
+    "image/gif": "gif",
+    "image/webp": "webp",
+}
+
+# https://docs.aws.amazon.com/cli/latest/reference/bedrock-runtime/converse.html
+DOCUMENT_MIME_TYPE_TO_FORMAT: dict[str, str] = {
+    "application/pdf": "pdf",
+    "text/csv": "csv",
+    "application/msword": "doc",
+    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
+    "application/vnd.ms-excel": "xls",
+    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
+    "text/html": "html",
+    "text/plain": "txt",
+    "text/markdown": "md",
+}
+
+VIDEO_MIME_TYPE_TO_FORMAT: dict[str, str] = {
+    "video/x-matroska": "mkv",
+    "video/quicktime": "mov",
+    "video/mp4": "mp4",
+    "video/webm": "webm",
+    "video/x-flv": "flv",
+    "video/mpeg": "mpeg",
+    "video/x-ms-wmv": "wmv",
+    "video/3gpp": "three_gp",
+}
 
 # see https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_MessageStopEvent.html
 FINISH_REASON_MAPPING: dict[str, FinishReason] = {
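Replacing the `split("/")[-1]` heuristic with the explicit `IMAGE_MIME_TYPE_TO_FORMAT` table changes behaviour for aliases and for non-image types: with the old code, `image/jpg` produced the format `jpg`, which is not in `IMAGE_SUPPORTED_FORMATS`, so it raised, while a type such as `application/png` slipped through as `png`. The standalone comparison below copies both approaches from the diff into plain functions; `old_image_format` and `new_image_format` are hypothetical names used only for this illustration, and they return `None` where the real code raises `ValueError`.

```python
from typing import Optional

# Copied from the diff: the old allow-list and the new MIME-type table.
IMAGE_SUPPORTED_FORMATS = ["png", "jpeg", "gif", "webp"]
IMAGE_MIME_TYPE_TO_FORMAT = {
    "image/png": "png",
    "image/jpeg": "jpeg",
    "image/jpg": "jpeg",
    "image/gif": "gif",
    "image/webp": "webp",
}


def old_image_format(mime_type: Optional[str]) -> Optional[str]:
    """Old behaviour: take the MIME subtype, accept it only if it is on the allow-list."""
    image_format = mime_type.split("/")[-1] if mime_type else None
    return image_format if image_format in IMAGE_SUPPORTED_FORMATS else None


def new_image_format(mime_type: Optional[str]) -> Optional[str]:
    """New behaviour: look the full MIME type up in an explicit table."""
    return IMAGE_MIME_TYPE_TO_FORMAT.get(mime_type or "")


for mime in ["image/png", "image/jpg", "application/png", None]:
    print(mime, old_image_format(mime), new_image_format(mime))
```

Because the new error message builds its format list with `list(set(...))` to drop the duplicate `jpeg`, the order of formats shown in that message is not fixed.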
@@ -70,11 +103,11 @@ def _convert_image_content_to_bedrock_format(image_content: ImageContent) -> dic
     Convert a Haystack ImageContent to Bedrock format.
     """
 
-    image_format = image_content.mime_type.split("/")[-1] if image_content.mime_type else None
-    if image_format not in IMAGE_SUPPORTED_FORMATS:
+    image_format = IMAGE_MIME_TYPE_TO_FORMAT.get(image_content.mime_type or "")
+    if image_format is None:
         err_msg = (
-            f"Unsupported image format: {image_format}. "
-            f"Bedrock supports the following image formats: {IMAGE_SUPPORTED_FORMATS}"
+            f"Unsupported image MIME type: {image_content.mime_type}. "
+            f"Bedrock supports the following image formats: {list(set(IMAGE_MIME_TYPE_TO_FORMAT.values()))}"
         )
         raise ValueError(err_msg)
 
@@ -83,6 +116,51 @@ def _convert_image_content_to_bedrock_format(image_content: ImageContent) -> dic
     return {"image": {"format": image_format, "source": source}}
 
 
+def _convert_file_content_to_bedrock_format(file_content: FileContent) -> dict[str, Any]:
+    """
+    Convert a Haystack FileContent to Bedrock format.
+    """
+
+    if file_content.mime_type is None:
+        err_msg = "MIME type is required to use FileContent in Bedrock."
+        raise ValueError(err_msg)
+
+    if doc_format := DOCUMENT_MIME_TYPE_TO_FORMAT.get(file_content.mime_type):
+        source = {"bytes": base64.b64decode(file_content.base64_data)}
+
+        name = "filename"
+        if file_content.filename:
+            raw_name = os.path.splitext(file_content.filename)[0]
+            # Bedrock requires name to be present but is very strict about the format.
+            # See https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_DocumentBlock.html
+            sanitized_name = re.sub(r"\s+", " ", re.sub(r"[^a-zA-Z0-9\s\-\[\]()]", "", raw_name)).strip()
+            if sanitized_name:
+                name = sanitized_name
+
+        doc_block = {
+            "document": {
+                "format": doc_format,
+                "source": source,
+                "name": name,
+                **({"context": file_content.extra["context"]} if file_content.extra.get("context") else {}),
+                **({"citations": file_content.extra["citations"]} if file_content.extra.get("citations") else {}),
+            }
+        }
+        return doc_block
+
+    if video_format := VIDEO_MIME_TYPE_TO_FORMAT.get(file_content.mime_type):
+        source = {"bytes": base64.b64decode(file_content.base64_data)}
+        video_block = {"video": {"format": video_format, "source": source}}
+        return video_block
+
+    err_msg = (
+        f"Unsupported file content MIME type: {file_content.mime_type}\n"
+        f"Bedrock supports the following formats:\n - Documents: {list(DOCUMENT_MIME_TYPE_TO_FORMAT.values())}\n"
+        f" - Videos: {list(VIDEO_MIME_TYPE_TO_FORMAT.values())}"
+    )
+    raise ValueError(err_msg)
+
+
 def _format_tool_call_message(tool_call_message: ChatMessage) -> dict[str, Any]:
     """
     Format a Haystack ChatMessage containing tool calls into Bedrock format.
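The new `_convert_file_content_to_bedrock_format` routes a file by MIME type to a `document` block or a `video` block, and for documents derives the required `name` from the filename: it drops the extension, deletes every character other than ASCII letters, digits, whitespace, hyphens, square brackets and parentheses, collapses runs of whitespace, and falls back to `"filename"` if nothing is left. The sketch reproduces that logic on raw bytes instead of a Haystack `FileContent`, with shortened MIME tables. `to_bedrock_block` and `sanitize_document_name` are hypothetical names, and the `context`/`citations` passthrough from `FileContent.extra` is left out.

```python
import os
import re
from typing import Any, Optional

# Subsets of the tables added in the diff, kept short for illustration.
DOCUMENT_MIME_TYPE_TO_FORMAT = {"application/pdf": "pdf", "text/csv": "csv", "text/markdown": "md"}
VIDEO_MIME_TYPE_TO_FORMAT = {"video/mp4": "mp4", "video/3gpp": "three_gp"}


def sanitize_document_name(filename: Optional[str]) -> str:
    """Same regex pipeline as the diff: strip extension, drop disallowed chars, squeeze whitespace."""
    if not filename:
        return "filename"
    raw_name = os.path.splitext(filename)[0]
    sanitized = re.sub(r"\s+", " ", re.sub(r"[^a-zA-Z0-9\s\-\[\]()]", "", raw_name)).strip()
    return sanitized or "filename"


def to_bedrock_block(data: bytes, mime_type: Optional[str], filename: Optional[str] = None) -> dict[str, Any]:
    """Build a Converse-style document or video content block from raw bytes (hypothetical helper)."""
    if mime_type is None:
        raise ValueError("MIME type is required")
    if doc_format := DOCUMENT_MIME_TYPE_TO_FORMAT.get(mime_type):
        name = sanitize_document_name(filename)
        return {"document": {"format": doc_format, "source": {"bytes": data}, "name": name}}
    if video_format := VIDEO_MIME_TYPE_TO_FORMAT.get(mime_type):
        return {"video": {"format": video_format, "source": {"bytes": data}}}
    raise ValueError(f"Unsupported file content MIME type: {mime_type}")


print(sanitize_document_name("Q3 report_final (v2).pdf"))
print(sanitize_document_name("???.pdf"))
print(to_bedrock_block(b"%PDF-1.7", "application/pdf", "notes.pdf"))
```

Note that the underscore in `report_final` is removed rather than replaced, so the name becomes `Q3 reportfinal (v2)`.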
@@ -231,31 +309,48 @@ def _format_reasoning_content(reasoning_content: ReasoningContent) -> list[dict[
     return formatted_contents
 
 
-def _format_text_image_message(message: ChatMessage) -> dict[str, Any]:
+def _format_user_message(message: ChatMessage) -> dict[str, Any]:
     """
-    Format a Haystack ChatMessage containing text and optional image content into Bedrock format.
+    Format a Haystack user ChatMessage into Bedrock format.
 
     :param message: Haystack ChatMessage.
     :returns: Dictionary representing the message in Bedrock's expected format.
-    :raises ValueError: If image content is found in an assistant message or an unsupported image format is used.
     """
     content_parts = message._content
 
     bedrock_content_blocks: list[dict[str, Any]] = []
-    # Add reasoning content if available as the first content block
-    if message.reasoning:
-        bedrock_content_blocks.extend(_format_reasoning_content(reasoning_content=message.reasoning))
 
     for part in content_parts:
         if isinstance(part, TextContent):
             bedrock_content_blocks.append({"text": part.text})
 
         elif isinstance(part, ImageContent):
-            if message.is_from(ChatRole.ASSISTANT):
-                err_msg = "Image content is not supported for assistant messages"
-                raise ValueError(err_msg)
             bedrock_content_blocks.append(_convert_image_content_to_bedrock_format(part))
 
+        elif isinstance(part, FileContent):
+            bedrock_content_blocks.append(_convert_file_content_to_bedrock_format(part))
+
+    return {"role": message.role.value, "content": bedrock_content_blocks}
+
+
+def _format_textual_assistant_message(message: ChatMessage) -> dict[str, Any]:
+    """
+    Format a Haystack assistant ChatMessage containing text and optionally reasoning into Bedrock format.
+
+    :param message: Haystack ChatMessage.
+    :returns: Dictionary representing the message in Bedrock's expected format.
+    """
+    content_parts = message._content
+
+    bedrock_content_blocks: list[dict[str, Any]] = []
+    # Add reasoning content if available as the first content block
+    if message.reasoning:
+        bedrock_content_blocks.extend(_format_reasoning_content(reasoning_content=message.reasoning))
+
+    for part in content_parts:
+        if isinstance(part, TextContent):
+            bedrock_content_blocks.append({"text": part.text})
+
     return {"role": message.role.value, "content": bedrock_content_blocks}
 
 
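After this change, formatting branches on the message role: user messages may carry text, image and file parts, while assistant messages contribute reasoning first and then text only. Non-text parts of an assistant message are now skipped instead of raising, as the old `_format_text_image_message` did for images. A toy model of that split, using hypothetical stand-in dataclasses rather than Haystack's `ChatMessage`, with the file conversion and reasoning formatting reduced to placeholder dicts:

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Union


@dataclass
class Text:
    text: str


@dataclass
class File:
    mime_type: str


Part = Union[Text, File]


@dataclass
class Message:
    role: str  # "user" or "assistant" in this toy model
    parts: list[Part] = field(default_factory=list)
    reasoning: Optional[str] = None


def format_message(msg: Message) -> dict[str, Any]:
    """Toy version of the role-based split between user and assistant formatting."""
    blocks: list[dict[str, Any]] = []
    if msg.role == "user":
        for part in msg.parts:
            if isinstance(part, Text):
                blocks.append({"text": part.text})
            elif isinstance(part, File):
                # Placeholder for the real document/video conversion.
                blocks.append({"file": part.mime_type})
    elif msg.role == "assistant":
        if msg.reasoning:
            # Reasoning goes first, mirroring the assistant formatter in the diff (placeholder shape).
            blocks.append({"reasoningContent": msg.reasoning})
        for part in msg.parts:
            if isinstance(part, Text):  # non-text parts are skipped, not rejected
                blocks.append({"text": part.text})
    else:
        # Other roles are out of scope for this sketch.
        raise ValueError(f"role not handled in this sketch: {msg.role}")
    return {"role": msg.role, "content": blocks}


user = Message("user", [Text("Summarise this"), File("application/pdf")])
assistant = Message("assistant", [Text("Done."), File("image/png")], reasoning="thinking...")
print(format_message(user))
print(format_message(assistant))
```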
@@ -314,8 +409,10 @@ def _format_messages(messages: list[ChatMessage]) -> tuple[list[dict[str, Any]],
             formatted_msg = _format_tool_call_message(msg)
         elif msg.tool_call_results:
             formatted_msg = _format_tool_result_message(msg)
-        else:
-            formatted_msg = _format_text_image_message(msg)
+        elif msg.is_from(ChatRole.USER):
+            formatted_msg = _format_user_message(msg)
+        elif msg.is_from(ChatRole.ASSISTANT):
+            formatted_msg = _format_textual_assistant_message(msg)
         if cache_point:
             formatted_msg["content"].append(cache_point)
         bedrock_formatted_messages.append(formatted_msg)
@@ -386,6 +483,14 @@ def _parse_completion_response(response_body: dict[str, Any], model: str) -> lis
                     if "redactedContent" in reasoning_content:
                         reasoning_content["redacted_content"] = reasoning_content.pop("redactedContent")
                     reasoning_contents.append({"reasoning_content": reasoning_content})
+                elif "citationsContent" in content_block:
+                    citations_content = content_block["citationsContent"]
+                    meta["citations"] = citations_content
+                    if "content" in citations_content:
+                        for entry in citations_content["content"]:
+                            text = entry.get("text", "")
+                            if text.strip():
+                                text_content.append(text)
 
             reasoning_text = ""
             for content in reasoning_contents:
@@ -397,7 +502,7 @@ def _parse_completion_response(response_body: dict[str, Any], model: str) -> lis
             # Create a single ChatMessage with combined text and tool calls
             replies.append(
                 ChatMessage.from_assistant(
-                    " ".join(text_content),
+                    "".join(text_content),
                     tool_calls=tool_calls,
                     meta=meta,
                     reasoning=ReasoningContent(

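The response parser now also reads `citationsContent` blocks: it stores the whole block under `meta["citations"]`, appends every non-blank `text` entry from its `content` list to the reply text, and joins all text pieces with `""` instead of `" "`. The sketch below applies those rules to a hand-written list of content blocks. The block shape follows the Converse API's citations content block (`content` entries carrying `text`, plus a `citations` list), but the sample values are invented, `collect_text` is a hypothetical helper, and the branch for ordinary `text` blocks is assumed because this hunk does not show it. It also shows one visible effect of the separator change: fragments that already carry their own spacing pick up a doubled space under `" ".join`.

```python
from typing import Any


def collect_text(content_blocks: list[dict[str, Any]], sep: str = "") -> tuple[str, dict[str, Any]]:
    """Mimic the text/citations handling from the diff on plain dicts (hypothetical helper)."""
    text_content: list[str] = []
    meta: dict[str, Any] = {}
    for block in content_blocks:
        if "text" in block:  # assumed plain-text branch, not shown in the hunk
            text_content.append(block["text"])
        elif "citationsContent" in block:
            citations_content = block["citationsContent"]
            # As in the diff, the raw block is stored in meta; a later citations block overwrites it.
            meta["citations"] = citations_content
            for entry in citations_content.get("content", []):
                text = entry.get("text", "")
                if text.strip():  # skip empty or whitespace-only fragments
                    text_content.append(text)
    return sep.join(text_content), meta


blocks = [
    {"text": "According to the report, "},
    {
        "citationsContent": {
            "content": [{"text": "revenue grew in Q3."}],
            "citations": [{"title": "report"}],  # invented placeholder citation
        }
    },
]

text, meta = collect_text(blocks)
print(repr(text))
print(repr(collect_text(blocks, sep=" ")[0]))
```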