docs-website/docs/concepts/agents.mdx
1 line changed: 1 addition, 0 deletions
@@ -45,6 +45,7 @@ Key capabilities include:
- **Human-in-the-loop**: Intercept tool calls for human review before execution. See [Human in the Loop](../pipeline-components/agents-1/human-in-the-loop.mdx).
- **Multi-agent systems**: Wrap an `Agent` as a `ComponentTool` to build coordinator/specialist architectures. See [Multi-Agent Systems](./agents/multi-agent-systems.mdx).
- **MCP server exposure**: Expose your agent as an MCP server using [Hayhooks](../development/hayhooks.mdx), making it callable from any MCP-compatible client such as Claude Desktop or Cursor.
- **Multimodal inputs**: Pass images alongside text using `ImageContent` in `ChatMessage` content parts, or return `ImageContent` from tools for dynamic image analysis. Requires a vision-capable model such as `gpt-5` or `gemini-2.5-flash`. See [Multimodal Inputs](../pipeline-components/agents-1/agent.mdx#multimodal-inputs).

Check out the [Agent](../pipeline-components/agents-1/agent.mdx) documentation, or the [example](#tool-calling-agent) below to get started.
```python
# ...
        ChatMessage.from_user(content_parts=["What does this chart show?", image]),
    ],
)
```

Tools can also return `ImageContent` directly, letting the agent fetch and reason about images dynamically during its loop.
Two things are required: set `outputs_to_string={"raw_result": True}` so the `ToolInvoker` skips string conversion, and return a `list[ImageContent]` (the tool result type is `str | Sequence[TextContent | ImageContent]`).

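The effect of `raw_result` can be pictured with a small, library-free sketch. `SketchImage` and `to_tool_result` are hypothetical stand-ins, not Haystack classes or functions; the sketch only assumes, as stated above, that a tool result is either a string or a sequence of content parts.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class SketchImage:
    """Hypothetical stand-in for an image content part (not Haystack's ImageContent)."""

    base64_image: str
    mime_type: str = "image/png"


ToolResult = Union[str, Sequence[SketchImage]]


def to_tool_result(output: Sequence[SketchImage], raw_result: bool) -> ToolResult:
    """Mimic the choice an invoker faces when a tool returns images."""
    if raw_result:
        # Pass the content parts through untouched so the model receives real images.
        return list(output)
    # Without raw_result, the output is flattened to text, so the model
    # only sees a textual representation instead of the image itself.
    return str(output)


image = SketchImage(base64_image="aGVsbG8=")
as_text = to_tool_result([image], raw_result=False)
as_parts = to_tool_result([image], raw_result=True)

print(type(as_text).__name__)   # str
print(type(as_parts).__name__)  # list
```

With string conversion, the image survives only as a printed representation; with the raw result, the structured part reaches the model intact.
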
The standard OpenAI Chat Completions API doesn't support images in tool results, so use `OpenAIResponsesChatGenerator` (OpenAI's Responses API) instead:

```python
from typing import Annotated

from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIResponsesChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.tools import tool


@tool(outputs_to_string={"raw_result": True})
def fetch_image(
    url: Annotated[str, "URL of the image to fetch and analyze"],
) -> list[ImageContent]:
    """Fetch an image from a URL so the agent can analyze its contents."""
# ...
    system_prompt="You are a helpful assistant that can fetch and analyze images from URLs.",
)

result = agent.run(
    messages=[
        ChatMessage.from_user(
            "Fetch the image at https://picsum.photos/seed/haystack/640/480 and describe what you see.",
        ),
    ],
)
print(result["last_message"].text)
```

`ImageContent` can be created from a URL or a local file path, or from a PDF page using the `PDFToImageContent` converter.

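As a rough illustration of what building image content from a local file involves, the library-free sketch below reads the bytes, guesses the MIME type, and base64-encodes the data. `SketchImage` and `image_from_file_path` are hypothetical names used only here; Haystack's `ImageContent` handles this for you, and its internals may differ.

```python
import base64
import mimetypes
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchImage:
    """Hypothetical stand-in for an image content part."""

    base64_image: str
    mime_type: str


def image_from_file_path(path: str) -> SketchImage:
    """Read a local image and package it as base64 text plus a MIME type."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {path!r}")
    data = Path(path).read_bytes()
    return SketchImage(base64.b64encode(data).decode("ascii"), mime_type)


# Demo with a throwaway file; the bytes are only the 8-byte PNG signature, not a full image.
png_signature = b"\x89PNG\r\n\x1a\n"
with tempfile.TemporaryDirectory() as tmp:
    chart_path = Path(tmp) / "chart.png"
    chart_path.write_bytes(png_signature)
    image = image_from_file_path(str(chart_path))

print(image.mime_type)  # image/png
```
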
### In a pipeline

When an `Agent` sits inside a pipeline, use `ChatPromptBuilder` with its string template format and the `| templatize_part` filter to pass images as structured content parts:

```python
from haystack import Pipeline
from haystack.components.agents import Agent
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
# ...
# Download or provide your own chart image as "chart.png"
image = ImageContent.from_file_path("chart.png")
result = pipeline.run(
    {
        "prompt_builder": {"question": "What does this chart show?", "image": image},
    },
)
print(result["agent"]["last_message"].text)
```

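A string template can only produce text, so a structured part such as an image has to survive rendering somehow. The sketch below shows one possible mechanism: serialize the part into a tagged placeholder at render time, then split the rendered string back into text and structured parts. This is a plain-Python illustration, not Haystack's implementation of `templatize_part`; `templatize`, `render`, `split_parts`, and the `<part>` tag are all hypothetical.

```python
import json
import re

PART_PATTERN = re.compile(r"<part>(.*?)</part>", re.DOTALL)


def templatize(part: dict) -> str:
    """Serialize a structured part into a placeholder that fits inside a string."""
    return f"<part>{json.dumps(part)}</part>"


def render(template: str, **variables: str) -> str:
    """Very small stand-in for a template engine: fill {name} slots."""
    return template.format(**variables)


def split_parts(rendered: str) -> list:
    """Turn the rendered string back into a list of text and structured parts."""
    parts = []
    position = 0
    for match in PART_PATTERN.finditer(rendered):
        text = rendered[position:match.start()].strip()
        if text:
            parts.append(text)
        parts.append(json.loads(match.group(1)))
        position = match.end()
    tail = rendered[position:].strip()
    if tail:
        parts.append(tail)
    return parts


image = {"type": "image", "mime_type": "image/png", "base64_image": "aGVsbG8="}
rendered = render(
    "{question}\n{image}",
    question="What does this chart show?",
    image=templatize(image),
)
content_parts = split_parts(rendered)
print(content_parts[0])  # What does this chart show?
```

The resulting list has the same shape as the `content_parts` argument in the direct `ChatMessage.from_user` call: a text part followed by an image part.
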
:::tip
See these cookbooks for complete multimodal agent examples:
- [Multimodal Agents](https://haystack.deepset.ai/cookbook/multimodal_intro#multimodal-agent): image inputs and tool use with agents
- [Gemma Chat RAG](https://haystack.deepset.ai/cookbook/gemma_chat_rag): vision model in a RAG pipeline
:::

## Multi-Agent Systems

You can wrap an `Agent` as a tool to build multi-agent systems where specialist agents handle focused subtasks and a coordinator agent plans and delegates.
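
The coordinator/specialist pattern can be sketched without any LLM. In this toy version each "agent" is a plain function, and `as_tool`, `coordinator`, and the digit-based routing rule are hypothetical stand-ins for what a real coordinator model would decide through tool calls.

```python
from typing import Callable, Dict


def math_specialist(task: str) -> str:
    """Stand-in specialist: pretends to handle arithmetic questions."""
    return f"[math] handled: {task}"


def docs_specialist(task: str) -> str:
    """Stand-in specialist: pretends to answer documentation questions."""
    return f"[docs] handled: {task}"


def as_tool(name: str, agent: Callable[[str], str]) -> Dict[str, object]:
    """Wrap an agent-like callable with the metadata a coordinator needs."""
    return {"name": name, "invoke": agent}


TOOLS = {
    tool["name"]: tool
    for tool in (as_tool("math", math_specialist), as_tool("docs", docs_specialist))
}


def coordinator(task: str) -> str:
    """Pick a specialist and delegate; a real coordinator would let the model choose."""
    chosen = "math" if any(ch.isdigit() for ch in task) else "docs"
    return TOOLS[chosen]["invoke"](task)


print(coordinator("What is 2 + 2?"))        # [math] handled: What is 2 + 2?
print(coordinator("How do I add a tool?"))  # [docs] handled: How do I add a tool?
```
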
@@ -363,3 +466,5 @@ Agents work with MCP in two directions:
🧑‍🍳 Cookbook:

- [Build a GitHub Issue Resolver Agent](https://haystack.deepset.ai/cookbook/github_issue_resolver_agent)
docs-website/versioned_docs/version-2.28/concepts/agents.mdx: same changes as in docs-website/docs/concepts/agents.mdx above.