Commit 9e1dcb8

authored
docs: Update the docs on streaming callback (#11150)
1 parent fd63f53 commit 9e1dcb8

23 files changed

Lines changed: 465 additions & 275 deletions

docs-website/docs/pipeline-components/generators/fallbackchatgenerator.mdx

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -75,7 +75,7 @@ You can use this metadata to:
7575

7676
### Streaming
7777

78-
`FallbackChatGenerator` supports streaming through the `streaming_callback` parameter. The callback is passed directly to the underlying Generators.
78+
`FallbackChatGenerator` supports [streaming](guides-to-generators/choosing-the-right-generator.mdx#streaming-support) through the `streaming_callback` parameter. The callback is passed directly to the underlying Generators.
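
To illustrate what "passed directly to the underlying Generators" means, here is a minimal sketch that uses stand-in classes rather than Haystack itself. `FakeChunk`, `FailingGenerator`, `EchoGenerator`, and `FallbackSketch` are hypothetical names invented for this example, and the loop is an assumption about the general fallback pattern, not the library's implementation. Only the two `meta` keys are taken from the usage example in these docs.

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streaming chunk; only `content` is modeled."""
    content: str


class FailingGenerator:
    """Hypothetical primary generator that always fails."""

    def run(self, messages, streaming_callback=None):
        raise RuntimeError("primary unavailable")


class EchoGenerator:
    """Hypothetical backup generator that streams the last message word by word."""

    def run(self, messages, streaming_callback=None):
        words = messages[-1].split()
        for word in words:
            if streaming_callback is not None:
                streaming_callback(FakeChunk(content=word + " "))
        return {"replies": [" ".join(words)]}


class FallbackSketch:
    """Tries each generator in order and hands the same callback to each one."""

    def __init__(self, chat_generators):
        self.chat_generators = chat_generators

    def run(self, messages, streaming_callback=None):
        errors = []
        for attempt, generator in enumerate(self.chat_generators, start=1):
            try:
                # The callback is forwarded unchanged; the wrapper never calls it.
                result = generator.run(messages, streaming_callback=streaming_callback)
            except Exception as exc:
                errors.append(exc)
                continue
            result["meta"] = {
                "successful_chat_generator_class": type(generator).__name__,
                "total_attempts": attempt,
            }
            return result
        raise RuntimeError(f"all {len(errors)} generators failed")


received = []
fallback = FallbackSketch([FailingGenerator(), EchoGenerator()])
result = fallback.run(
    ["What is NLP?"],
    streaming_callback=lambda chunk: received.append(chunk.content),
)
```

In this sketch the callback only ever sees chunks from the generator that actually streams, which is why a single callback can serve every generator in the chain.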
7979

8080
## Usage
8181

docs-website/docs/pipeline-components/generators/guides-to-generators/choosing-the-right-generator.mdx

Lines changed: 37 additions & 21 deletions
@@ -36,6 +36,7 @@ When you enable streaming, the generator calls your `streaming_callback` for eve
3636
- **Tool calls**: The model is building a tool/function call. Read `chunk.tool_calls`.
3737
- **Tool result**: A tool finished and returned output. Read `chunk.tool_call_result`.
3838
- **Text tokens**: Normal assistant text. Read `chunk.content`.
39+
- **Reasoning tokens**: Extended thinking output (for models that support it). Read `chunk.reasoning`.
3940

4041
Only one of these fields appears per chunk. Use `chunk.start` and `chunk.finish_reason` to detect boundaries. Use `chunk.index` and `chunk.component_info` for tracing.
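
Because only one payload field is set per chunk, a callback can branch on whichever field is present. The sketch below shows that dispatch with a stand-in dataclass so it runs without Haystack installed. `FakeChunk` and `chunk_kind` are hypothetical names, the field names mirror the ones listed above, and the `"empty"` fallback for chunks that carry no payload is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeChunk:
    """Hypothetical stand-in that mirrors the StreamingChunk fields named above."""
    content: str = ""
    tool_calls: Optional[list] = None
    tool_call_result: Any = None
    reasoning: Any = None
    start: bool = False
    finish_reason: Optional[str] = None
    index: Optional[int] = None


def chunk_kind(chunk) -> str:
    """Return the name of the single payload field that is set on the chunk."""
    if chunk.tool_calls:
        return "tool_calls"
    if chunk.tool_call_result is not None:
        return "tool_call_result"
    if chunk.reasoning is not None:
        return "reasoning"
    if chunk.content:
        return "content"
    # Assumption: a chunk may carry only metadata such as finish_reason.
    return "empty"


kinds = [
    chunk_kind(FakeChunk(tool_calls=[{"tool_name": "search"}], start=True)),
    chunk_kind(FakeChunk(tool_call_result="42")),
    chunk_kind(FakeChunk(content="Hello")),
    chunk_kind(FakeChunk(reasoning="thinking")),
    chunk_kind(FakeChunk(finish_reason="stop")),
]
```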
4142

@@ -46,44 +47,59 @@ For providers that support multiple candidates, set `n=1` to stream.
4647
Check out the parameter details in our [API Reference for StreamingChunk](/reference/data-classes-api#streamingchunk).
4748
:::
4849

49-
The simplest way is to use the built-in `print_streaming_chunk` function. It handles tool calls, tool results, and text tokens.
50+
The simplest way is to use the built-in `print_streaming_chunk` function. It handles all chunk types and prints formatted output to stdout:
5051

5152
```python
5253
from haystack.components.generators.utils import print_streaming_chunk
5354

5455
generator = SomeGenerator(streaming_callback=print_streaming_chunk)
55-
## For ChatGenerators, pass a list[ChatMessage]. For text generators, pass a prompt string.
56+
# For ChatGenerators, pass a list[ChatMessage]. For text generators, pass a prompt string.
5657
```
5758

5859
### Custom Callback
5960

60-
If you need custom rendering, you can create your own callback.
61-
62-
Handle the three chunk types in this order: tool calls, tool result, and text.
61+
If you need custom rendering, write your own callback. Handle the four chunk types in order:
6362

6463
```python
6564
from haystack.dataclasses import StreamingChunk
6665

6766

68-
def my_stream(chunk: StreamingChunk):
69-
if chunk.start:
70-
on_start() # e.g., open an SSE stream
67+
def my_streaming_callback(chunk: StreamingChunk) -> None:
68+
if chunk.start and chunk.index and chunk.index > 0:
69+
print("\n\n", flush=True, end="")
7170

72-
# 1) Tool calls: name and JSON args arrive as deltas
71+
# Tool Call streaming
7372
if chunk.tool_calls:
74-
for t in chunk.tool_calls:
75-
on_tool_call_delta(index=t.index, name=t.tool_name, args_delta=t.arguments)
76-
77-
# 2) Tool result: final output from the tool
78-
if chunk.tool_call_result is not None:
79-
on_tool_result(chunk.tool_call_result)
80-
81-
# 3) Text tokens
73+
for tool_call in chunk.tool_calls:
74+
if chunk.start:
75+
if chunk.index and tool_call.index > chunk.index:
76+
print("\n\n", flush=True, end="")
77+
print(
78+
f">>> Tool Call: {tool_call.tool_name}\n>>> Arguments: ",
79+
flush=True,
80+
end="",
81+
)
82+
if tool_call.arguments:
83+
print(tool_call.arguments, flush=True, end="")
84+
85+
# Tool Result streaming
86+
if chunk.tool_call_result:
87+
print(f">>> Tool Result\n{chunk.tool_call_result.result}", flush=True, end="")
88+
89+
# Text streaming
8290
if chunk.content:
83-
on_text_delta(chunk.content)
84-
85-
if chunk.finish_reason:
86-
on_finish(chunk.finish_reason)
91+
if chunk.start:
92+
print(">>> Assistant\n", flush=True, end="")
93+
print(chunk.content, flush=True, end="")
94+
95+
# Reasoning streaming
96+
if chunk.reasoning:
97+
if chunk.start:
98+
print(">>> Reasoning\n", flush=True, end="")
99+
print(chunk.reasoning.reasoning_text, flush=True, end="")
100+
101+
if chunk.finish_reason is not None:
102+
print("\n\n", flush=True, end="")
87103
```
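
A callback does not have to print. As a hedged variation on the example above, the following sketch collects text and reasoning deltas so they can be inspected after the stream ends. `FakeChunk`, `FakeReasoning`, and `StreamCollector` are hypothetical names used for illustration; the attributes `content`, `reasoning.reasoning_text`, and `finish_reason` follow the callback shown above, and the stand-ins keep the snippet runnable without Haystack.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeReasoning:
    """Hypothetical stand-in for the object behind `chunk.reasoning`."""
    reasoning_text: str


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streaming chunk."""
    content: str = ""
    reasoning: Optional[FakeReasoning] = None
    finish_reason: Optional[str] = None


@dataclass
class StreamCollector:
    """Callable that accumulates streamed deltas instead of printing them."""
    text_parts: list = field(default_factory=list)
    reasoning_parts: list = field(default_factory=list)
    finish_reason: Optional[str] = None

    def __call__(self, chunk) -> None:
        if chunk.reasoning:
            self.reasoning_parts.append(chunk.reasoning.reasoning_text)
        if chunk.content:
            self.text_parts.append(chunk.content)
        if chunk.finish_reason is not None:
            self.finish_reason = chunk.finish_reason

    @property
    def text(self) -> str:
        """Full assistant text assembled from the deltas received so far."""
        return "".join(self.text_parts)


collector = StreamCollector()
for chunk in [
    FakeChunk(reasoning=FakeReasoning("think ")),
    FakeChunk(content="Hel"),
    FakeChunk(content="lo"),
    FakeChunk(finish_reason="stop"),
]:
    collector(chunk)
```

An instance like `collector` could be passed wherever a `streaming_callback` is expected, since it is callable with a single chunk argument.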
88104

89105
### Agents and Tools

docs-website/versioned_docs/version-2.18/pipeline-components/generators/guides-to-generators/choosing-the-right-generator.mdx

Lines changed: 39 additions & 26 deletions
@@ -23,7 +23,6 @@ The choice between Generators and ChatGenerators depends on your use case and th
2323

2424
:::tip
2525
To learn more about this comparison, check out our [Generators vs Chat Generators](generators-vs-chat-generators.mdx) guide.
26-
2726
:::
2827

2928
## Streaming Support
@@ -37,55 +36,70 @@ When you enable streaming, the generator calls your `streaming_callback` for eve
3736
- **Tool calls**: The model is building a tool/function call. Read `chunk.tool_calls`.
3837
- **Tool result**: A tool finished and returned output. Read `chunk.tool_call_result`.
3938
- **Text tokens**: Normal assistant text. Read `chunk.content`.
39+
- **Reasoning tokens**: Extended thinking output (for models that support it). Read `chunk.reasoning`.
4040

4141
Only one of these fields appears per chunk. Use `chunk.start` and `chunk.finish_reason` to detect boundaries. Use `chunk.index` and `chunk.component_info` for tracing.
4242

4343
For providers that support multiple candidates, set `n=1` to stream.
4444

45-
:::note
46-
Parameter Details
45+
:::info[Parameter Details]
4746

4847
Check out the parameter details in our [API Reference for StreamingChunk](/reference/data-classes-api#streamingchunk).
4948
:::
5049

51-
The simplest way is to use the built-in `print_streaming_chunk` function. It handles tool calls, tool results, and text tokens.
50+
The simplest way is to use the built-in `print_streaming_chunk` function. It handles all chunk types and prints formatted output to stdout:
5251

5352
```python
5453
from haystack.components.generators.utils import print_streaming_chunk
5554

5655
generator = SomeGenerator(streaming_callback=print_streaming_chunk)
57-
## For ChatGenerators, pass a list[ChatMessage]. For text generators, pass a prompt string.
56+
# For ChatGenerators, pass a list[ChatMessage]. For text generators, pass a prompt string.
5857
```
5958

6059
### Custom Callback
6160

62-
If you need custom rendering, you can create your own callback.
63-
64-
Handle the three chunk types in this order: tool calls, tool result, and text.
61+
If you need custom rendering, write your own callback. Handle the four chunk types in order:
6562

6663
```python
6764
from haystack.dataclasses import StreamingChunk
6865

6966

70-
def my_stream(chunk: StreamingChunk):
71-
if chunk.start:
72-
on_start() # e.g., open an SSE stream
67+
def my_streaming_callback(chunk: StreamingChunk) -> None:
68+
if chunk.start and chunk.index and chunk.index > 0:
69+
print("\n\n", flush=True, end="")
7370

74-
# 1) Tool calls: name and JSON args arrive as deltas
71+
# Tool Call streaming
7572
if chunk.tool_calls:
76-
for t in chunk.tool_calls:
77-
on_tool_call_delta(index=t.index, name=t.tool_name, args_delta=t.arguments)
78-
79-
# 2) Tool result: final output from the tool
80-
if chunk.tool_call_result is not None:
81-
on_tool_result(chunk.tool_call_result)
82-
83-
# 3) Text tokens
73+
for tool_call in chunk.tool_calls:
74+
if chunk.start:
75+
if chunk.index and tool_call.index > chunk.index:
76+
print("\n\n", flush=True, end="")
77+
print(
78+
f">>> Tool Call: {tool_call.tool_name}\n>>> Arguments: ",
79+
flush=True,
80+
end="",
81+
)
82+
if tool_call.arguments:
83+
print(tool_call.arguments, flush=True, end="")
84+
85+
# Tool Result streaming
86+
if chunk.tool_call_result:
87+
print(f">>> Tool Result\n{chunk.tool_call_result.result}", flush=True, end="")
88+
89+
# Text streaming
8490
if chunk.content:
85-
on_text_delta(chunk.content)
86-
87-
if chunk.finish_reason:
88-
on_finish(chunk.finish_reason)
91+
if chunk.start:
92+
print(">>> Assistant\n", flush=True, end="")
93+
print(chunk.content, flush=True, end="")
94+
95+
# Reasoning streaming
96+
if chunk.reasoning:
97+
if chunk.start:
98+
print(">>> Reasoning\n", flush=True, end="")
99+
print(chunk.reasoning.reasoning_text, flush=True, end="")
100+
101+
if chunk.finish_reason is not None:
102+
print("\n\n", flush=True, end="")
89103
```
90104

91105
### Agents and Tools
@@ -105,8 +119,7 @@ We also support [Amazon Bedrock](../amazonbedrockgenerator.mdx): it provides acc
105119

106120
When discussing open (weights) models, we're referring to models with public weights that anyone can deploy on their infrastructure. The datasets used for training are shared less frequently. One could choose to use an open model for several reasons, including more transparency and control of the model.
107121

108-
:::note
109-
Commercial Use
122+
:::info[Commercial Use]
110123

111124
Not all open models are suitable for commercial use. We advise thoroughly reviewing the license, typically available on Hugging Face, before considering their adoption.
112125
:::

docs-website/versioned_docs/version-2.19/pipeline-components/generators/fallbackchatgenerator.mdx

Lines changed: 7 additions & 5 deletions
@@ -75,7 +75,7 @@ You can use this metadata to:
7575

7676
### Streaming
7777

78-
`FallbackChatGenerator` supports streaming through the `streaming_callback` parameter. The callback is passed directly to the underlying Generators.
78+
`FallbackChatGenerator` supports [streaming](guides-to-generators/choosing-the-right-generator.mdx#streaming-support) through the `streaming_callback` parameter. The callback is passed directly to the underlying Generators.
7979

8080
## Usage
8181

@@ -84,10 +84,7 @@ You can use this metadata to:
8484
Basic usage with fallback from a primary to a backup model:
8585

8686
```python
87-
from haystack.components.generators.chat import (
88-
FallbackChatGenerator,
89-
OpenAIChatGenerator,
90-
)
87+
from haystack.components.generators.chat import FallbackChatGenerator, OpenAIChatGenerator
9188
from haystack.dataclasses import ChatMessage
9289

9390
## Create primary and backup generators
@@ -104,6 +101,11 @@ result = generator.run(messages=messages)
104101
print(result["replies"][0].text)
105102
print(f"Successful generator: {result['meta']['successful_chat_generator_class']}")
106103
print(f"Total attempts: {result['meta']['total_attempts']}")
104+
105+
>> Natural Language Processing (NLP) is a field of artificial intelligence that
106+
>> focuses on the interaction between computers and humans through natural language...
107+
>> Successful generator: OpenAIChatGenerator
108+
>> Total attempts: 1
107109
```
108110

109111
With multiple providers:

docs-website/versioned_docs/version-2.19/pipeline-components/generators/guides-to-generators/choosing-the-right-generator.mdx

Lines changed: 39 additions & 24 deletions
@@ -23,7 +23,6 @@ The choice between Generators and ChatGenerators depends on your use case and th
2323

2424
:::tip
2525
To learn more about this comparison, check out our [Generators vs Chat Generators](generators-vs-chat-generators.mdx) guide.
26-
2726
:::
2827

2928
## Streaming Support
@@ -37,54 +36,70 @@ When you enable streaming, the generator calls your `streaming_callback` for eve
3736
- **Tool calls**: The model is building a tool/function call. Read `chunk.tool_calls`.
3837
- **Tool result**: A tool finished and returned output. Read `chunk.tool_call_result`.
3938
- **Text tokens**: Normal assistant text. Read `chunk.content`.
39+
- **Reasoning tokens**: Extended thinking output (for models that support it). Read `chunk.reasoning`.
4040

4141
Only one of these fields appears per chunk. Use `chunk.start` and `chunk.finish_reason` to detect boundaries. Use `chunk.index` and `chunk.component_info` for tracing.
4242

4343
For providers that support multiple candidates, set `n=1` to stream.
4444

45-
:::note[Parameter Details]
45+
:::info[Parameter Details]
4646

4747
Check out the parameter details in our [API Reference for StreamingChunk](/reference/data-classes-api#streamingchunk).
4848
:::
4949

50-
The simplest way is to use the built-in `print_streaming_chunk` function. It handles tool calls, tool results, and text tokens.
50+
The simplest way is to use the built-in `print_streaming_chunk` function. It handles all chunk types and prints formatted output to stdout:
5151

5252
```python
5353
from haystack.components.generators.utils import print_streaming_chunk
5454

5555
generator = SomeGenerator(streaming_callback=print_streaming_chunk)
56-
## For ChatGenerators, pass a list[ChatMessage]. For text generators, pass a prompt string.
56+
# For ChatGenerators, pass a list[ChatMessage]. For text generators, pass a prompt string.
5757
```
5858

5959
### Custom Callback
6060

61-
If you need custom rendering, you can create your own callback.
62-
63-
Handle the three chunk types in this order: tool calls, tool result, and text.
61+
If you need custom rendering, write your own callback. Handle the four chunk types in order:
6462

6563
```python
6664
from haystack.dataclasses import StreamingChunk
6765

6866

69-
def my_stream(chunk: StreamingChunk):
70-
if chunk.start:
71-
on_start() # e.g., open an SSE stream
67+
def my_streaming_callback(chunk: StreamingChunk) -> None:
68+
if chunk.start and chunk.index and chunk.index > 0:
69+
print("\n\n", flush=True, end="")
7270

73-
# 1) Tool calls: name and JSON args arrive as deltas
71+
# Tool Call streaming
7472
if chunk.tool_calls:
75-
for t in chunk.tool_calls:
76-
on_tool_call_delta(index=t.index, name=t.tool_name, args_delta=t.arguments)
77-
78-
# 2) Tool result: final output from the tool
79-
if chunk.tool_call_result is not None:
80-
on_tool_result(chunk.tool_call_result)
81-
82-
# 3) Text tokens
73+
for tool_call in chunk.tool_calls:
74+
if chunk.start:
75+
if chunk.index and tool_call.index > chunk.index:
76+
print("\n\n", flush=True, end="")
77+
print(
78+
f">>> Tool Call: {tool_call.tool_name}\n>>> Arguments: ",
79+
flush=True,
80+
end="",
81+
)
82+
if tool_call.arguments:
83+
print(tool_call.arguments, flush=True, end="")
84+
85+
# Tool Result streaming
86+
if chunk.tool_call_result:
87+
print(f">>> Tool Result\n{chunk.tool_call_result.result}", flush=True, end="")
88+
89+
# Text streaming
8390
if chunk.content:
84-
on_text_delta(chunk.content)
85-
86-
if chunk.finish_reason:
87-
on_finish(chunk.finish_reason)
91+
if chunk.start:
92+
print(">>> Assistant\n", flush=True, end="")
93+
print(chunk.content, flush=True, end="")
94+
95+
# Reasoning streaming
96+
if chunk.reasoning:
97+
if chunk.start:
98+
print(">>> Reasoning\n", flush=True, end="")
99+
print(chunk.reasoning.reasoning_text, flush=True, end="")
100+
101+
if chunk.finish_reason is not None:
102+
print("\n\n", flush=True, end="")
88103
```
89104

90105
### Agents and Tools
@@ -104,7 +119,7 @@ We also support [Amazon Bedrock](../amazonbedrockgenerator.mdx): it provides acc
104119

105120
When discussing open (weights) models, we're referring to models with public weights that anyone can deploy on their infrastructure. The datasets used for training are shared less frequently. One could choose to use an open model for several reasons, including more transparency and control of the model.
106121

107-
:::note[Commercial Use]
122+
:::info[Commercial Use]
108123

109124
Not all open models are suitable for commercial use. We advise thoroughly reviewing the license, typically available on Hugging Face, before considering their adoption.
110125
:::

docs-website/versioned_docs/version-2.20/pipeline-components/generators/fallbackchatgenerator.mdx

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ You can use this metadata to:
7575

7676
### Streaming
7777

78-
`FallbackChatGenerator` supports streaming through the `streaming_callback` parameter. The callback is passed directly to the underlying Generators.
78+
`FallbackChatGenerator` supports [streaming](guides-to-generators/choosing-the-right-generator.mdx#streaming-support) through the `streaming_callback` parameter. The callback is passed directly to the underlying Generators.
7979

8080
## Usage
8181
