docs/internals/llm-streams.md: 36 additions & 8 deletions
@@ -1,20 +1,24 @@
 # LLM Stream Core

-This document describes the first part of the Layer 3 stream pipeline.
+This document describes the current Layer 3 stream pipeline.

-The current implementation introduces two building blocks:
+The current implementation introduces four building blocks:

 - `sse_reader` in `gateway::streams::reader`
 - `HubChunkStream` in `gateway::streams::hub`
+- `BridgedStream` in `gateway::streams::bridged`
+- `NativeStream` in `gateway::streams::native`

 ## Scope

-This slice only covers the hub-facing stream foundation.
+This slice now covers both hub-facing and native stream adapters.

 - `sse_reader` turns a byte stream into complete SSE lines.
 - `HubChunkStream` turns provider stream lines into hub `ChatCompletionChunk` values.
+- `BridgedStream` turns hub chunks into a concrete `ChatFormat` stream.
+- `NativeStream` bypasses hub chunks and lets a `ChatFormat` decode native provider stream lines directly.

-`BridgedStream` and `NativeStream` are intentionally deferred to later steps.
+Gateway request execution still sits in a later step, but the reusable stream adapters are now in place.

 ## `sse_reader`

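The scope list says `sse_reader` turns a byte stream into complete SSE lines. The body of that section is not part of this diff, so the following is only a minimal sketch of the general idea, not the reader's actual code. `LineBuffer` and `push` are hypothetical names, and the sketch assumes lines end in LF or CRLF (a bare CR terminator is not handled).

```rust
/// Hypothetical helper (not from the diff): accumulates raw bytes and
/// hands back only lines that have been fully received.
#[derive(Default)]
struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Feed one chunk of bytes and return every line it completes.
    /// Bytes after the last newline stay buffered for the next chunk.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let mut line: Vec<u8> = self.pending.drain(..=pos).collect();
            line.pop(); // drop the trailing '\n'
            if line.last() == Some(&b'\r') {
                line.pop(); // tolerate CRLF line endings
            }
            lines.push(String::from_utf8_lossy(&line).into_owned());
        }
        lines
    }
}

fn main() {
    let mut buf = LineBuffer::default();
    // A line split across two chunks is held back until it is complete.
    assert!(buf.push(b"data: hel").is_empty());
    let lines = buf.push(b"lo\n\ndata: [DONE]\r\n");
    assert_eq!(lines, vec!["data: hello", "", "data: [DONE]"]);
}
```

Buffering bytes rather than strings means a multi-byte UTF-8 character split across chunks is only decoded once its line is complete.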
@@ -41,12 +45,35 @@ Its polling behavior is deliberately ordered:

 That fixes the earlier class of bug where a provider transform could return multiple chunks for one raw input line and only the first chunk would be observed.

+## `BridgedStream`
+
+`BridgedStream` sits one layer above `HubChunkStream`.
+
+Its behavior mirrors the hub adapter:
+
+1. drain any already buffered format-specific chunks
+2. poll `HubChunkStream` only when that buffer is empty
+3. call `ChatFormat::from_hub_stream()` for each hub chunk
+4. return the first bridged chunk immediately and queue the rest
+
+When the hub stream ends, `BridgedStream` also calls `ChatFormat::stream_end_events()` so formats can emit explicit terminators such as final SSE events.
+
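The four polling steps and the end-of-stream hook described for `BridgedStream` can be sketched roughly as follows. This is a synchronous simplification, assuming an `Iterator` in place of the real async stream. The `Format` trait, the `Bridged` struct, and `PipeFormat` are hypothetical stand-ins, and the method signatures are guesses that only borrow the names `from_hub_stream` and `stream_end_events` from the text.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the format hooks named in the diff.
trait Format {
    type Out;
    /// One hub chunk may expand into zero or more format-specific chunks.
    fn from_hub_stream(&mut self, hub_chunk: &str) -> Vec<Self::Out>;
    /// Terminator events, requested once after the hub stream ends.
    fn stream_end_events(&mut self) -> Vec<Self::Out>;
}

struct Bridged<I, F: Format> {
    inner: I,
    format: F,
    buffer: VecDeque<F::Out>,
    ended: bool,
}

impl<I: Iterator<Item = String>, F: Format> Iterator for Bridged<I, F> {
    type Item = F::Out;

    fn next(&mut self) -> Option<F::Out> {
        loop {
            // Step 1: drain already buffered chunks first.
            if let Some(out) = self.buffer.pop_front() {
                return Some(out);
            }
            if self.ended {
                return None;
            }
            // Step 2: touch the inner stream only when the buffer is empty.
            match self.inner.next() {
                // Steps 3 and 4: transform, queue everything, and let the
                // next loop iteration hand back the first queued item.
                Some(chunk) => self.buffer.extend(self.format.from_hub_stream(&chunk)),
                None => {
                    self.ended = true;
                    self.buffer.extend(self.format.stream_end_events());
                }
            }
        }
    }
}

/// Toy format: splits on '|' so one input can yield several outputs.
struct PipeFormat;

impl Format for PipeFormat {
    type Out = String;
    fn from_hub_stream(&mut self, hub_chunk: &str) -> Vec<String> {
        hub_chunk
            .split('|')
            .filter(|s| !s.is_empty())
            .map(|s| format!("event: {s}"))
            .collect()
    }
    fn stream_end_events(&mut self) -> Vec<String> {
        vec!["[DONE]".to_string()]
    }
}

fn main() {
    let hub = vec!["a|b".to_string(), String::new(), "c".to_string()];
    let bridged = Bridged {
        inner: hub.into_iter(),
        format: PipeFormat,
        buffer: VecDeque::new(),
        ended: false,
    };
    let out: Vec<String> = bridged.collect();
    assert_eq!(out, vec!["event: a", "event: b", "event: c", "[DONE]"]);
}
```

Queueing every output and popping the first is equivalent to returning the first item immediately and queueing the rest. It also handles a transform that yields nothing for a given input, because the loop simply polls again.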
+## `NativeStream`
+
+`NativeStream` is the direct counterpart for native-format paths.
+
+Instead of going through hub `ChatCompletionChunk` values, it passes each raw provider stream line to `ChatFormat::transform_native_stream_chunk()`. Buffering rules are the same: if one input line expands into multiple output items, the adapter returns the first one immediately and preserves the rest for later polls.
 Whenever a transformed hub chunk carries `usage`, the stream copies `prompt_tokens` and `completion_tokens` into `ChatStreamState`. This keeps token accounting outside individual provider transforms while still making the latest usage totals available to later pipeline stages.

+`BridgedStream` reports those latest hub totals through a oneshot channel on both normal completion and premature drop. It only fills fields that were actually observed in the hub stream, and it derives `total_tokens` when both sides are known.
+
+`NativeStream` exposes the same completion and drop hook through `ChatFormat::native_usage()`. Formats that do not override that hook still send an empty `Usage` value, but native-capable formats can now report their own accumulated usage snapshot without coupling the generic stream adapter to any one state shape.
+
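The usage reporting described in the two paragraphs above could look something like the sketch below. It is illustrative only: it uses `std::sync::mpsc` as a stand-in for the oneshot channel, and the `Usage` field types, the `NativeFormat` trait, the `snapshot` helper, and the `Reporting` wrapper are all assumptions. Only the hook name `native_usage` and the three token field names come from the text.

```rust
use std::sync::mpsc::{channel, Sender};

/// Assumed shape: every field is optional so unobserved values stay empty.
#[derive(Debug, Default, Clone, PartialEq)]
struct Usage {
    prompt_tokens: Option<u32>,
    completion_tokens: Option<u32>,
    total_tokens: Option<u32>,
}

/// Fill only what was observed; derive the total when both sides are known.
fn snapshot(prompt: Option<u32>, completion: Option<u32>) -> Usage {
    Usage {
        prompt_tokens: prompt,
        completion_tokens: completion,
        total_tokens: match (prompt, completion) {
            (Some(p), Some(c)) => Some(p + c),
            _ => None,
        },
    }
}

/// Hypothetical hook: formats that do not override it report empty usage.
trait NativeFormat {
    fn native_usage(&self) -> Usage {
        Usage::default()
    }
}

/// Wrapper that reports usage exactly once, on completion or on drop.
struct Reporting<F: NativeFormat> {
    format: F,
    tx: Option<Sender<Usage>>,
}

impl<F: NativeFormat> Reporting<F> {
    fn report(&mut self) {
        // `take()` ensures a second call (for example from Drop) is a no-op.
        if let Some(tx) = self.tx.take() {
            // Ignore the error if the receiver has already gone away.
            let _ = tx.send(self.format.native_usage());
        }
    }
}

impl<F: NativeFormat> Drop for Reporting<F> {
    fn drop(&mut self) {
        self.report();
    }
}

struct Plain; // keeps the default hook
impl NativeFormat for Plain {}

struct Counting {
    prompt: Option<u32>,
    completion: Option<u32>,
}
impl NativeFormat for Counting {
    fn native_usage(&self) -> Usage {
        snapshot(self.prompt, self.completion)
    }
}

fn main() {
    // Premature drop still delivers the latest snapshot.
    let (tx, rx) = channel();
    let format = Counting { prompt: Some(12), completion: Some(30) };
    drop(Reporting { format, tx: Some(tx) });
    assert_eq!(rx.recv().unwrap().total_tokens, Some(42));

    // A format without an override sends an empty value.
    let (tx, rx) = channel();
    drop(Reporting { format: Plain, tx: Some(tx) });
    assert_eq!(rx.recv().unwrap(), Usage::default());
}
```

Keeping the sender in an `Option` is one simple way to cover both normal completion and premature drop with the same code path without sending twice.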
 ## Stream State

 `ChatStreamState` now carries both aggregation data and provider stream metadata.
@@ -63,8 +90,9 @@ Those metadata fields are required because some providers only emit response ide

 This implementation is intentionally narrow.

-- only the SSE reader is implemented in this slice
+- only the SSE reader kind is implemented in this slice
 - `JsonArrayStream` and `AwsEventStream` readers are still future work
-- no format bridging happens here yet; this stream only produces hub chunks
+- the legacy providers under `src/providers/` still keep their own SSE splitting logic
+- no production native format has started overriding `ChatFormat::native_usage()` yet

-That keeps the first stream-layer step focused on correctness of buffering, polling order, and usage propagation.
+That keeps the stream-layer work focused on buffering correctness, polling order, and handoff between provider, hub, and format-specific stream representations.