Commit ae053c2

feat(provider): add llm sdk entrypoint for streaming (#25)
1 parent 25d2418 commit ae053c2

2 files changed

Lines changed: 556 additions & 51 deletions

docs/internals/llm-gateway.md

Lines changed: 27 additions & 17 deletions
@@ -4,9 +4,9 @@ This document describes the current Layer 3 gateway entry point.

 The current implementation is intentionally narrow:

-- it only handles non-streaming chat requests
+- it handles both complete and streaming chat requests
 - it routes either through native format support or through the OpenAI Chat hub
-- it exposes a typed `ChatResponse<F>` envelope even though the current implementation only returns the `Complete` variant
+- it exposes a typed `ChatResponse<F>` envelope that can return either `Complete` or `Stream`

 ## Files

@@ -23,12 +23,12 @@ The current slice is implemented in two files:

 Its control flow is:

-1. reject `stream=true` up front
+1. ask the format whether the request is streaming
 2. ask the format whether the provider supports a native path
-3. if native support exists, call the native non-streaming path
-4. otherwise bridge through the hub request/response path
+3. if native support exists, call the native path
+4. otherwise use the hub request/response path for complete calls or the hub streaming path for stream calls

-That keeps the current implementation limited to one closed loop: typed request in, typed response out, with `Usage` attached.
+That keeps the current implementation limited to one closed loop: typed request in, typed response out, with `Usage` attached either directly or through a oneshot receiver.
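
The new control flow amounts to a two-by-two decision: native or hub, complete or stream. A minimal sketch of that decision follows. The `Route` enum and `choose_route` function are hypothetical names used only for illustration; the gateway itself calls the corresponding paths rather than returning a label.

```rust
/// Hypothetical labels for the four paths described in the control flow.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    NativeComplete,
    NativeStream,
    HubComplete,
    HubStream,
}

/// Assumed inputs: both flags are answers the format gives for a request
/// (is it streaming, and does the provider have a native path).
pub fn choose_route(is_streaming: bool, has_native_support: bool) -> Route {
    match (has_native_support, is_streaming) {
        // Step 3: native support exists, so use the native path.
        (true, false) => Route::NativeComplete,
        (true, true) => Route::NativeStream,
        // Step 4: otherwise go through the hub, complete or streaming.
        (false, false) => Route::HubComplete,
        (false, true) => Route::HubStream,
    }
}
```

Matching on the tuple keeps the four cases exhaustive, so adding a third routing dimension later would be flagged by the compiler.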

 ## Hub Path

@@ -44,34 +44,44 @@ The sequence is:

 This keeps provider-specific JSON shape handling in the provider layer and format-specific response shape handling in the format layer. The gateway itself only orchestrates the sequence.

+For streaming hub calls, the sequence is:
+
+1. `F::to_hub()` converts the request into `ChatCompletionRequest`
+2. `Gateway::call_chat_hub_stream()` runs provider-side request transformation and the HTTP POST
+3. `select_chat_stream_reader()` chooses the raw response reader based on `StreamReaderKind`
+4. `HubChunkStream` parses the provider stream into hub `ChatCompletionChunk` values
+5. `BridgedStream<F>` converts those hub chunks into `F::StreamChunk` and sends final `Usage` through a oneshot channel
+
+Today the gateway only wires `StreamReaderKind::Sse`. Other reader kinds return a validation error until their readers are implemented.
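
The reader selection in step 3 could look roughly like the sketch below. Only the names `select_chat_stream_reader`, `StreamReaderKind`, and its three variants come from this commit. The signature, the `GatewayError::Validation` type, and the `SseReader` placeholder are assumptions made for illustration and may not match the real code.

```rust
/// Variant names taken from the doc; the real enum may carry more data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StreamReaderKind {
    Sse,
    AwsEventStream,
    JsonArrayStream,
}

/// Hypothetical error type standing in for the gateway's validation error.
#[derive(Debug, PartialEq, Eq)]
pub enum GatewayError {
    Validation(String),
}

/// Placeholder for whatever the real SSE reader is.
#[derive(Debug, PartialEq, Eq)]
pub struct SseReader;

/// Assumed signature: only `Sse` yields a reader, every other kind is rejected.
pub fn select_chat_stream_reader(kind: StreamReaderKind) -> Result<SseReader, GatewayError> {
    match kind {
        StreamReaderKind::Sse => Ok(SseReader),
        other => Err(GatewayError::Validation(format!(
            "stream reader {:?} is not implemented yet",
            other
        ))),
    }
}
```

Rejecting unwired kinds with an error, rather than panicking, lets callers surface the limitation as an ordinary request failure.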
+
 ## Native Path

 The native path is used when `F::native_support()` returns a `NativeHandler` for the chosen provider.

-The native path is still non-streaming only.
-
 The sequence is:

 1. `F::call_native()` chooses the endpoint path and request body
 2. `Gateway::call_chat_native()` executes the HTTP POST against the provider instance base URL
-3. `F::parse_native_response()` parses the JSON response into `F::Response`
+3. for complete calls, `F::parse_native_response()` parses the JSON response into `F::Response`
+4. for stream calls, `NativeStream<F>` converts provider-native chunks into `F::StreamChunk` and sends final `Usage` through a oneshot channel

-The gateway currently returns `Usage::default()` for native non-streaming calls because there is not yet a generic format hook for extracting usage out of arbitrary native response types.
+The gateway currently returns `Usage::default()` for native complete calls because there is not yet a generic format hook for extracting usage out of arbitrary native response types.

 ## `ChatResponse<F>`

-`ChatResponse<F>` is introduced now even though the current code path only emits `Complete`.
+`ChatResponse<F>` uses a single public shape for both complete and stream responses.

-That is deliberate for two reasons:
+That is deliberate for three reasons:

 - the public typed entry point should not need a return-type rewrite once streaming is enabled
 - usage has different delivery timing for complete vs stream responses
+- the gateway can box either bridged hub streams or native streams behind one alias

 The stream field uses a type-erased alias:

 - `ChatResponseStream<F> = Pin<Box<dyn Stream<Item = Result<F::StreamChunk>> + Send>>`

-That avoids hard-coding either `BridgedStream<F>` or `NativeStream<F>` into the response type. The later streaming work can box either stream adapter without changing the outer `ChatResponse<F>` shape.
+That avoids hard-coding either `BridgedStream<F>` or `NativeStream<F>` into the response type. The gateway can box either stream adapter without changing the outer `ChatResponse<F>` shape.
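
To illustrate why one envelope covers both cases, here is a std-only sketch of the idea. It is not the real type. It uses a boxed `Iterator` as a stand-in for the async `Stream`, `std::sync::mpsc` as a stand-in for a oneshot channel, and two plain type parameters instead of the format trait `F`. The field names and the `stream_response` helper are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Simplified stand-in; the real `Usage` fields are not shown in this diff.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Usage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
}

/// Stand-in for `ChatResponseStream<F>`: a boxed, type-erased chunk source.
pub type ChunkIter<C> = Box<dyn Iterator<Item = Result<C, String>> + Send>;

/// One public shape for both delivery modes.
pub enum ChatResponse<R, C> {
    /// Usage is known immediately and travels with the response.
    Complete { response: R, usage: Usage },
    /// Usage is only known once the stream ends, so it arrives on a channel.
    Stream { stream: ChunkIter<C>, usage: Receiver<Usage> },
}

/// Hypothetical adapter: yields each chunk, then sends usage when exhausted.
pub fn stream_response(chunks: Vec<String>, final_usage: Usage) -> ChatResponse<String, String> {
    let (tx, rx) = channel();
    let mut inner = chunks.into_iter();
    let mut pending = Some((tx, final_usage));
    let stream = std::iter::from_fn(move || match inner.next() {
        Some(chunk) => Some(Ok::<String, String>(chunk)),
        None => {
            // Send usage exactly once, at end of stream.
            if let Some((tx, usage)) = pending.take() {
                let _ = tx.send(usage);
            }
            None
        }
    });
    ChatResponse::Stream { stream: Box::new(stream), usage: rx }
}
```

Because the adapter is boxed behind the alias, a second adapter with a different concrete type could be returned from the same function signature, which is the property the doc attributes to `ChatResponseStream<F>`.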

 ## Helper Naming

@@ -83,21 +93,21 @@ The gateway layer will later grow non-chat entry points such as embeddings, TTS,

 This module does not attempt to finish the full Layer 3 design.

-- streaming requests are rejected explicitly and deferred to the later streaming gateway work
 - `SessionStore` is not wired yet
 - only `chat_completion()` is implemented as a convenience helper today
 - `messages()` and `responses()` remain deferred until their corresponding formats land
-- native non-streaming usage extraction is still format-specific future work
+- only `StreamReaderKind::Sse` is wired today; `AwsEventStream` and `JsonArrayStream` are still deferred
+- native complete-call usage extraction is still format-specific future work

 ## Why This Slice Exists

 This is the first point where the new provider layer, format layer, and response envelope meet under one runtime entry point.

-Without this slice, later stream work would still be missing:
+Without this slice, the current gateway would still be missing:

 - a typed orchestration entry point
 - a place to choose native vs hub routing
 - a shared `ChatResponse<F>` shape
 - a common provider error mapping path

-That is why the implementation stops at non-streaming correctness first and leaves stream transport integration to later gateway work.
+That is why the implementation first established typed complete-call orchestration and then extended the same entry point to stream transport integration.
