Commit 25d2418 (1 parent: 971118e)

feat(provider): add llm sdk entrypoint (#24)

5 files changed: 741 additions & 0 deletions

docs/internals/llm-gateway.md (103 additions & 0 deletions)
# LLM Gateway

This document describes the current Layer 3 gateway entry point.

The current implementation is intentionally narrow:

- it only handles non-streaming chat requests
- it routes either through native format support or through the OpenAI Chat hub
- it exposes a typed `ChatResponse<F>` envelope even though the current implementation only returns the `Complete` variant

## Files

The current slice is implemented in two files:

- `gateway::gateway` in `src/gateway/gateway.rs`
- `gateway::types::response` in `src/gateway/types/response.rs`

`gateway::mod.rs` re-exports `Gateway`, and `gateway::types::mod.rs` re-exports `ChatResponse`.

## Core Flow

`Gateway::chat<F>()` is the typed entry point.

Its control flow is:

1. reject `stream=true` up front
2. ask the format whether the provider supports a native path
3. if native support exists, call the native non-streaming path
4. otherwise bridge through the hub request/response path

That keeps the current implementation limited to one closed loop: typed request in, typed response out, with `Usage` attached.

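The four steps above amount to a small routing decision. The sketch below is a hypothetical, simplified version of that decision: `Route`, `ChatError`, and the boolean inputs are stand-ins for the real `Gateway`, format trait, and error types, not the actual API.

```rust
// Hypothetical sketch of the chat() routing decision. The real gateway works
// with a typed request and a format trait; booleans stand in for those here.
#[derive(Debug, PartialEq)]
pub enum Route {
    Native, // the format has native support for the chosen provider
    Hub,    // bridge through the OpenAI Chat hub
}

#[derive(Debug, PartialEq)]
pub enum ChatError {
    StreamingNotSupported, // stream=true is rejected up front
}

pub fn route_chat(stream: bool, has_native_support: bool) -> Result<Route, ChatError> {
    if stream {
        // step 1: reject stream=true before any provider work happens
        return Err(ChatError::StreamingNotSupported);
    }
    if has_native_support {
        Ok(Route::Native) // steps 2-3: take the native non-streaming path
    } else {
        Ok(Route::Hub) // step 4: bridge through the hub request/response path
    }
}
```

The ordering matters: the stream check runs before any format or provider lookup, so a streaming request fails fast with one well-defined error.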
## Hub Path

The hub path is used when the format does not have native support for the chosen provider.

The sequence is:

1. `F::to_hub()` converts the request into `ChatCompletionRequest`
2. `Gateway::call_chat_hub()` runs provider-side request transformation and the HTTP POST
3. the provider definition converts the JSON response back into hub `ChatCompletionResponse`
4. `extract_chat_usage_from_response()` maps OpenAI-style usage into the shared `Usage` type
5. `F::from_hub()` converts the hub response back into `F::Response`

This keeps provider-specific JSON shape handling in the provider layer and format-specific response shape handling in the format layer. The gateway itself only orchestrates the sequence.

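The five hub steps can be sketched as one orchestration function. This is a minimal illustration, not the real implementation: the `HubFormat` trait, the stub request/response structs, and the closure standing in for `Gateway::call_chat_hub()`'s provider transform and HTTP POST are all assumptions made for the sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ChatCompletionRequest {
    pub prompt: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ChatCompletionResponse {
    pub text: String,
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
}

#[derive(Debug, Default, PartialEq)]
pub struct Usage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
}

/// Stand-in for the format trait: only the two hub conversion hooks.
pub trait HubFormat {
    type Request;
    type Response;
    fn to_hub(req: &Self::Request) -> ChatCompletionRequest; // step 1
    fn from_hub(resp: &ChatCompletionResponse) -> Self::Response; // step 5
}

/// Step 4: map OpenAI-style usage fields into the shared Usage type.
fn extract_chat_usage_from_response(resp: &ChatCompletionResponse) -> Usage {
    Usage {
        prompt_tokens: resp.prompt_tokens,
        completion_tokens: resp.completion_tokens,
    }
}

/// Steps 2-3 (provider-side transform, HTTP POST, JSON decode) are collapsed
/// into the `call_chat_hub` closure so the ordering stays visible.
pub fn bridge_through_hub<F: HubFormat>(
    req: &F::Request,
    call_chat_hub: impl Fn(ChatCompletionRequest) -> ChatCompletionResponse,
) -> (F::Response, Usage) {
    let hub_req = F::to_hub(req); // 1. typed request -> hub request
    let hub_resp = call_chat_hub(hub_req); // 2-3. provider call
    let usage = extract_chat_usage_from_response(&hub_resp); // 4. usage mapping
    (F::from_hub(&hub_resp), usage) // 5. hub response -> typed response
}

/// Toy format used only to exercise the sketch.
pub struct EchoFormat;

impl HubFormat for EchoFormat {
    type Request = String;
    type Response = String;
    fn to_hub(req: &String) -> ChatCompletionRequest {
        ChatCompletionRequest { prompt: req.clone() }
    }
    fn from_hub(resp: &ChatCompletionResponse) -> String {
        resp.text.clone()
    }
}
```

Note that the gateway function only sequences the hooks; every JSON-shape decision lives behind either the closure (provider layer) or the trait (format layer), which is the separation the text describes.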
## Native Path

The native path is used when `F::native_support()` returns a `NativeHandler` for the chosen provider.

The native path is still non-streaming only.

The sequence is:

1. `F::call_native()` chooses the endpoint path and request body
2. `Gateway::call_chat_native()` executes the HTTP POST against the provider instance base URL
3. `F::parse_native_response()` parses the JSON response into `F::Response`

The gateway currently returns `Usage::default()` for native non-streaming calls because there is not yet a generic format hook for extracting usage out of arbitrary native response types.

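The three native steps, plus the `Usage::default()` caveat, can be sketched the same way. The `NativeFormat` trait, the toy `PlainText` format, and the closure standing in for the gateway's HTTP POST are illustrative assumptions, not the real trait surface.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct Usage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
}

/// Stand-in for the format's native hooks.
pub trait NativeFormat {
    type Response;
    /// Step 1: choose the endpoint path and the request body.
    fn call_native(prompt: &str) -> (String, String);
    /// Step 3: parse the raw JSON response into the format's response type.
    fn parse_native_response(raw: &str) -> Self::Response;
}

/// Step 2 (the HTTP POST against the provider instance base URL) is stubbed
/// out by the `http_post` closure.
pub fn call_chat_native<F: NativeFormat>(
    prompt: &str,
    http_post: impl Fn(&str, &str) -> String,
) -> (F::Response, Usage) {
    let (path, body) = F::call_native(prompt);
    let raw = http_post(&path, &body);
    // No generic usage hook exists yet for arbitrary native response types,
    // so native non-streaming calls report Usage::default().
    (F::parse_native_response(&raw), Usage::default())
}

/// Toy format used only to exercise the sketch.
pub struct PlainText;

impl NativeFormat for PlainText {
    type Response = String;
    fn call_native(prompt: &str) -> (String, String) {
        ("/v1/native/chat".to_string(), prompt.to_string())
    }
    fn parse_native_response(raw: &str) -> String {
        raw.to_string()
    }
}
```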
## `ChatResponse<F>`

`ChatResponse<F>` is introduced now even though the current code path only emits `Complete`.

That is deliberate for two reasons:

- the public typed entry point should not need a return-type rewrite once streaming is enabled
- usage has different delivery timing for complete vs stream responses

The stream field uses a type-erased alias:

- `ChatResponseStream<F> = Pin<Box<dyn Stream<Item = Result<F::StreamChunk>> + Send>>`

That avoids hard-coding either `BridgedStream<F>` or `NativeStream<F>` into the response type. The later streaming work can box either stream adapter without changing the outer `ChatResponse<F>` shape.

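A minimal sketch of that shape, under stated assumptions: the real alias is parameterized over the format `F` and uses the futures `Stream` trait, while this sketch substitutes plain type parameters and a local object-safe `Stream` stand-in so it compiles with no external crates.

```rust
use std::pin::Pin;

// Local object-safe stand-in for the futures Stream trait, used only so this
// sketch is dependency-free; the real alias uses the futures Stream trait.
pub trait Stream {
    type Item;
}

#[derive(Debug, Default, PartialEq)]
pub struct Usage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
}

/// Type-erased stream alias: either a bridged or a native stream adapter can
/// be boxed here without leaking its concrete type into the envelope.
pub type ChatResponseStream<Chunk> =
    Pin<Box<dyn Stream<Item = Result<Chunk, String>> + Send>>;

/// The envelope: today only Complete is ever produced, but the Stream variant
/// already fixes the return type for the later streaming work.
pub enum ChatResponse<Resp, Chunk> {
    Complete { response: Resp, usage: Usage },
    Stream(ChatResponseStream<Chunk>),
}
```

The usage-timing point falls out of the shape: `Complete` carries `Usage` inline, while a stream variant has to deliver it after the last chunk, which is exactly why the envelope distinguishes the two.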
## Helper Naming

The internal HTTP helpers are named `call_chat_hub()` and `call_chat_native()` on purpose.

The gateway layer will later grow non-chat entry points such as embeddings, TTS, STT, and image generation. Keeping the current helpers explicitly chat-scoped prevents ambiguity once those additional call paths exist.

## Current Limits

This module does not attempt to finish the full Layer 3 design.

- streaming requests are rejected explicitly and deferred to the later streaming gateway work
- `SessionStore` is not wired yet
- only `chat_completion()` is implemented as a convenience helper today
- `messages()` and `responses()` remain deferred until their corresponding formats land
- native non-streaming usage extraction is still format-specific future work

## Why This Slice Exists

This is the first point where the new provider layer, format layer, and response envelope meet under one runtime entry point.

Without this slice, the later streaming work would still be missing:

- a typed orchestration entry point
- a place to choose native vs hub routing
- a shared `ChatResponse<F>` shape
- a common provider error mapping path

That is why the implementation stops at non-streaming correctness first and leaves stream transport integration to later gateway work.