| 1 | +# LLM Gateway |
| 2 | + |
| 3 | +This document describes the current Layer 3 gateway entry point. |
| 4 | + |
| 5 | +The current implementation is intentionally narrow: |
| 6 | + |
| 7 | +- it only handles non-streaming chat requests |
| 8 | +- it routes either through native format support or through the OpenAI Chat hub |
| 9 | +- it exposes a typed `ChatResponse<F>` envelope even though the current implementation only returns the `Complete` variant |
| 10 | + |
| 11 | +## Files |
| 12 | + |
| 13 | +The current slice is implemented in two files: |
| 14 | + |
| 15 | +- `gateway::gateway` in `src/gateway/gateway.rs` |
| 16 | +- `gateway::types::response` in `src/gateway/types/response.rs` |
| 17 | + |
| 18 | +`src/gateway/mod.rs` re-exports `Gateway`, and `src/gateway/types/mod.rs` re-exports `ChatResponse`. |
| 19 | + |
| 20 | +## Core Flow |
| 21 | + |
| 22 | +`Gateway::chat<F>()` is the typed entry point. |
| 23 | + |
| 24 | +Its control flow is: |
| 25 | + |
| 26 | +1. reject `stream=true` up front |
| 27 | +2. ask the format whether the provider supports a native path |
| 28 | +3. if native support exists, call the native non-streaming path |
| 29 | +4. otherwise bridge through the hub request/response path |
| 30 | + |
| 31 | +That keeps the current implementation limited to one closed loop: typed request in, typed response out, with `Usage` attached. |
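
The routing decision in the steps above can be sketched in a few lines. This is a minimal, synchronous illustration and not the code in `src/gateway/gateway.rs`: `route_chat`, `Route`, `GatewayError`, `DemoFormat`, and the exact signature of `native_support` are all assumptions made for the example, and the HTTP calls are left out entirely.

```rust
/// Hypothetical marker returned when a format can talk to a provider natively.
#[derive(Debug, PartialEq, Eq)]
pub struct NativeHandler;

/// Hypothetical slice of the format trait: only the routing hook is modelled.
/// The real signature of `native_support` may differ.
pub trait ChatFormat {
    fn native_support(provider: &str) -> Option<NativeHandler>;
}

/// Which path the gateway would take for a request (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Native(NativeHandler),
    Hub,
}

/// Illustrative error type; the real gateway error is not shown in the document.
#[derive(Debug, PartialEq, Eq)]
pub enum GatewayError {
    StreamingNotSupported,
}

/// Mirror of the four steps: reject streaming, ask the format about native
/// support, then pick the native path or fall back to the hub path.
pub fn route_chat<F: ChatFormat>(stream: bool, provider: &str) -> Result<Route, GatewayError> {
    // Step 1: streaming is rejected before any routing decision is made.
    if stream {
        return Err(GatewayError::StreamingNotSupported);
    }
    // Steps 2 to 4: prefer the native handler, otherwise bridge through the hub.
    match F::native_support(provider) {
        Some(handler) => Ok(Route::Native(handler)),
        None => Ok(Route::Hub),
    }
}

/// Example format that claims native support for a single made-up provider name.
pub struct DemoFormat;

impl ChatFormat for DemoFormat {
    fn native_support(provider: &str) -> Option<NativeHandler> {
        if provider == "demo-native" {
            Some(NativeHandler)
        } else {
            None
        }
    }
}
```

Calling `route_chat::<DemoFormat>(false, "demo-native")` selects the native route, any other provider name falls back to the hub, and `stream = true` is rejected regardless of provider.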
| 32 | + |
| 33 | +## Hub Path |
| 34 | + |
| 35 | +The hub path is used when the format does not have native support for the chosen provider. |
| 36 | + |
| 37 | +The sequence is: |
| 38 | + |
| 39 | +1. `F::to_hub()` converts the request into `ChatCompletionRequest` |
| 40 | +2. `Gateway::call_chat_hub()` runs provider-side request transformation and the HTTP POST |
| 41 | +3. the provider definition converts the JSON response back into hub `ChatCompletionResponse` |
| 42 | +4. `extract_chat_usage_from_response()` maps OpenAI-style usage into the shared `Usage` type |
| 43 | +5. `F::from_hub()` converts the hub response back into `F::Response` |
| 44 | + |
| 45 | +This keeps provider-specific JSON shape handling in the provider layer and format-specific response shape handling in the format layer. The gateway itself only orchestrates the sequence. |
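
Step 4 of the hub sequence is the easiest one to illustrate in isolation. The sketch below maps an OpenAI-style usage block (`prompt_tokens`, `completion_tokens`, `total_tokens`) into a shared usage struct. The field names on `Usage`, the `OpenAiUsage` struct, and the function name `extract_usage_sketch` are assumptions for illustration. The real `extract_chat_usage_from_response()` works on the hub `ChatCompletionResponse` and may behave differently, for example when the usage block is absent.

```rust
/// OpenAI-style usage block as it appears in a chat completion response.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct OpenAiUsage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub total_tokens: u64,
}

/// Hypothetical shape for the shared usage type; the field names are assumed.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub total_tokens: u64,
}

/// Map an optional OpenAI-style usage block into the shared type.
/// In this sketch a missing block falls back to `Usage::default()` (all zeros).
pub fn extract_usage_sketch(usage: Option<&OpenAiUsage>) -> Usage {
    match usage {
        Some(u) => Usage {
            input_tokens: u.prompt_tokens,
            output_tokens: u.completion_tokens,
            total_tokens: u.total_tokens,
        },
        None => Usage::default(),
    }
}
```

Keeping this mapping in one function means a format's `from_hub()` never has to read provider usage fields itself.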
| 46 | + |
| 47 | +## Native Path |
| 48 | + |
| 49 | +The native path is used when `F::native_support()` returns a `NativeHandler` for the chosen provider. |
| 50 | + |
| 51 | +The native path is still non-streaming only. |
| 52 | + |
| 53 | +The sequence is: |
| 54 | + |
| 55 | +1. `F::call_native()` chooses the endpoint path and request body |
| 56 | +2. `Gateway::call_chat_native()` executes the HTTP POST against the provider instance base URL |
| 57 | +3. `F::parse_native_response()` parses the JSON response into `F::Response` |
| 58 | + |
| 59 | +The gateway currently returns `Usage::default()` for native non-streaming calls because there is not yet a generic format hook for extracting usage out of arbitrary native response types. |
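
As a rough illustration of how steps 1 and 2 might fit together, the sketch below joins a provider base URL with the endpoint path chosen by the format. `NativeCall` and `native_url` are hypothetical names, and the assumption that the path is simply appended to the base URL is not confirmed by the document. The example URL is a placeholder.

```rust
/// Hypothetical output of the format's native hook: an endpoint path and a
/// serialized request body.
pub struct NativeCall {
    pub path: String,
    pub body: String,
}

/// Join a provider base URL and an endpoint path with exactly one slash
/// between them, regardless of how either side was written.
pub fn native_url(base_url: &str, call: &NativeCall) -> String {
    format!(
        "{}/{}",
        base_url.trim_end_matches('/'),
        call.path.trim_start_matches('/')
    )
}
```

Normalizing the slash in one place keeps format implementations free to return paths with or without a leading `/`.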
| 60 | + |
| 61 | +## `ChatResponse<F>` |
| 62 | + |
| 63 | +`ChatResponse<F>` is introduced now even though the current code path only emits `Complete`. |
| 64 | + |
| 65 | +That is deliberate for two reasons: |
| 66 | + |
| 67 | +- the public typed entry point should not need a return-type rewrite once streaming is enabled |
| 68 | +- usage has different delivery timing for complete vs stream responses |
| 69 | + |
| 70 | +The stream field uses a type-erased alias: |
| 71 | + |
| 72 | +- `ChatResponseStream<F> = Pin<Box<dyn Stream<Item = Result<F::StreamChunk>> + Send>>` |
| 73 | + |
| 74 | +That avoids hard-coding either `BridgedStream<F>` or `NativeStream<F>` into the response type. The later streaming work can box either stream adapter without changing the outer `ChatResponse<F>` shape. |
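
A rough sketch of what such an envelope could look like follows. To stay dependency-free it declares a local `Stream` trait with the same shape as the one in the `futures` crate. The variant fields, `GatewayResult`, `VecStream`, `box_stream`, and `DemoFormat` are assumptions for illustration, not the contents of `src/gateway/types/response.rs`.

```rust
use std::collections::VecDeque;
use std::pin::Pin;
use std::task::{Context, Poll};

/// Local stand-in for the `Stream` trait from the `futures` crate.
pub trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Hypothetical slice of the format trait: only the associated types matter here.
pub trait ChatFormat {
    type Response;
    type StreamChunk;
}

/// Assumed result alias and usage type, simplified for the sketch.
pub type GatewayResult<T> = std::result::Result<T, String>;

#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Usage {
    pub total_tokens: u64,
}

/// Type-erased stream alias, following the shape quoted in the text.
pub type ChatResponseStream<F> =
    Pin<Box<dyn Stream<Item = GatewayResult<<F as ChatFormat>::StreamChunk>> + Send>>;

/// Envelope with both variants; today only `Complete` would be produced.
pub enum ChatResponse<F: ChatFormat> {
    Complete { response: F::Response, usage: Usage },
    Stream { stream: ChatResponseStream<F> },
}

impl<F: ChatFormat> ChatResponse<F> {
    pub fn is_complete(&self) -> bool {
        matches!(self, ChatResponse::Complete { .. })
    }
}

/// Box any concrete stream adapter behind the alias.
pub fn box_stream<F, S>(stream: S) -> ChatResponseStream<F>
where
    F: ChatFormat,
    S: Stream<Item = GatewayResult<F::StreamChunk>> + Send + 'static,
{
    Box::pin(stream)
}

/// Toy adapter standing in for a bridged or native stream: yields queued items.
pub struct VecStream<T> {
    items: VecDeque<T>,
}

impl<T> VecStream<T> {
    pub fn new(items: Vec<T>) -> Self {
        Self { items: items.into() }
    }
}

impl<T: Unpin> Stream for VecStream<T> {
    type Item = T;
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<T>> {
        // Always ready: pop the next queued item, or `None` once drained.
        Poll::Ready(self.items.pop_front())
    }
}

/// Example format used only to instantiate the generic types.
pub struct DemoFormat;

impl ChatFormat for DemoFormat {
    type Response = String;
    type StreamChunk = String;
}
```

Because `box_stream` accepts any adapter that yields `GatewayResult<F::StreamChunk>`, callers that match on `ChatResponse<F>` never see which concrete stream type sits behind the box.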
| 75 | + |
| 76 | +## Helper Naming |
| 77 | + |
| 78 | +The internal HTTP helpers are named `call_chat_hub()` and `call_chat_native()` on purpose. |
| 79 | + |
| 80 | +The gateway layer will later grow non-chat entry points such as embeddings, TTS, STT, and image generation. Keeping the current helpers explicitly chat-scoped prevents ambiguity once those additional call paths exist. |
| 81 | + |
| 82 | +## Current Limits |
| 83 | + |
| 84 | +This module does not attempt to finish the full Layer 3 design. |
| 85 | + |
| 86 | +- streaming requests are rejected explicitly and deferred to the later streaming gateway work |
| 87 | +- `SessionStore` is not wired yet |
| 88 | +- only `chat_completion()` is implemented as a convenience helper today |
| 89 | +- `messages()` and `responses()` remain deferred until their corresponding formats land |
| 90 | +- native non-streaming usage extraction is still format-specific future work |
| 91 | + |
| 92 | +## Why This Slice Exists |
| 93 | + |
| 94 | +This is the first point where the new provider layer, format layer, and response envelope meet under one runtime entry point. |
| 95 | + |
| 96 | +Without this slice, later stream work would still be missing: |
| 97 | + |
| 98 | +- a typed orchestration entry point |
| 99 | +- a place to choose native vs hub routing |
| 100 | +- a shared `ChatResponse<F>` shape |
| 101 | +- a common provider error mapping path |
| 102 | + |
| 103 | +That is why the implementation stops at non-streaming correctness first and leaves stream transport integration to later gateway work. |