| 1 | +# Tower-based middleware |
| 2 | + |
| 3 | +Enable the `middleware` feature to customize the HTTP execution path with Tower services and layers. |
| 4 | + |
| 5 | +The middleware boundary sits intentionally below the API groups and above the concrete HTTP transport. An example middleware stack: |
| 6 | + |
| 7 | +```text |
| 8 | +async-openai API groups |
| 9 | + responses(), chat(), files(), ... |
| 10 | + | |
| 11 | + v |
| 12 | + HttpRequestFactory |
| 13 | + | |
| 14 | + v |
| 15 | ++----- concurrency_limit ------+ |
| 16 | +| +------- timeout ----------+ | |
| 17 | +| | +-- OpenAIRetryLayer --+ | | |
| 18 | +| | | | | | |
| 19 | +| | | ReqwestService or | | | |
| 20 | +| | | custom service | | | |
| 21 | +| | | | | | |
| 22 | +| | +-- OpenAIRetryLayer --+ | | |
| 23 | +| +------- timeout ----------+ | |
| 24 | ++----- concurrency_limit ------+ |
| 25 | + | |
| 26 | + v |
| 27 | + reqwest::Response |
| 28 | +``` |
| 29 | + |
| 30 | +The request value passed through Tower is `HttpRequestFactory`, not `reqwest::Request`. This is deliberate: retry middleware needs a way to replay a request, but `reqwest::Request` is not generally cloneable once it contains a streaming body. The factory is cheap to clone and rebuilds a fresh `reqwest::Request` for each attempt. |
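The factory idea can be illustrated without any dependencies. The following is a minimal sketch, not the library's implementation: `RequestFactory`, `Inputs`, `BuiltRequest`, and `send_with_retry` are hypothetical names invented for this example, and the real `HttpRequestFactory` may be structured differently. It shows why a cheaply cloneable factory makes replay possible even when the built request itself cannot be cloned.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a built HTTP request. Like a request with a
/// streaming body, it deliberately does not implement `Clone`.
#[derive(Debug)]
struct BuiltRequest {
    method: &'static str,
    url: String,
    body: Vec<u8>,
}

/// Hypothetical immutable inputs shared by every clone of the factory.
struct Inputs {
    method: &'static str,
    url: String,
    payload: String,
}

/// Hypothetical factory. Cloning copies only an `Arc` pointer.
#[derive(Clone)]
struct RequestFactory {
    inner: Arc<Inputs>,
}

impl RequestFactory {
    fn new(method: &'static str, url: &str, payload: &str) -> Self {
        let inputs = Inputs {
            method,
            url: url.to_string(),
            payload: payload.to_string(),
        };
        Self { inner: Arc::new(inputs) }
    }

    /// Builds an independent request from the stored inputs.
    fn build(&self) -> BuiltRequest {
        BuiltRequest {
            method: self.inner.method,
            url: self.inner.url.clone(),
            body: self.inner.payload.as_bytes().to_vec(),
        }
    }
}

/// Calls `send` up to `max_attempts` times. `send` consumes the request,
/// so a fresh one is built from the factory for every attempt.
fn send_with_retry<F>(
    factory: &RequestFactory,
    max_attempts: u32,
    mut send: F,
) -> Result<u16, String>
where
    F: FnMut(BuiltRequest) -> Result<u16, String>,
{
    let mut last_err = String::from("no attempts made");
    for _ in 0..max_attempts {
        match send(factory.build()) {
            Ok(status) => return Ok(status),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Because `send` takes the request by value, a second attempt would not compile if the loop tried to reuse the first request. Rebuilding from the factory sidesteps that without requiring the request type to be `Clone`.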
| 31 | + |
| 32 | +## Use the Default `ReqwestService` |
| 33 | + |
| 34 | +`ReqwestService` is a Tower service backed by `reqwest::Client`. It is used by default to make outbound HTTP requests. |
| 35 | + |
| 36 | +```rust |
| 37 | +use async_openai::{Client, config::OpenAIConfig}; |
| 38 | +use async_openai::middleware::{retry::OpenAIRetryLayer, ReqwestService}; |
| 39 | +use std::time::Duration; |
| 40 | + |
| 41 | +let service = tower::ServiceBuilder::new() |
| 42 | + .concurrency_limit(8) |
| 43 | + .timeout(Duration::from_secs(30)) |
| 44 | + .layer(OpenAIRetryLayer::default()) |
| 45 | + .service(ReqwestService::new(reqwest::Client::new())); |
| 46 | + |
| 47 | +let client = Client::with_config(OpenAIConfig::default()) |
| 48 | + .with_http_service(service); |
| 49 | +``` |
| 50 | + |
| 51 | +## Use a Custom Service |
| 52 | + |
| 53 | +You can replace `ReqwestService` entirely. This is useful for logging, metrics, tests, mocks, alternate transports, or policy layers that want to inspect the generated request before sending it. |
| 54 | + |
| 55 | +```rust |
| 56 | +use async_openai::{Client, config::OpenAIConfig, error::OpenAIError}; |
| 57 | +use async_openai::middleware::HttpRequestFactory; |
| 58 | +use tower::service_fn; |
| 59 | + |
| 60 | +let service = service_fn(|factory: HttpRequestFactory| async move { |
| 61 | + let request = factory.build().await?; |
| 62 | + |
| 63 | + // here you can inspect, modify, or log the request, route it somewhere else, |
| 64 | + // or return a synthetic response for testing. |
| 65 | + |
| 66 | + println!("sending {} {}", request.method(), request.url()); |
| 67 | + |
| 68 | + reqwest::Client::new() |
| 69 | + .execute(request) |
| 70 | + .await |
| 71 | + .map_err(OpenAIError::Reqwest) |
| 72 | +}); |
| 73 | + |
| 74 | +let client = Client::with_config(OpenAIConfig::default()) |
| 75 | + .with_http_service(service); |
| 76 | +``` |
| 77 | + |
| 78 | +## Retry Layer |
| 79 | + |
| 80 | +`middleware::retry::OpenAIRetryLayer` is a Tower layer and `middleware::retry::SimpleRetryPolicy` is a Tower retry policy. |
| 81 | + |
| 82 | +Both retry with exponential backoff on `429` and `5xx` responses and on connection errors, and both respect the `Retry-After` header. |
| 83 | + |
| 84 | +The difference is that, upon seeing a `429`, `OpenAIRetryLayer` consumes the response body to check whether it is a rate limit (a retryable error) or insufficient quota (a permanent error). The default async-openai client uses this layer internally for the library's default retry behavior. |
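The distinction between a retryable and a permanent `429` can be sketched as a small classification function. This is an illustration only, not the logic of `OpenAIRetryLayer` or `should_retry`: `RetryDecision` and `classify` are hypothetical names, and the sketch assumes that an exhausted quota is signalled by the marker string `insufficient_quota` appearing in the error body. A real implementation would deserialize the body instead of searching it.

```rust
/// Hypothetical outcome of inspecting a failed response.
#[derive(Debug, PartialEq)]
enum RetryDecision {
    Retry,
    GiveUp,
}

/// Classifies a response by status code and, for 429, by its body.
///
/// Assumption: an exhausted quota is reported with the marker string
/// `insufficient_quota` somewhere in the error body.
fn classify(status: u16, body: &str) -> RetryDecision {
    match status {
        // Quota exhaustion will not resolve by waiting, so do not retry.
        429 if body.contains("insufficient_quota") => RetryDecision::GiveUp,
        // Any other 429 is treated as a transient rate limit.
        429 => RetryDecision::Retry,
        // Server-side errors are assumed to be transient.
        500..=599 => RetryDecision::Retry,
        // Everything else (success, other client errors) is not retried.
        _ => RetryDecision::GiveUp,
    }
}
```

Reading the body is what makes this check more expensive than a status-only policy, which is one reason a simpler policy that looks only at the status code can be preferable in custom stacks.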
| 85 | + |
| 86 | +The retry boundary is `HttpRequestFactory`. Retrying clones the factory and rebuilds a fresh `reqwest::Request` for each attempt instead of cloning a built request. This matters because `reqwest::Request` does not implement `Clone`. |
| 87 | + |
| 88 | +`middleware::retry::SimpleRetryPolicy` uses `middleware::retry::should_retry` to determine if a request should be retried. |
| 89 | + |
| 90 | +Custom Tower retry policies can call `middleware::retry::should_retry` to reuse the same retry classification while changing delay behavior. |
| 91 | + |
| 92 | +On native targets, retries wait using `tokio::time::sleep`. On WASM, retries are immediate, with no backoff delay. |
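For readers writing a custom policy with different delay behavior, the delay computation can be sketched in plain Rust. This is a sketch under stated assumptions, not the library's schedule: `retry_delay` and `parse_retry_after` are hypothetical helpers, the base and cap values are chosen by the caller, capping a server-provided delay at `max` is a design choice of this example, and jitter is omitted for brevity.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based).
///
/// A server-provided `Retry-After` value takes precedence. Otherwise the
/// delay doubles with each attempt, starting at `base`. Both paths are
/// capped at `max`.
fn retry_delay(
    attempt: u32,
    retry_after: Option<Duration>,
    base: Duration,
    max: Duration,
) -> Duration {
    if let Some(server_delay) = retry_after {
        return server_delay.min(max);
    }
    // 2^attempt, saturating instead of overflowing for large attempts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Parses the delta-seconds form of `Retry-After` (for example "120").
/// The HTTP-date form is not handled in this sketch and yields `None`.
fn parse_retry_after(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}
```

On a native target the returned duration would be passed to a sleep before the next attempt. On a target without a timer, the same function can still be used to decide whether a retry is worthwhile at all.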
| 93 | + |
| 94 | +## Native and WASM Bounds |
| 95 | + |
| 96 | +The conceptual middleware boundary stays the same; only the platform thread-safety bounds differ. |
| 97 | + |
| 98 | +On native targets, middleware services installed with `Client::with_http_service` must be `Send + Sync + 'static` and return `Send + 'static` futures. |
| 99 | + |
| 100 | +On WASM targets, middleware services and futures must be `'static`. |
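One common way to express bounds that differ per platform is a `cfg`-gated marker trait. The sketch below shows that general Rust pattern and makes no claim about how async-openai declares its bounds: `Service`, `PlatformBounds`, `BoxableService`, `Client`, and `Echo` are all hypothetical names local to this example.

```rust
/// Hypothetical minimal service interface for this sketch.
trait Service {
    fn call(&self, path: &str) -> String;
}

// Native targets: stored services must be thread-safe.
#[cfg(not(target_arch = "wasm32"))]
trait PlatformBounds: Send + Sync + 'static {}
#[cfg(not(target_arch = "wasm32"))]
impl<T: Send + Sync + 'static> PlatformBounds for T {}

// WASM targets: only `'static` is required.
#[cfg(target_arch = "wasm32")]
trait PlatformBounds: 'static {}
#[cfg(target_arch = "wasm32")]
impl<T: 'static> PlatformBounds for T {}

/// Combines the service interface with the platform bounds so that a
/// single trait object type can be stored on either target.
trait BoxableService: Service + PlatformBounds {}
impl<T: Service + PlatformBounds> BoxableService for T {}

/// Hypothetical client that stores a type-erased service.
struct Client {
    service: Box<dyn BoxableService>,
}

impl Client {
    fn with_service<S: BoxableService>(service: S) -> Self {
        Self { service: Box::new(service) }
    }

    fn get(&self, path: &str) -> String {
        self.service.call(path)
    }
}

/// Trivial service used to exercise the client.
struct Echo;

impl Service for Echo {
    fn call(&self, path: &str) -> String {
        format!("GET {}", path)
    }
}
```

The calling code is identical on both targets. Only the set of types accepted by `with_service` changes, which mirrors the statement above that the conceptual boundary stays the same.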
| 101 | + |
| 102 | +## Bring Your Own Types Interaction |
| 103 | + |
| 104 | +With the `byot` feature, generated `*_byot` methods keep minimal trait bounds. When the `middleware` feature is enabled, additional `MiddlewareInput` bounds are added, depending on whether the target is native or WASM, so that the input can be stored long enough to rebuild a fresh request for retries. |
| 105 | + |
| 106 | +## Error Handling |
| 107 | + |
| 108 | +`OpenAIError::Boxed` is available only when the `middleware` feature is enabled. |
| 109 | + |
| 110 | +Custom middleware services installed with `Client::with_http_service` may use any error type that implements `Into<OpenAIError>`. This lets middleware preserve structured errors when it has a dedicated `OpenAIError` conversion. |
| 111 | + |
| 112 | +Tower's `BoxError` converts into `OpenAIError::Boxed`, which is useful for generic Tower layers whose concrete error type is erased. Callers can still downcast the boxed error when they know the original error type. |
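Downcasting a type-erased error relies only on the standard library. The sketch below shows the mechanism with stand-in types: `ClientError` is a hypothetical simplification of an error enum with a boxed variant, `RateLimited` is an invented middleware error, and the local `BoxError` alias repeats the shape of Tower's alias so the example needs no dependencies. How `OpenAIError::Boxed` stores its payload is not shown in this document and is assumed here.

```rust
use std::error::Error;
use std::fmt;

/// Same shape as Tower's `BoxError` alias, redefined locally.
type BoxError = Box<dyn Error + Send + Sync>;

/// Hypothetical structured error produced by a middleware layer.
#[derive(Debug)]
struct RateLimited {
    retry_after_secs: u64,
}

impl fmt::Display for RateLimited {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "rate limited, retry after {}s", self.retry_after_secs)
    }
}

impl Error for RateLimited {}

/// Hypothetical, simplified stand-in for an error enum with a boxed variant.
#[derive(Debug)]
enum ClientError {
    Boxed(BoxError),
}

impl From<BoxError> for ClientError {
    fn from(err: BoxError) -> Self {
        ClientError::Boxed(err)
    }
}

/// Recovers the structured error if the boxed value is a `RateLimited`.
fn retry_after(err: &ClientError) -> Option<u64> {
    match err {
        ClientError::Boxed(inner) => inner
            .downcast_ref::<RateLimited>()
            .map(|e| e.retry_after_secs),
    }
}
```

A caller that knows which layers are installed can branch on the concrete type this way, while callers that do not can still treat the value as an opaque error and print it.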