Problem Statement
Many users need to supply an alternative, interface-compatible implementation of the OpenAI client (e.g., a GuardrailsAsyncOpenAI wrapper that implements OpenAI Guardrails). Today the SDK creates a new AsyncOpenAI client per request to avoid sharing HTTPX connections across event loops. This makes it impossible to:
- Inject a pre-configured, guardrails-enabled client.
- Reuse connection pools efficiently within a single event loop/worker.
- Centralise observability, retries, timeouts and networking policy on one client.
Separately, Strands currently has very limited guardrails support outside of Bedrock Guardrails, so users commonly reach for OpenAI-side guardrails via a wrapper client. Without support for injecting a fixed client, this is not possible.
Proposed Solution
Allow OpenAIModel to accept a fixed, injected AsyncOpenAI-compatible client, created once per worker/event loop at application startup and closed at shutdown. Continue to support current behaviour when no client is provided (backwards compatible).
Key changes (additive, non-breaking):
- Constructor injection: `OpenAIModel(client: Optional[Client] = None, client_args: Optional[dict] = None, …)`
  - If `client` is provided, reuse it and do not create/close a new client internally.
  - If `client` is `None`, retain current behaviour (construct an ephemeral client).
- Lifecycle guidance in docs
  - Recommend creating one client per worker/event loop (e.g., FastAPI lifespan startup/shutdown).
  - Emphasise that clients should not be shared across event loops, but can be safely reused across tasks within a loop.
- Acceptance criteria
  - Works with any `AsyncOpenAI`-compatible interface (e.g., `GuardrailsAsyncOpenAI`, custom proxies, instrumentation wrappers).
  - Streaming and structured output paths both reuse the injected client.
  - Clear examples for FastAPI and generic asyncio.
Code sketch (constructor + reuse):
```python
from typing import Any, Optional, Protocol

from openai import AsyncOpenAI


class Client(Protocol):
    @property
    def chat(self) -> Any: ...


class OpenAIModel(Model):
    def __init__(self, client: Optional[Client] = None, client_args: Optional[dict] = None, **config):
        self.client = client
        self._owns_client = client is None
        self.client_args = client_args or {}
        self.config = dict(config)

    async def stream(...):
        request = self.format_request(...)
        if self.client is not None:
            # Reuse the injected client
            response = await self.client.chat.completions.create(**request)
            ...
        else:
            # Back-compat: ephemeral client per request
            async with AsyncOpenAI(**self.client_args) as c:
                response = await c.chat.completions.create(**request)
                ...
```
Example usage (FastAPI lifespan, per-worker client):
```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from openai import AsyncOpenAI
# from my_guardrails import GuardrailsAsyncOpenAI


@asynccontextmanager
async def lifespan(app: FastAPI):
    # One client per worker/event loop, created at startup.
    base = AsyncOpenAI()
    app.state.oai = base  # or GuardrailsAsyncOpenAI(base)
    app.state.model = OpenAIModel(client=app.state.oai, model_id="gpt-4o")
    yield
    # Closed once at shutdown.
    await app.state.oai.close()


app = FastAPI(lifespan=lifespan)
```
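The acceptance criteria also call for a generic asyncio example. A minimal sketch of the same per-loop lifecycle follows; `StubClient` is a stand-in so the snippet runs without credentials — in real code, construct `AsyncOpenAI()` (or a guardrails wrapper) in its place and pass it to `OpenAIModel`:

```python
import asyncio


class StubClient:
    """Stand-in for AsyncOpenAI so the sketch runs without credentials."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def main() -> StubClient:
    client = StubClient()  # one client per event loop, created at startup
    try:
        # model = OpenAIModel(client=client, model_id="gpt-4o")
        # The same client can be shared by concurrent tasks within this loop:
        await asyncio.gather(asyncio.sleep(0), asyncio.sleep(0))
    finally:
        await client.close()  # closed exactly once at shutdown
    return client


print(asyncio.run(main()).closed)  # → True
```

The pattern is the same as the FastAPI lifespan above: create once per loop, share across tasks, close at shutdown.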
Use Case
- Guardrails: Wrap the OpenAI client with `GuardrailsAsyncOpenAI` to enforce content filters, schema validation, and redaction before responses reach application code.
- Observability & policy: Centralise timeouts, retries, logging, tracing, and network egress policy (e.g., a custom `httpx.AsyncClient`).
- Performance: Reuse keep-alive connections and connection pools within a worker/event loop for lower latency and higher throughput.
- Multi-model routing: Swap the injected client to target proxies or gateways without touching model code (e.g., toggling `base_url`, auth, or headers).
This would help with:
- Meeting compliance requirements where guardrails must run before responses are consumed.
- Reducing tail latency by avoiding per-request client construction.
- Simplifying integration with enterprise networking and telemetry.
Alternative Solutions
- Create a new client per request
  - Pros: Safe with respect to event-loop boundaries; current behaviour.
  - Cons: Loses pooling; higher latency and allocation overhead; hard to apply cross-cutting concerns (guardrails, tracing) consistently.
- Global client shared across event loops
  - Pros: Simple in theory.
  - Cons: Unsafe; HTTPX pools cannot be shared across loops, leading to intermittent runtime errors.
- Disable pooling (force `Connection: close`)
  - Pros: Avoids cross-loop sharing issues.
  - Cons: Sacrifices performance; still doesn’t enable easy injection of guardrails wrappers.
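To make the cross-loop hazard concrete: asyncio awaitables are bound to the loop that created them, which is the underlying reason an HTTPX pool built in one loop fails in another. A stdlib-only illustration:

```python
import asyncio

fut = None


async def create() -> None:
    global fut
    # The future is attached to this event loop.
    fut = asyncio.get_running_loop().create_future()


async def use() -> None:
    await fut  # awaited from a *different* loop


asyncio.run(create())
try:
    asyncio.run(use())
except RuntimeError as e:
    print("RuntimeError:", e)  # e.g. "... attached to a different loop"
```

An HTTPX connection pool holds similar loop-bound primitives internally, which is why reuse is safe within a loop but not across loops.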
Additional Context
- Rationale: HTTPX connection pools are not shareable across asyncio event loops; reuse is safe within a loop.
- Need: Strands’ current guardrails support focuses on Bedrock; many users need OpenAI-side guardrails today.
- The OpenAI Python SDK supports async client reuse and custom HTTP clients (`http_client=`), making injection straightforward.
If useful, I’m happy to contribute a PR with the constructor change, a small `_stream_with_client` helper, tests, and docs.