fix: restore prompt cache and reuse connections in websocket mode
Since v2.2.7, each request without an explicit session identifier got a
random connection ID (prefixed "stateless-") on the WS upstream path,
which leaked into both the prompt_cache_key body field and the
Session_id/Conversation_id handshake headers. The upstream prompt cache
therefore never hit in WS mode (HTTP mode was unaffected), and sustained
load opened a new WS connection per request until the upstream throttled
handshakes (bad handshake -> 503 at ~200 RPM).
- Inject a deterministic prompt cache key (derived from the downstream
API key, falling back to the account) when the body has none, matching
HTTP-path behavior; stateless IDs no longer overwrite it
- Send the deterministic key in WS handshake session headers; the
stateless ID is only used for local connection pool isolation
- Reuse WS connections for sessionless requests via 8 slots per
(account, cache key) pair, falling back to one-shot connections when
all slots are busy, eliminating per-request handshakes under high RPM
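The key derivation and slot fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: deriveCacheKey, slotPool, the "pck-" prefix, and the SHA-256 hashing scheme are all hypothetical names and choices. Only the slot count of 8 and the preference for the API key over the account come from the message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// slotsPerKey mirrors the slot count (8) given in the commit message.
const slotsPerKey = 8

// deriveCacheKey returns a stable prompt cache key for sessionless requests.
// Hypothetical helper: it prefers the downstream API key and falls back to
// the account; the hash and the "pck-" prefix are assumptions.
func deriveCacheKey(apiKey, accountID string) string {
	seed := "apikey:" + apiKey
	if apiKey == "" {
		seed = "account:" + accountID
	}
	sum := sha256.Sum256([]byte(seed))
	return "pck-" + hex.EncodeToString(sum[:16])
}

// slotKey identifies one (account, cache key) pair.
type slotKey struct{ account, cacheKey string }

// slotPool records which of the fixed slots are busy for each pair.
// A real pool would also hold the WS connection for each slot.
type slotPool struct {
	mu   sync.Mutex
	busy map[slotKey][slotsPerKey]bool
}

func newSlotPool() *slotPool {
	return &slotPool{busy: make(map[slotKey][slotsPerKey]bool)}
}

// acquire claims the lowest free slot. ok is false when every slot is busy,
// in which case the caller should fall back to a one-shot connection.
func (p *slotPool) acquire(account, cacheKey string) (slot int, ok bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	k := slotKey{account, cacheKey}
	slots := p.busy[k] // arrays are values, so this is a copy
	for i, inUse := range slots {
		if !inUse {
			slots[i] = true
			p.busy[k] = slots
			return i, true
		}
	}
	return -1, false
}

// release frees a slot so the next sessionless request can reuse it.
func (p *slotPool) release(account, cacheKey string, slot int) {
	if slot < 0 || slot >= slotsPerKey {
		return
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	k := slotKey{account, cacheKey}
	slots := p.busy[k]
	slots[slot] = false
	p.busy[k] = slots
}

func main() {
	key := deriveCacheKey("sk-example", "acct-1")
	pool := newSlotPool()
	// One more concurrent request than there are slots forces the fallback.
	for i := 0; i < slotsPerKey+1; i++ {
		if slot, ok := pool.acquire("acct-1", key); ok {
			fmt.Printf("request %d: reuse pooled connection in slot %d\n", i, slot)
		} else {
			fmt.Printf("request %d: all slots busy, open one-shot connection\n", i)
		}
	}
}
```

Because the key depends only on the API key or account, every sessionless request from the same caller maps to the same slots, which is what allows both connection reuse and upstream cache hits.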
Verified against live upstream: cache hits ~86% of input tokens from the
second request on (previously 0%), and 200 RPM yields 200/200 success
with zero handshake failures (previously 72 requests returned 503).