fix(api-proxy): map OpenAI Responses API cached tokens to cache_read (#5262)
* fix(api-proxy): map OpenAI Responses API cached tokens to cache_read
The token normalizer recognized cached prompt tokens from the Chat
Completions API (usage.prompt_tokens_details.cached_tokens) and Anthropic
(cache_read_input_tokens), but not the OpenAI Responses API (/responses),
which reports them under usage.input_tokens_details.cached_tokens as an
object property.
Because extractCacheReadTokens only treated input_tokens_details as a
token-entry array, Responses API cache reads silently fell through and were
recorded as cache_read_tokens: 0. For agents using the /responses endpoint
(e.g. codex) with heavy automatic prompt caching, cache hits went
completely unreported. This also skewed AI-credits accounting, since the
guard prices non-cached input as input_tokens - cache_read_tokens.
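To illustrate the accounting effect with made-up numbers (the token counts
and the helper name nonCachedInput are hypothetical; only the subtraction
itself comes from the description above):

```javascript
// Hypothetical helper: the non-cached input the guard prices, per the
// formula above (input_tokens - cache_read_tokens).
function nonCachedInput(usage) {
  return usage.input_tokens - usage.cache_read_tokens;
}

// Illustrative values only, not taken from a real run.
const beforeFix = { input_tokens: 50000, cache_read_tokens: 0 };
const afterFix = { input_tokens: 50000, cache_read_tokens: 48000 };

console.log(nonCachedInput(beforeFix)); // 50000: every token priced as fresh input
console.log(nonCachedInput(afterFix)); // 2000: only the uncached remainder
```

With cache_read_tokens stuck at 0, the whole prompt is treated as
non-cached input, so heavily cached sessions are overcounted.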
Fix extractCacheReadTokens to read input_tokens_details.cached_tokens
directly. This covers both the buffered JSON and SSE streaming paths
(both route through extractCacheReadTokens). Add regression tests for the
JSON, streaming, and normalizeUsage paths using the real Responses API
usage shape.
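A minimal sketch of the lookup order described above, assuming a plain
usage object. The function name sketchExtractCacheReadTokens and its exact
precedence are hypothetical, not the actual implementation, and the
existing token-entry array handling is deliberately left out:

```javascript
// Hypothetical sketch; the real extractCacheReadTokens may differ.
function sketchExtractCacheReadTokens(usage) {
  if (!usage || typeof usage !== 'object') return 0;

  // Anthropic: usage.cache_read_input_tokens
  if (typeof usage.cache_read_input_tokens === 'number') {
    return usage.cache_read_input_tokens;
  }

  // OpenAI Chat Completions: usage.prompt_tokens_details.cached_tokens
  const promptDetails = usage.prompt_tokens_details;
  if (promptDetails && typeof promptDetails.cached_tokens === 'number') {
    return promptDetails.cached_tokens;
  }

  // OpenAI Responses API: usage.input_tokens_details.cached_tokens,
  // read directly as an object property (the case this commit adds).
  const inputDetails = usage.input_tokens_details;
  if (
    inputDetails &&
    !Array.isArray(inputDetails) &&
    typeof inputDetails.cached_tokens === 'number'
  ) {
    return inputDetails.cached_tokens;
  }

  return 0;
}

// Illustrative Responses API usage payload (values are made up).
const responsesUsage = {
  input_tokens: 1200,
  input_tokens_details: { cached_tokens: 1024 },
  output_tokens: 50,
};
console.log(sketchExtractCacheReadTokens(responsesUsage)); // 1024
```

Because both the buffered JSON and SSE paths call the same extractor, a
single object-property check like this is enough to cover both.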
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* test(api-proxy): cover Copilot /responses streaming cache reads
Add a regression test reproducing the exact final-chunk shape from
gh-aw run 27784259295: a Copilot `/responses` streaming response that
arrives as a chat.completion.chunk carrying both
prompt_tokens_details.cached_tokens and the authoritative per-type split
in copilot_usage.token_details. That run reported cache_read_tokens: 0
despite ~1.43M cached reads across 28 requests; this locks in that the
copilot_usage breakdown drives the exact input/cache_read split.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* test(api-proxy): data-driven Copilot /responses cache-read replay
Replace the single Copilot /responses regression sample with a
data-driven test.each over all 28 real requests captured from gh-aw run
27784259295 (chronological; cache reads grow as the prompt is re-sent).
Each request asserts the exact input/cache_read/output split from the
upstream copilot_usage.token_details, and that input + cache_read
reconstructs the lumped prompt_tokens. A final aggregate test confirms
the parser recovers the full 1,426,432 cache-read tokens that the run
had reported as 0.
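The two assertions above can be sketched as follows. The rows here are
hypothetical placeholders, not the 28 captured requests, and the helper
names splitIsConsistent and totalCacheRead are illustrative only:

```javascript
// Hypothetical rows standing in for the captured requests. Cache reads
// grow as the prompt is re-sent, mirroring the pattern described above.
const rows = [
  { prompt_tokens: 9000, input: 9000, cache_read: 0, output: 120 },
  { prompt_tokens: 15000, input: 6000, cache_read: 9000, output: 200 },
  { prompt_tokens: 21000, input: 6000, cache_read: 15000, output: 80 },
];

// Per-request check: input + cache_read must reconstruct the lumped
// prompt_tokens value.
function splitIsConsistent(row) {
  return row.input + row.cache_read === row.prompt_tokens;
}

// Aggregate check: sum the cache reads across all requests.
function totalCacheRead(allRows) {
  return allRows.reduce((sum, row) => sum + row.cache_read, 0);
}

console.log(rows.every(splitIsConsistent)); // true
console.log(totalCacheRead(rows)); // 24000
```

In the real suite, each row would be fed to test.each and the aggregate
compared against the 1,426,432 total from the run.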
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>