
Commit 8a3d323

lpcox and Copilot authored
fix(api-proxy): map OpenAI Responses API cached tokens to cache_read (#5262)
* fix(api-proxy): map OpenAI Responses API cached tokens to cache_read

The token normalizer recognized cached prompt tokens from the Chat Completions API (usage.prompt_tokens_details.cached_tokens) and Anthropic (cache_read_input_tokens), but not the OpenAI Responses API (/responses), which reports them under usage.input_tokens_details.cached_tokens as an object property. Because extractCacheReadTokens only treated input_tokens_details as a token-entry array, Responses API cache reads silently fell through and were recorded as cache_read_tokens: 0.

Agents using the /responses endpoint (e.g. codex) with heavy automatic prompt caching had their cache hits completely unreported. This also skews AI-credits accounting, since the guard prices the non-cached input as input_tokens - cache_read_tokens.

Fix extractCacheReadTokens to read input_tokens_details.cached_tokens directly. This covers both the buffered JSON and SSE streaming paths (both route through extractCacheReadTokens).

Adds regression tests for the JSON, streaming, and normalizeUsage paths using the real Responses API usage shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(api-proxy): cover Copilot /responses streaming cache reads

Add a regression test reproducing the exact final-chunk shape from gh-aw run 27784259295: a Copilot `/responses` streaming response that arrives as a chat.completion.chunk carrying both prompt_tokens_details.cached_tokens and the authoritative per-type split in copilot_usage.token_details. That run reported cache_read_tokens: 0 despite ~1.43M cached reads across 28 requests; this test locks in that the copilot_usage breakdown drives the exact input/cache_read split.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(api-proxy): data-driven Copilot /responses cache-read replay

Replace the single Copilot /responses regression sample with a data-driven test.each over all 28 real requests captured from gh-aw run 27784259295 (chronological; cache reads grow as the prompt is re-sent). Each request asserts the exact input/cache_read/output split from the upstream copilot_usage.token_details, and that input + cache_read reconstructs the lumped prompt_tokens. A final aggregate test confirms the parser recovers the full 1,426,432 cache-read tokens that the run had reported as 0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
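To make the accounting skew concrete, here is a minimal sketch of the pricing rule the message describes, where non-cached input is `input_tokens - cache_read_tokens`. The `estimateInputCost` helper and the `RATES` values are hypothetical illustrations, not the guard's real code or real prices; only the token counts come from the Responses API regression test in this commit.

```javascript
// Hypothetical per-million-token rates, for illustration only (not real prices).
const RATES = { input: 1.0, cacheRead: 0.1 };

// Sketch of the pricing rule described in the commit message: cache reads are
// billed at an assumed cache rate, and only input_tokens - cache_read_tokens is
// billed at the full input rate.
function estimateInputCost(usage, rates = RATES) {
  const cacheRead = usage.cache_read_tokens ?? 0;
  const nonCached = usage.input_tokens - cacheRead;
  return (nonCached * rates.input + cacheRead * rates.cacheRead) / 1e6;
}

// Token counts from the Responses API regression test in this commit.
const beforeFix = { input_tokens: 707301, cache_read_tokens: 0 };      // cache read dropped
const afterFix = { input_tokens: 707301, cache_read_tokens: 672256 }; // cache read recovered

// How many times larger the estimate is when the cache read is lost.
const overstatement = estimateInputCost(beforeFix) / estimateInputCost(afterFix);
```

With these made-up rates, dropping the cache read bills all 707,301 prompt tokens at the full input rate, about 6.9 times the corrected estimate.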
1 parent 9e65405 commit 8a3d323

2 files changed

Lines changed: 190 additions & 2 deletions


containers/api-proxy/token-parsers.js

Lines changed: 14 additions & 2 deletions
@@ -61,7 +61,8 @@ function extractReasoningTokens(usage) {
  *
  * Supports:
  * - Anthropic: usage.cache_read_input_tokens
- * - OpenAI/Copilot: usage.prompt_tokens_details.cached_tokens
+ * - OpenAI Chat Completions / Copilot: usage.prompt_tokens_details.cached_tokens
+ * - OpenAI Responses API: usage.input_tokens_details.cached_tokens
  * - Token-entry arrays containing { token_type: "cache_read", token_count: <n> }
  */
 function extractCacheReadTokens(usage) {
@@ -75,6 +76,15 @@ function extractCacheReadTokens(usage) {
     return usage.prompt_tokens_details.cached_tokens;
   }
 
+  // OpenAI Responses API (/responses) reports cached prompt tokens under
+  // `input_tokens_details.cached_tokens` (an object), rather than the Chat
+  // Completions `prompt_tokens_details.cached_tokens`. Without this branch the
+  // value falls through to the array loop below, which only handles token-entry
+  // arrays, so cache reads are silently dropped (reported as 0).
+  if (usage.input_tokens_details && typeof usage.input_tokens_details.cached_tokens === 'number') {
+    return usage.input_tokens_details.cached_tokens;
+  }
+
   const tokenContainers = [
     usage.prompt_tokens_details,
     usage.input_tokens_details,
@@ -392,7 +402,9 @@ function parseSseDataLines(text) {
  * Output fields:
  * - input_tokens: number (from Anthropic input_tokens or OpenAI prompt_tokens)
  * - output_tokens: number (from Anthropic output_tokens or OpenAI completion_tokens)
- * - cache_read_tokens: number (from Anthropic cache_read_input_tokens or OpenAI prompt_tokens_details.cached_tokens)
+ * - cache_read_tokens: number (from Anthropic cache_read_input_tokens,
+ *   OpenAI Chat Completions prompt_tokens_details.cached_tokens, or
+ *   OpenAI Responses API input_tokens_details.cached_tokens)
  * - cache_write_tokens: number (Anthropic cache_creation_input_tokens or
  *   Copilot copilot_usage cache_write; not available in flattened OpenAI usage)
  */
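For reference, here is a simplified, standalone sketch of the lookup order given in the updated `extractCacheReadTokens` doc comment, plus a small unwrapping step for Responses API payloads, which carry `usage` under `response` in the `response.completed` shape used by both new JSON and streaming tests. `extractCacheReadTokensSketch` and `usageFromEvent` are hypothetical names; the Anthropic-first ordering, the containers scanned by the array loop, and the fallback of 0 are assumptions, since the diff shows only part of the function.

```javascript
// Simplified, standalone sketch following the order of the "Supports:" list in
// the extractCacheReadTokens doc comment. Hypothetical: the exact ordering, the
// containers scanned by the array loop, and the fallback of 0 are assumptions.
function extractCacheReadTokensSketch(usage) {
  if (!usage || typeof usage !== 'object') return 0;

  // Anthropic: usage.cache_read_input_tokens
  if (typeof usage.cache_read_input_tokens === 'number') {
    return usage.cache_read_input_tokens;
  }
  // OpenAI Chat Completions / Copilot: usage.prompt_tokens_details.cached_tokens
  if (typeof usage.prompt_tokens_details?.cached_tokens === 'number') {
    return usage.prompt_tokens_details.cached_tokens;
  }
  // OpenAI Responses API: usage.input_tokens_details.cached_tokens (object form).
  // Checked before the array loop, mirroring the branch this commit adds.
  if (typeof usage.input_tokens_details?.cached_tokens === 'number') {
    return usage.input_tokens_details.cached_tokens;
  }
  // Token-entry arrays: [{ token_type: 'cache_read', token_count: <n> }, ...]
  for (const container of [usage.prompt_tokens_details, usage.input_tokens_details]) {
    if (!Array.isArray(container)) continue;
    const entry = container.find((e) => e && e.token_type === 'cache_read');
    if (entry && typeof entry.token_count === 'number') return entry.token_count;
  }
  return 0;
}

// Responses API payloads in the response.completed shape nest usage under
// `response`; Chat Completions chunks carry it at the top level.
function usageFromEvent(event) {
  return event?.response?.usage ?? event?.usage ?? null;
}

// Usage shape from the streaming regression test in this commit.
const streamedEvent = {
  type: 'response.completed',
  response: {
    model: 'gpt-5.4-mini',
    usage: {
      input_tokens: 37484,
      output_tokens: 619,
      total_tokens: 38103,
      input_tokens_details: { cached_tokens: 34816 },
    },
  },
};
const streamedCacheRead = extractCacheReadTokensSketch(usageFromEvent(streamedEvent)); // 34816
```

Checking the object form's `cached_tokens` explicitly is what keeps the Responses API shape from being handed to a loop that only understands token-entry arrays.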

containers/api-proxy/token-tracker.parsing.test.js

Lines changed: 176 additions & 0 deletions
@@ -211,6 +211,39 @@ describe('extractUsageFromJson', () => {
       cache_read_input_tokens: 77,
     });
   });
+
+  test('extracts OpenAI Responses API cached tokens from input_tokens_details.cached_tokens', () => {
+    // The real /responses endpoint (used by codex) reports cached prompt tokens
+    // under `input_tokens_details.cached_tokens`, not `prompt_tokens_details`.
+    const body = Buffer.from(JSON.stringify({
+      type: 'response.completed',
+      response: {
+        id: 'resp_responses_cache',
+        model: 'gpt-5.4-mini',
+        usage: {
+          input_tokens: 707301,
+          output_tokens: 12096,
+          total_tokens: 719397,
+          input_tokens_details: {
+            cached_tokens: 672256,
+          },
+          output_tokens_details: {
+            reasoning_tokens: 7715,
+          },
+        },
+      },
+    }));
+
+    const result = extractUsageFromJson(body);
+    expect(result.model).toBe('gpt-5.4-mini');
+    expect(result.usage).toEqual({
+      input_tokens: 707301,
+      output_tokens: 12096,
+      total_tokens: 719397,
+      reasoning_tokens: 7715,
+      cache_read_input_tokens: 672256,
+    });
+  });
 });
 
 // ── extractUsageFromSseLine ───────────────────────────────────────────
// ── extractUsageFromSseLine ───────────────────────────────────────────
@@ -343,6 +376,37 @@ describe('extractUsageFromSseLine', () => {
     });
   });
 
+  test('extracts cache tokens from OpenAI Responses API input_tokens_details (streaming)', () => {
+    // Real /responses streaming final event: cached tokens live under
+    // input_tokens_details.cached_tokens (object), not prompt_tokens_details.
+    const line = JSON.stringify({
+      type: 'response.completed',
+      response: {
+        model: 'gpt-5.4-mini',
+        usage: {
+          input_tokens: 37484,
+          output_tokens: 619,
+          total_tokens: 38103,
+          input_tokens_details: {
+            cached_tokens: 34816,
+          },
+          output_tokens_details: {
+            reasoning_tokens: 128,
+          },
+        },
+      },
+    });
+
+    const result = extractUsageFromSseLine(line);
+    expect(result.usage).toEqual({
+      input_tokens: 37484,
+      output_tokens: 619,
+      total_tokens: 38103,
+      reasoning_tokens: 128,
+      cache_read_input_tokens: 34816,
+    });
+  });
+
   test('returns null for [DONE]', () => {
     const result = extractUsageFromSseLine('[DONE]');
     expect(result.usage).toBeNull();
@@ -390,6 +454,99 @@ describe('extractUsageFromSseLine', () => {
       cache_read_input_tokens: 43894,
     });
   });
+
+  // Regression for gh-aw run 27784259295: the Copilot /responses endpoint
+  // streams a chat.completion-shaped final chunk that carries both
+  // prompt_tokens_details.cached_tokens AND the authoritative per-type split
+  // in copilot_usage.token_details. The copilot_usage breakdown must win so
+  // the input/cache_read split is exact. That run reported cache_read_tokens: 0
+  // on every request despite ~1.43M cached reads in aggregate.
+  //
+  // Each fixture below is a real request captured from the agent's process log
+  // for that run, in chronological order (cache reads grow as the prompt is
+  // re-sent). `input` + `cacheRead` === `promptTokens` for every entry.
+  describe('Copilot /responses streaming cache reads (run 27784259295)', () => {
+    const REQUESTS = [
+      { promptTokens: 19158, completionTokens: 1304, cachedTokens: 0, reasoningTokens: 516, input: 19158, cacheRead: 0, output: 1304 },
+      { promptTokens: 10852, completionTokens: 168, cachedTokens: 0, reasoningTokens: 94, input: 10852, cacheRead: 0, output: 168 },
+      { promptTokens: 16601, completionTokens: 124, cachedTokens: 10752, reasoningTokens: 14, input: 5849, cacheRead: 10752, output: 124 },
+      { promptTokens: 23055, completionTokens: 559, cachedTokens: 18944, reasoningTokens: 516, input: 4111, cacheRead: 18944, output: 559 },
+      { promptTokens: 24429, completionTokens: 978, cachedTokens: 22528, reasoningTokens: 455, input: 1901, cacheRead: 22528, output: 978 },
+      { promptTokens: 26055, completionTokens: 1405, cachedTokens: 24064, reasoningTokens: 904, input: 1991, cacheRead: 24064, output: 1405 },
+      { promptTokens: 28551, completionTokens: 1306, cachedTokens: 25600, reasoningTokens: 941, input: 2951, cacheRead: 25600, output: 1306 },
+      { promptTokens: 33145, completionTokens: 1636, cachedTokens: 28160, reasoningTokens: 938, input: 4985, cacheRead: 28160, output: 1636 },
+      { promptTokens: 39144, completionTokens: 921, cachedTokens: 32768, reasoningTokens: 595, input: 6376, cacheRead: 32768, output: 921 },
+      { promptTokens: 41728, completionTokens: 372, cachedTokens: 38912, reasoningTokens: 193, input: 2816, cacheRead: 38912, output: 372 },
+      { promptTokens: 44382, completionTokens: 735, cachedTokens: 41472, reasoningTokens: 488, input: 2910, cacheRead: 41472, output: 735 },
+      { promptTokens: 45677, completionTokens: 335, cachedTokens: 44032, reasoningTokens: 83, input: 1645, cacheRead: 44032, output: 335 },
+      { promptTokens: 46386, completionTokens: 363, cachedTokens: 45568, reasoningTokens: 119, input: 818, cacheRead: 45568, output: 363 },
+      { promptTokens: 48174, completionTokens: 376, cachedTokens: 46080, reasoningTokens: 139, input: 2094, cacheRead: 46080, output: 376 },
+      { promptTokens: 48980, completionTokens: 211, cachedTokens: 47616, reasoningTokens: 62, input: 1364, cacheRead: 47616, output: 211 },
+      { promptTokens: 65247, completionTokens: 424, cachedTokens: 48640, reasoningTokens: 313, input: 16607, cacheRead: 48640, output: 424 },
+      { promptTokens: 68930, completionTokens: 267, cachedTokens: 65024, reasoningTokens: 114, input: 3906, cacheRead: 65024, output: 267 },
+      { promptTokens: 69642, completionTokens: 138, cachedTokens: 68608, reasoningTokens: 24, input: 1034, cacheRead: 68608, output: 138 },
+      { promptTokens: 75433, completionTokens: 138, cachedTokens: 69120, reasoningTokens: 22, input: 6313, cacheRead: 69120, output: 138 },
+      { promptTokens: 78451, completionTokens: 131, cachedTokens: 75264, reasoningTokens: 73, input: 3187, cacheRead: 75264, output: 131 },
+      { promptTokens: 78808, completionTokens: 56, cachedTokens: 78336, reasoningTokens: 0, input: 472, cacheRead: 78336, output: 56 },
+      { promptTokens: 79128, completionTokens: 56, cachedTokens: 78336, reasoningTokens: 0, input: 792, cacheRead: 78336, output: 56 },
+      { promptTokens: 79320, completionTokens: 2799, cachedTokens: 78848, reasoningTokens: 2522, input: 472, cacheRead: 78848, output: 2799 },
+      { promptTokens: 82221, completionTokens: 3408, cachedTokens: 78848, reasoningTokens: 2243, input: 3373, cacheRead: 78848, output: 3408 },
+      { promptTokens: 91547, completionTokens: 1400, cachedTokens: 81920, reasoningTokens: 1333, input: 9627, cacheRead: 81920, output: 1400 },
+      { promptTokens: 93125, completionTokens: 201, cachedTokens: 91136, reasoningTokens: 113, input: 1989, cacheRead: 91136, output: 201 },
+      { promptTokens: 93675, completionTokens: 423, cachedTokens: 92672, reasoningTokens: 366, input: 1003, cacheRead: 92672, output: 423 },
+      { promptTokens: 94114, completionTokens: 161, cachedTokens: 93184, reasoningTokens: 60, input: 930, cacheRead: 93184, output: 161 },
+    ];
+
+    const buildChunk = (r) => JSON.stringify({
+      object: 'chat.completion.chunk',
+      model: 'gpt-5.4-2026-03-05',
+      choices: [{ index: 0, delta: {}, finish_reason: 'stop' }],
+      usage: {
+        completion_tokens: r.completionTokens,
+        prompt_tokens: r.promptTokens,
+        total_tokens: r.promptTokens + r.completionTokens,
+        prompt_tokens_details: { cached_tokens: r.cachedTokens },
+        completion_tokens_details: { reasoning_tokens: r.reasoningTokens },
+      },
+      copilot_usage: {
+        token_details: [
+          { token_count: r.input, token_type: 'input' },
+          { token_count: r.cacheRead, token_type: 'cache_read' },
+          { token_count: r.output, token_type: 'output' },
+        ],
+      },
+    });
+
+    test.each(REQUESTS)(
+      'request prompt=$promptTokens recovers cache_read=$cacheRead',
+      (r) => {
+        const normalized = normalizeUsage(extractUsageFromSseLine(buildChunk(r)).usage);
+        expect(normalized).toEqual({
+          input_tokens: r.input,
+          output_tokens: r.output,
+          cache_read_tokens: r.cacheRead,
+          cache_write_tokens: 0,
+          reasoning_tokens: r.reasoningTokens,
+        });
+        // The copilot_usage split must reconstruct the lumped prompt_tokens.
+        expect(normalized.input_tokens + normalized.cache_read_tokens).toBe(r.promptTokens);
+      },
+    );
+
+    test('recovers the full aggregate cache-read total the run reported as 0', () => {
+      const totals = REQUESTS.reduce(
+        (acc, r) => {
+          const n = normalizeUsage(extractUsageFromSseLine(buildChunk(r)).usage);
+          acc.cacheRead += n.cache_read_tokens;
+          acc.input += n.input_tokens;
+          return acc;
+        },
+        { cacheRead: 0, input: 0 },
+      );
+      expect(totals.cacheRead).toBe(1426432);
+      expect(totals.input).toBe(119526);
+    });
+  });
 });
 
 // ── parseSseDataLines ─────────────────────────────────────────────────
@@ -523,6 +680,25 @@ describe('normalizeUsage', () => {
       reasoning_tokens: 0,
     });
   });
+
+  test('normalizes OpenAI Responses API cached_tokens via input_tokens_details.cached_tokens', () => {
+    const result = normalizeUsage({
+      input_tokens: 707301,
+      output_tokens: 12096,
+      total_tokens: 719397,
+      input_tokens_details: {
+        cached_tokens: 672256,
+      },
+      reasoning_tokens: 7715,
+    });
+    expect(result).toEqual({
+      input_tokens: 707301,
+      output_tokens: 12096,
+      cache_read_tokens: 672256,
+      cache_write_tokens: 0,
+      reasoning_tokens: 7715,
+    });
+  });
 });
 
 // ── Copilot copilot_usage.token_details breakdown ─────────────────────
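The replay's per-request invariant and its aggregate can be checked independently of the proxy with the sketch below. `splitCopilotUsage` is a hypothetical helper, not the api-proxy's API: it prefers `copilot_usage.token_details`, as the regression comment requires, and otherwise falls back to `prompt_tokens - prompt_tokens_details.cached_tokens` (that fallback is an assumption). `ROWS` holds `[promptTokens, input, cacheRead]` for the 28 captured requests in the fixture.

```javascript
// Hypothetical helper, not the api-proxy's API. Prefers the authoritative
// per-type split in copilot_usage.token_details; otherwise falls back to the
// lumped Chat Completions fields (this fallback is an assumption).
function splitCopilotUsage(chunk) {
  const details = chunk?.copilot_usage?.token_details;
  if (Array.isArray(details)) {
    const byType = {};
    for (const { token_type: type, token_count: count } of details) {
      byType[type] = (byType[type] ?? 0) + count;
    }
    return {
      input: byType.input ?? 0,
      cacheRead: byType.cache_read ?? 0,
      output: byType.output ?? 0,
    };
  }
  const usage = chunk?.usage ?? {};
  const cacheRead = usage.prompt_tokens_details?.cached_tokens ?? 0;
  return {
    input: (usage.prompt_tokens ?? 0) - cacheRead,
    cacheRead,
    output: usage.completion_tokens ?? 0,
  };
}

// [promptTokens, input, cacheRead] for the 28 requests from run 27784259295.
const ROWS = [
  [19158, 19158, 0], [10852, 10852, 0], [16601, 5849, 10752], [23055, 4111, 18944],
  [24429, 1901, 22528], [26055, 1991, 24064], [28551, 2951, 25600], [33145, 4985, 28160],
  [39144, 6376, 32768], [41728, 2816, 38912], [44382, 2910, 41472], [45677, 1645, 44032],
  [46386, 818, 45568], [48174, 2094, 46080], [48980, 1364, 47616], [65247, 16607, 48640],
  [68930, 3906, 65024], [69642, 1034, 68608], [75433, 6313, 69120], [78451, 3187, 75264],
  [78808, 472, 78336], [79128, 792, 78336], [79320, 472, 78848], [82221, 3373, 78848],
  [91547, 9627, 81920], [93125, 1989, 91136], [93675, 1003, 92672], [94114, 930, 93184],
];

// Build a minimal chunk carrying both the lumped and the per-type fields.
const chunkFor = ([promptTokens, input, cacheRead]) => ({
  usage: {
    prompt_tokens: promptTokens,
    prompt_tokens_details: { cached_tokens: cacheRead },
  },
  copilot_usage: {
    token_details: [
      { token_type: 'input', token_count: input },
      { token_type: 'cache_read', token_count: cacheRead },
    ],
  },
});

const splits = ROWS.map((row) => splitCopilotUsage(chunkFor(row)));

// Every split must reconstruct the lumped prompt_tokens.
const invariantHolds = splits.every((s, i) => s.input + s.cacheRead === ROWS[i][0]);

// Aggregate across all requests, as the final replay test does.
const totals = splits.reduce(
  (acc, s) => ({ input: acc.input + s.input, cacheRead: acc.cacheRead + s.cacheRead }),
  { input: 0, cacheRead: 0 },
);
```

Summing the 28 rows gives `totals.cacheRead === 1426432` and `totals.input === 119526`, the same figures the aggregate test asserts, and every row satisfies `input + cacheRead === promptTokens`.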
