
feat(openai): plumb through cache tokens in metadata events #2116

Merged: mkmeral merged 3 commits into main from agent-tasks/2115 on Apr 13, 2026

Conversation

@Unshure Unshure (Member) commented Apr 13, 2026

Motivation

When OpenAI returns prompt caching information (prompt_tokens_details.cached_tokens), the OpenAI model provider currently discards it. Users have no way to see whether their requests hit the OpenAI prompt cache, which is valuable for cost optimization and debugging.

The LiteLLM provider already extracts this data, as does the experimental OpenAI Realtime bidi model — but the primary OpenAI provider was missing this support.

Resolves #2115

Public API Changes

No public API changes. The metadata event emitted by OpenAIModel.format_chunk now includes cacheReadInputTokens in the usage data when OpenAI reports cached prompt tokens:

# Before: metadata event usage
{"inputTokens": 1861, "outputTokens": 10, "totalTokens": 1871}

# After: metadata event usage (when cache hit occurs)
{"inputTokens": 1861, "outputTokens": 10, "totalTokens": 1871, "cacheReadInputTokens": 1792}

When prompt_tokens_details is None or cached_tokens is None/0, the field is omitted — preserving backward compatibility. The existing telemetry pipeline (tracer and metrics) already handles cacheReadInputTokens, so cache data flows through automatically.

Only cacheReadInputTokens is set because OpenAI's API does not expose a cache write token equivalent (unlike Anthropic).
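The extraction described above can be sketched as follows. This is a minimal illustration of the described behavior, not the actual OpenAIModel.format_chunk code; the helper name format_usage and the SimpleNamespace stand-ins for OpenAI's usage objects are assumptions for the sketch.

```python
from types import SimpleNamespace
from typing import Any


def format_usage(event_usage: Any) -> dict:
    """Build metadata-event usage, adding cacheReadInputTokens on a cache hit."""
    usage = {
        "inputTokens": event_usage.prompt_tokens,
        "outputTokens": event_usage.completion_tokens,
        "totalTokens": event_usage.total_tokens,
    }
    # getattr() with the walrus operator, following the LiteLLM provider
    # pattern: prompt_tokens_details may be None, and cached_tokens may be
    # None or 0 -- in all of those cases the field is omitted, preserving
    # backward compatibility.
    if details := getattr(event_usage, "prompt_tokens_details", None):
        if cached := getattr(details, "cached_tokens", None):
            usage["cacheReadInputTokens"] = cached
    return usage


# Cache hit: cached_tokens is reported, so the extra field appears.
hit = SimpleNamespace(
    prompt_tokens=1861,
    completion_tokens=10,
    total_tokens=1871,
    prompt_tokens_details=SimpleNamespace(cached_tokens=1792),
)
print(format_usage(hit))
# {'inputTokens': 1861, 'outputTokens': 10, 'totalTokens': 1871, 'cacheReadInputTokens': 1792}
```

Because the field is only ever added, existing consumers of the metadata event keep working unchanged.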

Extract prompt_tokens_details.cached_tokens from OpenAI usage data
and include it as cacheReadInputTokens in the metadata event, following
the same pattern used by the LiteLLM provider.
@codecov codecov bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@Unshure Unshure (Member, Author) commented Apr 13, 2026

/strands

@github-actions

Assessment: Request Changes

Clean, well-scoped feature that correctly follows the established LiteLLM pattern for cache token extraction. The implementation, type usage, and test coverage all look good.

Review Details
  • Accidental file: test_output.log was committed and must be removed before merge — this is the only blocking issue.
  • Implementation: Matches the litellm.py pattern exactly (getattr() with walrus operator), proper Usage type annotation, backward-compatible field omission.
  • Tests: Good coverage of cache-present, cache-absent, and zero-cache-tokens scenarios. Existing parametrized test properly updated.

Nice contribution — once the stray log file is removed, this is ready to go.

Remove redundant test_format_chunk_metadata_without_cache_tokens
(already covered by parametrized test_format_chunk metadata case).
Remove accidentally committed test_output.log build artifact.
@Unshure Unshure marked this pull request as ready for review April 13, 2026 17:41
@github-actions

Assessment: Approve

The previous blocking issue (test_output.log) has been resolved. The implementation is clean, correctly follows the established LiteLLM pattern, and all acceptance criteria from #2115 are met.

Review Details
  • Implementation: Matches the litellm.py cache token extraction pattern exactly — getattr() with walrus operator for safe attribute access, proper Usage type annotation, backward-compatible field omission when cache data is absent.
  • Testing: Good coverage with cache-present, zero-cache-tokens, and prompt_tokens_details=None (via updated parametrized test) scenarios. Codecov confirms 100% coverage on modified lines.
  • Telemetry: Existing tracer and metrics pipelines already handle cacheReadInputTokens, so cache data flows through automatically with no additional changes needed.
  • API: No public API changes — no API review or documentation PR needed.
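The three scenarios the reviews call out (cache present, prompt_tokens_details=None, and zero cached tokens) can be exercised with a small self-contained sketch. The helper cached_read_tokens is hypothetical, capturing only the omission logic described; it is not the repository's test code.

```python
from types import SimpleNamespace


def cached_read_tokens(usage):
    # The getattr()-with-walrus pattern the reviews describe: return the
    # cached token count on a cache hit, or None when the field should be
    # omitted (details missing, or cached_tokens None/0).
    if (details := getattr(usage, "prompt_tokens_details", None)) and (
        cached := getattr(details, "cached_tokens", None)
    ):
        return cached
    return None


cases = [
    # (usage object, expected result)
    (SimpleNamespace(prompt_tokens_details=SimpleNamespace(cached_tokens=1792)), 1792),
    (SimpleNamespace(prompt_tokens_details=None), None),
    (SimpleNamespace(prompt_tokens_details=SimpleNamespace(cached_tokens=0)), None),
]
for usage, expected in cases:
    assert cached_read_tokens(usage) == expected
```

Treating 0 the same as None keeps the emitted field meaningful: it appears only when a cache read actually occurred.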

@mkmeral mkmeral merged commit 0930ca6 into main Apr 13, 2026
19 checks passed

Successfully merging this pull request may close these issues.

[FEATURE] Plumb through openai cache tokens
