feat(openai): plumb through cache tokens in metadata events #2116
Merged
Conversation
Extract `prompt_tokens_details.cached_tokens` from OpenAI usage data and include it as `cacheReadInputTokens` in the `metadata` event, following the same pattern used by the LiteLLM provider.
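The extraction pattern described above can be sketched roughly as follows. This is a hedged illustration, not the SDK's actual code: the helper name `extract_usage` and the plain-dict input/output shapes are assumptions for the example.

```python
# Hypothetical sketch of the cache-token extraction described in this PR.
# The helper name and dict-based shapes are assumptions; the real Strands
# code operates on OpenAI response objects and typed usage structures.
def extract_usage(usage: dict) -> dict:
    """Build usage data for a metadata event from OpenAI usage fields.

    Mirrors the LiteLLM-style pattern: cacheReadInputTokens is included
    only when prompt_tokens_details.cached_tokens is present and non-zero.
    """
    result = {
        "inputTokens": usage.get("prompt_tokens", 0),
        "outputTokens": usage.get("completion_tokens", 0),
        "totalTokens": usage.get("total_tokens", 0),
    }
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens")
    if cached:  # omit the field when None or 0, preserving backward compatibility
        result["cacheReadInputTokens"] = cached
    return result
```

Gating on truthiness (`if cached:`) covers both the `None` and `0` cases in one check, which matches the omission behavior described in the Public API Changes section below.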
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Unshure (Member, Author) commented Apr 13, 2026:

/strands
Assessment: Request Changes

Clean, well-scoped feature that correctly follows the established LiteLLM pattern for cache token extraction. The implementation, type usage, and test coverage all look good.
Nice contribution; once the stray log file is removed, this is ready to go.
- Remove the redundant `test_format_chunk_metadata_without_cache_tokens` (already covered by the parametrized `test_format_chunk` metadata case).
- Remove the accidentally committed `test_output.log` build artifact.
Assessment: Approve

The previous blocking issue has been addressed.
mkmeral approved these changes Apr 13, 2026.
Motivation
When OpenAI returns prompt caching information (`prompt_tokens_details.cached_tokens`), the OpenAI model provider currently discards it. Users have no way to see whether their requests hit the OpenAI prompt cache, which is valuable for cost optimization and debugging. The LiteLLM provider already extracts this data, as does the experimental OpenAI Realtime bidi model, but the primary OpenAI provider was missing this support.
Resolves #2115
Public API Changes
No public API changes. The `metadata` event emitted by `OpenAIModel.format_chunk` now includes `cacheReadInputTokens` in the usage data when OpenAI reports cached prompt tokens. When `prompt_tokens_details` is `None`, or `cached_tokens` is `None`/`0`, the field is omitted, preserving backward compatibility. The existing telemetry pipeline (tracer and metrics) already handles `cacheReadInputTokens`, so cache data flows through automatically. Only `cacheReadInputTokens` is set because OpenAI's API does not expose a cache write token equivalent (unlike Anthropic).
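As an illustration of the behavior described above, the usage data inside the metadata event might look like the following. This is a hedged sketch: the token counts are invented, and the surrounding event structure is assumed rather than copied from the SDK; only the key names `inputTokens`, `outputTokens`, `totalTokens`, and `cacheReadInputTokens` come from the description.

```python
# Sketch of usage data in a metadata event; numbers are illustrative and the
# enclosing structure is an assumption, not taken from the Strands SDK.
with_cache = {
    "metadata": {
        "usage": {
            "inputTokens": 1500,
            "outputTokens": 200,
            "totalTokens": 1700,
            # populated from prompt_tokens_details.cached_tokens
            "cacheReadInputTokens": 1024,
        }
    }
}

# When OpenAI reports no cached tokens (details missing, None, or 0),
# the key is simply absent, so existing consumers see no change:
without_cache = {
    "metadata": {
        "usage": {
            "inputTokens": 1500,
            "outputTokens": 200,
            "totalTokens": 1700,
        }
    }
}
```

Omitting the key entirely, rather than emitting `cacheReadInputTokens: 0`, is what keeps the change backward compatible for consumers that do strict key checks on the usage payload.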