[bot] Bedrock Converse prompt caching metrics (cacheReadInputTokens, cacheWriteInputTokens) not captured #88

@braintrust-bot

Summary

The Bedrock Converse instrumentation extracts inputTokens, outputTokens, and totalTokens from the response usage object, but silently drops the prompt caching fields cacheReadInputTokens, cacheWriteInputTokens, and cacheDetails. The Bedrock Converse API returns these fields when prompt caching is active, and they are needed to measure cache hit rates and cost savings.

Both the non-streaming (Converse) and streaming (ConverseStream) paths are affected.

What is missing

Non-streaming path

In InstrumentationSemConv.tagBedrockResponse() (lines 350–357), only three usage fields are extracted:

if (usage.has("inputTokens")) metrics.put("prompt_tokens", usage.get("inputTokens"));
if (usage.has("outputTokens")) metrics.put("completion_tokens", usage.get("outputTokens"));
if (usage.has("totalTokens")) metrics.put("tokens", usage.get("totalTokens"));

The following fields from the Bedrock usage object are never extracted:

  • cacheReadInputTokens — tokens served from the prompt cache
  • cacheWriteInputTokens — tokens written to the prompt cache
  • cacheDetails — array of per-checkpoint cache details including TTL
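A minimal sketch of what the extended extraction could look like, not the SDK's actual code. It uses a plain Map in place of the JSON node type that tagBedrockResponse() operates on, and the class and helper names are hypothetical. Mapping cacheReadInputTokens to prompt_cached_tokens follows the GenAI handler's existing metric name; the name prompt_cache_creation_tokens for cache writes is an assumption and should be checked against the metric names Braintrust expects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the usage handling inside tagBedrockResponse().
final class CacheUsageSketch {

    /** Copies Bedrock usage fields into metric names, skipping absent fields. */
    static Map<String, Object> extractUsageMetrics(Map<String, Object> usage) {
        Map<String, Object> metrics = new LinkedHashMap<>();
        // The three mappings that exist today.
        copyIfPresent(usage, "inputTokens", metrics, "prompt_tokens");
        copyIfPresent(usage, "outputTokens", metrics, "completion_tokens");
        copyIfPresent(usage, "totalTokens", metrics, "tokens");
        // Proposed: cache reads, reusing the name the GenAI handler already emits.
        copyIfPresent(usage, "cacheReadInputTokens", metrics, "prompt_cached_tokens");
        // Proposed: cache writes. ASSUMPTION: this metric name is illustrative.
        copyIfPresent(usage, "cacheWriteInputTokens", metrics, "prompt_cache_creation_tokens");
        return metrics;
    }

    private static void copyIfPresent(
            Map<String, Object> from, String fromKey, Map<String, Object> to, String toKey) {
        Object value = from.get(fromKey);
        if (value != null) {
            to.put(toKey, value);
        }
    }
}
```

Responses without prompt caching simply omit the two cache keys, so existing spans would be unaffected. The sketch does not handle cacheDetails, which is an array and would need a decision on how to represent it (for example as span metadata rather than a metric).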

Streaming path

In BraintrustBedrockInterceptor.TeeingSubscriber.parseTokenUsage() (lines 362–379), only inputTokens and outputTokens are parsed from the metadata event payload; cache token fields in the same payload are ignored. The buildConverseJson() method (lines 385–410) then constructs a synthetic response containing only inputTokens, outputTokens, and totalTokens, so the cache fields are lost before they reach tagBedrockResponse().

A real Converse response with prompt caching looks like:

"usage": {
    "inputTokens": 1200,
    "outputTokens": 350,
    "totalTokens": 1550,
    "cacheReadInputTokens": 800,
    "cacheWriteInputTokens": 400,
    "cacheDetails": [
        { "inputTokens": 800, "ttl": "5m" }
    ]
}

Today, only inputTokens, outputTokens, and totalTokens are captured. The cache fields are silently dropped.
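For the streaming path described above, the synthetic usage object would also need to carry the cache fields. The following is a rough sketch under stated assumptions, not the interceptor's real code: the class and method names are hypothetical, the JSON is assembled by hand to keep the example dependency-free, and totalTokens is computed as input plus output, which is an assumption about what buildConverseJson() does today. cacheDetails is left out.

```java
// Hypothetical stand-in for the usage part of buildConverseJson().
final class StreamUsageSketch {

    /**
     * Builds the synthetic "usage" JSON object for a streamed response.
     * cacheRead and cacheWrite are null when the metadata event did not
     * include them, in which case the keys are omitted entirely.
     */
    static String buildUsageJson(
            long inputTokens, long outputTokens, Long cacheRead, Long cacheWrite) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"inputTokens\":").append(inputTokens);
        sb.append(",\"outputTokens\":").append(outputTokens);
        // ASSUMPTION: total is input + output, without cache token counts.
        sb.append(",\"totalTokens\":").append(inputTokens + outputTokens);
        if (cacheRead != null) {
            sb.append(",\"cacheReadInputTokens\":").append(cacheRead);
        }
        if (cacheWrite != null) {
            sb.append(",\"cacheWriteInputTokens\":").append(cacheWrite);
        }
        return sb.append("}").toString();
    }
}
```

With the sample values above, buildUsageJson(1200, 350, 800L, 400L) yields a usage object that includes both cache fields, so the same extraction logic could serve the streaming and non-streaming paths.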

For comparison, the Google GenAI handler in this repo already extracts cachedContentTokenCount as prompt_cached_tokens (lines 142–146 of BraintrustApiClient.java), showing that cache token extraction is an established pattern here. Similar gaps for Anthropic (#57) and OpenAI (#58, #70) cache tokens have already been filed.

Braintrust docs status

Upstream sources

Local files inspected

  • braintrust-sdk/src/main/java/dev/braintrust/instrumentation/InstrumentationSemConv.java — lines 350–357 (tagBedrockResponse: only inputTokens, outputTokens, totalTokens extracted from usage)
  • braintrust-sdk/instrumentation/aws_bedrock_2_30_0/src/main/java/dev/braintrust/instrumentation/awsbedrock/v2_30_0/BraintrustBedrockInterceptor.java — lines 362–379 (parseTokenUsage: only inputTokens and outputTokens parsed); lines 385–410 (buildConverseJson: synthetic response omits cache fields)
  • braintrust-sdk/instrumentation/aws_bedrock_2_30_0/src/test/java/dev/braintrust/instrumentation/awsbedrock/v2_30_0/BraintrustAWSBedrockTest.java — no test exercises prompt caching responses
  • braintrust-sdk/instrumentation/genai_1_18_0/src/main/java/com/google/genai/BraintrustApiClient.java — lines 142–146 (GenAI handler already extracts cachedContentTokenCount as prompt_cached_tokens)
