llm_anthropic: emit per-call token usage into flow trace #786

@joshuadarron

Description

Problem

apaevt_flow events from llm_anthropic invocations include duration but not token counts. This makes it impossible to distinguish "this role takes longer because it processes more tokens" from "this role takes longer per token" without re-instrumenting from scratch.

In a coding-agent tracer run, eng2 made 60 LLM calls averaging 12.4 s each, versus 31 calls averaging 3.9 s for eng1. Without usage data we can't tell whether eng2 is context-bloated, output-heavy, or model-slow on its specific prompt shape.
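To make the gap concrete, here is a minimal sketch of the comparison that token counts would enable. The call counts and average durations are the ones from the run above; the helper names and the `total_tokens` argument are hypothetical, since no token totals exist yet.

```python
def total_seconds(calls, avg_seconds):
    """Total wall-clock time a role spent in LLM calls."""
    return calls * avg_seconds


def seconds_per_token(calls, avg_seconds, total_tokens):
    """Per-token latency for a role; total_tokens would come from summed usage."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return total_seconds(calls, avg_seconds) / total_tokens


# Known today: only the totals (roughly 121 s for eng1, roughly 744 s for eng2).
eng1_total = total_seconds(31, 3.9)
eng2_total = total_seconds(60, 12.4)
```

With only durations, the two totals are all that can be derived. Once usage is in the trace, `seconds_per_token` separates "more tokens" from "slower per token".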

Proposed fix

After each ChatAnthropic call returns, extract response.usage_metadata (or LangChain's AIMessage.usage_metadata) and attach it to the trace.result payload of the apaevt_flow op:leave event:

{
  "input_tokens": 1234,
  "output_tokens": 567,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 0
}

This piggybacks on the existing flow event at no extra cost: no new event type, no new schema, no extra round-trip.
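A minimal sketch of the extraction step, not the node's actual code. It assumes the raw provider usage is available as a dict under `response_metadata["usage"]` with the four keys shown above, and that LangChain's `usage_metadata` carries cache counts under `input_token_details` (`cache_creation`, `cache_read`); both assumptions should be checked against the installed langchain-anthropic version. `extract_usage`, `attach_usage`, and the `"usage"` key on the result dict are hypothetical names.

```python
from types import SimpleNamespace

USAGE_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def _as_dict(value):
    """Treat anything that is not a dict as empty."""
    return value if isinstance(value, dict) else {}


def extract_usage(message):
    """Return the four-key usage dict for a message, or None if no usage is present."""
    raw = _as_dict(_as_dict(getattr(message, "response_metadata", None)).get("usage"))
    meta = _as_dict(getattr(message, "usage_metadata", None))
    if not raw and not meta:
        return None
    details = _as_dict(meta.get("input_token_details"))
    # Fallback values taken from the LangChain-normalized shape (assumed layout).
    fallback = {
        "input_tokens": meta.get("input_tokens"),
        "output_tokens": meta.get("output_tokens"),
        "cache_creation_input_tokens": details.get("cache_creation"),
        "cache_read_input_tokens": details.get("cache_read"),
    }
    usage = {}
    for key in USAGE_KEYS:
        value = raw.get(key)
        if value is None:
            value = fallback[key]
        usage[key] = int(value or 0)  # missing counts are recorded as 0
    return usage


def attach_usage(trace_result, message):
    """Add a 'usage' entry to the trace result dict when usage is available."""
    usage = extract_usage(message)
    if usage is not None:
        trace_result["usage"] = usage
    return trace_result


# Stand-in for an AIMessage, so the sketch runs without LangChain installed.
fake = SimpleNamespace(
    usage_metadata={"input_tokens": 1234, "output_tokens": 567},
    response_metadata={},
)
result = attach_usage({}, fake)
```

Preferring the raw provider dict keeps the emitted keys identical to the proposed payload; the normalized fields are only a fallback. Leaving the result untouched when no usage is found avoids writing misleading zeros for calls that failed or were mocked.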

Acceptance

  • apaevt_flow op:leave for any llm_anthropic invoke includes a usage dict.
  • Caching verification (companion issue on prompt caching) becomes inspectable from the tracer file alone.
  • Optional: also record on llm_openai, llm_bedrock, etc. if upstream usage shapes are similar — out of scope for this issue, separate ticket.
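The first acceptance item could be checked mechanically along these lines. This is a sketch under assumptions: the event layout (`op`, `trace.result`) and the `usage` key name are guesses at the tracer schema, and `leave_event_has_usage` is a hypothetical helper.

```python
REQUIRED_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def leave_event_has_usage(event):
    """Return True if an op:leave event carries a well-formed usage dict.

    Events with any other op are not expected to carry usage and pass trivially.
    """
    if event.get("op") != "leave":
        return True
    result = (event.get("trace") or {}).get("result") or {}
    usage = result.get("usage")
    if not isinstance(usage, dict):
        return False
    return all(
        isinstance(usage.get(key), int) and usage[key] >= 0
        for key in REQUIRED_KEYS
    )
```

Running this over every llm_anthropic event in a tracer file would cover the first bullet; the caching bullet then reduces to checking that `cache_read_input_tokens` is non-zero on repeat calls.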

Suggested labels

enhancement, observability, nodes/llm_anthropic
