
fix: use per-invocation usage on agent OTEL span instead of accumulated #2074

Open
Zelys-DFKH wants to merge 1 commit into strands-agents:main from
Zelys-DFKH:fix/agent-span-per-invocation-usage


@Zelys-DFKH

Description

end_agent_span in tracer.py reported response.metrics.accumulated_usage on
each agent span. That value is a running total that grows with every request in
a session: in a session with 10 requests that each use 100k tokens, request 1
correctly shows 100k, request 2 shows 200k, request 3 shows 300k, and so on.
Observability backends such as Langfuse then sum these per-span values, so that
session totals 5.5M tokens instead of 1M, inflating token counts and cost
estimates.
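To make the inflation concrete, here is a short standalone Python sketch (an illustration, not code from this PR) that reproduces the 10-request example and shows what a backend that sums span values would report, assuming every request uses exactly 100k tokens.

```python
# Illustration of the scenario above: 10 requests of 100k tokens each.
PER_REQUEST_TOKENS = 100_000
NUM_REQUESTS = 10

# Before the fix: each span carried the session's running total.
accumulated_per_span = [PER_REQUEST_TOKENS * i for i in range(1, NUM_REQUESTS + 1)]

# After the fix: each span carries only its own invocation's usage.
per_invocation_per_span = [PER_REQUEST_TOKENS] * NUM_REQUESTS

# A backend that sums span values sees these session totals.
inflated_total = sum(accumulated_per_span)
correct_total = sum(per_invocation_per_span)

print(f"summed accumulated values: {inflated_total:,}")       # 5,500,000
print(f"summed per-invocation values: {correct_total:,}")     # 1,000,000
```

The overcount grows quadratically with session length: the summed total is the per-request usage times n(n+1)/2 rather than times n.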

The fix replaces accumulated_usage with response.metrics.latest_agent_invocation.usage,
which contains only the tokens consumed during the current agent invocation. The
accumulated_usage field is retained as a fallback for the edge case where no
invocation has been recorded.
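A minimal sketch of the selection-with-fallback logic described above, written as a standalone helper. The name select_span_usage, the dict shape of the usage values (inputTokens/outputTokens/totalTokens), and modeling "no invocation recorded" as latest_agent_invocation being None are all assumptions for illustration; the actual change lives inside end_agent_span in tracer.py and may be structured differently.

```python
from types import SimpleNamespace
from typing import Any, Dict


def select_span_usage(metrics: Any) -> Dict[str, int]:
    """Hypothetical helper: choose the usage to report on an agent span.

    Prefers the latest invocation's usage; falls back to the session-wide
    accumulated_usage only when no invocation has been recorded (assumed
    here to mean latest_agent_invocation is None or missing).
    """
    invocation = getattr(metrics, "latest_agent_invocation", None)
    if invocation is not None:
        return invocation.usage
    return metrics.accumulated_usage


# Values mirror the new test: invocation 100/200/300, accumulated 300/600/900.
metrics = SimpleNamespace(
    accumulated_usage={"inputTokens": 300, "outputTokens": 600, "totalTokens": 900},
    latest_agent_invocation=SimpleNamespace(
        usage={"inputTokens": 100, "outputTokens": 200, "totalTokens": 300}
    ),
)
# Edge case: no invocation recorded, so the fallback applies.
no_invocation = SimpleNamespace(
    accumulated_usage={"inputTokens": 300, "outputTokens": 600, "totalTokens": 900},
    latest_agent_invocation=None,
)

print(select_span_usage(metrics)["totalTokens"])        # 300
print(select_span_usage(no_invocation)["totalTokens"])  # 900
```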

Related Issues

Resolves #2010

Documentation PR

N/A

Type of Change

Bug fix

Testing

  • Updated all four existing test_end_agent_span* tests to set
    latest_agent_invocation.usage on the mock to the value each test was
    already asserting, so expectations are unchanged.
  • Added test_end_agent_span_uses_invocation_not_accumulated_usage: sets
    invocation usage to 100/200/300 tokens while accumulated usage is 300/600/900,
    and asserts the span receives only the invocation values.
  • Ran hatch run prepare.
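Why the existing tests needed rewiring, under the assumption that they build the response with unittest.mock.MagicMock: an unset attribute on a MagicMock is auto-created as another mock, so once the code reads latest_agent_invocation.usage, a test that only sets accumulated_usage would hand a mock object to the span. The sketch below is hypothetical (usage_for_span and make_response are illustrative names, not the repository's test helpers) and reuses the 100/200/300 vs 300/600/900 values described above.

```python
from unittest import mock


def usage_for_span(response):
    # Hypothetical stand-in for the tracer's usage selection; the real
    # end_agent_span in tracer.py is not reproduced here.
    invocation = response.metrics.latest_agent_invocation
    if invocation is not None:
        return invocation.usage
    return response.metrics.accumulated_usage


def make_response(invocation_usage, accumulated_usage):
    response = mock.MagicMock()
    response.metrics.accumulated_usage = accumulated_usage
    # Without this line, MagicMock would auto-create
    # latest_agent_invocation.usage as another MagicMock.
    response.metrics.latest_agent_invocation.usage = invocation_usage
    return response


def test_span_uses_invocation_not_accumulated_usage():
    invocation = {"inputTokens": 100, "outputTokens": 200, "totalTokens": 300}
    accumulated = {"inputTokens": 300, "outputTokens": 600, "totalTokens": 900}
    response = make_response(invocation, accumulated)
    assert usage_for_span(response) == invocation


def test_unwired_mock_leaks_magicmock():
    response = mock.MagicMock()
    response.metrics.accumulated_usage = {"totalTokens": 900}
    # On a bare MagicMock, latest_agent_invocation is a mock, never None,
    # so its auto-created .usage mock is returned instead of real counts.
    assert isinstance(usage_for_span(response), mock.MagicMock)


if __name__ == "__main__":
    test_span_uses_invocation_not_accumulated_usage()
    test_unwired_mock_leaks_magicmock()
    print("ok")
```

The second test shows the failure mode the rewiring avoids; under pytest both functions are collected automatically.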

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

end_agent_span was reporting response.metrics.accumulated_usage, which
grows with every request in a session. In a 10-request session where
each request uses 100k tokens, request 2 would report 200k, request 3
would report 300k, etc., causing wildly inflated token counts in
Langfuse and other OTEL backends.

Use response.metrics.latest_agent_invocation.usage instead, which
contains only the tokens for the current agent invocation. Fall back
to accumulated_usage when no invocation is recorded (this shouldn't
happen in practice, but it guards against edge cases).

Add test_end_agent_span_uses_invocation_not_accumulated_usage to
confirm that per-invocation values appear on the span when accumulated
usage differs from invocation usage.


Development

Successfully merging this pull request may close these issues.

[BUG] OTEL span reports accumulated_usage instead of per-invocation usage, causing inflated token metrics in Langfuse
