fix: use per-invocation usage on agent OTEL span instead of accumulated #2074
Open
Zelys-DFKH wants to merge 1 commit into strands-agents:main from
end_agent_span was reporting response.metrics.accumulated_usage, which grows with every request in a session. In a 10-request session each using 100k tokens, request 2 would report 200k, request 3 would report 300k, etc., causing wildly inflated token counts in Langfuse and other OTEL backends. Use response.metrics.latest_agent_invocation.usage instead, which contains only the tokens for the current agent invocation. Falls back to accumulated_usage when no invocation is recorded (shouldn't happen in practice but guards against edge cases). Adds test_end_agent_span_uses_invocation_not_accumulated_usage to confirm that per-invocation values appear on the span when accumulated usage differs.
Description
end_agent_span in tracer.py reported response.metrics.accumulated_usage on
each span, which grows with every request in a session. In a session with 10
requests each using 100k tokens, request 1 would correctly show 100k, request 2
would show 200k, request 3 would show 300k, and so on. Observability backends like
Langfuse then sum these values, producing wildly inflated token counts and cost
estimates.
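To make the inflation concrete, here is a small arithmetic sketch of the scenario above. It uses only the figures from this description (10 requests, 100k tokens each); the variable names are illustrative and not taken from the codebase.

```python
from itertools import accumulate

# Ten requests in one session, each consuming 100k tokens.
per_request_tokens = [100_000] * 10

# Before the fix, each span carried the running session total.
reported_before_fix = list(accumulate(per_request_tokens))

# What the session actually consumed.
actual_total = sum(per_request_tokens)

# What a backend sees if it sums the value reported on every span.
summed_by_backend = sum(reported_before_fix)

print(actual_total, summed_by_backend)
```

Under these assumptions the summed figure is 5.5 times the real consumption, and the gap widens with every additional request in the session.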
The fix replaces accumulated_usage with
response.metrics.latest_agent_invocation.usage, which contains only the tokens
consumed during the current agent invocation. The accumulated_usage field is
retained as a fallback for the edge case where no invocation has been recorded.
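The selection logic described above might look roughly like the following sketch. The Invocation and Metrics classes are hypothetical stand-ins for the real metrics objects, usage_for_span is a hypothetical helper name, and the usage dictionary keys are assumed for illustration; only the attribute names accumulated_usage and latest_agent_invocation.usage come from this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Invocation:
    """Hypothetical stand-in for one recorded agent invocation."""
    usage: dict


@dataclass
class Metrics:
    """Hypothetical stand-in for response.metrics."""
    accumulated_usage: dict
    latest_agent_invocation: Optional[Invocation] = None


def usage_for_span(metrics: Metrics) -> dict:
    """Prefer per-invocation usage; fall back to the accumulated total."""
    invocation = metrics.latest_agent_invocation
    if invocation is not None:
        return invocation.usage
    # Edge case: no invocation recorded, so report the accumulated usage.
    return metrics.accumulated_usage


# Key names below are assumed for illustration.
session_total = {"inputTokens": 300, "outputTokens": 600, "totalTokens": 900}
this_call = {"inputTokens": 100, "outputTokens": 200, "totalTokens": 300}

with_invocation = Metrics(session_total, Invocation(this_call))
without_invocation = Metrics(session_total)

span_usage = usage_for_span(with_invocation)
fallback_usage = usage_for_span(without_invocation)
```

Checking for None explicitly, rather than relying on truthiness, keeps the fallback limited to the case where no invocation exists at all.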
Related Issues
Resolves #2010
Documentation PR
N/A
Type of Change
Bug fix
Testing
Updated the existing test_end_agent_span* tests to wire latest_agent_invocation.usage on the mock (matching the value they were already asserting, so expectations are unchanged).
Added test_end_agent_span_uses_invocation_not_accumulated_usage: sets invocation usage to 100/200/300 tokens while accumulated usage is 300/600/900, and asserts the span receives only the invocation values.
hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.