feat: Add token usage telemetry #435
Open
Vamshi-Microsoft wants to merge 4 commits into
The `extract_usage_from_stream_chunk` function only checked `messages[*].contents[*].usage_details`, but agent-framework-foundry `AgentResponseUpdate` objects expose `contents` directly (no wrapping `messages` list). Usage Content items with `usage_details` were being missed, causing `LLM_Token_Usage_Summary` events to never emit in workshop (`IS_WORKSHOP=True`) mode. Now also checks `chunk.contents[*].usage_details` directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
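The fix described in this commit message can be illustrated with a minimal sketch. It is not the PR's actual implementation: the dataclasses below are hypothetical stand-ins for the framework's chunk types, `extract_usage_sketch` is an illustrative name, and the shape of `usage_details` (a plain dict here) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical stand-ins for the framework objects; only the attribute
# names mentioned in the commit message are modeled.
@dataclass
class Content:
    usage_details: Optional[dict] = None


@dataclass
class Message:
    contents: list = field(default_factory=list)


@dataclass
class MessageWrappedChunk:
    messages: list = field(default_factory=list)


@dataclass
class DirectChunk:
    contents: list = field(default_factory=list)


def _usage_from_contents(contents: Any) -> Optional[dict]:
    """Return the first non-empty usage_details found in a contents list."""
    for item in contents or []:
        details = getattr(item, "usage_details", None)
        if details:
            return details
    return None


def extract_usage_sketch(chunk: Any) -> Optional[dict]:
    """Check messages[*].contents[*] first, then chunk.contents[*] directly."""
    for message in getattr(chunk, "messages", None) or []:
        found = _usage_from_contents(getattr(message, "contents", None))
        if found:
            return found
    # Fallback for chunks that expose contents without a messages wrapper.
    return _usage_from_contents(getattr(chunk, "contents", None))
```

Using `getattr` with a default keeps the helper safe for chunk types that have neither attribute, which fits the goal of never breaking the response stream.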
Unit Test Results: 489 tests, 489 ✅, 7s ⏱️. Results for commit c029bae.
Pull request overview
This PR introduces LLM token-usage telemetry for the Python chat API by adding a shared, process-wide TokenUsageEmitter, wiring token extraction/emission into streaming chat endpoints, and providing environment-variable configuration for sampling, user ID hashing, and model pricing.
Changes:
- Added `llm_token_telemetry.py` (extraction helpers, emitter, and scope/decorator utilities) plus a `telemetry.py` singleton (`token_emitter`) configured via env vars.
- Integrated token usage reporting into `stream_openai_text` and `stream_openai_text_workshop`.
- Updated `.coveragerc` to omit the new telemetry helper module from coverage.
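As a rough illustration of the env-var configuration, the sketch below reads the three variables named in the PR description (`LLM_TOKEN_SAMPLE_RATE`, `LLM_TOKEN_USER_ID_HMAC_KEY`, `LLM_TOKEN_PRICING`). Everything else is an assumption rather than the PR's code: the `EmitterConfig` class, the helper names, the default sample rate of 1.0, and the idea that pricing is supplied as a JSON object.

```python
import hashlib
import hmac
import json
import os
import random
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class EmitterConfig:  # hypothetical container, not the PR's class
    sample_rate: float
    hmac_key: Optional[bytes]
    pricing: dict


def load_config(env: Mapping[str, str] = os.environ) -> EmitterConfig:
    """Read telemetry settings, falling back to safe defaults on bad input."""
    try:
        rate = float(env.get("LLM_TOKEN_SAMPLE_RATE", "1.0"))
    except ValueError:
        rate = 1.0
    rate = min(max(rate, 0.0), 1.0)  # clamp to [0, 1]

    key = env.get("LLM_TOKEN_USER_ID_HMAC_KEY")

    try:
        # Assumption: pricing is a JSON object; the real format may differ.
        pricing = json.loads(env.get("LLM_TOKEN_PRICING", "{}"))
    except json.JSONDecodeError:
        pricing = {}
    if not isinstance(pricing, dict):
        pricing = {}

    return EmitterConfig(rate, key.encode("utf-8") if key else None, pricing)


def hash_user_id(user_id: str, key: Optional[bytes]) -> Optional[str]:
    """Return an HMAC-SHA256 hex digest of the user ID, or None without a key."""
    if not key:
        return None
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def should_sample(rate: float, rng: Callable[[], float] = random.random) -> bool:
    """Decide whether to emit an event for the configured sample rate."""
    return rng() < rate
```

Keyed hashing lets events be correlated per user without the raw identifier appearing in telemetry.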
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/api/python/telemetry.py | Adds a process-wide token_emitter singleton configured from environment variables. |
| src/api/python/llm_token_telemetry.py | Introduces a shared telemetry helper module (usage extraction + standardized event emission). |
| src/api/python/chat.py | Emits token usage telemetry for chat streaming endpoints (standard + workshop mode). |
| .coveragerc | Excludes the new telemetry helper module from coverage collection. |
- Move telemetry imports after `load_dotenv()` so `.env` values apply
- Use `AZURE_AI_AGENT_MODEL_DEPLOYMENT_NAME` instead of agent name for model labeling
- Accumulate token usage across all tool-call iterations (non-workshop)
- Wrap workshop streaming in try/finally for exception-safe telemetry emission
- Update `telemetry.py` docstring to document actual import-time side effects
- Downgrade `emit_all()` log from INFO to DEBUG to avoid PII/volume issues
- Fix double extraction in `TokenUsageScope.add()`

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
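Two of the items in this commit message, accumulating usage across iterations and emitting from a try/finally, can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: `UsageTotals`, `stream_with_usage`, and the `(text, usage)` chunk shape are hypothetical; only the `has_any` check and the debug log message mirror the snippets quoted in the review comments.

```python
import asyncio
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class UsageTotals:  # hypothetical accumulator
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def has_any(self) -> bool:
        return (self.input_tokens + self.output_tokens) > 0

    def add(self, input_tokens: int = 0, output_tokens: int = 0) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens


async def stream_with_usage(chunks, emit):
    """Yield text from (text, usage) pairs; emit accumulated usage at the end.

    The finally block runs whether the stream completes, raises, or is
    closed early, so partial usage is still reported.
    """
    totals = UsageTotals()
    try:
        async for text, usage in chunks:
            if usage:
                totals.add(**usage)
            yield text
    finally:
        try:
            if totals.has_any:
                emit(totals)
        except Exception:
            # Telemetry must never break the response flow.
            logger.debug("Token usage telemetry failed", exc_info=True)
```

The nested try inside `finally` keeps a failing emitter from masking the original streaming error.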
Preserve original behavior where `None` is passed when the env var is unset, rather than an empty string, which could behave differently on the API side.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
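The distinction this commit preserves can be shown with a small sketch. The helper names are hypothetical; the env var name and the `agent_reference` payload shape are taken from the snippets quoted in the review comments.

```python
import os
from typing import Mapping, Optional


def agent_reference_body(env: Mapping[str, str] = os.environ) -> dict:
    """Build the extra_body payload; name is None when the variable is unset."""
    name: Optional[str] = env.get("AGENT_NAME_CHAT")  # no "" default
    return {"agent_reference": {"name": name, "type": "agent_reference"}}


def telemetry_label(env: Mapping[str, str] = os.environ) -> str:
    """For telemetry labels an empty-string default is harmless."""
    return env.get("AGENT_NAME_CHAT", "")
```

Keeping the two lookups separate means the telemetry label can default to an empty string without changing what is sent to the API.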
Comment on lines +184 to 191

    agent_name = os.getenv("AGENT_NAME_CHAT", "")
    model_deployment_name = os.getenv("AZURE_AI_AGENT_MODEL_DEPLOYMENT_NAME", "")

    response = await openai_client.responses.create(
        conversation=thread_conversation_id,
        input=query,
        extra_body={"agent_reference": {"name": os.getenv("AGENT_NAME_CHAT"), "type": "agent_reference"}}
    )
Comment on lines 262 to 267

    # Submit tool outputs and get next response
    response = await openai_client.responses.create(
        conversation=thread_conversation_id,
        input=tool_outputs,
        extra_body={"agent_reference": {"name": os.getenv("AGENT_NAME_CHAT"), "type": "agent_reference"}}
    )
Comment on lines +912 to +915

    start_ns = time.perf_counter_ns()
    try:
        found = extract_usage_from_stream_chunk(source) or extract_usage(source)
    except Exception as exc:  # belt + braces; extractors are already safe
Comment on lines 1 to +4

    [run]
    omit =
        */test_*.py
        */llm_token_telemetry.py
Comment on lines +277 to +288

    try:
        if accumulated_usage and accumulated_usage.has_any:
            resolved_model = getattr(response, "model", "") or model_deployment_name
            token_emitter.emit_all(
                agent_name=agent_name,
                model_deployment_name=resolved_model,
                usage=accumulated_usage,
                conversation_id=conversation_id,
                user_id=user_id,
            )
    except Exception:
        logger.debug("Token usage telemetry failed", exc_info=True)
Purpose
This pull request adds telemetry for LLM token usage to the chat API, enabling better tracking of model usage and associated costs. The changes introduce a process-wide telemetry emitter, integrate token usage reporting into chat streaming endpoints, and provide configuration via environment variables for sampling, user ID hashing, and model pricing. The `.coveragerc` file is also updated to exclude the new telemetry module from coverage reports.

Telemetry infrastructure and configuration:
- Added a `token_emitter` singleton in `telemetry.py`, which configures a `TokenUsageEmitter` for process-wide use. This supports environment variable configuration for sample rate (`LLM_TOKEN_SAMPLE_RATE`), user ID hashing (`LLM_TOKEN_USER_ID_HMAC_KEY`), and model pricing (`LLM_TOKEN_PRICING`).
- Updated `.coveragerc` to omit the `llm_token_telemetry.py` file from coverage reports.

Integration with chat endpoints:
- In `chat.py`, imported the telemetry emitter and supporting utilities, and integrated token usage telemetry into the `stream_openai_text` endpoint. Token usage is extracted from responses and emitted after streaming completes, with error handling to avoid breaking the response flow.
- In the `stream_openai_text_workshop` endpoint, added a `TokenUsageScope` to accumulate token usage from streaming chunks, emitting telemetry after the stream completes. Errors in telemetry emission are logged but do not affect the main response.

Does this introduce a breaking change?
Golden Path Validation
Deployment Validation
What to Check
Verify that the following are valid
Other Information