feat: Add token usage telemetry #435

Open
Vamshi-Microsoft wants to merge 4 commits into dev from psl-tokenMonitoring

Conversation

@Vamshi-Microsoft

Contributor

Purpose

This pull request adds telemetry for LLM token usage to the chat API, enabling better tracking of model usage and associated costs. The changes introduce a process-wide telemetry emitter, integrate token usage reporting into chat streaming endpoints, and provide configuration via environment variables for sampling, user ID hashing, and model pricing. The .coveragerc file is also updated to exclude the new telemetry module from coverage reports.

Telemetry infrastructure and configuration:

  • Introduced a new token_emitter singleton in telemetry.py, which configures a TokenUsageEmitter for process-wide use. This supports environment variable configuration for sample rate (LLM_TOKEN_SAMPLE_RATE), user ID hashing (LLM_TOKEN_USER_ID_HMAC_KEY), and model pricing (LLM_TOKEN_PRICING).
  • Updated .coveragerc to omit the llm_token_telemetry.py file from coverage reports.
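
The environment-driven configuration described above could be read roughly as follows. This is a minimal sketch, not the PR's code: `EmitterConfig`, `load_config`, and `hash_user_id` are hypothetical names, the JSON format for `LLM_TOKEN_PRICING` is an assumption, and the real `TokenUsageEmitter` may parse these variables differently.

```python
import hashlib
import hmac
import json
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class EmitterConfig:
    """Hypothetical config holder; the real TokenUsageEmitter may differ."""
    sample_rate: float = 1.0
    hmac_key: Optional[bytes] = None
    pricing: dict = field(default_factory=dict)


def load_config(env: Optional[Mapping[str, str]] = None) -> EmitterConfig:
    """Read the three variables named in the PR, falling back to safe defaults."""
    env = os.environ if env is None else env

    try:
        rate = float(env.get("LLM_TOKEN_SAMPLE_RATE", "1.0"))
    except ValueError:
        rate = 1.0
    rate = min(1.0, max(0.0, rate))  # clamp to the range [0, 1]

    key = env.get("LLM_TOKEN_USER_ID_HMAC_KEY")

    try:
        # Assumption: pricing is supplied as a JSON object keyed by model name.
        pricing = json.loads(env.get("LLM_TOKEN_PRICING", "{}"))
    except json.JSONDecodeError:
        pricing = {}
    if not isinstance(pricing, dict):
        pricing = {}

    return EmitterConfig(rate, key.encode("utf-8") if key else None, pricing)


def hash_user_id(user_id: str, key: Optional[bytes]) -> Optional[str]:
    """Return an HMAC-SHA256 hex digest of the user ID, or None when no key is set."""
    if not key:
        return None
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

Using a keyed HMAC rather than a plain hash means the digest cannot be reproduced without the key, which is presumably why the PR exposes the key as configuration.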

Integration with chat endpoints:

  • In chat.py, imported the telemetry emitter and supporting utilities, and integrated token usage telemetry into the stream_openai_text endpoint. Token usage is extracted from responses and emitted after streaming completes, with error handling to avoid breaking the response flow. [1] [2]
  • In the stream_openai_text_workshop endpoint, added a TokenUsageScope to accumulate token usage from streaming chunks, emitting telemetry after the stream completes. Errors in telemetry emission are logged but do not affect the main response. [1] [2]
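
The accumulate-then-emit pattern described in these two bullets can be sketched as below. `UsageTotals` and `stream_with_usage` are hypothetical stand-ins for the PR's `TokenUsageScope` and endpoint code, and the dict-shaped chunks are an assumption made to keep the example self-contained.

```python
import asyncio
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class UsageTotals:
    """Hypothetical accumulator standing in for the PR's TokenUsageScope."""
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage):
        """Add one chunk's usage dict; chunks without usage are ignored."""
        if not usage:
            return
        self.input_tokens += int(usage.get("input_tokens", 0) or 0)
        self.output_tokens += int(usage.get("output_tokens", 0) or 0)

    @property
    def has_any(self):
        return (self.input_tokens + self.output_tokens) > 0


async def stream_with_usage(chunks, emit):
    """Yield text from each chunk, then emit the accumulated usage once."""
    totals = UsageTotals()
    try:
        async for chunk in chunks:
            totals.add(chunk.get("usage"))
            if chunk.get("text"):
                yield chunk["text"]
    finally:
        # Telemetry must never break the response: log failures and move on.
        try:
            if totals.has_any:
                emit(totals)
        except Exception:
            logger.debug("Token usage telemetry failed", exc_info=True)
```

Placing the emission in `finally` means usage gathered before a mid-stream failure is still reported, while the inner `try` keeps an emitter error from replacing the original exception or interrupting a successful response.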

Does this introduce a breaking change?

  • Yes
  • No

Golden Path Validation

  • I have tested the primary workflows (the "golden path") to ensure they function correctly without errors.

Deployment Validation

  • I have validated the deployment process successfully and all services are running as expected with this change.

What to Check

Verify that the following are valid

  • ...

Other Information

Vamshi-Microsoft and others added 2 commits June 18, 2026 15:19
The extract_usage_from_stream_chunk function only checked
messages[*].contents[*].usage_details, but agent-framework-foundry
AgentResponseUpdate objects expose contents directly (no wrapping
messages list). Usage Content items with usage_details were being
missed, causing LLM_Token_Usage_Summary events to never emit in
workshop (IS_WORKSHOP=True) mode.

Now also checks chunk.contents[*].usage_details directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
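
The fix in this commit amounts to checking two chunk shapes. A rough sketch of that lookup follows; `iter_usage_details` is a hypothetical helper rather than the PR's `extract_usage_from_stream_chunk`, and the `{"tokens": ...}` payloads are placeholders, since the real `usage_details` fields are not shown in this conversation.

```python
from types import SimpleNamespace


def iter_usage_details(chunk):
    """Yield usage_details from either chunk shape described in the commit."""
    # Shape 1: chunk.messages[*].contents[*].usage_details
    for message in getattr(chunk, "messages", None) or []:
        for content in getattr(message, "contents", None) or []:
            details = getattr(content, "usage_details", None)
            if details is not None:
                yield details
    # Shape 2: chunk.contents[*].usage_details (no wrapping messages list)
    for content in getattr(chunk, "contents", None) or []:
        details = getattr(content, "usage_details", None)
        if details is not None:
            yield details


# Placeholder objects imitating the two shapes.
wrapped = SimpleNamespace(
    messages=[SimpleNamespace(contents=[SimpleNamespace(usage_details={"tokens": 5})])]
)
direct = SimpleNamespace(contents=[SimpleNamespace(usage_details={"tokens": 7})])
```

Both `list(iter_usage_details(wrapped))` and `list(iter_usage_details(direct))` return one item, whereas a lookup limited to the first shape would return nothing for `direct`.
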
@github-actions

github-actions Bot commented Jun 19, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| chat.py | 456 | 138 | 69% | 92–93, 150, 209, 215–225, 235–239, 241–249, 251, 253–254, 256, 263, 269–271, 274–275, 279–280, 287–288, 298–302, 304–305, 307–315, 379, 404–405, 408, 412, 417, 453–455, 461, 474–476, 480–481, 484–487, 489, 493, 497–498, 520–521, 523–528, 530–533, 539, 543–551, 560–561, 582, 595, 606–614, 616–617, 619–622, 706–707, 709, 713, 716, 722–727, 732–735 |
| telemetry.py | 46 | 24 | 47% | 49–53, 60, 62–64, 66, 73–86 |
| TOTAL | 1796 | 306 | 82% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 489 | 0 💤 | 0 ❌ | 0 🔥 | 7.629s ⏱️ |

@github-actions

github-actions Bot commented Jun 19, 2026

Unit Test Results

489 tests   489 ✅  7s ⏱️
  1 suites    0 💤
  1 files      0 ❌

Results for commit c029bae.

♻️ This comment has been updated with latest results.

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces LLM token-usage telemetry for the Python chat API by adding a shared, process-wide TokenUsageEmitter, wiring token extraction/emission into streaming chat endpoints, and providing environment-variable configuration for sampling, user ID hashing, and model pricing.

Changes:

  • Added llm_token_telemetry.py (extraction helpers, emitter, and scope/decorator utilities) plus a telemetry.py singleton (token_emitter) configured via env vars.
  • Integrated token usage reporting into stream_openai_text and stream_openai_text_workshop.
  • Updated .coveragerc to omit the new telemetry helper module from coverage.
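
For reviewers unfamiliar with sampling and pricing in this kind of telemetry, the two ideas can be sketched as follows. `should_sample` and `estimate_cost` are hypothetical helpers, and the per-1,000-token price shape is an assumption, because the PR does not show the format of `LLM_TOKEN_PRICING`.

```python
import random


def should_sample(sample_rate, rng=random.random):
    """Return True for roughly `sample_rate` of calls (0.0 = never, 1.0 = always)."""
    return rng() < sample_rate


def estimate_cost(input_tokens, output_tokens, price):
    """Estimate cost from token counts.

    `price` is assumed to look like
    {"input_per_1k": float, "output_per_1k": float}.
    """
    return (
        (input_tokens / 1000) * price["input_per_1k"]
        + (output_tokens / 1000) * price["output_per_1k"]
    )
```

Because `random.random()` returns values in the half-open range [0, 1), a rate of 1.0 always samples and a rate of 0.0 never does.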

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
|---|---|
| src/api/python/telemetry.py | Adds a process-wide token_emitter singleton configured from environment variables. |
| src/api/python/llm_token_telemetry.py | Introduces a shared telemetry helper module (usage extraction + standardized event emission). |
| src/api/python/chat.py | Emits token usage telemetry for chat streaming endpoints (standard + workshop mode). |
| .coveragerc | Excludes the new telemetry helper module from coverage collection. |

Comment thread src/api/python/chat.py Outdated
Comment thread src/api/python/chat.py Outdated
Comment thread src/api/python/chat.py Outdated
Comment thread src/api/python/chat.py
Comment thread src/api/python/chat.py Outdated
Comment thread src/api/python/telemetry.py Outdated
Comment thread src/api/python/llm_token_telemetry.py
Comment thread src/api/python/llm_token_telemetry.py Outdated
Comment thread src/api/python/llm_token_telemetry.py Outdated
- Move telemetry imports after load_dotenv() so .env values apply
- Use AZURE_AI_AGENT_MODEL_DEPLOYMENT_NAME instead of agent name for model labeling
- Accumulate token usage across all tool-call iterations (non-workshop)
- Wrap workshop streaming in try/finally for exception-safe telemetry emission
- Update telemetry.py docstring to document actual import-time side effects
- Downgrade emit_all() log from INFO to DEBUG to avoid PII/volume issues
- Fix double extraction in TokenUsageScope.add()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Preserve original behavior where None is passed when env var is unset,
rather than empty string which could behave differently on the API side.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
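
The distinction this commit preserves is easy to see in isolation. The snippet below is illustrative only: `EXAMPLE_AGENT_NAME` is a hypothetical variable name (the commit message does not say which variable is involved), and the payload merely mirrors the `agent_reference` shape quoted elsewhere in this review.

```python
import json
import os

# Hypothetical variable name; make sure it is unset for the demonstration.
os.environ.pop("EXAMPLE_AGENT_NAME", None)

as_none = os.getenv("EXAMPLE_AGENT_NAME")        # None when the variable is unset
as_empty = os.getenv("EXAMPLE_AGENT_NAME", "")   # "" when the variable is unset

# The two values serialize differently, so a receiving API can tell them apart.
payload_none = json.dumps({"name": as_none, "type": "agent_reference"})
payload_empty = json.dumps({"name": as_empty, "type": "agent_reference"})
```

Here `payload_none` carries `"name": null` while `payload_empty` carries `"name": ""`, which is the behavioral difference the commit message is guarding against.
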
Copilot AI review requested due to automatic review settings June 19, 2026 12:08

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

Comment thread src/api/python/chat.py
Comment on lines +184 to 191
agent_name = os.getenv("AGENT_NAME_CHAT", "")
model_deployment_name = os.getenv("AZURE_AI_AGENT_MODEL_DEPLOYMENT_NAME", "")

response = await openai_client.responses.create(
    conversation=thread_conversation_id,
    input=query,
    extra_body={"agent_reference": {"name": os.getenv("AGENT_NAME_CHAT"), "type": "agent_reference"}}
)
Comment thread src/api/python/chat.py
Comment on lines 262 to 267
# Submit tool outputs and get next response
response = await openai_client.responses.create(
    conversation=thread_conversation_id,
    input=tool_outputs,
    extra_body={"agent_reference": {"name": os.getenv("AGENT_NAME_CHAT"), "type": "agent_reference"}}
)
Comment on lines +912 to +915
start_ns = time.perf_counter_ns()
try:
    found = extract_usage_from_stream_chunk(source) or extract_usage(source)
except Exception as exc:  # belt + braces; extractors are already safe
Comment thread .coveragerc
Comment on lines 1 to +4
[run]
omit =
    */test_*.py
    */llm_token_telemetry.py
Comment thread src/api/python/chat.py
Comment on lines +277 to +288
try:
    if accumulated_usage and accumulated_usage.has_any:
        resolved_model = getattr(response, "model", "") or model_deployment_name
        token_emitter.emit_all(
            agent_name=agent_name,
            model_deployment_name=resolved_model,
            usage=accumulated_usage,
            conversation_id=conversation_id,
            user_id=user_id,
        )
except Exception:
    logger.debug("Token usage telemetry failed", exc_info=True)
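
The model-name fallback in the excerpt above relies on `or` treating both a missing attribute and a falsy value as "not set". A small standalone sketch of that behavior follows; `resolve_model` is a hypothetical wrapper around the same expression, and the `SimpleNamespace` objects and model names are stand-ins for real response objects.

```python
from types import SimpleNamespace


def resolve_model(response, fallback):
    """Prefer the model reported on the response; otherwise use the fallback."""
    return getattr(response, "model", "") or fallback


with_model = SimpleNamespace(model="model-from-response")
model_is_none = SimpleNamespace(model=None)
no_attribute = SimpleNamespace()
```

`resolve_model(with_model, "deployment-name")` returns the response's own value, while the other two objects both yield `"deployment-name"`. If the fallback is itself an empty string (the default for the environment lookup in the earlier excerpt), the result is an empty string.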

2 participants