Add streaming timing metrics to generic stream wrappers #13
Conversation
Implement TTFC histogram, time_per_output_chunk histogram, and a span attribute for time to first chunk. All timing logic lives in the shared stream ABC so individual instrumentations only pass through start time and copy results. Closes open-telemetry#8
Pull request overview
Adds shared streaming timing support in the GenAI utility layer and wires OpenAI v2 chat stream wrappers to propagate those timings into inference telemetry.
Changes:
- Adds TTFC and per-output-chunk timing capture to sync/async stream wrapper base classes.
- Records new streaming timing histograms and the TTFC span attribute from `InferenceInvocation`.
- Updates OpenAI v2 chat stream wrappers, changelogs, and utility tests for the new timing behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/stream.py | Measures TTFC and per-chunk read durations in shared sync/async stream wrappers. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/metrics.py | Records streaming timing histograms from invocation timing fields. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/instruments.py | Defines histogram creation helpers for streaming timing metrics. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/_inference_invocation.py | Adds timing fields and emits TTFC as an inference span attribute. |
| instrumentation/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/chat_wrappers.py | Passes invocation start time into stream wrappers and copies measured timings back. |
| util/opentelemetry-util-genai/tests/test_stream.py | Adds sync/async stream wrapper timing tests. |
| util/opentelemetry-util-genai/tests/test_handler_metrics.py | Adds metric recorder tests for streaming timing histograms. |
| util/opentelemetry-util-genai/CHANGELOG.md | Documents utility streaming timing support. |
| instrumentation/opentelemetry-instrumentation-openai-v2/CHANGELOG.md | Documents OpenAI v2 chat streaming timing metrics. |
Comments suppressed due to low confidence (1)
util/opentelemetry-util-genai/src/opentelemetry/util/genai/stream.py:217
The async wrapper has the same unbounded accumulation issue: `_self_chunk_gaps` stores one entry per output chunk until the stream finalizes. For large or long-lived streams this can grow without bound; prefer recording each gap immediately or passing timings through a bounded recorder.

    self._self_chunk_gaps: list[float] = []
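A minimal sketch of the suggested fix: report each inter-chunk gap as soon as it is measured rather than accumulating it in a per-stream list. The names here (`BoundedTimingStream`, `record_gap`) are illustrative, not the actual API in this PR.

```python
from timeit import default_timer
from typing import Callable, Iterator


class BoundedTimingStream:
    """Wraps an iterator and reports each blocking-read duration immediately."""

    def __init__(self, stream: Iterator, record_gap: Callable[[float], None]):
        self._stream = stream
        # e.g. a bound histogram's .record method, supplied by the caller
        self._record_gap = record_gap

    def __iter__(self):
        return self

    def __next__(self):
        start = default_timer()
        chunk = next(self._stream)  # blocking read only; user code excluded
        self._record_gap(default_timer() - start)  # O(1) memory per stream
        return chunk
```

With this shape, memory use stays constant regardless of how many chunks the stream yields.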
@lmolkova This implements the three items you mentioned in #8:
For chunk gaps I'm measuring blocking read time rather than wall clock between returns, so user-side processing doesn't inflate it. Let me know if that's not what you had in mind. Anthropic/Responses API don't use the ABC yet, so they'll need follow-up work in a separate PR.
The ABC now accepts a timing_target and syncs values before calling _on_stream_end/_on_stream_error, so providers just pass it in the constructor without handling timing themselves.
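A rough sketch of that handoff, assuming simplified names (`TimingTarget`, `SyncStreamWrapperSketch`, and the field names are placeholders, not the exact classes in this PR): the wrapper measures TTFC from the invocation start and per-chunk blocking-read gaps, writing both onto the caller-supplied target before the end hook runs.

```python
from dataclasses import dataclass, field
from timeit import default_timer
from typing import Iterator, List, Optional


@dataclass
class TimingTarget:
    """Placeholder for the invocation object that receives measured timings."""
    monotonic_start_s: float = field(default_factory=default_timer)
    time_to_first_chunk_s: Optional[float] = None
    chunk_gap_seconds: List[float] = field(default_factory=list)


class SyncStreamWrapperSketch:
    def __init__(self, stream: Iterator, timing_target: TimingTarget):
        self._stream = stream
        self._timing = timing_target
        self._first_chunk_seen = False

    def __iter__(self):
        return self

    def __next__(self):
        start = default_timer()
        try:
            chunk = next(self._stream)
        except StopIteration:
            self._on_stream_end()  # timings already synced to the target
            raise
        now = default_timer()
        if not self._first_chunk_seen:
            self._first_chunk_seen = True
            # TTFC is relative to invocation start, not to this read
            self._timing.time_to_first_chunk_s = now - self._timing.monotonic_start_s
        else:
            # only the blocking read is counted, not user-side processing
            self._timing.chunk_gap_seconds.append(now - start)
        return chunk

    def _on_stream_end(self):
        pass  # providers override; timing_target is already up to date
```

Because the target is populated before `_on_stream_end` fires, provider hooks can read final timings without any timing logic of their own.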
MikeGoldsmith
left a comment
Looks mostly okay, I've left a few comments we should resolve before accepting.
- fix time_per_output_chunk description to match semconv wording
- expose monotonic_start_s as public read-only property on GenAIInvocation
- update chat wrappers and metrics recorder to use public property
- clear chunk_gap_seconds after recording to avoid holding data past finalization
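For the second item, something along these lines would work (a sketch only; the class name `GenAIInvocation` and the private attribute are assumptions based on the review comment):

```python
from timeit import default_timer


class GenAIInvocation:
    def __init__(self):
        # captured once at construction; illustrative private attribute name
        self._monotonic_start_s = default_timer()

    @property
    def monotonic_start_s(self) -> float:
        """Monotonic timestamp of invocation start; read-only (no setter)."""
        return self._monotonic_start_s
```

Consumers then read `invocation.monotonic_start_s` without touching the private field, and attempts to assign it raise `AttributeError`.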
Closes #8
What this does
Adds three streaming timing measurements to the shared utils layer:
- `gen_ai.client.operation.time_to_first_chunk` histogram
- `gen_ai.response.time_to_first_chunk` span attribute
- `gen_ai.client.operation.time_per_output_chunk` histogram (one data point per inter-chunk gap)

All timing logic lives in the stream wrapper base classes (`SyncStreamWrapper` / `AsyncStreamWrapper` in `util/opentelemetry-util-genai`). The OpenAI chat wrappers just pass through the invocation start time and copy measured values back, keeping provider-specific code minimal.

How timing works
- `timeit.default_timer()` around the blocking `next()` / `anext()` call

Scope
This wires up timing for OpenAI chat completions streams (which use the shared ABC). Anthropic and the Responses API have their own stream wrappers that don't extend the ABC, so they will need separate follow-up work.
Testing