
Add streaming timing metrics to generic stream wrappers #13

Open

Nik-Reddy wants to merge 5 commits into open-telemetry:main from Nik-Reddy:feat/streaming-timing-metrics


Conversation

@Nik-Reddy

Closes #8

What this does

Adds three streaming timing measurements to the shared utils layer:

  1. gen_ai.client.operation.time_to_first_chunk histogram
  2. gen_ai.response.time_to_first_chunk span attribute
  3. gen_ai.client.operation.time_per_output_chunk histogram (one data point per inter-chunk gap)

All timing logic lives in the stream wrapper base classes (SyncStreamWrapper/AsyncStreamWrapper in util/opentelemetry-util-genai). The OpenAI chat wrappers just pass through the invocation start time and copy measured values back, keeping provider-specific code minimal.
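
For reference, creating the two histograms might look roughly like this. The metric names come from this PR; the helper name and descriptions are illustrative, not the actual code in instruments.py:

```python
from opentelemetry.metrics import Meter


def create_streaming_timing_histograms(meter: Meter):
    # Hypothetical helper; the real creation helpers live in
    # util/opentelemetry-util-genai/.../instruments.py.
    time_to_first_chunk = meter.create_histogram(
        name="gen_ai.client.operation.time_to_first_chunk",
        unit="s",
        description="Time from request start to the first streamed chunk",
    )
    time_per_output_chunk = meter.create_histogram(
        name="gen_ai.client.operation.time_per_output_chunk",
        unit="s",
        description="Blocking read time per streamed chunk after the first",
    )
    return time_to_first_chunk, time_per_output_chunk
```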

How timing works

  • TTFC: measured from invocation start (when the HTTP request was issued) to when the first chunk arrives
  • time_per_output_chunk: measures the blocking read duration for each chunk after the first, so user processing time between pulls doesn't inflate the metric
  • Uses timeit.default_timer() around the blocking next()/anext() call
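
A minimal sketch of the sync path under those rules (attribute names here are illustrative, not the PR's exact API):

```python
from timeit import default_timer


class SyncStreamWrapper:
    # Illustrative sketch, not the PR's exact implementation.
    def __init__(self, stream, invocation_start_s: float):
        self._stream = stream
        self._invocation_start_s = invocation_start_s  # when the HTTP request was issued
        self._first_chunk_seen = False
        self.time_to_first_chunk_s: float | None = None
        self.chunk_gap_seconds: list[float] = []

    def __iter__(self):
        return self

    def __next__(self):
        read_start = default_timer()   # clock only the blocking read...
        chunk = next(self._stream)     # ...so user processing between pulls is excluded
        now = default_timer()
        if not self._first_chunk_seen:
            self._first_chunk_seen = True
            self.time_to_first_chunk_s = now - self._invocation_start_s
        else:
            self.chunk_gap_seconds.append(now - read_start)
        return chunk
```

The async wrapper would mirror this, with `await self._stream.__anext__()` inside the same timer.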

Scope

This wires up timing for OpenAI chat completions streams (which use the shared ABC). Anthropic and the Responses API have their own stream wrappers that don't extend the ABC, so they will need separate follow-up work.

Testing

  • 5 unit tests for the timing logic in stream.py (sync, async, single chunk, error cases)
  • 2 integration tests verifying histogram recording in metrics.py
  • All existing tests still pass
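
Reusing the `SyncStreamWrapper` sketch above, the single-chunk case might be exercised like this (hypothetical test, not the PR's actual test code):

```python
from timeit import default_timer


def test_single_chunk_records_ttfc_but_no_gaps():
    wrapper = SyncStreamWrapper(iter(["only"]), invocation_start_s=default_timer())
    assert list(wrapper) == ["only"]
    assert wrapper.time_to_first_chunk_s is not None   # TTFC always measured
    assert wrapper.chunk_gap_seconds == []             # gaps start at chunk 2
```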

Implement TTFC histogram, time_per_output_chunk histogram, and a span
attribute for time to first chunk. All timing logic lives in the shared
stream ABC so individual instrumentations only pass through start time
and copy results.

Closes open-telemetry#8
Copilot AI review requested due to automatic review settings May 13, 2026 23:56
@Nik-Reddy Nik-Reddy requested a review from a team as a code owner May 13, 2026 23:56

Copilot AI left a comment


Pull request overview

Adds shared streaming timing support in the GenAI utility layer and wires OpenAI v2 chat stream wrappers to propagate those timings into inference telemetry.

Changes:

  • Adds TTFC and per-output-chunk timing capture to sync/async stream wrapper base classes.
  • Records new streaming timing histograms and the TTFC span attribute from InferenceInvocation.
  • Updates OpenAI v2 chat stream wrappers, changelogs, and utility tests for the new timing behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Summary per file:

  • util/opentelemetry-util-genai/src/opentelemetry/util/genai/stream.py: Measures TTFC and per-chunk read durations in shared sync/async stream wrappers.
  • util/opentelemetry-util-genai/src/opentelemetry/util/genai/metrics.py: Records streaming timing histograms from invocation timing fields.
  • util/opentelemetry-util-genai/src/opentelemetry/util/genai/instruments.py: Defines histogram creation helpers for streaming timing metrics.
  • util/opentelemetry-util-genai/src/opentelemetry/util/genai/_inference_invocation.py: Adds timing fields and emits TTFC as an inference span attribute.
  • instrumentation/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/chat_wrappers.py: Passes invocation start time into stream wrappers and copies measured timings back.
  • util/opentelemetry-util-genai/tests/test_stream.py: Adds sync/async stream wrapper timing tests.
  • util/opentelemetry-util-genai/tests/test_handler_metrics.py: Adds metric recorder tests for streaming timing histograms.
  • util/opentelemetry-util-genai/CHANGELOG.md: Documents utility streaming timing support.
  • instrumentation/opentelemetry-instrumentation-openai-v2/CHANGELOG.md: Documents OpenAI v2 chat streaming timing metrics.
Comments suppressed due to low confidence (1)

util/opentelemetry-util-genai/src/opentelemetry/util/genai/stream.py:217

  • The async wrapper has the same unbounded accumulation issue: _self_chunk_gaps stores one entry per output chunk until the stream finalizes. For large or long-lived streams this can grow without bound; prefer recording each gap immediately or passing timings through a bounded recorder.
        self._self_chunk_gaps: list[float] = []
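
One shape that suggestion could take, sketched with a hypothetical `on_chunk_gap` callback (this is not the PR's actual API):

```python
from timeit import default_timer
from typing import Callable, Optional


class AsyncStreamWrapper:
    # Sketch: record each gap as it is measured instead of accumulating
    # a per-chunk list, so memory stays O(1) for arbitrarily long streams.
    def __init__(self, stream, on_chunk_gap: Optional[Callable[[float], None]] = None):
        self._stream = stream
        self._on_chunk_gap = on_chunk_gap   # e.g. histogram.record
        self._first_chunk_seen = False

    def __aiter__(self):
        return self

    async def __anext__(self):
        read_start = default_timer()
        chunk = await self._stream.__anext__()   # raises StopAsyncIteration at end
        if self._first_chunk_seen and self._on_chunk_gap is not None:
            self._on_chunk_gap(default_timer() - read_start)
        self._first_chunk_seen = True
        return chunk
```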

@Nik-Reddy
Author

Nik-Reddy commented May 14, 2026

@lmolkova This implements the three items you mentioned in #8:

  • gen_ai.response.time_to_first_chunk span attribute
  • gen_ai.client.operation.time_to_first_chunk histogram
  • gen_ai.client.operation.time_per_output_chunk histogram (one record per inter-chunk gap)

For chunk gaps I'm measuring blocking read time rather than wall clock between returns, so user-side processing doesn't inflate it. Let me know if that's not what you had in mind.

Anthropic/Responses API don't use the ABC yet so they'll need follow-up work in a separate PR.

The ABC now accepts a timing_target and syncs values before
calling _on_stream_end/_on_stream_error, so providers just
pass it in the constructor without handling timing themselves.
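
In sketch form, the contract described here might look like the following (field and hook names are illustrative, not the commit's exact code):

```python
from dataclasses import dataclass, field


@dataclass
class TimingTarget:
    # Illustrative stand-in for the invocation object that receives timings.
    time_to_first_chunk_s: float | None = None
    chunk_gap_seconds: list[float] = field(default_factory=list)


class StreamWrapperBase:
    def __init__(self, timing_target: TimingTarget | None = None):
        self._timing_target = timing_target
        self._time_to_first_chunk_s: float | None = None
        self._chunk_gaps: list[float] = []

    def _sync_timings(self) -> None:
        # Copy measured values onto the target before any provider hook runs.
        if self._timing_target is not None:
            self._timing_target.time_to_first_chunk_s = self._time_to_first_chunk_s
            self._timing_target.chunk_gap_seconds = list(self._chunk_gaps)

    def _finalize(self) -> None:
        self._sync_timings()    # always runs first
        self._on_stream_end()   # provider hook sees finished timings

    def _on_stream_end(self) -> None:
        pass  # overridden by individual instrumentations
```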
Member

@MikeGoldsmith MikeGoldsmith left a comment


Looks mostly okay; I've left a few comments we should resolve before accepting.

Comment thread util/opentelemetry-util-genai/src/opentelemetry/util/genai/instruments.py Outdated
@MikeGoldsmith MikeGoldsmith self-assigned this May 14, 2026
- fix time_per_output_chunk description to match semconv wording
- expose monotonic_start_s as public read-only property on GenAIInvocation
- update chat wrappers and metrics recorder to use public property
- clear chunk_gap_seconds after recording to avoid holding data past finalization
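
A sketch of the property and clear-after-record changes from this commit (`histogram.record` is the standard OTel metrics API; the recorder function is hypothetical):

```python
from timeit import default_timer


class GenAIInvocation:
    def __init__(self):
        self._monotonic_start_s = default_timer()
        self.chunk_gap_seconds: list[float] = []

    @property
    def monotonic_start_s(self) -> float:
        # Read-only: exposes the start time without letting callers rewrite it.
        return self._monotonic_start_s


def record_chunk_gaps(histogram, invocation: GenAIInvocation) -> None:
    # Record each gap, then drop the buffered data so the invocation
    # doesn't hold it past finalization.
    for gap in invocation.chunk_gap_seconds:
        histogram.record(gap)
    invocation.chunk_gap_seconds.clear()
```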


Development

Successfully merging this pull request may close these issues.

Implement gen_ai.client.operation.time_to_first_chunk for OpenAI v2 streaming
