Add streaming timing metrics to generic stream wrappers #13
Conversation
Implement TTFC histogram, time_per_output_chunk histogram, and a span attribute for time to first chunk. All timing logic lives in the shared stream ABC so individual instrumentations only pass through start time and copy results. Closes open-telemetry#8
Pull request overview
Adds shared streaming timing support in the GenAI utility layer and wires OpenAI v2 chat stream wrappers to propagate those timings into inference telemetry.
Changes:
- Adds TTFC and per-output-chunk timing capture to sync/async stream wrapper base classes.
- Records new streaming timing histograms and the TTFC span attribute from `InferenceInvocation`.
- Updates OpenAI v2 chat stream wrappers, changelogs, and utility tests for the new timing behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/stream.py | Measures TTFC and per-chunk read durations in shared sync/async stream wrappers. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/metrics.py | Records streaming timing histograms from invocation timing fields. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/instruments.py | Defines histogram creation helpers for streaming timing metrics. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/_inference_invocation.py | Adds timing fields and emits TTFC as an inference span attribute. |
| instrumentation/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/chat_wrappers.py | Passes invocation start time into stream wrappers and copies measured timings back. |
| util/opentelemetry-util-genai/tests/test_stream.py | Adds sync/async stream wrapper timing tests. |
| util/opentelemetry-util-genai/tests/test_handler_metrics.py | Adds metric recorder tests for streaming timing histograms. |
| util/opentelemetry-util-genai/CHANGELOG.md | Documents utility streaming timing support. |
| instrumentation/opentelemetry-instrumentation-openai-v2/CHANGELOG.md | Documents OpenAI v2 chat streaming timing metrics. |
Comments suppressed due to low confidence (1)
util/opentelemetry-util-genai/src/opentelemetry/util/genai/stream.py:217
The async wrapper has the same unbounded accumulation issue: `_self_chunk_gaps` stores one entry per output chunk until the stream finalizes. For large or long-lived streams this can grow without bound; prefer recording each gap immediately or passing timings through a bounded recorder.

    self._self_chunk_gaps: list[float] = []
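A minimal sketch of the suggested fix: report each inter-chunk gap as soon as it is measured rather than accumulating it in a per-stream list. The names here (`BoundedTimingStream`, `record_gap`) are illustrative, not the actual API in this PR.

```python
from timeit import default_timer
from typing import Callable, Iterator


class BoundedTimingStream:
    """Wraps an iterator and reports each blocking-read duration immediately."""

    def __init__(self, stream: Iterator, record_gap: Callable[[float], None]):
        self._stream = stream
        # e.g. a bound histogram's .record method, supplied by the caller
        self._record_gap = record_gap

    def __iter__(self):
        return self

    def __next__(self):
        start = default_timer()
        chunk = next(self._stream)  # blocking read only; user code excluded
        self._record_gap(default_timer() - start)  # O(1) memory per stream
        return chunk
```

With this shape, memory use stays constant regardless of how many chunks the stream yields.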
@lmolkova This implements the three items you mentioned in #8:
For chunk gaps I'm measuring blocking read time rather than wall clock between returns, so user-side processing doesn't inflate it. Let me know if that's not what you had in mind. Anthropic/Responses API don't use the ABC yet, so they'll need follow-up work in a separate PR.
The ABC now accepts a timing_target and syncs values before calling _on_stream_end/_on_stream_error, so providers just pass it in the constructor without handling timing themselves.
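A rough sketch of that handoff, assuming simplified names (`TimingTarget`, `SyncStreamWrapperSketch`, and the field names are placeholders, not the exact classes in this PR): the wrapper measures TTFC from the invocation start and per-chunk blocking-read gaps, writing both onto the caller-supplied target before the end hook runs.

```python
from dataclasses import dataclass, field
from timeit import default_timer
from typing import Iterator, List, Optional


@dataclass
class TimingTarget:
    """Placeholder for the invocation object that receives measured timings."""
    monotonic_start_s: float = field(default_factory=default_timer)
    time_to_first_chunk_s: Optional[float] = None
    chunk_gap_seconds: List[float] = field(default_factory=list)


class SyncStreamWrapperSketch:
    def __init__(self, stream: Iterator, timing_target: TimingTarget):
        self._stream = stream
        self._timing = timing_target
        self._first_chunk_seen = False

    def __iter__(self):
        return self

    def __next__(self):
        start = default_timer()
        try:
            chunk = next(self._stream)
        except StopIteration:
            self._on_stream_end()  # timings already synced to the target
            raise
        now = default_timer()
        if not self._first_chunk_seen:
            self._first_chunk_seen = True
            # TTFC is relative to invocation start, not to this read
            self._timing.time_to_first_chunk_s = now - self._timing.monotonic_start_s
        else:
            # only the blocking read is counted, not user-side processing
            self._timing.chunk_gap_seconds.append(now - start)
        return chunk

    def _on_stream_end(self):
        pass  # providers override; timing_target is already up to date
```

Because the target is populated before `_on_stream_end` fires, provider hooks can read final timings without any timing logic of their own.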
MikeGoldsmith
left a comment
Looks mostly okay, I've left a few comments we should resolve before accepting.
- fix time_per_output_chunk description to match semconv wording
- expose monotonic_start_s as public read-only property on GenAIInvocation
- update chat wrappers and metrics recorder to use public property
- clear chunk_gap_seconds after recording to avoid holding data past finalization
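For the second item, something along these lines would work (a sketch only; the class name `GenAIInvocation` and the private attribute are assumptions based on the review comment):

```python
from timeit import default_timer


class GenAIInvocation:
    def __init__(self):
        # captured once at construction; illustrative private attribute name
        self._monotonic_start_s = default_timer()

    @property
    def monotonic_start_s(self) -> float:
        """Monotonic timestamp of invocation start; read-only (no setter)."""
        return self._monotonic_start_s
```

Consumers then read `invocation.monotonic_start_s` without touching the private field, and attempts to assign it raise `AttributeError`.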
Closes #8
What this does
Adds three streaming timing measurements to the shared utils layer:
- `gen_ai.client.operation.time_to_first_chunk` histogram
- `gen_ai.response.time_to_first_chunk` span attribute
- `gen_ai.client.operation.time_per_output_chunk` histogram (one data point per inter-chunk gap)

All timing logic lives in the stream wrapper base classes (`SyncStreamWrapper` / `AsyncStreamWrapper` in `util/opentelemetry-util-genai`). The OpenAI chat wrappers just pass through the invocation start time and copy measured values back, keeping provider-specific code minimal.

How timing works
- `timeit.default_timer()` around the blocking `next()` / `anext()` call

Scope
This wires up timing for OpenAI chat completions streams (which use the shared ABC). Anthropic and the Responses API have their own stream wrappers that don't extend the ABC, so they will need separate follow-up work.
Testing