Problem
The OpenTelemetry GenAI Semantic Conventions define a gen_ai.client.operation.time_to_first_chunk histogram metric that records the time (in seconds) from request start to the first output chunk during streaming operations. No Python instrumentation currently implements this metric.
This was originally tracked in opentelemetry-python-contrib #3932, with a previous implementation in opentelemetry-python-contrib#4415. That PR was closed when the GenAI instrumentations moved to this repository.
Context
In the old repository, #4415 was designed to build on top of opentelemetry-python-contrib #4500 (streaming ABC refactor by @eternalcuriouslearner), which introduced the generic stream wrapper in opentelemetry-util-genai. That refactor is already present here at util/opentelemetry-util-genai/src/opentelemetry/util/genai/stream.py, so the time-to-first-chunk metric can now be built directly on top of it.
cc @lmolkova @eternalcuriouslearner
Proposed solution
- Add a gen_ai.client.operation.time_to_first_chunk histogram to the OpenAI v2 instrumentation and the generic streaming ABC.
- Record time_of_first_chunk - start_time in seconds when the first streaming chunk arrives.
- Attach standard GenAI metric attributes: gen_ai.operation.name, gen_ai.request.model, gen_ai.response.model, gen_ai.system, server.address, server.port.
- Use the semconv-defined explicit bucket boundaries.
- Record only for streaming calls; do not emit when no chunk is ever received.
- Cover both sync and async streaming paths.
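The recording logic proposed above could look roughly like the following. This is a minimal, stdlib-only sketch and not the actual API of the stream wrapper in opentelemetry-util-genai: timed_stream, RecordFn, and the clock parameter are hypothetical names, and record stands in for a histogram's record(amount, attributes) call.

```python
import time
from typing import Callable, Iterable, Iterator, Mapping, TypeVar

T = TypeVar("T")

# Hypothetical stand-in for a histogram's record(amount, attributes) method.
RecordFn = Callable[[float, Mapping[str, object]], None]


def timed_stream(
    chunks: Iterable[T],
    record: RecordFn,
    attributes: Mapping[str, object],
    clock: Callable[[], float] = time.perf_counter,
) -> Iterator[T]:
    """Wrap a chunk stream and record time to first chunk at most once."""
    # Capture the start time eagerly, when the request is issued,
    # not lazily when the caller first iterates.
    start = clock()

    def _iterate() -> Iterator[T]:
        first_seen = False
        for chunk in chunks:
            if not first_seen:
                first_seen = True
                # Elapsed seconds from request start to first chunk.
                record(clock() - start, attributes)
            yield chunk

    return _iterate()
```

If the underlying stream raises or ends before yielding anything, record is never called, which matches the "do not emit when no chunk is ever received" requirement. Injecting clock keeps the timing deterministic in tests.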
Acceptance criteria
- Metric recorded only on streaming completions (not non-streaming)
- No data point emitted if the stream errors before the first chunk
- Sync and async streaming paths both instrumented
- Test coverage ported from opentelemetry-python-contrib #4415
References
I have a previous implementation from opentelemetry-python-contrib#4415 ready to port.