feat(openai): Add gen_ai.client.operation.time_to_first_chunk metric for streaming #4415
Nik-Reddy wants to merge 1 commit into open-telemetry:main
Conversation
Force-pushed from bf9fbee to 40ac7e3.
Hi @lmolkova, I have addressed your feedback: (1) Renamed gen_ai.server.time_to_first_token to gen_ai.client.time_to_first_token throughout. (2) Moved the TTFT constant, bucket boundaries, and get_metric_data_points() helper into util/opentelemetry-util-genai instruments.py so they are shared across instrumentation libraries. @xrmx's earlier feedback was also incorporated (the helper returns all matches, tests assert the count). Would appreciate a re-review when convenient. Thanks!
Force-pushed from 9a5cba5 to 69ab567.
Rebased on latest main and addressed @lmolkova's feedback — refactored `instruments.py` to use the shared `create_duration_histogram`, `create_token_histogram`, and `create_ttft_histogram` helpers from `opentelemetry.util.genai.instruments` instead of defining bucket boundaries inline. This removes ~50 lines of duplicated configuration and aligns with the pattern of keeping common metric definitions in genai-utils. Ready for re-review when you get a chance. Happy to address any further feedback.
Updated the metric name to align with the semantic conventions registry: `gen_ai.client.operation.time_to_first_chunk`. This matches the semconv-defined client metric. Bucket boundaries are now the semconv-specified values, and all helper functions and constants are in `opentelemetry.util.genai.instruments`.
…for streaming

Implement the gen_ai.client.operation.time_to_first_chunk histogram metric as defined in OpenTelemetry Semantic Conventions v1.38.0. This metric records the time (in seconds) from request start to the first output chunk received during streaming chat completions.

Changes:
- Add time_to_first_token_s field to LLMInvocation dataclass
- Add create_ttfc_histogram() factory with semconv-specified bucket boundaries
- InvocationMetricsRecorder now creates and records TTFC histogram
- First-token detection in stream wrappers for both new and legacy paths
- 4 test cases: sync/async streaming, non-streaming exclusion, tool-call streaming

Fixes open-telemetry#3932
Force-pushed from 376170b to acbf5c4.
```python
common_attributes[ServerAttributes.SERVER_PORT] = (
    self._request_attributes[ServerAttributes.SERVER_PORT]
)
self._instruments.ttfc_histogram.record(
```
This should not happen in the OpenAI instrumentation; this logic is not OpenAI-specific and should live in utils.
Suggested change:

```diff
-        self.time_to_first_token_s: float | None = None
-        """Time to first token in seconds (streaming responses only)."""
+        self.time_to_first_chunk_s: float | None = None
+        """Time to first chunk in seconds (streaming responses only)."""
```
```python
seed: int | None = None
server_address: str | None = None
server_port: int | None = None
time_to_first_token_s: float | None = None
```
@eternalcuriouslearner if I remember correctly you were exploring having common streaming helpers - I imagine if we had them in utils, we wouldn't need instrumentation libs to provide this and would populate it through that common code.
WDYT?
@lmolkova I have a dumb question: I am assuming this attribute is going to be available once we move to OpenTelemetry Semantic Conventions v1.38.0. Do we really need this PR?
I don't know where 1.38.0 came from; the time-to-first-chunk metric was added in the upcoming 1.41.0 (not released yet), and there is more coming in open-telemetry/semantic-conventions#3607. I agree with you that stream helpers would be a better design choice. Also, given that open-telemetry/semantic-conventions#3607 is not merged yet, I think it would be best to close this PR.
I see #3607 landed and #4443 is merged too, so once #4274 goes in and @eternalcuriouslearner moves the streaming helpers into utils, I can rebase this to plug the TTFT metric into that shared infrastructure instead of having it in the openai instrumentation directly.
I'll keep this open for now and rework it once the streaming helpers are in place. Please let me know what you think, @lmolkova.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds the OpenTelemetry semantic-convention metric gen_ai.client.operation.time_to_first_chunk (TTFC) for OpenAI v2 streaming chat completions, capturing latency from request start to first streamed output.
Changes:
- Introduces a TTFC histogram instrument (name + explicit bucket boundaries) and wires it into metric recording.
- Detects the “first streamed output” moment in stream wrappers and records TTFC for successful streaming invocations.
- Adds unit tests covering sync/async streaming, non-streaming exclusion, and tool-call streaming.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/types.py | Minor formatting change near type definitions. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/metrics.py | Adds TTFC histogram creation and recording logic. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/instruments.py | Adds TTFC metric constants/buckets and histogram factory; adds metric-reader helper. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/_inference_invocation.py | Stores time-to-first-* value on invocation and exposes monotonic start time. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/instruments.py | Reuses shared histogram factories and adds TTFC histogram to Instruments. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/patch.py | Adds first-chunk detection + TTFC recording in legacy/new stream wrappers. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_ttft_metrics.py | Adds tests validating TTFC emission behavior and buckets. |
Comments suppressed due to low confidence (1)

util/opentelemetry-util-genai/src/opentelemetry/util/genai/types.py:1

- There's an extra blank line added before `@dataclass` that doesn't appear to serve a purpose. Consider removing it to keep formatting consistent.

```python
# Copyright The OpenTelemetry Authors
```
```python
if (
    choice.delta.content is not None
    or choice.delta.tool_calls is not None
):
    self._first_token_received = True
    self._first_token_time = default_timer()
    return
```
TTFC is specified as time to the first output chunk received, but this detection waits until a chunk contains `delta.content` or `delta.tool_calls`. For OpenAI streaming, the first chunk can be role-only (e.g., `delta.role='assistant'`), so this will systematically over-measure TTFC. Record the timestamp on the first received streaming chunk (e.g., first chunk with any choices entry / any delta present), not only when content/tool_calls appear.
Suggested change:

```diff
-            if (
-                choice.delta.content is not None
-                or choice.delta.tool_calls is not None
-            ):
-                self._first_token_received = True
-                self._first_token_time = default_timer()
-                return
+            self._first_token_received = True
+            self._first_token_time = default_timer()
+            return
```
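The over-measurement the reviewer describes can be demonstrated with a minimal simulated chunk sequence (the chunk objects below are stand-in fakes, not the OpenAI SDK types):

```python
# Illustration: a role-only first chunk is missed by content-based detection.
# OpenAI's first streamed delta is often just {"role": "assistant"}, with
# content arriving in a later chunk.
from types import SimpleNamespace

chunks = [
    SimpleNamespace(delta=SimpleNamespace(role="assistant", content=None, tool_calls=None)),
    SimpleNamespace(delta=SimpleNamespace(role=None, content="Hello", tool_calls=None)),
]

# First chunk carrying any delta at all (what TTFC should measure):
first_any_delta = next(i for i, c in enumerate(chunks) if c.delta is not None)

# First chunk carrying content or tool_calls (what the current code measures):
first_content = next(
    i
    for i, c in enumerate(chunks)
    if c.delta.content is not None or c.delta.tool_calls is not None
)

print(first_any_delta, first_content)  # 0 1 — the current check fires one chunk late
```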
```python
ttfc_s = getattr(invocation, 'time_to_first_token_s', None)
if ttfc_s is not None and error_attributes.ERROR_TYPE not in attributes:
    self._ttfc_histogram.record(
        ttfc_s,
        attributes=attributes,
        context=invocation._span_context,
    )
```
This records the time_to_first_chunk metric but reads from an attribute named `time_to_first_token_s`. That mismatch makes the data model confusing (token vs chunk) and increases the chance of future misuse. Consider renaming the stored field to `time_to_first_chunk_s` (or similar) and using that consistently throughout the invocation/stream wrappers and metrics recorder.
```python
def _record_ttft(self):
    if (
        self._instruments is None
        or self._start_time is None
        or self._first_token_time is None
    ):
        return
    ttft = max(self._first_token_time - self._start_time, 0.0)
```
Method/variable naming is inconsistent with the new metric: `_record_ttft` and `ttft` are used while recording `ttfc_histogram` (time to first chunk). Renaming to `_record_ttfc` and `ttfc` will reduce confusion and align with the semantic convention and metric name.
```python
self._instruments.ttfc_histogram.record(
    ttft,
    attributes=common_attributes,
)
```
Method/variable naming is inconsistent with the new metric: `_record_ttft` and `ttft` are used while recording `ttfc_histogram` (time to first chunk). Renaming to `_record_ttfc` and `ttfc` will reduce confusion and align with the semantic convention and metric name.
```python
def get_metric_data_points(metric_reader, metric_name):
    """Extract all data points for a given metric name from a metric reader."""
    results = []
    metrics = metric_reader.get_metrics_data().resource_metrics
    if not metrics:
        return results
    for scope_metrics in metrics[0].scope_metrics:
        for m in scope_metrics.metrics:
            if m.name == metric_name:
                results.extend(m.data.data_points)
    return results
```
This helper looks test-focused and now lives in a production utility module, increasing surface area and coupling the library to test-only introspection patterns. Consider moving get_metric_data_points into the test utilities (e.g., tests/test_utils.py) or a dedicated test helper module, and keep util.genai.instruments focused on instrumentation primitives.
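The helper's traversal logic can be checked without a full SDK pipeline by using lightweight fakes that mirror the MetricsData shape (`resource_metrics → scope_metrics → metrics → data.data_points`). The `_fake_reader` below is a test-only stand-in, not an SDK type:

```python
# Standalone check of the helper using fakes shaped like the SDK's MetricsData.
from types import SimpleNamespace


def get_metric_data_points(metric_reader, metric_name):
    """Extract all data points for a given metric name from a metric reader."""
    results = []
    metrics = metric_reader.get_metrics_data().resource_metrics
    if not metrics:
        return results
    for scope_metrics in metrics[0].scope_metrics:
        for m in scope_metrics.metrics:
            if m.name == metric_name:
                results.extend(m.data.data_points)
    return results


def _fake_reader(points):
    # Build the nested resource_metrics -> scope_metrics -> metrics structure.
    metric = SimpleNamespace(
        name="gen_ai.client.operation.time_to_first_chunk",
        data=SimpleNamespace(data_points=points),
    )
    scope = SimpleNamespace(metrics=[metric])
    resource = SimpleNamespace(scope_metrics=[scope])
    data = SimpleNamespace(resource_metrics=[resource])
    return SimpleNamespace(get_metrics_data=lambda: data)


reader = _fake_reader(points=[0.12, 0.34])
pts = get_metric_data_points(reader, "gen_ai.client.operation.time_to_first_chunk")
print(pts)  # [0.12, 0.34]
```

This kind of fixture is exactly what would move alongside the helper if it is relocated into test utilities, as the review suggests.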
```python
from opentelemetry.util.genai.instruments import (
    GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK,
    _GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK_BUCKETS,
    get_metric_data_points,
)
```
The test imports a module-private constant (_GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK_BUCKETS). Tests depending on private names are brittle. Prefer either (a) asserting a locally-defined expected bucket list in the test, or (b) exposing a public accessor/constant if bucket verification is intended to be part of the supported API.
```python
@property
def monotonic_start_s(self) -> float | None:
    """Monotonic start time, delegated from the underlying InferenceInvocation."""
    if self._inference_invocation is not None:
        return self._inference_invocation._monotonic_start_s
    return None
```
This property reaches into a private attribute (_monotonic_start_s) of InferenceInvocation. To avoid tight coupling to internal state, consider adding a public property/method on InferenceInvocation (e.g., monotonic_start_s) and delegating to that instead.
Description

Implement the `gen_ai.client.operation.time_to_first_chunk` histogram metric as defined in OpenTelemetry Semantic Conventions v1.38.0. This metric records the time (in seconds) from request start to the first output chunk received during streaming chat completions. This was requested in #3932 — the semantic convention defines the metric, but no Python instrumentation existed for it.

Note: Issue #3932 references `gen_ai.server.time_to_first_token` (server-side). This PR implements the client-side equivalent `gen_ai.client.operation.time_to_first_chunk` per the semconv registry, which measures time from when the client issues the request to when the first response chunk arrives in the stream.

Fixes #3932
Changes

- `util/genai/types.py`: Added `time_to_first_token_s` field to the `LLMInvocation` dataclass
- `util/genai/instruments.py`: Added the `GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK` constant and a `create_ttfc_histogram()` factory with semconv-specified bucket boundaries
- `util/genai/metrics.py`: `InvocationMetricsRecorder` now creates and records the TTFC histogram (only for successful streaming responses)
- `openai_v2/instruments.py`: Added `ttfc_histogram` to the `Instruments` class via a shared helper
- `openai_v2/patch.py`: First-token detection in stream wrappers, wired into both new and legacy paths
- `tests/test_ttft_metrics.py`: 4 test cases covering sync/async streaming, non-streaming exclusion, and tool-call streaming

Type of change
How Has This Been Tested?
All new TTFC tests pass. All existing tests continue to pass.
Does This PR Require a Core Repo Change?
Checklist: