feat(openai): Add gen_ai.client.operation.time_to_first_chunk metric for streaming #4415
Nik-Reddy wants to merge 1 commit into open-telemetry:main
Conversation
Force-pushed from bf9fbee to 40ac7e3.
Hi @lmolkova, I have addressed your feedback: (1) Renamed gen_ai.server.time_to_first_token to gen_ai.client.time_to_first_token throughout. (2) Moved the TTFT constant, bucket boundaries, and get_metric_data_points() helper into util/opentelemetry-util-genai instruments.py so they are shared across instrumentation libraries. @xrmx's earlier feedback was also incorporated (the helper returns all matches, tests assert the count). Would appreciate a re-review when convenient. Thanks!
Force-pushed from 9a5cba5 to 69ab567.
Rebased on latest main and addressed @lmolkova's feedback — refactored `instruments.py` to use the shared `create_duration_histogram`, `create_token_histogram`, and `create_ttft_histogram` helpers from `opentelemetry.util.genai.instruments` instead of defining bucket boundaries inline. This removes ~50 lines of duplicated configuration and aligns with the pattern of keeping common metric definitions in genai-utils. Ready for re-review when you get a chance. Happy to address any further feedback.
Updated the metric name to align with the semantic conventions registry: `gen_ai.client.operation.time_to_first_chunk`. This matches the semconv-defined client metric. Bucket boundaries are now the semconv-specified values, and all helper functions and constants are in `opentelemetry.util.genai.instruments`.
…for streaming

Implement the gen_ai.client.operation.time_to_first_chunk histogram metric as defined in OpenTelemetry Semantic Conventions v1.38.0. This metric records the time (in seconds) from request start to the first output chunk received during streaming chat completions.

Changes:
- Add time_to_first_token_s field to LLMInvocation dataclass
- Add create_ttfc_histogram() factory with semconv-specified bucket boundaries
- InvocationMetricsRecorder now creates and records TTFC histogram
- First-token detection in stream wrappers for both new and legacy paths
- 4 test cases: sync/async streaming, non-streaming exclusion, tool-call streaming

Fixes open-telemetry#3932
Force-pushed from 376170b to acbf5c4.
```python
common_attributes[ServerAttributes.SERVER_PORT] = (
    self._request_attributes[ServerAttributes.SERVER_PORT]
)
self._instruments.ttfc_histogram.record(
```
This should not happen in the OpenAI instrumentation; this logic is not OpenAI-specific and should live in utils.
Suggested change:

```diff
-        self.time_to_first_token_s: float | None = None
-        """Time to first token in seconds (streaming responses only)."""
+        self.time_to_first_chunk_s: float | None = None
+        """Time to first chunk in seconds (streaming responses only)."""
```
```python
seed: int | None = None
server_address: str | None = None
server_port: int | None = None
time_to_first_token_s: float | None = None
```
@eternalcuriouslearner if I remember correctly you were exploring having common streaming helpers - I imagine if we had them in utils, we wouldn't need instrumentation libs to provide this and would populate it through that common code.
WDYT?
@lmolkova I have a dumb question: I am assuming this attribute is going to be available once we move to OpenTelemetry Semantic Conventions v1.38.0. Do we really need this PR?
I don't know where 1.38.0 came from; the time-to-first-chunk metric was added in the upcoming 1.41.0 (not released yet), and there is more coming in open-telemetry/semantic-conventions#3607. I agree with you that stream helpers would be a better design choice. Also, given that open-telemetry/semantic-conventions#3607 is not merged yet, I think it would be best to close this PR.
I see #3607 landed and #4443 is merged too, so once #4274 goes in and @eternalcuriouslearner moves the streaming helpers into utils, I can rebase this to plug the TTFT metric into that shared infrastructure instead of having it in the openai instrumentation directly.
I'll keep this open for now and rework it once the streaming helpers are in place. Please let me know what you think, @lmolkova.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds the OpenTelemetry semantic-convention metric gen_ai.client.operation.time_to_first_chunk (TTFC) for OpenAI v2 streaming chat completions, capturing latency from request start to first streamed output.
Changes:
- Introduces a TTFC histogram instrument (name + explicit bucket boundaries) and wires it into metric recording.
- Detects the “first streamed output” moment in stream wrappers and records TTFC for successful streaming invocations.
- Adds unit tests covering sync/async streaming, non-streaming exclusion, and tool-call streaming.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/types.py | Minor formatting change near type definitions. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/metrics.py | Adds TTFC histogram creation and recording logic. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/instruments.py | Adds TTFC metric constants/buckets and histogram factory; adds metric-reader helper. |
| util/opentelemetry-util-genai/src/opentelemetry/util/genai/_inference_invocation.py | Stores time-to-first-* value on invocation and exposes monotonic start time. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/instruments.py | Reuses shared histogram factories and adds TTFC histogram to Instruments. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/src/opentelemetry/instrumentation/openai_v2/patch.py | Adds first-chunk detection + TTFC recording in legacy/new stream wrappers. |
| instrumentation-genai/opentelemetry-instrumentation-openai-v2/tests/test_ttft_metrics.py | Adds tests validating TTFC emission behavior and buckets. |
Comments suppressed due to low confidence (1)

util/opentelemetry-util-genai/src/opentelemetry/util/genai/types.py:1

- There's an extra blank line added before `@dataclass` that doesn't appear to serve a purpose. Consider removing it to keep formatting consistent.

```python
# Copyright The OpenTelemetry Authors
```
```python
if (
    choice.delta.content is not None
    or choice.delta.tool_calls is not None
):
    self._first_token_received = True
    self._first_token_time = default_timer()
    return
```
TTFC is specified as time to the first output chunk received, but this detection waits until a chunk contains `delta.content` or `delta.tool_calls`. For OpenAI streaming, the first chunk can be role-only (e.g., `delta.role='assistant'`), so this will systematically over-measure TTFC. Record the timestamp on the first received streaming chunk (e.g., first chunk with any choices entry / any delta present), not only when content/tool_calls appear.
Suggested change:

```diff
-            if (
-                choice.delta.content is not None
-                or choice.delta.tool_calls is not None
-            ):
-                self._first_token_received = True
-                self._first_token_time = default_timer()
-                return
+            self._first_token_received = True
+            self._first_token_time = default_timer()
+            return
```
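The over-measurement the reviewer describes can be demonstrated with a minimal simulated chunk sequence (the chunk objects below are stand-in fakes, not the OpenAI SDK types):

```python
# Illustration: a role-only first chunk is missed by content-based detection.
# OpenAI's first streamed delta is often just {"role": "assistant"}, with
# content arriving in a later chunk.
from types import SimpleNamespace

chunks = [
    SimpleNamespace(delta=SimpleNamespace(role="assistant", content=None, tool_calls=None)),
    SimpleNamespace(delta=SimpleNamespace(role=None, content="Hello", tool_calls=None)),
]

# First chunk carrying any delta at all (what TTFC should measure):
first_any_delta = next(i for i, c in enumerate(chunks) if c.delta is not None)

# First chunk carrying content or tool_calls (what the current code measures):
first_content = next(
    i
    for i, c in enumerate(chunks)
    if c.delta.content is not None or c.delta.tool_calls is not None
)

print(first_any_delta, first_content)  # 0 1 — the current check fires one chunk late
```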
```python
ttfc_s = getattr(invocation, 'time_to_first_token_s', None)
if ttfc_s is not None and error_attributes.ERROR_TYPE not in attributes:
    self._ttfc_histogram.record(
        ttfc_s,
        attributes=attributes,
        context=invocation._span_context,
    )
```
This records the time_to_first_chunk metric but reads from an attribute named `time_to_first_token_s`. That mismatch makes the data model confusing (token vs chunk) and increases the chance of future misuse. Consider renaming the stored field to `time_to_first_chunk_s` (or similar) and using that consistently throughout the invocation/stream wrappers and metrics recorder.
```python
def _record_ttft(self):
    if (
        self._instruments is None
        or self._start_time is None
        or self._first_token_time is None
    ):
        return
    ttft = max(self._first_token_time - self._start_time, 0.0)
```
Method/variable naming is inconsistent with the new metric: `_record_ttft` and `ttft` are used while recording `ttfc_histogram` (time to first chunk). Renaming to `_record_ttfc` and `ttfc` will reduce confusion and align with the semantic convention and metric name.
```python
self._instruments.ttfc_histogram.record(
    ttft,
    attributes=common_attributes,
)
```
Method/variable naming is inconsistent with the new metric: `_record_ttft` and `ttft` are used while recording `ttfc_histogram` (time to first chunk). Renaming to `_record_ttfc` and `ttfc` will reduce confusion and align with the semantic convention and metric name.
```python
def get_metric_data_points(metric_reader, metric_name):
    """Extract all data points for a given metric name from a metric reader."""
    results = []
    metrics = metric_reader.get_metrics_data().resource_metrics
    if not metrics:
        return results
    for scope_metrics in metrics[0].scope_metrics:
        for m in scope_metrics.metrics:
            if m.name == metric_name:
                results.extend(m.data.data_points)
    return results
```
This helper looks test-focused and now lives in a production utility module, increasing surface area and coupling the library to test-only introspection patterns. Consider moving get_metric_data_points into the test utilities (e.g., tests/test_utils.py) or a dedicated test helper module, and keep util.genai.instruments focused on instrumentation primitives.
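The helper's traversal logic can be checked without a full SDK pipeline by using lightweight fakes that mirror the MetricsData shape (`resource_metrics → scope_metrics → metrics → data.data_points`). The `_fake_reader` below is a test-only stand-in, not an SDK type:

```python
# Standalone check of the helper using fakes shaped like the SDK's MetricsData.
from types import SimpleNamespace


def get_metric_data_points(metric_reader, metric_name):
    """Extract all data points for a given metric name from a metric reader."""
    results = []
    metrics = metric_reader.get_metrics_data().resource_metrics
    if not metrics:
        return results
    for scope_metrics in metrics[0].scope_metrics:
        for m in scope_metrics.metrics:
            if m.name == metric_name:
                results.extend(m.data.data_points)
    return results


def _fake_reader(points):
    # Build the nested resource_metrics -> scope_metrics -> metrics structure.
    metric = SimpleNamespace(
        name="gen_ai.client.operation.time_to_first_chunk",
        data=SimpleNamespace(data_points=points),
    )
    scope = SimpleNamespace(metrics=[metric])
    resource = SimpleNamespace(scope_metrics=[scope])
    data = SimpleNamespace(resource_metrics=[resource])
    return SimpleNamespace(get_metrics_data=lambda: data)


reader = _fake_reader(points=[0.12, 0.34])
pts = get_metric_data_points(reader, "gen_ai.client.operation.time_to_first_chunk")
print(pts)  # [0.12, 0.34]
```

This kind of fixture is exactly what would move alongside the helper if it is relocated into test utilities, as the review suggests.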
```python
from opentelemetry.util.genai.instruments import (
    GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK,
    _GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK_BUCKETS,
    get_metric_data_points,
)
```
The test imports a module-private constant (_GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK_BUCKETS). Tests depending on private names are brittle. Prefer either (a) asserting a locally-defined expected bucket list in the test, or (b) exposing a public accessor/constant if bucket verification is intended to be part of the supported API.
```python
@property
def monotonic_start_s(self) -> float | None:
    """Monotonic start time, delegated from the underlying InferenceInvocation."""
    if self._inference_invocation is not None:
        return self._inference_invocation._monotonic_start_s
    return None
```
This property reaches into a private attribute (_monotonic_start_s) of InferenceInvocation. To avoid tight coupling to internal state, consider adding a public property/method on InferenceInvocation (e.g., monotonic_start_s) and delegating to that instead.
Description

Implement the `gen_ai.client.operation.time_to_first_chunk` histogram metric as defined in OpenTelemetry Semantic Conventions v1.38.0. This metric records the time (in seconds) from request start to the first output chunk received during streaming chat completions. This was requested in #3932 — the semantic convention defines the metric, but no Python instrumentation existed for it.

Note: Issue #3932 references `gen_ai.server.time_to_first_token` (server-side). This PR implements the client-side equivalent `gen_ai.client.operation.time_to_first_chunk` per the semconv registry, which measures time from when the client issues the request to when the first response chunk arrives in the stream.

Fixes #3932
Changes

- `util/genai/types.py`: Added `time_to_first_token_s` field to the `LLMInvocation` dataclass
- `util/genai/instruments.py`: Added the `GEN_AI_CLIENT_OPERATION_TIME_TO_FIRST_CHUNK` constant and a `create_ttfc_histogram()` factory with semconv-specified bucket boundaries
- `util/genai/metrics.py`: `InvocationMetricsRecorder` now creates and records the TTFC histogram (only for successful streaming responses)
- `openai_v2/instruments.py`: Added `ttfc_histogram` to the `Instruments` class via a shared helper
- `openai_v2/patch.py`: First-token detection in stream wrappers, wired into both new and legacy paths
- `tests/test_ttft_metrics.py`: 4 test cases covering sync/async streaming, non-streaming exclusion, and tool-call streaming

Type of change
How Has This Been Tested?
All new TTFC tests pass. All existing tests continue to pass.
Does This PR Require a Core Repo Change?
Checklist: