
Commit 70f9c1b

fix(semconv): attach spec-mandated explicit bucket boundaries to GenAI histogram helpers
The four GenAI histogram helpers in opentelemetry-semantic-conventions called meter.create_histogram without passing explicit_bucket_boundaries_advisory. The SDK therefore fell back to _DEFAULT_EXPLICIT_BUCKET_HISTOGRAM_AGGREGATION_BOUNDARIES, which is tuned for request-duration metrics in the seconds range and produces unusable histograms for latency-per-token and TTFT metrics. This is the exact problem flagged in the semconv spec, which says these metrics SHOULD be specified with ExplicitBucketBoundaries.

Pass the semconv-prescribed boundaries in all four helpers:

* gen_ai.client.operation.duration, gen_ai.server.request.duration, and gen_ai.server.time_to_first_token share the latency boundary set [0.01 .. 81.92] seconds.
* gen_ai.server.time_per_output_token uses the per-token boundary set [0.01 .. 2.5] seconds.

Add tests asserting that each factory passes the correct explicit_bucket_boundaries_advisory to Meter.create_histogram.

Fixes #4946

Signed-off-by: Ali <alliasgher123@gmail.com>
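The bucket mismatch described above can be sketched numerically. The default boundary list below is an assumption about what the SDK's _DEFAULT_EXPLICIT_BUCKET_HISTOGRAM_AGGREGATION_BOUNDARIES contained at the time of writing, and the sample latencies are made up; the point is only that seconds-range defaults collapse per-token timings into one bucket:

```python
import bisect

# Assumed SDK default boundaries (seconds-range, request-duration oriented).
DEFAULT_BOUNDARIES = [0.0, 5.0, 10.0, 25.0, 50.0, 75.0, 100.0, 250.0,
                      500.0, 750.0, 1000.0, 2500.0, 5000.0, 7500.0, 10000.0]
# Semconv per-token boundaries from the commit.
PER_TOKEN_BOUNDARIES = [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3,
                        0.4, 0.5, 0.75, 1.0, 2.5]

def bucket_index(boundaries, value):
    """Index of the explicit bucket a measurement falls into.

    Matches upper-inclusive bucket semantics: bucket i covers
    (boundaries[i-1], boundaries[i]].
    """
    return bisect.bisect_left(boundaries, value)

# Invented but realistic time-per-output-token samples, in seconds.
samples = [0.02, 0.04, 0.09, 0.3, 1.2]

# With the defaults, every sample lands in bucket 1, i.e. (0, 5]:
# the histogram has no resolution at all.
default_buckets = {bucket_index(DEFAULT_BOUNDARIES, s) for s in samples}
print(default_buckets)  # {1}

# With the semconv boundaries, the same samples spread across 5 buckets.
spec_buckets = {bucket_index(PER_TOKEN_BOUNDARIES, s) for s in samples}
print(len(spec_buckets))  # 5
```

This is why the spec mandates explicit boundaries for these metrics rather than relying on SDK defaults.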
1 parent 7477b10 commit 70f9c1b

2 files changed: 55 additions & 0 deletions


CHANGELOG.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -20,6 +20,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   ([#4907](https://github.com/open-telemetry/opentelemetry-python/issues/4907))
 - Drop Python 3.9 support
   ([#5076](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/5076))
+- `opentelemetry-semantic-conventions`: Attach spec-mandated explicit bucket boundaries to the GenAI histogram helpers (`gen_ai.client.operation.duration`, `gen_ai.server.request.duration`, `gen_ai.server.time_to_first_token`, `gen_ai.server.time_per_output_token`); without them the default SDK buckets produced unusable histograms for latency-per-token metrics
+  ([#4946](https://github.com/open-telemetry/opentelemetry-python/issues/4946))
 
 
 ## Version 1.41.0/0.62b0 (2026-04-09)
```

opentelemetry-semantic-conventions/src/opentelemetry/semconv/_incubating/metrics/gen_ai_metrics.py

Lines changed: 53 additions & 0 deletions
```diff
@@ -25,12 +25,34 @@
 """
 
 
+# https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/#metric-gen_aiclientoperationduration
+_GEN_AI_CLIENT_OPERATION_DURATION_BUCKETS: Final = (
+    0.01,
+    0.02,
+    0.04,
+    0.08,
+    0.16,
+    0.32,
+    0.64,
+    1.28,
+    2.56,
+    5.12,
+    10.24,
+    20.48,
+    40.96,
+    81.92,
+)
+
+
 def create_gen_ai_client_operation_duration(meter: Meter) -> Histogram:
     """GenAI operation duration"""
     return meter.create_histogram(
         name=GEN_AI_CLIENT_OPERATION_DURATION,
         description="GenAI operation duration.",
         unit="s",
+        explicit_bucket_boundaries_advisory=list(
+            _GEN_AI_CLIENT_OPERATION_DURATION_BUCKETS
+        ),
     )
 
 
@@ -61,10 +83,15 @@ def create_gen_ai_client_token_usage(meter: Meter) -> Histogram:
 
 def create_gen_ai_server_request_duration(meter: Meter) -> Histogram:
     """Generative AI server request duration such as time-to-last byte or last output token"""
+    # Shares the latency-style boundaries with client operation duration and
+    # time-to-first-token per the semconv spec.
     return meter.create_histogram(
         name=GEN_AI_SERVER_REQUEST_DURATION,
         description="Generative AI server request duration such as time-to-last byte or last output token.",
         unit="s",
+        explicit_bucket_boundaries_advisory=list(
+            _GEN_AI_CLIENT_OPERATION_DURATION_BUCKETS
+        ),
     )
 
 
@@ -78,12 +105,33 @@ def create_gen_ai_server_request_duration(meter: Meter) -> Histogram:
 """
 
 
+# https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/#metric-gen_aiservertime_per_output_token
+_GEN_AI_SERVER_TIME_PER_OUTPUT_TOKEN_BUCKETS: Final = (
+    0.01,
+    0.025,
+    0.05,
+    0.075,
+    0.1,
+    0.15,
+    0.2,
+    0.3,
+    0.4,
+    0.5,
+    0.75,
+    1.0,
+    2.5,
+)
+
+
 def create_gen_ai_server_time_per_output_token(meter: Meter) -> Histogram:
     """Time per output token generated after the first token for successful responses"""
     return meter.create_histogram(
         name=GEN_AI_SERVER_TIME_PER_OUTPUT_TOKEN,
         description="Time per output token generated after the first token for successful responses.",
         unit="s",
+        explicit_bucket_boundaries_advisory=list(
+            _GEN_AI_SERVER_TIME_PER_OUTPUT_TOKEN_BUCKETS
+        ),
     )
 
 
@@ -97,8 +145,13 @@ def create_gen_ai_server_time_per_output_token(meter: Meter) -> Histogram:
 
 def create_gen_ai_server_time_to_first_token(meter: Meter) -> Histogram:
     """Time to generate first token for successful responses"""
+    # Shares the latency-style boundaries with client operation duration per
+    # the semconv spec.
     return meter.create_histogram(
         name=GEN_AI_SERVER_TIME_TO_FIRST_TOKEN,
         description="Time to generate first token for successful responses.",
         unit="s",
+        explicit_bucket_boundaries_advisory=list(
+            _GEN_AI_CLIENT_OPERATION_DURATION_BUCKETS
+        ),
     )
```
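The commit message mentions tests asserting that each factory forwards the correct advisory boundaries. A minimal sketch of that kind of assertion, using unittest.mock.Mock in place of a real Meter; the inline factory below is a stand-in mirroring the patched create_gen_ai_client_operation_duration rather than an import of the real helper:

```python
from unittest.mock import Mock

# Latency-style boundary set from the diff, shared by three of the helpers.
LATENCY_BUCKETS = (0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28,
                   2.56, 5.12, 10.24, 20.48, 40.96, 81.92)

def create_gen_ai_client_operation_duration(meter):
    """Stand-in with the same shape as the patched helper."""
    return meter.create_histogram(
        name="gen_ai.client.operation.duration",
        description="GenAI operation duration.",
        unit="s",
        explicit_bucket_boundaries_advisory=list(LATENCY_BUCKETS),
    )

# A Mock meter records the call so the test can inspect the kwargs.
meter = Mock()
create_gen_ai_client_operation_duration(meter)
_args, kwargs = meter.create_histogram.call_args
print(kwargs["explicit_bucket_boundaries_advisory"][:3])  # [0.01, 0.02, 0.04]
```

The real tests would repeat this pattern for all four factories, checking the per-token boundary set for gen_ai.server.time_per_output_token.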