
Commit 047091b

RSPEED-3017: use custom buckets for response duration histogram
The response_duration_seconds histogram used the prometheus_client default
buckets, which max out at 10s, causing histogram_quantile in Grafana to
appear capped for requests exceeding 10 seconds. Reuse the existing
LLM_INFERENCE_DURATION_BUCKETS (0.1-120s) to cover the full expected
response time range.

Signed-off-by: Major Hayden <major@redhat.com>
1 parent d15d37d commit 047091b

1 file changed

Lines changed: 4 additions & 1 deletion

src/metrics/__init__.py

@@ -32,7 +32,10 @@
 # Histogram to measure response durations
 # This will be used to track how long it takes to handle requests
 response_duration_seconds = Histogram(
-    "ls_response_duration_seconds", "Response durations", ["path"]
+    "ls_response_duration_seconds",
+    "Response durations",
+    ["path"],
+    buckets=LLM_INFERENCE_DURATION_BUCKETS,
 )
 
 # Metric that indicates what provider + model customers are using so we can
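
The capping effect described in the commit message can be reproduced without a Prometheus server. The sketch below is a simplified, pure-Python approximation of how histogram_quantile interpolates over cumulative buckets; it is not the PromQL implementation. DEFAULT_BUCKETS mirrors the prometheus_client defaults. ASSUMED_LLM_BUCKETS is a hypothetical stand-in for LLM_INFERENCE_DURATION_BUCKETS: only its 0.1-120s range comes from the commit message, and the individual bounds, the helper names, and the sample durations are assumptions made for illustration.

```python
import math
from bisect import bisect_left

# prometheus_client's default Histogram bucket upper bounds, in seconds.
DEFAULT_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5,
                   0.75, 1.0, 2.5, 5.0, 7.5, 10.0, math.inf)

# Hypothetical stand-in for LLM_INFERENCE_DURATION_BUCKETS. Only the
# 0.1-120s range is taken from the commit message; the bounds are assumed.
ASSUMED_LLM_BUCKETS = (0.1, 0.5, 1.0, 2.5, 5.0, 10.0,
                       30.0, 60.0, 120.0, math.inf)


def cumulative_counts(observations, bounds):
    """Count observations <= each upper bound, like Prometheus `le` buckets."""
    return [sum(1 for v in observations if v <= b) for b in bounds]


def estimate_quantile(q, bounds, counts):
    """Simplified histogram_quantile: linear interpolation inside a bucket."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the requested rank.
    idx = bisect_left(counts, rank)
    if math.isinf(bounds[idx]):
        # The quantile lies in the +Inf bucket, so the best available
        # answer is the highest finite bound. This is the "cap".
        return bounds[idx - 1]
    lower = bounds[idx - 1] if idx > 0 else 0.0
    below = counts[idx - 1] if idx > 0 else 0
    in_bucket = counts[idx] - below
    return lower + (bounds[idx] - lower) * (rank - below) / in_bucket


# Made-up request durations in seconds, most of them longer than 10s.
durations = [2.0, 8.0, 15.0, 25.0, 40.0, 45.0, 50.0, 70.0, 90.0, 110.0]

for name, bounds in (("default", DEFAULT_BUCKETS),
                     ("assumed LLM", ASSUMED_LLM_BUCKETS)):
    p95 = estimate_quantile(0.95, bounds, cumulative_counts(durations, bounds))
    print(f"{name} buckets: p95 estimate = {p95}s")
```

With the default buckets, the p95 estimate is pinned at 10.0 because eight of the ten samples land in the +Inf bucket. With the wider assumed buckets, the estimate falls between 60 and 120 seconds, which is where the slowest samples actually are.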
