Commit 047091b
RSPEED-3017: use custom buckets for response duration histogram
The response_duration_seconds histogram used prometheus_client's default buckets, whose largest finite bound is 10s. As a result, histogram_quantile in Grafana appeared capped at 10s for requests that took longer. Reuse the existing LLM_INFERENCE_DURATION_BUCKETS (0.1-120s) to cover the full expected response-time range.
Signed-off-by: Major Hayden <major@redhat.com>
1 parent d15d37d
1 file changed
Lines changed: 4 additions & 1 deletion
Diff hunk (original lines 32-38 → new lines 32-41): original line 35 removed, new lines 35-38 added, surrounding lines unchanged.
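To illustrate why the Grafana panel looked capped, here is a minimal Python sketch that mimics PromQL's `histogram_quantile` on a single series of cumulative buckets. It follows the documented Prometheus rule that a quantile landing in the `+Inf` bucket returns the upper bound of the highest finite bucket. `WIDE_BUCKETS` is a hypothetical 0.1-120s layout standing in for `LLM_INFERENCE_DURATION_BUCKETS`, whose actual values are not shown in this commit, and the sample latencies are invented for illustration.

```python
import math

# prometheus_client's default Histogram buckets (finite upper bounds, seconds).
DEFAULT_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5,
                   0.75, 1.0, 2.5, 5.0, 7.5, 10.0]

# Hypothetical wide layout spanning 0.1-120s; the real
# LLM_INFERENCE_DURATION_BUCKETS values may differ.
WIDE_BUCKETS = [0.1, 0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 60.0, 90.0, 120.0]


def cumulative_counts(observations, bounds):
    """Return cumulative (le, count) pairs, ending with the +Inf bucket."""
    les = list(bounds) + [math.inf]
    return [(le, sum(1 for v in observations if v <= le)) for le in les]


def histogram_quantile(q, buckets):
    """Approximate PromQL histogram_quantile for one series of buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Search only the finite buckets for the first one reaching the rank.
    idx = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank),
               len(buckets) - 1)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, prev = (0.0, 0) if idx == 0 else buckets[idx - 1]
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev) / (count - prev)


latencies = [3.0, 12.0, 25.0, 40.0, 55.0, 70.0, 85.0, 95.0, 105.0, 115.0]
print(histogram_quantile(0.95, cumulative_counts(latencies, DEFAULT_BUCKETS)))  # 10.0
print(histogram_quantile(0.95, cumulative_counts(latencies, WIDE_BUCKETS)))     # 115.0
```

With the default buckets, any quantile beyond the share of requests finishing within 10s reports exactly 10.0. With the wider layout, the estimate follows the data, although its resolution is still limited by bucket width.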