Commit 16b693c

fix(supervisor): resolve dequeue latency quantiles around the 10s long-poll boundary
The server parks empty dequeues on a ~10s blocking pop, so nearly all observations land just above 10s. With only a 10s and a 30s bucket, histogram_quantile interpolated p95/p99 to ~28-30s while the true latency was ~10-11s. Add 11/12.5/15/20s buckets so quantiles read accurately where the distribution actually sits.
1 parent e2e9ee0 commit 16b693c

1 file changed

Lines changed: 6 additions & 3 deletions

packages/core/src/v3/runEngineWorker/supervisor/consumerPoolMetrics.ts
@@ -121,9 +121,12 @@ export class ConsumerPoolMetrics {
 labelNames: ["outcome"],
 // The HTTP client retries internally (up to 5 attempts with 0.5-5s backoff),
 // so one observation can span multiple requests plus sleeps. A retryable
-// failure surfaces as `error` only after >=7.5s of backoff - the 10/30s
-// buckets exist so that mode doesn't collapse into +Inf.
-buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10, 30],
+// failure surfaces as `error` only after >=7.5s of backoff - the 10-30s
+// buckets exist so that mode doesn't collapse into +Inf. The server also
+// long-polls (RUN_ENGINE_DEQUEUE_BLOCKING_TIMEOUT_SECONDS, default 10s),
+// parking empty dequeues at ~10s - the 11/12.5/15/20 buckets give the
+// quantiles resolution just above that boundary, where the mass sits.
+buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10, 11, 12.5, 15, 20, 30],
 registers: [this.register],
 });
 }
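
The interpolation effect described in the commit message can be reproduced with a small model. The sketch below is a simplified stand-in for bucket-based quantile estimation (linear interpolation inside the bucket that contains the target rank), not the actual `histogram_quantile` implementation. The function name `estimateQuantile` and the sample distribution (100 fast dequeues at 0.05s, 900 parked dequeues at 10.3s) are assumptions made up for illustration; only the two bucket lists come from the diff.

```typescript
// Simplified model: estimate a quantile from cumulative histogram buckets by
// interpolating linearly inside the bucket that contains the target rank.
function estimateQuantile(q: number, bounds: number[], observations: number[]): number {
  if (observations.length === 0) return NaN;
  const sorted = [...bounds].sort((a, b) => a - b);
  // cumulative[i] = number of observations <= sorted[i]
  const cumulative = sorted.map((le) => observations.filter((v) => v <= le).length);
  const rank = q * observations.length;
  for (let i = 0; i < sorted.length; i++) {
    if (cumulative[i] >= rank) {
      const lower = i === 0 ? 0 : sorted[i - 1];
      const below = i === 0 ? 0 : cumulative[i - 1];
      const inBucket = cumulative[i] - below;
      if (inBucket === 0) return lower;
      return lower + (sorted[i] - lower) * ((rank - below) / inBucket);
    }
  }
  // Rank falls past the last finite bucket: clamp to the highest finite bound.
  return sorted[sorted.length - 1];
}

// Bucket lists taken from the diff above.
const oldBuckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10, 30];
const newBuckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10, 11, 12.5, 15, 20, 30];

// Hypothetical workload: 10% real dequeues, 90% empty long-polls just above 10s.
const observations: number[] = [
  ...new Array<number>(100).fill(0.05),
  ...new Array<number>(900).fill(10.3),
];

const oldP95 = estimateQuantile(0.95, oldBuckets, observations);
const newP95 = estimateQuantile(0.95, newBuckets, observations);
const oldP99 = estimateQuantile(0.99, oldBuckets, observations);
const newP99 = estimateQuantile(0.99, newBuckets, observations);

console.log(oldP95.toFixed(2), oldP99.toFixed(2)); // 28.89 29.78
console.log(newP95.toFixed(2), newP99.toFixed(2)); // 10.94 10.99
```

With only the 10s and 30s bounds, every parked dequeue falls into one 20s-wide bucket, so the estimate is spread across the whole 10-30s range. The 11s bound narrows that bucket to 1s, which caps the interpolation error at roughly 1s for this distribution.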