Commit 8d86bfc
fix: prevent gunicorn worker recycling from corrupting histogram aggregation (#1068)
Gunicorn worker recycling causes in-memory Prometheus counters to reset.
The OTel aggregation pipeline strips worker.name and sums all workers
into a single cumulative counter via groupbyattrs. When a worker recycles,
its counter resets to 0, decreasing the aggregate.
This manifests as a "hidden counter reset" in Prometheus: if the recycled
worker's final le=+Inf value coincidentally equals the new worker's
starting value (e.g. both are 1 because the new worker immediately handled
a slow request), Prometheus does not detect the reset for le=+Inf. But
le=1000 resets visibly. This inflates rate(le=1000) relative to
rate(le=+Inf), producing SLI ratios greater than 1.
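The hidden reset can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual pipeline: the worker names, bucket values, and the helper `increase_with_resets` are hypothetical, and the helper only approximates Prometheus counter-reset handling (any decrease is treated as a reset, with no extrapolation).

```python
# Three consecutive scrapes of per-worker cumulative bucket counts.
# Worker "A" is recycled between scrape 1 and scrape 2; its replacement
# immediately handles one slow request (counted in +Inf but not in le=1000).
# All values are made up for illustration.
scrapes = [
    {"A": {"1000": 1, "+Inf": 1}, "B": {"1000": 4, "+Inf": 5}},
    {"A": {"1000": 0, "+Inf": 1}, "B": {"1000": 4, "+Inf": 5}},
    {"A": {"1000": 1, "+Inf": 2}, "B": {"1000": 5, "+Inf": 6}},
]


def aggregate(scrape, le):
    """Sum one bucket across workers, as happens once worker.name is stripped."""
    return sum(buckets[le] for buckets in scrape.values())


def increase_with_resets(samples):
    """Simplified counter increase: a decrease is treated as a reset to 0."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


le_1000 = [aggregate(s, "1000") for s in scrapes]  # [5, 4, 6]: visible drop
le_inf = [aggregate(s, "+Inf") for s in scrapes]   # [6, 6, 8]: reset is hidden

fast = increase_with_resets(le_1000)   # reset detected, whole new value re-counted
total = increase_with_resets(le_inf)   # no reset detected
sli = fast / total
print(le_1000, le_inf, sli)
```

In this example only le=1000 is seen to reset, so its increase is inflated while the le=+Inf increase is not, and the resulting ratio exceeds 1.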
Fix: insert cumulativetodelta before worker aggregation so we sum
per-worker deltas (always non-negative) instead of cumulative totals.
Worker recycles produce a 0-delta rather than a negative value that
corrupts the aggregate. Add deltatocumulative after groupbyattrs to
convert the aggregate delta back to a cumulative counter for the
Prometheus exporter.
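The reordering can be sketched in the same style. This is a simplified model, not the collector's implementation: `SCRAPES`, `BUCKETS`, and `per_worker_deltas` are hypothetical names, and a per-worker decrease is modelled as a 0-delta because that is how this commit message describes it; the real processors' reset handling may differ.

```python
from itertools import accumulate

# Same made-up scenario: worker "A" is recycled between scrape 1 and scrape 2.
SCRAPES = [
    {"A": {"1000": 1, "+Inf": 1}, "B": {"1000": 4, "+Inf": 5}},
    {"A": {"1000": 0, "+Inf": 1}, "B": {"1000": 4, "+Inf": 5}},
    {"A": {"1000": 1, "+Inf": 2}, "B": {"1000": 5, "+Inf": 6}},
]
BUCKETS = ("1000", "+Inf")


def per_worker_deltas(scrapes):
    """Convert each worker's series to deltas first, then sum across workers."""
    last = {}  # (worker, bucket) -> last cumulative value seen
    out = []
    for scrape in scrapes:
        summed = dict.fromkeys(BUCKETS, 0)
        for worker, buckets in scrape.items():
            for le, value in buckets.items():
                prev = last.get((worker, le))
                last[(worker, le)] = value
                if prev is None:
                    continue  # first observation only sets the baseline
                # Assumption: a decrease (recycle) contributes 0, never a negative.
                summed[le] += max(value - prev, 0)
        out.append(summed)
    return out


deltas = per_worker_deltas(SCRAPES)

# Rebuild a cumulative series per bucket from the summed deltas.
cumulative = {le: list(accumulate(d[le] for d in deltas)) for le in BUCKETS}
print(deltas, cumulative)
```

Because every summed delta is non-negative, the rebuilt series never decreases, so Prometheus has no partial reset to misinterpret. In this example the two buckets grow by the same amount and the ratio stays at 1.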
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 8dd7bec commit 8d86bfc
1 file changed
Lines changed: 7 additions & 1 deletion
[Diff body not captured. Changed lines, in new-file numbering: line 64 replaced; lines 376-377, 548-549, and 665-666 added.]