
Commit 5c996e0

dkliban and claude committed
fix: prevent gunicorn worker recycling from corrupting histogram aggregation
Gunicorn worker recycling causes in-memory Prometheus counters to reset. The OTel aggregation pipeline strips worker.name and sums all workers into a single cumulative counter via groupbyattrs. When a worker recycles, its counter resets to 0, which decreases the aggregate.

This manifests as a "hidden counter reset" in Prometheus: if the recycled worker's final le=+Inf value coincidentally equals the new worker's starting value (e.g. both are 1 because the new worker immediately handled a slow request), Prometheus does not detect the reset for le=+Inf, but le=1000 resets visibly. This inflates rate(le=1000) relative to rate(le=+Inf), producing SLI ratios greater than 1.

Fix: insert cumulativetodelta before worker aggregation so that we sum per-worker deltas (always non-negative) instead of cumulative totals. A worker recycle then produces a 0-delta rather than a negative value that corrupts the aggregate. Add deltatocumulative after groupbyattrs to convert the aggregated delta back to a cumulative counter for the Prometheus exporter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
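The arithmetic behind the hidden reset can be sketched in a few lines of Python. This is a simplified model, not the collector's or Prometheus's actual code: `prometheus_increase` and `to_deltas` are hypothetical helpers, the sample values are invented, and a detected reset is assumed to contribute a delta of 0, as the commit message describes.

```python
from itertools import accumulate


def prometheus_increase(samples):
    """Approximate PromQL increase(): any drop is treated as a reset to 0."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def to_deltas(samples):
    """Cumulative-to-delta for one series; a detected reset contributes 0."""
    deltas, prev = [], 0
    for cur in samples:
        deltas.append(cur - prev if cur >= prev else 0)
        prev = cur
    return deltas


# Hypothetical scrapes of one series after worker.name has been stripped.
# The worker recycles between the first and second scrape.
le_1000 = [2, 1, 4]  # drops 2 -> 1, so Prometheus sees a reset
le_inf = [2, 2, 5]   # stays at 2, so the reset is hidden

# Without the fix: le=1000 gains more than le=+Inf, so the ratio exceeds 1.
before = (prometheus_increase(le_1000), prometheus_increase(le_inf))

# With the fix: convert to deltas first, then rebuild a cumulative series.
fixed_1000 = list(accumulate(to_deltas(le_1000)))
fixed_inf = list(accumulate(to_deltas(le_inf)))
after = (prometheus_increase(fixed_1000), prometheus_increase(fixed_inf))

print(before)  # (4, 3)
print(after)   # (3, 3)
```

In this toy run the unfixed series gives an increase of 4 for le=1000 against 3 for le=+Inf, a ratio above 1. After the delta round trip both buckets show an increase of 3, so the ratio cannot exceed 1.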
1 parent 7b1841c commit 5c996e0

1 file changed

Lines changed: 11 additions & 1 deletion


deploy/clowdapp.yaml

@@ -43,6 +43,14 @@ objects:
           keys:
             - api.request_duration
 
+        cumulativetodelta:
+          include:
+            metrics:
+              - api.request_duration
+            match_type: strict
+
+        deltatocumulative:
+
         batch:
 
       exporters:
@@ -61,12 +69,14 @@ objects:
 
         metrics/aggregation:
           receivers: [otlp]
-          processors:
+          processors:
             - memory_limiter
             - filter/filter_pulp_api_request_duration
+            - cumulativetodelta
             - attributes/remove_worker_name
             - batch/api_aggregation
             - groupbyattrs/api_aggregation
+            - deltatocumulative
           exporters: [prometheus]
 
         metrics/main:
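The fix depends on processor order: deltas must be computed while worker.name is still present, and the cumulative conversion must come after grouping. A minimal sketch of an order check follows. `check_order` is a hypothetical helper, not part of the collector, and it assumes the processor names shown in the pipeline above.

```python
# Processors whose relative order matters for the fix, earliest first.
REQUIRED_ORDER = [
    "cumulativetodelta",
    "attributes/remove_worker_name",
    "groupbyattrs/api_aggregation",
    "deltatocumulative",
]


def check_order(processors):
    """Return a list of problems found in a pipeline's processor list."""
    position = {name: i for i, name in enumerate(processors)}
    missing = [name for name in REQUIRED_ORDER if name not in position]
    if missing:
        return [f"missing processor: {name}" for name in missing]
    problems = []
    for earlier, later in zip(REQUIRED_ORDER, REQUIRED_ORDER[1:]):
        if position[earlier] > position[later]:
            problems.append(f"{earlier} must run before {later}")
    return problems


# The metrics/aggregation pipeline as it reads after this commit.
pipeline = [
    "memory_limiter",
    "filter/filter_pulp_api_request_duration",
    "cumulativetodelta",
    "attributes/remove_worker_name",
    "batch/api_aggregation",
    "groupbyattrs/api_aggregation",
    "deltatocumulative",
]

print(check_order(pipeline))  # []
```

A check like this could run in CI against the parsed config so that a later reordering of the list does not silently reintroduce the summing of cumulative totals.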
