
Commit 8d86bfc

dkliban and claude authored
fix: prevent gunicorn worker recycling from corrupting histogram aggregation (#1068)
Gunicorn worker recycling causes in-memory Prometheus counters to reset. The OTel aggregation pipeline strips worker.name and sums all workers into a single cumulative counter via groupbyattrs. When a worker recycles, its counter resets to 0, decreasing the aggregate.

This manifests as a "hidden counter reset" in Prometheus: if the recycled worker's final le=+Inf value coincidentally equals the new worker's starting value (e.g. both are 1 because the new worker immediately handled a slow request), Prometheus does not detect the reset for le=+Inf. But le=1000 resets visibly. This inflates rate(le=1000) relative to rate(le=+Inf), producing SLI ratios greater than 1.

Fix: insert cumulativetodelta before worker aggregation so we sum per-worker deltas (always non-negative) instead of cumulative totals. Worker recycles produce a 0-delta rather than a negative value that corrupts the aggregate. Add deltatocumulative after groupbyattrs to convert the aggregate delta back to a cumulative counter for the Prometheus exporter.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
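The arithmetic behind the hidden reset can be checked with a small simulation. This is only a sketch under stated assumptions: the worker names (`worker-a`, `worker-b`, `worker-c`), the function names, and the scrape values are hypothetical, `prometheus_increase` is a simplified model of counter-reset handling rather than Prometheus's actual `rate()` implementation, and treating a new worker's first data point as a delta from zero is an assumption that the real processors may not share.

```python
from typing import Dict, List

# One scrape of one histogram bucket: worker name -> cumulative count.
Snapshot = Dict[str, int]


def sum_cumulative(snapshots: List[Snapshot]) -> List[int]:
    """Old behaviour: sum the cumulative values of whichever workers are alive."""
    return [sum(snap.values()) for snap in snapshots]


def sum_deltas_then_accumulate(snapshots: List[Snapshot]) -> List[int]:
    """New behaviour: per-worker deltas are summed, then re-accumulated."""
    last_seen: Dict[str, int] = {}
    total = 0
    series = []
    for snap in snapshots:
        for worker, value in snap.items():
            prev = last_seen.get(worker, 0)  # assumption: new stream starts at 0
            # A drop within one stream means that worker restarted from zero.
            total += value - prev if value >= prev else value
            last_seen[worker] = value
        series.append(total)
    return series


def prometheus_increase(series: List[int]) -> int:
    """Simplified reset handling: a decrease is read as a restart from zero."""
    increase = 0
    for prev, cur in zip(series, series[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase


# worker-a (1 fast request) is recycled after the first scrape. Its replacement,
# worker-c, handles one slow request and then one fast request.
le_1000 = [{"worker-a": 1, "worker-b": 10}, {"worker-c": 0, "worker-b": 10},
           {"worker-c": 1, "worker-b": 10}]
le_inf = [{"worker-a": 1, "worker-b": 10}, {"worker-c": 1, "worker-b": 10},
          {"worker-c": 2, "worker-b": 10}]

naive_fast, naive_all = sum_cumulative(le_1000), sum_cumulative(le_inf)
fixed_fast = sum_deltas_then_accumulate(le_1000)
fixed_all = sum_deltas_then_accumulate(le_inf)

naive_ratio = prometheus_increase(naive_fast) / prometheus_increase(naive_all)
fixed_ratio = prometheus_increase(fixed_fast) / prometheus_increase(fixed_all)
print(naive_fast, naive_all, naive_ratio)
print(fixed_fast, fixed_all, fixed_ratio)
```

In this scenario two requests arrive after the first scrape and one of them is fast, so the true ratio is 0.5. Summing cumulative values leaves the le=+Inf series flat across the recycle while le=1000 visibly drops, which pushes the computed ratio far above 1. Summing deltas keeps both series non-decreasing and recovers 0.5.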
1 parent 8dd7bec commit 8d86bfc

1 file changed

Lines changed: 7 additions & 1 deletion

deploy/clowdapp.yaml

@@ -61,7 +61,7 @@ objects:
 metrics/aggregation:
   receivers: [otlp]
-  processors:
+  processors:
   - memory_limiter
   - filter/filter_pulp_api_request_duration
   - attributes/remove_worker_name
@@ -373,6 +373,8 @@ objects:
   value: ${{OTEL_PYTHON_EXCLUDED_URLS}}
 - name: PULP_OTEL_PULP_API_HISTOGRAM_BUCKETS
   value: ${PULP_OTEL_PULP_API_HISTOGRAM_BUCKETS}
+- name: OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
+  value: "delta"
 - name: PULP_REDIS_PORT
   value: "6379"
 - name: SENTRY_DSN
@@ -543,6 +545,8 @@ objects:
   value: ${OTEL_METRIC_EXPORT_TIMEOUT}
 - name: OTEL_TRACES_EXPORTER
   value: "none"
+- name: OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
+  value: "delta"
 - name: PULP_REDIS_PORT
   value: "6379"
 - name: SENTRY_DSN
@@ -658,6 +662,8 @@ objects:
   value: ${{OTEL_EXPORTER_OTLP_ENDPOINT}}
 - name: OTEL_TRACES_EXPORTER
   value: "none"
+- name: OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE
+  value: "delta"
 - name: PULP_OTEL_METRICS_DISPATCH_INTERVAL_MINUTES
   value: ${PULP_OTEL_METRICS_DISPATCH_INTERVAL_MINUTES}
 - name: PULP_REDIS_PORT
