Description
The Flask instrumentation's _rewrapped_app function increments active_requests_counter before calling wsgi_app(), but the decrement is not wrapped in try/finally. If an exception propagates out of wsgi_app(), the counter is never decremented, causing a permanent gauge leak.
This is particularly impactful when using this metric for Kubernetes HPA (Horizontal Pod Autoscaler) — leaked counters cause HPA to see phantom load and refuse to scale down.
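The leak pattern can be demonstrated in isolation. In this sketch a plain integer counter stands in for the OpenTelemetry UpDownCounter (names are illustrative, not the instrumentation's actual internals):

```python
# Simplified model of the bug: the decrement after the call is
# unreachable when the wrapped app raises.

class Counter:
    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount

active_requests = Counter()

def handle_request(wsgi_app, environ):
    active_requests.add(1)        # increment before the call
    result = wsgi_app(environ)    # if this raises...
    active_requests.add(-1)       # ...this line never runs
    return result

def crashing_app(environ):
    raise RuntimeError("worker crashed mid-request")

try:
    handle_request(crashing_app, {})
except RuntimeError:
    pass

print(active_requests.value)  # 1 — the counter is permanently leaked
```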
Location
instrumentation/opentelemetry-instrumentation-flask/src/opentelemetry/instrumentation/flask/__init__.py
Lines 351–425 in _rewrapped_app:
active_requests_counter.add(1, active_requests_count_attrs) # L351 - increment
# ...
result = wsgi_app(wrapped_app_environ, _start_response) # L398 - if this raises...
# ...duration recording (L400-424)...
active_requests_counter.add(-1, active_requests_count_attrs) # L425 - NEVER REACHED
return result
Expected behavior
The decrement should always execute, regardless of whether wsgi_app() raises an exception.
The WSGI instrumentation already has the correct pattern
instrumentation/opentelemetry-instrumentation-wsgi/src/opentelemetry/instrumentation/wsgi/__init__.py:
try:
    with trace.use_span(span):
        iterable = self.wsgi(environ, start_response)
        return _end_span_after_iterating(iterable, span, token)
except Exception as ex:
    raise
finally:
    self.active_requests_counter.add(-1, active_requests_count_attrs)  # always runs
Suggested fix
Wrap lines 398–425 in try/finally:
active_requests_counter.add(1, active_requests_count_attrs)
request_route = None
# ...
try:
    result = wsgi_app(wrapped_app_environ, _start_response)
    # ...duration histogram recording...
    return result
finally:
    active_requests_counter.add(-1, active_requests_count_attrs)
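A quick check that the try/finally version keeps the counter balanced even when the wrapped app raises (again using a plain integer as a stand-in for the UpDownCounter; names are illustrative):

```python
# Fixed pattern: the decrement sits in a finally block, so it runs on
# both the normal return path and the exception path.

class Counter:
    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount

active_requests = Counter()

def handle_request_fixed(wsgi_app, environ):
    active_requests.add(1)
    try:
        return wsgi_app(environ)
    finally:
        active_requests.add(-1)   # always runs, even on exception

def crashing_app(environ):
    raise RuntimeError("boom")

try:
    handle_request_fixed(crashing_app, {})
except RuntimeError:
    pass

print(active_requests.value)  # 0 — balanced despite the exception
```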
How to reproduce
- Run a Flask app with opentelemetry-instrument and gunicorn
- Send requests that trigger exceptions propagating past Flask's error handlers (e.g., OOM-killed gunicorn workers, or middleware errors)
- Observe the http.server.active_requests gauge — it increments but never decrements for crashed requests
- The gauge stays permanently elevated until the process is restarted
Impact
- HPA scaling: Kubernetes HPA using this metric sees phantom active requests → refuses to scale down → wasted resources
- Monitoring: Dashboards show incorrect active request counts
- Alerting: False positives on active request alerts
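The HPA impact follows directly from the scaling formula in the Kubernetes docs, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A sketch with assumed numbers (a leaked count of 5 per pod, a target of 10 active requests per pod):

```python
import math

# HPA scaling decision: leaked "active" requests look like real load,
# so the computed replica count never drops to the true minimum.

def desired_replicas(current_replicas, current_metric, target_metric):
    return math.ceil(current_replicas * current_metric / target_metric)

# Real load is gone, but 5 leaked requests per pod remain against a
# target of 10 per pod: HPA holds 2 replicas instead of scaling down.
print(desired_replicas(4, 5, 10))   # 2

# With an accurate (zero) metric, HPA would scale to the minimum.
print(desired_replicas(4, 0, 10))   # 0
```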
Environment
- opentelemetry-instrumentation-flask==0.57b0 (also confirmed present on main / 0.62b0)
- gunicorn with --workers 1 --threads N --worker-class gthread
- OTEL SDK 1.36.0, OTLP HTTP exporter, delta temporality