Skip to content

fix(flask): wrap wsgi_app call in try/except to prevent active_requests gauge leak#4433

Open
alliasgher wants to merge 11 commits intoopen-telemetry:mainfrom
alliasgher:fix-flask-active-requests-gauge-leak
Open

fix(flask): wrap wsgi_app call in try/except to prevent active_requests gauge leak#4433
alliasgher wants to merge 11 commits intoopen-telemetry:mainfrom
alliasgher:fix-flask-active-requests-gauge-leak

Conversation

@alliasgher
Copy link
Copy Markdown
Contributor

@alliasgher alliasgher commented Apr 14, 2026

Description

Wrap the wsgi_app call in try/finally so that http.server.active_requests is decremented when the wrapped WSGI app raises. Without this, every exception leaves the gauge permanently above the real number of in-flight requests, causing steady drift upward over the process lifetime.

Fixes #4403

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Added test_active_requests_decremented_on_error in test_programmatic.py which drives a handler that raises and asserts the gauge ends at 0.

Checklist

  • Changelog entry added under Unreleased / Fixed

@liyaka
Copy link
Copy Markdown

liyaka commented Apr 14, 2026

Thanks for the quick fix! We hit this exact bug in production — the leaked gauge was preventing our Kubernetes HPA from scaling down.

One suggestion: consider using try/finally instead of try/except to keep the decrement in a single code path. With the current approach, active_requests_counter.add(-1, ...) exists in two places (the except block and line 458), which is easy to get out of sync during future refactors.

The WSGI instrumentation already uses this pattern:

# wsgi/__init__.py
try:
    ...
    return _end_span_after_iterating(iterable, span, token)
except Exception as ex:
    raise
finally:
    self.active_requests_counter.add(-1, active_requests_count_attrs)

For Flask it would look like:

try:
    result = wsgi_app(wrapped_app_environ, _start_response)
    # ...duration recording unchanged...
    return result
finally:
    active_requests_counter.add(-1, active_requests_count_attrs)

This also removes the original decrement at line 458, so there's exactly one decrement path.

Also worth adding a test to prevent regression — I have one in #4437 (test_active_requests_counter_decremented_on_error) that could be adapted here.

@alliasgher
Copy link
Copy Markdown
Contributor Author

alliasgher commented Apr 14, 2026

Good catch — updated to use try/finally so the decrement is in a single code path regardless of whether wsgi_app succeeds or raises @liyaka

@alliasgher alliasgher force-pushed the fix-flask-active-requests-gauge-leak branch from bb98692 to 5c2b07a Compare April 14, 2026 20:20
@alliasgher
Copy link
Copy Markdown
Contributor Author

Added the regression test test_active_requests_counter_decremented_on_error. It hits /hello/500 (which raises ValueError internally), collects http.server.active_requests, and asserts the value is back to 0 after both requests complete.

…ts gauge leak

If wsgi_app() raises an uncaught exception, the active_requests_counter
decrement at the end of _wrapped_app was never reached, causing the gauge
to permanently read high. Kubernetes HPA and similar systems would see
phantom load.

Add a bare try/except that decrements the counter and re-raises on
exception, matching the pattern already used in the WSGI instrumentation.

Fixes open-telemetry#4431

Signed-off-by: alliasgher <alliasgher123@gmail.com>
Signed-off-by: alliasgher <alliasgher123@gmail.com>
Signed-off-by: alliasgher <alliasgher123@gmail.com>
Signed-off-by: Ali <alliasgher123@gmail.com>
@alliasgher alliasgher force-pushed the fix-flask-active-requests-gauge-leak branch from 5c2b07a to efb677f Compare April 15, 2026 20:21
@alliasgher alliasgher requested a review from a team as a code owner April 15, 2026 20:21
Copy link
Copy Markdown
Member

@MikeGoldsmith MikeGoldsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix @alliasgher - looks good to me.

@github-project-automation github-project-automation Bot moved this to Approved PRs in Python PR digest Apr 15, 2026
@liyaka
Copy link
Copy Markdown

liyaka commented Apr 16, 2026

Any estimation for this fix to be released?
thank you!

Signed-off-by: Ali <alliasgher123@gmail.com>
@alliasgher alliasgher force-pushed the fix-flask-active-requests-gauge-leak branch from e56a8af to b578c8d Compare April 16, 2026 21:14
The check-links workflow fails on any PR that touches CHANGELOG.md
because the full file is scanned and five historical entries contain
broken URLs:

- open-telemetry#1670 and open-telemetry#227 entries have a stray `]` inside the URL.
- The open-telemetry#1033 entry is missing the `/` between the org and repo in the URL.
- The `aws.ecs.*` spec link points to the old path in
  opentelemetry-specification; the content has since moved to the
  semantic-conventions repo.
- The 1.12.0rc2-0.32b0 release tag does not exist on
  opentelemetry-python; drop the link, keep the heading text.

Signed-off-by: Ali <alliasgher123@gmail.com>
@alliasgher
Copy link
Copy Markdown
Contributor Author

Pushed bb17ad0 to fix broken links in CHANGELOG.md that were failing check-links CI.

Comment thread CHANGELOG.md
Per @MikeGoldsmith's review, the changelog broken-link cleanup that was
folded into this branch via bb17ad0 should not be mixed with the flask
fix. Reset CHANGELOG.md to upstream/main and re-add only this PR's
entry. Most of the broken links from bb17ad0 have already been fixed
on main independently, so no separate cleanup PR is needed.

Signed-off-by: Ali <alliasgher123@gmail.com>
…quests-gauge-leak

Signed-off-by: Ali <alliasgher123@gmail.com>

# Conflicts:
#	CHANGELOG.md
@liyaka
Copy link
Copy Markdown

liyaka commented Apr 27, 2026

@alliasgher looks like PR is approved - can you please solve the conflicts and merge it?
We are waiting for this fix

@alliasgher
Copy link
Copy Markdown
Contributor Author

alliasgher commented Apr 27, 2026

@liyaka no conflicts. only repo maintainers have merge access; I can't merge it

@liyaka
Copy link
Copy Markdown

liyaka commented Apr 28, 2026

@MikeGoldsmith any estimation for this PR merge and release of the fix?
Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Approved PRs

Development

Successfully merging this pull request may close these issues.

3 participants