Fix TCPConnector.close race nullifying resolver during in-flight resolve (#12497)#12688
Draft
jbbqqf wants to merge 2 commits into
Draft
Fix TCPConnector.close race nullifying resolver during in-flight resolve (#12497)#12688jbbqqf wants to merge 2 commits into
jbbqqf wants to merge 2 commits into
Conversation
Close the connector before the owned resolver so an in-flight non-cached
resolve suspended inside a trace callback (e.g. on_dns_resolvehost_start)
sees the closed connector instead of dereferencing a nullified backend.
The non-cached resolve path now raises ClientConnectionError("Connector
is closed.") after the trace callback await, matching the behavior the
cached path already gets from _close_immediately() cancelling tracked
resolve tasks.
Fixes aio-libs#12497.
❌ 1 Tests Failed:
View the top 1 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
Merging this PR will not alter performance
Comparing Footnotes
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What do these changes do?
When
TCPConnector.close()runs concurrently with an in-flight non-cachedresolve that is suspended inside a trace callback (e.g.
on_dns_resolvehost_start), the owned resolver used to be torn downbefore the connector was marked closed. The suspended resolve then
resumed, called
self._resolver.resolve(...)against a nullified backend,and crashed with
AttributeError: 'NoneType' object has no attribute 'getaddrinfo'.This PR:
TCPConnector.close()(so_closedflipsto
Trueand any tracked resolve tasks are cancelled via_close_immediately), then awaitsself._resolver.close()._resolve_hostpath,short-circuits with
ClientConnectionError("Connector is closed.")whenever
self._closedis set. The non-cached path doesn't store itsresolve task in
_resolve_host_tasks, so this guard is what makes theteardown observable to the resuming coroutine.
Net effect: the resuming coroutine now sees the connector as closed and
surfaces
ClientConnectionErrorto the caller, matching the cached-pathbehavior described in the issue.
Are there changes in behavior for the user?
Yes, but only in the previously-broken edge case: callers that hit this
race now see
ClientConnectionError("Connector is closed.")instead ofAttributeError. The new error matches the wording of the existing checkat
aiohttp/connector.py:621raised on the post-resolve fast path, soit's consistent with how an already-closed connector is reported
elsewhere.
Is it a substantial burden for the maintainers to support this?
No. The fix is two narrow edits in
aiohttp/connector.py(
TCPConnector.closeswap + aself._closedguard right after thetrace-callback await) plus a regression test. No public-API change.
Related issue number
Fixes #12497.
Reproduce BEFORE/AFTER yourself (copy-paste)
What I ran locally
The new regression test
test_close_during_non_cached_resolve_raises_client_errorfails on master with the exact
AttributeError: 'NoneType' object has no attribute 'getaddrinfo'from the issue, and passes on this branch.Edge cases
use_dns_cache=False+ trace cb yielding before resolveAttributeError: 'NoneType'...ClientConnectionError("Connector is closed.")use_dns_cache=Truecached hit + close mid-flight_cached_hosts.next_addrs)use_dns_cache=Truecache miss + tracked task_close_immediatelysuper().close()now precedesresolver.close())close()reentry)_close_immediatelyearly-returns on_closed)resolver_owner=False)resolver.close()Checklist
CONTRIBUTORS.txtCHANGES/folder (CHANGES/12497.bugfix.rst)Drafted with Claude Code (claude-opus-4-7, Anthropic); to be reviewed by @jbbqqf.