dns: coalesce identical concurrent lookup() requests #62599
Open
orgads wants to merge 1 commit into nodejs:main
orgads added a commit to orgads/node that referenced this pull request (Apr 5, 2026):
When multiple callers issue dns.lookup() for the same (hostname, family,
hints, order) concurrently, only one getaddrinfo call is now dispatched
to the libuv threadpool. All callers share the result.
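A minimal sketch of the coalescing idea, written as standalone JavaScript rather than taken from Node's lib/dns.js. The names `fakeGetaddrinfo`, `coalescedLookup`, and `inflight` are hypothetical; `fakeGetaddrinfo` only records each dispatch in place of the real threadpool call, so the number of underlying calls can be counted. The first caller for a given key dispatches the request, later identical callers add their callback to the pending list, and everyone receives the same raw result.

```javascript
'use strict';

// Hypothetical stand-in for the threadpool-backed getaddrinfo call. It only
// records each dispatch so the caller can count them and complete them later.
const dispatched = [];
function fakeGetaddrinfo(hostname, family, hints, order, done) {
  dispatched.push({ hostname, family, hints, order, done });
}

// Requests currently in flight: key -> callbacks waiting for the result.
const inflight = new Map();

function coalescedLookup(hostname, family, hints, order, callback) {
  const key = JSON.stringify([hostname, family, hints, order]);
  const waiting = inflight.get(key);
  if (waiting !== undefined) {
    // An identical request is already running; join it instead of
    // dispatching another getaddrinfo call.
    waiting.push(callback);
    return;
  }
  inflight.set(key, [callback]);
  fakeGetaddrinfo(hostname, family, hints, order, (err, addresses) => {
    const callbacks = inflight.get(key);
    // Drop the entry before fanning out, so a lookup issued from inside a
    // callback starts a fresh request rather than joining a finished one.
    inflight.delete(key);
    for (const cb of callbacks) cb(err, addresses);
  });
}
```

In this sketch every waiter receives the very same `addresses` array, which is why per-caller post-processing of the shared result matters if callers might mutate what they get back.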
getaddrinfo is a blocking call that runs on the libuv threadpool (4 threads
by default, of which at most 2 may run slow I/O work such as getaddrinfo
at the same time). When DNS resolution is slow (e.g. ~10-20 s per call
because of a misbehaving resolver), identical requests queue behind each
other, and the wait (and hence the likelihood of timeouts) grows linearly
with the number of concurrent callers:
Before: 100 parallel lookup('host') -> 50 batches × 10 s = 500+ s
After: 100 parallel lookup('host') -> 1 getaddrinfo call = ~10 s
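The Before/After figures follow from simple batch arithmetic. The sketch below assumes each getaddrinfo call takes a fixed time, that calls run in full batches of the slow I/O limit, and that libuv's slow I/O limit is half the threadpool size rounded up, which gives 2 for the default pool of 4. The function names are illustrative only.

```javascript
'use strict';

// Assumed libuv rule: slow I/O work (getaddrinfo) may occupy at most
// (nthreads + 1) / 2 threads, using integer division.
function slowIoLimit(threadpoolSize) {
  return Math.floor((threadpoolSize + 1) / 2);
}

// Rough wall-clock time for `callers` identical lookups when every
// getaddrinfo call takes `secondsPerCall` and calls run in batches.
function estimateSeconds(callers, threadpoolSize, secondsPerCall, coalesced) {
  const calls = coalesced ? 1 : callers;
  const batches = Math.ceil(calls / slowIoLimit(threadpoolSize));
  return batches * secondsPerCall;
}

console.log(estimateSeconds(100, 4, 10, false)); // 500
console.log(estimateSeconds(100, 4, 10, true)); // 10
```

With UV_THREADPOOL_SIZE=16, for instance, the limit becomes 8, and 100 uncoalesced lookups would still take about ceil(100 / 8) × 10 s = 130 s under these assumptions, so enlarging the pool alone does not remove the linear growth.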
This is particularly severe on WSL, where the DNS relay rewrites QNAMEs
in responses (appending the search domain), causing glibc to discard
them as non-matching and wait for a 5 s timeout per retry.
The coalescing is keyed on (hostname, family, hints, order) so lookups
with different options still get separate getaddrinfo calls. Each
caller independently post-processes the shared raw result (applying the
'all' flag, constructing address objects, etc.).
Signed-off-by: Orgad Shaneh <orgad.shaneh@audiocodes.com>
PR-URL: nodejs#62599
Fixes: nodejs#62503
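To illustrate the keying and the per-caller post-processing, here is a sketch with two hypothetical helpers, `lookupKey` and `shapeResult`. It assumes the shared raw result is an array of address strings and that the reported family falls back to `net.isIP()` when the caller asked for family 0; this is modeled loosely on how `dns.lookup()` reports results, not copied from it, and for brevity the single-address case is returned as an object instead of separate callback arguments.

```javascript
'use strict';

const { isIP } = require('node:net');

// Hypothetical key builder. Every option that can change what getaddrinfo
// returns is part of the key; JSON.stringify keeps the fields unambiguous.
function lookupKey(hostname, family, hints, order) {
  return JSON.stringify([hostname, family, hints, order]);
}

// Hypothetical per-caller post-processing of the shared raw result
// (assumed to be an array of address strings). Each caller passes its own
// `all` flag and gets freshly built objects.
function shapeResult(rawAddresses, requestedFamily, all) {
  const toEntry = (address) => ({
    address,
    // Fall back to detecting the family from the address itself.
    family: requestedFamily || isIP(address),
  });
  return all ? rawAddresses.map(toEntry) : toEntry(rawAddresses[0]);
}
```

Because `shapeResult` builds new objects for every caller, one caller mutating its result cannot affect the others, even though they all share a single getaddrinfo response.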
Branch updated from 36f0f63 to c754d12.
orgads added a commit to orgads/node that referenced this pull request (Apr 5, 2026).
Branch updated from c754d12 to b213b0f.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main   #62599      +/-   ##
==========================================
- Coverage   91.53%   89.72%   -1.81%
==========================================
  Files         352      695     +343
  Lines      147833   214543   +66710
  Branches    23148    41076   +17928
==========================================
+ Hits       135321   192503   +57182
- Misses      12255    14095    +1840
- Partials      257     7945    +7688
mcollina (Member) requested changes (Apr 5, 2026):
I think this might break async_hooks continuation, could you verify? Adding an AsyncResource on coalescing would be enough.
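One way to follow this suggestion, sketched under assumptions: each waiter's callback is wrapped with `AsyncResource.bind()` from `node:async_hooks` at the moment it joins, so it later runs in its own caller's async context instead of the context of whichever caller triggered the shared request. `slowResolve` and `coalescedLookup` are hypothetical, the key is just the hostname for brevity, `setTimeout` stands in for the threadpool round trip, and `AsyncLocalStorage` is used only to make the context observable.

```javascript
'use strict';

const { AsyncLocalStorage, AsyncResource } = require('node:async_hooks');

// Used only to observe which async context a callback runs in.
const als = new AsyncLocalStorage();

// Hypothetical slow resolver; setTimeout stands in for the threadpool work.
let resolveCalls = 0;
function slowResolve(hostname, done) {
  resolveCalls++;
  setTimeout(() => done(null, ['192.0.2.1']), 10);
}

// Pending requests: hostname -> bound callbacks waiting for the result.
const inflight = new Map();

function coalescedLookup(hostname, callback) {
  // Capture the caller's async context now, when the caller joins,
  // not when the shared result eventually arrives.
  const bound = AsyncResource.bind(callback, 'COALESCED_LOOKUP');
  const waiting = inflight.get(hostname);
  if (waiting !== undefined) {
    waiting.push(bound);
    return;
  }
  inflight.set(hostname, [bound]);
  slowResolve(hostname, (err, addresses) => {
    const callbacks = inflight.get(hostname);
    inflight.delete(hostname);
    for (const cb of callbacks) cb(err, addresses);
  });
}
```

Without the `bind()` call, the second caller's callback would run inside the first caller's timer callback, and `als.getStore()` would return the first caller's store.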
orgads added a commit to orgads/node that referenced this pull request (Apr 5, 2026).
Branch updated from b213b0f to cee78bf.
Author (Contributor): Great observation! Should be fixed now.
Branch updated from cee78bf to 610a0c3.
Member: I have a feeling that the failing test on GHA (GitHub Actions) is relevant; can you take a look?