dns: coalesce identical concurrent lookup() requests #62599

Open
orgads wants to merge 1 commit into nodejs:main from orgads:dns-coalesce-lookup

orgads commented Apr 5, 2026

When multiple callers issue dns.lookup() for the same (hostname, family, hints, order) concurrently, only one getaddrinfo call is now dispatched to the libuv threadpool. All callers share the result.

getaddrinfo is a blocking call that runs on the libuv threadpool (capped at 4 threads by default, with a slow I/O concurrency limit of 2). When DNS resolution is slow, e.g. ~10-20 s per call due to a misbehaving resolver, identical requests queue behind each other, so the total wait grows linearly with the number of concurrent callers and callers start timing out:

Before: 100 parallel lookup('host') -> 50 batches × 10 s = 500+ s
After: 100 parallel lookup('host') -> 1 getaddrinfo call = ~10 s
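
The figures above follow from simple batching arithmetic. A rough model, assuming every getaddrinfo call takes the same time and that at most 2 of them run at once (the function name and parameters are illustrative, and this is not a benchmark):

```javascript
// Back-of-the-envelope model of the Before/After numbers; not a benchmark.
// Assumption: each call takes `secondsPerCall`, and only `slowLimit`
// calls run concurrently (2 with the default 4-thread pool).
function estimateTotalSeconds(callers, secondsPerCall, slowLimit, coalesced) {
  // With coalescing, identical lookups collapse into a single call.
  const calls = coalesced ? 1 : callers;
  const batches = Math.ceil(calls / slowLimit);
  return batches * secondsPerCall;
}

console.log(estimateTotalSeconds(100, 10, 2, false)); // 500
console.log(estimateTotalSeconds(100, 10, 2, true)); // 10
```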

This is particularly severe on WSL, where the DNS relay rewrites QNAMEs in responses (appending the search domain). glibc discards those responses as non-matching and waits out a 5 s timeout on each retry.

The coalescing is keyed on (hostname, family, hints, order) so lookups with different options still get separate getaddrinfo calls. Each caller independently post-processes the shared raw result (applying the 'all' flag, constructing address objects, etc.).
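
To make the keying and the per-caller post-processing concrete, here is a minimal sketch of the idea. It is not the code from this patch: `coalescedLookup`, `pending`, and the injected `resolveRaw` function are hypothetical names, the `'verbatim'` default is an assumption, and the raw result is assumed to be a non-empty array of address strings.

```javascript
// Hypothetical sketch of request coalescing; not taken from lib/dns.js.
const pending = new Map(); // key -> callbacks waiting on an in-flight call

function coalescedLookup(resolveRaw, hostname, options, callback) {
  const { family = 0, hints = 0, order = 'verbatim', all = false } = options;
  // `all` is not part of the key: it only affects post-processing.
  const key = JSON.stringify([hostname, family, hints, order]);

  // Each caller post-processes the shared raw result on its own.
  // Assumption: `addresses` is a non-empty array of strings.
  const onRaw = (err, addresses) => {
    if (err) return callback(err);
    const records = addresses.map((address) => ({
      address,
      family: address.includes(':') ? 6 : 4,
    }));
    if (all) callback(null, records);
    else callback(null, records[0].address, records[0].family);
  };

  const waiters = pending.get(key);
  if (waiters !== undefined) {
    waiters.push(onRaw); // identical request already in flight: just wait
    return;
  }

  pending.set(key, [onRaw]);
  resolveRaw(hostname, { family, hints, order }, (err, addresses) => {
    const callbacks = pending.get(key);
    pending.delete(key); // later lookups start a fresh request
    for (const cb of callbacks) cb(err, addresses);
  });
}

// Usage with a fake resolver standing in for getaddrinfo.
let dispatched = 0;
const fakeResolve = (hostname, opts, done) => {
  dispatched++;
  setImmediate(done, null, ['127.0.0.1', '::1']);
};

coalescedLookup(fakeResolve, 'host', {}, (err, address, family) => {
  console.log(address, family);
});
coalescedLookup(fakeResolve, 'host', { all: true }, (err, records) => {
  console.log(records.length);
});
console.log(dispatched); // 1
```

Deleting the map entry before invoking the callbacks means nothing is cached: a lookup issued after the shared call completes always triggers a fresh resolution.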

Fixes: #62503

@nodejs-github-bot

Review requested:

  • @nodejs/net

nodejs-github-bot added the dns and needs-ci labels Apr 5, 2026
orgads added a commit to orgads/node that referenced this pull request Apr 5, 2026
Signed-off-by: Orgad Shaneh <orgad.shaneh@audiocodes.com>
PR-URL: nodejs#62599
Fixes: nodejs#62503
orgads force-pushed the dns-coalesce-lookup branch from 36f0f63 to c754d12 April 5, 2026 10:56
orgads added a commit to orgads/node that referenced this pull request Apr 5, 2026
orgads force-pushed the dns-coalesce-lookup branch from c754d12 to b213b0f April 5, 2026 11:09
codecov bot commented Apr 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.72%. Comparing base (12249cc) to head (610a0c3).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #62599      +/-   ##
==========================================
- Coverage   91.53%   89.72%   -1.81%     
==========================================
  Files         352      695     +343     
  Lines      147833   214543   +66710     
  Branches    23148    41076   +17928     
==========================================
+ Hits       135321   192503   +57182     
- Misses      12255    14095    +1840     
- Partials      257     7945    +7688     
Files with missing lines        Coverage Δ
lib/dns.js                      99.48% <100.00%> (+0.83%) ⬆️
lib/internal/dns/promises.js    99.02% <100.00%> (+1.46%) ⬆️

... and 463 files with indirect coverage changes

mcollina left a comment

I think this might break async_hooks continuation, could you verify? Adding an AsyncResource on coalescing would be enough.
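
For context on the concern above: when several callers share one completion, their callbacks would otherwise run in the async context of whichever caller dispatched the request. A minimal sketch of how binding each queued callback with `AsyncResource` keeps each caller's context. `addWaiter`, `settle`, and `waiters` are hypothetical names, and this is not necessarily how the PR addresses it:

```javascript
const { AsyncLocalStorage, AsyncResource } = require('node:async_hooks');

const als = new AsyncLocalStorage();
const waiters = [];

// Hypothetical helper: queue a callback for a shared in-flight request.
function addWaiter(callback) {
  // AsyncResource.bind() captures the caller's current async context, so
  // the callback later runs in that context rather than in the context
  // of whoever completes the shared request.
  waiters.push(AsyncResource.bind(callback));
}

// Hypothetical helper: deliver the shared result to every waiter.
function settle(result) {
  for (const cb of waiters.splice(0)) cb(result);
}

const seen = [];
als.run('request-A', () => addWaiter(() => seen.push(als.getStore())));
als.run('request-B', () => addWaiter(() => seen.push(als.getStore())));

// Settle from outside either context, as a shared completion would.
settle('127.0.0.1');
console.log(seen); // [ 'request-A', 'request-B' ]
```

Without the `AsyncResource.bind()` call, both callbacks in this sketch would see `undefined` from `als.getStore()`, because `settle()` runs outside any `als.run()` scope.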

orgads added a commit to orgads/node that referenced this pull request Apr 5, 2026
orgads force-pushed the dns-coalesce-lookup branch from b213b0f to cee78bf April 5, 2026 15:43
orgads commented Apr 5, 2026

Great observation! Should be fixed now.

orgads force-pushed the dns-coalesce-lookup branch from cee78bf to 610a0c3 April 5, 2026 16:25
mcollina added the semver-minor, needs-citgm, dont-land-on-v22.x, dont-land-on-v24.x, and baking-for-lts labels and removed the dont-land-on-v22.x and dont-land-on-v24.x labels Apr 5, 2026
mcollina commented Apr 5, 2026

I have a feeling that the failing test on GHA is relevant, can you take a look?

Labels

baking-for-lts, dns, needs-ci, needs-citgm, semver-minor

Development

Successfully merging this pull request may close these issues.

dns.lookup: pending promises grow unboundedly under sustained EAI_AGAIN load

3 participants