Skip to content

Pool does not respect keep-alive bug#4332

Closed
Ethan-Arrowood wants to merge 5 commits into
mainfrom
bug-pool-keep-alive
Closed

Pool does not respect keep-alive bug#4332
Ethan-Arrowood wants to merge 5 commits into
mainfrom
bug-pool-keep-alive

Conversation

@Ethan-Arrowood

Copy link
Copy Markdown
Collaborator

Ref #4331

@ronag

ronag commented Jul 11, 2025

Copy link
Copy Markdown
Member

Any chance to have a simpler test?

@Ethan-Arrowood

Copy link
Copy Markdown
Collaborator Author

Not sure how to simplify this any further

@Ethan-Arrowood

Copy link
Copy Markdown
Collaborator Author

I can try to reduce some of the numbers like connections and total requests if thats what you mean.

@Ethan-Arrowood

Copy link
Copy Markdown
Collaborator Author

Okay it looks like it reproduces with a single connection and only a couple requests per batch. Are you looking for something different?

@metcoder95

Copy link
Copy Markdown
Member

Does this happens only on batches or can be reproduced by attempting a single request before the keep alive timeout reaches?

Have you tried attaching a listener to pool for connectionError to see if the socket times out in between?

Will try to spend some time over the week on this

@Ethan-Arrowood

Ethan-Arrowood commented Jul 14, 2025

Copy link
Copy Markdown
Collaborator Author

Okay so it works fine with just Client. It still fails with a singular request (just change batchsize to 1), and I'm not seeing any connectionErrors get emitted. I did try the disconnect event and that fires (obviously) and all its telling me is that after the first request, it gets an "other side closed" error. I don't know how this could be the server's fault since it works fine for a regular Client. Is pool not maintaining the client connections the same way?

It seems like it simply boils down to Pool does not respect keep alive

Its not so simple it turns out... after more testing it seems like a Pool configured to a single connection, with two requests going through it, the second firing half the timeout after the first, sometimes it reuses the connection, sometimes it doesn't.

For very short values like 2s it is flaky. For values from 4s to 10s it seems to consistently respect the keep-alive, but after 20s it seems to always fail.

I'm always setting the max timeout to double the timeout value.

Client always respects keep alive no matter the timing.


I debugged and at least this part: https://github.com/nodejs/undici/blob/bug-pool-keep-alive/lib/dispatcher/pool.js#L99 is always returning the existing client for the second request. But yet the connection is not being reused.

I added some log lines around the parser timeout (https://github.com/nodejs/undici/blob/bug-pool-keep-alive/lib/dispatcher/client-h1.js#L744) and I don't see it timing out. So its gotta be something else.

Does anyone else have a better idea? I'm going to keep trying here but would love second set of eyes.

@Ethan-Arrowood

Copy link
Copy Markdown
Collaborator Author

I've been chasing ghosts. And I feel pretty dumb about it. I was missing the fact that Node.js HTTP server has a 5s default keep alive timeout (keepAliveTimeout property here https://nodejs.org/api/http.html#httpcreateserveroptions-requestlistener). So it was the Node.js server all along that was shutting down connections.

My test code verifying Client was invalid (i had the second request happening outside the setTimeout and didn't realize until further review).

So there actually was no difference between Client and Server, and the specific timings I was seeing had to do with that 5s default on the server side.

All in all Pool does respect keep alive just like Client does, just make sure the server respects it as well!

@Ethan-Arrowood

Copy link
Copy Markdown
Collaborator Author

Apologies for the noise.

@Ethan-Arrowood Ethan-Arrowood deleted the bug-pool-keep-alive branch July 15, 2025 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants