fix: wait for pending requests before aborting transports on close #1692

travisbreaks wants to merge 2 commits into modelcontextprotocol:main
Conversation
When close() is called on SSEClientTransport or StreamableHTTPClientTransport, the immediate AbortController.abort() causes Undici to mark in-flight requests as failed (UND_ERR_ABORTED), even when the server has already returned 200 OK. This breaks OpenTelemetry traces by misclassifying successful requests as errors. Track pending send() requests and wait for them to settle (with a 2s timeout to prevent hanging) before calling abort(). This allows successful responses to complete processing, so Undici/OpenTelemetry sees the actual 200 OK instead of an abort error. Stuck requests are still cleaned up after the timeout. Fixes modelcontextprotocol#1231
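The approach described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the class name `GracefulTransport` and the stubbed `_doSend()` are invented for the example, and the real transports would pass `this._abortController.signal` to `fetch()`.

```typescript
// Minimal sketch of the graceful-close pattern: track in-flight sends,
// let them settle (bounded by a timeout) before aborting the transport.
class GracefulTransport {
  private _pendingRequests = new Set<Promise<unknown>>();
  private _abortController = new AbortController();

  async send(message: string): Promise<void> {
    const promise = this._doSend(message);
    // Store a rejection-swallowed copy so close() can await the Set
    // without triggering unhandled-rejection warnings.
    const tracked = promise.catch(() => {});
    this._pendingRequests.add(tracked);
    tracked.finally(() => this._pendingRequests.delete(tracked));
    return promise; // callers still observe the original rejection
  }

  private async _doSend(_message: string): Promise<void> {
    // Stand-in for the real fetch() POST; resolves after a short delay.
    await new Promise(resolve => setTimeout(resolve, 10));
  }

  async close(): Promise<void> {
    if (this._pendingRequests.size > 0) {
      const GRACEFUL_CLOSE_TIMEOUT = 2000;
      let timer: ReturnType<typeof setTimeout> | undefined;
      await Promise.race([
        Promise.allSettled([...this._pendingRequests]),
        new Promise<void>(resolve => {
          timer = setTimeout(resolve, GRACEFUL_CLOSE_TIMEOUT);
        })
      ]);
      clearTimeout(timer); // don't leave the fallback timer running
    }
    this._abortController.abort(); // stuck requests are still cut off
  }
}
```

With this shape, a `send()` that has already received its response settles before `abort()` fires, so the HTTP layer records the real outcome rather than an abort error.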
Affected packages: @modelcontextprotocol/client, @modelcontextprotocol/server, @modelcontextprotocol/express, @modelcontextprotocol/hono, @modelcontextprotocol/node
@claude review
The approach looks sound, but this modifies core transport close behavior used by all clients and has no new tests covering the graceful-close path. Worth a human look to confirm the 2-second timeout and tracking logic are appropriate for all use cases.
Extended reasoning...
Overview
This PR modifies SSEClientTransport and StreamableHTTPClientTransport in packages/client/src/client/ to add graceful close semantics. Instead of immediately calling abort(), close() now waits up to 2 seconds for in-flight requests to settle via Promise.allSettled before aborting. The send() method is split into a public tracking wrapper and a private _doSend() implementation. Tracked promises use .catch(() => {}) to prevent unhandled rejections.
Security risks
No security concerns. The changes are purely about transport lifecycle management — no auth, crypto, or permission logic is touched. The abort signal is still called after the grace period, so there is no risk of leaving connections open indefinitely.
Level of scrutiny
High. These are core transport classes that every MCP client depends on. A 2-second close delay could affect real-world usage patterns (e.g., CLI tools, test suites, serverless functions). The PR also has no new tests covering the graceful close path, and the changeset bot flagged a missing changeset. The timer leak nit (posted as inline comment) could cause test runners to report open handles.
Other factors
- The `_endpoint!` non-null assertion in `sse.ts` `_doSend()` is safe because `send()` guards with `if (!this._endpoint)` before calling `_doSend()`.
- The tracked-promise pattern (`.catch(() => {})` + `.finally()` cleanup) correctly avoids unhandled rejections while preserving the original rejection for callers.
- No existing reviewer comments to address — this is the first substantive review.
- CODEOWNERS assigns all files to `@modelcontextprotocol/typescript-sdk` by default.
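The tracked-promise pattern called out above can be shown in isolation. This is a hedged sketch; `track` and `pending` are names invented for this example, not identifiers from the PR:

```typescript
// The Set holds a rejection-swallowed copy of each promise, so awaiting
// Promise.allSettled(pending) can never surface an unhandled rejection,
// while the caller of track() still receives the original promise and
// therefore sees the real failure.
const pending = new Set<Promise<void>>();

function track(original: Promise<void>): Promise<void> {
  const tracked = original.catch(() => {}); // settles, never rejects
  pending.add(tracked);
  tracked.finally(() => pending.delete(tracked)); // cleanup on settle
  return original; // caller observes the real outcome
}
```

The key design point is that two distinct promises exist per request: one safe to hold in a collection, one that propagates errors to the caller.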
```ts
// successful responses are not marked as aborted by Undici/OpenTelemetry.
if (this._pendingRequests.size > 0) {
  const GRACEFUL_CLOSE_TIMEOUT = 2000;
  await Promise.race([
    Promise.allSettled(this._pendingRequests),
    new Promise<void>(resolve => setTimeout(resolve, GRACEFUL_CLOSE_TIMEOUT))
  ]);
```
🟡 Nit: The setTimeout fallback in Promise.race within close() is never cleared when Promise.allSettled resolves first. In Node.js, an active timer keeps the event loop alive, so the process cannot exit for up to 2 seconds after close() completes — this will cause test runners (Jest/Vitest) to report open handles. Fix by storing the timer ID and calling clearTimeout after the race, or by using setTimeout(...).unref(). Same issue exists in streamableHttp.ts line 462.
Extended reasoning...
What the bug is
In both sse.ts (line 246) and streamableHttp.ts (line 462), the close() method uses Promise.race between Promise.allSettled(this._pendingRequests) and a setTimeout-based promise as a 2-second fallback. However, the timer ID from setTimeout is never stored, so when Promise.allSettled wins the race (the common case — pending requests complete quickly), the timer is never cleared.
Why this matters
In Node.js, an active timer handle keeps the event loop alive. This means that even after close() has resolved and all cleanup is done, the process cannot exit for up to 2 seconds while waiting for the orphaned timer to fire. This is particularly problematic for:
- Test suites: Jest and Vitest will report "open handles" warnings and may hang for 2 seconds after each test that calls `close()` with pending requests.
- CLI tools: short-lived processes will appear to hang briefly before exiting.
- Any scenario requiring prompt process exit after closing the transport.
Step-by-step proof
1. A client sends a request via `send()`, which adds a tracked promise to `_pendingRequests`.
2. The server responds quickly (say, within 50ms).
3. The caller invokes `close()`. Since `_pendingRequests.size > 0`, the code enters the `Promise.race` block.
4. `Promise.allSettled` resolves almost immediately (the request already completed).
5. `Promise.race` resolves, and `close()` proceeds to abort and clean up.
6. However, the `setTimeout(resolve, 2000)` timer is still active in the event loop — nobody stored its ID or called `clearTimeout`.
7. The process cannot exit for ~2 seconds until the timer fires.
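The steps above can be reproduced with a minimal, hypothetical helper. The timeout is made a parameter here purely so the behavior can be demonstrated with a short delay; the PR hard-codes 2000ms:

```typescript
// The race resolves as soon as the pending work settles, but the fallback
// timer's ID is discarded, so the timer stays registered in the event loop
// until it fires; that orphaned handle is what delays Node.js process exit.
async function closeWithOrphanedTimer(
  pending: Array<Promise<unknown>>,
  timeoutMs = 2000
): Promise<void> {
  await Promise.race([
    Promise.allSettled(pending),
    new Promise<void>(resolve => setTimeout(resolve, timeoutMs)) // ID lost
  ]);
  // Execution continues immediately, yet the process cannot exit
  // until the orphaned timer fires timeoutMs later.
}
```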
How to fix
The simplest fix is to use .unref() on the timer, which tells Node.js not to keep the event loop alive for it:
```ts
new Promise<void>(resolve => setTimeout(resolve, GRACEFUL_CLOSE_TIMEOUT).unref())
```

Alternatively, store the timer ID and clear it after the race:

```ts
let timerId: ReturnType<typeof setTimeout>;
const timeout = new Promise<void>(resolve => { timerId = setTimeout(resolve, GRACEFUL_CLOSE_TIMEOUT); });
await Promise.race([Promise.allSettled(this._pendingRequests), timeout]);
clearTimeout(timerId!);
```

Both approaches are straightforward and prevent the delayed-exit behavior.
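As a quick Node.js-only sanity check of the `unref()` approach, a detached timer reports `hasRef() === false`, meaning it will not keep the process alive. The helper name `makeFallback` is invented for this example:

```typescript
// Build a timeout promise whose timer is unref()'d: it still resolves the
// promise if it fires, but it no longer pins the Node.js event loop open.
import { setTimeout as schedule } from "node:timers";

function makeFallback(timeoutMs: number) {
  let resolveFn!: () => void;
  const promise = new Promise<void>(resolve => { resolveFn = resolve; });
  const timer = schedule(resolveFn, timeoutMs);
  timer.unref(); // detach the handle from the event loop
  return { promise, timer };
}
```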
Thanks for this. I think the pattern here is to await requests before closing:

```ts
const result = await client.callTool(...);
await client.close();
```

For the SIGINT / graceful-shutdown case where you don't know what's in flight, I'd rather see that as an opt-in on …
Summary
Fixes #1231.
When `close()` is called on `SSEClientTransport` or `StreamableHTTPClientTransport`, it immediately calls `AbortController.abort()`. If any `send()` POST requests are still in-flight (even if the server already returned 200 OK), Undici receives the abort signal and marks them as `UND_ERR_ABORTED`, corrupting OpenTelemetry traces and making successful requests appear failed.

Changes:

- Added a `_pendingRequests` Set to track in-flight send operations in both transport classes
- Updated `close()` to wait for pending requests to settle (`Promise.allSettled`) with a 2-second timeout before calling `abort()`
- Split `send()` into a public method that tracks the promise and a private `_doSend()` that does the actual work
- Tracked promises use `.catch(() => {})` to prevent unhandled rejections while preserving the original rejection for callers

All 386 existing tests pass.