Stateless mode: per-request McpServer+Protocol allocation causes memory leak at scale #2090

@RomKadria

Problem

In stateless mode (sessionIdGenerator: undefined), the recommended pattern — including the SDK's own simpleStatelessStreamableHttp.ts example — creates a full McpServer + Protocol + StreamableHTTPServerTransport on every HTTP request:

app.post('/mcp', async (req, res) => {
    const server = getServer();  // new McpServer per request
    const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
    await server.connect(transport);
    await transport.handleRequest(req, res, req.body);
    res.on('close', () => { transport.close(); server.close(); });
});

Each request allocates:

  • McpServer → Server → Protocol: 9 Maps/Sets (_requestHandlers, _responseHandlers, _progressHandlers, _notificationHandlers, _requestHandlerAbortControllers, _timeoutInfo, _pendingDebouncedNotifications, _taskProgressTokens, _requestResolvers), plus _loggingLevels Map
  • Server: new AjvJsonSchemaValidator (compiles JSON schemas)
  • StreamableHTTPServerTransport → WebStandardStreamableHTTPServerTransport: 3 Maps (_streamMapping, _requestToStreamMapping, _requestResponseMap), plus getRequestListener from @hono/node-server

This works fine for low-traffic dev/demo scenarios. But for production HTTP servers handling sustained concurrent traffic, V8's GC can't reclaim these objects fast enough, causing steady memory growth until OOMKill.

Real-world impact

We run an MCP server (platform-mcp-gateway) in production on Kubernetes with a 1200Mi memory limit. With this pattern, memory grew ~1-2% per hour until it hit the limit, triggering repeated OOMKill alerts. The service has been running for months; this is a slow leak, not a burst.

Benchmark

We benchmarked the per-request McpServer approach vs. a lightweight JSON-RPC dispatcher that reuses the same handler functions (2,000 requests, --expose-gc):

| Metric | McpServer per request | Lightweight dispatcher | Delta |
| --- | --- | --- | --- |
| Throughput | 2,797 req/s | 6,536 req/s | 2.3x faster |
| Heap growth | +3.78 MB | +1.41 MB | 2.7x less |
| Per-request retained | ~1,984 bytes | ~738 bytes | -63% |
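
For anyone who wants to reproduce this kind of comparison, a harness along the following lines could be used. It is a minimal sketch, not the harness behind the numbers above: `measure` and `Measurement` are hypothetical names, requests are issued sequentially rather than concurrently, and it assumes Node.js started with `--expose-gc` (without the flag the forced collections are skipped and the heap deltas become noisier).

```typescript
// Hypothetical measurement helper; not part of @modelcontextprotocol/sdk.
interface Measurement {
    label: string;
    requestsPerSecond: number;
    heapGrowthBytes: number;
    retainedPerRequestBytes: number;
}

async function measure(
    label: string,
    iterations: number,
    handleOne: () => Promise<void>
): Promise<Measurement> {
    // globalThis.gc only exists when Node.js runs with --expose-gc.
    globalThis.gc?.();
    const heapBefore = process.memoryUsage().heapUsed;
    const startedAt = performance.now();

    for (let i = 0; i < iterations; i++) {
        await handleOne(); // one simulated request
    }

    const elapsedSeconds = (performance.now() - startedAt) / 1000;
    // Collect again so that only memory still reachable is counted as growth.
    globalThis.gc?.();
    const heapGrowthBytes = process.memoryUsage().heapUsed - heapBefore;

    return {
        label,
        requestsPerSecond: iterations / elapsedSeconds,
        heapGrowthBytes,
        retainedPerRequestBytes: heapGrowthBytes / iterations,
    };
}
```

Calling `measure` once with a callback that builds a server and transport per request, and once with a callback that goes through a shared dispatcher, gives the two columns to compare.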

Why you can't just reuse a McpServer

The obvious fix — share one McpServer across concurrent requests — doesn't work because Protocol.connect(transport) replaces this._transport. If request A and B overlap:

  1. connect(transportA) → sets this._transport = transportA
  2. connect(transportB) → sets this._transport = transportB
  3. Request A's onmessage fires → _onrequest captures this._transport (now transportB) → response goes to wrong client
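
The sequence above can be reproduced in isolation. The sketch below uses hypothetical stand-ins (`FakeTransport`, `FakeProtocol`, `routeReplies`), not the SDK classes, and assumes only what the steps describe: `connect()` overwrites a shared field and the handler reads that field when a message arrives. For contrast, it also includes a variant that replies on the transport captured in the callback's closure.

```typescript
// Hypothetical stand-ins that mimic only the shape of the race; not SDK code.
class FakeTransport {
    readonly sent: string[] = [];
    onmessage?: (message: string) => void;

    send(message: string): void {
        this.sent.push(message);
    }
}

class FakeProtocol {
    private current?: FakeTransport;
    private readonly useClosureTransport: boolean;

    constructor(useClosureTransport: boolean) {
        this.useClosureTransport = useClosureTransport;
    }

    connect(transport: FakeTransport): void {
        // Each connect() overwrites the previously connected transport.
        this.current = transport;
        transport.onmessage = (message) => {
            // Shared-field variant: replies on whichever transport connected last.
            // Closure variant: replies on the transport that received the message.
            const target = this.useClosureTransport ? transport : this.current;
            target?.send(`reply to ${message}`);
        };
    }
}

function routeReplies(useClosureTransport: boolean): { a: string[]; b: string[] } {
    const shared = new FakeProtocol(useClosureTransport);
    const transportA = new FakeTransport();
    const transportB = new FakeTransport();

    shared.connect(transportA); // request A arrives
    shared.connect(transportB); // request B overlaps and replaces the field
    transportA.onmessage?.('request A');

    return { a: transportA.sent, b: transportB.sent };
}

const viaSharedField = routeReplies(false); // reply to A ends up on transport B
const viaClosure = routeReplies(true); // reply to A stays on transport A
```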

Suggestions

  1. Lightweight stateless mode: for stateless servers, the full Protocol/Transport stack is overkill — there's no session state, no SSE streaming needed, no server-initiated notifications. A StatelessMcpServer (or a flag on McpServer) could skip all the per-request infrastructure and just dispatch JSON-RPC directly.

  2. Fix the connect() transport race: if _onrequest captured the transport from the onmessage callback's closure (the transport that received the message) instead of from this._transport, a single McpServer could safely handle concurrent stateless requests.

  3. At minimum, document the trade-off: the stateless example should note that creating a server per request has significant overhead at scale and suggest alternatives for production deployments.
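
To make suggestion 1 concrete, here is a minimal sketch of what a direct JSON-RPC dispatch path might look like. Everything in it is an assumption for illustration: `dispatch`, `handlers`, and the type names are hypothetical and not part of the SDK, and the two registered methods return placeholder results. The point is only that the handler table is built once and nothing is allocated per request beyond the response object.

```typescript
// Hypothetical stateless dispatcher; none of these names exist in the SDK.
type JsonRpcId = string | number | null;

interface JsonRpcRequest {
    jsonrpc: '2.0';
    id?: JsonRpcId; // absent for notifications
    method: string;
    params?: unknown;
}

interface JsonRpcResponse {
    jsonrpc: '2.0';
    id: JsonRpcId;
    result?: unknown;
    error?: { code: number; message: string };
}

type MethodHandler = (params: unknown) => Promise<unknown> | unknown;

// Built once at startup and shared by every request; holds no per-request state.
const handlers = new Map<string, MethodHandler>([
    ['ping', () => ({})],
    ['tools/list', () => ({ tools: [] })], // placeholder result
]);

async function dispatch(request: JsonRpcRequest): Promise<JsonRpcResponse | undefined> {
    const isNotification = request.id === undefined;
    const id = request.id ?? null;
    const handler = handlers.get(request.method);

    if (!handler) {
        // -32601 is the JSON-RPC 2.0 "Method not found" code.
        return isNotification
            ? undefined
            : { jsonrpc: '2.0', id, error: { code: -32601, message: `Method not found: ${request.method}` } };
    }

    try {
        const result = await handler(request.params);
        // Notifications never get a response.
        return isNotification ? undefined : { jsonrpc: '2.0', id, result };
    } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        // -32603 is the JSON-RPC 2.0 "Internal error" code.
        return isNotification ? undefined : { jsonrpc: '2.0', id, error: { code: -32603, message } };
    }
}
```

An HTTP route would parse the body, call `dispatch`, and write the returned object as JSON (or an empty acknowledgement when it is `undefined`). A real implementation would also need request validation, batching, and the initialize handshake, all of which this sketch leaves out.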

Environment

  • @modelcontextprotocol/sdk: 1.29.0
  • Node.js: 24.x
  • Runtime: Kubernetes pods (1200Mi limit, --max-old-space-size=900)
