Stateless mode: per-request McpServer+Protocol allocation causes memory leak at scale #2090

## Problem

In stateless mode (`sessionIdGenerator: undefined`), the recommended pattern — including the SDK's own `simpleStatelessStreamableHttp.ts` example — creates a full `McpServer` + `Protocol` + `StreamableHTTPServerTransport` on every HTTP request:

```typescript
app.post('/mcp', async (req, res) => {
    const server = getServer();  // new McpServer per request
    const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
    await server.connect(transport);
    await transport.handleRequest(req, res, req.body);
    res.on('close', () => { transport.close(); server.close(); });
});
```

Each request allocates:
- `McpServer` → `Server` → `Protocol`: **9 Maps/Sets** (`_requestHandlers`, `_responseHandlers`, `_progressHandlers`, `_notificationHandlers`, `_requestHandlerAbortControllers`, `_timeoutInfo`, `_pendingDebouncedNotifications`, `_taskProgressTokens`, `_requestResolvers`), plus `_loggingLevels` Map
- `Server`: new `AjvJsonSchemaValidator` (compiles JSON schemas)
- `StreamableHTTPServerTransport` → `WebStandardStreamableHTTPServerTransport`: **3 Maps** (`_streamMapping`, `_requestToStreamMapping`, `_requestResponseMap`), plus `getRequestListener` from `@hono/node-server`

**This works fine for low-traffic dev/demo scenarios.** Under sustained concurrent traffic on a production HTTP server, however, V8's GC does not reclaim these objects fast enough in our experience, and memory grows steadily until the process is OOMKilled.

## Real-world impact

We run an MCP server (`platform-mcp-gateway`) in production on Kubernetes with a 1200Mi memory limit. With this pattern, memory grew ~1-2% per hour until it hit the limit, triggering repeated OOMKill alerts. The service has been running for months, so this is a slow leak, not a burst.

## Benchmark

We benchmarked the per-request `McpServer` approach vs. a lightweight JSON-RPC dispatcher that reuses the same handler functions (2,000 requests, `--expose-gc`):

| Metric | McpServer per request | Lightweight dispatcher | Delta |
|:--|--:|--:|:--|
| Throughput | 2,797 req/s | 6,536 req/s | 2.3x faster |
| Heap growth | +3.78 MB | +1.41 MB | 2.7x less |
| Per-request retained | ~1,984 bytes | ~738 bytes | -63% |
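
As a sanity check on the table, the per-request figures are roughly the heap growth divided by the 2,000 requests. The sketch below assumes "MB" in the table means MiB (1,048,576 bytes); the benchmark output does not say, so treat that as an assumption. `retainedPerRequest` is a hypothetical helper name, not part of the benchmark.

```typescript
// Assumption: the heap figures in the table are in MiB.
const BYTES_PER_MIB = 1024 * 1024;

// Hypothetical helper: average retained bytes per request.
function retainedPerRequest(heapGrowthMiB: number, requests: number): number {
    return Math.round((heapGrowthMiB * BYTES_PER_MIB) / requests);
}

const perRequestMcpServer = retainedPerRequest(3.78, 2000);  // 1982
const perRequestDispatcher = retainedPerRequest(1.41, 2000); // 739

console.log(perRequestMcpServer, perRequestDispatcher);
```

Both results land within a few bytes of the reported ~1,984 and ~738, which is what you would expect if the heap figures were rounded to two decimals.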

## Why you can't just reuse a McpServer

The obvious fix — share one `McpServer` across concurrent requests — doesn't work because `Protocol.connect(transport)` replaces `this._transport`. If request A and B overlap:

1. `connect(transportA)` → sets `this._transport = transportA`
2. `connect(transportB)` → sets `this._transport = transportB`
3. Request A's `onmessage` fires → `_onrequest` captures `this._transport` (now `transportB`) → response goes to wrong client
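
The sequence above can be reproduced with a toy model. This is only a sketch: `ToyTransport`, `SharedFieldProtocol`, and `ClosureProtocol` are hypothetical stand-ins, not SDK classes, and they ignore all the other state the real `Protocol` keeps. The first class reads a shared field at dispatch time, as described above; the second shows what capturing the receiving transport in the callback's closure (suggestion 2 below) might look like.

```typescript
// Toy model only: these types are hypothetical stand-ins, not SDK classes.
interface ToyTransport {
    sent: string[];
    onmessage?: (msg: string) => void;
}

// Mirrors the behaviour described above: replies go to whichever
// transport was connected last.
class SharedFieldProtocol {
    private transport?: ToyTransport;

    connect(transport: ToyTransport): void {
        this.transport = transport; // overwrites the previous transport
        transport.onmessage = (msg) => this.onRequest(msg);
    }

    private onRequest(msg: string): void {
        // Reads the shared field, not the transport that received msg.
        this.transport?.sent.push(`reply to ${msg}`);
    }
}

// Variant: the callback closes over the transport that received the message.
class ClosureProtocol {
    connect(transport: ToyTransport): void {
        transport.onmessage = (msg) => this.onRequest(msg, transport);
    }

    private onRequest(msg: string, origin: ToyTransport): void {
        origin.sent.push(`reply to ${msg}`);
    }
}

// Replays steps 1-3: connect A, connect B, then deliver A's message.
function overlap(protocol: { connect(t: ToyTransport): void }): [ToyTransport, ToyTransport] {
    const a: ToyTransport = { sent: [] };
    const b: ToyTransport = { sent: [] };
    protocol.connect(a);
    protocol.connect(b);
    a.onmessage?.('request-A');
    return [a, b];
}

const [racyA, racyB] = overlap(new SharedFieldProtocol());
const [safeA, safeB] = overlap(new ClosureProtocol());

console.log(racyA.sent.length, racyB.sent); // A got nothing; B holds A's reply
console.log(safeA.sent, safeB.sent.length); // A holds its own reply; B got nothing
```

Whether the closure variant is sufficient in the real SDK depends on the rest of `Protocol`'s per-connection state, which this model does not cover.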

## Suggestions

1. **Lightweight stateless mode**: for stateless servers, the full Protocol/Transport stack is overkill — there's no session state, no SSE streaming needed, no server-initiated notifications. A `StatelessMcpServer` (or a flag on `McpServer`) could skip all the per-request infrastructure and just dispatch JSON-RPC directly.

2. **Fix the `connect()` transport race**: if `_onrequest` captured the transport from the `onmessage` callback's closure (the transport that received the message) instead of from `this._transport`, a single McpServer could safely handle concurrent stateless requests.

3. **At minimum, document the trade-off**: the stateless example should note that creating a server per request has significant overhead at scale and suggest alternatives for production deployments.
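
To make suggestion 1 concrete, here is a minimal sketch of what direct JSON-RPC dispatch could look like. None of these names (`handlers`, `dispatch`, the request/response interfaces) are SDK APIs; they are hypothetical, and the two registered methods are placeholders. The sketch deliberately omits the `initialize` handshake, schema validation, and batching, so it is an illustration of the shape, not a drop-in replacement.

```typescript
// Hypothetical sketch: none of these names are SDK APIs.
type JsonRpcId = string | number | null;

interface JsonRpcRequest {
    jsonrpc: '2.0';
    id?: JsonRpcId; // absent for notifications
    method: string;
    params?: unknown;
}

interface JsonRpcResponse {
    jsonrpc: '2.0';
    id: JsonRpcId;
    result?: unknown;
    error?: { code: number; message: string };
}

type Handler = (params: unknown) => Promise<unknown> | unknown;

// Built once at startup and shared by every request, so nothing
// is allocated per request beyond the response object itself.
const handlers = new Map<string, Handler>([
    ['ping', () => ({})],
    ['tools/list', () => ({ tools: [] })], // placeholder tool list
]);

async function dispatch(req: JsonRpcRequest): Promise<JsonRpcResponse | undefined> {
    const isNotification = req.id === undefined;
    const id = req.id ?? null;
    const handler = handlers.get(req.method);

    if (!handler) {
        // -32601 is JSON-RPC 2.0 "Method not found"; notifications get no reply.
        return isNotification
            ? undefined
            : { jsonrpc: '2.0', id, error: { code: -32601, message: `Method not found: ${req.method}` } };
    }

    try {
        const result = await handler(req.params);
        return isNotification ? undefined : { jsonrpc: '2.0', id, result };
    } catch (err) {
        // -32603 is JSON-RPC 2.0 "Internal error".
        const message = err instanceof Error ? err.message : String(err);
        return isNotification ? undefined : { jsonrpc: '2.0', id, error: { code: -32603, message } };
    }
}
```

In an Express route this would be called with the parsed body, returning the response as JSON or an empty 202 when `dispatch` resolves to `undefined`. The design point is that the handler map is the only long-lived state, so concurrent requests share nothing mutable.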

## Environment

- `@modelcontextprotocol/sdk`: 1.29.0
- Node.js: 24.x
- Runtime: Kubernetes pods (1200Mi limit, `--max-old-space-size=900`)
