Streamed responses fail under concurrent load with "Response object has been garbage collected" #1659

@meitalbensinai

Symptom

Under concurrent streaming load, gateway logs show:

```
Error during stream processing: openai [ undefined, 'Response object has been garbage collected' ]
Failed to close the writer: openai TypeError [ERR_INVALID_STATE]: Invalid state: WritableStream is closed
```

Downstream consumers (Node fetch / undici / OpenAI SDK) see:

```
TypeError: terminated
at Fetch.onAborted (node:internal/deps/undici/undici:...)
at TLSSocket.onHttpSocketClose (...)
cause: { name: 'SocketError', message: 'other side closed' }
```

The consumer's finishReason becomes 'other' and the stream is truncated mid-token — often missing trailing tool-call payloads, which can break agent / function-calling flows.

The failure frequency scales with concurrent stream count (allocation churn from many concurrent Responses drives V8 GC).

Root cause (brief)

In src/handlers/streamHandler.ts, handleStreamingMode references the upstream Response only via response.body.getReader(). The async IIFE that pipes upstream → writer captures reader and writer, but not response itself. After the function returns, the caller holds only the new Response wrapping readable, so the upstream response becomes unreachable. Node's ReadableStreamDefaultReader doesn't keep its parent Response alive, and the Response is what owns the network connection, so V8 GC can collect it mid-stream, and the next reader.read() throws.

The unawaited IIFE promise is also unanchored, a secondary GC hazard.
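
To make the shape of the problem concrete, here is a minimal sketch of the pattern described above. It is not the gateway's actual code: handleStreamingModeSketch is a hypothetical name, the response transformer is omitted, and it assumes Node 18+ globals (Response, TransformStream).

```typescript
// Illustrative sketch only; NOT the real handleStreamingMode.
function handleStreamingModeSketch(response: Response): Response {
  if (!response.body) throw new Error("upstream response has no body");

  // The only thing kept from the upstream Response is its reader.
  const reader = response.body.getReader();
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  const writer = writable.getWriter();

  // Unawaited async IIFE: closes over reader and writer, but not response.
  // The promise it returns is not stored anywhere either.
  (async () => {
    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        await writer.write(value);
      }
    } catch (error) {
      console.error("Error during stream processing:", error);
    } finally {
      try {
        await writer.close();
      } catch (closeError) {
        console.error("Failed to close the writer:", closeError);
      }
    }
  })();

  // After this return, nothing in this function's result refers to `response`.
  return new Response(readable);
}
```

With a single short stream this pipes data through without trouble, which is why the problem only shows up under load: the sketch is functionally correct, it just leaves the upstream Response with no strong reference.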

Related prior work

PR #1306 ("fix: handle stream close failures", merged 2025-09-03) explicitly names this error in its description but its scope was wrapping the secondary writer.close() failure in try/catch. That stops the unhandled rejection from crashing Node, but does not prevent the primary GC of the upstream Response. The stream is still lost.
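
For reference, a guard of the kind described above might look roughly like the following. This is a sketch, not the code from PR #1306, and closeQuietly is a hypothetical helper name.

```typescript
// Hypothetical helper: swallow the secondary failure from closing a writer
// whose stream is already closed or errored.
async function closeQuietly(
  writer: WritableStreamDefaultWriter<Uint8Array>,
): Promise<boolean> {
  try {
    await writer.close();
    return true; // closed cleanly
  } catch {
    // e.g. "Invalid state: WritableStream is closed"
    return false;
  }
}
```

This keeps the rejection from going unhandled, but by the time it runs the upstream read has already failed, so the bytes that were still in flight are gone.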

Proposed fix

PR #1658 — anchor the upstream response, reader, writer, and the piping promise on the returned readable so the caller's Response keeps the entire chain alive for the stream's lifetime.
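
A rough sketch of the anchoring idea follows. It is an illustration under assumptions, not the diff in PR #1658: the symbol kUpstreamAnchor, the UpstreamAnchor shape, and anchoredReadable are all hypothetical names, and the response transformer is again left out.

```typescript
// Hypothetical property key used to hang the upstream objects off the readable.
const kUpstreamAnchor = Symbol("upstreamAnchor");

interface UpstreamAnchor {
  response: Response;
  reader: ReadableStreamDefaultReader<Uint8Array>;
  writer: WritableStreamDefaultWriter<Uint8Array>;
  pump: Promise<void>;
}

function anchoredReadable(response: Response): ReadableStream<Uint8Array> {
  if (!response.body) throw new Error("upstream response has no body");

  const reader = response.body.getReader();
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  const writer = writable.getWriter();

  // Keep the piping promise in a variable so it can be anchored too.
  const pump = (async () => {
    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        await writer.write(value);
      }
      await writer.close();
    } catch (error) {
      // Propagate the failure downstream instead of leaving it unhandled.
      await writer.abort(error).catch(() => {});
    }
  })();

  // Strong references from the returned readable to the whole upstream chain:
  // as long as the readable is reachable, so are response, reader, writer, pump.
  const anchor: UpstreamAnchor = { response, reader, writer, pump };
  Object.defineProperty(readable, kUpstreamAnchor, {
    value: anchor,
    enumerable: false,
  });

  return readable;
}
```

The caller would wrap the result as new Response(anchoredReadable(upstream)). The design choice is that reachability flows one way: whoever holds the outgoing Response holds its body stream, and the body stream holds everything upstream.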

Reproducing

Open enough concurrent streamed /v1/chat/completions requests against any provider (we observed it most clearly with OpenAI-compatible providers at 30+ concurrent in-flight streams) and watch gateway logs for the GC error. Frequency depends on Node version, stream durations, and how much per-stream allocation churn the response transformer adds.
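
A small harness along these lines can drive the load and tally what the consumer sees. It is a sketch: runConcurrentStreams, drain, and StreamOutcome are hypothetical names, the request itself is injected so the harness makes no assumptions about the deployment, and treating a missing `data: [DONE]` line as truncation assumes the provider terminates its SSE stream the way OpenAI does.

```typescript
type StreamOutcome = "complete" | "truncated" | "errored";

// Read one streamed response to the end and classify it.
async function drain(res: Response): Promise<StreamOutcome> {
  try {
    const text = await res.text();
    // Assumption: OpenAI-style streams end with a `data: [DONE]` event.
    return text.includes("data: [DONE]") ? "complete" : "truncated";
  } catch {
    // e.g. "TypeError: terminated" when the gateway drops the socket.
    return "errored";
  }
}

// Open `count` streams at once and count the outcomes.
async function runConcurrentStreams(
  count: number,
  openStream: (index: number) => Promise<Response>,
): Promise<Record<StreamOutcome, number>> {
  const tally: Record<StreamOutcome, number> = {
    complete: 0,
    truncated: 0,
    errored: 0,
  };
  const outcomes = await Promise.all(
    Array.from({ length: count }, async (_, i): Promise<StreamOutcome> => {
      try {
        return await drain(await openStream(i));
      } catch {
        return "errored";
      }
    }),
  );
  for (const outcome of outcomes) tally[outcome] += 1;
  return tally;
}
```

Pointed at a real gateway, openStream would be a fetch POST to its /v1/chat/completions endpoint with stream: true in the body; the host, headers, and model are deployment-specific and not shown here. A non-zero truncated or errored count at 30+ streams, alongside the GC error in the gateway logs, would match the symptom above.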

Environment

  • Node 20.x (newer Node versions free unreferenced HTTP resources more aggressively, which makes this worse)
  • Gateway main branch (also present on 1.15.x and older releases — same code structure)
  • Any streaming OpenAI-compatible provider
