
Sandbox SDK: startProcess stream controller closes prematurely, killing agent processes #13442

@jbbjbb


Bug Report: Sandbox SDK Stream Controller Premature Closure

Environment

  • @cloudflare/sandbox via Cloudflare Workers
  • Sandbox container running OpenClaw AI agents (Node.js processes)
  • Bun-based container runtime (/$bunfs/root/sandbox)

Description

When running long-lived processes via sandbox.startProcess() with onOutput callbacks, the internal ReadableStream controller closes prematurely while the child process is still streaming output. This causes:

  1. A TypeError, "Invalid state: Controller is already closed", is thrown inside the SDK's Bun-based container runtime
  2. The onOutput callback stops receiving data
  3. The command is reported as "Failed to execute streaming command"
  4. Subsequent SDK operations (containerFetch, exec) become unreliable because the SDK bridge enters a degraded state
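For context, this TypeError is the standard Web Streams error raised when code enqueues into a controller that has already been closed, or whose stream the consumer has cancelled. The sketch below is plain TypeScript with no SDK code. It assumes, without having confirmed it from the SDK source, that the runtime fans output into a controller from a loop (the stack trace passes through forEach). SafeEmitter and demo() are hypothetical names; the sketch reproduces the error class via a consumer-side cancel and shows one way such a loop could report the failure once instead of throwing.

```typescript
// Sketch only: standard Web Streams, no @cloudflare/sandbox code.
// SafeEmitter and demo() are hypothetical names used for illustration.

class SafeEmitter {
  private closed = false;
  private readonly controller: ReadableStreamDefaultController<string>;
  private readonly onError: (err: unknown) => void;

  constructor(controller: ReadableStreamDefaultController<string>, onError: (err: unknown) => void) {
    this.controller = controller;
    this.onError = onError;
  }

  // Enqueue a chunk. Returns false instead of throwing; the first failure is
  // reported through onError and every later call becomes a no-op.
  enqueue(chunk: string): boolean {
    if (this.closed) return false;
    try {
      this.controller.enqueue(chunk);
      return true;
    } catch (err) {
      this.closed = true;
      this.onError(err);
      return false;
    }
  }

  // Close once; ignore the error if the consumer already cancelled the stream.
  close(): void {
    if (this.closed) return;
    this.closed = true;
    try {
      this.controller.close();
    } catch {
      // Stream already closed or cancelled; nothing left to do.
    }
  }
}

// Reproduces the error class: the consumer cancels while output keeps arriving.
async function demo(): Promise<{ rawError: unknown; results: boolean[]; reported: unknown[] }> {
  let controller!: ReadableStreamDefaultController<string>;
  const stream = new ReadableStream<string>({
    start(c) {
      controller = c; // start() runs synchronously inside the constructor
    },
  });
  await stream.cancel("consumer went away");

  // Unguarded enqueue throws a TypeError, as in the issue's stack trace.
  let rawError: unknown = null;
  try {
    controller.enqueue("late output");
  } catch (err) {
    rawError = err;
  }

  // Guarded fan-out over several chunks: no throw, a single report.
  const reported: unknown[] = [];
  const emitter = new SafeEmitter(controller, (err) => reported.push(err));
  const results = ["a", "b", "c"].map((chunk) => emitter.enqueue(chunk));
  return { rawError, results, reported };
}
```

Whether the SDK's controller is closed by a consumer-side cancel or by something else is not established by this sketch; it only shows that either path yields the same TypeError.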

Reproduction

This occurs consistently when:

  • A startProcess() command runs for more than 2 minutes while producing continuous output
  • Multiple concurrent startProcess() calls are active (e.g., a parent agent spawns a child agent via OpenClaw's sessions_spawn)
  • The container is under memory pressure (roughly 1.4 GB or more of 5.4 GB in use)

Approximate frequency: every 30-80 minutes during sustained agent activity.

Error Details

Two correlated errors appear in Cloudflare container logs:

Error 1: Background streaming failure

{
  "level": "error",
  "component": "container",
  "originalError": "Invalid state: Controller is already closed",
  "error": {
    "message": "Invalid state: Controller is already closed",
    "stack": "TypeError: Invalid state: Controller is already closed\n    at unknown\n    at <anonymous> (native:1:11)\n    at _ (/$bunfs/root/sandbox:26:12)\n    at <anonymous> (/$bunfs/root/sandbox:50:9084)\n    at forEach (native:1:11)\n    at <anonymous> (/$bunfs/root/sandbox:50:9076)\n    at <anonymous> (/$bunfs/root/sandbox:159:5464)\n    at processTicksAndRejections (native:7:39)"
  },
  "$metadata": {
    "error": "Error during background streaming",
    "message": "Error during background streaming",
    "type": "cf-container"
  }
}

Error 2: Command execution failure (same timestamp)

{
  "level": "error",
  "component": "container",
  "command": "openclaw agent --agent 'clarity-ordinator' --message '...' --json",
  "error": {
    "message": "Invalid state: Controller is already closed",
    "stack": "(same as above)"
  },
  "$metadata": {
    "error": "Failed to execute streaming command",
    "message": "Failed to execute streaming command",
    "type": "cf-container"
  }
}
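Both failures carry the same error message and differ only in $metadata.message, so alerting or a circuit breaker can key on those fields. A minimal sketch, assuming each log entry is available as a JSON string shaped like the two examples above; ContainerLogEntry, classifyStreamFailure and classifyLogLine are hypothetical names, and only the field names and string values come from the logs.

```typescript
// Sketch: classify container log entries shaped like the two examples above.
// Field names and strings come from those entries; the helpers are hypothetical.

interface ContainerLogEntry {
  level?: string;
  component?: string;
  command?: string;
  originalError?: string;
  error?: { message?: string; stack?: string };
  $metadata?: { error?: string; message?: string; type?: string };
}

type StreamFailureKind = "background-streaming" | "streaming-command" | "other" | null;

const CLOSED_CONTROLLER = "Controller is already closed";

function classifyStreamFailure(entry: ContainerLogEntry): StreamFailureKind {
  if (entry.level !== "error") return null;
  const message = entry.error?.message ?? entry.originalError ?? "";
  if (!message.includes(CLOSED_CONTROLLER)) return null;
  switch (entry.$metadata?.message) {
    case "Error during background streaming":
      return "background-streaming";
    case "Failed to execute streaming command":
      return "streaming-command";
    default:
      return "other"; // same error, unrecognized context
  }
}

// Parse one JSON log line; malformed or non-object lines are ignored.
function classifyLogLine(line: string): StreamFailureKind {
  let entry: unknown;
  try {
    entry = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof entry !== "object" || entry === null) return null;
  return classifyStreamFailure(entry as ContainerLogEntry);
}
```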

Impact

After this error:

  • The SDK bridge enters a degraded state where containerFetch() fails but exec() may still work briefly
  • Eventually the bridge becomes fully unresponsive ("bridge-dead")
  • The only way to recover is sandbox.destroy() followed by creating a fresh container
  • All in-progress agent work is lost
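The progression above (containerFetch() failing while exec() still works, then neither responding) can be expressed as a small health classifier. A sketch under stated assumptions: fetchProbe and execProbe are hypothetical caller-supplied wrappers around a cheap containerFetch() request and a trivial exec() command, since the exact SDK calls are not shown here, and the 5-second default timeout is arbitrary. A probe that hangs past the timeout counts as failed, matching the "unresponsive" behavior.

```typescript
// Sketch: classify bridge health from two hypothetical probes.
type BridgeState = "healthy" | "degraded" | "bridge-dead";

interface BridgeProbes {
  fetchProbe: () => Promise<unknown>; // e.g. wraps a cheap containerFetch() call
  execProbe: () => Promise<unknown>; // e.g. wraps a trivial exec() call
}

// Resolves true if the probe settles successfully within `ms`, false otherwise.
async function succeedsWithin(probe: () => Promise<unknown>, ms: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), ms);
  });
  try {
    // Promise.resolve().then(probe) also turns a synchronous throw into `false`.
    const outcome = Promise.resolve().then(probe).then(() => true, () => false);
    return await Promise.race([outcome, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function classifyBridge(probes: BridgeProbes, timeoutMs = 5_000): Promise<BridgeState> {
  const [fetchOk, execOk] = await Promise.all([
    succeedsWithin(probes.fetchProbe, timeoutMs),
    succeedsWithin(probes.execProbe, timeoutMs),
  ]);
  if (fetchOk && execOk) return "healthy";
  if (fetchOk || execOk) return "degraded"; // e.g. containerFetch() fails, exec() still works
  return "bridge-dead";
}
```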

What we'd like

  1. An onError callback in the startProcess() options, so callers can detect streaming failures and handle them gracefully (log the failure, mark the task as failed, and stop waiting for onExit)
  2. Stream controller resilience: if the controller closes, the error should be surfaced to the caller instead of silently breaking the bridge
  3. Graceful degradation: a single stream failure should not poison the entire SDK bridge for subsequent operations
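To make the first request concrete, this is a sketch of the semantics we have in mind, written against a hypothetical callback shape rather than the real startProcess() signature: onError does not exist today, and the actual option types may differ. ProposedProcessCallbacks, TaskOutcome and runTask are hypothetical names. The key property is that whichever of onExit or onError fires first settles the task, so a stream failure never leaves the caller waiting on an onExit that may not arrive.

```typescript
// Hypothetical option shape for the requested feature; onError is NOT a current SDK option.
interface ProposedProcessCallbacks {
  onOutput?: (chunk: string) => void;
  onExit?: (exitCode: number | null) => void;
  onError?: (error: unknown) => void;
}

type TaskOutcome =
  | { status: "exited"; exitCode: number | null; output: string }
  | { status: "stream-failed"; error: unknown; output: string };

// `start` stands in for a call that launches the process with these callbacks.
function runTask(start: (callbacks: ProposedProcessCallbacks) => void): Promise<TaskOutcome> {
  return new Promise<TaskOutcome>((resolve) => {
    let output = "";
    let settled = false;
    const settle = (outcome: TaskOutcome) => {
      if (settled) return; // first of onExit/onError wins; later calls are ignored
      settled = true;
      resolve(outcome);
    };
    start({
      onOutput: (chunk) => {
        if (!settled) output += chunk;
      },
      onExit: (exitCode) => settle({ status: "exited", exitCode, output }),
      // A streaming failure settles immediately instead of waiting for onExit.
      onError: (error) => settle({ status: "stream-failed", error, output }),
    });
  });
}
```

A caller would then branch on outcome.status, for example marking the agent task as failed and keeping the partial output for debugging.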

Workaround

We've mitigated the worst case by:

  • Denying OpenClaw's sessions_spawn tool to prevent concurrent startProcess() calls
  • Running agents serially through a queue (max_batch_size=1)
  • A circuit breaker that destroys the container after detecting the degraded state

However, the underlying stream controller issue still causes failures every 30-80 minutes during any sustained agent activity.
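The queue and circuit-breaker mitigations above can be approximated in application code. A minimal sketch, assuming the agent runner is plain TypeScript: SerialQueue and ContainerCircuitBreaker are hypothetical names, the threshold is illustrative, and recreate is a caller-supplied function standing in for sandbox.destroy() plus fresh container creation. The queue mirrors max_batch_size=1 by chaining each task onto the previous one, so at most one startProcess()-driven task runs at a time.

```typescript
// Hypothetical helpers; names and thresholds are illustrative.

// Runs tasks one at a time, in submission order (like max_batch_size=1).
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(() => task());
    // Keep the chain alive even if this task fails.
    this.tail = run.then(
      () => undefined,
      () => undefined,
    );
    return run;
  }
}

// Recreates the container after `threshold` consecutive unhealthy checks.
class ContainerCircuitBreaker {
  private consecutiveFailures = 0;
  private readonly threshold: number;
  private readonly recreate: () => Promise<void>;

  constructor(threshold: number, recreate: () => Promise<void>) {
    this.threshold = Math.max(1, threshold);
    this.recreate = recreate;
  }

  // Record one health check; returns true if the container was recreated.
  async record(healthy: boolean): Promise<boolean> {
    if (healthy) {
      this.consecutiveFailures = 0;
      return false;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures < this.threshold) return false;
    this.consecutiveFailures = 0;
    await this.recreate(); // e.g. sandbox.destroy() followed by a fresh container
    return true;
  }
}
```

A threshold of 1 matches "destroy on first detection"; a higher value trades slower recovery for fewer needless restarts on a transient probe failure.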
