# Bug Report: Sandbox SDK Stream Controller Premature Closure
## Environment

- `@cloudflare/sandbox` via Cloudflare Workers
- Sandbox container running OpenClaw AI agents (Node.js processes)
- Bun-based container runtime (`/$bunfs/root/sandbox`)
## Description

When running long-lived processes via `sandbox.startProcess()` with `onOutput` callbacks, the internal ReadableStream controller closes prematurely while the child process is still streaming output. This causes:

- `TypeError: Invalid state: Controller is already closed` in the SDK's internal Bun runtime
- The `onOutput` callback stops receiving data
- The process is marked as "Failed to execute streaming command"
- Subsequent SDK operations (`containerFetch`, `exec`) become unreliable; the SDK bridge enters a degraded state
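For concreteness, here is a minimal sketch of the pattern that hits this failure. The `onOutput`/`onExit` option names and the command come from this report; the `getSandbox` entry point, the binding name, the callback parameter shapes, and the `proc.id` field are assumptions about typical `@cloudflare/sandbox` usage, not verified signatures.

```ts
import { getSandbox } from "@cloudflare/sandbox";

export default {
  async fetch(request: Request, env: any): Promise<Response> {
    // "Sandbox" binding name and getSandbox() usage are assumed from typical
    // @cloudflare/sandbox setup.
    const sandbox = getSandbox(env.Sandbox, "agent-runner");

    // Long-lived streaming process, as described above.
    const proc = await sandbox.startProcess(
      "openclaw agent --agent 'clarity-ordinator' --message '...' --json",
      {
        // onOutput/onExit are the callbacks this report refers to; the exact
        // parameter shapes here are assumptions.
        onOutput: (stream: string, data: string) => {
          // After ~2 minutes of continuous output this silently stops firing,
          // and the container logs "Invalid state: Controller is already closed".
          console.log(`[${stream}]`, data);
        },
        onExit: (code: number) => console.log("process exited with", code),
      },
    );

    return new Response(`started process ${proc.id}`);
  },
};
```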
## Reproduction

This occurs consistently when:

- A `startProcess()` command runs for >2 minutes producing continuous output
- Multiple concurrent `startProcess()` calls are active (e.g., a parent agent spawns a child agent via OpenClaw's `sessions_spawn`)
- The container is under memory pressure (~1.4 GB+ of 5.4 GB used)
Approximate frequency: Every 30-80 minutes during sustained agent activity.
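A hypothetical harness exercising the first two conditions, continuing from the sketch above (the shell loop is just a stand-in for an agent producing continuous output; option shapes remain assumptions):

```ts
// Stand-in for an agent emitting continuous output for ~5 minutes.
const noisy =
  "sh -c 'i=0; while [ $i -lt 300 ]; do echo line $i; i=$((i+1)); sleep 1; done'";

// Two concurrent startProcess() calls, mirroring a parent agent plus a
// sessions_spawn child. startProcess() resolves once the process starts,
// so both stream in parallel.
await Promise.all(
  [1, 2].map((n) =>
    sandbox.startProcess(noisy, {
      onOutput: (_stream: string, data: string) =>
        console.log(`proc${n}:`, data),
    }),
  ),
);
// Under sustained load like this, the "Invalid state: Controller is already
// closed" error shows up every 30-80 minutes.
```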
## Error Details

Two correlated errors appear in the Cloudflare container logs.

### Error 1: Background streaming failure

```json
{
  "level": "error",
  "component": "container",
  "originalError": "Invalid state: Controller is already closed",
  "error": {
    "message": "Invalid state: Controller is already closed",
    "stack": "TypeError: Invalid state: Controller is already closed\n at unknown\n at <anonymous> (native:1:11)\n at _ (/$bunfs/root/sandbox:26:12)\n at <anonymous> (/$bunfs/root/sandbox:50:9084)\n at forEach (native:1:11)\n at <anonymous> (/$bunfs/root/sandbox:50:9076)\n at <anonymous> (/$bunfs/root/sandbox:159:5464)\n at processTicksAndRejections (native:7:39)"
  },
  "$metadata": {
    "error": "Error during background streaming",
    "message": "Error during background streaming",
    "type": "cf-container"
  }
}
```
### Error 2: Command execution failure (same timestamp)

```json
{
  "level": "error",
  "component": "container",
  "command": "openclaw agent --agent 'clarity-ordinator' --message '...' --json",
  "error": {
    "message": "Invalid state: Controller is already closed",
    "stack": "(same as above)"
  },
  "$metadata": {
    "error": "Failed to execute streaming command",
    "message": "Failed to execute streaming command",
    "type": "cf-container"
  }
}
```
## Impact

After this error:

- The SDK bridge enters a degraded state where `containerFetch()` fails but `exec()` may still work briefly
- Eventually the bridge becomes fully unresponsive ("bridge-dead")
- The only recovery is `sandbox.destroy()` plus fresh container creation
- All in-progress agent work is lost
## What we'd like

- An `onError` callback in `startProcess()` options, so callers can detect streaming failures and handle them gracefully (log, mark the task as failed, skip waiting for `onExit`); a sketch of the shape we have in mind follows this list
- Stream controller resilience: if the controller closes, the error should be surfaced to the caller rather than silently breaking the bridge
- Graceful degradation: a single stream failure shouldn't poison the entire SDK bridge for subsequent operations
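Purely as illustration of the first request, not an existing API (`appendTaskLog`, `finishTask`, and `failTask` are hypothetical caller-side helpers):

```ts
// PROPOSED, not an existing API: an onError hook alongside onOutput/onExit.
const proc = await sandbox.startProcess(command, {
  onOutput: (stream: string, data: string) => appendTaskLog(stream, data),
  onExit: (code: number) => finishTask(code),
  onError: (err: Error) => {
    // The stream controller closed underneath us: surface it here so the
    // caller can mark the task failed instead of waiting on onExit forever.
    failTask(err);
  },
});
```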
## Workaround

We've mitigated the worst case by:

- Denying OpenClaw's `sessions_spawn` tool to prevent concurrent `startProcess()` calls
- Running agents serially through a queue (`max_batch_size=1`)
- Adding a circuit breaker that destroys the container after detecting the degraded state (sketched after this section)
But the underlying stream controller issue still causes flaps every 30-80 minutes during any sustained agent activity.
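For reference, a sketch of that circuit breaker under the same assumptions as the earlier sketches. `isBridgeDead` and `runAgentTask` are our own helpers, not SDK API, and the `exec()` result shape with `exitCode` is an assumption:

```ts
import { getSandbox } from "@cloudflare/sandbox";

// Cheap probe for the degraded state described under Impact. exec() throwing
// at all is treated as bridge-dead; a result shape with exitCode is assumed.
async function isBridgeDead(sandbox: any): Promise<boolean> {
  try {
    const result = await sandbox.exec("true");
    return result.exitCode !== 0;
  } catch {
    return true;
  }
}

// Serial execution (our queue runs with max_batch_size=1), with destroy plus
// recreate as the only known recovery once the bridge degrades.
async function runAgentTask(env: any, command: string): Promise<void> {
  let sandbox = getSandbox(env.Sandbox, "agent-runner");
  if (await isBridgeDead(sandbox)) {
    await sandbox.destroy(); // in-progress work is lost
    sandbox = getSandbox(env.Sandbox, "agent-runner"); // fresh container
  }
  await sandbox.exec(command);
}
```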