# Bug Report: Sandbox SDK Stream Controller Premature Closure
## Environment

- `@cloudflare/sandbox` via Cloudflare Workers
- Sandbox container running OpenClaw AI agents (Node.js processes)
- Bun-based container runtime (`/$bunfs/root/sandbox`)
## Description

When running long-lived processes via `sandbox.startProcess()` with `onOutput` callbacks, the internal ReadableStream controller closes prematurely while the child process is still streaming output. This causes:

- `TypeError: Invalid state: Controller is already closed` in the SDK's internal Bun runtime
- The `onOutput` callback stops receiving data
- The process is marked as "Failed to execute streaming command"
- Subsequent SDK operations (`containerFetch`, `exec`) become unreliable; the SDK bridge enters a degraded state
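For concreteness, here is a minimal sketch of the pattern that hits this failure. The `onOutput`/`onExit` option names and the command come from this report; the `getSandbox` entry point, the binding name, the callback parameter shapes, and the `proc.id` field are assumptions about typical `@cloudflare/sandbox` usage, not verified signatures.

```ts
import { getSandbox } from "@cloudflare/sandbox";

export default {
  async fetch(request: Request, env: any): Promise<Response> {
    // "Sandbox" binding name and getSandbox() usage are assumed from typical
    // @cloudflare/sandbox setup.
    const sandbox = getSandbox(env.Sandbox, "agent-runner");

    // Long-lived streaming process, as described above.
    const proc = await sandbox.startProcess(
      "openclaw agent --agent 'clarity-ordinator' --message '...' --json",
      {
        // onOutput/onExit are the callbacks this report refers to; the exact
        // parameter shapes here are assumptions.
        onOutput: (stream: string, data: string) => {
          // After ~2 minutes of continuous output this silently stops firing,
          // and the container logs "Invalid state: Controller is already closed".
          console.log(`[${stream}]`, data);
        },
        onExit: (code: number) => console.log("process exited with", code),
      },
    );

    return new Response(`started process ${proc.id}`);
  },
};
```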
## Reproduction

This occurs consistently when:

- A `startProcess()` command runs for >2 minutes producing continuous output
- Multiple concurrent `startProcess()` calls are active (e.g., a parent agent spawns a child agent via OpenClaw's `sessions_spawn`)
- The container is under memory pressure (~1.4 GB+ of 5.4 GB used)
Approximate frequency: Every 30-80 minutes during sustained agent activity.
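A hypothetical harness exercising the first two conditions, continuing from the sketch above (the shell loop is just a stand-in for an agent producing continuous output; option shapes remain assumptions):

```ts
// Stand-in for an agent emitting continuous output for ~5 minutes.
const noisy =
  "sh -c 'i=0; while [ $i -lt 300 ]; do echo line $i; i=$((i+1)); sleep 1; done'";

// Two concurrent startProcess() calls, mirroring a parent agent plus a
// sessions_spawn child. startProcess() resolves once the process starts,
// so both stream in parallel.
await Promise.all(
  [1, 2].map((n) =>
    sandbox.startProcess(noisy, {
      onOutput: (_stream: string, data: string) =>
        console.log(`proc${n}:`, data),
    }),
  ),
);
// Under sustained load like this, the "Invalid state: Controller is already
// closed" error shows up every 30-80 minutes.
```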
## Error Details

Two correlated errors appear in the Cloudflare container logs.

### Error 1: Background streaming failure

```json
{
  "level": "error",
  "component": "container",
  "originalError": "Invalid state: Controller is already closed",
  "error": {
    "message": "Invalid state: Controller is already closed",
    "stack": "TypeError: Invalid state: Controller is already closed\n at unknown\n at <anonymous> (native:1:11)\n at _ (/$bunfs/root/sandbox:26:12)\n at <anonymous> (/$bunfs/root/sandbox:50:9084)\n at forEach (native:1:11)\n at <anonymous> (/$bunfs/root/sandbox:50:9076)\n at <anonymous> (/$bunfs/root/sandbox:159:5464)\n at processTicksAndRejections (native:7:39)"
  },
  "$metadata": {
    "error": "Error during background streaming",
    "message": "Error during background streaming",
    "type": "cf-container"
  }
}
```
### Error 2: Command execution failure (same timestamp)

```json
{
  "level": "error",
  "component": "container",
  "command": "openclaw agent --agent 'clarity-ordinator' --message '...' --json",
  "error": {
    "message": "Invalid state: Controller is already closed",
    "stack": "(same as above)"
  },
  "$metadata": {
    "error": "Failed to execute streaming command",
    "message": "Failed to execute streaming command",
    "type": "cf-container"
  }
}
```
## Impact

After this error:

- The SDK bridge enters a degraded state where `containerFetch()` fails but `exec()` may still work briefly
- Eventually the bridge becomes fully unresponsive ("bridge-dead")
- The only recovery is `sandbox.destroy()` plus fresh container creation
- All in-progress agent work is lost
## What we'd like

- An `onError` callback in `startProcess()` options, so callers can detect streaming failures and handle them gracefully (log, mark the task as failed, skip waiting for `onExit`); a sketch of the shape we have in mind follows this list
- Stream controller resilience: if the controller closes, the error should be surfaced to the caller rather than silently breaking the bridge
- Graceful degradation: a single stream failure shouldn't poison the entire SDK bridge for subsequent operations
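Purely as illustration of the first request, not an existing API (`appendTaskLog`, `finishTask`, and `failTask` are hypothetical caller-side helpers):

```ts
// PROPOSED, not an existing API: an onError hook alongside onOutput/onExit.
const proc = await sandbox.startProcess(command, {
  onOutput: (stream: string, data: string) => appendTaskLog(stream, data),
  onExit: (code: number) => finishTask(code),
  onError: (err: Error) => {
    // The stream controller closed underneath us: surface it here so the
    // caller can mark the task failed instead of waiting on onExit forever.
    failTask(err);
  },
});
```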
## Workaround

We've mitigated the worst case by:

- Denying OpenClaw's `sessions_spawn` tool to prevent concurrent `startProcess()` calls
- Running agents serially through a queue (`max_batch_size=1`)
- Adding a circuit breaker that destroys the container after detecting the degraded state (sketched after this section)
But the underlying stream controller issue still causes flaps every 30-80 minutes during any sustained agent activity.
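For reference, a sketch of that circuit breaker under the same assumptions as the earlier sketches. `isBridgeDead` and `runAgentTask` are our own helpers, not SDK API, and the `exec()` result shape with `exitCode` is an assumption:

```ts
import { getSandbox } from "@cloudflare/sandbox";

// Cheap probe for the degraded state described under Impact. exec() throwing
// at all is treated as bridge-dead; a result shape with exitCode is assumed.
async function isBridgeDead(sandbox: any): Promise<boolean> {
  try {
    const result = await sandbox.exec("true");
    return result.exitCode !== 0;
  } catch {
    return true;
  }
}

// Serial execution (our queue runs with max_batch_size=1), with destroy plus
// recreate as the only known recovery once the bridge degrades.
async function runAgentTask(env: any, command: string): Promise<void> {
  let sandbox = getSandbox(env.Sandbox, "agent-runner");
  if (await isBridgeDead(sandbox)) {
    await sandbox.destroy(); // in-progress work is lost
    sandbox = getSandbox(env.Sandbox, "agent-runner"); // fresh container
  }
  await sandbox.exec(command);
}
```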