Durable Object facets can hit parent WebSocket Native handles in production but not local dev #6702

@threepointone

Summary

A Durable Object facet/sub-agent can fail in production with a cross-DO Native I/O error when the facet is initialized during a parent Durable Object's WebSocket message turn (the runtime's webSocketMessage handler, or the framework's onMessage).

The same repro does not fail under local wrangler dev, so this appears to be either a local/prod parity gap, a facet isolation issue, or both.

This was originally reported against cloudflare/agents in cloudflare/agents#1410. The Agents SDK has a mitigation in cloudflare/agents#1425, but the runtime behavior still seems worth tracking here.

Minimal shape

The repro uses Cloudflare Agents, but the shape is just Durable Object facets plus WebSockets:

  • ParentAgent extends Agent
  • ChildAgent extends Agent
  • GET /http-spawn calls parent.spawnChild() over HTTP/RPC as a control path
  • WS /ws calls the same spawnChild() from ParentAgent.onMessage()
  • spawnChild() calls this.subAgent(ChildAgent, childName) and then child.ping(...)
  • ParentAgent is bound as a top-level Durable Object namespace
  • ChildAgent is listed in new_sqlite_classes and reached as a facet/sub-agent
  • compatibility_date = "2026-04-01"
  • compatibility_flags = ["nodejs_compat"]

Repo-local repro fixture added in the SDK PR:

https://github.com/cloudflare/agents/pull/1425/files#diff-6c4d9adce27774988d0145e4ab645d1cb33f2f0c1d5b26960644b80f522317e4

Original reporter repro:

https://github.com/Gyges-Labs/cf-agents-1410-repro/tree/807198c7664567c630b68fbd2651264360ac8dcb

Production behavior

Deployed repro URL used for validation:

https://cf-sub-agent-io-demo-1410.threepointone.workers.dev

HTTP control path succeeds:

curl "https://cf-sub-agent-io-demo-1410.threepointone.workers.dev/http-spawn?parent=http-control-$(date +%s)"

WebSocket path failed before the SDK mitigation:

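node -e '
// Assumes a Node.js release that exposes a global WebSocket (enabled by default since Node 22).
const ws = new WebSocket("wss://cf-sub-agent-io-demo-1410.threepointone.workers.dev/ws?parent=ws-repro-" + Date.now());
// Print whatever the Worker sends back (the result or the failure payload).
ws.onmessage = (event) => console.log(event.data);
// Trigger spawnChild() from ParentAgent.onMessage().
ws.onopen = () => ws.send("spawn child from websocket onMessage");
// Close the socket after 10 seconds so the process can exit.
setTimeout(() => ws.close(), 10_000);
'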

Failure payload:

Cannot perform I/O on behalf of a different Durable Object. I/O objects
(such as streams, request/response bodies, and others) created in the context
of one Durable Object cannot be accessed from a different Durable Object in
the same isolate. This is a limitation of Cloudflare Workers which allows us
to improve overall performance. (I/O type: Native)
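
When scripting the repro, it can help to detect this specific failure in the WebSocket response rather than eyeballing the output. The helper below is a minimal sketch: `matchCrossDoIoError` and `CrossDoIoMatch` are hypothetical names introduced for illustration (not part of the Agents SDK or workerd), and it assumes the message text arrives roughly as quoted above, possibly wrapped across lines.

```typescript
// Hypothetical helper for repro scripts; not an SDK or runtime API.
const CROSS_DO_IO_PREFIX =
  "Cannot perform I/O on behalf of a different Durable Object";

interface CrossDoIoMatch {
  matched: boolean;
  // The value inside "(I/O type: ...)", or null if absent.
  ioType: string | null;
}

function matchCrossDoIoError(payload: string): CrossDoIoMatch {
  // Collapse whitespace so a message wrapped across lines still matches.
  const flat = payload.replace(/\s+/g, " ");
  if (!flat.includes(CROSS_DO_IO_PREFIX)) {
    return { matched: false, ioType: null };
  }
  const m = /\(I\/O type: ([^)]+)\)/.exec(flat);
  return { matched: true, ioType: m?.[1] ?? null };
}

// Usage with the first and last lines of the payload quoted above.
const samplePayload = [
  "Cannot perform I/O on behalf of a different Durable Object. I/O objects",
  "to improve overall performance. (I/O type: Native)",
].join("\n");
const sampleMatch = matchCrossDoIoError(samplePayload);
```

In a repro client this could be called from `ws.onmessage` on `event.data` to separate the cross-DO failure from any other error.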

Stack from the production Worker:

ParentAgent._cf_resolveSubAgent
ParentAgent.subAgent
ParentAgent.spawnChild
ParentAgent.onMessage
ParentAgent._tryCatch

Local behavior

The same repro did not fail under local wrangler dev, including with the reporter's pinned versions (agents@0.11.8, wrangler@4.83.0) and with current local wrangler@4.86.0.

Both /http-spawn and /ws returned ok: true locally.
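
The parity gap can be stated as data. The sketch below records the four observations from this report (production values are from before the SDK mitigation) and lists the paths where local and production disagree. All type and function names here (`Observation`, `parityGaps`, and so on) are hypothetical and exist only for illustration; collapsing each result to a boolean `ok` is also an assumption.

```typescript
type Environment = "local" | "production";
type RoutePath = "/http-spawn" | "/ws";

interface Observation {
  env: Environment;
  path: RoutePath;
  ok: boolean;
}

// Observations as described in this issue (production is pre-mitigation).
const observations: Observation[] = [
  { env: "production", path: "/http-spawn", ok: true },
  { env: "production", path: "/ws", ok: false },
  { env: "local", path: "/http-spawn", ok: true },
  { env: "local", path: "/ws", ok: true },
];

// Look up the recorded result for one environment/path pair, if any.
function findOk(
  obs: Observation[],
  env: Environment,
  path: RoutePath,
): boolean | undefined {
  for (const o of obs) {
    if (o.env === env && o.path === path) return o.ok;
  }
  return undefined;
}

// Return the paths whose local and production outcomes differ.
function parityGaps(obs: Observation[]): RoutePath[] {
  const paths: RoutePath[] = ["/http-spawn", "/ws"];
  const gaps: RoutePath[] = [];
  for (const path of paths) {
    const local = findOk(obs, "local", path);
    const prod = findOk(obs, "production", path);
    if (local !== undefined && prod !== undefined && local !== prod) {
      gaps.push(path);
    }
  }
  return gaps;
}

const gaps = parityGaps(observations);
```

With these inputs only the WebSocket path differs between environments, which is the gap this issue is about.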

Diagnostics

Additional diagnostics in production showed:

  • The child facet could read this.ctx.id.name.
  • The child facet could read this.name.
  • The child facet could write to its own storage before super._cf_initAsFacet().
  • The failure occurred during __unsafe_ensureInitialized().
  • Inside Agents startup, broadcastMcpServers() called _broadcastProtocol(), which enumerated/sent through WebSocket connections.
  • In production, when the child facet was spawned during the parent WebSocket message turn, that touched a parent-owned Native WebSocket handle from the child DO context and tripped the cross-DO Native I/O check.

The SDK mitigation was to avoid protocol broadcasts from facets and to avoid carrying native request/WebSocket/email context handles across Agent/facet boundaries. With that mitigation, the deployed repro returns ok: true for both HTTP and WebSocket paths.
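
The failure and the first half of the mitigation can be illustrated with a toy model. This is a sketch under stated assumptions, not workerd's or the Agents SDK's implementation: `NativeHandle`, `ToyAgent`, and `spawnDuringMessageTurn` are hypothetical names, the ownership check is reduced to a string comparison, and the error text is abbreviated. It models only "skip protocol broadcasts from facets", not the change that stops context handles from crossing the Agent/facet boundary.

```typescript
// Toy model only: none of these classes exist in workerd or the Agents SDK.
class NativeHandle {
  readonly ownerId: string;
  readonly sent: string[] = [];

  constructor(ownerId: string) {
    this.ownerId = ownerId;
  }

  // Mimics the production check: only the owning context may perform I/O.
  send(callerId: string, message: string): void {
    if (callerId !== this.ownerId) {
      throw new Error(
        "Cannot perform I/O on behalf of a different Durable Object. (I/O type: Native)",
      );
    }
    this.sent.push(message);
  }
}

class ToyAgent {
  readonly id: string;
  readonly isFacet: boolean;
  // Connections visible during startup. In the failing case, the child
  // facet ends up seeing a handle owned by the parent.
  readonly connections: NativeHandle[];

  constructor(id: string, isFacet: boolean, connections: NativeHandle[]) {
    this.id = id;
    this.isFacet = isFacet;
    this.connections = connections;
  }

  // Stand-in for the startup broadcast described in the diagnostics.
  initialize(skipBroadcastFromFacets: boolean): void {
    if (this.isFacet && skipBroadcastFromFacets) return;
    for (const connection of this.connections) {
      connection.send(this.id, "protocol-state");
    }
  }
}

// Simulates spawning the child while the parent's socket is in scope.
function spawnDuringMessageTurn(mitigated: boolean): {
  ok: boolean;
  error?: string;
} {
  const parentSocket = new NativeHandle("parent");
  const child = new ToyAgent("child", true, [parentSocket]);
  try {
    child.initialize(mitigated);
    return { ok: true };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, error };
  }
}

const beforeMitigation = spawnDuringMessageTurn(false);
const afterMitigation = spawnDuringMessageTurn(true);
```

In this model the unmitigated call fails because the child sends through a parent-owned handle, and the mitigated call succeeds because the facet never touches it. A local runtime that lacks the ownership check would correspond to a `send` that never throws, which would match the behavior seen under wrangler dev, though that is only one possible explanation.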

Expected runtime behavior / question

From the SDK side, facets should not inherit or touch parent WebSocket handles during startup. The SDK now avoids doing so.

From the runtime side, the confusing parts are:

  1. Production rejects parent-owned Native WebSocket I/O during child facet initialization, but local wrangler dev does not reproduce the rejection.
  2. A child facet startup path appears able to observe/trip over parent-owned WebSocket Native handles when the facet is spawned during a parent WebSocket message turn.

Should facet startup be isolated from parent WebSocket Native handles, or should local workerd/wrangler dev reproduce the same rejection as production?
