| title | WebSocket Protocol — Framing and Lifecycle |
|---|---|
| description | WebSocket endpoints, handshake, authentication, heartbeat, subprotocols, and reconnection for VisionClaw |
This page is the framing-and-lifecycle reference for VisionClaw's WebSocket surfaces: which endpoints exist, how a connection is established and authenticated, how heartbeats and reconnection work, and which subprotocol each socket negotiates. It deliberately does not describe the binary byte layout of the position stream — that single-source spec lives in The Binary Protocol. Read the two together: this page covers the envelope and the connection state machine, the binary doc covers the 52-byte node record carried inside binary frames.
All WebSocket routes are registered in src/main.rs. The server speaks plain
ws:// in development and wss:// behind the nginx/cloudflared edge in
production (frontend :3001, API :4000).

| Path | Handler | Direction | Frames | Subprotocol |
|---|---|---|---|---|
| /wss | socket_flow_handler | bidirectional | JSON control + binary position stream | permessage-deflate (echoed) |
| /wss/agent-events | agent_events_ws | inbound (server-to-server) | JSON only | vc-agent-events.v1 |
| /ws/presence | ws_presence | bidirectional | JSON handshake + binary pose | none |
| /ws/speech | speech_socket_handler | bidirectional | binary Opus audio | none |
| /ws/mcp-relay | mcp_relay_handler | bidirectional | JSON MCP relay | none |
| /ws/client-messages | client_messages_handler | bidirectional | JSON client-to-client | none |

This page details the three load-bearing surfaces: the position stream
(/wss), the agent-events ingest (/wss/agent-events), and the XR
presence socket (/ws/presence). The remaining sockets follow the same
upgrade-and-authenticate pattern described under Authentication.
/wss is a hybrid socket. JSON text frames carry the connection lifecycle
(handshake, subscription control, heartbeat, drag interactions); binary frames
carry the high-frequency GPU position stream. A single connection multiplexes
both — the client branches on event.data instanceof ArrayBuffer.
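
As a rough illustration of that branch, the sketch below classifies one incoming frame. The names `IncomingFrame`, `ControlMessage`, and `classifyFrame` are invented for this example and are not taken from the VisionClaw client; it assumes binary data arrives as an `ArrayBuffer`.

```typescript
// Shape of a JSON control message: only `type` is assumed here.
type ControlMessage = { type?: string; [key: string]: unknown };

// The frame families a /wss client has to tell apart.
type IncomingFrame =
  | { kind: "binary"; bytes: ArrayBuffer }
  | { kind: "json"; message: ControlMessage }
  | { kind: "invalid"; raw: string };

// Mirrors the `event.data instanceof ArrayBuffer` branch: binary frames go to
// the position decoder, text frames are parsed as JSON control messages.
function classifyFrame(data: string | ArrayBuffer): IncomingFrame {
  if (data instanceof ArrayBuffer) {
    return { kind: "binary", bytes: data };
  }
  try {
    const parsed: unknown = JSON.parse(data);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return { kind: "json", message: parsed as ControlMessage };
    }
  } catch {
    // Not valid JSON: fall through to the "invalid" result.
  }
  return { kind: "invalid", raw: data };
}
```

In a browser this only works after setting `binaryType = "arraybuffer"` on the socket; the default is `"blob"`, which never satisfies the `instanceof` check.
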
The upgrade handler (src/handlers/socket_flow_handler/http_handler.rs) runs
four gates before accepting the socket, in order:

- Rate limit: per-client-IP via WEBSOCKET_RATE_LIMITER. A rejected upgrade returns a rate-limit HTTP response, not a 101.
- Origin check: the Origin header must match the CORS_ALLOWED_ORIGINS allow-list or be same-host as Host/X-Forwarded-Host. This is the cross-site WebSocket hijacking (CSWSH) defence. A missing Origin is rejected in release builds.
- Token validation: see Authentication.
- Upgrade header: a missing Upgrade header returns 400.

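
The origin gate is the least obvious of the four, so here is a minimal TypeScript model of the rule as stated above. It is not the Rust handler: `OriginCheckInput`, `hostOfOrigin`, and `isOriginAllowed` are hypothetical names, the same-host comparison on `host[:port]` is an assumption, and so is the choice to accept a missing `Origin` outside release builds.

```typescript
interface OriginCheckInput {
  origin: string | null;        // Origin header, if present
  host: string | null;          // Host header
  forwardedHost: string | null; // X-Forwarded-Host header
  allowedOrigins: string[];     // parsed CORS_ALLOWED_ORIGINS
  releaseBuild: boolean;
}

// Extracts "host[:port]" from an origin such as "https://example.org:3001".
function hostOfOrigin(origin: string): string {
  const schemeEnd = origin.indexOf("://");
  return schemeEnd === -1 ? origin : origin.slice(schemeEnd + 3);
}

function isOriginAllowed(input: OriginCheckInput): boolean {
  if (input.origin === null) {
    // Documented: rejected in release builds. Assumed: accepted otherwise.
    return !input.releaseBuild;
  }
  if (input.allowedOrigins.indexOf(input.origin) !== -1) {
    return true; // exact allow-list match
  }
  // Same-host fallback against Host / X-Forwarded-Host.
  const originHost = hostOfOrigin(input.origin);
  return originHost === input.host || originHost === input.forwardedHost;
}
```
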
On success the server upgrades, then the SocketFlowServer actor's started()
sends an opening sequence over JSON text frames:

```json
{ "type": "connection_established",
  "timestamp": 1748500000000,
  "is_reconnection": false,
  "state_sync_sent": true,
  "protocol": { "supported": [3, 5], "preferred": 3 } }
```

protocol.supported advertises the binary wire versions the server can emit;
preferred is the V3 per-tick frame (the current default; see
binary-protocol.md). A full state sync and a
{"type":"loading"} notice follow immediately.
The client opts into the binary stream with a JSON control frame:

```json
{ "type": "subscribe_position_updates",
  "data": { "interval": 60, "binary": true, "nodeTypes": ["knowledge", "agent"] } }
```

- interval is the broadcast period in milliseconds (default 60, clamped up to the endpoint rate limit 1000 / (rpm/60)).
- nodeTypes is an optional server-side filter: only matching node types are encoded into binary frames for this session.

The server replies {"type":"subscription_confirmed","subscription":"position_updates",...}
and starts a single broadcast loop. Two server-side defences keep reconnect
storms from amplifying:
- Per-session debounce — subscribes arriving within 2 s of the last accepted one are ignored.
- Generation collapse: each subscribe_position_updates bumps a generation counter; older self-looping broadcast tasks stop when they observe a newer generation, so exactly one loop survives no matter how many times the client re-subscribes.
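
The subscription rules in this section (interval clamp, 2 s debounce, generation collapse) can be modelled in a few lines. This is a sketch of the described behaviour, not the server's code: `effectiveIntervalMs` and `SubscriptionGate` are hypothetical names, `rpm` is read as requests per minute, and the model assumes only accepted subscribes bump the generation.

```typescript
const DEFAULT_INTERVAL_MS = 60;
const SUBSCRIBE_DEBOUNCE_MS = 2000;

// Floor implied by the endpoint rate limit: 1000 / (rpm / 60) milliseconds.
function minIntervalMs(rpm: number): number {
  return 1000 / (rpm / 60);
}

// "Clamped up": a request faster than the floor is raised to the floor.
function effectiveIntervalMs(requested: number | undefined, rpm: number): number {
  const wanted = requested === undefined ? DEFAULT_INTERVAL_MS : requested;
  return Math.max(wanted, minIntervalMs(rpm));
}

class SubscriptionGate {
  private generation = 0;
  private lastAcceptedAt: number | null = null;

  // Returns the new generation if accepted, or null if debounced.
  trySubscribe(nowMs: number): number | null {
    if (this.lastAcceptedAt !== null && nowMs - this.lastAcceptedAt < SUBSCRIBE_DEBOUNCE_MS) {
      return null;
    }
    this.lastAcceptedAt = nowMs;
    this.generation += 1;
    return this.generation;
  }

  // A broadcast loop checks this on each tick and stops once it is stale.
  isCurrent(loopGeneration: number): boolean {
    return loopGeneration === this.generation;
  }
}
```

With a 600 rpm limit the floor is 100 ms, so a requested 60 ms interval becomes 100 ms; with 6000 rpm the floor is 10 ms and the default 60 ms stands.
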
The stream supports application-level flow control. After consuming a binary
frame the client may send a ClientBroadcastAck carrying the broadcast sequence
number embedded in the frame. The ClientCoordinatorActor forwards it to the
GPU compute actor as PositionBroadcastAck, which replenishes the broadcast
token budget. The sequence number is the V5 frame's monotonic
broadcast_sequence; gaps signal dropped frames and warrant a
request_full_snapshot. The wire encoding of the sequence header is documented
in binary-protocol.md.
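
A client-side gap check might look like the sketch below. It takes the already-decoded sequence number as input (extracting it from the frame is covered by the binary doc) and assumes the counter advances by exactly one per frame with no wraparound; `SequenceTracker` and `onBroadcastSequence` are hypothetical names. The acknowledgement message itself is left out because its JSON shape is not specified on this page.

```typescript
class SequenceTracker {
  private last: number | null = null;

  // Returns true when `seq` is not exactly one past the previous frame.
  observe(seq: number): boolean {
    const gap = this.last !== null && seq !== this.last + 1;
    this.last = seq;
    return gap;
  }
}

// On a gap, ask the server for a full frame instead of applying deltas
// against a stale base. `send` stands in for the socket's send method.
function onBroadcastSequence(
  tracker: SequenceTracker,
  seq: number,
  send: (text: string) => void,
): void {
  if (tracker.observe(seq)) {
    send(JSON.stringify({ type: "request_full_snapshot" }));
  }
}
```
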
message_routing.rs dispatches JSON text frames by type:

| type | Effect |
|---|---|
| ping (plain "ping" or JSON) | refresh activity clock, reply pong |
| subscribe_position_updates | start/refresh the binary broadcast loop |
| request_full_snapshot | emit one full binary frame immediately |
| requestInitialData | send the sparse initial graph load |
| filter_update | adjust server-side node filtering |
| authenticate | upgrade an unauthenticated session in place |
| nodeDragStart / nodeDragUpdate / nodeDragEnd | pin/move/release a node |
| update_physics_params | rejected: physics changes go via REST |

Unknown types are logged and ignored; malformed JSON returns a
{"type":"error"} frame.
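
The dispatch rules can be summarised as a small TypeScript model. It only reports which branch a text frame would take; `RouteOutcome` and `routeTextFrame` are hypothetical names, and the real routing lives in the Rust `message_routing.rs`.

```typescript
type RouteOutcome = "pong" | "handled" | "rejected" | "ignored" | "error";

// Message types the routing table handles with a normal effect.
const HANDLED_TYPES: string[] = [
  "subscribe_position_updates", "request_full_snapshot", "requestInitialData",
  "filter_update", "authenticate",
  "nodeDragStart", "nodeDragUpdate", "nodeDragEnd",
];

function routeTextFrame(text: string): RouteOutcome {
  if (text === "ping") return "pong"; // plain-text ping form
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return "error"; // malformed JSON: server answers {"type":"error"}
  }
  const frameType =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { type?: unknown }).type
      : undefined;
  if (frameType === "ping") return "pong"; // JSON ping form
  if (frameType === "update_physics_params") return "rejected"; // REST only
  if (typeof frameType === "string" && HANDLED_TYPES.indexOf(frameType) !== -1) {
    return "handled";
  }
  return "ignored"; // unknown types are logged and ignored
}
```
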

```mermaid
sequenceDiagram
    participant C as Client
    participant H as "Upgrade Handler"
    participant S as SocketFlowServer
    participant CC as ClientCoordinator
    C->>H: "GET /wss (Upgrade, token, Origin)"
    H->>H: "rate-limit, origin, token gates"
    H-->>C: "101 Switching Protocols"
    S-->>C: "connection_established (JSON)"
    S-->>C: "state_sync + loading (JSON)"
    C->>S: "subscribe_position_updates (JSON)"
    S-->>C: "subscription_confirmed (JSON)"
    loop "Every interval (~60ms)"
        CC-->>C: "binary frame (V3 records + seq)"
        C->>S: "ClientBroadcastAck (optional)"
    end
    loop "Every 5s"
        S-->>C: "Ping"
        C-->>S: "Pong"
    end
    C->>S: "Close"
```

/wss/agent-events is a server-to-server ingest socket (ADR-059 Phase 2):
agentbox pushes notifications/agent_action JSON-RPC frames into VisionClaw,
which validates them against the canonical AgentActionNotification mirror and
publishes them to the process-global agent-events hub. The GPU beam/gluon render
actor subscribes to that hub.

- Subprotocol: the upgrade negotiates vc-agent-events.v1. Clients should offer it in Sec-WebSocket-Protocol.
- JSON only: binary frames are rejected; this is the inbound complement to the server-to-browser 0x23 beam frame, never a position channel.
- Frame contract: each text frame must parse as an AgentActionNotification and pass is_canonical() (method == "notifications/agent_action", envelope version >= 3). Non-canonical frames get {"error":"non_canonical_envelope"}; unparseable frames get {"error":"malformed_json"}. Neither closes the socket.
- Provenance: a per-frame provenance status (signed / anonymous) is recorded and any foreign urn:agentbox:* source/target URN is translated through the BC20 namespace bridge. Frames publish regardless of attribution so the render path never drops on missing identity.
- Heartbeat: the actor answers Ping with Pong and honours Close; there is no server-initiated keepalive timer. The pushing client owns liveness.

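
A pushing client can pre-check a frame against the contract before sending it. The sketch below mirrors the two documented error cases; it assumes `version` is a top-level numeric field of the envelope and treats JSON that parses to a non-object as non-canonical, neither of which this page confirms. `checkAgentEventFrame` is a hypothetical name.

```typescript
type EnvelopeResult =
  | { ok: true }
  | { ok: false; error: "malformed_json" | "non_canonical_envelope" };

function checkAgentEventFrame(text: string): EnvelopeResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { ok: false, error: "malformed_json" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "non_canonical_envelope" }; // assumption
  }
  // Assumed field placement: `method` and `version` at the top level.
  const envelope = parsed as { method?: unknown; version?: unknown };
  const canonical =
    envelope.method === "notifications/agent_action" &&
    typeof envelope.version === "number" &&
    envelope.version >= 3;
  return canonical ? { ok: true } : { ok: false, error: "non_canonical_envelope" };
}
```
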
Because this is not a browser socket, Origin is not enforced: non-browser
clients do not send it, and a cross-site script cannot forge the bearer token
cross-origin, so CSWSH is already mitigated by the token gate.
/ws/presence (PRD-008) runs a JSON challenge/response handshake, then switches
to binary pose frames (opcode 0x43) forwarded to a per-room PresenceActor.
It does not negotiate a subprotocol.
On connect the server immediately sends a challenge; the client signs the nonce
with its did:nostr key and replies:

```jsonc
// server → client
{ "type": "challenge", "nonce": "<hex32>", "ts": 1748500000000000 }
// client → server
{ "type": "auth", "did": "did:nostr:<pubkey>", "signature": "<hex>",
  "room_id": "<room>", "metadata": { "display_name": "...", "model_uri": null } }
```

The signature is verified by the IdentityVerifier port against the issued
nonce; the recovered did must equal the claimed one. On success the server
replies {"type":"joined", ...} with the current room roster, transitions the
session to the Joined phase, and begins forwarding pose frames. After the
handshake, binary frames are pose updates and JSON frames are room
events (join/leave).
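
On the client side, answering the challenge amounts to parsing the nonce and assembling the `auth` frame. The sketch below shows only that assembly: `buildAuthReply` is a hypothetical helper, and `signNonce` is a placeholder callback for whatever Nostr signer the client uses, since the exact bytes and encoding that get signed are not specified on this page.

```typescript
interface PresenceAuth {
  type: "auth";
  did: string;
  signature: string;
  room_id: string;
  metadata: { display_name: string; model_uri: string | null };
}

function buildAuthReply(
  challengeText: string,
  pubkeyHex: string,
  roomId: string,
  displayName: string,
  signNonce: (nonce: string) => string, // hypothetical signer callback
): PresenceAuth {
  const parsed: unknown = JSON.parse(challengeText);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("challenge frame is not a JSON object");
  }
  const challenge = parsed as { type?: unknown; nonce?: unknown };
  if (challenge.type !== "challenge" || typeof challenge.nonce !== "string") {
    throw new Error("expected a challenge frame with a string nonce");
  }
  return {
    type: "auth",
    did: "did:nostr:" + pubkeyHex,
    signature: signNonce(challenge.nonce),
    room_id: roomId,
    metadata: { display_name: displayName, model_uri: null },
  };
}
```

The reply has to reach the server within the 10 s handshake timeout, so signing should not block on user interaction for long.
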

| Constant | Value | Meaning |
|---|---|---|
| HANDSHAKE_TIMEOUT | 10 s | close if auth not received in time |
| HEARTBEAT_INTERVAL | 15 s | server Ping cadence |
| heartbeat deadline | 2× interval (30 s) | close if no client liveness |
| RATE_LIMIT_FRAMES_PER_SEC | 120 | per-session pose-frame ceiling |


| Close code | Symbol | Cause |
|---|---|---|
| 4400 | CLOSE_CODE_VALIDATION | malformed/out-of-phase frame, bad room_id, binary before auth |
| 4401 | CLOSE_CODE_AUTH_FAIL | signature verification failed, did mismatch, handshake timeout |
| 4429 | CLOSE_CODE_RATE_LIMIT | pose frame rate exceeded |


```mermaid
stateDiagram-v2
    [*] --> Challenged: "connect → server sends challenge"
    Challenged --> Joined: "valid signed auth"
    Challenged --> Closed: "timeout 4401 / bad sig 4401"
    Joined --> Joined: "binary pose frame (rate-limited 120/s)"
    Joined --> Closed: "rate limit 4429 / validation 4400 / disconnect"
    Closed --> [*]
```

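
The same state machine, written as a transition function, makes the close codes easy to test. This is an illustrative model: the event names and `presenceStep` are made up for the example, and treating every other out-of-phase event as a 4400 validation close is an assumption extrapolated from the close-code table.

```typescript
type Phase = "Challenged" | "Joined" | "Closed";
type PresenceEvent =
  | "valid_auth" | "bad_signature" | "handshake_timeout"
  | "pose_frame" | "rate_limit_exceeded" | "validation_error" | "disconnect";

interface Transition { next: Phase; closeCode?: number }

function presenceStep(phase: Phase, ev: PresenceEvent): Transition {
  if (phase === "Challenged") {
    switch (ev) {
      case "valid_auth": return { next: "Joined" };
      case "bad_signature":
      case "handshake_timeout": return { next: "Closed", closeCode: 4401 };
      case "disconnect": return { next: "Closed" };
      // Anything else before auth (for example a binary pose frame) is
      // out-of-phase and closes with the validation code.
      default: return { next: "Closed", closeCode: 4400 };
    }
  }
  if (phase === "Joined") {
    switch (ev) {
      case "pose_frame": return { next: "Joined" };
      case "rate_limit_exceeded": return { next: "Closed", closeCode: 4429 };
      case "disconnect": return { next: "Closed" };
      default: return { next: "Closed", closeCode: 4400 };
    }
  }
  return { next: "Closed" }; // Closed is terminal
}
```
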
VisionClaw WebSocket auth rests on Nostr identity. There are two shapes:
Session-token gate (/wss, /wss/agent-events, and the other /ws/*
sockets). The caller presents a Nostr session token issued by the
NIP-07/NIP-98 login flow, either as an Authorization: Bearer <token> header or
a ?token=<token> query parameter. The handler validates it with
NostrService::get_session; a valid token yields the authenticated pubkey and
power-user flag, which the session carries for filtered broadcasts (ADR-060). An
invalid or missing token returns 401. The /wss handler also stores the
HTTP-equivalent connection_url on the session so NIP-98 request-bound
validation (src/utils/nip98.rs) can verify the signed u/method tags where
required.
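
Browsers cannot attach custom headers to a WebSocket upgrade, so a browser client will normally use the `?token=` form. A minimal sketch of building that URL follows; `wsUrlWithToken` and the example host are hypothetical.

```typescript
// Appends the session token as a query parameter, URL-encoding it so that
// characters such as "+", "/" or "=" survive the round trip.
function wsUrlWithToken(baseUrl: string, path: string, token: string): string {
  const separator = path.indexOf("?") === -1 ? "?" : "&";
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return trimmedBase + path + separator + "token=" + encodeURIComponent(token);
}
```

For example, `wsUrlWithToken("wss://visionclaw.example", "/wss", token)` yields a URL that can be passed straight to the `WebSocket` constructor. Non-browser clients can send the `Authorization: Bearer` header instead, which keeps the token out of access logs.
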
Signed-challenge gate (/ws/presence). Rather than a pre-issued token, the
presence socket issues a fresh nonce per connection and verifies a did:nostr
signature over it (see XR presence). This binds the
socket to a key without a separate login round-trip.
Insecure-defaults override. A connection that fails the token gate is
accepted only when ALLOW_INSECURE_DEFAULTS is set, and that branch is
compile-gated to debug_assertions / --features dev-auth builds (ADR-06).
Release binaries cannot reach it — the function is a const false stub. Never
rely on it outside local development.
NIP-98 itself (the HTTP authentication event kind) is documented as a
cross-cutting scheme in the Security model;
the token issuance flow is owned by NostrService.
Each socket runs its own keepalive cadence. All of them treat any inbound frame
(including Pong) as a liveness signal by refreshing an activity clock.

| Socket | Server ping | Idle close | Notes |
|---|---|---|---|
| /wss | every 5 s (ctx.ping) | activity-clock based | also accepts plain "ping" text and JSON {"type":"ping"}, replies pong |
| /ws/presence | every 15 s | 30 s (2× interval) | plus 10 s handshake deadline |
| /wss/agent-events | none | transport-driven | answers client Ping with Pong |

src/utils/socket_flow_constants.rs defines HEARTBEAT_INTERVAL = 30 and
CLIENT_TIMEOUT = 60 as the contract-level keepalive budget; the position
actor's own 5 s server ping keeps connections warm well inside that window.
The shared WebSocketHeartbeat trait (src/utils/websocket_heartbeat.rs) also
piggybacks server-to-client directives on pong frames (ADR-031):
reload_config, force_full_sync, and update_available { version }. Clients
should apply and clear any directives carried in a pong.
Subprotocol negotiation is per-endpoint and intentional:

- /wss echoes permessage-deflate. This is technically a compression extension, not a subprotocol, but echoing it in Sec-WebSocket-Protocol keeps the cloudflared/nginx proxy chain happy: proxies in that chain expect the server to reflect the header, and dropping it broke connections. Offer it from the client.
- /wss/agent-events negotiates vc-agent-events.v1. This is a real, versioned application subprotocol; bumping the suffix is how the agent-events wire contract is versioned.
- /ws/presence and the other /ws/* sockets negotiate nothing. Their framing is fixed by the handshake, not by a subprotocol token.

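
For a client, the per-endpoint rules reduce to choosing which protocol list to offer. A small sketch, with `subprotocolsFor` as a hypothetical helper name:

```typescript
// Returns the Sec-WebSocket-Protocol values a client should offer per path.
function subprotocolsFor(path: string): string[] {
  switch (path) {
    case "/wss":
      return ["permessage-deflate"]; // echoed back by the server
    case "/wss/agent-events":
      return ["vc-agent-events.v1"]; // versioned application subprotocol
    default:
      return []; // /ws/presence and the other /ws/* sockets negotiate nothing
  }
}
```

The result can be passed as the second argument of the browser `WebSocket` constructor, for example `new WebSocket(url, subprotocolsFor("/wss"))`.
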
WebSocket reconnection is a client responsibility. The server cooperates but does not resume sessions transparently: each reconnect is a fresh socket and a fresh state sync.
Signalling a reconnect. A client that sets the X-Client-Session header on
the upgrade is flagged is_reconnection; the server reflects this in
connection_established (is_reconnection: true) and sends a "Restoring
state..." loading notice instead of the cold-start "Calculating initial
layout...". The graph and positions are re-synced either way.
Backoff. Use exponential backoff with a cap (for example 1 s doubling to a
30 s ceiling) and reset on a clean open. The /wss server actively defends
against reconnect/subscribe loops via the 2 s subscribe debounce and the
broadcast-generation collapse described above, but a well-behaved client should
not depend on those.
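
The backoff policy suggested above (1 s doubling to a 30 s ceiling, reset on a clean open) fits in a few lines. `backoffDelayMs` and `ReconnectSchedule` are hypothetical names, and the sketch leaves out jitter, which a production client may want to add.

```typescript
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// Delay before reconnect attempt number `attempt` (0-based).
function backoffDelayMs(attempt: number): number {
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, attempt));
}

class ReconnectSchedule {
  private attempt = 0;

  // Call after each failed connection to get the next wait time.
  next(): number {
    const delay = backoffDelayMs(this.attempt);
    this.attempt += 1;
    return delay;
  }

  // Call from the socket's open handler once the connection is established.
  reset(): void {
    this.attempt = 0;
  }
}
```

Successive calls to `next()` give 1000, 2000, 4000, 8000, 16000, then 30000 ms for every further attempt until `reset()` is called.
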
Re-subscription. After reconnect, re-send subscribe_position_updates. If
the binary broadcast_sequence shows a gap on resume, send
request_full_snapshot rather than accumulating against a stale base. The
desktop client decomposes this reconnect/resubscribe state machine into the
WebSocket store described in
ADR-012.

- The Binary Protocol: byte layout of binary position frames (the complement to this page).
- Reference index · Configuration · Error codes
- Security model: NIP-98 / Nostr auth, CSWSH defence, insecure-defaults gating.
- Actor hierarchy: where SocketFlowServer, ClientCoordinatorActor, and PresenceActor sit.
- XR architecture: the presence and pose pipeline behind /ws/presence.
- Agent control surface: the agent-events channel in context.
- Governing records: ADR-059 (bidirectional agent channel), ADR-060 (pubkey-filtered broadcasts), ADR-031 (heartbeat directives).