Status: Draft Author: Architecture Audit (codex consult + qe-code-reviewer agent + cadence-reframing pass) Date: 2026-04-30 Priority: P0 — fixes per-frame waste of ~36 MB/s on a 25k-node graph; client init hangs traced to JSON bloat have a parallel cause on the WS side Related: ADR-037 (binary position protocol — to be retired), ADR-038 (push-path consolidation), ADR-050 (sovereign ownership / privacy bits), ADR-031 (broadcast backpressure), PRD-005 (v2 ontology pipeline). Supersedes ADR-037's per-node payload schema. Companion ADR: ADR-061 DDD Context: Binary Protocol Bounded Context
The WebSocket binary channel was designed for one purpose: stream per-physics-tick position and velocity from the GPU to subscribed clients at 60 Hz, identified only by node index. The on-disk justification (src/utils/binary_protocol.rs:11-31) reads:
ADR-037: literal-only, full snapshot of every node's absolute position + velocity.
Today the same channel ships 48 bytes per node, exactly double the original intent. The added 24 bytes are not extra physics — they are sticky labels and session-static type discriminators riding the 60 Hz wire:
| Current field | Bytes | Producer | Recompute cadence | Per-frame transmission justified? |
|---|---|---|---|---|
position (3×f32) |
12 | ForceComputeActor GPU kernel |
every physics tick | YES — original intent |
velocity (3×f32) |
12 | ForceComputeActor GPU kernel |
every physics tick | YES — needed for client tween between sparse broadcasts |
sssp_distance (f32) |
4 | SsspActor GPU kernel |
only when SSSP source/topology changes | NO — zero/Infinity 99% of the time |
sssp_parent (i32) |
4 | SsspActor GPU kernel |
same | NO — sticky-per-N-frames |
cluster_id (u32) |
4 | ClusteringActor (k-means on GPU) |
recompute interval ≈ seconds | NO — same value rides the wire 60×/s between recomputes |
community_id (u32) |
4 | ClusteringActor (Louvain on GPU) |
recompute interval ≈ seconds | NO — same |
anomaly_score (f32) |
4 | AnomalyDetectionActor GPU kernel |
LOF recompute interval (slow loop) | NO — same |
id_with_flags (u32) |
4 | encoder | id is per-frame; flag bits 26-31 are session-invariant | partial — keep id, demote 6 flag bits |
Plus an envelope:
[u8 version=5][u64 broadcast_sequence](9 bytes per frame — keep; backpressure replenishment relies on it viaClientBroadcastAck).
On a 25 k-node graph at 30 Hz steady-state:
- 20 redundant bytes/node × 25 000 × 30 Hz = ≈ 18 MB/s of wire spend on values that did not change.
- Browser-side, every frame the decoder runs through the analytics offsets in
parseBinaryFrameDatafor fields it then re-stores at the same index of the same side-table — wasted main-thread cycles 30 times a second.
The binary_protocol.rs codebase carries:
PROTOCOL_V3— the original 48-byte node formatPROTOCOL_V4— delta encoding (retired 2026-04-20 per ADR-037)PROTOCOL_V5— V3 nodes wrapped with an 8-byte broadcast sequence (current default)- legacy / "extended" / "with-sssp" / "with-privacy" flavours of
encode_node_data_*
The client decoder mirrors this with branches at binaryProtocol.ts:421-457. Version proliferation has become a soft cost: every new contributor reads the matrix of code paths, every change has to choose a flavour, every test fixture has to declare which one it speaks.
The user directive (2026-04-30) is to stop framing this as a "versioning" problem. There will be one binary protocol. It is the GPU's per-frame stream. New analytics outputs go on a different channel. Code, ADRs, tests, and docs unify on that single name.
| # | Goal | Success Metric |
|---|---|---|
| G1 | Reduce per-node wire payload to its original intent | Per-node binary frame is 28 bytes (id + position + velocity) |
| G2 | Move sticky GPU outputs to an on-change side stream — same producer, different cadence | ClusteringActor, AnomalyDetectionActor, SsspActor push their results only on recompute completion. No analytics field appears in the per-frame stream. |
| G3 | Remove version numbering from the protocol surface | No V3/V4/V5 tokens in src/utils/binary_protocol.rs, in client decoder, in test names, or in current ADRs. The wire format byte is removed (or fixed and undocumented). One name: "binary protocol." |
| G4 | Demote session-invariant type/visibility discriminators to JSON init | Bits 26-31 of id-with-flags removed from the wire. Node type / privacy state ride /api/graph/data's JSON response once at init; client side-table caches them. |
| G5 | Preserve all current observable behaviour | No regression in: physics live updates, drag pinning, settings-driven reheat, ADR-050 anonymous-viewer opacification, client-side cluster overlays, SSSP overlays, anomaly highlights. |
| G6 | Wire-cost telemetry is in place | A counter exposes bytes_per_frame_per_client and analytics_updates_per_minute so we can confirm the saving in production. |
- No rewrite of GPU kernels. Producers stay where they are.
- No change to the broadcast-sequence/ack mechanism. The envelope's 9-byte header is kept.
- No introduction of delta/diff encoding. The original "literal-only, full snapshot" property is preserved for the per-frame stream.
- No change to the polling REST path (
/api/graph/data). It already gets the slow-path JSON treatment. - No new client framework. The existing
binaryProtocol.tsdecoder is simplified, not replaced.
Frame layout:
[u8 preamble = 0x42] ← "B" — fixed, used only for sanity, not version dispatch
[u64 broadcast_sequence_LE] ← preserved from prior envelope; backpressure ack key
[N × Node]
Per-Node layout (28 bytes, fixed):
[u32 id_LE] ← raw node id; no flag bits
[f32 x_LE]
[f32 y_LE]
[f32 z_LE]
[f32 vx_LE]
[f32 vy_LE]
[f32 vz_LE]
Notes:
- The preamble byte stays at 0x42 forever. It is not a version. If the protocol ever needs to evolve it does so via a new endpoint, not a version byte.
- The broadcast-sequence is the only acked unit.
ClientBroadcastAcksemantics are unchanged. - ADR-050 privacy: the wire id is the raw id. Anonymous viewers receive no position frames for nodes they cannot see (server-side filter in
ClientCoordinator::broadcast_with_filter). Bit-29 opacification is no longer needed in the per-frame stream because per-node visibility was always known at init time and is enforced by the per-client filter.
A separate WebSocket message type, sent by the GPU-side actors only when their compute kernel completes:
Message layout (text or binary; recommend binary for size at scale):
type: "analytics_update"
source: "clustering" | "community" | "anomaly" | "sssp"
generation: u64 ← monotonic, for client merge-ordering
entries: [
{ id: u32,
cluster_id?: u32, ← only when source == "clustering"
community_id?: u32, ← only when source == "community"
anomaly_score?: f32, ← only when source == "anomaly"
sssp_distance?: f32, ← only when source == "sssp"
sssp_parent?: i32 ← only when source == "sssp"
}, …
]
Cadence: 0.1 – 1 Hz for clustering/community, on demand for SSSP, slow loop for anomaly. Never per physics tick.
Client behaviour: maintain a side-table analyticsByNodeId: Map<u32, AnalyticsRow> populated from analytics_update messages. Renderers (cluster hulls, anomaly overlay, SSSP gradients) read from the side-table, not from per-frame data.
| Field | New home | Cadence |
|---|---|---|
sssp_distance, sssp_parent |
analytics_update source=sssp |
on SSSP recompute |
cluster_id |
analytics_update source=clustering |
on k-means recompute |
community_id |
analytics_update source=community |
on Louvain recompute |
anomaly_score |
analytics_update source=anomaly |
on LOF recompute |
| Type flags (agent / knowledge / private / ontology-subtype) bits 26-31 | /api/graph/data JSON node.node_type (already present) |
once, at init |
| File | Change |
|---|---|
src/utils/binary_protocol.rs |
Replace PROTOCOL_V3/V4/V5 constants, the encode_node_data_* family, and the flag-bit constants. Single encode_position_frame(positions: &[(u32, BinaryNodeData)], broadcast_sequence: u64) -> Vec<u8>. 28 B/node fixed. No flags. |
src/actors/client_coordinator_actor.rs::serialize_positions (~line 472) |
Drop the private_opaque_ids, analytics_data, and node_type_arrays parameters from the per-frame path. Per-client visibility enforcement moves up one level — the client filter already drops invisible nodes from positions before serialise. |
src/actors/gpu/clustering_actor.rs |
On k-means / Louvain completion, emit `analytics_update{source:"clustering" |
src/actors/gpu/anomaly_detection_actor.rs |
Same shape, source=anomaly. |
src/actors/gpu/sssp_actor.rs (or wherever SSSP completes) |
Same shape, source=sssp. Triggered by the SSSP-mode toggle, not by physics ticks. |
src/handlers/socket_flow_handler/position_updates.rs::handle_subscribe_position_updates (~line 474) |
Remove min_interval_ms and requests_per_minute from the subscription_confirmed payload (zero client consumers per audit). |
src/handlers/socket_flow_handler/types.rs |
Subscription state no longer needs subscribed_node_types filter — node types are session-static and resolved at JSON init. |
src/utils/binary_protocol.rs tests |
Replace V3/V5 fixture tests with a single "binary_protocol" round-trip test. |
tests/integration/binary_protocol_*.rs |
Rename + simplify; one happy path, one "anonymous viewer sees no private positions" path. |
| File | Change |
|---|---|
client/src/store/websocket/binaryProtocol.ts |
Delete PROTOCOL_V3/V5 branches. processBinaryData becomes one path: validate preamble byte, read broadcast_sequence, iterate 28 B/node. The flag-bit decode loop and currentNodeTypeMap builder go away. |
client/src/types/binaryProtocol.ts |
Same — collapse the type matrix to one node shape. |
client/src/store/websocket/index.ts (message dispatch) |
Add analytics_update text-message handler that merges into a useAnalyticsStore Zustand slice. |
client/src/features/graph/components/ClusterHulls.tsx and similar |
Read cluster_id / community_id / anomaly_score from the analytics store, not from per-node parsed binary fields. |
client/src/features/graph/managers/graphDataManager.ts |
Per-frame decoded data simplifies to Map<u32, {x,y,z,vx,vy,vz}>. Drop the analytics-side-effect path. |
client/src/features/graph/components/InstancedLabels.tsx |
Read nodeType from the JSON-init side-table by id, not from per-frame flag bits. |
| Document | Change |
|---|---|
docs/adr/ADR-037-binary-position-protocol.md |
Mark Superseded by PRD-007. Move historical content to a "Historical context" section. |
docs/adr/ADR-038-physics-push-path.md (if exists) |
Refresh to reference binary_protocol (no version), drop V5 mentions. |
docs/adr/ADR-050-sovereign-ownership.md |
Section H2: replace "bit-29 opacification on the wire" with "filtered out of the per-frame stream by ClientCoordinator::broadcast_with_filter." Same outcome, simpler implementation. |
CLAUDE.md, multi-agent-docker/CLAUDE.md, client/.../README.md |
Replace any V3/V4/V5 references with the binary protocol. Single name throughout. |
client/src/store/websocket/binaryProtocol.ts JSDoc |
Update the doc-comment that currently lists three protocol versions. |
agentbox/docs/... references |
Audit; any cross-link that says "VisionClaw V5 wire" gets the same treatment. |
Per the velocity directive (2026-04-30), all changes ship in ONE sprint, ONE landing. No phasing, no fallback layer. The risk of an analytics renderer not migrating cleanly is mitigated by the parallel-stream agent topology described below — every renderer gets migrated in lockstep with the wire trim. If a renderer can't read from the analytics store, the sprint doesn't land.
| Stream | Owner | Deliverable |
|---|---|---|
| A. Rust wire | server agent | binary_protocol.rs collapsed to single encode_position_frame(positions, broadcast_sequence) -> Vec<u8>; flag-bit constants removed; legacy encode_node_data_* family deleted; new BroadcastAnalyticsUpdate actor message + serialiser |
| B. Rust analytics emitters | server agent | ClusteringActor / AnomalyDetectionActor / SsspActor emit BroadcastAnalyticsUpdate on recompute completion only; remove their writes to per-node binary columns |
| C. TypeScript client | client agent | binaryProtocol.ts decoder reduced to one path (28 B/node); analytics_update text-message handler; new useAnalyticsStore Zustand slice; renderers (ClusterHulls, anomaly overlay, SSSP gradients, InstancedLabels for nodeType) read from store/JSON-init not per-frame |
| D. Tests | qe agent | Replace V3/V5 fixtures with one binary_protocol_roundtrip_test; client decoder unit tests; renderer integration tests verifying side-table reads; one E2E smoke proving cluster hulls + anomaly overlay still render after wire trim |
| E. Doc consolidation | doc agent | Mark ADR-037 Superseded by ADR-061; remove every V3/V4/V5 token from src/, client/src/, docs/, tests/; new docs/binary-protocol.md single-source spec; CLAUDE.md / README updates; JSDoc cleanup |
The streams are parallel but the merge has one ordering constraint:
analytics emitters (B) must reach main before — or with — the wire trim
(A), so that even a brief in-flight build never has a renderer reading
analytics columns that the wire no longer provides. The sprint coordinator
gates merge on (A+B+C+D+E green) as a single atomic landing. Branch is
feat/binary-protocol-unification.
Before the sprint can claim success:
cargo build --lib && cargo testgreen; the binary protocol roundtrip test exercises the new 28 B/node format end-to-end.cargo grep -r "PROTOCOL_V3\|PROTOCOL_V4\|PROTOCOL_V5\|encode_node_data_with_"returns zero matches insrc/.tsc --noEmit && jestgreen;grep -rE "PROTOCOL_V[345]|BINARY_NODE_SIZE_V" client/srcreturns zero matches.- Browser smoke: hard-refresh against a freshly-rebuilt backend. Per-node wire payload measured at exactly 28 bytes via DevTools network inspection of the WS frames; cluster hulls / anomaly overlay / SSSP overlay all render correctly; settings-driven reheat still works.
- Telemetry counter
bytes_per_frame_per_clientreports ≈9 + 28*Non a connected client.
Confirm production shows predicted wire-cost reduction (≈18 MB/s saved at 25 k nodes / 30 Hz). If the saving is materially less than predicted, follow-up ADR investigates.
| # | Risk | Mitigation |
|---|---|---|
| R1 | Renderers that silently relied on per-frame analytics columns break when those columns disappear | Phase 1 ships the side stream first; Phase 2 only proceeds after renderers are confirmed to read from the store. Add a Phase-1 deprecation warning in client logs whenever a renderer reads analytics from per-frame data. |
| R2 | The broadcast_sequence envelope removal accidentally lands as part of the cleanup |
Explicit non-goal in §3. Tests should pin the envelope shape. |
| R3 | Anonymous-viewer behaviour shifts — ADR-050 H2 currently relies on bit-29 opacification at the encoder | The new model enforces visibility at the per-client filter (drop the position from the frame entirely for non-owners). Outcome is the same: anonymous viewer sees nothing for private nodes. Update ADR-050 H2 prose. |
| R4 | The analytics_update cadence is mis-tuned and floods clients |
Add a per-source rate limit (max 1/sec for clustering, max 1/min for anomaly) and a server-side coalesce. |
| R5 | Existing serialised binary fixtures or snapshot tests break en masse | Phase 2 PR rewrites them in lockstep; CI must go green before merge. |
| R6 | Old client (out of date) reconnects expecting V5 envelope | The preamble byte (0x42) differs from V3 (3) and V5 (5). Old clients see "invalid version" warning and reconnect attempts fail closed. Acceptable — clients are pinned by deployment. |
- Wire: per-node payload is 28 bytes, every frame, every client. No exceptions.
- Code: zero matches for
PROTOCOL_V3,PROTOCOL_V4,PROTOCOL_V5,binary_protocol_vinsrc/,client/src/,docs/,tests/. - Docs: every reference to "V3" / "V5" wire format in CLAUDE.md / README.md / ADRs has been removed or relegated to a "Historical context" subsection.
- Behaviour: at 25 k nodes / 30 Hz, the browser shows physics, tweening, cluster hulls, anomaly overlays, and SSSP overlays correctly; client init does not hang; settings-driven reheat propagates as before.
- Telemetry:
bytes_per_frame_per_clientaverages around9 + 28*N(where N is the visible node count for that client).analytics_updates_per_minuteis bounded.
- Delta encoding of the per-frame stream (revisit only if profiling shows wire is still the bottleneck).
- Promotion of the analytics-update channel to its own protocol with its own ADR.
- A binary serialisation for the JSON-init payload (current 3 MB at 22 k nodes is not a problem post the metadata-allowlist fix).
- Migrating
/api/bots/*polling to the analytics-update channel (ADR-038 ground; tracked separately).
- Is there any current consumer of
cluster_id/community_idper-frame that requires per-frame freshness rather than per-recompute? If yes, name it; the rate-limit on the side stream may need raising. - Should the analytics-update message use binary or JSON? JSON is simpler; binary is ~3-5× smaller. Recommend binary if the analytics store updates are expected at >1 Hz.
- Is the preamble byte
0x42worth keeping at all, or do we just declare the WS subprotocol and let the absence of versioning be load-bearing? (Recommend keeping it as a one-byte sanity check for malformed frames.)