Agent-First Web DAW Jam Room — v5 (Production-Audio Grade)

A real-time collaborative music environment where human musicians and AI agents are equal peers, connected via WebSocket to a shared jam room. All communication is lightweight JSON note events ("Cloud MIDI") — no audio transmitted.

Important

v5 Upgrade: This version incorporates expert-level audio engineering critiques (Phase 10) that address hidden physical constraints in distributed real-time audio: SNTP clock sync, React thread starvation, DO I/O thrashing, live input delay, hanging notes, and LLM agent ergonomics. These mitigations are folded into every relevant section below.

Phase 1: Problem Space

Goal: Enable humans and AI agents to jam together in real-time by exchanging symbolic note events over WebSocket, with each client synthesizing audio locally.

Hard Constraints:

Solo founder, no team — must be buildable in <2 weeks
Cloudflare Workers ecosystem (existing muscle memory)
Zero infrastructure cost at idle (Hibernation API)
Browser-only for humans (no desktop app)
No audio transmission — JSON events only

Primary Read Path: A peer connects to a room and receives (1) current room state (BPM, key), (2) peer list, (3) real-time stream of note events from all other peers.

Phase 2: Critique & Destroy 🔥

#	Criticism	Severity	Mitigation
1	Clock Skew (SNTP + Audio Anchor) — `beat_time` depends on synchronized time, but naive `clockOffset = serverTime - localTime` ignores Network Round-Trip Time (RTT). A 60ms one-way packet delay makes the offset permanently wrong by 60ms — audible as drum flams. Furthermore, `Date.now()` (OS system clock) and `Tone.now()` (hardware audio clock) tick at different rates and will drift apart over a 20-minute jam session.	🔴 Critical	True SNTP Ping-Pong Handshake: (1) Client sends ping with `t0` = `performance.now()`. (2) DO responds with `t1` (receive time) and `t2` (send time). (3) Client receives at `t3`. (4) `TrueOffset = ((t1 - t0) + (t2 - t3)) / 2`. Repeat every 30s to track drift. Audio Clock Anchor: Map the synchronized server `beat_time` directly to `Tone.now()`. Never use `Date.now()` for audio scheduling. All scheduling math uses `Tone.Transport.seconds` as the single source of truth. Encapsulated in `src/shared/utils/clock-sync.ts`.
2	DO is SPOF per room — if DO crashes, all peers disconnect and in-memory state is lost.	🟡 Medium	Clients auto-reconnect with exponential backoff. DO re-instantiates from persisted storage. `sequences` survive in `ctx.storage`. Transport state persisted on every mutation.
3	No auth in MVP — any peer can impersonate any `peer_id`, send garbage, or spam notes.	🟡 Medium	Acceptable for MVP (prototype). Phase 2 adds Bearer token for agents + rate limiting (max 100 note events/sec per peer). Server assigns `peer_id` — clients cannot choose their own.
4	Broadcast O(N) — at 20 peers × 10 notes/sec = 200 msgs/sec, DO broadcasts 200 × 19 = 3,800 sends/sec. Single-thread blocks inbound processing during broadcast.	🟡 Medium	DO soft limit is 1000 inbound req/s. Outbound `ws.send()` is not rate-limited. At 20 peers this is fine. At 50+, shard rooms into sub-rooms or use fan-out Worker.
5	Patch consistency — peers use different Tone.js versions or configurations, causing different sounds for same note events.	🟢 Low	MVP uses one hardcoded PolySynth patch. Server broadcasts `patch_config` in `room_state`. Future: hash-verified patch distribution.
6	React Main Thread Starvation — piping 50–100 incoming WebSocket note events per second into React state (`setNotes([...notes, newNote])`) triggers constant reconciliation cycles. This causes heavy Garbage Collection (GC) spikes, blocking the browser's main thread. A blocked main thread causes Tone.js to instantly glitch, crackle, and stutter.	🔴 Critical	Decouple hot-path from React entirely. (1) Audio Path: Route WS `onmessage` directly to a vanilla TypeScript `AudioEngine` class that triggers Tone.js — zero React involvement. (2) Visual Path: Push note data into a mutable `useRef`. Have `Visualizer.tsx` read from the ref using a native `requestAnimationFrame` canvas loop. React remains entirely ignorant of the real-time event loop. Encapsulated in `src/client/features/synth/model/audio-engine.ts`.
7	DO Serialization & I/O Thrashing — if 10 peers send 10 notes per second = 100 SQLite inserts/sec. Synchronous disk writes or `JSON.parse()` 100 times/sec inside the DO's single event loop introduces micro-stutters to the WS relay, ruining the tightness of the master clock.	🟡 Medium	Zero-Cost Router + Batch Persist. (1) Hot Path: Treat the DO as a "dumb router" for note relays. String-match `"type":"note_event"` and instantly `ws.send(rawString)` to other peers — no `JSON.parse()`. (2) Cold Path: Push raw strings into an in-memory array. Use `ctx.waitUntil()` or a `setInterval` to parse and bulk-insert the array to SQLite every 5 seconds via `flushToSQLite()`.
8	Live Input Playout Delay Paradox — Lookahead scheduling (`beat_time: 4.5`) is perfect for AI agents, but humans play live. If a human hits a key on Beat 4.0 and broadcasts `beat_time: 4.0`, it takes ~40ms to reach remote peers. Remote peers receive the note in the past (at beat 4.0 + 40ms). Tone.js will either drop it or play it out of phase.	🟡 Medium	Hybrid Playout Delay. Local playback is instant (zero latency for the performer). For broadcast: schedule the note at `Tone.now() + playoutDelay` (configurable, default 40ms) and send that exact future timestamp to the DO. Remote peers schedule at the received timestamp. This sacrifices perfect global sync by ~40ms but preserves instant local feedback — the standard approach used by every professional DAW for monitoring latency.
9	Hanging Notes / MIDI Panic — if a human or AI sends `note_on` but their Wi-Fi drops or script crashes before `note_off`, that note drones indefinitely in every other peer's speakers, ruining the jam.	🟡 Medium	DO Connection Lifecycle Cleanup. (1) DO maintains `activeNotes: Map<WebSocket, Set<string>>` tracking which pitches each peer has active. (2) On native Cloudflare `webSocketClose` and `webSocketError` events, DO automatically broadcasts fabricated `note_off` events for all of that peer's active pitches. (3) Client-side safety: auto-release any note held longer than 30 seconds.

Phase 3: Trade-off Matrices 📊

Real-Time Transport

Criteria	WebSocket (via DO)	WebRTC Data Channel	Server-Sent Events
Latency	20-80ms (via CF edge)	5-30ms (P2P direct)	50-200ms (unidirectional)
Topology	Star (DO is hub)	Mesh (N² connections)	Star (server push only)
Bidirectional	✅	✅	❌ (server→client only)
Agent compatibility	✅ (any WS client)	❌ (needs browser/WebRTC stack)	⚠️ (send via HTTP POST)
State management	✅ (DO is state authority)	❌ (no central state)	⚠️ (separate state server)
Complexity	Low	High (signaling, NAT traversal, STUN/TURN)	Low
Max peers per room	~50 (DO limit)	~6-8 (mesh degrades)	Unlimited (read-only)
Cost at idle	$0 (Hibernation API)	$0 (P2P)	N/A

Recommendation: WebSocket via DO. We're sending ~150-byte JSON events, not audio. The 20-80ms latency is absorbed by lookahead scheduling. P2P mesh breaks at >8 peers and can't accommodate headless agents. The DO gives us free state authority.

State Manager

Criteria	Durable Objects	Upstash Redis	Supabase Realtime
Latency	<10ms (same-origin)	20-50ms (external)	50-100ms (Postgres)
WebSocket native	✅ (Hibernation API)	❌ (need WS proxy)	✅ (channels)
Strong consistency	✅ (single-thread)	⚠️ (eventual)	⚠️ (Postgres MVCC)
Cost at idle	$0	$0 (free tier)	$0 (free tier)
Cost at 10K MAU	~$5/mo	~$10/mo	~$25/mo
Vendor lock-in	Medium (CF only)	Low	Medium
Agent-first API	✅ (WS upgrade)	❌ (need custom)	⚠️ (REST/Realtime)

Recommendation: Durable Objects. Native WebSocket + state in one primitive. Single-thread = no race conditions. Hibernation = zero cost idle. No external dependency.

Synth Engine

Criteria	Tone.js v15	Elementary Audio	Raw Web Audio API
Bundle size	~150KB gzipped	~80KB	0KB (native)
PolySynth built-in	✅	❌ (build from primitives)	❌
Scheduling (Transport)	✅ (Tone.Transport)	❌ (manual)	❌ (manual)
GitHub stars	13K+	1.5K	N/A
Learning curve	Low	Medium	High
AudioWorklet DSP	⚠️ (some support)	✅ (core design)	✅ (native)

Recommendation: Tone.js for MVP. Built-in PolySynth, Transport, and scheduling. Swap to Elementary Audio for custom DSP in Phase 2 if needed.

Clock Synchronization (v5 — NEW)

Criteria	Naive Offset (`serverTime - localTime`)	SNTP Ping-Pong + Audio Anchor	NTP Library (e.g., `ntp-client`)
RTT compensation	❌ (ignores network delay entirely)	✅ (full round-trip calculation)	✅ (full NTP algorithm)
Audio clock alignment	❌ (uses `Date.now()`, drifts from `Tone.now()`)	✅ (anchored to `AudioContext.currentTime`)	❌ (system clock only)
Implementation complexity	Trivial (~5 LOC)	Moderate (~80 LOC)	High (external dependency)
Works in browser	✅	✅	❌ (needs UDP, blocked in browser)
Drift correction over time	❌ (one-shot)	✅ (periodic re-sync every 30s)	✅
Precision for music	❌ (~60ms error possible)	✅ (<5ms error after calibration)	N/A (not browser-compatible)

Recommendation: SNTP Ping-Pong + Audio Anchor. The only approach that compensates for RTT AND anchors to the hardware audio clock. Critical for maintaining rhythmic integrity across a 20-minute session.

Audio Architecture (v5 — NEW)

Criteria	React State-Driven (`useState` per note)	Vanilla TS AudioEngine (decoupled)
GC pressure at 100 events/sec	🔴 Severe (constant re-renders, array allocations)	🟢 None (no React involvement)
Tone.js scheduling reliability	🔴 Glitches when main thread blocked by React reconciliation	🟢 Runs independently of React lifecycle
Visualizer performance	🟡 React re-renders throttled by event volume	🟢 `requestAnimationFrame` reads from mutable ref
Code complexity	Low (familiar React patterns)	Medium (two-layer architecture: AudioEngine + React wrapper)
Testability	Medium	High (AudioEngine is pure TS, unit-testable without DOM)

Recommendation: Vanilla TS AudioEngine. The performance characteristics of real-time audio are fundamentally incompatible with React's reconciliation model at high event rates. The AudioEngine class handles the hot path; React handles the cold path (UI controls, peer list, transport bar).

Phase 4: Constraints & Limits Audit 🚧

Service	Hard Limit	Escape Hatch
DO: SQLite storage	10 GB per DO (paid), unlimited per account	More than enough — 10K events ≈ 1.5MB
DO: KV key+value	2 MB combined (SQLite-backed)	Use SQL API instead for sequences
DO: SQL row/blob	2 MB max per row	Each note event JSON ≈ 150 bytes
DO: Inbound requests	~1000 req/s (20 WS msgs = 1 req)	Shard rooms at 50+ peers
DO: WS message size	32 MiB max	Note events are ~150 bytes — no risk
DO: Single-thread I/O	One event loop — synchronous disk writes or `JSON.parse()` at high frequency will block WS relay	Hot-path router: string-match note events and relay raw strings without parsing. Batch persist to SQLite every 5 seconds via `flushToSQLite()`.
Worker: CPU time	30s paid / 10ms free	Note relay is sub-1ms
Worker: Memory	128 MB	DO state is <1MB even with 10K events
Tone.js: AudioContext	6 per page (browser)	We use 1
Web Audio: Latency	~128 samples minimum (~3ms at 44.1kHz)	Acceptable for music
Web Audio: Clock drift	`AudioContext.currentTime` drifts from `Date.now()` at ~1ms/min	Never use `Date.now()` for audio scheduling — anchor all beat math to `Tone.now()` via `clock-sync.ts`
React: Main thread budget	~16ms per frame at 60fps — React reconciliation on high-frequency state updates consumes this budget	Route WS audio events directly to `AudioEngine` class — React never sees note events
WebSocket: Free tier	100K requests/day	MVP usage << 100K

Note

Storage clarified from official docs: SQLite-backed DOs (new default) have 2 MB key+value combined and 10 GB SQLite storage per DO. The 128 KiB limit only applies to legacy KV-backed DOs. The 32 KiB figure previously cited was incorrect. We use ctx.storage.sql — no chunking needed.

Warning

v5 Audio Physics Constraints: Three constraints are invisible in standard web architectures but critical for real-time audio: (1) OS clock vs audio clock drift, (2) React GC spikes blocking the audio thread, (3) DO event loop stalls from synchronous I/O. All three are mitigated in this architecture — see Phase 2 items 1, 6, and 7.

Phase 5: UI Data Access Pattern 🖥️

Page	Data Needed	Source	Query Pattern
Landing	None (room ID input)	N/A	N/A
Jam Room	Room state, peer list, note stream	WebSocket	Single persistent connection — all data pushed

Ideal query in plain English: "Connect to room X and continuously receive BPM, key, who's connected, and every note anyone plays — in real-time, ordered by beat position."

Schema derivation: No traditional database. DO in-memory state IS the schema:

// This is the "database" — lives in DO memory + ctx.storage
roomState: { bpm, key, transportStartTime, serverTime }
peers: Map<peerId, { name, kind, instrument, color }>
sequences: NoteEvent[]  // ring buffer, persisted in SQLite table (batch-flushed every 5s)
activeNotes: Map<WebSocket, Set<string>>  // v5: tracks active pitches per peer for MIDI panic
pendingFlush: string[]  // v5: raw event strings awaiting batch SQLite insert

N+1 risk: None. All data arrives via single WebSocket. No sub-queries.

Data flow architecture (v5):

WebSocket onmessage
    ├── Hot Path (note_event) ──→ string-match, relay raw to peers (no JSON.parse)
    │                           └──→ push raw string to pendingFlush[]
    └── Cold Path (all other) ──→ JSON.parse ──→ state mutation ──→ broadcast

Client-side:
WebSocket onmessage
    ├── Audio Path ──→ AudioEngine.playNote() ──→ Tone.js (bypasses React entirely)
    └── Visual Path ──→ noteBufferRef.current.push() ──→ rAF canvas loop reads ref

Phase 6: Failure Mode Analysis 💀

Component	Failure Mode	User Impact	Mitigation	Recovery
DO instance	Crash / eviction	All peers disconnect	Auto-reconnect with backoff	DO re-instantiates, loads state from storage
WebSocket	Network drop	Single peer loses connection	Client reconnect + re-hydrate from DO	Peer rejoins, gets `room_state` + `sequences`
Tone.js	AudioContext blocked	No sound	"Click to start" button (browser requirement)	User gesture triggers `Tone.start()`
Agent	Script crash	Agent stops playing, others unaffected	Agent-side reconnect logic	Agent rejoins room
BPM change mid-playback	Timeline drift	Notes play at wrong time	Recalculate `transportStartTime`	All peers receive corrected `room_state`
Clock skew (SNTP)	`beat_time` drift between peers	Notes slightly out of sync	SNTP ping-pong re-sync every 30s, anchored to `Tone.now()`	Client recalculates `TrueOffset` on each sync cycle
Clock drift (Audio vs OS)	`Date.now()` and `Tone.now()` diverge over 20-min session	Gradually worsening timing errors	Never use `Date.now()` for scheduling — all beat math anchored to `AudioContext.currentTime` via `clock-sync.ts`	Periodic re-anchor on each SNTP sync
Malicious peer	Spam 1000 notes/sec	Room flooded	Server-side rate limit (100 events/sec/peer)	Excess events dropped, peer warned
React GC spike	Main thread blocked >16ms during high note volume	Tone.js glitches, audio crackle, visual stutter	Audio path fully decoupled from React — `AudioEngine` class handles Tone.js directly; visualizer reads from `useRef` via `requestAnimationFrame`	No recovery needed — architecture prevents the failure mode
Hanging note (MIDI panic)	Peer sends `note_on` then disconnects before `note_off` (Wi-Fi drop, script crash, tab close)	Note drones indefinitely in all other peers' speakers	DO maintains `activeNotes: Map<WebSocket, Set<string>>`. On `webSocketClose`/`webSocketError`, DO broadcasts fabricated `note_off` for all of that peer's active pitches. Client-side safety: auto-release notes held >30s.	Immediate cleanup — other peers hear the note stop within one WS broadcast cycle
DO I/O thrashing	100+ SQLite inserts/sec from 10 peers × 10 notes/sec blocks WS relay	Micro-stutters in master clock, notes arrive late	Hot-path router: note events relayed as raw strings without `JSON.parse()`. `pendingFlush[]` batch-inserts to SQLite every 5 seconds.	No recovery needed — architecture prevents the failure mode

Phase 7: Migration & Rollback Plan 🔄

Migration: Greenfield project — no existing state to migrate. Deploy creates new DO namespace.

Rollback:

Branch: feat/web-daw-jam-room → PR to staging
Revert: git revert merge commit. No persistent state affected (rooms are ephemeral).
DO storage: rooms die when empty — no orphaned data.

Canary Criteria:

WebSocket connection success rate > 99%
Note event round-trip < 150ms (p95)
SNTP clock sync accuracy < 10ms drift (p95)
Zero audio glitches during 5-minute sustained 4-peer jam test
Zero 5xx errors in Worker logs for 30 minutes post-deploy

Phase 8: Architecture Decision Record

Context

Building an agent-first collaborative music environment. Need real-time bidirectional communication between humans (browsers) and AI agents (headless scripts) with consistent state management and zero idle cost.

Decision

Use Cloudflare Durable Objects with Hibernation WebSocket API as the central hub. Star topology with DO as state authority. JSON "Cloud MIDI" protocol with beat_time scheduling anchored to AudioContext.currentTime via SNTP ping-pong sync. Tone.js for client-side audio synthesis routed through a vanilla TypeScript AudioEngine class decoupled from React's render cycle. No audio transmission.

Trade-offs Accepted

Higher latency than WebRTC (20-80ms vs 5-30ms) — acceptable because we're sending symbolic events, not audio. Lookahead scheduling absorbs jitter.
Single-thread bottleneck at scale — DO caps around 50 active peers. Acceptable for MVP. Escape: shard into sub-rooms.
No preset ecosystem — single hardcoded synth patch. Acceptable for prototype.
Hybrid playout delay (v5) — local playback is instant, but broadcast is delayed by ~40ms to ensure remote peers can schedule notes in the future rather than the past. Trades perfect global sync for natural local feedback. This is the standard approach used by professional DAWs and networked game engines.
Two-layer client architecture (v5) — vanilla TS AudioEngine handles the real-time hot path (Tone.js scheduling, note triggering) while React handles the cold path (UI controls, peer list, transport bar). Adds architectural complexity but eliminates the fundamental incompatibility between React's reconciliation model and real-time audio at high event rates.
DO hot-path routing (v5) — note events are relayed as raw strings without JSON.parse() to avoid blocking the event loop. SQLite persistence is batch-flushed every 5 seconds. Trades strict per-event durability for relay performance. Acceptable because note events are ephemeral — if the DO crashes, the room re-initializes and peers reconnect.

Limits & Risks

DO SQLite storage: 10GB per DO, 2MB per row — more than sufficient. No chunking needed.
SNTP clock sync → periodic ping-pong handshake every 30s, TrueOffset anchored to Tone.now()
Audio vs OS clock drift → Date.now() banned from audio scheduling; all beat math uses AudioContext.currentTime
No auth in MVP → server-assigned peer_id, rate limiting in Phase 2

Alternatives Considered

WebRTC Data Channels: Lower latency but mesh topology degrades at >8 peers and headless agents need full WebRTC stack. Rejected.
Upstash Redis: External dependency, no native WebSocket, eventual consistency. Rejected.
Supabase Realtime: Higher latency, Postgres overhead for ephemeral event relay. Rejected.
React state for audio events: Familiar pattern but fundamentally incompatible with real-time audio at >50 events/sec due to GC pressure and main thread starvation. Rejected.
Naive Date.now() clock sync: Ignores RTT and drifts from audio clock. Rejected in favor of SNTP + Audio Anchor.

Phase 9: Iterative Refinement

9a: Cross-System Validation

No conflicts with existing systems (empty workspace).
Synergy: User's shader physics skills directly transferable to WebGL audio visualization.
Enables: Future Agent Hub integration — agents connect to jam rooms via the same API they'd use for any other tool.

9b: Compliance Audit

Rule	Status	Notes
FSD layer hierarchy	✅ Pass	`shared/` → `features/` → `App.tsx`
Barrel file enforcement	✅ Pass	Every slice has `index.ts`
300 LOC limit	✅ Pass	Largest file ~200 LOC (`jam-room.ts` with hot-path router)
TypeScript strict (no `any`)	✅ Pass	All types defined in protocol
Zod at network boundary	✅ Pass	All inbound WS messages validated (cold path only — hot path uses string-match)
Conventional commits	✅ Pass	Branch + PR workflow
PascalCase components, kebab-case files	✅ Pass	Per file inventory
Unified Worker pattern	✅ Pass	`wrangler.jsonc` with assets + DO
React decoupled from audio hot-path	✅ Pass	`AudioEngine` is vanilla TS — no React imports
Clock sync anchored to AudioContext	✅ Pass	`clock-sync.ts` uses `Tone.now()`, never `Date.now()`
DO I/O batched	✅ Pass	`flushToSQLite()` runs every 5s via `setInterval`
MIDI panic on peer disconnect	✅ Pass	`webSocketClose` handler broadcasts `note_off` for all active pitches

9c: File Inventory (24 files)

#	Action	Path	LOC	Notes
1	NEW	`package.json`	35
2	NEW	`tsconfig.json`	20
3	NEW	`wrangler.jsonc`	20
4	NEW	`vite.config.ts`	25
5	NEW	`src/shared/protocol/types.ts`	100	v5: added `ClockSyncPing`, `ClockSyncPong`, `NoteOff` (MIDI panic), `MeasureComplete` types
6	NEW	`src/shared/protocol/constants.ts`	45	v5: added `PLAYOUT_DELAY_MS`, `CLOCK_SYNC_INTERVAL_MS`, `FLUSH_INTERVAL_MS`, `MAX_NOTE_HOLD_S`
7	NEW	`src/shared/protocol/index.ts`	5
8	NEW	`src/shared/utils/clock-sync.ts`	80	v5 NEW — SNTP ping-pong RTT calculation, `TrueOffset` computation, periodic re-sync, `getTrueNetworkTime()` function, anchored to `Tone.now()`
9	NEW	`src/server/index.ts`	25
10	NEW	`src/server/jam-room.ts`	250	v5: expanded with hot-path string-match router, `pendingFlush[]` array, `flushToSQLite()` debounced batch insert, `activeNotes` map, `webSocketClose`/`webSocketError` MIDI panic handler
11	NEW	`src/server/env.ts`	10
12	NEW	`src/client/main.tsx`	15
13	NEW	`src/client/App.tsx`	60
14	NEW	`src/client/features/room/model/use-room-socket.ts`	130	v5: routes note events directly to `AudioEngine` (not React state). Only cold-path events (peer list, room state, transport) update React state.
15	NEW	`src/client/features/room/ui/JamRoom.tsx`	100
16	NEW	`src/client/features/room/ui/PeerList.tsx`	50
17	NEW	`src/client/features/room/ui/TransportBar.tsx`	80
18	NEW	`src/client/features/room/index.ts`	5
19	NEW	`src/client/features/synth/model/audio-engine.ts`	150	v5 NEW — Pure vanilla TS class (no React imports). Manages Tone.js PolySynth, handles playout delay scheduling (`Tone.now() + PLAYOUT_DELAY_MS`), note-on/note-off tracking, AudioContext lifecycle. Exposes `playNote()`, `stopNote()`, `setTransport()`, `dispose()`.
20	NEW	`src/client/features/synth/model/use-synth-engine.ts`	60	v5: reduced from 100 LOC — now a thin React wrapper around `AudioEngine`. Creates/disposes the engine on mount/unmount. Exposes engine ref for direct WS routing.
21	NEW	`src/client/features/synth/ui/Keyboard.tsx`	120
22	NEW	`src/client/features/synth/index.ts`	5
23	NEW	`src/client/features/visualizer/ui/Visualizer.tsx`	80	v5: reads from `noteBufferRef.current` via `requestAnimationFrame` loop — not React state. Canvas-based rendering.
24	NEW	`src/client/features/visualizer/index.ts`	5
		Total	~1,340	+120 LOC from v4 (new `clock-sync.ts` + `audio-engine.ts` + expanded `jam-room.ts`)

Agent SDK (src/agent-sdk/index.ts, ~80 LOC) deferred to Phase 1b after core room works.

9d: Pre-Execution Prerequisites

Create feat/web-daw-jam-room branch
Scaffold Vite + React project with npx create-vite
Install: tone, zod, hono (for future API routes)
Configure wrangler.jsonc with DO binding
Verify npx wrangler dev works with DO locally

9e: End-Result Vision

What you CAN do after Phase 1:

Open https://jam-room.example.com/room/my-room in two browser tabs
Play notes on a virtual keyboard (or MIDI controller) — hear them in both tabs
See a peer list showing who's connected with color-coded note attribution
Change BPM/key — all peers sync instantly
See a real-time waveform visualizer driven by the audio output
Experience tight rhythmic sync thanks to SNTP clock anchoring
No audio glitches even at high note density thanks to React-decoupled AudioEngine
Notes properly cleaned up when a peer disconnects (no hanging drones)

What's explicitly DEFERRED:

Agent SDK (Phase 1b — after core room proven)
LLM Agent Measure Aggregation (Phase 1b — measure_complete event bundling for Foundation Models. The DO emits a JSON bundle of all notes at the end of each measure, allowing LLMs to receive bounded context windows instead of a real-time firehose. LLM agents can sit dormant, receive the measure summary, take 2 seconds to "think", and schedule their response to drop at the start of Measure N+2.)
Multi-instrument selection
Step sequencer / pattern recording
AI composition endpoint
Room persistence / lobby
Authentication

What DOESN'T change: No existing projects affected. Greenfield workspace.

Phase 10: Production-Audio Grade Mitigations (Expert Critique Summary) 🔥

Note

These mitigations were identified via expert-level audio engineering review and address physical constraints in distributed real-time audio that are invisible in standard web architectures. Each has been folded into the relevant phase above. This section serves as a cross-reference summary.

🔴 Critical (Must implement before first playable build)

#	Dragon	Root Cause	Mitigation	Integrated In
1	Clock Skew Math Ignores RTT	Naive `serverTime - localTime` offset ignores 20-80ms network delay. Also, `Date.now()` and `Tone.now()` drift apart at ~1ms/min.	SNTP ping-pong handshake with `TrueOffset = ((t1 - t0) + (t2 - t3)) / 2`. All audio scheduling anchored to `Tone.now()`, never `Date.now()`.	Phase 2 #1, Phase 4, Phase 6, new file `clock-sync.ts`
2	React Main Thread Starvation	50-100 WS events/sec in React state → constant reconciliation → GC spikes → Tone.js glitches	Vanilla TS `AudioEngine` class handles audio hot path. React only handles cold-path UI. Visualizer reads from `useRef` via `requestAnimationFrame`.	Phase 2 #6, Phase 4, Phase 6, new file `audio-engine.ts`

🟡 Important (Implement during Phase 1)

#	Dragon	Root Cause	Mitigation	Integrated In
3	DO Serialization Thrashing	100+ `JSON.parse()` + SQLite inserts/sec blocks DO event loop → WS relay stutters	String-match note events for raw relay (no parse). Batch persist to SQLite every 5s.	Phase 2 #7, Phase 4, Phase 5, `jam-room.ts`
4	Live Input Playout Delay	Human `note_on` at Beat 4.0 arrives 40ms late at remote peers → played in the past	Hybrid delay: instant local playback + broadcast at `Tone.now() + playoutDelay`.	Phase 2 #8, `audio-engine.ts`

🟢 Important (Straightforward additions)

#	Dragon	Root Cause	Mitigation	Integrated In
5	Hanging Notes / MIDI Panic	`note_on` without `note_off` due to disconnect → infinite drone	DO tracks `activeNotes` per peer. `webSocketClose` broadcasts fabricated `note_off`.	Phase 2 #9, Phase 6, `jam-room.ts`
6	LLM Agent Firehose	Foundation Models can't process continuous 150-byte frames — need bounded context windows	`measure_complete` event bundles all notes at measure end. LLMs receive, think, respond 2 measures later.	Phase 9e (deferred to Phase 1b)

Verification Plan

Automated: Vitest for protocol Zod schemas + DO unit tests (@cloudflare/vitest-pool-workers). Cypress for two-client WebSocket relay + frontend audio playback.

Clock Sync Verification (v5): Unit test clock-sync.ts with mocked latencies (10ms, 50ms, 100ms) — verify TrueOffset calculation accuracy within 5ms.

Audio Decoupling Verification (v5): Stress test with 100 synthetic WS events/sec — verify zero React re-renders on note events, zero Tone.js glitches over 60 seconds.

MIDI Panic Verification (v5): Simulate peer disconnect with active notes — verify note_off broadcast within 100ms and no hanging audio.

DO Hot-Path Verification (v5): Load test with 10 synthetic peers × 10 notes/sec — verify WS relay latency stays <10ms p95, SQLite batch flush occurs every ~5s.

Manual: Two-tab jam test, latency feel test, BPM change propagation.

Canary: WS success rate >99%, note round-trip <150ms p95, SNTP sync accuracy <10ms, zero 5xx for 30 min.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agent-First Web DAW Jam Room — v5 (Production-Audio Grade)

Phase 1: Problem Space

Phase 2: Critique & Destroy 🔥

Phase 3: Trade-off Matrices 📊

Real-Time Transport

State Manager

Synth Engine

Clock Synchronization (v5 — NEW)

Audio Architecture (v5 — NEW)

Phase 4: Constraints & Limits Audit 🚧

Phase 5: UI Data Access Pattern 🖥️

Phase 6: Failure Mode Analysis 💀

Phase 7: Migration & Rollback Plan 🔄

Phase 8: Architecture Decision Record

Context

Decision

Trade-offs Accepted

Limits & Risks

Alternatives Considered

Phase 9: Iterative Refinement

9a: Cross-System Validation

9b: Compliance Audit

9c: File Inventory (24 files)

9d: Pre-Execution Prerequisites

9e: End-Result Vision

Phase 10: Production-Audio Grade Mitigations (Expert Critique Summary) 🔥

🔴 Critical (Must implement before first playable build)

🟡 Important (Implement during Phase 1)

🟢 Important (Straightforward additions)

Verification Plan

Checklist Before Code

FilesExpand file tree

ARCHITECTURE.md

Latest commit

History

ARCHITECTURE.md

File metadata and controls

Agent-First Web DAW Jam Room — v5 (Production-Audio Grade)

Phase 1: Problem Space

Phase 2: Critique & Destroy 🔥

Phase 3: Trade-off Matrices 📊

Real-Time Transport

State Manager

Synth Engine

Clock Synchronization (v5 — NEW)

Audio Architecture (v5 — NEW)

Phase 4: Constraints & Limits Audit 🚧

Phase 5: UI Data Access Pattern 🖥️

Phase 6: Failure Mode Analysis 💀

Phase 7: Migration & Rollback Plan 🔄

Phase 8: Architecture Decision Record

Context

Decision

Trade-offs Accepted

Limits & Risks

Alternatives Considered

Phase 9: Iterative Refinement

9a: Cross-System Validation

9b: Compliance Audit

9c: File Inventory (24 files)

9d: Pre-Execution Prerequisites

9e: End-Result Vision

Phase 10: Production-Audio Grade Mitigations (Expert Critique Summary) 🔥

🔴 Critical (Must implement before first playable build)

🟡 Important (Implement during Phase 1)

🟢 Important (Straightforward additions)

Verification Plan

Checklist Before Code