Describe the problem
In rooms with many participants, a burst of server-initiated renegotiations (e.g. several participants leaving within a few seconds — each leave triggers onMediaSectionsRequirement) causes severe client-side degradation:
- RTCEngine.negotiate() is invoked repeatedly in rapid succession. Each invocation registers Closing and Restarting listeners on the engine emitter (this.on(EngineEvent.Closing, handleClosed) / this.on(EngineEvent.Restarting, handleClosed)). With ~11 concurrent calls, Node's EventEmitter raises MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 restarting/closing listeners added. (A standalone repro sketch follows this list.)
- When one of the concurrent negotiations fails, fullReconnectOnNext = true is set unconditionally, triggering a full engine restart (cleanupPeerConnections + cleanupClient + rejoin signal + rejoin engine).
- Any track publish in flight at that moment (publishOrRepublishTrack) awaits reconnectFuture.promise and stalls for the full duration of the restart.
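For reference, the accumulation is reproducible with a plain Node EventEmitter outside of livekit entirely; this minimal sketch (illustrative names and timings, not library code) mimics the per-call registration that negotiate() performs:

```ts
import { EventEmitter } from 'events';

const emitter = new EventEmitter();

// Same shape as the per-call registration in negotiate(): add a listener pair,
// do async work, remove the pair in finally.
async function fakeNegotiate(): Promise<void> {
  const handleClosed = () => {
    /* abort this negotiation */
  };
  emitter.on('closing', handleClosed);
  emitter.on('restarting', handleClosed);
  try {
    // pretend the offer/answer round trip takes a while
    await new Promise((resolve) => setTimeout(resolve, 1000));
  } finally {
    emitter.off('closing', handleClosed);
    emitter.off('restarting', handleClosed);
  }
}

// 11 overlapping calls cross Node's default 10-listener threshold, so
// MaxListenersExceededWarning fires even though cleanup is correct.
for (let i = 0; i < 11; i++) {
  void fakeNegotiate();
}
```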
Observed impact in production (12-participant room, 7 leaves within 4s, Chrome 146 / macOS, livekit-client@2.18.6, stable network, signal WS never disconnected):
- setMicrophoneEnabled(true) stalled for 5.4 seconds when the user first enabled their mic during the mass-leave moment.
- Noise-suppression processor initialization (AudioWorklet.addModule) failed immediately after the forced restart with AbortError: Unable to load a worklet's module, forcing passthrough fallback.
- Two MaxListenersExceededWarning entries appeared in Sentry (one for restarting, one for closing).
Relevant breadcrumbs:
07:23:17.395Z publishing track {kind:"audio", source:"microphone"}
07:23:17-19 7 × video_room.member.status_updated user_state:"left"
07:23:19.730Z MaxListenersExceededWarning: 11 restarting listeners added
(stack: RTCEngine.negotiate → client.onMediaSectionsRequirement → SignalClient.handleSignalResponse)
07:23:19.939Z MaxListenersExceededWarning: 11 closing listeners added (same stack)
07:23:21.421Z [mic] slow toggle — setMicMs=5399 totalMs=5433 trackExists=false
The root cause is not a network failure (HTTP and WS were fine throughout). It is the combination of (a) no listener-accumulation guard on the engine emitter, and (b) overly aggressive escalation to full-reconnect on the first NegotiationError within a concurrent burst.
Describe the proposed solution
- Stop accumulating listeners per negotiate() call. In RTCEngine.negotiate(), instead of adding a fresh Closing / Restarting listener pair on every invocation, either:
  - register the abort-on-close/restart logic once at engine construction, routing the event to all in-flight abort controllers through a shared set, or
  - call this.setMaxListeners(0) (or a reasonable bound like 64) on the engine emitter so that expected concurrency does not trip Node's leak-detection warning.
- Do not force full-reconnect on a single transient NegotiationError when other negotiations are concurrently in flight. Under signaling bursts, one negotiation failing (e.g. because the PC state was mutated by a parallel negotiation) does not mean the peer connection is genuinely broken. Retry the failing negotiation locally first; only set fullReconnectOnNext = true if a retry confirms the PC is unrecoverable.
- Coalesce bursty onMediaSectionsRequirement into a single renegotiation. Debounce the handler (e.g. by microtask or a short timer) so that N consecutive media-section-requirement signals within a short window collapse into one negotiate() call. This is the highest-leverage fix: it eliminates the burst at the source rather than hardening downstream layers.
Any one of these reduces the impact significantly; together they remove this class of bug. Rough sketches of all three follow.
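A rough sketch of the first proposal, with illustrative names rather than the real engine internals: the Closing/Restarting handlers are registered once in the constructor and fan out to a shared set of per-negotiation abort controllers, so negotiate() never touches the emitter.

```ts
import { EventEmitter } from 'events';

class RTCEngineSketch extends EventEmitter {
  // one abort controller per in-flight negotiation
  private pendingNegotiations = new Set<AbortController>();

  constructor() {
    super();
    const abortAll = () => {
      for (const ac of this.pendingNegotiations) ac.abort();
      this.pendingNegotiations.clear();
    };
    // registered exactly once, regardless of how many negotiations overlap
    this.on('closing', abortAll);
    this.on('restarting', abortAll);
  }

  async negotiate(): Promise<void> {
    const ac = new AbortController();
    this.pendingNegotiations.add(ac);
    try {
      await this.createAndSendOffer(ac.signal); // hypothetical helper
    } finally {
      this.pendingNegotiations.delete(ac);
    }
  }

  private async createAndSendOffer(signal: AbortSignal): Promise<void> {
    if (signal.aborted) throw new Error('negotiation aborted');
    // offer/answer exchange elided
  }
}
```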
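A rough sketch of the second proposal; negotiateOnce and pcState are hypothetical stand-ins for whatever the engine exposes internally:

```ts
async function negotiateWithRetry(engine: {
  negotiateOnce: () => Promise<void>; // hypothetical single-shot negotiate
  pcState: () => RTCPeerConnectionState;
  fullReconnectOnNext: boolean;
}): Promise<void> {
  try {
    await engine.negotiateOnce();
  } catch (firstError) {
    const state = engine.pcState();
    if (state === 'failed' || state === 'closed') {
      engine.fullReconnectOnNext = true; // the PC really is unrecoverable
      throw firstError;
    }
    try {
      // transient failure (e.g. state mutated by a parallel negotiation): retry locally
      await engine.negotiateOnce();
    } catch (secondError) {
      engine.fullReconnectOnNext = true; // retry confirmed it is broken
      throw secondError;
    }
  }
}
```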
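And a rough sketch of the third proposal; the 20 ms window is an arbitrary illustration, not a tuned value:

```ts
// Collapse a burst of media-section-requirement signals into one negotiate()
// per window: the first signal schedules a run, later ones in the same window
// are absorbed by it.
function makeCoalescedNegotiate(
  negotiate: () => Promise<void>,
  windowMs = 20,
): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) return; // a negotiation is already scheduled
    timer = setTimeout(() => {
      timer = undefined;
      void negotiate(); // one renegotiation for the whole burst
    }, windowMs);
  };
}

// Indicative wiring only (handler name taken from the breadcrumb stack above):
// client.onMediaSectionsRequirement = makeCoalescedNegotiate(() => engine.negotiate());
```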
Alternatives considered
- Pass a custom AudioContext via webAudioMix.audioContext in Room options to avoid livekit closing our context on disconnect (sketched after this list). This only mitigates the AudioWorklet.addModule failure (a downstream symptom), not the core 5.4s publish stall or the listener accumulation.
- Application-level retry of the mic toggle in client code. Does not help: publishOrRepublishTrack is already queued and will wait on reconnectFuture regardless of how many times the user clicks.
- Raise setMaxListeners from app code. Not possible: the RTCEngine emitter is internal and not exposed on the Room API.
- Increase signaling timeouts. Does not address the root cause (timeouts were not hit; the 5.4s stall is from the full-reconnect flow completing successfully, not from a timeout).
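For completeness, the first alternative would look roughly like this in application code, assuming the webAudioMix.audioContext Room option referenced above:

```ts
import { Room } from 'livekit-client';

// Context owned by the application; livekit should not close a context it did
// not create, so AudioWorklet.addModule keeps working after a forced restart.
const appAudioContext = new AudioContext();

const room = new Room({
  webAudioMix: { audioContext: appAudioContext },
});
```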
Importance
nice to have
Additional Information
Code references (livekit-client 2.18.6 bundled as livekit-client.esm.mjs):
- Listener registration per negotiate: RTCEngine.negotiate() calls this.on(EngineEvent.Closing, handleClosed) and this.on(EngineEvent.Restarting, handleClosed), with matching off() calls in finally. Cleanup is correct, but 11+ concurrently pending calls stack up 22 listeners.
- Full-reconnect escalation: fullReconnectOnNext = true inside the NegotiationError branch of negotiate().
- Publish stall point: publishOrRepublishTrack, whose first statement awaits this.reconnectFuture?.promise.
- Context close on disconnect: Room.disconnect() closes this.audioContext when typeof webAudioMix === 'boolean' (the default), which is what breaks downstream AudioWorklet.addModule after the forced restart.
Happy to share full Sentry breadcrumb dump or test against a candidate fix if it helps.