Conversation
When the WebRTC peer connection went to .disconnected/.failed, the
transport delegate called primaryTransportConnectedCompleter.reset(),
which threw LiveKitError(.cancelled) to anyone awaiting the completer.
Consumers couldn't distinguish a network-induced disconnect from a
user-initiated cancel through room.disconnectError or
didDisconnectWithError.
- AsyncCompleter.reset(throwing:) accepts an optional Error and
  normalizes via LiveKitError.from. WaitEntry.cancel(throwing:) takes
  an optional LiveKitError, defaulting to .cancelled, so the
  withTaskCancellationHandler.onCancel path still produces .cancelled
  for real Task cancellation.
- Room+TransportDelegate now resets the transport completers with a
  typed LiveKitError(.network, message:) on PC disconnect/failed.
- Room.cleanUp / SignalClient.cleanUp thread the disconnect cause
  through to in-flight completer waiters instead of masking it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
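A minimal, self-contained sketch of the completer shape this describes, using stand-in names (`SketchError`, `SketchCompleter`, `normalize`) rather than the SDK's `LiveKitError`/`AsyncCompleter`; only the optional-error reset and the default-to-`.cancelled` behavior follow the commit text, everything else (timeouts, thread safety, result caching) is omitted:

```swift
import Foundation

// Stand-ins so the sketch compiles without the SDK; only the shape follows the commit.
struct SketchError: Error {                        // plays the role of LiveKitError
    enum Kind { case cancelled, network }
    let kind: Kind
    var message: String? = nil
    static func normalize(_ error: Error) -> SketchError {   // plays the role of LiveKitError.from
        (error as? SketchError) ?? SketchError(kind: .network, message: String(describing: error))
    }
}

final class SketchCompleter<Value> {               // plays the role of AsyncCompleter
    private struct WaitEntry {
        let continuation: CheckedContinuation<Value, Error>
        // nil keeps the old behavior: a plain reset/cancel still throws .cancelled.
        func cancel(throwing error: SketchError? = nil) {
            continuation.resume(throwing: error ?? SketchError(kind: .cancelled))
        }
    }

    private var entries: [UUID: WaitEntry] = [:]

    func wait() async throws -> Value {
        try await withCheckedThrowingContinuation { continuation in
            entries[UUID()] = WaitEntry(continuation: continuation)
        }
    }

    // reset(throwing:) takes any optional Error, normalizes it once, and fans it
    // out, so the real disconnect cause reaches every in-flight waiter.
    func reset(throwing error: Error? = nil) {
        let typed = error.map(SketchError.normalize)
        entries.values.forEach { $0.cancel(throwing: typed) }
        entries.removeAll()
    }
}
```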
Extends the previous .network-typing fix to the completers that the
initial change missed:
- CompleterMapActor.reset(throwing:) mirrors AsyncCompleter;
  SignalClient now passes disconnectError to _addTrackCompleters so a
  sendAddTrack waiter caught mid-disconnect sees the underlying cause.
- DataChannelPair.reset(throwing:) + Room.cleanUpRTC(withError:) so
  the publisherDataChannel.openCompleter waiter inside
  Room.send(dataPacket:) sees .network instead of .cancelled when the
  room is being torn down.
Adds CompleterMapActorTests covering fan-out + default-to-cancelled
behavior, and promotes waitForRegistration to file scope so both
suites share one helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
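Under the same stand-in names, the keyed variant might look roughly like this; the real `CompleterMapActor` is keyed by track cid and carries timeouts this sketch omits:

```swift
// Keyed fan-out: one completer per key, reusing the SketchCompleter above.
// reset(throwing:) mirrors the single-completer API, so teardown can hand the
// same disconnect cause to every pending waiter at once.
actor SketchCompleterMap<Value> {
    private var completers: [String: SketchCompleter<Value>] = [:]

    func completer(for key: String) -> SketchCompleter<Value> {
        if let existing = completers[key] { return existing }
        let created = SketchCompleter<Value>()
        completers[key] = created
        return created
    }

    // nil still means the default .cancelled; a non-nil error carries the cause.
    func reset(throwing error: Error? = nil) {
        completers.values.forEach { $0.reset(throwing: error) }
        completers.removeAll()
    }
}
```

During teardown, the same optional cause would then be threaded into each of these resets (the data-channel openCompleter included) rather than each falling back to `.cancelled`.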
Closes the only completer that previously had no error-propagation path
on disconnect. Both halves of this change are needed; either alone
leaves the leak open.
- Room.cleanUp(withError:) now resets activeParticipantCompleters with
the typed disconnectError, so any RemoteParticipant.waitUntilActive
in flight is unblocked immediately with .network (or whatever cause)
instead of waiting up to its full 10s timeout.
- CompleterMapActor.resume(throwing:, for:) is now a no-op when no
completer exists for the key. cleanUpParticipants mutates each
participant's state to .unknown, which fires a fire-and-forget
Task { resume(throwing: .participantRemoved, ...) } from the
state observer in Participant.swift. Without this guard, those
Tasks land after the reset and re-populate the just-cleared map
with stale .failure(.participantRemoved) entries, which then
greet the next session's same-identity waitUntilActive.
resume(returning:, for:) keeps its auto-create behavior: success
is meaningfully sticky ("the participant is active, anyone asking
later should know"), failure is not.
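A self-contained sketch of that asymmetry, with stand-in names and structure rather than the SDK's CompleterMapActor (and ignoring the re-check race a real implementation handles):

```swift
actor SketchParticipantCompleters<Value: Sendable> {
    private var waiters: [String: [CheckedContinuation<Value, Error>]] = [:]
    private var results: [String: Value] = [:]          // sticky successes only

    func wait(for key: String) async throws -> Value {
        if let value = results[key] { return value }     // success is remembered
        return try await withCheckedThrowingContinuation { continuation in
            waiters[key, default: []].append(continuation)
        }
    }

    // resume(returning:, for:) auto-creates: "the participant is active,
    // anyone asking later should know".
    func resume(returning value: Value, for key: String) {
        results[key] = value
        waiters.removeValue(forKey: key)?.forEach { $0.resume(returning: value) }
    }

    // resume(throwing:, for:) is a no-op when nobody is waiting, so a late
    // fire-and-forget failure landing after reset() cannot repopulate the map
    // with a stale error for the next session's same-identity waiter.
    func resume(throwing error: Error, for key: String) {
        guard let pending = waiters.removeValue(forKey: key) else { return }
        pending.forEach { $0.resume(throwing: error) }
    }

    // Called from cleanUp: every in-flight waiter gets the typed disconnect cause.
    func reset(throwing error: Error) {
        waiters.values.forEach { pending in pending.forEach { $0.resume(throwing: error) } }
        waiters.removeAll()
        results.removeAll()
    }

    // Test hook so a test can poll until a waiter is actually registered.
    func hasWaiter(for key: String) -> Bool { waiters[key] != nil }
}
```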
Tests refactored to a do/catch + Issue.record helper because
#expect(throws:) returns Void (not E?) on the swift-testing bundled
with Xcode 16.2. New CompleterMapActorTests cover the asymmetry
(no-op on missing key for throws, remember-success for returning,
existing-waiter still receives throws).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
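A rough idea of the test shape, reusing the stand-ins from the sketches above; the helper and test names are illustrative, not the actual CompleterMapActorTests:

```swift
import Testing

// File-scope helper in the spirit of the shared waitForRegistration mentioned
// earlier: yield until the waiter is actually registered, otherwise the
// no-op-on-missing-key behavior would swallow the error before anyone waits.
func waitForRegistration<Value: Sendable>(on completers: SketchParticipantCompleters<Value>,
                                          key: String) async {
    while !(await completers.hasWaiter(for: key)) { await Task.yield() }
}

@Test func throwingResumeIsNoOpWithoutAWaiter() async throws {
    let completers = SketchParticipantCompleters<String>()
    // Nobody is waiting for "alice", so this must not leave a stale failure behind.
    await completers.resume(throwing: SketchError(kind: .cancelled), for: "alice")
    // Success is sticky: a later waiter should still see it immediately.
    await completers.resume(returning: "active", for: "alice")
    let value = try await completers.wait(for: "alice")
    #expect(value == "active")
}

@Test func existingWaiterStillReceivesTheTypedError() async {
    let completers = SketchParticipantCompleters<String>()
    let waiting = Task { try await completers.wait(for: "bob") }
    await waitForRegistration(on: completers, key: "bob")
    await completers.resume(throwing: SketchError(kind: .network), for: "bob")
    // do/catch + Issue.record instead of #expect(throws:), which returns Void here.
    do {
        _ = try await waiting.value
        Issue.record("expected a .network error, got a value")
    } catch let error as SketchError {
        #expect(error.kind == .network)
    } catch {
        Issue.record("unexpected error type")
    }
}
```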
The compiler's "never mutated" analysis flags `weak var weakRP: ... = remoteParticipant` because the runtime nil-out isn't user-code mutation — but its suggested fix (`let`) won't compile, since `weak let` is illegal. Move `weak` into the closure capture list instead. The closure now holds the only weak reference and reads it when the leak check fires; same semantics, no warning, fewer lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
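A sketch of the pattern under hypothetical names; the real leak check in the tests differs, this only shows the capture-list form that replaces the `weak var` local:

```swift
import Foundation

final class ParticipantStub {}   // stand-in for the class being leak-checked

func scheduleLeakCheck(on participant: ParticipantStub) {
    // Before: `weak var weakParticipant: ParticipantStub? = participant` trips the
    // "never mutated" warning, and `weak let` is not legal Swift. Instead, let the
    // closure's capture list hold the only weak reference:
    DispatchQueue.main.asyncAfter(deadline: .now() + 1) { [weak participant] in
        if participant != nil {
            print("possible leak: instance still alive when the check fired")
        }
    }
}
```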
Stopwatch is a deprecated typealias for Span and split(label:) was renamed to record(_:at:). Update PublishDeviceOptimization tests accordingly and rename the local sw → span to match the type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Span is a class, so the local doesn't need to be var — the var triggers Swift's "never mutated" warning on stricter CI builds (Xcode 26.4). Switch to let; record(...) calls work either way. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
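For illustration, a stand-in showing why `let` is enough once the type is a class; the real Span initializer and the exact parameters of record(_:at:) are assumptions here, not the verified SDK API:

```swift
import Foundation

final class SpanStub {                       // stand-in; the SDK's Span is only assumed similar
    let label: String
    private(set) var marks: [(String, Date)] = []
    init(label: String) { self.label = label }
    func record(_ mark: String, at time: Date = Date()) { marks.append((mark, time)) }
}

// `let` is sufficient: record(...) mutates the object the reference points to,
// not the reference itself, so the "never mutated" warning goes away.
let span = SpanStub(label: "publishDevice")
span.record("trackCreated")
```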
One small concern: this now turns every peer-connection .disconnected/.failed into a typed .network reset. Could this also happen during a normal, user-initiated disconnect while teardown is still in progress? Maybe we should gate this to active connection/reconnect states, or skip the typed network reset when the room is already .disconnecting/.disconnected.
@hiroshihorie yes, I'm afraid of wrapping here; I'll check your case, maybe adding more tests around it. BTW I started working on typed throws (#988) to see how much damage it does.
Per @hiroshihorie's review on PR #987: a peer-connection .disconnected/.failed callback that fires while the room is already .disconnecting/.disconnected (i.e. as part of Transport.close() during cleanUpRTC, before _pc.delegate = nil takes effect) would otherwise reset the transport completers with LiveKitError(.network). A caller that registered a wait between cleanUp's reset and the late delegate firing would then see .network — even though the disconnect was user-initiated. Gate the typed-network construction on connection state: when the room is in teardown, fall back to default .cancelled so the distinction between network-induced disconnects and user cancellation stays reliable. The race is timing-dependent and not deterministic to reproduce as a unit test (would require either a real livekit-server connection or a Transport mock that doesn't currently exist in the test infrastructure). Existing CompleterTests cover the underlying mechanism on both branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cleans up the gate added in aa99573 by reading the state once via a small computed property on the enum. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
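Roughly, the gate plus that computed property might look like this, with a stand-in enum and helper name rather than the SDK's ConnectionState, and reusing SketchError from the first sketch:

```swift
enum SketchConnectionState {
    case disconnected, connecting, reconnecting, connected, disconnecting

    // The small computed property mentioned above, under an assumed name:
    // read the teardown check once instead of matching cases at the call site.
    var isTearingDown: Bool {
        switch self {
        case .disconnecting, .disconnected: return true
        default: return false
        }
    }
}

// On PC .disconnected/.failed: only construct the typed cause while the
// connection is live; during teardown, nil falls back to the default
// .cancelled, keeping user cancellation distinguishable from network loss.
func transportResetError(for state: SketchConnectionState) -> SketchError? {
    state.isTearingDown
        ? nil
        : SketchError(kind: .network, message: "Peer connection disconnected")
}
```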
@hiroshihorie fixed in 56d4a8b
Small but important updates to AsyncCompleter(Map), allowing custom error types.
Should not collide with (upcoming) typed throws.