Propagate more n/w errors by pblazej · Pull Request #987 · livekit/client-sdk-swift

pblazej · 2026-05-06T08:18:53Z

Small but important updates to AsyncCompleter(Map), allowing custom error types.

Should not collide with (upcoming) typed throws.

When the WebRTC peer connection went to .disconnected/.failed, the transport delegate called primaryTransportConnectedCompleter.reset(), which threw LiveKitError(.cancelled) to anyone awaiting the completer. Consumers couldn't distinguish a network-induced disconnect from a user-initiated cancel through room.disconnectError or didDisconnectWithError. - AsyncCompleter.reset(throwing:) accepts an optional Error and normalizes via LiveKitError.from. WaitEntry.cancel(throwing:) takes an optional LiveKitError, defaulting to .cancelled, so the withTaskCancellationHandler.onCancel path still produces .cancelled for real Task cancellation. - Room+TransportDelegate now resets the transport completers with a typed LiveKitError(.network, message:) on PC disconnect/failed. - Room.cleanUp / SignalClient.cleanUp thread the disconnect cause through to in-flight completer waiters instead of masking it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extends the previous .network-typing fix to the completers that the initial change missed: - CompleterMapActor.reset(throwing:) mirrors AsyncCompleter; SignalClient now passes disconnectError to _addTrackCompleters so a sendAddTrack waiter caught mid-disconnect sees the underlying cause. - DataChannelPair.reset(throwing:) + Room.cleanUpRTC(withError:) so the publisherDataChannel.openCompleter waiter inside Room.send(dataPacket:) sees .network instead of .cancelled when the room is being torn down. Adds CompleterMapActorTests covering fan-out + default-to-cancelled behavior, and promotes waitForRegistration to file scope so both suites share one helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the only completer that previously had no error-propagation path on disconnect. Both halves of this change are needed; either alone leaves the leak open. - Room.cleanUp(withError:) now resets activeParticipantCompleters with the typed disconnectError, so any RemoteParticipant.waitUntilActive in flight is unblocked immediately with .network (or whatever cause) instead of waiting up to its full 10s timeout. - CompleterMapActor.resume(throwing:, for:) is now a no-op when no completer exists for the key. cleanUpParticipants mutates each participant's state to .unknown, which fires a fire-and-forget Task { resume(throwing: .participantRemoved, ...) } from the state observer in Participant.swift. Without this guard, those Tasks land after the reset and re-populate the just-cleared map with stale .failure(.participantRemoved) entries, which then greet the next session's same-identity waitUntilActive. resume(returning:, for:) keeps its auto-create behavior: success is meaningfully sticky ("the participant is active, anyone asking later should know"), failure is not. Tests refactored to a do/catch + Issue.record helper because #expect(throws:) returns Void (not E?) on the swift-testing bundled with Xcode 16.2. New CompleterMapActorTests cover the asymmetry (no-op on missing key for throws, remember-success for returning, existing-waiter still receives throws). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The compiler's "never mutated" analysis flags `weak var weakRP: ... = remoteParticipant` because the runtime nil-out isn't user-code mutation — but its suggested fix (`let`) won't compile, since `weak let` is illegal. Move `weak` into the closure capture list instead. The closure now holds the only weak reference and reads it when the leak check fires; same semantics, no warning, fewer lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Stopwatch is a deprecated typealias for Span and split(label:) was renamed to record(_:at:). Update PublishDeviceOptimization tests accordingly and rename the local sw → span to match the type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Span is a class, so the local doesn't need to be var — the var triggers Swift's "never mutated" warning on stricter CI builds (Xcode 26.4). Switch to let; record(...) calls work either way. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hiroshihorie · 2026-05-07T06:46:00Z

One small concern: this now turns every .disconnected / .failed peer-connection callback into LiveKitError(.network) before checking the room state.

Could this also happen during a normal room.disconnect()? In that path we move to .disconnecting, then cleanUpRTC() closes the transports, and WebRTC may emit a disconnect/failed state as part of that teardown. If a caller is concurrently waiting on the transport completers, they could now see .network even though the disconnect was user-initiated.

Maybe we should gate this to active connection/reconnect states, or skip the typed network reset when the room is already .disconnecting / .disconnected, so the new distinction between network failures and user cancellation stays reliable.

pblazej · 2026-05-07T08:37:21Z

@hiroshihorie yes, I'm afraid of wrapping here; will check your case maybe adding more tests around it.

BTW I started working on typed throws #988 to see how much damage it does.

@hiroshihorie

Per @hiroshihorie's review on PR #987: a peer-connection .disconnected/.failed callback that fires while the room is already .disconnecting/.disconnected (i.e. as part of Transport.close() during cleanUpRTC, before _pc.delegate = nil takes effect) would otherwise reset the transport completers with LiveKitError(.network). A caller that registered a wait between cleanUp's reset and the late delegate firing would then see .network — even though the disconnect was user-initiated. Gate the typed-network construction on connection state: when the room is in teardown, fall back to default .cancelled so the distinction between network-induced disconnects and user cancellation stays reliable. The race is timing-dependent and not deterministic to reproduce as a unit test (would require either a real livekit-server connection or a Transport mock that doesn't currently exist in the test infrastructure). Existing CompleterTests cover the underlying mechanism on both branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cleans up the gate added in aa99573 by reading the state once via a small computed property on the enum. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pblazej · 2026-05-07T09:33:47Z

@hiroshihorie fixed in 56d4a8b

pblazej · 2026-05-11T06:16:01Z

cc @dfed should help with your case

pblazej and others added 2 commits May 6, 2026 09:59

pblazej requested a review from hiroshihorie May 6, 2026 08:18

pblazej changed the title ~~Propagate more tranport errors~~ Propagate more transport errors May 6, 2026

pblazej and others added 3 commits May 6, 2026 10:35

pblazej changed the title ~~Propagate more transport errors~~ Propagate more n/w errors May 6, 2026

pblazej and others added 2 commits May 7, 2026 11:15

refactor(core): extract ConnectionState.isTearingDown helper

56d4a8b

Cleans up the gate added in aa99573 by reading the state once via a small computed property on the enum. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pblazej requested a review from xianshijing-lk as a code owner May 7, 2026 09:33

pblazej mentioned this pull request May 8, 2026

Single PC improvements #986

Merged

Merge branch 'main' into blaze/completer-error-propagation

394df87

hiroshihorie approved these changes May 8, 2026

View reviewed changes

pblazej merged commit f5f8883 into main May 11, 2026
67 of 71 checks passed

pblazej deleted the blaze/completer-error-propagation branch May 11, 2026 07:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Propagate more n/w errors#987

Propagate more n/w errors#987
pblazej merged 9 commits into
mainfrom
blaze/completer-error-propagation

pblazej commented May 6, 2026

Uh oh!

hiroshihorie commented May 7, 2026

Uh oh!

pblazej commented May 7, 2026

Uh oh!

pblazej commented May 7, 2026

Uh oh!

pblazej commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

pblazej commented May 6, 2026

Uh oh!

hiroshihorie commented May 7, 2026

Uh oh!

pblazej commented May 7, 2026

Uh oh!

pblazej commented May 7, 2026

Uh oh!

pblazej commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants