Commit 0800b6b

docs: VoIP subsystem documentation (#7302)
1 parent cd2faa0 commit 0800b6b

6 files changed

Lines changed: 600 additions & 1 deletion

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -92,6 +92,6 @@ e2e/e2e_account.ts
9292
skills-lock.json
9393
CLAUDE.local.md
9494
AGENTS.md
95-
docs/
95+
/docs/
9696
.superset/
9797
.jest-cache/

UBIQUITOUS_LANGUAGE.md

Lines changed: 3 additions & 0 deletions
@@ -86,6 +86,9 @@
8686
| **Direct Video Conference** | A 1-on-1 Video Conference ||
8787
| **Group Video Conference** | A multi-participant Video Conference with title and anonymous user support ||
8888
| **VOIP** | Voice-over-IP phone-style call, separate from Video Conference — uses ICE servers and media streams | Phone call, voice call |
89+
| **Native Accept** | An incoming VOIP call answered by native code (CallKit on iOS, Telecom on Android) before the JS runtime is available; native issues the REST accept and JS reconciles state on launch via initial events | JS accept, app accept |
90+
| **Per-call DDP** | A short-lived DDP client opened by native code per incoming VOIP call so accept and signaling land before JS boots; separate from the main app DDP session | Native socket, side socket |
91+
| **Media Signal** | A typed event on the `@rocket.chat/media-signaling` wire protocol (offer, answer, ICE candidate, state update) carried over DDP `stream-notify-user` and replayable via REST `media-calls.stateSignals` | Signal, RTC event |
8992

9093
## Server & Connection
9194

Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
1+
# VoIP Architecture
2+
3+
Load-bearing reference for the structure of the VoIP subsystem. Read this before `FLOWS.md`, `iOS.md`, `ANDROID.md`, or `RUNBOOK.md` — those documents assume the vocabulary defined here.
4+
5+
## Overview
6+
7+
VoIP is a peer-to-peer audio call subsystem spanning three runtimes: **TypeScript** (the React Native app), **Swift** (iOS native modules), and **Kotlin** (Android native modules). Each runtime fills a layer:
8+
9+
- **Native UX layer** — Swift and Kotlin own the system call UI (CallKit on iOS, Telecom on Android), wake the app on incoming calls, and answer calls before JS exists.
10+
- **Signaling layer** — TypeScript drives the `@rocket.chat/media-signaling` session, which exchanges typed Media Signals over DDP (`stream-notify-user`) with the Rocket.Chat server.
11+
- **Media layer**: `react-native-webrtc` carries the audio streams; ICE servers come from server settings.
12+
13+
VoIP is not VideoConf. VideoConf (`app/sagas/videoConf.ts`, `app/lib/methods/videoConf.ts`) is a Jitsi-based group meeting feature using **Redux** for state and a different wire protocol. VoIP uses **Zustand** stores (`useCallStore`, `usePeerAutocompleteStore`), a singleton `MediaSessionInstance`, and the `@rocket.chat/media-signaling` protocol. The two features must not share Redux keys, sagas, selectors, or call objects.
14+
15+
---
16+
17+
## State Model
18+
19+
### Singleton lifecycle
20+
21+
`MediaSessionInstance` is a module-level singleton (`mediaSessionInstance`). It is created at module load but holds no `MediaSignalingSession` until `init(userId)` runs. Every entry point that depends on the session — `startCall`, `startCallByRoom`, `answerCall`, `applyRestStateSignals` — must guard against a null `instance`. The user-facing recovery for "instance not yet initialized" is `VoIP_Still_Connecting`.
22+
23+
`init(userId)` calls `reset()` first, registers the WebRTC processor factory and the DDP signal transport on `mediaSessionStore`, asks the store for an instance bound to that user id, replays REST state signals, and subscribes to the `stream-notify-user` event stream. `reset()` tears down listeners, disposes the store, and clears the call store back to its initial state — but preserves `nativeAcceptedCallId` so a pending native-accepted incoming call can still be reconciled.
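
The null-instance guard described above can be sketched as follows. This is a minimal illustration, not the app's code: `MediaSessionSketch`, `SessionLike`, `StartResult`, and the injected `createSession` factory are hypothetical names, and the real `startCall` surfaces `VoIP_Still_Connecting` to the user rather than returning it.

```typescript
// Hypothetical stand-in for the package's MediaSignalingSession.
interface SessionLike {
  startCall(kind: 'user', userId: string): void;
}

type StartResult = { ok: true } | { ok: false; reason: 'VoIP_Still_Connecting' };

class MediaSessionSketch {
  // Null until init(userId) runs, mirroring the documented lifecycle.
  private instance: SessionLike | null = null;
  private readonly createSession: (userId: string) => SessionLike;

  constructor(createSession: (userId: string) => SessionLike) {
    this.createSession = createSession;
  }

  init(userId: string): void {
    this.reset(); // init always starts from a clean slate
    this.instance = this.createSession(userId);
  }

  reset(): void {
    this.instance = null;
  }

  startCall(userId: string): StartResult {
    if (this.instance === null) {
      // Session not ready yet: report instead of dereferencing null.
      return { ok: false, reason: 'VoIP_Still_Connecting' };
    }
    this.instance.startCall('user', userId);
    return { ok: true };
  }
}

const started: string[] = [];
const sketch = new MediaSessionSketch(() => ({
  startCall: (_kind, userId) => {
    started.push(userId);
  }
}));
const beforeInit = sketch.startCall('u1'); // guarded: no session yet
sketch.init('me');
const afterInit = sketch.startCall('u1'); // reaches the session
```

Returning a result object keeps the sketch testable without UI; the same check would apply to every entry point that depends on the session.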
24+
25+
### Call store (`useCallStore`)
26+
27+
The Zustand call store holds the active call and its UI mirror. Key fields:
28+
29+
- `call: IClientMediaCall | null` — the bound call object from `@rocket.chat/media-signaling`.
30+
- `callId: string | null` — id of the bound call; cleared on `reset()`.
31+
- `nativeAcceptedCallId: string | null` — id of an incoming call that native accepted before JS bound it. **Survives `reset()`**, governed by a 60s stale-clearing timer.
32+
- `callState`, `isMuted`, `isOnHold`, `remoteMute`, `remoteHeld`, `isSpeakerOn`, `callStartTime`, `controlsVisible`, `focused`, `dialpadValue` — UI mirror of the call.
33+
- `roomId` — DM room id resolved from the contact (used to highlight the right room in the chats list).
34+
- `direction`: `'incoming' | 'outgoing'`, set during call binding.
35+
36+
`setCall` binds a call object and registers `stateChange`, `trackStateChange`, and `ended` emitter listeners. `reset` removes those listeners, stops `InCallManager`, and on Android calls `NativeVoipModule.stopAudioRouteSync()`. `endCall` is the user-driven path (in-app hangup) and converges with the native CallKit/Telecom end event on `mediaSessionInstance.endCall(callUUID)`.
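
The bind/unbind symmetry between `setCall` and `reset` can be illustrated with a small sketch. `TinyEmitter` and `bindCall` are hypothetical stand-ins; the real call object's emitter comes from `@rocket.chat/media-signaling`, and its API may differ from what is assumed here.

```typescript
type Listener = () => void;

// Hypothetical minimal emitter, used only to make the sketch self-contained.
class TinyEmitter {
  private readonly listeners = new Map<string, Set<Listener>>();

  on(event: string, fn: Listener): void {
    const set = this.listeners.get(event) ?? new Set<Listener>();
    set.add(fn);
    this.listeners.set(event, set);
  }

  off(event: string, fn: Listener): void {
    this.listeners.get(event)?.delete(fn);
  }

  emit(event: string): void {
    this.listeners.get(event)?.forEach(fn => fn());
  }

  count(event: string): number {
    return this.listeners.get(event)?.size ?? 0;
  }
}

// The three emitter events named in the text above.
const CALL_EVENTS = ['stateChange', 'trackStateChange', 'ended'] as const;

// Registers one handler per event and returns the matching teardown.
function bindCall(emitter: TinyEmitter, onEvent: (event: string) => void): () => void {
  const handlers = CALL_EVENTS.map(event => {
    const fn = () => onEvent(event);
    emitter.on(event, fn);
    return { event, fn };
  });
  return () => handlers.forEach(({ event, fn }) => emitter.off(event, fn));
}

const emitter = new TinyEmitter();
const seen: string[] = [];
const unbind = bindCall(emitter, event => {
  seen.push(event);
});
emitter.emit('stateChange'); // delivered
unbind();
emitter.emit('ended'); // no listener left, nothing recorded
```

Keeping the teardown next to the registration makes it harder for a reset path to leave a listener attached to a call that is no longer bound.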
37+
38+
### Native accept race bridge
39+
40+
Native code on iOS and Android can accept an incoming call before the JS app is alive — see [Native Bridge Contract](#native-bridge-contract). Once JS boots, it does not know which call object on the eventual `MediaSignalingSession` corresponds to the native-accepted one. The bridge is `nativeAcceptedCallId`:
41+
42+
- Native emits `VoipAcceptSucceeded { callId, host, type: 'incoming_call' }` (cold start: stashed and read via `getInitialEvents()`; warm: `NativeEventEmitter` / `DeviceEventEmitter`).
43+
- JS calls `setNativeAcceptedCallId(callId)`, which (re)starts a 60s stale timer (`STALE_NATIVE_MS`).
44+
- Two paths can subsequently bind a call to that id:
45+
1. The DDP `media-signal` stream delivers a `notification` signal with `type: 'accepted'`, matching `signedContractId` (mobile device id) and matching `callId`. `tryAnswerIfNativeAcceptedNotification` triggers `answerCall(callId)`.
46+
2. `applyRestStateSignals` replays `media-calls.stateSignals` and finds the `accepted` notification first, triggering the same `answerCall`.
47+
- `answerCall` is **idempotent**: if `existingCall.callId === callId`, it returns early. Both paths can fire without harm.
48+
- On any path that fails (no `MediaSignalingSession` call data found, `accept()` rejects), JS calls `terminateNativeCall(callId)` and clears `nativeAcceptedCallId` so the user is not left with a stuck CallKit/Telecom session.
49+
- The 60s stale timer guards against a native accept that never reconciles (e.g. server returned a fatal error). On expiry, `nativeAcceptedCallId` is cleared if `call` is still null and the id still matches the scheduled token.
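
The stale timer and the idempotent `answerCall` described in the list above can be sketched together. This is an illustration under assumptions: `createNativeAcceptBridge`, `Scheduler`, and the injected manual scheduler are hypothetical, and the real store uses Zustand and real timers.

```typescript
const STALE_NATIVE_MS = 60_000;

type Cancel = () => void;
// Injected scheduler so the sketch can run without real timers (an assumption).
type Scheduler = (fn: () => void, ms: number) => Cancel;

interface BridgeState {
  call: { callId: string } | null;
  nativeAcceptedCallId: string | null;
}

function createNativeAcceptBridge(schedule: Scheduler) {
  const state: BridgeState = { call: null, nativeAcceptedCallId: null };
  let cancelStale: Cancel | null = null;

  function setNativeAcceptedCallId(callId: string): void {
    cancelStale?.(); // every set (re)starts the timer
    state.nativeAcceptedCallId = callId;
    const token = callId; // the id this timer was scheduled for
    cancelStale = schedule(() => {
      // Clear only if nothing bound and the id is still the scheduled one.
      if (state.call === null && state.nativeAcceptedCallId === token) {
        state.nativeAcceptedCallId = null;
      }
    }, STALE_NATIVE_MS);
  }

  // Idempotent: a second call for the same id returns early.
  function answerCall(callId: string): boolean {
    if (state.call?.callId === callId) return false;
    state.call = { callId };
    return true;
  }

  return { state, setNativeAcceptedCallId, answerCall };
}

// Manual scheduler: timers fire only when flush() is called.
const pending: Array<{ fn: () => void; cancelled: boolean }> = [];
const manualSchedule: Scheduler = fn => {
  const entry = { fn, cancelled: false };
  pending.push(entry);
  return () => {
    entry.cancelled = true;
  };
};
const flush = (): void => {
  for (const entry of pending.splice(0)) if (!entry.cancelled) entry.fn();
};

const bridge = createNativeAcceptBridge(manualSchedule);
bridge.setNativeAcceptedCallId('call-1');
const firstAnswer = bridge.answerCall('call-1'); // binds
const secondAnswer = bridge.answerCall('call-1'); // early return
flush();
const idAfterBoundExpiry = bridge.state.nativeAcceptedCallId; // kept: a call is bound

const unbound = createNativeAcceptBridge(manualSchedule);
unbound.setNativeAcceptedCallId('call-2');
flush();
const idAfterUnboundExpiry = unbound.state.nativeAcceptedCallId; // cleared: nothing bound
```

Capturing the id as a token means a timer scheduled for an older call cannot clear a newer `nativeAcceptedCallId`.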
50+
51+
### In-call guard
52+
53+
`isInActiveVoipCall()` returns true when either `call` or `nativeAcceptedCallId` is non-null. It gates new outgoing calls, suppresses incoming VideoConf invitations (`voipBlocksIncomingVideoconf`), and is re-evaluated **after** the OS microphone permission prompt resolves (an incoming call may have arrived during the prompt).
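
A minimal sketch of the guard and its post-permission re-check, assuming the state is read through a getter. `CallGuardState`, `startCallAfterPermission`, and the `'permission_denied'` result are hypothetical; the real `startCall` is asynchronous and throws `VoIP_Already_In_Call` rather than returning it.

```typescript
interface CallGuardState {
  call: object | null;
  nativeAcceptedCallId: string | null;
}

// True when a call is bound or a native accept is still pending reconciliation.
function isInActiveVoipCall(state: CallGuardState): boolean {
  return state.call !== null || state.nativeAcceptedCallId !== null;
}

type StartOutcome = 'started' | 'VoIP_Already_In_Call' | 'permission_denied';

// Runs after the OS permission prompt has resolved.
function startCallAfterPermission(
  getState: () => CallGuardState,
  permissionGranted: boolean
): StartOutcome {
  if (!permissionGranted) return 'permission_denied';
  // Read the state again here: an incoming call may have arrived during the prompt.
  if (isInActiveVoipCall(getState())) return 'VoIP_Already_In_Call';
  return 'started';
}

let live: CallGuardState = { call: null, nativeAcceptedCallId: null };
const idle = startCallAfterPermission(() => live, true);
live = { call: null, nativeAcceptedCallId: 'call-9' }; // native accept landed meanwhile
const busy = startCallAfterPermission(() => live, true);
```

Passing a getter instead of a snapshot is the point of the sketch: a snapshot taken before the prompt would miss the call that arrived while it was open.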
54+
55+
### Self-call guard
56+
57+
`isSelfUserId(userId)` compares against `login.user?.id` (not `username`, because `username` may be undefined in stale Redux state). `startCall` short-circuits silently for the self case, and the autocomplete UI filters self out. See PR #7236.
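
The id-based comparison can be sketched as below. The `LoginState` shape and the explicit `login` parameter are assumptions for illustration; the real helper presumably reads `login` from the Redux store.

```typescript
interface LoginState {
  user?: { id?: string; username?: string };
}

// Compares ids only; username is ignored because it may be undefined in stale state.
function isSelfUserId(login: LoginState, userId: string): boolean {
  const selfId = login.user?.id;
  return selfId !== undefined && selfId === userId;
}

// Stale state: the id is present, the username is missing.
const staleLogin: LoginState = { user: { id: 'abc123' } };
const selfCheck = isSelfUserId(staleLogin, 'abc123');

// The autocomplete picker can apply the same predicate to hide the current user.
const peers = [{ id: 'abc123' }, { id: 'def456' }];
const visiblePeers = peers.filter(peer => !isSelfUserId(staleLogin, peer.id));
```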
58+
59+
### Live ICE configuration
60+
61+
ICE servers and gathering timeout are sourced from Redux settings (`VoIP_TeamCollab_Ice_Servers`, `VoIP_TeamCollab_Ice_Gathering_Timeout`). `MediaSessionInstance` subscribes to the Redux store and forwards changes to `instance.setIceServers` / `instance.setIceGatheringTimeout` whenever they differ (`dequal`-checked). This means an admin changing ICE servers does not require a client restart.
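
The forward-only-on-change behaviour can be sketched as follows. `createIceForwarder`, `IceSettings`, and `IceTarget` are hypothetical names; `sameValue` is a simple JSON-based stand-in for `dequal`; and how the raw setting values are parsed into server objects is not covered by the text and is left out.

```typescript
interface IceServer {
  urls: string;
  username?: string;
  credential?: string;
}

interface IceSettings {
  iceServers: IceServer[];
  iceGatheringTimeout: number;
}

// The two setters named in the text; everything else here is illustrative.
interface IceTarget {
  setIceServers(servers: IceServer[]): void;
  setIceGatheringTimeout(ms: number): void;
}

// Stand-in for dequal: adequate for plain JSON-like values with stable key order.
const sameValue = (a: unknown, b: unknown): boolean => JSON.stringify(a) === JSON.stringify(b);

// Returns a listener suitable for a store subscription.
function createIceForwarder(target: IceTarget, initial: IceSettings) {
  let last = initial;
  return (next: IceSettings): void => {
    if (!sameValue(last.iceServers, next.iceServers)) target.setIceServers(next.iceServers);
    if (last.iceGatheringTimeout !== next.iceGatheringTimeout) {
      target.setIceGatheringTimeout(next.iceGatheringTimeout);
    }
    last = next;
  };
}

const calls: string[] = [];
const stun: IceServer = { urls: 'stun:stun.example.org:3478' };
const turn: IceServer = { urls: 'turn:turn.example.org:3478' };
const forward = createIceForwarder(
  {
    setIceServers: servers => {
      calls.push(`servers:${servers.length}`);
    },
    setIceGatheringTimeout: ms => {
      calls.push(`timeout:${ms}`);
    }
  },
  { iceServers: [stun], iceGatheringTimeout: 5000 }
);

forward({ iceServers: [{ ...stun }], iceGatheringTimeout: 5000 }); // equal by value: nothing forwarded
forward({ iceServers: [stun, turn], iceGatheringTimeout: 5000 }); // servers changed
forward({ iceServers: [stun, turn], iceGatheringTimeout: 3000 }); // timeout changed
```

Comparing by value rather than by reference matters because a store update can produce a new array that holds the same servers.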
62+
63+
### Peer autocomplete store
64+
65+
`usePeerAutocompleteStore` is a separate Zustand store for the "start a call to…" picker. It has no coupling to the active call; it tracks `options`, `selectedPeer`, and `filter`, with a sequence counter to discard stale fetches.
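
The sequence-counter technique for discarding stale fetches can be sketched like this. `createPeerAutocomplete`, `beginFetch`, and `applyResult` are hypothetical names, and the fetch itself is left out so the sketch stays synchronous.

```typescript
interface PeerOption {
  id: string;
  label: string;
}

function createPeerAutocomplete() {
  let seq = 0;
  const state: { filter: string; options: PeerOption[] } = { filter: '', options: [] };

  // Call when a fetch starts; the returned token identifies that request.
  function beginFetch(filter: string): number {
    state.filter = filter;
    seq += 1;
    return seq;
  }

  // Call when a fetch resolves; results from superseded requests are dropped.
  function applyResult(token: number, options: PeerOption[]): boolean {
    if (token !== seq) return false;
    state.options = options;
    return true;
  }

  return { state, beginFetch, applyResult };
}

const picker = createPeerAutocomplete();
const slowToken = picker.beginFetch('a');
const fastToken = picker.beginFetch('al');
// The newer request resolves first, then the older one arrives late.
const fastApplied = picker.applyResult(fastToken, [{ id: 'u2', label: 'alice' }]);
const slowApplied = picker.applyResult(slowToken, [{ id: 'u1', label: 'adam' }]);
```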
66+
67+
---
68+
69+
## Signaling Protocol
70+
71+
> **Stability: experimental.** The wire protocol and message shapes evolve with `@rocket.chat/media-signaling`. The boundary documented here is what the RN app consumes; the package owns the canonical specification.
72+
73+
### Roles
74+
75+
The RN app is a **client** of `@rocket.chat/media-signaling`. It does not implement the protocol — it wires the package to a transport (DDP), a media stack (`react-native-webrtc`), and an identity (`mobileDeviceId` from `react-native-device-info`).
76+
77+
A call has two **participants**: a `caller` (initiates `startCall`) and a `callee` (receives a `newCall` event with `localParticipant.role === 'callee'`). The package surfaces calls as `IClientMediaCall` objects with an `emitter` and lifecycle methods (`accept`, `reject`, `hangup`, `sendDTMF`, `localParticipant.setMuted`, `localParticipant.setHeld`).
78+
79+
### Transport — DDP
80+
81+
Outbound `ClientMediaSignal`s are sent via `sdk.methodCall('stream-notify-user', '<userId>/media-calls', JSON.stringify(signal))`. Inbound `ServerMediaSignal`s arrive on `sdk.onStreamData('stream-notify-user', …)` and are filtered to the `media-signal` event before being handed to `instance.processSignal(signal)`.
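
A sketch of both directions, under assumptions. The outbound argument list follows the `sdk.methodCall` call quoted above. The inbound message shape (`fields.eventName` of the form `<userId>/<event>` with the signal in `fields.args[0]`) is an assumption about the DDP stream payload, and `buildOutbound`, `extractMediaSignal`, and `StreamMessage` are hypothetical names.

```typescript
type MethodCallArgs = [method: string, eventName: string, payload: string];

// Builds the arguments for the outbound method call quoted in the text.
function buildOutbound(userId: string, signal: object): MethodCallArgs {
  return ['stream-notify-user', `${userId}/media-calls`, JSON.stringify(signal)];
}

// Assumed inbound shape; the real payload may differ.
interface StreamMessage {
  fields?: { eventName?: string; args?: unknown[] };
}

// Returns the signal for media-signal events and null for everything else.
function extractMediaSignal(msg: StreamMessage): unknown {
  const eventName = msg.fields?.eventName ?? '';
  const event = eventName.slice(eventName.indexOf('/') + 1);
  if (event !== 'media-signal') return null;
  return msg.fields?.args?.[0] ?? null;
}

const outbound = buildOutbound('u1', { type: 'hangup', callId: 'c1' });
const inboundSignal = extractMediaSignal({
  fields: { eventName: 'u1/media-signal', args: [{ type: 'notification', callId: 'c1' }] }
});
const ignored = extractMediaSignal({
  fields: { eventName: 'u1/notification', args: [{ title: 'hello' }] }
});
```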
82+
83+
### Replay — REST
84+
85+
`media-calls.stateSignals` returns the current set of "live" signals for the device, used to recover state when:
86+
87+
- The session is created (`init` calls `applyRestStateSignals`).
88+
- A native accept races ahead of the DDP stream and `setNativeAcceptedCallId` fires before the corresponding `notification/accepted` arrives (`MediaCallEvents` triggers replay on `VoipAcceptSucceeded`).
89+
90+
`applyRestStateSignals` is **idempotent**. It calls `instance.processSignal(signal)` for each signal — the package deduplicates internally. After processing each signal it also runs `tryAnswerIfNativeAcceptedNotification`, which only triggers when device id, call id, and `nativeAcceptedCallId` all line up and no `call` is bound yet.
91+
92+
### Notification subtype — `accepted`
93+
94+
The `notification` signal with `signedContractId === mobileDeviceId && type === 'accepted'` is the JS-side indicator that **this device's** native code has already accepted **this call**. Both the live DDP path and REST replay run the same matcher; mismatched `signedContractId` (a different device on the same account) is ignored.
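
The matcher shared by the live DDP path and REST replay can be sketched as a pure predicate. The field names `signedContractId`, `callId`, and `type` follow the text; the `kind` discriminator, `SignalSketch`, `MatchContext`, and `isNativeAcceptedNotification` are hypothetical, and the package's actual signal shape may differ.

```typescript
interface SignalSketch {
  kind: string; // hypothetical discriminator for the signal family
  type?: string;
  callId: string;
  signedContractId?: string;
}

interface MatchContext {
  mobileDeviceId: string;
  nativeAcceptedCallId: string | null;
  boundCallId: string | null;
}

// True only when device id, call id, and the pending native accept all line up
// and no call is bound yet.
function isNativeAcceptedNotification(signal: SignalSketch, ctx: MatchContext): boolean {
  return (
    signal.kind === 'notification' &&
    signal.type === 'accepted' &&
    signal.signedContractId === ctx.mobileDeviceId &&
    signal.callId === ctx.nativeAcceptedCallId &&
    ctx.boundCallId === null
  );
}

const accepted: SignalSketch = {
  kind: 'notification',
  type: 'accepted',
  callId: 'c1',
  signedContractId: 'device-A'
};
const waiting: MatchContext = { mobileDeviceId: 'device-A', nativeAcceptedCallId: 'c1', boundCallId: null };

const matchesThisDevice = isNativeAcceptedNotification(accepted, waiting);
const matchesOtherDevice = isNativeAcceptedNotification(
  { ...accepted, signedContractId: 'device-B' },
  waiting
);
const matchesWhenBound = isNativeAcceptedNotification(accepted, { ...waiting, boundCallId: 'c1' });
```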
95+
96+
### Package boundary
97+
98+
The RN app does not reach into the package's internals. It:
99+
100+
- Provides `MediaCallWebRTCProcessor` factory (configured with the live ICE servers and gathering timeout).
101+
- Provides the DDP signal transport (out) and forwards DDP messages (in) to `processSignal`.
102+
- Provides `randomStringFactory`, `mediaStreamFactory` (camera/mic via `react-native-webrtc`), and `mobileDeviceId`.
103+
- Listens on `instance.on('newCall', …)` to bind outgoing calls into `useCallStore`.
104+
105+
Everything below the wire (offer/answer SDP, ICE candidate exchange, call state machine) is the package's responsibility. The package's own protocol spec is canonical.
106+
107+
---
108+
109+
## Native Bridge Contract
110+
111+
> **Stability: experimental.** Event names and payload shapes are coordinated between `app/lib/native/NativeVoip` and the iOS/Android native modules. Any change must be made in lockstep across all three.
112+
113+
### Why native must own the early path
114+
115+
A VoIP push wakes the app process but does not necessarily wake the JS runtime. iOS, in particular, requires the app to report an incoming call to CallKit within ~5 seconds of receiving a PushKit payload or the OS kills the app. Android uses an FCM data payload with a foreground service. Either way, the native side must:
116+
117+
1. Show the system call UI before JS is reliably running.
118+
2. If the user accepts on the lock screen / Telecom UI, send the REST `media-calls.answer` request **before** JS exists, so the server records acceptance with no perceptible delay.
119+
3. Stash the accept result so JS can reconcile when it boots.
120+
121+
### Per-call DDP
122+
123+
The RN app's main DDP socket is owned by `sdk` and is bound to the active workspace once login completes. That socket is not available during a cold-start incoming call. Native code therefore opens a short-lived **per-call DDP** client per incoming call (`VoipPerCallDdpRegistry` on both platforms) so the REST accept and any inbound signaling can land before the main socket is up. The per-call client closes when the call ends or when JS takes over.
124+
125+
### Events crossing JS ↔ native
126+
127+
JS subscribes to native events via a single emitter (`NativeEventEmitter(NativeVoipModule)` on iOS, `DeviceEventEmitter` on Android) plus `RNCallKeep`'s emitter. The contract:
128+
129+
| Event | Direction | Carrier | Purpose |
130+
| ---------------------------------------------------------- | ----------- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
131+
| `VoipPushTokenRegistered` | native → JS | iOS only | A new PushKit token is available; JS calls `registerPushToken()`. |
132+
| `VoipAcceptSucceeded` | native → JS | both | Native accept completed; payload `{ callId, host, type, username? }`; JS sets `nativeAcceptedCallId` and either replays REST (host matches) or hands off to deep linking (host differs). |
133+
| `VoipAcceptFailed` | native → JS | both | Native accept failed; JS dispatches the deep-linking pipeline so the user lands on a usable state on the right workspace. |
134+
| `VoipCommunicationDeviceChanged` | native → JS | Android only | OS audio route changed (speaker on/off); JS mirrors `isSpeakerOn`. |
135+
| `RNCallKeep:endCall` | native → JS | both | User pressed end on the system UI; JS calls `mediaSessionInstance.endCall(callUUID)`. |
136+
| `RNCallKeep:didPerformSetMutedCallAction` | native → JS | iOS | OS mute toggle; JS reconciles via the `if (muted !== isMuted) toggleMute()` echo guard. |
137+
| `RNCallKeep:didToggleHoldCallAction` | native → JS | both | OS hold (e.g. competing call); JS uses `wasAutoHeld` to distinguish OS-driven hold from manual hold. |
138+
| `NativeVoipModule.startAudioRouteSync` | JS → native | Android only | Begin observing audio route changes for `VoipCommunicationDeviceChanged`. |
139+
| `NativeVoipModule.setSpeakerOn` | JS → native | Android only | Drive `AudioManager` directly (Android speaker toggle). |
140+
| `NativeVoipModule.getInitialEvents` / `clearInitialEvents` | JS → native | both | Cold-start handoff: read and clear the stashed accept event. |
141+
142+
### Cold start — initial events
143+
144+
When JS boots after an incoming call was native-accepted while the app was killed, `getInitialMediaCallEvents()` runs early in the app init sequence. It reads `NativeVoipModule.getInitialEvents()` and decides:
145+
146+
- If `voipAcceptFailed`, dispatch the deep-linking pipeline and **return true** so the standard `appInit()` is skipped (the deep-linking saga will handle it).
147+
- If the host matches the current workspace, on iOS replay REST signals and return true (skip `appInit`); on Android return false so `appInit` proceeds (Android takes a different cold-start handoff path).
148+
- If the host differs, dispatch deep linking with `{ callId, host }` to switch workspaces and resume.
149+
150+
Both warm and cold paths call `NativeVoipModule.clearInitialEvents()` after consuming, so a re-launch in the same process does not re-fire.
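
The cold-start branches above can be summarised as a decision function. This is a sketch under assumptions: `decideColdStart`, `InitialEventSketch`, and the action labels are hypothetical, hosts are compared as plain strings, and `skipAppInit` is left undefined for the host-differs branch because the text does not state a return value there.

```typescript
type Platform = 'ios' | 'android';

// Hypothetical shape of a stashed initial event.
interface InitialEventSketch {
  callId: string;
  host: string;
  voipAcceptFailed?: boolean;
}

type ColdStartAction = 'deep-link' | 'replay-rest' | 'defer-to-app-init';

interface ColdStartDecision {
  action: ColdStartAction;
  skipAppInit?: boolean; // undefined where the text is silent
}

function decideColdStart(
  initialEvent: InitialEventSketch,
  currentHost: string,
  platform: Platform
): ColdStartDecision {
  if (initialEvent.voipAcceptFailed) {
    return { action: 'deep-link', skipAppInit: true };
  }
  if (initialEvent.host === currentHost) {
    return platform === 'ios'
      ? { action: 'replay-rest', skipAppInit: true }
      : { action: 'defer-to-app-init', skipAppInit: false };
  }
  // Different workspace: hand { callId, host } to deep linking.
  return { action: 'deep-link' };
}

const host = 'https://chat.example.org';
const failed = decideColdStart({ callId: 'c1', host, voipAcceptFailed: true }, host, 'ios');
const iosSameHost = decideColdStart({ callId: 'c1', host }, host, 'ios');
const androidSameHost = decideColdStart({ callId: 'c1', host }, host, 'android');
const otherHost = decideColdStart({ callId: 'c1', host: 'https://other.example.org' }, host, 'ios');
```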
151+
152+
### Cross-workspace race
153+
154+
If a push arrives for workspace B while the user is currently logged into workspace A, the host check fails. JS calls `onOpenDeepLink` with `{ callId, host }`, which the deep-linking saga uses to switch the active server, run login, and resume the call binding. The dedup sentinels (`lastHandledVoipAcceptFailureCallId`, `lastHandledVoipAcceptSucceededCallId`) guard against double-firing when the warm event and cold-start replay both deliver the same payload.
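
A dedup sentinel of the kind named above can be sketched as a closure that remembers the last handled call id. `createOnceGuard` is a hypothetical helper; the real sentinels are the variables `lastHandledVoipAcceptFailureCallId` and `lastHandledVoipAcceptSucceededCallId`.

```typescript
// Returns a predicate that is true the first time it sees a call id
// and false when the same id is delivered again immediately after.
function createOnceGuard(): (callId: string) => boolean {
  let lastHandled: string | null = null;
  return (callId: string): boolean => {
    if (lastHandled === callId) return false;
    lastHandled = callId;
    return true;
  };
}

const shouldHandleAcceptSucceeded = createOnceGuard();
const warmDelivery = shouldHandleAcceptSucceeded('c1'); // handled
const coldReplay = shouldHandleAcceptSucceeded('c1'); // same payload again: skipped
const nextCall = shouldHandleAcceptSucceeded('c2'); // a different call is handled
```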
155+
156+
---
157+
158+
## Invariants
159+
160+
Each invariant is grounded in a test in `app/lib/services/voip/*.test.ts` or an inline code comment. CI does not enforce these — they are author obligations during code review.
161+
162+
- **Singleton readiness**: `startCall`, `startCallByRoom`, and `answerCall` must guard against `instance == null`. Verified by `MediaSessionInstance.test.ts` cases under `startCall` ("shows alert and skips … when instance is null") and `startCallByRoom shows alert when instance is null`.
163+
- **Native accept race bridge**: `nativeAcceptedCallId` survives `useCallStore.reset()`. Verified by `useCallStore.test.ts` ("reset preserves nativeAcceptedCallId", "after 60s unbound, clears nativeAcceptedCallId when id still matches scheduled token").
164+
- **`answerCall` idempotency** — calling `answerCall(callId)` twice for the same call is a no-op. Verified by the `stream-notify-user (notification/accepted gated)` and `REST state signals replay (native accept race)` blocks in `MediaSessionInstance.test.ts`.
165+
- **In-call guard re-evaluation**: `startCall` re-checks `isInActiveVoipCall()` after the permission prompt resolves. Verified by the `startCall post-permission guard (B6)` block ("throws VoIP_Already_In_Call when active call arrives during permission prompt").
166+
- **Self-call guard** — calls cannot be initiated to the logged-in user's own id. Verified by `startCall` ("silently drops self-call when userId matches logged-in user id") and `isSelfUserId.test.ts`. See PR #7236.
167+
- **REST replay idempotency**: `applyRestStateSignals` can run repeatedly (init, post native-accept, recovery) without producing duplicate state. Each signal is handed to `instance.processSignal`, which the package deduplicates; the local `tryAnswerIfNativeAcceptedNotification` matcher only fires when `call == null`.
168+
- **Mute echo guard** — the iOS `didPerformSetMutedCallAction` listener calls `toggleMute()` only when `muted !== isMuted`, breaking the OS↔JS feedback loop. See inline code in `MediaCallEvents.ts`.
169+
- **Stale-session UUID gating**: `didPerformSetMutedCallAction` and `didToggleHoldCallAction` compare the event UUID (lowercased) against the active call UUID and drop the event on mismatch. Required because `setupMediaCallEvents` lives on Root and survives logout/server-switch. See inline comments in `MediaCallEvents.ts`.
170+
- **Auto-hold vs manual hold**: `wasAutoHeld` distinguishes OS-driven hold (a competing CallKit/Telecom call) from a user manually pressing hold; only auto-held calls are auto-resumed. See inline code in `MediaCallEvents.ts`.
171+
- **Optimistic `roomId` rollback**: `startCallByRoom` clears its optimistic `setRoomId` if `startCall` rejects, so a concurrent incoming call can resolve its own DM context. Verified by the `roomId population` block ("startCallByRoom clears optimistic roomId when post-permission guard rejects").
172+
- **Self-host signal gating**: `notification/accepted` signals are only acted on when `signedContractId === mobileDeviceId`; another device on the same account cannot trick this device into binding. Verified by the `stream-notify-user (notification/accepted gated)` block.
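
The mute echo guard and the stale-session UUID gating from the list above can be combined in one sketch. `handleOsMute`, `ActiveCallView`, and the returned labels are hypothetical; the `{ muted, callUUID }` payload follows RNCallKeep's `didPerformSetMutedCallAction` event, and lowercasing both sides of the comparison is an assumption (the text only states that the event UUID is lowercased).

```typescript
interface ActiveCallView {
  callUUID: string | null;
  isMuted: boolean;
}

type MuteOutcome = 'dropped' | 'noop' | 'toggled';

function handleOsMute(
  osEvent: { muted: boolean; callUUID: string },
  active: ActiveCallView,
  toggleMute: () => void
): MuteOutcome {
  // UUID gating: ignore events that belong to a previous session or another call.
  if (active.callUUID === null || osEvent.callUUID.toLowerCase() !== active.callUUID.toLowerCase()) {
    return 'dropped';
  }
  // Echo guard: the OS is reporting the state JS already has, so do nothing.
  if (osEvent.muted === active.isMuted) return 'noop';
  toggleMute();
  return 'toggled';
}

let toggles = 0;
const countToggle = (): void => {
  toggles += 1;
};
const activeCall: ActiveCallView = { callUUID: 'ab12-cd34', isMuted: false };

const staleEvent = handleOsMute({ muted: true, callUUID: 'ffff-0000' }, activeCall, countToggle);
const echoEvent = handleOsMute({ muted: false, callUUID: 'AB12-CD34' }, activeCall, countToggle);
const realEvent = handleOsMute({ muted: true, callUUID: 'AB12-CD34' }, activeCall, countToggle);
```

Checking the UUID before the mute state keeps a leftover event from an earlier session from ever reaching `toggleMute`.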
