Conversation
…e use-after-free
Add a serial dispatch queue to protect the `webSockets` dictionary in
WebSocketManager from concurrent access across URLSession delegate
callbacks, main queue, and the RN bridge. All reads/writes are now
wrapped in `queue.sync {}` with atomic remove-and-return in
`invalidateClient` and atomic snapshot+removeAll in `invalidateContext`
to eliminate TOCTOU races.
In WebSocketWrapper.didReceive, capture the weak `self.delegate` into
a strong local at method entry so the delegate cannot deallocate
mid-callback, preventing EXC_BAD_ACCESS.
Addresses Sentry issues MATTERMOST-MOBILE-IOS-XHW and
MATTERMOST-MOBILE-IOS-YQG.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to fix iOS WebSocket crashes by addressing concurrency issues in WebSocketManager and a potential delegate lifetime race in WebSocketWrapper during Starscream callbacks.
Changes:
- Add a serial
DispatchQueueto synchronize access to thewebSocketsdictionary and make remove/snapshot operations atomic. - Update
WebSocketWrapper.didReceiveto capture the delegate strongly at callback entry to avoid use-after-free during event handling.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ios/WebSocket/WebSocketWrapper.swift | Changes delegate handling in didReceive to prevent delegate deallocation mid-callback. |
| ios/WebSocket/WebSocketManager.swift | Adds a serial queue and wraps dictionary reads/writes to avoid concurrent access races. |
Comments suppressed due to low confidence (1)
ios/WebSocket/WebSocketManager.swift:82
invalidateClientforce unwrapsurl.host!when accessing/deleting the client certificate. If the URL lacks a host, this will crash even thoughURL(string:)may have succeeded upstream. Preferguard let host = url.host else { ... }and skip keychain operations (or report an error) when no host is available.
if let _ = try? Keychain.getClientIdentityAndCertificate(for: url.host!) {
do {
try Keychain.deleteClientP12(for: url.host!)
} catch {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public func didReceive(event: WebSocketEvent, client: Starscream.WebSocketClient) { | ||
|
|
||
| guard let url = client.url() else { return } | ||
| guard let url = client.url(), let delegate = self.delegate else { return } | ||
|
|
||
| switch event { | ||
| case .connected(let headers): | ||
| delegate?.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["OPEN"]!]) | ||
| delegate?.sendEvent(name: WSEvents.OPEN_EVENT.rawValue, result: ["url": url, "message": ["headers": headers]]) | ||
| delegate.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["OPEN"]!]) | ||
| delegate.sendEvent(name: WSEvents.OPEN_EVENT.rawValue, result: ["url": url, "message": ["headers": headers]]) | ||
| errorCounter.removeValue(forKey: url) | ||
| case .disconnected(let reason, let code): | ||
| delegate?.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["CLOSED"]!]) | ||
| delegate?.sendEvent(name: WSEvents.CLOSE_EVENT.rawValue, result: ["url": url, "message": ["reason": reason, "code": code]]) | ||
| delegate.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["CLOSED"]!]) | ||
| delegate.sendEvent(name: WSEvents.CLOSE_EVENT.rawValue, result: ["url": url, "message": ["reason": reason, "code": code]]) | ||
| case .text(let text): | ||
| if let data = text.data(using: .utf16), let json = JSON(data).dictionaryObject { | ||
| delegate?.sendEvent(name: WSEvents.MESSAGE_EVENT.rawValue, result: ["url": url, "message": json]) | ||
| delegate.sendEvent(name: WSEvents.MESSAGE_EVENT.rawValue, result: ["url": url, "message": json]) | ||
| } else { | ||
| delegate?.sendEvent(name: WSEvents.MESSAGE_EVENT.rawValue, result: ["url": url, "message": text]) | ||
| delegate.sendEvent(name: WSEvents.MESSAGE_EVENT.rawValue, result: ["url": url, "message": text]) | ||
| } | ||
| case .cancelled: | ||
| delegate?.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["CLOSED"]!]) | ||
| delegate?.sendEvent(name: WSEvents.CLOSE_EVENT.rawValue, result: ["url": url]) | ||
| delegate.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["CLOSED"]!]) | ||
| delegate.sendEvent(name: WSEvents.CLOSE_EVENT.rawValue, result: ["url": url]) | ||
| client.disconnect(closeCode: 1001) |
There was a problem hiding this comment.
didReceive now returns early when delegate is nil. Previously, the switch still ran (e.g., .cancelled would still call client.disconnect and .connected would still clear errorCounter) even if no events could be emitted. With the new guard, those non-delegate side effects are skipped, which can leave sockets undisconnected / state uncleared during teardown. Consider only guarding url, capturing let delegate = self.delegate as an optional strong local, and using optional sends (delegate?.sendEvent(...)) while still executing the rest of the switch logic.
📝 WalkthroughWalkthroughMade WebSocketManager's Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Manager as WebSocketManager
participant Socket as WebSocket
rect rgba(100,150,250,0.5)
Client->>Manager: createWebSocket(url)
Manager->>Manager: sync check webSockets[url]
alt existing socket present
Manager->>Socket: update delegate
Manager-->>Client: return existing Socket
else no socket
Manager->>Socket: create new WebSocket (outside lock)
Manager->>Manager: sync reconcile
alt another inserted concurrently
Manager->>Socket: clear new socket delegate
Manager->>Socket: use existing socket (delegate updated)
Manager-->>Client: return existing Socket
else
Manager->>Manager: store new Socket
Manager-->>Client: return new Socket
end
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ios/WebSocket/WebSocketManager.swift`:
- Around line 57-64: In the queue.sync critical section that currently checks
webSockets[url] and rebinds delegates, change the logic to detect an existing
socket and atomically reject/throw the duplicate create instead of mutating the
existing socket or discarding the new one; do not set existing.delegate =
delegate or webSocket.delegate = nil there. Keep reuse behavior only inside
ensureClientFor(...); ensure the create path honors the caller's options by
leaving the new webSocket intact and returning an error when webSockets[url]
already exists.
In `@ios/WebSocket/WebSocketWrapper.swift`:
- Around line 152-179: The handler currently short-circuits all event handling
by making delegate a required guard; change it to keep the existing URL guard
but capture delegate into a strong local (e.g., let strongDelegate =
self.delegate) so internal logic (errorCounter updates,
client.disconnect(closeCode:), etc.) always runs even if delegate is nil, and
replace direct delegate.sendEvent(...) calls with optional calls
(strongDelegate?.sendEvent(...)) so sending events is skipped but cleanup still
executes; update cases .connected (ensure errorCounter.removeValue(forKey: url)
always runs), .cancelled (ensure client.disconnect(closeCode: 1001) always
runs), and .error (ensure errorCounter increment and READY_STATE/CLOSE_EVENT
logic run regardless of delegate presence) and keep references to symbols
WSEvents, READY_STATE, errorCounter, client.disconnect, and URL extraction
intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5c4a5a18-e3d8-4cf8-b11a-3ab2ee867f80
📒 Files selected for processing (2)
ios/WebSocket/WebSocketManager.swiftios/WebSocket/WebSocketWrapper.swift
| let snapshot: [WebSocket] = queue.sync { | ||
| let values = Array(webSockets.values) | ||
| webSockets.removeAll() | ||
| return values | ||
| } | ||
| for ws in snapshot { | ||
| ws.disconnect() | ||
| } | ||
| } |
There was a problem hiding this comment.
This is throwing me off, why do you need to make a copy?
There was a problem hiding this comment.
when I asked AI about it (I'm basically a swift noob) he replied that it was to avoid the deadlock:
The copy is needed to avoid a deadlock. ws.disconnect() can trigger Starscream delegate callbacks that call back into the manager (e.g., getWebSocket),
which would try to acquire the same serial queue via queue.sync. If we iterated inside the lock, that re-entrant queue.sync would deadlock. So we clear the
dictionary and grab the values inside the lock, then iterate outside it.
Which sounded fine. But if you think this is not needed, I'll be happy to fix it, but I need an expert eye on there
- Restore synchronized early-return in createWebSocket to skip expensive option parsing and keychain work when a socket already exists - Move dictionary removal in invalidateClient back to after disconnect and cleanup, so the entry is only removed on success - Change didReceive delegate capture from required guard to optional strong local so internal side effects (errorCounter cleanup, client.disconnect on cancel) still execute when delegate is nil Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ios/WebSocket/WebSocketManager.swift (1)
80-96:⚠️ Potential issue | 🔴 CriticalRemove the registry entry atomically before teardown.
Line 81 reads the socket and Line 96 removes it later, so a concurrent
createWebSocket(for:)can still see the stale entry, rebind its delegate, and then lose the socket when this cleanup finishes.invalidateContext()already uses remove-first semantics;invalidateClient(for:)should do the sameremoveValue(forKey:)inside onequeue.sync, then clean up the returned socket outside the lock.Suggested fix
func invalidateClient(for url:URL) -> Void { - guard let webSocket = getWebSocket(for: url) else { + guard let webSocket = queue.sync(execute: { webSockets.removeValue(forKey: url) }) else { return } if let _ = try? Keychain.getClientIdentityAndCertificate(for: url.host!) { do { @@ } webSocket.forceDisconnect() webSocket.delegate = nil - queue.sync { webSockets.removeValue(forKey: url) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ios/WebSocket/WebSocketManager.swift` around lines 80 - 96, The invalidateClient(for:) function currently reads the socket with getWebSocket(for:) and only removes the entry later, allowing a race with createWebSocket(for:); change it to perform remove-first semantics by calling queue.sync { let socket = webSockets.removeValue(forKey: url) } (using the same queue and webSockets dictionary used elsewhere, e.g. invalidateContext()) and then perform Keychain deletion, webSocket.forceDisconnect() and webSocket.delegate = nil on the returned socket outside the lock; this ensures the registry entry is removed atomically before teardown and prevents a concurrent createWebSocket(from:) from rebinding a stale socket.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@ios/WebSocket/WebSocketManager.swift`:
- Around line 80-96: The invalidateClient(for:) function currently reads the
socket with getWebSocket(for:) and only removes the entry later, allowing a race
with createWebSocket(for:); change it to perform remove-first semantics by
calling queue.sync { let socket = webSockets.removeValue(forKey: url) } (using
the same queue and webSockets dictionary used elsewhere, e.g.
invalidateContext()) and then perform Keychain deletion,
webSocket.forceDisconnect() and webSocket.delegate = nil on the returned socket
outside the lock; this ensures the registry entry is removed atomically before
teardown and prevents a concurrent createWebSocket(from:) from rebinding a stale
socket.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 93e1f365-00cc-43f2-a17a-5f23002f3a30
📒 Files selected for processing (2)
ios/WebSocket/WebSocketManager.swiftios/WebSocket/WebSocketWrapper.swift
🚧 Files skipped from review as they are similar to previous changes (1)
- ios/WebSocket/WebSocketWrapper.swift
|
@enahum I think I addressed all, can you take a look again? |
Summary
this is the second part of the websocket crash. See: mattermost/mattermost-mobile#9639
fix(ios): synchronize WebSocket dictionary access and prevent delegate use-after-free
Add a serial dispatch queue to protect the
webSocketsdictionary in WebSocketManager from concurrent access across URLSession delegate callbacks, main queue, and the RN bridge. All reads/writes are now wrapped inqueue.sync {}with atomic remove-and-return ininvalidateClientand atomic snapshot+removeAll ininvalidateContextto eliminate TOCTOU races.In WebSocketWrapper.didReceive, capture the weak
self.delegateinto a strong local at method entry so the delegate cannot deallocate mid-callback, preventing EXC_BAD_ACCESS.Addresses Sentry issues MATTERMOST-MOBILE-IOS-XHW and MATTERMOST-MOBILE-IOS-YQG.
Once this is done, we'll need to update the dependencies on the app
Ticket Link
MM-67999