Skip to content

[MM-67999] fix websocket crash#163

Open
Willyfrog wants to merge 2 commits intomasterfrom
MM_67999_websocket_crash
Open

[MM-67999] fix websocket crash#163
Willyfrog wants to merge 2 commits intomasterfrom
MM_67999_websocket_crash

Conversation

@Willyfrog
Copy link
Copy Markdown

@Willyfrog Willyfrog commented Mar 26, 2026

Summary

this is the second part of the websocket crash. See: mattermost/mattermost-mobile#9639

fix(ios): synchronize WebSocket dictionary access and prevent delegate use-after-free

Add a serial dispatch queue to protect the webSockets dictionary in WebSocketManager from concurrent access across URLSession delegate callbacks, main queue, and the RN bridge. All reads/writes are now wrapped in queue.sync {} with atomic remove-and-return in invalidateClient and atomic snapshot+removeAll in invalidateContext to eliminate TOCTOU races.

In WebSocketWrapper.didReceive, capture the weak self.delegate into a strong local at method entry so the delegate cannot deallocate mid-callback, preventing EXC_BAD_ACCESS.

Addresses Sentry issues MATTERMOST-MOBILE-IOS-XHW and MATTERMOST-MOBILE-IOS-YQG.

Once this is done, we'll need to update the dependencies on the app

Ticket Link

MM-67999

…e use-after-free

Add a serial dispatch queue to protect the `webSockets` dictionary in
WebSocketManager from concurrent access across URLSession delegate
callbacks, main queue, and the RN bridge. All reads/writes are now
wrapped in `queue.sync {}` with atomic remove-and-return in
`invalidateClient` and atomic snapshot+removeAll in `invalidateContext`
to eliminate TOCTOU races.

In WebSocketWrapper.didReceive, capture the weak `self.delegate` into
a strong local at method entry so the delegate cannot deallocate
mid-callback, preventing EXC_BAD_ACCESS.

Addresses Sentry issues MATTERMOST-MOBILE-IOS-XHW and
MATTERMOST-MOBILE-IOS-YQG.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Willyfrog Willyfrog requested review from Copilot and enahum March 26, 2026 18:17
@Willyfrog Willyfrog self-assigned this Mar 26, 2026
@Willyfrog Willyfrog added the 2: Dev Review Requires review by a core committer label Mar 26, 2026
@Willyfrog Willyfrog changed the title [MM_67999] fix websocket crash [MM-67999] fix websocket crash Mar 26, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix iOS WebSocket crashes by addressing concurrency issues in WebSocketManager and a potential delegate lifetime race in WebSocketWrapper during Starscream callbacks.

Changes:

  • Add a serial DispatchQueue to synchronize access to the webSockets dictionary and make remove/snapshot operations atomic.
  • Update WebSocketWrapper.didReceive to capture the delegate strongly at callback entry to avoid use-after-free during event handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
ios/WebSocket/WebSocketWrapper.swift Changes delegate handling in didReceive to prevent delegate deallocation mid-callback.
ios/WebSocket/WebSocketManager.swift Adds a serial queue and wraps dictionary reads/writes to avoid concurrent access races.
Comments suppressed due to low confidence (1)

ios/WebSocket/WebSocketManager.swift:82

  • invalidateClient force unwraps url.host! when accessing/deleting the client certificate. If the URL lacks a host, this will crash even though URL(string:) may have succeeded upstream. Prefer guard let host = url.host else { ... } and skip keychain operations (or report an error) when no host is available.
        if let _ = try? Keychain.getClientIdentityAndCertificate(for: url.host!) {
            do {
                try Keychain.deleteClientP12(for: url.host!)
            } catch {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 151 to 171
public func didReceive(event: WebSocketEvent, client: Starscream.WebSocketClient) {

guard let url = client.url() else { return }
guard let url = client.url(), let delegate = self.delegate else { return }

switch event {
case .connected(let headers):
delegate?.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["OPEN"]!])
delegate?.sendEvent(name: WSEvents.OPEN_EVENT.rawValue, result: ["url": url, "message": ["headers": headers]])
delegate.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["OPEN"]!])
delegate.sendEvent(name: WSEvents.OPEN_EVENT.rawValue, result: ["url": url, "message": ["headers": headers]])
errorCounter.removeValue(forKey: url)
case .disconnected(let reason, let code):
delegate?.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["CLOSED"]!])
delegate?.sendEvent(name: WSEvents.CLOSE_EVENT.rawValue, result: ["url": url, "message": ["reason": reason, "code": code]])
delegate.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["CLOSED"]!])
delegate.sendEvent(name: WSEvents.CLOSE_EVENT.rawValue, result: ["url": url, "message": ["reason": reason, "code": code]])
case .text(let text):
if let data = text.data(using: .utf16), let json = JSON(data).dictionaryObject {
delegate?.sendEvent(name: WSEvents.MESSAGE_EVENT.rawValue, result: ["url": url, "message": json])
delegate.sendEvent(name: WSEvents.MESSAGE_EVENT.rawValue, result: ["url": url, "message": json])
} else {
delegate?.sendEvent(name: WSEvents.MESSAGE_EVENT.rawValue, result: ["url": url, "message": text])
delegate.sendEvent(name: WSEvents.MESSAGE_EVENT.rawValue, result: ["url": url, "message": text])
}
case .cancelled:
delegate?.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["CLOSED"]!])
delegate?.sendEvent(name: WSEvents.CLOSE_EVENT.rawValue, result: ["url": url])
delegate.sendEvent(name: WSEvents.READY_STATE_EVENT.rawValue, result: ["url": url, "message": READY_STATE["CLOSED"]!])
delegate.sendEvent(name: WSEvents.CLOSE_EVENT.rawValue, result: ["url": url])
client.disconnect(closeCode: 1001)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didReceive now returns early when delegate is nil. Previously, the switch still ran (e.g., .cancelled would still call client.disconnect and .connected would still clear errorCounter) even if no events could be emitted. With the new guard, those non-delegate side effects are skipped, which can leave sockets undisconnected / state uncleared during teardown. Consider only guarding url, capturing let delegate = self.delegate as an optional strong local, and using optional sends (delegate?.sendEvent(...)) while still executing the rest of the switch logic.

Copilot uses AI. Check for mistakes.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

Walkthrough

Made WebSocketManager's webSockets dictionary thread-safe with a private serial DispatchQueue, refactored socket creation/reuse and context invalidation to use synchronized snapshot/replace patterns, and changed WebSocketWrapper to take a local snapshot of its delegate when handling incoming events.

Changes

Cohort / File(s) Summary
WebSocket Manager (core logic)
ios/WebSocket/WebSocketManager.swift
Made webSockets private and synchronized all accesses with a private serial DispatchQueue. Reworked createWebSocket to perform lookup/reuse and delegate reassignment inside a synchronized block, add a second synchronized reconciliation after creating a socket to handle races, replaced disconnectAll() with an atomic snapshot-and-clear pattern in invalidateContext.
WebSocket Wrapper (delegate snapshot)
ios/WebSocket/WebSocketWrapper.swift
In didReceive(event:client:) take a local snapshot let delegate = self.delegate (after validating client.url()) and use that snapshot when dispatching events to avoid repeated property reads. No signature or public API changes.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Manager as WebSocketManager
    participant Socket as WebSocket

    rect rgba(100,150,250,0.5)
    Client->>Manager: createWebSocket(url)
    Manager->>Manager: sync check webSockets[url]
    alt existing socket present
        Manager->>Socket: update delegate
        Manager-->>Client: return existing Socket
    else no socket
        Manager->>Socket: create new WebSocket (outside lock)
        Manager->>Manager: sync reconcile
        alt another inserted concurrently
            Manager->>Socket: clear new socket delegate
            Manager->>Socket: use existing socket (delegate updated)
            Manager-->>Client: return existing Socket
        else
            Manager->>Manager: store new Socket
            Manager-->>Client: return new Socket
        end
    end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: fixing a WebSocket crash in iOS by implementing thread-safe dictionary access and delegate handling.
Description check ✅ Passed The description is directly related to the changeset, detailing the specific fixes for WebSocket crashes including synchronization and delegate lifecycle management.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch MM_67999_websocket_crash

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ios/WebSocket/WebSocketManager.swift`:
- Around line 57-64: In the queue.sync critical section that currently checks
webSockets[url] and rebinds delegates, change the logic to detect an existing
socket and atomically reject/throw the duplicate create instead of mutating the
existing socket or discarding the new one; do not set existing.delegate =
delegate or webSocket.delegate = nil there. Keep reuse behavior only inside
ensureClientFor(...); ensure the create path honors the caller's options by
leaving the new webSocket intact and returning an error when webSockets[url]
already exists.

In `@ios/WebSocket/WebSocketWrapper.swift`:
- Around line 152-179: The handler currently short-circuits all event handling
by making delegate a required guard; change it to keep the existing URL guard
but capture delegate into a strong local (e.g., let strongDelegate =
self.delegate) so internal logic (errorCounter updates,
client.disconnect(closeCode:), etc.) always runs even if delegate is nil, and
replace direct delegate.sendEvent(...) calls with optional calls
(strongDelegate?.sendEvent(...)) so sending events is skipped but cleanup still
executes; update cases .connected (ensure errorCounter.removeValue(forKey: url)
always runs), .cancelled (ensure client.disconnect(closeCode: 1001) always
runs), and .error (ensure errorCounter increment and READY_STATE/CLOSE_EVENT
logic run regardless of delegate presence) and keep references to symbols
WSEvents, READY_STATE, errorCounter, client.disconnect, and URL extraction
intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5c4a5a18-e3d8-4cf8-b11a-3ab2ee867f80

📥 Commits

Reviewing files that changed from the base of the PR and between f599964 and e1488f1.

📒 Files selected for processing (2)
  • ios/WebSocket/WebSocketManager.swift
  • ios/WebSocket/WebSocketWrapper.swift

Comment on lines +93 to 101
let snapshot: [WebSocket] = queue.sync {
let values = Array(webSockets.values)
webSockets.removeAll()
return values
}
for ws in snapshot {
ws.disconnect()
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is throwing me off, why do you need to make a copy?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I asked AI about it (I'm basically a swift noob) he replied that it was to avoid the deadlock:

The copy is needed to avoid a deadlock. ws.disconnect() can trigger Starscream delegate callbacks that call back into the manager (e.g., getWebSocket),
which would try to acquire the same serial queue via queue.sync. If we iterated inside the lock, that re-entrant queue.sync would deadlock. So we clear the
dictionary and grab the values inside the lock, then iterate outside it.

Which sounded fine. But if you think this is not needed, I'll be happy to fix it, but I need an expert eye on there

- Restore synchronized early-return in createWebSocket to skip expensive
  option parsing and keychain work when a socket already exists
- Move dictionary removal in invalidateClient back to after disconnect
  and cleanup, so the entry is only removed on success
- Change didReceive delegate capture from required guard to optional
  strong local so internal side effects (errorCounter cleanup,
  client.disconnect on cancel) still execute when delegate is nil

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ios/WebSocket/WebSocketManager.swift (1)

80-96: ⚠️ Potential issue | 🔴 Critical

Remove the registry entry atomically before teardown.

Line 81 reads the socket and Line 96 removes it later, so a concurrent createWebSocket(for:) can still see the stale entry, rebind its delegate, and then lose the socket when this cleanup finishes. invalidateContext() already uses remove-first semantics; invalidateClient(for:) should do the same removeValue(forKey:) inside one queue.sync, then clean up the returned socket outside the lock.

Suggested fix
 func invalidateClient(for url:URL) -> Void {
-    guard let webSocket = getWebSocket(for: url) else {
+    guard let webSocket = queue.sync(execute: { webSockets.removeValue(forKey: url) }) else {
         return
     }
 
     if let _ = try? Keychain.getClientIdentityAndCertificate(for: url.host!) {
         do {
@@
     }
     webSocket.forceDisconnect()
     webSocket.delegate = nil
-    queue.sync { webSockets.removeValue(forKey: url) }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ios/WebSocket/WebSocketManager.swift` around lines 80 - 96, The
invalidateClient(for:) function currently reads the socket with
getWebSocket(for:) and only removes the entry later, allowing a race with
createWebSocket(for:); change it to perform remove-first semantics by calling
queue.sync { let socket = webSockets.removeValue(forKey: url) } (using the same
queue and webSockets dictionary used elsewhere, e.g. invalidateContext()) and
then perform Keychain deletion, webSocket.forceDisconnect() and
webSocket.delegate = nil on the returned socket outside the lock; this ensures
the registry entry is removed atomically before teardown and prevents a
concurrent createWebSocket(from:) from rebinding a stale socket.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@ios/WebSocket/WebSocketManager.swift`:
- Around line 80-96: The invalidateClient(for:) function currently reads the
socket with getWebSocket(for:) and only removes the entry later, allowing a race
with createWebSocket(for:); change it to perform remove-first semantics by
calling queue.sync { let socket = webSockets.removeValue(forKey: url) } (using
the same queue and webSockets dictionary used elsewhere, e.g.
invalidateContext()) and then perform Keychain deletion,
webSocket.forceDisconnect() and webSocket.delegate = nil on the returned socket
outside the lock; this ensures the registry entry is removed atomically before
teardown and prevents a concurrent createWebSocket(from:) from rebinding a stale
socket.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 93e1f365-00cc-43f2-a17a-5f23002f3a30

📥 Commits

Reviewing files that changed from the base of the PR and between e1488f1 and 6d2e548.

📒 Files selected for processing (2)
  • ios/WebSocket/WebSocketManager.swift
  • ios/WebSocket/WebSocketWrapper.swift
🚧 Files skipped from review as they are similar to previous changes (1)
  • ios/WebSocket/WebSocketWrapper.swift

@Willyfrog Willyfrog requested a review from enahum March 26, 2026 21:19
@Willyfrog
Copy link
Copy Markdown
Author

@enahum I think I addressed all, can you take a look again?

@Willyfrog Willyfrog requested a review from larkox April 7, 2026 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2: Dev Review Requires review by a core committer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants