
feat: iOS/macOS fatal hang detection #52

Draft
bobbyg603 wants to merge 4 commits into main from feat/ios-hang-detection

@bobbyg603
Member

Closes #51.

Summary

Adds opt-in fatal hang detection for iOS, tvOS, and macOS. When the main thread is blocked past a threshold and the app is subsequently terminated without recovering - launch/resume watchdog kills, background task expirations, user force-quit - a hang report is uploaded on the next launch using the same pipeline as crash reports.

Non-fatal hangs are intentionally not reported in v1: if the main thread resumes after detection, the persisted report is deleted. The architecture leaves a clean path to add non-fatal reporting later without breaking backend grouping.

Public API

Single new property on BugSplat:

@property (nonatomic, assign) BOOL enableHangDetection; // default NO

Implementation

  • BugSplatHangTracker (new class): a CFRunLoopObserver on the main run loop plus a dedicated NSThread watchdog that polls at an interval of threshold / 5 (0.4 s at the v1 threshold of 2.0 s). The observer records the start of each main-thread processing window; the watchdog fires the delegate if processing exceeds the threshold.
  • Guards: detection is skipped when a debugger is attached, when the app is not in UIApplicationStateActive, and in app extensions. A wall-clock suspension guard skips a poll when the watchdog overslept by more than the threshold (typically device sleep or process suspension), and a throttle limits reporting to a single report per window.
  • On detection, BugSplat captures a live PLCrashReporter report with a synthetic App Hang (Fatal) exception and reason Main thread unresponsive for {N} ms, persists it alongside existing crash reports, and attaches structured attributes:
    • bugsplat-hang-duration-ms
    • bugsplat-hang-detected-at
    • bugsplat-hang-app-state
    • bugsplat-hang-launch-id (shared with any same-launch crash report for correlation in the dashboard)
  • On recovery, the persisted report is deleted.
  • Existing next-launch pending-crash scanner uploads any leftover hang reports through the standard 3-step S3 flow - no new crashTypeId needed; backend differentiates by the exception name in the report.
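
As a rough illustration of the report metadata described in the list above, the following C sketch formats the reason string and the four bugsplat-hang-* attributes. The struct, the function name, the field sizes, and the epoch-seconds format for the detection time are assumptions made for this example; they are not taken from the PR's implementation.

```c
#include <stdio.h>

/* Hypothetical container for the values attached to a hang report. */
typedef struct {
    char reason[64];       /* "Main thread unresponsive for {N} ms" */
    char duration_ms[24];  /* bugsplat-hang-duration-ms */
    char detected_at[24];  /* bugsplat-hang-detected-at (epoch seconds; format assumed) */
    char app_state[16];    /* bugsplat-hang-app-state */
    char launch_id[40];    /* bugsplat-hang-launch-id */
} hang_attributes;

/* Fills every field; snprintf truncates rather than overflowing a buffer. */
static void hang_attributes_fill(hang_attributes *out,
                                 long duration_ms,
                                 long detected_at_epoch_s,
                                 const char *app_state,
                                 const char *launch_id)
{
    snprintf(out->reason, sizeof out->reason,
             "Main thread unresponsive for %ld ms", duration_ms);
    snprintf(out->duration_ms, sizeof out->duration_ms, "%ld", duration_ms);
    snprintf(out->detected_at, sizeof out->detected_at, "%ld", detected_at_epoch_s);
    snprintf(out->app_state, sizeof out->app_state, "%s", app_state);
    snprintf(out->launch_id, sizeof out->launch_id, "%s", launch_id);
}
```

Generating the launch id once per process and passing the same value to both the hang path and the crash path is what would let the dashboard correlate the two reports.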

Threshold hardcoded at 2.0s for v1. Configurable threshold, pause/resume APIs, and non-fatal reporting are explicit future scope.
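
The watchdog decision can be sketched in plain C as follows. This is a minimal model under stated assumptions, not the BugSplatHangTracker code: the struct, the function names, and the choice to restart the processing window after an oversleep are hypothetical, and times are assumed to be seconds on a monotonic clock. Only the 2.0 s threshold, the threshold / 5 poll interval, and the list of guards come from the description above.

```c
#include <stdbool.h>

#define HANG_THRESHOLD_S     2.0                       /* v1 threshold */
#define HANG_POLL_INTERVAL_S (HANG_THRESHOLD_S / 5.0)  /* 0.4 s */

/* Hypothetical watchdog state. */
typedef struct {
    double window_start;  /* start of main's current processing window; < 0 when idle */
    double last_poll;     /* time of the previous watchdog poll */
    bool   reported;      /* throttle: at most one report per window */
} hang_watchdog;

/* Run loop observer callback: main started processing. */
static void watchdog_window_began(hang_watchdog *w, double now) {
    w->window_start = now;
    w->reported = false;
}

/* Run loop observer callback: main went back to waiting. */
static void watchdog_window_ended(hang_watchdog *w) {
    w->window_start = -1.0;
}

/* One poll from the watchdog thread. Returns true when a hang should be reported. */
static bool watchdog_poll(hang_watchdog *w, double now,
                          bool debugger_attached, bool app_active) {
    double slept = now - w->last_poll;
    w->last_poll = now;

    if (debugger_attached || !app_active) return false;  /* guards */
    if (w->window_start < 0.0) return false;             /* main is idle */

    /* Suspension guard: the poll overslept by more than the threshold. */
    if (slept - HANG_POLL_INTERVAL_S > HANG_THRESHOLD_S) {
        /* Assumption: restart the window so suspended time is not counted. */
        w->window_start = now;
        return false;
    }

    if (w->reported) return false;                        /* throttle */
    if (now - w->window_start < HANG_THRESHOLD_S) return false;

    w->reported = true;
    return true;
}
```

Polling at a fifth of the threshold bounds detection latency to roughly threshold plus one poll interval, while keeping the watchdog thread mostly asleep.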

Tests

17 new tests, all passing on macOS and iOS:

  • BugSplatHangTrackerTests (11): init, start/stop lifecycle, detection, debugger guard, background guard, recovery, throttle.
  • BugSplatHangPersistenceTests (6): verifies .crash + .meta files are written; exception name + duration string appear in the report text; metadata carries database, userSubmitted, and bugsplat-hang-* attributes; recovery deletes the persisted files.
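
The persist-then-discard lifecycle that the persistence tests exercise might look like the sketch below. The helper names and the path layout are hypothetical; the PR only states that a .crash and a .meta file are written on detection and deleted on recovery. The real implementation stores a PLCrashReporter report, which this example replaces with plain text.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: writes "<base>.crash" and "<base>.meta". */
static bool hang_report_persist(const char *base,
                                const char *report_text,
                                const char *meta_text) {
    char path[512];
    FILE *f;

    snprintf(path, sizeof path, "%s.crash", base);
    if ((f = fopen(path, "w")) == NULL) return false;
    fputs(report_text, f);
    fclose(f);

    snprintf(path, sizeof path, "%s.meta", base);
    if ((f = fopen(path, "w")) == NULL) return false;
    fputs(meta_text, f);
    fclose(f);
    return true;
}

/* Called when main recovers: removes both files so nothing uploads on next launch. */
static void hang_report_discard(const char *base) {
    char path[512];
    snprintf(path, sizeof path, "%s.crash", base);
    remove(path);
    snprintf(path, sizeof path, "%s.meta", base);
    remove(path);
}

/* True while the .crash file is still on disk, i.e. the hang was never recovered. */
static bool hang_report_pending(const char *base) {
    char path[512];
    FILE *f;
    snprintf(path, sizeof path, "%s.crash", base);
    if ((f = fopen(path, "r")) == NULL) return false;
    fclose(f);
    return true;
}
```

If the process is killed between persist and discard, the files survive and the next-launch scanner finds them, which is the fatal-hang case.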

Sample apps

All workspace-linked samples now set enableHangDetection = YES and add a Simulate Hang button or command:

  • BugSplatTest-SwiftUI
  • BugSplatTest-UIKit-Swift
  • BugSplatTest-UIKit-ObjC
  • BugSplatTest-macOS-UIKit-ObjC
  • BugSplatTest-macOS-Tool-CPlusPlus (new hang REPL command)

BugSplatTest-SwiftUI-SPM is intentionally not updated - it consumes the published xcframework pinned in Package.swift (v3.1.2), which predates this feature. It should be updated after the next release.

Demo flow

  1. Run a sample.
  2. Tap Simulate Hang - UI freezes for 4 seconds.
  3. Force-quit during the freeze (app switcher swipe-up on iOS, ⌘Q on macOS, Ctrl-C on the CLI).
  4. Relaunch - a fatal-hang report uploads with exception name App Hang (Fatal).

If the 4 seconds pass without force-quitting, the persisted report is discarded (main recovered) - consistent with the v1 fatal-only policy.

Test plan

  • BugSplatMacTests - all tests pass on macOS.
  • BugSplatIOSTests - all tests pass on iOS simulator.
  • BugSplat iOS + macOS framework targets build clean.
  • All workspace-linked sample apps compile against the modified framework.
  • Manually verify on a real device: launch sample, tap Simulate Hang, force-quit, relaunch, confirm report appears in the dashboard.
  • Verify same-launch hang + crash correlation via bugsplat-hang-launch-id attribute in the dashboard.
  • Coordinate backend grouping rules so App Hang (Fatal) reports group cleanly separate from crashes.

Out of scope

  • Non-fatal hang reporting (deliberately deferred; architecture supports adding this without breaking the App Hang (Fatal) grouping).
  • bugsplat-android parity (separate issue).
  • Unreal plugin wiring (tracked in bugsplat-unreal).
  • Configurable threshold and pause/resume APIs.
  • Frame-tracker / CADisplayLink based detection.

🤖 Generated with Claude Code

bobbyg603 and others added 4 commits April 18, 2026 10:33
Adds opt-in fatal hang detection. When the main thread is blocked past a
threshold and the app is subsequently terminated without recovering, a
hang report is uploaded on the next launch using the same pipeline as
crash reports.

- BugSplatHangTracker: CFRunLoopObserver + dedicated watchdog thread,
  with debugger / app-active / app-extension guards, wall-clock
  suspension guard, and a single-report-per-window throttle.
- On detection, captures a live PLCrashReporter report with a synthetic
  "App Hang (Fatal)" exception and persists it (plus metadata +
  bugsplat-hang-* attributes) to the crashes directory.
- On recovery, the persisted report is deleted - non-fatal hangs are not
  reported in v1.
- New BugSplat.enableHangDetection property (default NO).
- Demo button added to all workspace-linked sample apps. SPM sample
  pending next release.
- Tests: 11 unit tests for the tracker, 6 integration tests for
  persistence + recovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The 4-second hang in the samples let the main thread recover before
most testers could force-quit, which caused the persisted report to be
deleted (non-fatal) and gave the impression the demo did not work.

- Hang loop now runs forever; the only way out is force-quit, which is
  the exact scenario the feature exists to catch.
- Added a confirmation dialog (UIAlertController / NSAlert / SwiftUI
  .alert / console prompt) explaining what is about to happen, so the
  freeze is not mistaken for a broken app.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous isApplicationActive check did dispatch_sync onto the main
queue when called from the watchdog thread. That works while main is
healthy but deadlocks the watchdog the moment main is actually hung -
which is precisely the scenario the feature exists to catch. Result: no
hang report ever got persisted for real fatal hangs, only for hangs the
app recovered from.

Track the foreground-active state via UIApplicationDidBecomeActive /
WillResignActive / DidEnterBackground / WillEnterForeground
notifications, update an atomic bool on the main thread, and read that
value lock-free from the watchdog poll. Idempotent observer
installation gated on the iOS/tvOS target.
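
A minimal C11 sketch of the lock-free state tracking described above follows. The handler names are hypothetical stand-ins for the notification callbacks, and treating WillEnterForeground as "not yet active" is an assumption; the commit message only says the four notifications update an atomic bool that the watchdog reads.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Written only on the main thread, read from the watchdog thread.
   Static storage, so it starts out false. */
static atomic_bool g_app_active;

/* Hypothetical handlers, one per notification named in the commit message. */
static void on_did_become_active(void)     { atomic_store(&g_app_active, true);  }
static void on_will_resign_active(void)    { atomic_store(&g_app_active, false); }
static void on_did_enter_background(void)  { atomic_store(&g_app_active, false); }
/* Assumption: entering the foreground does not count as active until
   DidBecomeActive arrives. */
static void on_will_enter_foreground(void) { atomic_store(&g_app_active, false); }

/* Safe to call from the watchdog while main is hung: no dispatch_sync
   and no lock, just an atomic load. */
static bool is_application_active(void) {
    return atomic_load(&g_app_active);
}
```

Because the reader never waits on the main thread, the watchdog can still evaluate the guard during the exact window it is supposed to detect.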

Verified end-to-end on the iOS simulator: hang is detected, .crash +
.meta land on disk, and on relaunch the persisted report uploads
successfully while the UI loads normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The iOS Simulator's app-switcher swipe-up gesture only backgrounds the
app instead of SIGKILL'ing it like on device, so the hang appears to
persist across "relaunches." Update each sample's hang-confirmation
dialog to tell the tester how to actually terminate the hung process:
`xcrun simctl terminate` on iOS simulator, Cmd+Option+Esc or
`killall -9` on macOS. No behavior change - copy only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Add iOS hang detection