Skip to content

fix(breakpad): avoid stdio deadlock in macOS exception handler#1656

Open
jpnurmi wants to merge 2 commits intomasterfrom
jpnurmi/fix/signal-safe-log
Open

fix(breakpad): avoid stdio deadlock in macOS exception handler#1656
jpnurmi wants to merge 2 commits intomasterfrom
jpnurmi/fix/signal-safe-log

Conversation

@jpnurmi
Copy link
Copy Markdown
Collaborator

@jpnurmi jpnurmi commented Apr 16, 2026

On macOS, breakpad's Mach exception handler runs while all other threads are suspended by the kernel. Any vfprintf-based log call from the handler can deadlock waiting for a stdio file lock held by a frozen thread — this is what is currently hanging macOS 14 (xcode llvm) CI on #1650 after test_cache_max_items_with_retry[inproc].

Default enable_logging_when_crashed to off for the breakpad backend on macOS so the existing sentry__logger_disable() path in the callback fires by default. Users who accept the risk can still opt in via sentry_options_set_logger_enabled_when_crashed(opts, 1).

Share the async-signal-safe SENTRY_SIGNAL_SAFE_LOG macro via sentry_logger.h (previously private to the inproc backend) and convert the five explicit log sites in the breakpad callback plus the crash-safe flush helpers in sentry_metrics, sentry_logs, and sentry_batcher. These sites use write(STDERR_FILENO, ...) directly, so they stay visible during crash handling regardless of the logger state.

Captured stack from a local repro (8x CPU yes load, 4-iteration log cache-keep crash loop against a TCP-RST DSN):

Thread sentry-http (suspended):
    curl_easy_perform -> Curl_verboseconnect -> Curl_debug -> fwrite
        -> __sfvwrite -> __write_nocancel        (holds stderr flockfile)

Thread breakpad handler (blocked):
    breakpad_backend_callback -> sentry__logger_log
        -> sentry__logger_defaultlogger -> vfprintf -> flockfile
            -> __psynch_mutexwait                 (waits forever)

sentry_options_set_debug(true) (via the log example arg used in most cache tests) enables CURLOPT_VERBOSE, which is what reliably catches the worker mid-fwrite on stderr when the crash fires.

A related deadlock class still exists inside breakpad's own MinidumpGenerator (operator new -> mfm_alloc) when another thread is suspended mid-allocation. That is outside sentry-native's reach and not addressed here.

jpnurmi and others added 2 commits April 16, 2026 18:01
On macOS, breakpad uses an in-process Mach exception server: when the
process crashes, the kernel suspends all threads of the task and delivers
the exception to breakpad's handler thread. Any stdio lock (e.g. the
stderr file lock acquired by `flockfile` inside `vfprintf`/`fwrite`) held
by a thread that was suspended at crash time is never released — so any
`vfprintf`-based logging from the handler deadlocks the process.

A repro sample captured in CI shows the pattern:

  sentry-http (worker, suspended):
    curl_easy_perform -> Curl_verboseconnect -> Curl_debug -> fwrite
      -> __sfvwrite -> __write_nocancel    (holding stderr flockfile)
  breakpad exception handler (running):
    breakpad_backend_callback -> sentry__logger_log
      -> sentry__logger_defaultlogger -> vfprintf -> flockfile
        -> __psynch_mutexwait            (blocked forever)

The trigger in our tests is `sentry_options_set_debug(true)`, which
enables `CURLOPT_VERBOSE` and routes libcurl's verbose output to stderr
via `fwrite`. Under load (CI macos-14 runner, local repro under 8x CPU
stress) the worker is reliably caught mid-`fwrite` when the main thread
crashes.

Two changes address this:

1. Default `enable_logging_when_crashed` to off for the breakpad backend
   on macOS, so the callback's `sentry__logger_disable()` fires by
   default and no reachable `vfprintf` call is made. Users who accept
   the risk can still opt in by calling
   `sentry_options_set_logger_enabled_when_crashed(opts, 1)`.

2. Move the signal-safe log macro from `sentry_backend_inproc.c` into
   `sentry_logger.h` and convert the five explicit log sites in the
   breakpad callback, plus the crash-safe flush helpers in
   `sentry_metrics`, `sentry_logs`, and `sentry_batcher`, to use it.
   `SENTRY_SIGNAL_SAFE_LOG` writes a static string directly via
   `write(STDERR_FILENO, ...)`, bypassing the stdio lock entirely — so
   those lines remain visible during crash handling even when the
   regular logger is disabled.

A related class of deadlock still exists in breakpad's own
`MinidumpGenerator` (`operator new` → `mfm_alloc`) when another thread
is suspended mid-allocation; that is outside sentry-native's control
and not addressed here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant