fix(breakpad): avoid stdio deadlock in macOS exception handler#1656
Open
fix(breakpad): avoid stdio deadlock in macOS exception handler#1656
Conversation
On macOS, breakpad uses an in-process Mach exception server: when the
process crashes, the kernel suspends all threads of the task and delivers
the exception to breakpad's handler thread. Any stdio lock (e.g. the
stderr file lock acquired by `flockfile` inside `vfprintf`/`fwrite`) held
by a thread that was suspended at crash time is never released — so any
`vfprintf`-based logging from the handler deadlocks the process.
A repro sample captured in CI shows the pattern:
sentry-http (worker, suspended):
curl_easy_perform -> Curl_verboseconnect -> Curl_debug -> fwrite
-> __sfvwrite -> __write_nocancel (holding stderr flockfile)
breakpad exception handler (running):
breakpad_backend_callback -> sentry__logger_log
-> sentry__logger_defaultlogger -> vfprintf -> flockfile
-> __psynch_mutexwait (blocked forever)
The trigger in our tests is `sentry_options_set_debug(true)`, which
enables `CURLOPT_VERBOSE` and routes libcurl's verbose output to stderr
via `fwrite`. Under load (CI macos-14 runner, local repro under 8x CPU
stress) the worker is reliably caught mid-`fwrite` when the main thread
crashes.
Two changes address this:
1. Default `enable_logging_when_crashed` to off for the breakpad backend
on macOS, so the callback's `sentry__logger_disable()` fires by
default and no reachable `vfprintf` call is made. Users who accept
the risk can still opt in by calling
`sentry_options_set_logger_enabled_when_crashed(opts, 1)`.
2. Move the signal-safe log macro from `sentry_backend_inproc.c` into
`sentry_logger.h` and convert the five explicit log sites in the
breakpad callback, plus the crash-safe flush helpers in
`sentry_metrics`, `sentry_logs`, and `sentry_batcher`, to use it.
`SENTRY_SIGNAL_SAFE_LOG` writes a static string directly via
`write(STDERR_FILENO, ...)`, bypassing the stdio lock entirely — so
those lines remain visible during crash handling even when the
regular logger is disabled.
A related class of deadlock still exists in breakpad's own
`MinidumpGenerator` (`operator new` → `mfm_alloc`) when another thread
is suspended mid-allocation; that is outside sentry-native's control
and not addressed here.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
On macOS, breakpad's Mach exception handler runs while all other threads are suspended by the kernel. Any
vfprintf-based log call from the handler can deadlock waiting for a stdio file lock held by a frozen thread — this is what is currently hangingmacOS 14 (xcode llvm)CI on #1650 aftertest_cache_max_items_with_retry[inproc].Default
enable_logging_when_crashedto off for the breakpad backend on macOS so the existingsentry__logger_disable()path in the callback fires by default. Users who accept the risk can still opt in viasentry_options_set_logger_enabled_when_crashed(opts, 1).Share the async-signal-safe
SENTRY_SIGNAL_SAFE_LOGmacro viasentry_logger.h(previously private to the inproc backend) and convert the five explicit log sites in the breakpad callback plus the crash-safe flush helpers insentry_metrics,sentry_logs, andsentry_batcher. These sites usewrite(STDERR_FILENO, ...)directly, so they stay visible during crash handling regardless of the logger state.Captured stack from a local repro (8x CPU
yesload, 4-iterationlog cache-keep crashloop against a TCP-RST DSN):sentry_options_set_debug(true)(via thelogexample arg used in most cache tests) enablesCURLOPT_VERBOSE, which is what reliably catches the worker mid-fwriteon stderr when the crash fires.A related deadlock class still exists inside breakpad's own
MinidumpGenerator(operator new->mfm_alloc) when another thread is suspended mid-allocation. That is outside sentry-native's reach and not addressed here.