fix: prevent deadlocks when instrumenting with clippy-tracing#5852
fix: prevent deadlocks when instrumenting with clippy-tracing#5852snigenigmatic wants to merge 5 commits intofirecracker-microvm:mainfrom
Conversation
f80a97c to
1f061b5
Compare
There was a problem hiding this comment.
Pull request overview
This PR mitigates deadlock scenarios introduced when clippy-tracing instruments functions that may log while the global LOGGER mutex is held, and avoids instrumenting extern ABI functions (e.g., signal handlers / FFI callbacks) to prevent async-signal-unsafe logging paths.
Changes:
- Narrow
Logger::update()’s critical section by opening the log target file before taking theLOGGERmutex. - Update
clippy-tracing’s check/fix visitors to skip functions with an explicit ABI (extern ...). - Add integration tests validating
extern "C"functions are skipped in both check and fix modes (including insideimplblocks).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/vmm/src/logger/logging.rs |
Reorders file opening to avoid logging re-entrancy while holding the logger mutex. |
src/clippy-tracing/src/main.rs |
Adds an ABI guard to prevent instrumentation of extern functions. |
src/clippy-tracing/tests/integration_tests.rs |
Adds integration tests covering the new “skip extern” behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // re-entering the locked Logger. | ||
| let file = config | ||
| .log_path | ||
| .map(|p| open_file_nonblock(&p)) |
There was a problem hiding this comment.
config.log_path.map(|p| ...) moves log_path out of config, but config is used again below (level/show_level/show_log_origin/module). This will not compile due to a partial move. Consider borrowing the path (e.g., config.log_path.as_ref() / as_deref()) or destructuring config into locals before opening the file.
| .map(|p| open_file_nonblock(&p)) | |
| .as_ref() | |
| .map(|p| open_file_nonblock(p)) |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5852 +/- ##
==========================================
+ Coverage 82.79% 82.83% +0.03%
==========================================
Files 276 276
Lines 29764 29770 +6
==========================================
+ Hits 24643 24659 +16
+ Misses 5121 5111 -10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Two deadlock sources exist when clippy-tracing instruments the codebase: 1. Logger::update() called open_file_nonblock() while holding the LOGGER mutex. When instrumented, that function calls log::trace!() which re-enters Logger::log() and attempts to acquire the same non-reentrant mutex, causing a deadlock on startup. 2. Signal handlers (extern C fns) were instrumented by clippy-tracing, adding mutex acquisitions in async-signal-unsafe contexts. When a signal interrupted a thread already holding the LOGGER mutex, the handler would deadlock trying to acquire it again. Fix (1) by moving the open_file_nonblock() call before the mutex is acquired in Logger::update(). Fix (2) by teaching clippy-tracing to skip functions with an explicit ABI (e.g. extern C), since these are typically signal handlers or FFI callbacks that must not acquire locks. Closes firecracker-microvm#5807 Signed-off-by: snigenigmatic <kaustubhc.1105@gmail.com>
Two deadlock sources exist when clippy-tracing instruments the codebase: 1. Logger::update() called open_file_nonblock() while holding the LOGGER mutex. When instrumented, that function calls log::trace!() which re-enters Logger::log() and attempts to acquire the same non-reentrant mutex, causing a deadlock on startup. 2. Signal handlers (extern C fns) were instrumented by clippy-tracing, adding mutex acquisitions in async-signal-unsafe contexts. When a signal interrupted a thread already holding the LOGGER mutex, the handler would deadlock trying to acquire it again. Fix (1) by moving the open_file_nonblock() call before the mutex is acquired in Logger::update(). Fix (2) by teaching clippy-tracing to skip functions with an explicit ABI (e.g. extern C), since these are typically signal handlers or FFI callbacks that must not acquire locks. Closes firecracker-microvm#5807 Signed-off-by: snigenigmatic <kaustubhc.1105@gmail.com>
4a10d4b to
7a25fc6
Compare
The refactored update() path that opens the file before acquiring the lock was not exercised by any unit test. Add a test that calls update() with a valid log_path and asserts the target is set on the logger. Signed-off-by: snigenigmatic <kaustubhc.1105@gmail.com>
Changes
src/vmm/src/logger/logging.rs: InLogger::update(), moved theopen_file_nonblock()call to before the LOGGER mutex is acquired. Previously, the file was opened while holding the lock, so whenopen_file_nonblockwas instrumented by clippy-tracing, the resultinglog::trace!()call would re-enterLogger::log()and attempt to acquire the same non-reentrantstd::sync::Mutex, deadlocking the process on startup.src/clippy-tracing/src/main.rs: Added ani.sig.abi.is_none()guard to both thecreate_check_visitor_function!andcreate_fix_visitor_function!macros. This prevents clippy-tracing from instrumentingextern "C"functions (signal handlers, FFI callbacks), which would introduce mutex acquisitions in async-signal-unsafe contexts. When a signal interrupted a thread already holding the LOGGER mutex (e.g. duringwarn!()in the vCPUpaused()state), the instrumented signal handler would deadlock trying to acquire it again.src/clippy-tracing/tests/integration_tests.rs: Added three integration tests (fix_skips_extern_c_fn,check_skips_extern_c_fn,fix_skips_extern_c_fn_in_impl) verifying thatextern "C"functions are correctly skipped during both fix and check actions, including insideimplblocks....
Reason
When running
clippy-tracing --action fixwith default settings, at least two deadlock scenarios arise (#5807):main_exec()callsLOGGER.update(config)which acquires the LOGGER mutex, then callsopen_file_nonblock(). When instrumented, that function callslog::trace!()which tries to re-acquire the same mutex on the same thread -- deadlock.warn!()(acquiring the LOGGER mutex) after receiving aResumeevent withimmediate_exit == 1. The RT signal sent bysend_event()can arrive mid-warn!(), and if thehandle_signalfunction was instrumented, it tries to acquire the LOGGER mutex from the signal handler context -- deadlock.These fixes eliminate both deadlock sources: (1) by narrowing the critical section in
Logger::update(), and (2) by ensuring signal handlers are never instrumented....
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.