Add libbacktrace fallback for stack traces on musl/Alpine#3581
Add libbacktrace fallback for stack traces on musl/Alpine#3581hanxizh9910 wants to merge 22 commits into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #3581 +/- ##
============================================
- Coverage 76.94% 76.69% -0.26%
============================================
Files 162 162
Lines 80656 80673 +17
============================================
- Hits 62058 61869 -189
- Misses 18598 18804 +206
🚀 New features to boost your workflow:
|
836b39d to
71ae67a
Compare
There was a problem hiding this comment.
This looks like a good change! Thanks for the fix 😊 I just had a couple ideas to simplify/document a little more.
I'm wondering why the stack trace in your testing was so short. Normally I'd expect it to be somewhat longer. Maybe if you built with CFLAGS="-g" there would be more info. I don't think this is too important though.
| bt_frame_state = backtrace_create_state(NULL, 0, bt_simple_error_cb, NULL); | ||
| } | ||
|
|
||
| static int valkey_backtrace(void **trace, int max_size) { |
There was a problem hiding this comment.
It might be worth adding a comment here explaining that we preallocate bt_frame_state because allocating is unsafe while handling an async signal (such as crash handling). This probably isn't immediately obvious and might help out the next person to read this code
There was a problem hiding this comment.
Totally agree! I just added the comment.
| #ifdef HAVE_BACKTRACE | ||
| #define BACKTRACE_MAX_SIZE 100 | ||
|
|
||
| #if defined(USE_LIBBACKTRACE) && !defined(HAVE_EXECINFO) |
There was a problem hiding this comment.
When USE_LIBBACKTRACE is enabled, we should use backtrace_simple() from libbacktrace for frame collection unconditionally — not just when execinfo.h is missing. Both call _Unwind_Backtrace() under the hood, so there's no benefit to keeping the glibc path as a special case. This simplifies the guard here to just #ifdef USE_LIBBACKTRACE
5e41066 to
c75b577
Compare
Hi, the build already includes -g by default(https://github.com/valkey-io/valkey/blob/unstable/src/Makefile#L142). The short trace is likely due to -03 optimization(https://github.com/valkey-io/valkey/blob/unstable/src/Makefile#L23) which inlines many functions |
rainsupreme
left a comment
There was a problem hiding this comment.
LGTM! Thanks for this improvement!
| /* Preallocate at startup: backtrace_create_state() calls malloc, which is not | ||
| * async-signal-safe and could deadlock if called from a crash signal handler. */ | ||
| void initLibbacktraceFrameState(void) { | ||
| bt_frame_state = backtrace_create_state(NULL, 0, bt_simple_error_cb, NULL); |
There was a problem hiding this comment.
Here, should we make this threadsafe? meaning backtrace_create_state(NULL, 1, bt_simple_error_cb, NULL);
I just want to be sure that we would not mess up the stacktrace when geting the stack threads when io-threads > 1
backtrace_create_state has an option to enable threaded=1
e8e2574 to
4d12fc5
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds libbacktrace-backed crash stack collection, separates execinfo.h detection into HAVE_EXECINFO, forces HAVE_BACKTRACE when USE_LIBBACKTRACE is enabled, updates symbolization fallbacks, and initializes libbacktrace state at server startup. ChangesLibbacktrace integration with separated backtrace detection
Sequence DiagramsequenceDiagram
participant Server as Server (logStackTrace)
participant ValkeyBacktrace as valkey_backtrace
participant Libbacktrace as libbacktrace (backtrace_simple)
Server->>ValkeyBacktrace: invoke backtrace(trace, size)
ValkeyBacktrace->>Libbacktrace: backtrace_simple(bt_frame_state, callback)
Libbacktrace-->>ValkeyBacktrace: addresses via callback
ValkeyBacktrace-->>Server: return frame addresses
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/server.c`:
- Around line 2915-2917: Move the libbacktrace pre-initialization so it runs
before signal handlers are registered: call initLibbacktraceFrameState() (under
the USE_LIBBACKTRACE ifdef) prior to invoking setupSignalHandlers(), ensuring
the libbacktrace state is initialized before any signal handler can run; update
the order in the initialization sequence accordingly and keep the
USE_LIBBACKTRACE guard around initLibbacktraceFrameState().
In `@tests/support/util.tcl`:
- Around line 1226-1233: The current catch block mistakenly errors out when it
finds "USE_LIBBACKTRACE" (leftover debug) so the function proceeds to the musl
check and returns 0; change the logic so that when grep finds "USE_LIBBACKTRACE"
(i.e. buildinfo is non-empty) the function returns 1 to indicate backtrace
support instead of calling error. Specifically, update the catch block that
inspects buildinfo (from exec grep -a "USE_LIBBACKTRACE" $::VALKEY_SERVER_BIN)
to avoid the intentional error and instead set/return a success value (1) when
buildinfo is non-empty so the presence of USE_LIBBACKTRACE correctly signals
supported backtraces.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e662b773-4c5f-4289-883f-3356a863ab06
📒 Files selected for processing (5)
src/config.hsrc/debug.csrc/server.csrc/server.htests/support/util.tcl
There was a problem hiding this comment.
♻️ Duplicate comments (1)
tests/support/util.tcl (1)
1228-1230:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
system_backtrace_supportedshould not throw on a positive libbacktrace detectionAt Line 1229, this turns a supported case into a hard error, so the proc no longer reliably returns a boolean capability result.
🐛 Proposed fix
- if {[catch {exec grep -a "initLibbacktraceFrameState" $::VALKEY_SERVER_BIN} buildinfo] == 0} { - error "PROOF: Found initLibbacktraceFrameState symbol - this code path IS being executed on [exec uname -s]" - } + if {[catch {exec grep -a "initLibbacktraceFrameState" $::VALKEY_SERVER_BIN} buildinfo] == 0} { + return 1 + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/support/util.tcl` around lines 1228 - 1230, The check in system_backtrace_supported currently calls error when initLibbacktraceFrameState is found, which turns a supported case into an exception; change that branch so it does not throw but instead returns a truthy boolean (e.g., return 1 or set the proc's result to true) when grep finds the symbol, and ensure the proc still returns a falsy boolean when the symbol is not found—update the block referencing initLibbacktraceFrameState in tests/support/util.tcl to return success rather than calling error so system_backtrace_supported consistently returns a boolean capability result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@tests/support/util.tcl`:
- Around line 1228-1230: The check in system_backtrace_supported currently calls
error when initLibbacktraceFrameState is found, which turns a supported case
into an exception; change that branch so it does not throw but instead returns a
truthy boolean (e.g., return 1 or set the proc's result to true) when grep finds
the symbol, and ensure the proc still returns a falsy boolean when the symbol is
not found—update the block referencing initLibbacktraceFrameState in
tests/support/util.tcl to return success rather than calling error so
system_backtrace_supported consistently returns a boolean capability result.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e4fccd43-fc20-4805-9006-c102d0604848
📒 Files selected for processing (1)
tests/support/util.tcl
5a3ffdd to
8a0a097
Compare
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
…cktrace returns 1 instead of 0 Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
…stered Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
…e is being executed Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
… resolved Module crash tests expect assertCrash and modulesCollectInfo to appear in stack traces, but Alpine/musl doesn't resolve symbols. Make these checks conditional on symbol availability, same as debugCommand check. Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
0f1ab09 to
23cfe1e
Compare
Problem
On Alpine Linux (musl-based), execinfo.h is not available, so Valkey produces no stack trace on crash. This makes debugging crashes on Alpine very difficult.
Solution
Add libbacktrace as a fallback for stack frame collection on systems without execinfo.h.
src/config.h: Split HAVE_BACKTRACE (generic capability) from HAVE_EXECINFO (execinfo-specific). When USE_LIBBACKTRACE is defined and HAVE_EXECINFO is not, HAVE_BACKTRACE is still set so the crash handler compiles in on Alpine.src/debug.c:src/server.c/src/server.h: Initialize backtrace_state once at startup via initLibbacktraceFrameState() to avoid calling malloc from the signal handler, which could deadlock if a crash occurs while the allocator lock is held.Notes
Testing
Built on Alpine 3.23 with USE_LIBBACKTRACE=yes and triggered a crash via DEBUG SEGFAULT. Stack traces are produced correctly for all threads.
Summary by CodeRabbit
New Features
Bug Fixes
Tests