Skip to content

Add libbacktrace fallback for stack traces on musl/Alpine#3581

Open
hanxizh9910 wants to merge 22 commits into
valkey-io:unstablefrom
hanxizh9910:libbacktrace-alpine-support
Open

Add libbacktrace fallback for stack traces on musl/Alpine#3581
hanxizh9910 wants to merge 22 commits into
valkey-io:unstablefrom
hanxizh9910:libbacktrace-alpine-support

Conversation

@hanxizh9910
Copy link
Copy Markdown
Contributor

@hanxizh9910 hanxizh9910 commented Apr 28, 2026

Problem

On Alpine Linux (musl-based), execinfo.h is not available, so Valkey produces no stack trace on crash. This makes debugging crashes on Alpine very difficult.

Solution

Add libbacktrace as a fallback for stack frame collection on systems without execinfo.h.

  • src/config.h: Split HAVE_BACKTRACE (generic capability) from HAVE_EXECINFO (execinfo-specific). When USE_LIBBACKTRACE is defined and HAVE_EXECINFO is not, HAVE_BACKTRACE is still set so the crash handler compiles in on Alpine.
  • src/debug.c:
    • Add valkey_backtrace() using backtrace_simple() as a drop-in replacement for backtrace() on musl. Macro-aliases backtrace() to valkey_backtrace() so no other code needs to change.
    • Fix fallback messages in symbolizeWithLibbacktrace to correctly distinguish between "execinfo fallback available" and "no fallback available".
  • src/server.c / src/server.h: Initialize backtrace_state once at startup via initLibbacktraceFrameState() to avoid calling malloc from the signal handler, which could deadlock if a crash occurs while the allocator lock is held.

Notes

Testing

Built on Alpine 3.23 with USE_LIBBACKTRACE=yes and triggered a crash via DEBUG SEGFAULT. Stack traces are produced correctly for all threads.

  • After the implementation:
Part of alpine implementation screenshot

Summary by CodeRabbit

  • New Features

    • Optional libbacktrace support with startup initialization to improve crash symbolization when available.
  • Bug Fixes

    • Separate detection for execinfo-based backtraces and improved logic so libbacktrace is enabled when configured.
    • Safer, conditional symbolization and clearer fallbacks when native backtrace APIs are unavailable.
  • Tests

    • Runtime test updated to detect libbacktrace-enabled builds and adjust behavior.

Review Change Stack

@hanxizh9910 hanxizh9910 marked this pull request as draft April 28, 2026 21:12
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 77.77778% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.69%. Comparing base (a813df0) to head (23cfe1e).

Files with missing lines Patch % Lines
src/debug.c 76.47% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3581      +/-   ##
============================================
- Coverage     76.94%   76.69%   -0.26%     
============================================
  Files           162      162              
  Lines         80656    80673      +17     
============================================
- Hits          62058    61869     -189     
- Misses        18598    18804     +206     
Files with missing lines Coverage Δ
src/server.c 89.48% <100.00%> (-0.03%) ⬇️
src/server.h 100.00% <ø> (ø)
src/debug.c 55.11% <76.47%> (+0.27%) ⬆️

... and 24 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@hanxizh9910 hanxizh9910 force-pushed the libbacktrace-alpine-support branch 2 times, most recently from 836b39d to 71ae67a Compare April 30, 2026 00:36
@hanxizh9910 hanxizh9910 marked this pull request as ready for review April 30, 2026 01:35
@rainsupreme rainsupreme self-requested a review April 30, 2026 19:43
Copy link
Copy Markdown
Contributor

@rainsupreme rainsupreme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good change! Thanks for the fix 😊 I just had a couple ideas to simplify/document a little more.

I'm wondering why the stack trace in your testing was so short. Normally I'd expect it to be somewhat longer. Maybe if you built with CFLAGS="-g" there would be more info. I don't think this is too important though.

Comment thread src/debug.c
bt_frame_state = backtrace_create_state(NULL, 0, bt_simple_error_cb, NULL);
}

static int valkey_backtrace(void **trace, int max_size) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding a comment here explaining that we preallocate bt_frame_state because allocating is unsafe while handling an async signal (such as crash handling). This probably isn't immediately obvious and might help out the next person to read this code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree! I just added the comment.

Comment thread src/debug.c Outdated
#ifdef HAVE_BACKTRACE
#define BACKTRACE_MAX_SIZE 100

#if defined(USE_LIBBACKTRACE) && !defined(HAVE_EXECINFO)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When USE_LIBBACKTRACE is enabled, we should use backtrace_simple() from libbacktrace for frame collection unconditionally — not just when execinfo.h is missing. Both call _Unwind_Backtrace() under the hood, so there's no benefit to keeping the glibc path as a special case. This simplifies the guard here to just #ifdef USE_LIBBACKTRACE

@hanxizh9910 hanxizh9910 force-pushed the libbacktrace-alpine-support branch from 5e41066 to c75b577 Compare May 6, 2026 20:26
@hanxizh9910
Copy link
Copy Markdown
Contributor Author

This looks like a good change! Thanks for the fix 😊 I just had a couple ideas to simplify/document a little more.

I'm wondering why the stack trace in your testing was so short. Normally I'd expect it to be somewhat longer. Maybe if you built with CFLAGS="-g" there would be more info. I don't think this is too important though.

Hi, the build already includes -g by default(https://github.com/valkey-io/valkey/blob/unstable/src/Makefile#L142). The short trace is likely due to -03 optimization(https://github.com/valkey-io/valkey/blob/unstable/src/Makefile#L23) which inlines many functions

Copy link
Copy Markdown
Contributor

@rainsupreme rainsupreme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for this improvement!

Copy link
Copy Markdown
Member

@roshkhatri roshkhatri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good fix and I have also reviewed the container PR, that is well. just have one question about grabbing the io-threads backtrace.

Also check if we have a test to tests this change in daily.yml

Comment thread src/debug.c Outdated
/* Preallocate at startup: backtrace_create_state() calls malloc, which is not
* async-signal-safe and could deadlock if called from a crash signal handler. */
void initLibbacktraceFrameState(void) {
bt_frame_state = backtrace_create_state(NULL, 0, bt_simple_error_cb, NULL);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, should we make this threadsafe? meaning backtrace_create_state(NULL, 1, bt_simple_error_cb, NULL);

I just want to be sure that we would not mess up the stacktrace when geting the stack threads when io-threads > 1

backtrace_create_state has an option to enable threaded=1

@hanxizh9910 hanxizh9910 added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label May 11, 2026
@hanxizh9910 hanxizh9910 force-pushed the libbacktrace-alpine-support branch 2 times, most recently from e8e2574 to 4d12fc5 Compare May 12, 2026 06:23
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds libbacktrace-backed crash stack collection, separates execinfo.h detection into HAVE_EXECINFO, forces HAVE_BACKTRACE when USE_LIBBACKTRACE is enabled, updates symbolization fallbacks, and initializes libbacktrace state at server startup.

Changes

Libbacktrace integration with separated backtrace detection

Layer / File(s) Summary
Backtrace feature detection
src/config.h
Defines HAVE_EXECINFO for execinfo.h availability and ensures HAVE_BACKTRACE is defined when USE_LIBBACKTRACE is set even if platform detection did not.
Libbacktrace stack collection
src/debug.c
Conditionally include <execinfo.h> when HAVE_EXECINFO is set; add preallocated bt_frame_state, initLibbacktraceFrameState(), valkey_backtrace() using backtrace_simple, and #define backtrace(trace,size) valkey_backtrace(trace,size) when USE_LIBBACKTRACE.
Error handling and symbolization
src/debug.c
On libbacktrace symbolization failures or fork errors, use backtrace_symbols_fd only if HAVE_EXECINFO is available; otherwise emit explicit fallback messages. Use symbolizeWithLibbacktrace() for eip symbolization when USE_LIBBACKTRACE is enabled.
Public API and server initialization
src/server.h, src/server.c
Add guarded prototype initLibbacktraceFrameState() and call it from initServer() when USE_LIBBACKTRACE is enabled to initialize libbacktrace frame state at startup.
Test backtrace detection
tests/support/util.tcl
Add runtime check that greps the server binary for initLibbacktraceFrameState inside system_backtrace_supported to exercise the libbacktrace detection path in tests.

Sequence Diagram

sequenceDiagram
  participant Server as Server (logStackTrace)
  participant ValkeyBacktrace as valkey_backtrace
  participant Libbacktrace as libbacktrace (backtrace_simple)
  Server->>ValkeyBacktrace: invoke backtrace(trace, size)
  ValkeyBacktrace->>Libbacktrace: backtrace_simple(bt_frame_state, callback)
  Libbacktrace-->>ValkeyBacktrace: addresses via callback
  ValkeyBacktrace-->>Server: return frame addresses
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I padded through configs, nose to the code,
Found frames and symbols down a new road,
Libbacktrace readied, state snug and tight,
When crashes arrive, the stack hops to light,
A rabbit's cheer for traces found tonight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main objective of the pull request: adding libbacktrace as a fallback mechanism to enable stack traces on musl-based systems like Alpine Linux.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/server.c`:
- Around line 2915-2917: Move the libbacktrace pre-initialization so it runs
before signal handlers are registered: call initLibbacktraceFrameState() (under
the USE_LIBBACKTRACE ifdef) prior to invoking setupSignalHandlers(), ensuring
the libbacktrace state is initialized before any signal handler can run; update
the order in the initialization sequence accordingly and keep the
USE_LIBBACKTRACE guard around initLibbacktraceFrameState().

In `@tests/support/util.tcl`:
- Around line 1226-1233: The current catch block mistakenly errors out when it
finds "USE_LIBBACKTRACE" (leftover debug) so the function proceeds to the musl
check and returns 0; change the logic so that when grep finds "USE_LIBBACKTRACE"
(i.e. buildinfo is non-empty) the function returns 1 to indicate backtrace
support instead of calling error. Specifically, update the catch block that
inspects buildinfo (from exec grep -a "USE_LIBBACKTRACE" $::VALKEY_SERVER_BIN)
to avoid the intentional error and instead set/return a success value (1) when
buildinfo is non-empty so the presence of USE_LIBBACKTRACE correctly signals
supported backtraces.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e662b773-4c5f-4289-883f-3356a863ab06

📥 Commits

Reviewing files that changed from the base of the PR and between fdd9039 and 843029c.

📒 Files selected for processing (5)
  • src/config.h
  • src/debug.c
  • src/server.c
  • src/server.h
  • tests/support/util.tcl

Comment thread src/server.c
Comment thread tests/support/util.tcl
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
tests/support/util.tcl (1)

1228-1230: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

system_backtrace_supported should not throw on a positive libbacktrace detection

At Line 1229, this turns a supported case into a hard error, so the proc no longer reliably returns a boolean capability result.

🐛 Proposed fix
-    if {[catch {exec grep -a "initLibbacktraceFrameState" $::VALKEY_SERVER_BIN} buildinfo] == 0} {
-        error "PROOF: Found initLibbacktraceFrameState symbol - this code path IS being executed on [exec uname -s]"
-    }
+    if {[catch {exec grep -a "initLibbacktraceFrameState" $::VALKEY_SERVER_BIN} buildinfo] == 0} {
+        return 1
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/support/util.tcl` around lines 1228 - 1230, The check in
system_backtrace_supported currently calls error when initLibbacktraceFrameState
is found, which turns a supported case into an exception; change that branch so
it does not throw but instead returns a truthy boolean (e.g., return 1 or set
the proc's result to true) when grep finds the symbol, and ensure the proc still
returns a falsy boolean when the symbol is not found—update the block
referencing initLibbacktraceFrameState in tests/support/util.tcl to return
success rather than calling error so system_backtrace_supported consistently
returns a boolean capability result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@tests/support/util.tcl`:
- Around line 1228-1230: The check in system_backtrace_supported currently calls
error when initLibbacktraceFrameState is found, which turns a supported case
into an exception; change that branch so it does not throw but instead returns a
truthy boolean (e.g., return 1 or set the proc's result to true) when grep finds
the symbol, and ensure the proc still returns a falsy boolean when the symbol is
not found—update the block referencing initLibbacktraceFrameState in
tests/support/util.tcl to return success rather than calling error so
system_backtrace_supported consistently returns a boolean capability result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e4fccd43-fc20-4805-9006-c102d0604848

📥 Commits

Reviewing files that changed from the base of the PR and between c750932 and 867eb3d.

📒 Files selected for processing (1)
  • tests/support/util.tcl

@hanxizh9910 hanxizh9910 force-pushed the libbacktrace-alpine-support branch 2 times, most recently from 5a3ffdd to 8a0a097 Compare May 12, 2026 20:06
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
…cktrace returns 1 instead of 0

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
…stered

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
…e is being executed

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
… resolved

Module crash tests expect assertCrash and modulesCollectInfo to appear
in stack traces, but Alpine/musl doesn't resolve symbols. Make these
checks conditional on symbol availability, same as debugCommand check.

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
@hanxizh9910 hanxizh9910 force-pushed the libbacktrace-alpine-support branch from 0f1ab09 to 23cfe1e Compare May 13, 2026 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants