Infinite startup hang on AMD CPU with defective RDSEED (boost::log get_random_seed busy-loop) #5351

Description

@Azwata

Is there an existing issue for this?

  • I have searched the existing issues

Is your issue described in the documentation?

  • I have read the documentation

Is your issue present in the latest beta/pre-release?

None

Describe the Bug

Sunshine hangs indefinitely on startup in a 100% CPU busy-loop, before writing even the first log line. Root cause confirmed: RDSEED is hardware-defective on this CPU (it returns 0xFFFFFFFFFFFFFFFF while falsely signaling success via CF=1; this is the same failure class as the documented AMD-SB-7055, but on Zen 2 rather than Zen 5). boost::log's get_random_seed() calls libstdc++'s internal __x86_rdseed_rdrand(), which never detects the failure and loops forever trying to get a "valid" seed that never comes.

System

  • OS: Nobara Linux 43 (Fedora 43 based), KDE Plasma 6.6.4, Wayland
  • CPU: AMD Ryzen 9 3900X (Zen 2), microcode 0x08701034 (latest available for this chip; no newer revision exists from AMD/linux-firmware as of this report)
  • BIOS: 2019-era, not updated (ruled out as a factor)

Root cause confirmation

Standalone test, compiled with -O0 (no optimization):

#include <immintrin.h> // declares _rdseed64_step; build with -mrdseed

unsigned long long val;
int ok = _rdseed64_step(&val); // ok is 1 (success) every time
// val is ALWAYS 0xFFFFFFFFFFFFFFFF, never a real random number

This reproduces 100% of the time, independent of package format (Copr RPM, GitHub RPM, AppImage all affected identically), independent of config state (fresh config dir, no config dir — same hang), independent of setcap cap_sys_admin, independent of CPU affinity/cgroup.
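For anyone who wants to check their own hardware, the standalone test above can be extended into a small sampler. This is only a sketch: `collect_rdseed` and `samples_look_stuck` are hypothetical helper names, it assumes GCC or Clang on x86-64, and the caller is assumed to have confirmed RDSEED support first (executing the instruction on a CPU without it raises an illegal-instruction fault).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>

/* Hypothetical helper: issue RDSEED n times and store every value the CPU
 * claims is valid. Returns how many samples were stored.
 * The target attribute enables the intrinsic without building with -mrdseed.
 * The caller must already know the CPU supports RDSEED. */
__attribute__((target("rdseed")))
size_t collect_rdseed(uint64_t *out, size_t n)
{
    size_t got = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long long v = 0;
        if (_rdseed64_step(&v)) {   /* nonzero return means CF=1 (claimed success) */
            out[got++] = (uint64_t)v;
        }
    }
    return got;
}
#endif

/* Hypothetical helper: returns 1 when all n samples are identical, which is
 * what the defect described in this report looks like (every value equal to
 * 0xFFFFFFFFFFFFFFFF), and 0 otherwise. */
int samples_look_stuck(const uint64_t *samples, size_t n)
{
    assert(samples != NULL && n >= 2);
    for (size_t i = 1; i < n; i++) {
        if (samples[i] != samples[0]) {
            return 0;
        }
    }
    return 1;
}
```

On a healthy CPU, a few dozen samples passed to `samples_look_stuck` should essentially never be identical; on the affected 3900X, the report implies every sample would be all-ones.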

What does NOT fix it

  • clearcpuid=rdseed kernel boot param: masks CPUID discovery, but the raw RDSEED opcode is still directly invoked by libstdc++/boost without runtime CPUID re-checking, so the broken instruction still executes
  • Directly clearing bit 18 of MSR 0xc0011002 (the same MSR/bit the upstream Linux kernel patch for AMD Cyan Skillfish clears) on all cores — CPUID confirmed updated (rdseed flag gone from /proc/cpuinfo), but raw instruction still returns the broken value
  • GLIBC_TUNABLES=glibc.cpu.hwcaps=-RDRAND,-RDSEED — no effect
  • Updating linux-firmware/amd-ucode-firmware to latest available (20260410) — microcode revision unchanged (0x08701034 before and after), so no fix exists yet for this silicon from AMD
  • Cold boot (full shutdown, not just reboot) — defect persists, not a transient/suspend-related state
  • LD_PRELOAD override of std::random_device::_M_getval — works correctly for normal C++ code using std::random_device directly, but boost::log's get_random_seed() doesn't go through this path; it calls the internal anonymous-namespace __x86_rdseed_rdrand directly, which isn't an exported/interceptable symbol
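Several of the attempts above depend on whether user-space code consults CPUID before issuing RDSEED. As a rough illustration of what such a runtime check looks like, the sketch below reads CPUID leaf 7 (subleaf 0) and tests EBX bit 18, which is the architectural RDSEED feature flag. The function names are hypothetical, and the sketch assumes GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid_count`.

```c
#include <assert.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define HAVE_CPUID_HELPER 1
#endif

/* RDSEED is reported in CPUID.(EAX=7,ECX=0):EBX bit 18. */
#define RDSEED_EBX_BIT 18u
static_assert(RDSEED_EBX_BIT < 32u, "bit index must fit in a 32-bit register");

/* Hypothetical helper: decode the RDSEED flag from a leaf-7 EBX value. */
int ebx_has_rdseed(uint32_t leaf7_ebx)
{
    return (int)((leaf7_ebx >> RDSEED_EBX_BIT) & 1u);
}

/* Hypothetical helper: returns 1 if the CPUID instruction, as seen by this
 * process, advertises RDSEED; returns 0 otherwise or on non-x86 builds. */
int cpu_advertises_rdseed(void)
{
#ifdef HAVE_CPUID_HELPER
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return 0;   /* leaf 7 not supported */
    }
    return ebx_has_rdseed(ebx);
#else
    return 0;
#endif
}
```

A library that gated its RDSEED path on a check like `cpu_advertises_rdseed()` at run time would presumably respect the MSR change described above; code that only checks at build time, or not at all, would not.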

Suggested fix

boost::log (or Sunshine's own startup code, if a workaround needs to live there) should validate that the RDSEED/RDRAND output isn't a known-bad sentinel value (all-1s) before trusting CF=1, similar to the mitigation the Linux kernel and Xen apply for known-defective AMD silicon (see "x86/AMD: deal with RDSEED issues" patches, and AMD-SB-7055). Alternatively, falling back to /dev/urandom unconditionally for log-record ID generation (rather than a CPU instruction) would sidestep this whole class of hardware defect, since log-record IDs have no real cryptographic need for a hardware RNG.
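A minimal sketch of the suggested mitigation follows. It is not boost::log's or Sunshine's actual code: `get_seed`, `seed_source_fn`, the retry limit, and the two stub sources are hypothetical names chosen for illustration. The hardware source is passed in as a function pointer so the logic can be exercised without a defective CPU; the key points are that the all-ones sentinel is rejected even when the source claims success, that retries are bounded, and that /dev/urandom is the fallback.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BAD_SENTINEL    UINT64_MAX  /* the all-1s value described in this report */
#define MAX_HW_ATTEMPTS 8           /* assumed bound; any small number works */

/* A seed source writes a value to *out and returns 1 on claimed success. */
typedef int (*seed_source_fn)(uint64_t *out);

/* Fallback source: read 8 bytes from /dev/urandom. */
int urandom_source(uint64_t *out)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL) {
        return 0;
    }
    size_t n = fread(out, sizeof *out, 1, f);
    fclose(f);
    return n == 1;
}

/* Stub that mimics the defective CPU: always "succeeds" with all-ones. */
int stuck_source(uint64_t *out)
{
    *out = BAD_SENTINEL;
    return 1;
}

/* Stub with a fixed, recognizable value, for demonstration only. */
int fixed_source(uint64_t *out)
{
    *out = 0x1234;
    return 1;
}

/* Try the hardware source a bounded number of times, rejecting the sentinel
 * even when success is signaled; otherwise use the fallback unconditionally.
 * Returns 1 if *out was filled, 0 if the fallback also failed. */
int get_seed(seed_source_fn hw, seed_source_fn fallback, uint64_t *out)
{
    assert(fallback != NULL && out != NULL);
    if (hw != NULL) {
        for (int attempt = 0; attempt < MAX_HW_ATTEMPTS; attempt++) {
            uint64_t v = 0;
            if (hw(&v) && v != BAD_SENTINEL) {
                *out = v;
                return 1;
            }
        }
    }
    return fallback(out);
}
```

In real use the call would look like `get_seed(rdseed_wrapper, urandom_source, &seed)`, where `rdseed_wrapper` is whatever function issues the instruction. Passing `NULL` as the hardware source gives the "always use /dev/urandom" variant. A genuine all-ones sample is rejected too, but at a probability of 2^-64 per draw that costs nothing in practice.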

Reproducibility

100% reproducible on this hardware across ~3 hours of testing, multiple Sunshine versions/packaging formats, multiple

Expected Behavior

Sunshine should start normally, write its startup logs, and open the configuration UI on port 47990, as it does on systems without the RDSEED hardware defect.

Additional Context

Will continue troubleshooting and update this issue if a workaround is found. Happy to provide more diagnostic info (gdb backtraces, strace output, etc.) if useful for triaging this.

Host Operating System

Linux

Operating System Version

Nobara Linux 43 (Fedora 43 based), KDE Plasma 6.6.4, Wayland

Architecture

amd64/x86_64

Package

Linux - Fedora Copr

GPU Type

NVIDIA

GPU Model

NVIDIA GeForce RTX 3070 Ti

GPU Driver/Mesa Version

595.58.03

Capture Method

KMS (Linux)

Apps

Log output

N/A - This is precisely the nature of the bug: Sunshine hangs before writing any log output at all. ~/.config/sunshine/sunshine.log remains at 0 bytes. The web UI never becomes available, so the troubleshooting page logs cannot be retrieved either. See the gdb backtrace and strace output included in the bug description above as the diagnostic evidence in place of application logs.

Online logs

No response
