Skip to content

fix: gate AVX/AVX-512/AMX dispatch on OS XSAVE support via XGETBV#3699

Open
Mattbusel wants to merge 1 commit intoggml-org:masterfrom
Mattbusel:fix/xgetbv-os-support
Open

fix: gate AVX/AVX-512/AMX dispatch on OS XSAVE support via XGETBV#3699
Mattbusel wants to merge 1 commit intoggml-org:masterfrom
Mattbusel:fix/xgetbv-os-support

Conversation

@Mattbusel
Copy link
Copy Markdown

Problem

ggml_backend_cpu_x86_score() has carried a // FIXME: this does not check for OS support comment since this file was first written.

The function gates SIMD dispatch purely on CPUID feature flags — but CPUID only reports hardware capability. The OS must also have opted in to saving and restoring the relevant extended register state in XCR0. On systems where it hasn't (certain hypervisors, containers, custom kernels), executing a VEX-encoded (AVX/AVX2) or EVEX-encoded (AVX-512) instruction causes an Illegal Instruction fault (#UD / SIGILL) at runtime, even though CPUID says the CPU supports it.

This is the root cause behind a class of bug reports such as:

The same fix was recently applied to the standalone ggml-org/ggml repo (from which this file is vendored).

Fix

Add three helpers to cpuid_x86:

Helper XCR0 bits Gates
os_saves_ymm() [2:1] — SSE + YMM AVX, AVX2, FMA, F16C, AVX-VNNI
os_saves_zmm() [7:5] — opmask + ZMM_Hi256 + Hi16_ZMM All AVX-512 variants
os_saves_amx() [18:17] — XTILECFG + XTILEDATA AMX-INT8

xgetbv() is portable: _xgetbv() intrinsic on MSVC, inline xgetbv asm on GCC/Clang — matching the same pattern already used for cpuid/cpuidex in this struct.

Impact

  • No behavioural change on any correctly configured system (XCR0 is always set when the CPU advertises the feature under a normal OS).
  • Prevents SIGILL under hypervisors or containers that do not set XCR0 to match CPUID.
  • Removes the FIXME that has existed since this code was written.

CPUID feature bits report hardware capability only.  The OS must also
have enabled the relevant extended register state in XCR0 before VEX/
EVEX instructions can execute safely.  Without this check, a process
on a hypervisor or restricted OS that has not set XCR0[2:1]/[7:5]/
[18:17] will receive SIGILL the moment it hits an AVX/AVX-512/AMX
instruction — even though CPUID reports support.

Add three helpers to cpuid_x86:
  os_saves_ymm() — OSXSAVE + XCR0[2:1] (AVX, AVX2, FMA, F16C, AVX-VNNI)
  os_saves_zmm() — chains os_saves_ymm() + XCR0[7:5] (all AVX-512)
  os_saves_amx() — chains os_saves_zmm() + XCR0[18:17] (AMX-INT8)

xgetbv() is portable: _xgetbv() on MSVC, inline asm on GCC/Clang.

Resolves the long-standing FIXME comment in ggml_backend_cpu_x86_score().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant