fix: gate AVX/AVX-512/AMX dispatch on OS XSAVE support via XGETBV#3699
Open
Mattbusel wants to merge 1 commit intoggml-org:masterfrom
Open
fix: gate AVX/AVX-512/AMX dispatch on OS XSAVE support via XGETBV#3699Mattbusel wants to merge 1 commit intoggml-org:masterfrom
Mattbusel wants to merge 1 commit intoggml-org:masterfrom
Conversation
CPUID feature bits report hardware capability only. The OS must also have enabled the relevant extended register state in XCR0 before VEX/ EVEX instructions can execute safely. Without this check, a process on a hypervisor or restricted OS that has not set XCR0[2:1]/[7:5]/ [18:17] will receive SIGILL the moment it hits an AVX/AVX-512/AMX instruction — even though CPUID reports support. Add three helpers to cpuid_x86: os_saves_ymm() — OSXSAVE + XCR0[2:1] (AVX, AVX2, FMA, F16C, AVX-VNNI) os_saves_zmm() — chains os_saves_ymm() + XCR0[7:5] (all AVX-512) os_saves_amx() — chains os_saves_zmm() + XCR0[18:17] (AMX-INT8) xgetbv() is portable: _xgetbv() on MSVC, inline asm on GCC/Clang. Resolves the long-standing FIXME comment in ggml_backend_cpu_x86_score().
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
ggml_backend_cpu_x86_score()has carried a// FIXME: this does not check for OS supportcomment since this file was first written.The function gates SIMD dispatch purely on CPUID feature flags — but CPUID only reports hardware capability. The OS must also have opted in to saving and restoring the relevant extended register state in XCR0. On systems where it hasn't (certain hypervisors, containers, custom kernels), executing a VEX-encoded (AVX/AVX2) or EVEX-encoded (AVX-512) instruction causes an Illegal Instruction fault (#UD / SIGILL) at runtime, even though CPUID says the CPU supports it.
This is the root cause behind a class of bug reports such as:
The same fix was recently applied to the standalone ggml-org/ggml repo (from which this file is vendored).
Fix
Add three helpers to
cpuid_x86:os_saves_ymm()[2:1]— SSE + YMMos_saves_zmm()[7:5]— opmask + ZMM_Hi256 + Hi16_ZMMos_saves_amx()[18:17]— XTILECFG + XTILEDATAxgetbv()is portable:_xgetbv()intrinsic on MSVC, inlinexgetbvasm on GCC/Clang — matching the same pattern already used forcpuid/cpuidexin this struct.Impact