
Add cpuid checks for XSAVE, OSXSAVE and AVX #140

Merged
jkbonfield merged 3 commits into samtools:master from daviesrob:cpuid_xsave on Sep 17, 2025

@daviesrob (Member)

AVX2 and AVX512F depend on AVX, XSAVE and OSXSAVE being present. If they are not, AVX2/AVX512F instructions may be blocked even though cpuid() reports them to be available. Spotted when the AVX512 rANS codec crashed with illegal instructions on a virtual host that claimed to have AVX512F, but did not have OSXSAVE.

For comparison, see similar detection code in libdeflate and gcc.
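To make the dependency concrete, here is a minimal sketch of the CPUID side of the check. It is not the code in this PR: the helper names are hypothetical, and the functions take raw register values (rather than executing CPUID themselves) so the logic can be tried on any machine. Bit positions follow the CPUID documentation in the Intel SDM vol. 2: leaf 1 ECX has XSAVE at bit 26, OSXSAVE at bit 27 and AVX at bit 28; leaf 7 (sub-leaf 0) EBX has AVX2 at bit 5 and AVX512F at bit 16.

```c
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits. */
#define LEAF1_ECX_XSAVE   (1u << 26)
#define LEAF1_ECX_OSXSAVE (1u << 27)
#define LEAF1_ECX_AVX     (1u << 28)

/* CPUID leaf 7, sub-leaf 0, EBX feature bits. */
#define LEAF7_EBX_AVX2    (1u << 5)
#define LEAF7_EBX_AVX512F (1u << 16)

/* Hypothetical helper: XSAVE, OSXSAVE and AVX must all be reported
 * before the leaf 7 AVX2 / AVX512F bits are worth believing. */
static int cpuid_prereqs_ok(uint32_t leaf1_ecx) {
    const uint32_t need = LEAF1_ECX_XSAVE | LEAF1_ECX_OSXSAVE | LEAF1_ECX_AVX;
    return (leaf1_ecx & need) == need;
}

/* AVX2 claimed by leaf 7 and its prerequisites present in leaf 1. */
static int cpuid_says_avx2_ok(uint32_t leaf1_ecx, uint32_t leaf7_ebx) {
    return cpuid_prereqs_ok(leaf1_ecx) && (leaf7_ebx & LEAF7_EBX_AVX2) != 0;
}

/* AVX512F claimed by leaf 7 and its prerequisites present in leaf 1. */
static int cpuid_says_avx512f_ok(uint32_t leaf1_ecx, uint32_t leaf7_ebx) {
    return cpuid_prereqs_ok(leaf1_ecx) && (leaf7_ebx & LEAF7_EBX_AVX512F) != 0;
}
```

With these helpers, a virtual host like the one described above (AVX512F set in leaf 7, OSXSAVE clear in leaf 1) is reported as not having usable AVX512F. CPUID alone is still not the whole story, since the OS can also disable state components in XCR0, as the rest of the thread discusses.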

@jkbonfield (Collaborator)

Note that this implementation is not the same as the ones you linked, which use XGETBV (emitted as raw machine code, as binutils may not support the mnemonic) and then check XCR0 against 0x06 (AVX2) or 0xE6 (AVX512). The comments in libdeflate are more helpful, but it uses hard-coded numeric values (0x06/0xE6), whereas gcc's implementation has no comments explaining why it drops to raw machine code but does have sensible macros explaining the magic numbers.

Now, whether or not we need to care about these I don't know, but my instinct is that if we're fixing this to support the OSXSAVE flag, we ought to actually query XCR0 rather than treating it as a binary on/off, so we can see which features have been filtered out by the OS.
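For reference, the magic numbers decompose into XCR0 state-component bits (Intel SDM vol. 1, chapter 13). The macro and function names below are illustrative, in the spirit of gcc's named macros rather than copies of them. AVX and AVX2 both need the SSE and AVX (upper YMM) components enabled, giving 0x06; AVX512 additionally needs the opmask, ZMM_Hi256 and Hi16_ZMM components, giving 0xE6.

```c
#include <stdint.h>

/* XCR0 state-component bits. */
#define XCR0_SSE       (1u << 1)  /* XMM registers */
#define XCR0_AVX       (1u << 2)  /* upper 128 bits of YMM registers */
#define XCR0_OPMASK    (1u << 5)  /* AVX-512 opmask registers k0-k7 */
#define XCR0_ZMM_HI256 (1u << 6)  /* upper 256 bits of ZMM0-ZMM15 */
#define XCR0_HI16_ZMM  (1u << 7)  /* ZMM16-ZMM31 */

/* 0x06: state needed for AVX and AVX2. */
#define XCR0_AVX_MASK    (XCR0_SSE | XCR0_AVX)
/* 0xE6: state needed for AVX512. */
#define XCR0_AVX512_MASK (XCR0_AVX_MASK | XCR0_OPMASK | \
                          XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/* Hypothetical helpers: has the OS enabled every component required? */
static int os_enables_avx(uint64_t xcr0) {
    return (xcr0 & XCR0_AVX_MASK) == XCR0_AVX_MASK;
}

static int os_enables_avx512(uint64_t xcr0) {
    return (xcr0 & XCR0_AVX512_MASK) == XCR0_AVX512_MASK;
}
```

Checking for all the bits in each mask, rather than any one of them, is what lets the code tell, for example, an OS that enables AVX but not AVX512 state (XCR0 = 0x07) from one that enables both (0xE7).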

@daviesrob (Member, Author)

Ah, yes, the canonical guide for how to do this appears to be in section 13.2 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual vol. 1. I'll see if I can implement more of it.

Along with checking CPUID, it's necessary to look in the XCR0
register to check that AVX, AVX2 and AVX512 instructions can
be used.  (The operating system can write to this register to
selectively enable or disable these features).  See the Intel 64
and IA-32 Architectures Software Developer’s Manual Vol. 1
sections 13.2 and 13.3 for details.

XCR0 is read using the XGETBV instruction.  While there is an
intrinsic for this, using it requires specific compiler options
that we may not want to use for htscodecs/rANS_static4x16pr.c
compilation.  The intrinsic also didn't work until gcc 9.

As some binutils still in use may not know about XGETBV, the
instruction is written as a byte stream in the inline assembly.
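
A minimal sketch of reading XCR0 this way, assuming GCC or Clang (for `<cpuid.h>` and GNU-style inline assembly); `osxsave_available` and `read_xcr0` are hypothetical names, not the functions in this PR. XGETBV is opcode 0F 01 D0: it takes the register index in ECX and returns the value in EDX:EAX. It raises an invalid-opcode fault if the OS has not set CR4.OSXSAVE, so it should only be executed after CPUID reports OSXSAVE.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* CPUID.1:ECX.OSXSAVE (bit 27): the OS has enabled XGETBV/XSETBV. */
static int osxsave_available(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & (1u << 27)) != 0;
}

/* Read XCR0 with XGETBV, written as raw bytes (0F 01 D0) so that
 * assemblers which do not know the mnemonic can still build it.
 * Only call this after osxsave_available() returns nonzero. */
static uint64_t read_xcr0(void) {
    uint32_t lo, hi;
    __asm__ volatile (".byte 0x0f, 0x01, 0xd0"
                      : "=a" (lo), "=d" (hi)
                      : "c" (0));
    return ((uint64_t) hi << 32) | lo;
}
#else
/* Non-x86 builds: there is no XCR0 to query. */
static int osxsave_available(void) { return 0; }
static uint64_t read_xcr0(void) { return 0; }
#endif
```

Bit 0 (x87 state) of XCR0 is architecturally always set, which makes it a convenient sanity check when trying this out.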

MacOS before 12.2 (a.k.a. Darwin 21.3) had a bug that could cause
random failures on some AVX512 operations due to opmask registers
not being restored correctly following interrupts.  Add a Darwin
version check when testing for AVX512 so it can be kept off where
this might cause a problem.

See https://community.intel.com/t5/Software-Tuning-Performance/MacOS-Darwin-kernel-bug-clobbers-AVX-512-opmask-register-state/m-p/1327259
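
One way such a Darwin version check could look, as an illustrative sketch rather than the merged code: `uname()` reports the kernel release in `utsname.release` (e.g. "21.3.0" on macOS 12.2), and AVX512 is treated as safe only from Darwin 21.3 onward. The helper names, and parsing the release string with `sscanf`, are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: nonzero if a Darwin release string such as
 * "21.3.0" is at least 21.3 (macOS 12.2), where the opmask bug is fixed. */
static int darwin_release_avx512_safe(const char *release) {
    int major = 0, minor = 0;
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return 0;  /* unparseable: be conservative and keep AVX512 off */
    return major > 21 || (major == 21 && minor >= 3);
}

#if defined(__APPLE__)
#include <sys/utsname.h>

/* Query the running kernel's release and apply the version check. */
static int os_avx512_safe(void) {
    struct utsname u;
    if (uname(&u) < 0)
        return 0;
    return darwin_release_avx512_safe(u.release);
}
#else
/* The bug is Darwin-specific; other systems are not affected. */
static int os_avx512_safe(void) { return 1; }
#endif
```

Defaulting to "off" when the release string cannot be parsed trades a possible speed loss for never enabling AVX512 on an affected kernel.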
@daviesrob (Member, Author)

Added XGETBV lookup and a commit to disable AVX512 on versions of MacOS which had a bug that could cause some AVX512 operations to randomly fail.

The XGETBV commit can be tested by passing the boot option clearcpuid=avx or clearcpuid=avx512f to the Linux kernel. Testing the MacOS one will need an Intel Mac.

@jkbonfield jkbonfield merged commit a815cd0 into samtools:master Sep 17, 2025
6 checks passed
@daviesrob daviesrob deleted the cpuid_xsave branch October 1, 2025 11:17
2 participants