Add cpuid checks for XSAVE, OSXSAVE and AVX #140
AVX2 and AVX512F depend on AVX, XSAVE and OSXSAVE being present. If they are not, AVX2/AVX512F instructions may be blocked even though cpuid() reports them to be available. Spotted when the AVX512 rANS codec crashed with illegal instructions on a virtual host that claimed to have AVX512F, but did not have OSXSAVE.
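As a rough illustration of the dependency described above, here is a minimal sketch in C. It assumes the standard CPUID bit positions (leaf 1 ECX bits 26, 27 and 28 for XSAVE, OSXSAVE and AVX; leaf 7 EBX bits 5 and 16 for AVX2 and AVX512F) and the `<cpuid.h>` helpers shipped with gcc and clang. The macro names, the `HAVE_*` flags and both function names are hypothetical and are not taken from this pull request.

```c
#include <stdint.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_XSAVE   (1u << 26)
#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX     (1u << 28)
/* CPUID leaf 7, sub-leaf 0, EBX feature bits. */
#define CPUID7_EBX_AVX2    (1u << 5)
#define CPUID7_EBX_AVX512F (1u << 16)

/* Hypothetical result flags, used only in this sketch. */
enum { HAVE_AVX = 1, HAVE_AVX2 = 2, HAVE_AVX512F = 4 };

/* Decide which SIMD levels CPUID permits. AVX2 and AVX512F are only
 * reported when XSAVE, OSXSAVE and AVX are all present as well. */
int simd_from_cpuid(uint32_t leaf1_ecx, uint32_t leaf7_ebx)
{
    const uint32_t need = CPUID1_ECX_XSAVE | CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    int have = 0;

    if ((leaf1_ecx & need) != need)
        return 0;               /* a prerequisite is missing: trust nothing */

    have |= HAVE_AVX;
    if (leaf7_ebx & CPUID7_EBX_AVX2)    have |= HAVE_AVX2;
    if (leaf7_ebx & CPUID7_EBX_AVX512F) have |= HAVE_AVX512F;
    return have;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Query the running CPU and apply the same gating. */
int simd_from_this_cpu(void)
{
    unsigned int eax, ebx, ecx, edx;
    uint32_t leaf1_ecx, leaf7_ebx = 0;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    leaf1_ecx = ecx;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        leaf7_ebx = ebx;
    return simd_from_cpuid(leaf1_ecx, leaf7_ebx);
}
#endif
```

With this gating, a host that advertises AVX512F but has OSXSAVE clear yields 0, so the AVX512 code path would not be selected.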
Note that this implementation is not the same as the ones you linked, which use XGETBV (emitted as raw machine code, as binutils may not support the mnemonic) and then check XCR0 for 0x06 (AVX2) or 0xE6 (AVX512). The comments in libdeflate are more useful, but it has hard-coded numeric values (6/e6), while gcc's implementation has no comment explaining why it drops to raw machine code but does have sensible macros that explain the magic numbers. Whether we need to care about these I don't know, but my instinct is that if we're fixing this to take OSXSAVE into account, we ought to actually query XCR0 to see which features the OS has filtered out, rather than treating OSXSAVE as a binary on/off flag.
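To make the magic numbers concrete, here is a small sketch of how 0x06 and 0xE6 break down into named XCR0 state-component bits. The bit positions follow the Intel SDM; the macro and function names are hypothetical and are not the ones used by gcc, libdeflate or this pull request.

```c
#include <stdint.h>

/* XCR0 state-component bits (Intel SDM vol. 1, chapter 13). */
#define XCR0_SSE        (1u << 1)   /* XMM registers               */
#define XCR0_AVX        (1u << 2)   /* upper halves of YMM         */
#define XCR0_OPMASK     (1u << 5)   /* AVX-512 k0..k7              */
#define XCR0_ZMM_HI256  (1u << 6)   /* upper halves of ZMM0..ZMM15 */
#define XCR0_HI16_ZMM   (1u << 7)   /* ZMM16..ZMM31                */

/* 0x06: state the OS must manage for AVX and AVX2. */
#define XCR0_AVX_STATE    (XCR0_SSE | XCR0_AVX)
/* 0xE6: additional state the OS must manage for AVX-512. */
#define XCR0_AVX512_STATE (XCR0_AVX_STATE | XCR0_OPMASK | \
                           XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/* Non-zero if the OS has enabled every state component AVX/AVX2 needs. */
int xcr0_allows_avx(uint64_t xcr0)
{
    return (xcr0 & XCR0_AVX_STATE) == XCR0_AVX_STATE;
}

/* Non-zero if the OS has enabled every state component AVX-512 needs. */
int xcr0_allows_avx512(uint64_t xcr0)
{
    return (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}
```

For example, an XCR0 value of 0x07 (x87, SSE and AVX state enabled) passes the AVX test but fails the AVX-512 one.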
Ah, yes, the canonical guide for how to do this appears to be in section 13.2 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual vol. 1. I'll see if I can implement more of it.
Along with checking CPUID, it's necessary to look in the XCR0 register to check that AVX, AVX2 and AVX512 instructions can be used. (The operating system can write to this register to selectively enable or disable these features). See the Intel 64 and IA-32 Architectures Software Developer’s Manual Vol. 1 sections 13.2 and 13.3 for details. XCR0 is read using the XGETBV instruction. While there is an intrinsic for this, using it requires specific compiler options that we may not want to use for htscodecs/rANS_static4x16pr.c compilation. The intrinsic also didn't work until gcc 9. As some binutils still in use may not know about XGETBV, the instruction is written as a byte stream in the inline assembly.
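The byte-stream approach described above might look roughly like the following. This is only a sketch under stated assumptions: gcc or clang extended inline assembly, `<cpuid.h>` being available, and a hypothetical function name `read_xcr0`; the code actually committed may differ. The bytes `0f 01 d0` are the encoding of XGETBV, which reads the register selected by ECX into EDX:EAX.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Read XCR0 into *xcr0. Returns 1 on success, or 0 if CPUID reports that
 * OSXSAVE is clear, in which case XGETBV would fault and is not attempted. */
int read_xcr0(uint64_t *xcr0)
{
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)))            /* CPUID.1:ECX.OSXSAVE */
        return 0;

    /* XGETBV with ECX = 0, spelled as raw bytes so that assemblers
     * which do not know the mnemonic can still build it. */
    __asm__ volatile (".byte 0x0f, 0x01, 0xd0"
                      : "=a" (lo), "=d" (hi)
                      : "c" (0));
    *xcr0 = ((uint64_t) hi << 32) | lo;
    return 1;
}
#else
/* Not an x86 target (or not a GNU-compatible compiler): nothing to read. */
int read_xcr0(uint64_t *xcr0)
{
    (void) xcr0;
    return 0;
}
#endif
```

A caller would then mask the returned value with 0x06 or 0xE6 before enabling the AVX2 or AVX512 code paths.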
MacOS before 12.2 (a.k.a. Darwin 21.3) had a bug that could cause random failures on some AVX512 operations due to opmask registers not being restored correctly following interrupts. Add a Darwin version check when testing for AVX512 so it can be kept off where this might cause a problem. See https://community.intel.com/t5/Software-Tuning-Performance/MacOS-Darwin-kernel-bug-clobbers-AVX-512-opmask-register-state/m-p/1327259
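One possible shape for such a version check is sketched below. It assumes the kernel release string comes from POSIX `uname()` and has the form `major.minor.patch` (for example `21.3.0`); the function names are hypothetical, and the check in this pull request may obtain or parse the version differently.

```c
#include <stdio.h>
#if defined(__APPLE__)
#include <sys/utsname.h>
#endif

/* Returns 1 if a "major.minor..." release string is at least
 * want_major.want_minor, or 0 if it is older or cannot be parsed. */
int release_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    if (sscanf(release, "%d.%d", &major, &minor) < 1)
        return 0;
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* Returns 1 if the OS is not known to be affected by the opmask bug.
 * Only Darwin is checked; other systems are assumed to be fine. */
int os_allows_avx512(void)
{
#if defined(__APPLE__)
    struct utsname u;

    if (uname(&u) != 0)
        return 0;               /* cannot tell: stay on the safe side */
    return release_at_least(u.release, 21, 3);   /* Darwin 21.3 = macOS 12.2 */
#else
    return 1;
#endif
}
```

The parsing is split out from the `uname()` call so that the comparison can be exercised on any platform.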
AVX2 and AVX512F depend on AVX, XSAVE and OSXSAVE being present. If they are not, AVX2/AVX512F instructions may be blocked even though cpuid() reports them to be available. Spotted when the AVX512 rANS codec crashed with illegal instructions on a virtual host that claimed to have AVX512F, but did not have OSXSAVE.

For comparison, see similar detection code in libdeflate and gcc.