Fix SIGILL on pre-AVX x86_64: function-level target attributes + runtime CPU detection #307
Open
stumpylog wants to merge 3 commits into
build.rs was missing -I../../vendor and -I../../, so sqlite3ext.h could not be found when cc compiled sqlite-vec.c.
…ection
- Add __attribute__((target("avx,avx2"))) to l2_sqr_float_avx and
__attribute__((target("avx2"))) to distance_hamming_avx2, confining
AVX2 codegen to those two functions only.
- Add __builtin_cpu_supports("avx2") runtime guards (cached in a
static local) at both dispatch sites so pre-AVX2 CPUs take the
scalar path instead of SIGILL.
- Drop -mavx -mavx2 from the Makefile entirely: those flags were
applied file-wide, allowing the compiler to emit AVX2 instructions
in unrelated functions (vec_eachOpen, vec0Open, etc.).
- Simplify the Linux SIMD block: the /proc/cpuinfo grep is no longer
needed now that the runtime check handles AVX2 availability, so the
define is enabled unconditionally for Linux x86_64 (matching the
Darwin x86_64 behaviour).
Fixes asg017#302.
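The pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in sqlite-vec.c: the names l2_sqr_avx_sketch, l2_sqr_scalar_sketch, cpu_has_avx2_sketch, l2_sqr_sketch, l2_sqr_self_check and the SKETCH_X86_AVX macro are hypothetical, and the kernel body is an assumed straightforward AVX loop. It assumes GCC or Clang on x86_64; on other targets only the scalar path is compiled.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_X86_AVX 1 /* hypothetical macro, not SQLITE_VEC_ENABLE_AVX */
#endif

/* Portable fallback: safe on any CPU. */
static float l2_sqr_scalar_sketch(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

#ifdef SKETCH_X86_AVX
/* The target attribute enables AVX/AVX2 codegen for this function only,
   so no -mavx/-mavx2 flag is needed for the rest of the file. */
__attribute__((target("avx,avx2")))
static float l2_sqr_avx_sketch(const float *a, const float *b, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 d = _mm256_sub_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i));
        acc = _mm256_add_ps(acc, _mm256_mul_ps(d, d));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; k++) sum += lanes[k];
    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Runtime check, cached in a static local so CPUID is queried once.
   Concurrent first calls would all store the same value. */
static int cpu_has_avx2_sketch(void) {
    static int cached = -1;
    if (cached < 0) cached = __builtin_cpu_supports("avx2") ? 1 : 0;
    return cached;
}
#endif

/* Dispatch site: take the AVX kernel only when the running CPU has AVX2. */
float l2_sqr_sketch(const float *a, const float *b, size_t n) {
#ifdef SKETCH_X86_AVX
    if (cpu_has_avx2_sketch()) return l2_sqr_avx_sketch(a, b, n);
#endif
    return l2_sqr_scalar_sketch(a, b, n);
}

/* Small-integer inputs keep every partial sum exact in float, so both
   paths must agree bit for bit regardless of summation order. */
void l2_sqr_self_check(void) {
    float a[19], b[19];
    for (int i = 0; i < 19; i++) { a[i] = (float)i; b[i] = 0.0f; }
    assert(l2_sqr_sketch(a, b, 19) == l2_sqr_scalar_sketch(a, b, 19));
}
```

Because the guard sits at the dispatch site, a binary built on an AVX2 machine still runs on an older CPU: the AVX instructions exist in the binary but are never executed there.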
Measures two things:
- distance/l2_float_d1536: one vec_distance_l2() call per iteration, a direct proxy for the AVX2 l2_sqr_float_avx kernel with no KNN overhead.
- knn/n5000_d1536: end-to-end KNN query over 5 000 x 1536-dim vectors, setup paid once outside b.iter, three page sizes.
Designed to capture before/after numbers for SIMD dispatch changes. n=5000 keeps peak memory well under 100 MB (two DBs alive at once in the original my_benchmark caused OOM at n=1M).
Closes #302
What
- Add __attribute__((target(...))) to l2_sqr_float_avx and distance_hamming_avx2, confining AVX/AVX2 codegen to those functions only
- Add __builtin_cpu_supports() runtime guards at both dispatch sites so a binary built on AVX2 hardware degrades safely on pre-AVX CPUs
- Drop -mavx -mavx2 from the Makefile (keep the SQLITE_VEC_ENABLE_AVX define)
Also
Fixed a pre-existing build error in benchmarks/micro/ (missing vendor/root include paths) and added a simd_dispatch Criterion benchmark. 478 Python tests pass.