Fix SIGILL on pre-AVX x86_64: function-level target attributes + runtime CPU detection #307

Open
stumpylog wants to merge 3 commits into asg017:main from stumpylog:fix/302-simd-dispatch

@stumpylog

Closes #302

What

  • Add __attribute__((target(...))) to l2_sqr_float_avx and distance_hamming_avx2, confining AVX/AVX2 codegen to those functions only
  • Add __builtin_cpu_supports() runtime guards at both dispatch sites so a binary built on AVX2 hardware falls back to the scalar path on CPUs without AVX2 instead of raising SIGILL
  • Drop -mavx -mavx2 from the Makefile (keep SQLITE_VEC_ENABLE_AVX define)
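
As a rough illustration of the pattern in the list above, the following C sketch pairs a function-level target attribute with a runtime check at the dispatch site. It assumes GCC or Clang on x86_64. The names l2_sqr_float_scalar, l2_sqr_float_avx_sketch, l2_sqr_float_dispatch and the HAVE_X86_DISPATCH macro are hypothetical, and the kernel body is a simplified stand-in, not the code in sqlite-vec.c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC or Clang on x86_64; other targets use only the scalar path. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Portable fallback: squared L2 distance, built with the baseline ISA. */
static float l2_sqr_float_scalar(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

#ifdef HAVE_X86_DISPATCH
/* Only this function is compiled with AVX/AVX2 enabled; no -mavx flags needed. */
__attribute__((target("avx,avx2")))
static float l2_sqr_float_avx_sketch(const float *a, const float *b, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 d = _mm256_sub_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i));
        acc = _mm256_add_ps(acc, _mm256_mul_ps(d, d));
    }
    /* Horizontal sum of the eight lanes, then the scalar tail. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; k++) sum += lanes[k];
    for (; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
#endif

/* Dispatch site: take the AVX2 kernel only if the running CPU reports AVX2. */
float l2_sqr_float_dispatch(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx2")) {
        return l2_sqr_float_avx_sketch(a, b, n);
    }
#endif
    return l2_sqr_float_scalar(a, b, n);
}
```

Because the attribute is scoped to one function, the rest of the translation unit is compiled for the baseline x86_64 ISA, so the compiler has no licence to emit AVX instructions outside the guarded kernel.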

Also

Fixed a pre-existing build error in benchmarks/micro/ (build.rs was missing the vendor and repository-root include paths) and added a simd_dispatch Criterion benchmark. All 478 Python tests pass.

build.rs was missing -I../../vendor and -I../../, so sqlite3ext.h
could not be found when cc compiled sqlite-vec.c.
…ection

- Add __attribute__((target("avx,avx2"))) to l2_sqr_float_avx and
  __attribute__((target("avx2"))) to distance_hamming_avx2, confining
  AVX2 codegen to those two functions only.
- Add __builtin_cpu_supports("avx2") runtime guards (cached in a
  static local) at both dispatch sites so pre-AVX2 CPUs take the
  scalar path instead of SIGILL.
- Drop -mavx -mavx2 from the Makefile entirely: those flags were
  applied file-wide, allowing the compiler to emit AVX2 instructions
  in unrelated functions (vec_eachOpen, vec0Open, etc.).
- Simplify the Linux SIMD block: the /proc/cpuinfo grep is no longer
  needed now that the runtime check handles AVX2 availability, so the
  define is enabled unconditionally for Linux x86_64 (matching the
  Darwin x86_64 behaviour).

Fixes asg017#302.
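
The "cached in a static local" guard from the commit message above could look roughly like the sketch below, shown here for a Hamming-distance kernel compiled with target("avx2") only. It assumes GCC or Clang on x86_64. The names hamming_scalar, hamming_avx2_sketch, hamming_dispatch and SKETCH_X86 are hypothetical, and the nibble-lookup popcount is a common AVX2 technique chosen for illustration, not necessarily what distance_hamming_avx2 does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang on x86_64; other targets use only the scalar path. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_X86 1
#endif

/* Scalar fallback: count differing bits one byte at a time. */
static uint64_t hamming_scalar(const uint8_t *a, const uint8_t *b, size_t n) {
    uint64_t bits = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t x = (uint8_t)(a[i] ^ b[i]);
        while (x) {              /* clear the lowest set bit each round */
            x &= (uint8_t)(x - 1);
            bits++;
        }
    }
    return bits;
}

#ifdef SKETCH_X86
__attribute__((target("avx2")))
static uint64_t hamming_avx2_sketch(const uint8_t *a, const uint8_t *b, size_t n) {
    /* Popcount of each 4-bit value, repeated for both 128-bit lanes. */
    const __m256i lut = _mm256_setr_epi8(
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
    const __m256i low4 = _mm256_set1_epi8(0x0f);
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        __m256i x = _mm256_xor_si256(va, vb);
        __m256i lo = _mm256_and_si256(x, low4);
        __m256i hi = _mm256_and_si256(_mm256_srli_epi16(x, 4), low4);
        __m256i cnt = _mm256_add_epi8(_mm256_shuffle_epi8(lut, lo),
                                      _mm256_shuffle_epi8(lut, hi));
        /* Fold the 32 per-byte counts into four 64-bit partial sums. */
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(cnt, _mm256_setzero_si256()));
    }
    uint64_t lanes[4];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    /* Remaining bytes (fewer than 32) go through the scalar loop. */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3]
         + hamming_scalar(a + i, b + i, n - i);
}
#endif

uint64_t hamming_dispatch(const uint8_t *a, const uint8_t *b, size_t n) {
    assert(a != NULL && b != NULL);
#ifdef SKETCH_X86
    /* -1 = not probed yet; the CPU is queried once and the answer reused. */
    static int has_avx2 = -1;
    if (has_avx2 < 0) {
        has_avx2 = __builtin_cpu_supports("avx2") ? 1 : 0;
    }
    if (has_avx2) {
        return hamming_avx2_sketch(a, b, n);
    }
#endif
    return hamming_scalar(a, b, n);
}
```

The unsynchronised static is a simplification: concurrent first calls would each store the same value, but strictly conforming code would use an atomic or a one-time initialiser.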
Measures two things:
- distance/l2_float_d1536: one vec_distance_l2() call per iteration,
  direct proxy for the AVX2 l2_sqr_float_avx kernel with no KNN overhead.
- knn/n5000_d1536: end-to-end KNN query over 5,000 × 1536-dim vectors,
  with setup paid once outside b.iter, run at three page sizes.

Designed to capture before/after numbers for SIMD dispatch changes.
n=5000 keeps peak memory well under 100 MB (two DBs alive at once in
the original my_benchmark caused OOM at n=1M).
Development

Successfully merging this pull request may close these issues.

Linux/macOS builds bake AVX/AVX2 based on the build machine's CPU, so 0.1.10-alpha wheels SIGILL on older x86_64