Commit 0602b8c
Raghuveer Devulapalli
Add optimal AVX2/SSE reduction and refactor native SIMD code
* Replace icelake-server gcc target with skylake-avx512 in build script
* Remove global mutable state: eliminate initialIndexRegister,
indexIncrement, maskSeventhBit, maskEighthBit globals and their
constructor initializer; move mask constants (maskSeventhBit,
maskEighthBit) to local scope inside lookup_partial_sums
* Add shared reduce_add_128_ps and reduce_add_256_ps helper functions
using proper horizontal-add sequences instead of store-to-array loops
* Remove redundant if (length >= N) guards in all SIMD kernels — the
loop body already handles the zero-iteration case correctly
* Replace store-to-aligned-array horizontal reduction pattern with the
new helpers across all 128- and 256-bit dot product and euclidean
distance functions
* Remove preferred_size parameter from dot_product_f32 and
euclidean_f32; always dispatch to AVX-512 when length >= 16
* Standardize inline annotations: replace __attribute__((always_inline))
inline with JV_FINLINE / JV_INLINE macros throughout
1 parent 18488b8 commit 0602b8c
3 files changed
Lines changed: 212 additions & 224 deletions
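
For readers without the diff at hand, here is a minimal sketch of what the reduction helpers and an unguarded kernel loop described in the commit message could look like. The helper names reduce_add_128_ps and reduce_add_256_ps come from the message; their bodies are an assumption (a shuffle-and-add sequence is used here, and the actual commit may use different instructions). The kernel name dot_product_f32_256 and its signature are hypothetical, and the target("avx") attribute is only there so the sketch compiles without extra flags on GCC or Clang.

```c
#include <immintrin.h>
#include <stddef.h>

/* Sum the four lanes of a 128-bit vector with two shuffle+add steps.
 * Assumed body; only the helper name is taken from the commit message. */
static inline float reduce_add_128_ps(__m128 v) {
    __m128 hi   = _mm_movehl_ps(v, v);               /* lanes: v2, v3, v2, v3 */
    __m128 pair = _mm_add_ps(v, hi);                 /* lane 0: v0+v2, lane 1: v1+v3 */
    __m128 odd  = _mm_shuffle_ps(pair, pair, 0x55);  /* broadcast lane 1 */
    return _mm_cvtss_f32(_mm_add_ss(pair, odd));     /* lane 0 + lane 1 */
}

/* Fold the upper 128-bit half onto the lower half, then reuse the
 * 128-bit helper. Assumed body, as above. */
__attribute__((target("avx")))
static inline float reduce_add_256_ps(__m256 v) {
    __m128 lo = _mm256_castps256_ps128(v);
    __m128 hi = _mm256_extractf128_ps(v, 1);
    return reduce_add_128_ps(_mm_add_ps(lo, hi));
}

/* Hypothetical 256-bit dot product kernel (name and signature assumed). */
__attribute__((target("avx")))
float dot_product_f32_256(const float *a, const float *b, size_t length) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    /* No `if (length >= 8)` guard: for short inputs the loop simply
     * runs zero times and acc stays all-zero. */
    for (; i + 8 <= length; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    /* Register-only reduction instead of storing acc to an aligned
     * array and summing it in a scalar loop. */
    float sum = reduce_add_256_ps(acc);
    /* Scalar tail for the remaining (length % 8) elements. */
    for (; i < length; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The reduction stays in registers, which is the point of replacing the store-to-array pattern. The same loop shape also shows why an outer length check is redundant: a zero-iteration vector loop followed by the scalar tail already gives the right result for short inputs.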