Commit 84b29a1
perf: all Base17 ops multi-versioned SIMD — AVX-512/AVX2/scalar via LazyLock
4 functions converted to multi-versioned kernels:
  l1_weighted:    I32x16 mul(abs_diff, weights) + reduce_sum
  sign_agreement: I32x16 xor + cmpge_mask + count_ones
  xor_bind:       I32x16 xor + cvtepi32_epi16 pack-back
  inject_noise:   I32x16 add(dims, prng_noise) + clamp
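For readers without the diff at hand, the four kernels can be read as the following scalar reference. This is only a sketch of the semantics the list above implies, not the code from the commit. The 17 x i16 layout, the `DIMS` and `Base17` names, the i32 weight and noise types, and the clamp to the i16 range are all assumptions made for illustration.

```rust
/// Hypothetical layout: 17 signed 16-bit dimensions (an assumption).
pub const DIMS: usize = 17;
pub type Base17 = [i16; DIMS];

/// Weighted L1 distance: sum over i of |a[i] - b[i]| * w[i].
/// Accumulates in i64, which is wider than the i32 lanes named in the commit.
pub fn l1_weighted(a: &Base17, b: &Base17, w: &[i32; DIMS]) -> i64 {
    a.iter()
        .zip(b)
        .zip(w)
        .map(|((&x, &y), &wt)| (i64::from(x) - i64::from(y)).abs() * i64::from(wt))
        .sum()
}

/// Counts dimensions whose sign bits match: x ^ y is non-negative
/// exactly when x and y have the same sign bit.
pub fn sign_agreement(a: &Base17, b: &Base17) -> u32 {
    a.iter().zip(b).filter(|&(&x, &y)| (x ^ y) >= 0).count() as u32
}

/// Lane-wise XOR of two vectors.
pub fn xor_bind(a: &Base17, b: &Base17) -> Base17 {
    std::array::from_fn(|i| a[i] ^ b[i])
}

/// Adds caller-supplied noise in i32, then clamps back into the i16 range.
pub fn inject_noise(dims: &Base17, noise: &[i32; DIMS]) -> Base17 {
    std::array::from_fn(|i| {
        i32::from(dims[i])
            .saturating_add(noise[i])
            .clamp(i32::from(i16::MIN), i32::from(i16::MAX)) as i16
    })
}
```

A scalar version like this is also a convenient oracle for testing each SIMD tier against.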
Pattern: #[target_feature(enable = "avx512f")] per-function,
LazyLock runtime detection, one binary serves all ISAs.
No global target-cpu in .cargo/config.toml.
CI (AVX2) and Production (AVX-512) use the same binary.
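The dispatch pattern described above might look roughly like the sketch below. It is not the commit's code: the function names, the `L1Fn` alias, and the 17 x i16 layout are hypothetical, and the AVX2 tier relies on compiler auto-vectorization of a shared body instead of hand-written intrinsics. Only the AVX2 and scalar tiers are shown; an AVX-512 tier would presumably follow the same shape with `#[target_feature(enable = "avx512f")]`, which needs a toolchain that accepts that attribute.

```rust
use std::sync::LazyLock;

const DIMS: usize = 17; // assumed dimension count
type Base17 = [i16; DIMS];
type L1Fn = fn(&Base17, &Base17, &[i32; DIMS]) -> i64;

/// Shared body, inlined into each tier so it is compiled once per ISA.
#[inline(always)]
fn l1_body(a: &Base17, b: &Base17, w: &[i32; DIMS]) -> i64 {
    let mut acc = 0i64;
    for i in 0..DIMS {
        acc += (i64::from(a[i]) - i64::from(b[i])).abs() * i64::from(w[i]);
    }
    acc
}

/// Baseline tier: no extra CPU features required.
fn l1_weighted_scalar(a: &Base17, b: &Base17, w: &[i32; DIMS]) -> i64 {
    l1_body(a, b, w)
}

/// AVX2 tier: same body, compiled with AVX2 enabled for this function only.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn l1_weighted_avx2_impl(a: &Base17, b: &Base17, w: &[i32; DIMS]) -> i64 {
    l1_body(a, b, w)
}

/// Safe wrapper so the AVX2 tier fits the plain `L1Fn` pointer type.
#[cfg(target_arch = "x86_64")]
fn l1_weighted_avx2(a: &Base17, b: &Base17, w: &[i32; DIMS]) -> i64 {
    // SAFETY: this wrapper is only selected after the runtime check below
    // has confirmed that the CPU supports AVX2.
    unsafe { l1_weighted_avx2_impl(a, b, w) }
}

/// Resolved once, on first call, by probing the running CPU.
static L1_WEIGHTED: LazyLock<L1Fn> = LazyLock::new(|| {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            return l1_weighted_avx2 as L1Fn;
        }
    }
    l1_weighted_scalar as L1Fn
});

/// Public entry point: one indirect call through the cached pointer.
pub fn l1_weighted(a: &Base17, b: &Base17, w: &[i32; DIMS]) -> i64 {
    (*L1_WEIGHTED)(a, b, w)
}
```

Because the feature is enabled per function and the choice is made at runtime, the crate can be built for the baseline target and still pick the widest tier the host supports.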
629M lookups/sec, 19K tokens/sec, 19 tests passing.
https://claude.ai/code/session_01M3at4EuHVvQ8S95mSnKgtK1
parent 9d570bc
commit 84b29a1
1 file changed
Lines changed: 372 additions & 82 deletions