Commit be35795
fix(simd): aarch64 F32x16/F64x8 use real NEON paired loads, not scalar
Burn parity item 9: F32x16/F64x8 on aarch64 previously dispatched to
the scalar fallback in simd::scalar (element-wise [f32; 16] loops).
Add a real NEON-backed implementation in simd_neon::aarch64_simd,
modeled on the AVX2 polyfill's dual-tuple shape:
  F32x16 = [float32x4_t; 4]  (4x vld1q_f32 / vst1q_f32 / vfmaq_f32 / vaddq_f32 etc. per op)
  F64x8  = [float64x2_t; 4]  (4x vld1q_f64 / vst1q_f64 / vfmaq_f64)
Hot-path arithmetic (add, sub, mul, div, mul_add, splat, abs, neg,
sqrt, round, floor, simd_min/max, reduce_sum) compiles to one NEON
instruction per 128-bit lane pair. Comparisons and bit-cast helpers
round-trip through to_array, same shape as simd_avx2.
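
As an illustration of the shape described above, here is a minimal sketch of a 16-lane f32 type backed by four float32x4_t registers. It is not the code from simd_neon.rs: the method names (from_array, to_array, add, mul_add, simd_lt) and their bodies are assumptions chosen for the example, and the non-aarch64 fallback exists only so the sketch compiles on any host. The NEON intrinsics used (vld1q_f32, vst1q_f32, vaddq_f32, vfmaq_f32) are the stable ones from core::arch::aarch64.

```rust
// Sketch only: mirrors the [float32x4_t; 4] shape from the commit message.
// Method names and bodies are assumptions, not the code in simd_neon.rs.

#[cfg(target_arch = "aarch64")]
mod imp {
    use core::arch::aarch64::{float32x4_t, vaddq_f32, vfmaq_f32, vld1q_f32, vst1q_f32};

    /// Sixteen f32 lanes held in four 128-bit NEON registers.
    #[derive(Clone, Copy)]
    pub struct F32x16(pub [float32x4_t; 4]);

    impl F32x16 {
        pub fn from_array(a: [f32; 16]) -> Self {
            let p = a.as_ptr();
            // SAFETY: NEON is baseline on aarch64; each load reads 4 in-bounds f32s.
            unsafe {
                F32x16([vld1q_f32(p), vld1q_f32(p.add(4)), vld1q_f32(p.add(8)), vld1q_f32(p.add(12))])
            }
        }

        pub fn to_array(self) -> [f32; 16] {
            let mut out = [0.0f32; 16];
            let p = out.as_mut_ptr();
            // SAFETY: each store writes 4 in-bounds f32s.
            unsafe {
                vst1q_f32(p, self.0[0]);
                vst1q_f32(p.add(4), self.0[1]);
                vst1q_f32(p.add(8), self.0[2]);
                vst1q_f32(p.add(12), self.0[3]);
            }
            out
        }

        /// Lane-wise add: one vaddq_f32 per 128-bit register.
        pub fn add(self, o: Self) -> Self {
            // SAFETY: NEON is baseline on aarch64.
            unsafe {
                F32x16([
                    vaddq_f32(self.0[0], o.0[0]),
                    vaddq_f32(self.0[1], o.0[1]),
                    vaddq_f32(self.0[2], o.0[2]),
                    vaddq_f32(self.0[3], o.0[3]),
                ])
            }
        }

        /// Fused self * b + c. vfmaq_f32(acc, x, y) computes acc + x * y.
        pub fn mul_add(self, b: Self, c: Self) -> Self {
            // SAFETY: NEON is baseline on aarch64.
            unsafe {
                F32x16([
                    vfmaq_f32(c.0[0], self.0[0], b.0[0]),
                    vfmaq_f32(c.0[1], self.0[1], b.0[1]),
                    vfmaq_f32(c.0[2], self.0[2], b.0[2]),
                    vfmaq_f32(c.0[3], self.0[3], b.0[3]),
                ])
            }
        }
    }
}

// Portable stand-in so the sketch builds on non-aarch64 hosts.
#[cfg(not(target_arch = "aarch64"))]
mod imp {
    #[derive(Clone, Copy)]
    pub struct F32x16(pub [f32; 16]);

    impl F32x16 {
        pub fn from_array(a: [f32; 16]) -> Self { F32x16(a) }
        pub fn to_array(self) -> [f32; 16] { self.0 }
        pub fn add(self, o: Self) -> Self {
            let mut r = self.0;
            for i in 0..16 { r[i] += o.0[i]; }
            F32x16(r)
        }
        pub fn mul_add(self, b: Self, c: Self) -> Self {
            let mut r = self.0;
            for i in 0..16 { r[i] = r[i].mul_add(b.0[i], c.0[i]); }
            F32x16(r)
        }
    }
}

pub use imp::F32x16;

impl F32x16 {
    /// Comparison that round-trips through to_array instead of using
    /// NEON compare intrinsics (hypothetical helper for this sketch).
    pub fn simd_lt(self, o: Self) -> [bool; 16] {
        let (a, b) = (self.to_array(), o.to_array());
        core::array::from_fn(|i| a[i] < b[i])
    }
}
```

The arithmetic methods stay in registers, while simd_lt spills both operands to arrays and compares element-wise, which is the trade-off the message describes for comparisons and bit-cast helpers.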
simd.rs: mod scalar -> pub(crate) mod scalar (so simd_neon can pull
I32x16/U32x16/U64x8 from there). aarch64 branch pulls F32x16/F64x8
from simd_neon::aarch64_simd; integer + 256-bit float types still
come from scalar. Other non-x86 targets (wasm/riscv) keep full
scalar fallback.
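
The dispatch described above could look roughly like the following sketch. The module names scalar and simd_neon::aarch64_simd come from this message; the placeholder struct bodies and the BACKEND constant are hypothetical, added only so the selected path is observable. The x86 branches of the real simd.rs are omitted here, so every non-aarch64 target takes the scalar path in this sketch.

```rust
// Hypothetical layout: module names follow the commit message,
// contents are placeholders and BACKEND is invented for illustration.

pub(crate) mod scalar {
    pub struct F32x16(pub [f32; 16]);
    impl F32x16 {
        pub const BACKEND: &'static str = "scalar";
    }

    pub struct I32x16(pub [i32; 16]);
    impl I32x16 {
        pub const BACKEND: &'static str = "scalar";
    }
}

#[cfg(target_arch = "aarch64")]
pub mod simd_neon {
    pub mod aarch64_simd {
        use core::arch::aarch64::float32x4_t;

        pub struct F32x16(pub [float32x4_t; 4]);
        impl F32x16 {
            pub const BACKEND: &'static str = "neon";
        }
    }
}

// aarch64: the 512-bit float type comes from the NEON module.
#[cfg(target_arch = "aarch64")]
pub use simd_neon::aarch64_simd::F32x16;

// Everything else in this sketch keeps the scalar fallback.
#[cfg(not(target_arch = "aarch64"))]
pub use scalar::F32x16;

// Integer types come from scalar on every target in this sketch.
pub use scalar::I32x16;
```

Because the selection happens through cfg-gated re-exports, callers name F32x16 once and the backend is fixed at compile time per target.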
simd_neon.rs: pub mod aarch64_simd (~600 LOC) plus 5 smoke tests
gated on cfg(all(target_arch = "aarch64", test)).
Build:
- cargo build --release --lib -p ndarray (x86_64 AVX-512): PASS
- aarch64 cross-compile of just our types compiles cleanly (uses
only stable core::arch::aarch64 intrinsics shipped since 1.59);
full lib cross-compile blocked in this env by blake3 needing
aarch64-linux-gnu-gcc, which is not installed.
1 parent 4eca4e0, commit be35795
2 files changed
Lines changed: 661 additions & 2 deletions