Commit 7189779
fix(simd): aarch64 F32x16/F64x8 use real NEON paired loads, not scalar

Burn parity item 9: F32x16/F64x8 on aarch64 previously dispatched to
the scalar fallback in simd::scalar (element-wise [f32; 16] loops).
Add a real NEON-backed implementation in simd_neon::aarch64_simd,
modeled on the AVX2 polyfill's dual-tuple shape:

  F32x16 = [float32x4_t; 4]  (4x vld1q_f32 / vst1q_f32 / vfmaq_f32 /
                              vaddq_f32 etc. per op)
  F64x8  = [float64x2_t; 4]  (4x vld1q_f64 / vst1q_f64 / vfmaq_f64)
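
The layout above can be sketched as follows. This is a minimal illustration, not the code from this commit: the name `F32x16Sketch`, its methods, and the scalar stand-in for non-aarch64 targets are assumptions made so the example builds anywhere. Only the intrinsics `vld1q_f32`, `vst1q_f32`, and `vfmaq_f32` come from the message.

```rust
// Hypothetical sketch: a 16-lane f32 vector stored as four 128-bit NEON
// registers. Names are illustrative, not the crate's actual API.
#[cfg(target_arch = "aarch64")]
mod imp {
    use core::arch::aarch64::{float32x4_t, vfmaq_f32, vld1q_f32, vst1q_f32};

    #[derive(Clone, Copy)]
    pub struct F32x16Sketch([float32x4_t; 4]);

    impl F32x16Sketch {
        pub fn from_array(a: [f32; 16]) -> Self {
            let p = a.as_ptr();
            // SAFETY: NEON is a baseline feature on aarch64, and each load
            // reads four in-bounds f32 values from `a`.
            unsafe {
                Self([
                    vld1q_f32(p),
                    vld1q_f32(p.add(4)),
                    vld1q_f32(p.add(8)),
                    vld1q_f32(p.add(12)),
                ])
            }
        }

        pub fn to_array(self) -> [f32; 16] {
            let mut out = [0.0f32; 16];
            let p = out.as_mut_ptr();
            // SAFETY: each store writes four in-bounds f32 values into `out`.
            unsafe {
                for (i, v) in self.0.iter().enumerate() {
                    vst1q_f32(p.add(4 * i), *v);
                }
            }
            out
        }

        /// Computes `self * b + c` with one fused multiply-add per register.
        pub fn mul_add(self, b: Self, c: Self) -> Self {
            let mut out = c.0;
            // SAFETY: see `from_array`. vfmaq_f32(acc, x, y) returns acc + x * y.
            unsafe {
                for i in 0..4 {
                    out[i] = vfmaq_f32(c.0[i], self.0[i], b.0[i]);
                }
            }
            Self(out)
        }
    }
}

#[cfg(not(target_arch = "aarch64"))]
mod imp {
    // Scalar stand-in (an assumption) so the sketch builds on other targets.
    #[derive(Clone, Copy)]
    pub struct F32x16Sketch([f32; 16]);

    impl F32x16Sketch {
        pub fn from_array(a: [f32; 16]) -> Self {
            Self(a)
        }

        pub fn to_array(self) -> [f32; 16] {
            self.0
        }

        pub fn mul_add(self, b: Self, c: Self) -> Self {
            let mut out = [0.0f32; 16];
            for i in 0..16 {
                out[i] = self.0[i].mul_add(b.0[i], c.0[i]);
            }
            Self(out)
        }
    }
}

pub use imp::F32x16Sketch;
```

Both variants expose the same three methods, so callers do not need to know which backing store was selected at compile time.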

Hot-path arithmetic (add, sub, mul, div, mul_add, splat, abs, neg,
sqrt, round, floor, simd_min/max, reduce_sum) compiles to one NEON
instruction per 128-bit register, i.e. four per 512-bit op.
Comparisons and bit-cast helpers round-trip through to_array, the
same shape as simd_avx2.
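
A rough sketch of what "round-trip through to_array" could look like for a comparison and a bit-cast helper. Everything here is assumed for illustration: the type `Lanes16`, the `u16` bitmask result, and the plain-array payload (which stands in for `[float32x4_t; 4]` so the example stays portable). The commit does not state what mask type its comparisons return.

```rust
// Hypothetical stand-in for a 16-lane vector. In the commit the payload is
// [float32x4_t; 4]; a plain array is used here to keep the sketch portable.
#[derive(Clone, Copy)]
pub struct Lanes16([f32; 16]);

impl Lanes16 {
    pub fn from_array(a: [f32; 16]) -> Self {
        Self(a)
    }

    pub fn to_array(self) -> [f32; 16] {
        self.0
    }

    /// Lane-wise `self < other`. Both operands are spilled to arrays and
    /// compared in scalar code; bit `i` of the result is set when lane `i`
    /// compares true. The u16 mask format is an assumption of this sketch.
    pub fn simd_lt(self, other: Self) -> u16 {
        let (a, b) = (self.to_array(), other.to_array());
        let mut mask = 0u16;
        for i in 0..16 {
            if a[i] < b[i] {
                mask |= 1u16 << i;
            }
        }
        mask
    }

    /// Bit-cast helper: reinterprets each f32 lane as its raw u32 bits,
    /// again going through the array form.
    pub fn to_bits(self) -> [u32; 16] {
        self.to_array().map(f32::to_bits)
    }
}
```

The trade-off is that these helpers pay for a store and a scalar loop, which is presumably acceptable because they sit off the hot path listed above.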

simd.rs: mod scalar -> pub(crate) mod scalar (so simd_neon can pull
I32x16/U32x16/U64x8 from there). The aarch64 branch pulls F32x16/F64x8
from simd_neon::aarch64_simd; integer and 256-bit float types still
come from scalar. Other non-x86 targets (wasm/riscv) keep the full
scalar fallback.
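
The dispatch described above might be wired up roughly like this. The module and type names follow the commit message, but the bodies are placeholders and the `BACKEND` constant is a hypothetical marker added only to make the selection observable. For brevity the sketch treats every non-aarch64 target as scalar, whereas the real simd.rs also has x86 branches.

```rust
// Hypothetical module layout; type bodies are placeholders.
pub(crate) mod scalar {
    pub struct F32x16(pub [f32; 16]);
    pub struct I32x16(pub [i32; 16]);

    impl F32x16 {
        pub const BACKEND: &'static str = "scalar";
    }

    impl I32x16 {
        pub const BACKEND: &'static str = "scalar";
    }
}

#[cfg(target_arch = "aarch64")]
pub mod simd_neon {
    pub mod aarch64_simd {
        pub struct F32x16(pub [core::arch::aarch64::float32x4_t; 4]);

        impl F32x16 {
            pub const BACKEND: &'static str = "neon";
        }
    }
}

// On aarch64 the 512-bit float type comes from the NEON module.
#[cfg(target_arch = "aarch64")]
pub use simd_neon::aarch64_simd::F32x16;

// Simplification of this sketch: all other targets use the scalar type.
#[cfg(not(target_arch = "aarch64"))]
pub use scalar::F32x16;

// Integer types come from the scalar module on every target here.
pub use scalar::I32x16;
```

Because the re-exports share one name, downstream code refers to `F32x16` without any cfg of its own.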

simd_neon.rs: pub mod aarch64_simd (~600 LOC) plus 5 smoke tests
gated on cfg(all(target_arch = "aarch64", test)).

Build:
- cargo build --release --lib -p ndarray (x86_64 AVX-512): PASS
- aarch64 cross-compile of just our types compiles cleanly (uses
  only stable core::arch::aarch64 intrinsics, available since Rust
  1.59); full lib cross-compile is blocked in this env because blake3
  needs aarch64-linux-gnu-gcc, which is not installed.

1 parent 888e598 commit 7189779
2 files changed
Lines changed: 661 additions & 2 deletions