docs(simd): align with existing ArmProfile::arm_profile() heuristic
Self-correction. The previous commit collapsed A72Fast + A53Baseline
into a new `Armv8Neon` variant, claiming the two could not be
distinguished by HWCAP. That reinvented something the codebase
already solves.
`src/hpc/simd_caps.rs:317-336` has had `ArmProfile::arm_profile()`
in tree since the SBC support landed. Its decision tree:
    asimd_dotprod present     → A76DotProd  (Pi 5 / A76+)
    aes present (no dotprod)  → A72Fast     (Pi 4 / Pi 3 / Pi Zero 2W)
    no aes                    → A53Baseline (QEMU / minimal aarch64)
The line 329 comment explicitly admits the A72Fast branch catches
both A72 silicon (Pi 4) and A53-with-crypto silicon (Pi 3, Pi Zero
2W): "we report A72-tier since most deployments target Pi 4." The
dispatch tables would be identical at the ISA level (both are
ARMv8.0+crypto, no dotprod), so this is intentional.
The `A53Baseline` variant catches the rare case of NEON-without-
crypto (QEMU, minimal aarch64 builds), which my `Armv8Neon` collapse
lost.
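The decision tree above can be sketched in Rust roughly as follows. This is a minimal illustration, not the code at `src/hpc/simd_caps.rs:317-336`: the `classify` and `detect` functions and the enum's derive list are assumptions made for the example, and only the three variant names and the branch order come from the commit message.

```rust
/// The three tiers named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArmProfile {
    A76DotProd,
    A72Fast,
    A53Baseline,
}

/// Hypothetical pure helper: applies the decision tree to two feature bits.
/// dotprod is checked first, then aes; anything else is the baseline tier.
pub fn classify(has_dotprod: bool, has_aes: bool) -> ArmProfile {
    if has_dotprod {
        ArmProfile::A76DotProd
    } else if has_aes {
        // Catches both A72 silicon and A53-with-crypto silicon.
        ArmProfile::A72Fast
    } else {
        // NEON without crypto (QEMU, minimal aarch64 builds).
        ArmProfile::A53Baseline
    }
}

/// Hypothetical runtime wrapper; only compiled on aarch64 targets.
#[cfg(target_arch = "aarch64")]
pub fn detect() -> ArmProfile {
    classify(
        std::arch::is_aarch64_feature_detected!("dotprod"),
        std::arch::is_aarch64_feature_detected!("aes"),
    )
}
```

Keeping the tree in a pure function means the three branches can be unit-tested on any host, with the feature probing isolated in the small `cfg`-gated wrapper.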
Changes:
- Reverted SimdProfile enum to A76DotProd / A72Fast / A53Baseline.
- detect() pseudocode now delegates to existing arm_profile() helper.
- GemmDispatch table restored to 3 aarch64 entries.
- Quick-reference tables list both A72Fast and A53Baseline rows with
  a note that they share the same kernel.
- Dispatch matrix split into 4 rows: A53+crypto (→A72Fast),
  A53-no-crypto (→A53Baseline), A72 (→A72Fast), A76+ (→A76DotProd).
This is more honest than the Armv8Neon collapse: it preserves the
existing in-tree pattern, names it correctly, and documents the
A72Fast-as-ARMv8.0+crypto-catch-all semantic that the codebase
already chose.
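As an illustration of the restored tables, the sketch below encodes the four-row dispatch matrix and the shared-kernel note. The `GemmKernel` enum, its variant names, the `gemm_kernel` function, and the `DISPATCH_MATRIX` constant are hypothetical names invented for this example. Only the `SimdProfile` variants and the row-to-variant mapping are taken from the list above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SimdProfile {
    A76DotProd,
    A72Fast,
    A53Baseline,
}

/// Hypothetical kernel identifiers, used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GemmKernel {
    NeonDotProd,
    NeonBaseline,
}

/// Three aarch64 entries; A72Fast and A53Baseline share one kernel.
pub fn gemm_kernel(profile: SimdProfile) -> GemmKernel {
    match profile {
        SimdProfile::A76DotProd => GemmKernel::NeonDotProd,
        SimdProfile::A72Fast | SimdProfile::A53Baseline => GemmKernel::NeonBaseline,
    }
}

/// The four documented silicon rows and the variant each one resolves to.
pub const DISPATCH_MATRIX: [(&str, SimdProfile); 4] = [
    ("A53 + crypto", SimdProfile::A72Fast),
    ("A53, no crypto", SimdProfile::A53Baseline),
    ("A72", SimdProfile::A72Fast),
    ("A76+", SimdProfile::A76DotProd),
];
```

Two silicon rows resolve to the same variant and two variants resolve to the same kernel, so the matrix documents hardware while the enum stays at three entries.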
- Rows ordered by SoC tier (Pi family naming as canonical). A53 and A72 are listed as separate documented silicon (they have distinct microarchitecture: single vs dual NEON pipeline), but **the runtime `SimdProfile` collapses both into one variant `Armv8Neon`** because HWCAP/CPUID alone cannot distinguish them. Splitting them requires reading the `CPU part` field of `/proc/cpuinfo` (0xd03 = A53, 0xd08 = A72); deferred until benchmarks demand it.
+ Rows ordered by SoC tier (Pi family naming as canonical). **The existing detection helper `ArmProfile::arm_profile()` at `src/hpc/simd_caps.rs:317-336` already implements this dispatch and is the canonical reference.** It admits in its own comments that A72 silicon and A53-with-crypto silicon cannot be distinguished by HWCAP alone, and pragmatically maps both to `A72Fast` since the dispatch tables would be identical at the ISA level (both are ARMv8.0+crypto with no dotprod). The `A53Baseline` variant catches the rare case of NEON-without-crypto (QEMU, minimal aarch64 builds).
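For reference, the `CPU part` lookup that the replaced wording deferred could look roughly like the sketch below. Nothing like it is claimed to exist in the tree: the `Core` enum and `cpu_parts` function are hypothetical, and the only facts taken from the text are the two part numbers (0xd03 = A53, 0xd08 = A72). The function takes the file contents as a string so it can be exercised without a real `/proc/cpuinfo`.

```rust
/// Hypothetical core classification based on the `CPU part` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Core {
    CortexA53,
    CortexA72,
    Other(u32),
}

/// Returns one entry per `CPU part` line found in the given cpuinfo text.
/// Lines that are missing a colon or carry a non-hex value are skipped.
pub fn cpu_parts(cpuinfo: &str) -> Vec<Core> {
    cpuinfo
        .lines()
        .filter_map(|line| {
            let (key, value) = line.split_once(':')?;
            if key.trim() != "CPU part" {
                return None;
            }
            let hex = value.trim().trim_start_matches("0x");
            let part = u32::from_str_radix(hex, 16).ok()?;
            Some(match part {
                0xd03 => Core::CortexA53,
                0xd08 => Core::CortexA72,
                other => Core::Other(other),
            })
        })
        .collect()
}
```

On a real system the input would come from something like `std::fs::read_to_string("/proc/cpuinfo")`. A fuller version would probably also check the `CPU implementer` field before trusting the part number.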