Commit 2a35416
bench(splat3d): re-run EWA-SYRK crossover at target-cpu=x86-64-v4
Correction: the prior RESULTS reported v3 numbers and wrongly attributed
AVX-512 to runtime dispatch. F32x16 is compile-time-selected by target-cpu,
so v3 measured AVX2. Benches must run at the project's deployment tier v4
(AVX-512 native, F32x16 = __m512); committed .cargo/config.toml stays v3 for
GitHub/CI portability, overridden locally via RUSTFLAGS=-Ctarget-cpu=x86-64-v4.
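The compile-time versus runtime distinction above can be illustrated with a minimal sketch. It does not reproduce the project's F32x16 type; `compile_time_f32_lanes` and `host_has_avx512f` are hypothetical helper names, and the fallback width of 4 is an assumption made only for this example. The point is that `cfg!(target_feature = ...)` is fixed by `-Ctarget-cpu` when the crate is built, while `is_x86_feature_detected!` asks the host CPU at run time, so a v3 build can run on an AVX-512 machine and still execute AVX2 code.

```rust
/// Widest f32 lane count the compiler was allowed to assume at build time.
/// This is decided by -Ctarget-cpu / -Ctarget-feature, not by the host CPU.
/// (Hypothetical helper; the fallback of 4 lanes is an assumption.)
pub const fn compile_time_f32_lanes() -> usize {
    if cfg!(target_feature = "avx512f") {
        16 // x86-64-v4 builds land here
    } else if cfg!(target_feature = "avx2") {
        8 // x86-64-v3 builds land here
    } else {
        4
    }
}

/// What the host CPU reports at run time, independent of build flags.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn host_has_avx512f() -> bool {
    std::is_x86_feature_detected!("avx512f")
}

/// Non-x86 targets never report AVX-512.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn host_has_avx512f() -> bool {
    false
}

/// True when the host supports AVX-512 but the binary was built without it,
/// which is the situation that produced the mislabelled v3 numbers.
pub fn built_below_host_tier() -> bool {
    host_has_avx512f() && compile_time_f32_lanes() < 16
}
```

Calling `built_below_host_tier()` at the start of a bench run is one way to flag a build that would silently measure the narrower path.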
v4 numbers (Melem/s) at 1k/100k/1M: simd_x16 175/170/172 vs scalar 85/76/82 vs
gemm_shape 90/85/87. Verdict unchanged and tier-robust (v3 within ~5%):
simd_x16 is ~2x faster than both scalar and the BLAS-shape, with no crossover
at any measured size, so the EWA-SYRK backend is a pessimization at 3x3.
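The "~2x" and "no crossover" claims follow directly from the quoted throughputs. As a small worked check, the sketch below hard-codes the v4 numbers from the message and computes the per-size ratios; the constant and function names are hypothetical and exist only for this illustration.

```rust
/// Throughputs in Melem/s copied from the commit message,
/// ordered by problem size: 1k, 100k, 1M elements.
pub const SIMD_X16: [f64; 3] = [175.0, 170.0, 172.0];
pub const SCALAR: [f64; 3] = [85.0, 76.0, 82.0];
pub const GEMM_SHAPE: [f64; 3] = [90.0, 85.0, 87.0];

/// Per-size speedup of `fast` over `base` (ratio of throughputs).
pub fn speedups(fast: &[f64; 3], base: &[f64; 3]) -> [f64; 3] {
    [fast[0] / base[0], fast[1] / base[1], fast[2] / base[2]]
}

/// A crossover exists if `base` matches or beats `fast` at any measured size.
pub fn has_crossover(fast: &[f64; 3], base: &[f64; 3]) -> bool {
    fast.iter().zip(base.iter()).any(|(f, b)| b >= f)
}
```

With these inputs the ratios against scalar are roughly 2.06, 2.24 and 2.10, and against gemm_shape roughly 1.94, 2.00 and 1.98, with neither baseline catching up at any of the three sizes.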
https://claude.ai/code/session_01HbqooFZHAjaUtFEzhA1R2u1
parent d798b97 commit 2a35416
1 file changed
Lines changed: 17 additions & 13 deletions