Commit ea882e6
fix(avx-512vl): masked load/store reach per-arch EVEX intrinsics

Three coupled changes; gcc-10 partial ordering forces them into a
single commit:
1) Rewrite avx512vl_128 / avx512vl_256 masked load/store. Adds the
missing int64/uint64/float/double load_masked overloads, corrects the
batch_bool_constant typing on store_masked (was uint32_t/uint64_t for
signed-int/float/double stores, now matches the value type), and
branches on aligned vs. unaligned to pick the right EVEX intrinsic.
Unsigned overloads delegate to the signed ones via bitwise_cast.
2) Constrain the non-VL master overloads (avx_128 float/double,
avx2_128 int32/uint32 + int64/uint64, avx2 templated and
int32/uint32/int64/uint64) and the common-memory int<->float bridges
with !is_base_of<avx512vl_*, A>. Otherwise gcc-10's partial ordering
treats a concrete requires_arch<X> and the inherited concrete
requires_arch<Y> (Y a base of X) as equally specialized; the same
happens for templated bridge<A> vs. native<avx512vl_*> when A is VL.
gcc-14 handles both cases naturally, so this is a no-op there.
The avx native overload gains an is_floating_point<T> SFINAE and the
avx2 templated overload gains is_integral<T> && sizeof(T) >= 4, so the
new half-fold dispatch (half_arch = avx for floats, avx2 for ints in a
512-bit batch) is unambiguous on gcc-10.
3) Resolve the half-fold target arch in avx / avx2 / avx512f through
make_sized_batch_t<T, half>::arch_type so the dispatch picks
avx512vl_128 / avx512vl_256 when available and emits EVEX
vmovdqu32{k}{z} instead of VEX vpmaskmovd / vmaskmovps. (Without
(3), (2)'s is_integral SFINAE on the avx2 templated form leaves the
pre-existing avx512f.hpp:339 'store_masked<avx2>(float*, __m256,
...)' call site with no matching overload on gcc-10.)
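
The constraint described in (2) can be sketched in isolation. This is a toy model, not xsimd's code: the tags toy_avx2 / toy_avx512vl_256, the trait toy_not_vl and the function toy_store_masked are hypothetical names that only mimic the shape described above, namely a VL arch deriving from a non-VL arch and a templated bridge competing with a native VL overload. It assumes nothing about the real signatures.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Toy architecture tags (hypothetical): the VL arch derives from the
// non-VL arch, so a VL tag also binds to a "toy_avx2 const&" parameter.
struct toy_avx2 {};
struct toy_avx512vl_256 : toy_avx2 {};

// Hypothetical stand-in for a bridge_not_vl-style trait: true when A is
// neither toy_avx512vl_256 nor derived from it.
template <class A>
using toy_not_vl = std::integral_constant<
    bool, !std::is_base_of<toy_avx512vl_256, A>::value>;

// Templated "bridge": accepts any arch A, but is removed by SFINAE for
// VL archs, so it never competes with the native VL overload below.
template <class A, class T,
          typename std::enable_if<toy_not_vl<A>::value, int>::type = 0>
const char* toy_store_masked(T*, A const&)
{
    return "bridge";
}

// Native VL overload: concrete arch parameter.
template <class T>
const char* toy_store_masked(T*, toy_avx512vl_256 const&)
{
    return "native-vl";
}

// Self-check: each arch tag selects exactly one viable overload.
inline void toy_dispatch_check()
{
    float buf[8] = {};
    assert(std::strcmp(toy_store_masked(buf, toy_avx2 {}), "bridge") == 0);
    assert(std::strcmp(toy_store_masked(buf, toy_avx512vl_256 {}), "native-vl") == 0);
}
```

With the constraint in place the VL call has a single viable candidate, so the result no longer depends on how a given compiler ranks the two templates. On a compiler that already ranks them as expected, the constraint changes nothing, which matches the "no-op on gcc-14" remark.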
The xsimd_batch dispatch drops the explicit <A, T, U, Values...>
template arguments on the kernel::store_masked call so the SFINAE'd
overload set can be resolved by ADL, and adds a forward declaration of
make_sized_batch ahead of xsimd_isa.hpp so the half-fold sites can see
the type at parse time.
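
The message does not spell out how the explicit template arguments interfered, so the following is only one plausible illustration under stated assumptions. The namespace toy_kernel and both overloads are hypothetical. The sketch shows that spelling out <A, T> at the call site discards any overload whose template parameter list does not accept two arguments, while a deduced, unqualified call keeps the whole overload set in play.

```cpp
#include <cassert>
#include <cstring>

namespace toy_kernel
{
    // Toy arch tags (hypothetical), VL deriving from non-VL.
    struct avx2 {};
    struct avx512vl_256 : avx2 {};

    // Generic overload: two template parameters <A, T>.
    template <class A, class T>
    const char* store_masked(T*, A const&)
    {
        return "generic";
    }

    // Native overload: a single template parameter <T>, concrete arch.
    template <class T>
    const char* store_masked(T*, avx512vl_256 const&)
    {
        return "native-vl";
    }
}

inline void toy_call_site_check()
{
    float buf[8] = {};
    toy_kernel::avx512vl_256 arch;

    // Explicit <A, T>: the native overload has only one template
    // parameter, so two explicit arguments make it non-viable and the
    // generic overload is the only candidate left.
    assert(std::strcmp(
               toy_kernel::store_masked<toy_kernel::avx512vl_256, float>(buf, arch),
               "generic") == 0);

    // Deduced, unqualified call: ADL finds both overloads through the
    // namespace of arch's type, and the more specialized native one wins.
    assert(std::strcmp(store_masked(buf, arch), "native-vl") == 0);
}
```

The second call is the shape the dispatch moves to: no template arguments, with the arch tag's namespace supplying the candidates.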
bridge_not_vl lives in xsimd_common_fwd next to the bridge forward
declarations; fwd.hpp now pulls in xsimd_avx512vl_register so the trait
sees complete types. The 4 redundant register.hpp includes that would
otherwise be added at the point of use are dropped, since they are
reachable transitively through fwd.hpp.

1 parent 7c36cbc commit ea882e6
10 files changed
Lines changed: 333 additions & 140 deletions