Commit 0eaa3ac
feat(hpc/amx_matmul): TD-T1b — matmul_f32 AMX arm routes through tile kernel
Follow-up to TD-T1 (fe334de). `matmul_f32`'s AMX branch was the same
kind of placebo that `matmul_bf16_to_f32` was before TD-T1: it down-cast
f32 → BF16, then called the scalar `bf16_gemm_f32` reference, never
reaching `TDPBF16PS` even on real AMX silicon.
Factored the BF16 AMX-tile dispatch logic out of `matmul_bf16_to_f32`
into a private `bf16_gemm_with_amx(a, b, c, m, n, k)` helper. Both
public entry points now route through it:

    matmul_bf16_to_f32 → bf16_gemm_with_amx   (direct BF16 inputs)
    matmul_f32         → RNE down-cast → bf16_gemm_with_amx
                         (f32 in, BF16 compute, f32 accumulator out)

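The RNE (round-to-nearest-even) down-cast step in the `matmul_f32` route can be illustrated with a minimal sketch. The function names `f32_to_bf16_rne` and `bf16_to_f32` are hypothetical, and BF16 values are represented here as raw `u16` bit patterns; the conversion routine the commit actually uses is not shown in this page.

```rust
/// Hypothetical helper: convert an f32 to BF16 bits with
/// round-to-nearest-even. BF16 keeps the top 16 bits of the f32 encoding.
fn f32_to_bf16_rne(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Force a quiet-NaN mantissa bit so truncation cannot yield infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Adding 0x7FFF plus the lowest kept bit rounds halfway cases to even.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(rounding_bias) >> 16) as u16
}

/// Hypothetical helper: widen BF16 bits back to f32 (always exact).
fn bf16_to_f32(h: u16) -> f32 {
    f32::from_bits((h as u32) << 16)
}
```

Because widening is exact, values already representable in BF16 round-trip unchanged; only the low 16 mantissa bits of the f32 input are subject to rounding.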
The helper's behaviour is unchanged from what TD-T1 shipped:
16/16/32-aligned shapes hit `bf16_tile_gemm_16x16` (TDPBF16PS via
asm-byte, 8 192 BF16×BF16 multiplies + 256 f32 accumulates per
instruction); mis-aligned shapes or non-AMX hosts fall back to scalar
`bf16_gemm_f32`. This keeps a single source of truth: future Phase-4
mixed-tile-plus-tail dispatch only needs to land in one place.
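The dispatch rule described above can be sketched as follows. This is an illustration only, not the commit's code: `choose_path`, `GemmPath`, and `bf16_gemm_scalar` are hypothetical names, and the sketch assumes that "16/16/32-aligned" means M and N are multiples of 16 and K is a multiple of 32, that matrices are row-major, and that the scalar path overwrites `c`.

```rust
// Tile geometry assumed from the commit text: 16x16 output tile, K step of 32.
const TILE_M: usize = 16;
const TILE_N: usize = 16;
const TILE_K: usize = 32;

// Per-instruction work implied by that geometry.
const MULS_PER_TILE_OP: usize = TILE_M * TILE_N * TILE_K; // 8192 multiplies
const ACCS_PER_TILE_OP: usize = TILE_M * TILE_N; // 256 f32 accumulators

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum GemmPath {
    AmxTile,
    ScalarFallback,
}

/// Hypothetical dispatch predicate: the tile kernel is used only when the
/// host has AMX and every dimension is aligned to the tile geometry.
fn choose_path(m: usize, n: usize, k: usize, has_amx: bool) -> GemmPath {
    let aligned = m % TILE_M == 0 && n % TILE_N == 0 && k % TILE_K == 0;
    if has_amx && aligned {
        GemmPath::AmxTile
    } else {
        GemmPath::ScalarFallback
    }
}

/// Hypothetical scalar reference: BF16 inputs as raw u16 bits,
/// f32 accumulation, row-major A (m x k), B (k x n), C (m x n).
fn bf16_gemm_scalar(a: &[u16], b: &[u16], c: &mut [f32], m: usize, n: usize, k: usize) {
    assert!(a.len() >= m * k && b.len() >= k * n && c.len() >= m * n);
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                // Widening BF16 to f32 is exact: shift into the high half.
                let x = f32::from_bits((a[i * k + p] as u32) << 16);
                let y = f32::from_bits((b[p * n + j] as u32) << 16);
                acc += x * y;
            }
            c[i * n + j] = acc;
        }
    }
}
```

Keeping the predicate in one function mirrors the commit's motivation: a later mixed tile-plus-tail strategy would change only this decision point, not each public entry point.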
Verification:
* 11 amx_matmul tests pass (default v3, no AMX on this host →
scalar fallback exercised; behaviour identical to pre-commit).
* `cargo clippy --lib -- -D warnings` clean.
https://claude.ai/code/session_01HbqooFZHAjaUtFEzhA1R2u1
parent fe334de, commit 0eaa3ac
1 file changed
Lines changed: 29 additions & 10 deletions