Commit f1748ac
Add mixed-type Metal FA kernels for auto-asymmetric K/V
turbo4_1 and turbo3_1 auto-promote K by 1 bit (for turbo4_1: K=turbo5_1/V=turbo4_1).
Previously these mixed K/V pairs fell back to CPU scalar attention (47 t/s).
With mixed-type Metal flash attention kernels they now run at 73 t/s (+53%).
Changes:
- ggml-metal.metal: 8 new FA kernel instantiations for mixed K/V
(4 batched + 4 vec, for turbo and rq auto-asymmetric pairs)
- ggml-metal-device.cpp: pipeline naming includes V type when K!=V
- ggml-metal-device.m: allow mixed turbo/rq types in supports_op
- ggml-metal-ops.cpp: relax K==V type assertion for turbo types
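The pipeline-naming change can be illustrated with a minimal sketch. This is not the code from ggml-metal-device.cpp: the `kv_type` enum, the helper names, the `flash_attn_ext_` prefix and the `_v_` separator are all hypothetical, chosen only to show how a name can stay unchanged for symmetric K/V and gain a V suffix when the types differ.

```cpp
#include <string>

// Hypothetical stand-in for the cache type enum; only type names
// mentioned in the commit message are listed here.
enum class kv_type { q8_0, turbo3_1, turbo4_1, turbo5_1 };

// Maps a type to the string used inside a pipeline name.
static const char * kv_type_name(kv_type t) {
    switch (t) {
        case kv_type::q8_0:     return "q8_0";
        case kv_type::turbo3_1: return "turbo3_1";
        case kv_type::turbo4_1: return "turbo4_1";
        case kv_type::turbo5_1: return "turbo5_1";
    }
    return "unknown";
}

// Builds a lookup name for a flash-attention pipeline.
// Symmetric K/V keeps the single-type name, so existing kernels resolve
// as before; mixed K/V appends the V type so each pair gets its own entry.
// The prefix and separator are assumptions made for this sketch.
static std::string fa_pipeline_name(kv_type k, kv_type v) {
    std::string name = "flash_attn_ext_";
    name += kv_type_name(k);
    if (k != v) {
        name += "_v_";
        name += kv_type_name(v);
    }
    return name;
}
```

Under these assumptions, the symmetric turbo5_1 case keeps a name with a single type in it, while the auto-promoted turbo4_1 case (K=turbo5_1, V=turbo4_1) resolves to a distinct name that carries both types.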
Results (gpt-oss-120b, M3 Ultra):
turbo4_1: 47→73 t/s (+53%), correct output
turbo3_1: 47→75 t/s (+59%), marginal quality
turbo5_1: 76 t/s (unchanged, symmetric)
q8_0: 80 t/s (baseline)
1 parent 94a8ba9 commit f1748ac
4 files changed
Lines changed: 49 additions & 12 deletions