Commit 4898af2
Metal backend: Add SDPA head_dim=256 support (#18875)
Qwen 3.5 MoE uses head_dim=256 for full attention layers. The existing
SDPA Metal kernel only instantiated head_dim 64, 96, 128. At D=256 each
thread handles 8 QK elements (8 x 32 threads = 256 dims); register
pressure and threadgroup memory are well within Apple GPU limits.
1 parent ad27a45
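The per-thread arithmetic in the commit message (8 elements x 32 threads = 256 dims) can be sketched on the host side as below. This is only an illustrative C++ emulation, not the Metal kernel from this commit: the names `kSimdWidth`, `elems_per_thread`, and `qk_dot` are hypothetical, and the contiguous per-lane element layout is an assumption, since the actual diff content is not shown here.

```cpp
#include <array>
#include <cstddef>

// Assumption: one SIMD group of 32 threads cooperates on a single Q.K dot product.
constexpr std::size_t kSimdWidth = 32;

// Hypothetical helper: how many Q/K elements each thread owns for head_dim D.
template <std::size_t D>
constexpr std::size_t elems_per_thread() {
    static_assert(D % kSimdWidth == 0,
                  "head_dim must be a multiple of the SIMD width");
    return D / kSimdWidth;
}

// Host-side emulation: each "lane" accumulates a partial dot product over its
// own slice, then the partials are summed (standing in for a SIMD-group
// reduction on the GPU).
template <std::size_t D>
float qk_dot(const std::array<float, D>& q, const std::array<float, D>& k) {
    constexpr std::size_t per_thread = elems_per_thread<D>();
    float total = 0.0f;
    for (std::size_t lane = 0; lane < kSimdWidth; ++lane) {
        float partial = 0.0f;
        for (std::size_t i = 0; i < per_thread; ++i) {
            const std::size_t idx = lane * per_thread + i;  // assumed layout
            partial += q[idx] * k[idx];
        }
        total += partial;
    }
    return total;
}

// Explicit instantiations for the head_dim values named in the commit message:
// the previously supported 64, 96, 128, plus the new 256.
template float qk_dot<64>(const std::array<float, 64>&, const std::array<float, 64>&);
template float qk_dot<96>(const std::array<float, 96>&, const std::array<float, 96>&);
template float qk_dot<128>(const std::array<float, 128>&, const std::array<float, 128>&);
template float qk_dot<256>(const std::array<float, 256>&, const std::array<float, 256>&);
```

Under these assumptions, `elems_per_thread<128>()` is 4 and `elems_per_thread<256>()` is 8, so going from D=128 to D=256 doubles the per-thread slice, which is the register-pressure consideration the message refers to.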
2 files changed
Lines changed: 31 additions & 5 deletions
| File | Hunk (original file lines) | Additions | Deletions |
|---|---|---|---|
| 1 of 2 | 226-232 | 2 | 1 |
| 1 of 2 | 430-440 | 4 | 4 |
| 2 of 2 | 639-644 | 25 | 0 |