Commit 4c0e73a
[rocm-libraries] ROCm/rocm-libraries#6156 (commit 367565a)
[CK_TILE] Optimize FMHA head-dim padded path on gfx11/gfx12 (#6156)
## Motivation
On gfx11/gfx12, FMHA forward kernels that require head-dim padding show
a large performance drop compared to the exact-head-dim path. In
practice, padded cases such as `HDIM=72` and `HDIM=80` were falling too
far off the fast path.
This PR improves padded-head-dim FMHA performance on gfx11/gfx12 while
keeping the behavior for other GPUs unchanged.
## Technical Details
- Add/scope a dedicated padded-head-dim (`qr_hpad`) FMHA forward path
for gfx11/gfx12.
- For `receipt=0`, keep support conservative and only enable the padded
fast path for vector-safe cases (`head_dim % 8 == 0`), matching the
existing assumption used on other GPUs.
- Move `v_prefetch` later only for the head-dim-padded path on
gfx11/gfx12. This reduces live ranges and removes the register-spill
behavior seen in the earlier scheduling.
- Enable the buffer-load OOB check offset trick for the padded path on
gfx11/gfx12.
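
As a rough illustration of two of the points above, the following Python sketch models (a) the `receipt=0` eligibility rule and (b) the general idea behind a buffer-load OOB offset trick. It is a conceptual model under stated assumptions, not the CK implementation. Every function and constant name is hypothetical, matching architectures by the `gfx11`/`gfx12` prefix is an assumption, and the `0x80000000` shift is an assumed "far out of range" value. The model also assumes the hardware returns zero for out-of-range buffer loads.

```python
# Conceptual model only. All names are hypothetical, not CK identifiers.
FAST_PATH_ARCH_PREFIXES = ("gfx11", "gfx12")  # GPU families named in this PR
VECTOR_ELEMS = 8                               # from the `head_dim % 8 == 0` rule
OOB_SHIFT = 0x80000000                         # assumed "far out of range" offset


def use_padded_fast_path(arch: str, head_dim: int, tile_hdim: int) -> bool:
    """Model of the receipt=0 rule: padded, vector-safe, and on gfx11/gfx12."""
    needs_padding = head_dim < tile_hdim
    vector_safe = head_dim % VECTOR_ELEMS == 0
    return arch.startswith(FAST_PATH_ARCH_PREFIXES) and needs_padding and vector_safe


def buffer_load(buf, offset):
    """Model of a range-checked buffer load: out-of-range offsets read as 0."""
    return buf[offset] if 0 <= offset < len(buf) else 0.0


def load_with_select(buf, offset, valid):
    """Baseline: always load, then discard the value if the element is invalid."""
    value = buffer_load(buf, offset)
    return value if valid else 0.0


def load_with_offset_trick(buf, offset, valid):
    """Offset trick: push invalid offsets out of range so the load itself yields 0."""
    return buffer_load(buf, offset + (0 if valid else OOB_SHIFT))


def load_padded_row(buf, row, head_dim, tile_hdim, loader):
    """Load one row of a row-major [rows, head_dim] buffer into a wider tile."""
    return [
        loader(buf, row * head_dim + col, col < head_dim)
        for col in range(tile_hdim)
    ]


if __name__ == "__main__":
    # Two rows with head_dim=72, read through an illustrative 80-wide tile.
    data = [float(i + 1) for i in range(2 * 72)]
    baseline = load_padded_row(data, 0, 72, 80, load_with_select)
    tricked = load_padded_row(data, 0, 72, 80, load_with_offset_trick)
    print(baseline == tricked)
```

In this model the padded columns of row 0 would otherwise read the start of row 1. The baseline needs a per-element select after the load, while the offset variant relies on the range check alone to produce zeros.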
## Test Plan
./build/bin/tile_example_fmha_fwd -prec=bf16 -mode={0/1} -b=1 -h=16
-d={72/80} -s={seqlen} -s_k={seqlen} -lse=0 -iperm={0/1} -operm={0/1}
## Test Result
Observed padded-head-dim performance improvements for HDIM=72/80:
- gfx11: about 3.5x
- gfx1151: about 2.0x
- gfx12: about 1.3x
## Submission Checklist
- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

1 parent 7d6c8e5 commit 4c0e73a
4 files changed
Lines changed: 144 additions & 26 deletions
File tree
- example/ck_tile/01_fmha/codegen
- ops
- include/ck_tile/ops/fmha/pipeline
Lines changed: 7 additions & 0 deletions
Lines changed: 58 additions & 20 deletions