Commit a7ed94f
* Use vectorized stores for dropout randvals
With no kPadSeqLenK the kernel uses 2 buffer_store_dwordx2 instead of
16 buffer_store_byte. This requires less registers and reduces spilling.
* Calculate dropout randvals for storing and applying only once
Even though it may add a small overhead when storing is not required,
it uses significantly less registers and hence no spilling.
Co-authored-by: Anton Gorenko <anton@streamhpc.com>
1 parent 6e9f0e9 commit a7ed94f
11 files changed
Lines changed: 37 additions & 16 deletions
File tree
- include/ck_tile/ops/fmha
- block
- kernel
- pipeline
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
333 | 333 | | |
334 | 334 | | |
335 | 335 | | |
336 | | - | |
337 | | - | |
338 | | - | |
339 | | - | |
340 | | - | |
341 | | - | |
342 | | - | |
343 | | - | |
344 | | - | |
345 | | - | |
346 | | - | |
347 | | - | |
348 | | - | |
349 | | - | |
350 | 336 | | |
351 | 337 | | |
352 | 338 | | |
| 339 | + | |
| 340 | + | |
| 341 | + | |
| 342 | + | |
| 343 | + | |
| 344 | + | |
353 | 345 | | |
354 | 346 | | |
355 | 347 | | |
| |||
369 | 361 | | |
370 | 362 | | |
371 | 363 | | |
| 364 | + | |
372 | 365 | | |
| 366 | + | |
373 | 367 | | |
374 | 368 | | |
375 | 369 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1005 | 1005 | | |
1006 | 1006 | | |
1007 | 1007 | | |
1008 | | - | |
| 1008 | + | |
1009 | 1009 | | |
1010 | 1010 | | |
1011 | 1011 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1450 | 1450 | | |
1451 | 1451 | | |
1452 | 1452 | | |
1453 | | - | |
| 1453 | + | |
1454 | 1454 | | |
1455 | 1455 | | |
1456 | 1456 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
80 | 80 | | |
81 | 81 | | |
82 | 82 | | |
| 83 | + | |
| 84 | + | |
83 | 85 | | |
84 | 86 | | |
85 | 87 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
83 | 83 | | |
84 | 84 | | |
85 | 85 | | |
| 86 | + | |
| 87 | + | |
86 | 88 | | |
87 | 89 | | |
88 | 90 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
81 | 81 | | |
82 | 82 | | |
83 | 83 | | |
| 84 | + | |
| 85 | + | |
84 | 86 | | |
85 | 87 | | |
86 | 88 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
90 | 90 | | |
91 | 91 | | |
92 | 92 | | |
| 93 | + | |
| 94 | + | |
93 | 95 | | |
94 | 96 | | |
95 | 97 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
69 | 69 | | |
70 | 70 | | |
71 | 71 | | |
| 72 | + | |
| 73 | + | |
72 | 74 | | |
73 | 75 | | |
74 | 76 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
74 | 74 | | |
75 | 75 | | |
76 | 76 | | |
| 77 | + | |
| 78 | + | |
77 | 79 | | |
78 | 80 | | |
79 | 81 | | |
| |||
Lines changed: 2 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
78 | 78 | | |
79 | 79 | | |
80 | 80 | | |
| 81 | + | |
| 82 | + | |
81 | 83 | | |
82 | 84 | | |
83 | 85 | | |
| |||
0 commit comments