Commit 5bb250c
Ljubomir Josifovski
Bugfixes to the GLA Metal kernel

1) Grid dispatch was wrong: it was (S/nsg, S/4, H*n_seqs) and is corrected to (1, S/4, H*n_seqs). The buggy version dispatched 32x too many threadgroups in x, all of them computing the same i-dimension.
2) The kernel was missing from the scheduler routing, so it was never routed to even when present.

TBS looking like TG 54 tok/s (from 32 tok/s), PP 115 tok/s (from 75 tok/s). Major win.
1 parent aa37e54 commit 5bb250c
3 files changed
Lines changed: 8 additions & 4 deletions
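The grid fix in the commit message can be sanity-checked with plain arithmetic. The sketch below is not code from this commit. It only counts threadgroups for the two grid shapes quoted in the message. The concrete values (`S = 128`, `nsg = 4`, `H = 16`, `n_seqs = 1`) are assumptions, chosen because `S / nsg = 32` reproduces the 32x factor the message reports. The helper names are hypothetical.

```python
# Count threadgroups for the buggy and the corrected dispatch grids.
# All numeric values are illustrative assumptions, not taken from the commit.

def buggy_grid(S, nsg, H, n_seqs):
    # Grid shape quoted as wrong in the commit message: (S/nsg, S/4, H*n_seqs)
    return (S // nsg, S // 4, H * n_seqs)

def fixed_grid(S, H, n_seqs):
    # Grid shape quoted as correct in the commit message: (1, S/4, H*n_seqs)
    return (1, S // 4, H * n_seqs)

def threadgroup_count(grid):
    # Total number of threadgroups launched for a 3D grid
    x, y, z = grid
    return x * y * z

S, nsg, H, n_seqs = 128, 4, 16, 1  # assumed sizes

buggy = buggy_grid(S, nsg, H, n_seqs)
fixed = fixed_grid(S, H, n_seqs)

# The only difference is the x extent, so the ratio equals S / nsg.
ratio = threadgroup_count(buggy) // threadgroup_count(fixed)
print(buggy, fixed, ratio)
```

Under these assumed sizes the redundant factor is exactly `S / nsg`. Since the y and z extents are unchanged, every extra threadgroup in x repeats work that the corrected grid does once.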
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 1191 | 1191 | |
| 1192 | 1192 | |
| 1193 | 1193 | |
| | 1194 | + |
| | 1195 | + |
| 1194 | 1196 | |
| 1195 | 1197 | |
| 1196 | 1198 | |

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 1717 | 1717 | |
| 1718 | 1718 | |
| 1719 | 1719 | |
| 1720 | | - |
| | 1720 | + |
| 1721 | 1721 | |
| 1722 | 1722 | |
| 1723 | 1723 | |

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 3149 | 3149 | |
| 3150 | 3150 | |
| 3151 | 3151 | |
| 3152 | | - |
| 3153 | | - |
| 3154 | | - |
| 3155 | 3152 | |
| 3156 | 3153 | |
| | 3154 | + |
| | 3155 | + |
| | 3156 | + |
| | 3157 | + |
| | 3158 | + |
| 3157 | 3159 | |
| 3158 | 3160 | |
| 3159 | 3161 | |