Commit f22e3a9
webgpu: Optimize DP4A SmallM MatMulNBits tiling (#27910)
This pull request adjusts the tiling strategy for small matrix sizes in
the DP4A matmul kernel. The changes are aimed at improving performance
and compatibility, especially for specific GPU vendors.
On Qualcomm, this improves token generation from ~20 tps to ~25 tps.
1 parent 048e7dc commit f22e3a9
2 files changed
Lines changed: 3 additions & 7 deletions
Lines changed: 2 additions & 6 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 128 | 128 | |
| 129 | 129 | |
| 130 | 130 | |
| 131 | | - |
| 132 | | - |
| | 131 | + |
| | 132 | + |
| 133 | 133 | |
| 134 | | - |
| 135 | | - |
| 136 | | - |
| 137 | | - |
| 138 | 134 | |
| 139 | 135 | |
| 140 | 136 | |

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 230 | 230 | |
| 231 | 231 | |
| 232 | 232 | |
| 233 | | - |
| | 233 | + |
| 234 | 234 | |
| 235 | 235 | |
| 236 | 236 | |