Commit 49a7564
ggml webgpu: fix workgroup dispatch limit for large batch sizes (ggml-org#19965)
* ggml-webgpu: fix workgroup dispatch limit for large batch sizes
WebGPU limits a dispatch to 65535 workgroups per dimension. Large MUL_MAT
operations with batch sizes exceeding this limit would fail.
* add compute_2d_workgroups() helper to split total workgroup ID across
X/Y dimensions
* update mul_mat_reg_tile.wgsl to reconstruct linear workgroup ID from 2D
dispatch
* update mul_mat_subgroup_matrix.wgsl to reconstruct linear workgroup ID
from 2D dispatch
* update mul_mat.wgsl to compute global index from 2D workgroup
coordinates
* refactor all three mul_mat dispatch paths to use the shared helper
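A minimal sketch of how such a split could work, assuming the helper only needs the total workgroup count and the per-dimension limit. The name `split_workgroups_2d`, the `Dispatch2D` struct, and the signature are hypothetical and are not taken from the commit's compute_2d_workgroups():

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: workgroup counts for the X and Y dispatch dimensions.
struct Dispatch2D {
    uint32_t x;
    uint32_t y;
};

// Ceiling division, done in 64 bits so a + b - 1 cannot overflow.
static uint32_t ceil_div_u32(uint32_t a, uint32_t b) {
    return static_cast<uint32_t>((static_cast<uint64_t>(a) + b - 1) / b);
}

// Illustrative stand-in for a 2D split helper (assumed behavior, not the
// commit's code): pick the smallest Y that keeps X within the limit, then
// size X to cover the total. x * y may exceed total slightly.
static Dispatch2D split_workgroups_2d(uint32_t total, uint32_t max_per_dim) {
    assert(max_per_dim > 0);
    Dispatch2D d;
    d.y = ceil_div_u32(total, max_per_dim);
    if (d.y == 0) {
        d.y = 1;  // keep a valid grid shape when total == 0
    }
    d.x = ceil_div_u32(total, d.y);
    return d;
}
```

With a total of 131072 workgroups and a limit of 65535, this sketch yields a 43691 x 3 grid, which is 131073 workgroups: one more than requested.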
* ggml-webgpu: add bounds checking for over-dispatched workgroups
2D workgroup dispatch can over-dispatch when total workgroups don't
divide evenly into the 65535 per-dimension limit. Extra workgroups
would compute invalid batch indices, causing memory corruption.
* add batch_idx bound check to mul_mat_reg_tile.wgsl and
mul_mat_subgroup_matrix.wgsl to prevent over-dispatched workgroups
from accessing invalid memory
* fixes test failures with large batch sizes (e.g., bs=[128, 1024])
* ggml-webgpu: add back TODO for splitting large sizes into batches
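The effect of the guard can be illustrated with a CPU-side simulation. This is a sketch under stated assumptions: the linear ID is rebuilt as `y * grid_x + x`, and the batch index is taken as `linear / tiles_per_batch`. The function name and the decomposition are hypothetical; the shaders' actual index math is not shown in this commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simulates a grid_x by grid_y dispatch on the CPU and returns, for every
// valid linear workgroup ID, how many times it was executed.
// Assumed layout (not confirmed by the source): each batch owns
// tiles_per_batch consecutive linear IDs.
static std::vector<uint32_t> simulate_guarded_dispatch(uint32_t grid_x,
                                                       uint32_t grid_y,
                                                       uint32_t tiles_per_batch,
                                                       uint32_t num_batches) {
    assert(tiles_per_batch > 0);
    const uint64_t total = static_cast<uint64_t>(tiles_per_batch) * num_batches;
    std::vector<uint32_t> hits(total, 0);
    for (uint32_t y = 0; y < grid_y; ++y) {
        for (uint32_t x = 0; x < grid_x; ++x) {
            // Rebuild the linear workgroup ID from the 2D coordinates.
            const uint64_t linear = static_cast<uint64_t>(y) * grid_x + x;
            const uint64_t batch_idx = linear / tiles_per_batch;
            // Guard: over-dispatched workgroups exit before touching memory.
            if (batch_idx >= num_batches) {
                continue;
            }
            hits[linear] += 1;
        }
    }
    return hits;
}
```

For 5 batches of 4 tiles (20 workgroups) dispatched as a 7 x 3 grid, 21 workgroups launch; the guard drops the extra one and each valid ID runs exactly once. Without the guard, the extra workgroup would index past the end of `hits`, which mirrors the out-of-bounds access described above.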
* Optimize 2d workgroup provisioning
* Set some parameters that increase speed
---------
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
1 parent 4d828bd commit 49a7564
4 files changed
Lines changed: 49 additions & 20 deletions
File tree
- ggml/src/ggml-webgpu
- wgsl-shaders
Lines changed: 11 additions & 3 deletions