Commit b02ff66
V8 grouped scalar GEMV + inline work distribution for grouped MMA

Grouped scalar GEMV: ported V8 optimizations (64 threads, 2 warps,
vectorized int4 A loads, __launch_bounds__, M_VAL dispatch 1-4) and
switched from tiled/E4M4 to flat layout with float32 absmax.
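
As an illustration of what "flat layout with float32 absmax" can mean, here is a minimal host-side sketch in Python. It is not the kernel code from this commit. The block size of 32, the symmetric uniform 4-bit codebook, and the helper names quantize_flat, dequantize_flat and to_float32 are all assumptions made for the example. The E4M4 scale encoding mentioned above is not modeled.

```python
import struct

BLOCK = 32  # hypothetical block size; the commit message does not state one


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def quantize_flat(values, bits=4):
    """Quantize a flat list block by block, keeping one float32 absmax per block."""
    levels = (1 << (bits - 1)) - 1  # 7 for a symmetric 4-bit code (assumed)
    codes, absmax = [], []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        # An all-zero block falls back to a scale of 1.0 to avoid dividing by zero.
        scale = to_float32(max(abs(v) for v in block)) or 1.0
        absmax.append(scale)
        codes.extend(round(v / scale * levels) for v in block)
    return codes, absmax


def dequantize_flat(codes, absmax, bits=4):
    """In a flat layout the scale of element i is simply absmax[i // BLOCK]."""
    levels = (1 << (bits - 1)) - 1
    return [c / levels * absmax[i // BLOCK] for i, c in enumerate(codes)]


if __name__ == "__main__":
    data = [0.25 * i - 8.0 for i in range(64)]
    codes, scales = quantize_flat(data)
    print(scales)
    print(dequantize_flat(codes, scales)[:4])
```

The point of the sketch is the indexing: with a flat layout, the scale lookup is a single integer division on the element index, with no tile coordinates involved.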

Grouped MMA: replaced cudaMemcpy/cudaMalloc/cudaFree work_offsets
computation with inline linear scan over expert_offsets. Caller now
passes max_M directly, eliminating device-to-host sync per call.
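
The inline scan described above can be modeled on the host as follows. This is a sketch under assumptions, not the kernel from this commit: it assumes expert_offsets is a prefix-sum array of row counts with num_experts + 1 entries, models only the M dimension, and uses hypothetical names (locate_work_item, grid_upper_bound, tile_m). Sizing the grid as num_experts * ceil(max_M / tile_m) is one plausible use of max_M, not a confirmed detail.

```python
def locate_work_item(work_id, expert_offsets, tile_m):
    """Map a flat work-item id to (expert, m_tile) by scanning expert_offsets.

    Returns None when work_id lies past the last real tile, which is how a
    block launched from an over-sized grid would learn it has nothing to do.
    """
    remaining = work_id
    for expert in range(len(expert_offsets) - 1):
        rows = expert_offsets[expert + 1] - expert_offsets[expert]
        n_tiles = (rows + tile_m - 1) // tile_m  # ceil(rows / tile_m)
        if remaining < n_tiles:
            return expert, remaining
        remaining -= n_tiles
    return None


def grid_upper_bound(num_experts, max_m, tile_m):
    """Work-item count computable on the host without reading expert_offsets."""
    return num_experts * ((max_m + tile_m - 1) // tile_m)


if __name__ == "__main__":
    offsets = [0, 5, 5, 37]  # experts own 5, 0 and 32 rows respectively
    for work_id in range(grid_upper_bound(3, max_m=32, tile_m=16)):
        print(work_id, locate_work_item(work_id, offsets, tile_m=16))
```

Because every work item recomputes its own position from expert_offsets, no precomputed work_offsets buffer is needed, and the host never has to copy expert_offsets back to size the launch. The cost is a scan of at most num_experts entries per work item, plus some idle work items when experts are unevenly loaded.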

Benchmark suite: added grouped_mma kernel type to ncu_driver and
bench_ncu.sh. model_summary.py now shows 5 kernel columns (MMA,
Scalar, Grouped, Grp MMA, fp16) with per-k TOTAL rows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 3cbaf74 commit b02ff66
File tree
9 files changed (+299, -183 lines changed)
- benchmarks
- bitsandbytes
- backends/cuda
- csrc
- tests