Commit bfe0916
fix: Remap CuTE M-index from interleaved to sequential row order
The CuTE ALayout for the m16n8k64 MMA uses column-major indexing, which
interleaves rows: [0,8], [1,9], [2,10], ... The SM80_16x8_Row output
layout, however, expects consecutive row pairs: [0,1], [2,3], ...
Diagnostic showed: A row 0 → D[0], A row 8 → D[1], A row 1 → D[2],
which means the MMA maps CuTE m-indices 0,8,1,9,... to output rows
0,1,2,3,...
Fix: Remap the row index when loading A data and SFA scales:
actual_m = (cute_m % 8) * 2 + cute_m / 8   (integer division)
This ensures A row i goes to output row i. B and SFB are unaffected
(no interleaving issue for the N dimension).
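The remap above can be checked in isolation on the host. The following is a minimal sketch, not the code from this commit: the names `remap_m`, `cute_m_for_output_row`, `remap_restores_row_order`, and `kTileM` are hypothetical, and the inverse mapping is only a model of the diagnostic output quoted in the message (CuTE m-indices 0,8,1,9,... landing in output rows 0,1,2,3,...).

```cpp
#include <cassert>

// Assumed M extent of the m16n8k64 tile (16 rows).
constexpr int kTileM = 16;

// Row of A (and of SFA) to load for a given CuTE m-index.
// This is the formula from the commit message; "/" is integer division.
constexpr int remap_m(int cute_m) {
    return (cute_m % 8) * 2 + cute_m / 8;
}

// Model of the diagnostic: the CuTE m-index whose data ends up in
// output row d (0 -> 0, 1 -> 8, 2 -> 1, 3 -> 9, ...).
constexpr int cute_m_for_output_row(int d) {
    return (d % 2) * 8 + d / 2;
}

// Spot checks against the values quoted in the commit message.
static_assert(remap_m(0) == 0, "CuTE m-index 0 loads A row 0");
static_assert(remap_m(8) == 1, "CuTE m-index 8 loads A row 1");
static_assert(remap_m(1) == 2, "CuTE m-index 1 loads A row 2");
static_assert(remap_m(9) == 3, "CuTE m-index 9 loads A row 3");

// With the remap applied, output row d should receive A row d
// for every row of the tile.
inline bool remap_restores_row_order() {
    for (int d = 0; d < kTileM; ++d) {
        if (remap_m(cute_m_for_output_row(d)) != d) {
            return false;
        }
    }
    return true;
}
```

Under this model the remap is the exact inverse of the interleaving, so `remap_restores_row_order()` returns true for all 16 rows. The same index would be used for the SFA scale load, since the message states that both loads receive the remap.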
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent e749d15 commit bfe0916
1 file changed, 12 additions and 3 deletions.
[Diff body not captured; only the line numbers survive.]
Hunk 1 (original lines 173-178): 5 lines added, at new lines 176-180.
Hunk 2 (original lines 183-190): original line 186 replaced by new line 191; new lines 193-194 added.
Hunk 3 (original lines 246-253): original lines 249-250 replaced by new lines 256-259.