Optimize MMA kernel for small M: TILE_N=64 + multi-block-per-SM k_splits #2208

Job Run time
7m 29s
38s
5m 42s
16s
11s
14s
1m 36s
3m 54s
5m 15s
5m 18s
5m 7s
5m 35s
2m 24s
2m 18s
2m 25s
2m 4s
2m 13s
2m 15s
2m 22s
2m 28s
2m 46s
2m 19s
2m 29s
2m 37s
5m 48s
2m 37s
2m 49s
5m 50s
2m 50s
3m 2s
5m 49s
2m 47s
2m 22s
3m 11s
2m 42s
6m 2s
2m 40s
2m 17s
5m 27s
5m 3s
6m 43s
5m 53s
7m 2s
6m 18s
6m 22s
0s
0s
0s
0s
Total: 2h 43m 29s