Optimize MMA kernel for small M: TILE_N=64 + multi-block-per-SM k_splits #2208
| Job | Run time |
|---|---|
| | 7m 29s |
| | 38s |
| | 5m 42s |
| | 16s |
| | 11s |
| | 14s |
| | 1m 36s |
| | 3m 54s |
| | 5m 15s |
| | 5m 18s |
| | 5m 7s |
| | 5m 35s |
| | 2m 24s |
| | 2m 18s |
| | 2m 25s |
| | 2m 4s |
| | 2m 13s |
| | 2m 15s |
| | 2m 22s |
| | 2m 28s |
| | 2m 46s |
| | 2m 19s |
| | 2m 29s |
| | 2m 37s |
| | 5m 48s |
| | 2m 37s |
| | 2m 49s |
| | 5m 50s |
| | 2m 50s |
| | 3m 2s |
| | 5m 49s |
| | 2m 47s |
| | 2m 22s |
| | 3m 11s |
| | 2m 42s |
| | 6m 2s |
| | 2m 40s |
| | 2m 17s |
| | 5m 27s |
| | 5m 3s |
| | 6m 43s |
| | 5m 53s |
| | 7m 2s |
| | 6m 18s |
| | 6m 22s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| Total | 2h 43m 29s |