Add k-bit blockwise quantization (K=2-5) with warp-level CUDA kernels #2159
| Job | Run time |
|---|---|
| | 1m 33s |
| | 15s |
| | 5m 47s |
| | 22s |
| | 1m 12s |
| | 14s |
| | 4m 30s |
| | 4m 53s |
| | 4m 34s |
| | 4m 59s |
| | 4m 19s |
| | 5m 51s |
| | 4m 40s |
| | 4m 6s |
| | 4m 22s |
| | 4m 11s |
| | 4m 27s |
| | 4m 26s |
| | 4m 23s |
| | 5m 49s |
| | 5m 49s |
| | 3m 19s |
| | 4m 45s |
| | 3m 40s |
| | 3m 24s |
| | 5m 50s |
| | 4m 39s |
| | 5m 11s |
| | 5m 49s |
| | 4m 21s |
| | 3m 30s |
| | 5m 48s |
| | 5m 49s |
| | 4m 17s |
| | 4m 56s |
| | 3m 19s |
| | 4m 33s |
| | 4m 4s |
| | 5m 49s |
| | 3m 42s |
| | 4m 25s |
| | 3m 23s |
| | 5m 44s |
| | 5m 49s |
| | 4m 53s |
| | 1s |
| | 1s |
| | 1s |
| | 1s |
| Total | 3h 11m 45s |