Add k-bit blockwise quantization (K=2-5) with warp-level CUDA kernels #2283
| Job | Run time |
|---|---|
| | 1m 41s |
| | 6m 23s |
| | 13s |
| | 13s |
| | 1m 7s |
| | 5m 49s |
| | 21s |
| | 4m 40s |
| | 6m 6s |
| | 5m 2s |
| | 5m 8s |
| | 5m 13s |
| | 4m 37s |
| | 4m 38s |
| | 4m 43s |
| | 3m 24s |
| | 4m 25s |
| | 4m 30s |
| | 5m 3s |
| | 5m 58s |
| | 4m 22s |
| | 3m 35s |
| | 5m 13s |
| | 4m 15s |
| | 4m 32s |
| | 4m 47s |
| | 3m 31s |
| | 3m 16s |
| | 4m 49s |
| | 3m 42s |
| | 3m 22s |
| | 6m 35s |
| | 3m 29s |
| | 4m 31s |
| | 4m 56s |
| | 6m 29s |
| | 5m 47s |
| | 6m 22s |
| | 3m 33s |
| | 6m 24s |
| | 6m 36s |
| | 5m 55s |
| | 7m 30s |
| | 5m 46s |
| | 5m 26s |
| | 35s |
| | 46s |
| | 41s |
| | 31s |
| | 1s |
| | 1s |
| | 1s |
| Total | 3h 26m 33s |