Add CUDA kernel support for 4-bit quantization with blocksize=32 (#1854) #2152
| Job | Run time |
|---|---|
| | 2m 11s |
| | 8m 30s |
| | 4m 32s |
| | 20s |
| | 7m 36s |
| | 46s |
| | 6m 8s |
| | 8m 12s |
| | 5m 50s |
| | 13s |
| | 5m 11s |
| | 4m 35s |
| | 3m 45s |
| | 5m 16s |
| | 4m 22s |
| | 4m 51s |
| | 3m 28s |
| | 4m 11s |
| | 4m 51s |
| | 3m 29s |
| | 5m 19s |
| | 4m 25s |
| | 14s |
| | 4m 0s |
| | 3m 39s |
| | 2m 12s |
| | 3m 52s |
| | 5m 58s |
| | 3m 37s |
| | 5m 59s |
| | 4m 52s |
| | 7m 5s |
| | 5m 51s |
| | 6m 4s |
| | 4m 59s |
| | 4m 17s |
| | 6m 27s |
| | 4m 14s |
| | 6m 20s |
| | 6m 14s |
| | 4m 33s |
| | 5m 16s |
| | 6m 53s |
| | 5m 54s |
| | 4m 53s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| Total | 3h 31m 24s |