Currently we support warp-based parallel scan on the Vulkan and CUDA backends. Let's use this issue to track performance data:
ENV: RTX 3080, driver 510, CUDA 11.6.

| Number of elements | Vulkan | CUDA |
| --- | --- | --- |
| 131072 | 0.348 ms | 0.160 ms |
| 65536 | 0.308 ms | 0.111 ms |
| 32768 | 0.311 ms | 0.114 ms |
| 16384 | 0.232 ms | 0.082 ms |
| 8192 | 0.222 ms | 0.075 ms |
| 4096 | 0.183 ms | 0.075 ms |
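
For readers unfamiliar with the term, here is a rough CPU-side sketch of what a warp-based inclusive scan computes. It is only an illustration, not the implementation measured above: the names `WARP_SIZE`, `warp_inclusive_scan`, and `block_inclusive_scan` are hypothetical, a warp width of 32 is assumed, and the list shifting merely emulates what a shuffle-up intrinsic would do on the GPU.

```python
WARP_SIZE = 32  # assumed warp width; hypothetical constant for this sketch


def warp_inclusive_scan(lanes):
    """Simulate an inclusive prefix sum across the lanes of one warp."""
    assert len(lanes) == WARP_SIZE
    vals = list(lanes)
    offset = 1
    while offset < WARP_SIZE:
        # Emulate a shuffle-up: lane i reads the value held by lane i - offset.
        # Lanes with i < offset have no source lane and add nothing.
        shifted = [vals[i - offset] if i >= offset else 0 for i in range(WARP_SIZE)]
        vals = [v + s for v, s in zip(vals, shifted)]
        offset *= 2
    return vals


def block_inclusive_scan(data):
    """Scan a whole array warp by warp, carrying the running total forward."""
    out = []
    carry = 0
    for start in range(0, len(data), WARP_SIZE):
        chunk = list(data[start:start + WARP_SIZE])
        pad = WARP_SIZE - len(chunk)
        # Pad the last partial warp with zeros, then drop the padded lanes.
        scanned = warp_inclusive_scan(chunk + [0] * pad)[:len(chunk)]
        out.extend(v + carry for v in scanned)
        carry = out[-1]
    return out
```

Each warp finishes in log2(32) = 5 shuffle steps. The sequential `carry` loop is a simplification for the sketch; on a GPU the per-warp totals would themselves be combined in parallel.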