Commit fb83cc9
CUDA: Fix ssm_scan_f32 data-races (ggml-org#24360)
* Add missing __syncthreads() before reusing cub_temp_storage
__syncthreads() is required before the TempStorage smem can be
reused:
https://nvidia.github.io/cccl/unstable/cub/api/classcub_1_1BlockLoad.html#_CPPv4I0EN3cub9BlockLoad4LoadEv20RandomAccessIteratorRA14ItemsPerThread_1Ti
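The rule in the linked CUB documentation can be illustrated with a minimal sketch. This is not the ssm_scan_f32 kernel. It shows a block that loops over tiles and repurposes one shared `TempStorage` union for `cub::BlockLoad` and `cub::BlockStore`, with a barrier after each collective so that no thread overwrites bytes another thread is still reading. The names `scale_tiles`, `run_scale`, `SharedStorage`, and the tile sizes are illustrative assumptions. The sketch also assumes the input length is a multiple of `TILE`.

```cuda
#include <cuda_runtime.h>
#include <cub/block/block_load.cuh>
#include <cub/block/block_store.cuh>

constexpr int BLOCK_THREADS    = 128;
constexpr int ITEMS_PER_THREAD = 4;
constexpr int TILE             = BLOCK_THREADS * ITEMS_PER_THREAD;

// The TRANSPOSE variants stage data through shared memory (TempStorage).
using BlockLoadT  = cub::BlockLoad<float, BLOCK_THREADS, ITEMS_PER_THREAD, cub::BLOCK_LOAD_TRANSPOSE>;
using BlockStoreT = cub::BlockStore<float, BLOCK_THREADS, ITEMS_PER_THREAD, cub::BLOCK_STORE_TRANSPOSE>;

// Hypothetical: one shared-memory allocation repurposed for both collectives.
union SharedStorage {
    BlockLoadT::TempStorage  load;
    BlockStoreT::TempStorage store;
};

// Hypothetical kernel: out = alpha * in, one TILE per loop iteration.
__global__ void scale_tiles(const float* in, float* out, int num_tiles, float alpha) {
    __shared__ SharedStorage temp_storage;
    // Loop bounds depend only on blockIdx/gridDim, so every thread in the
    // block reaches the same barriers.
    for (int tile = blockIdx.x; tile < num_tiles; tile += gridDim.x) {
        float items[ITEMS_PER_THREAD];
        BlockLoadT(temp_storage.load).Load(in + tile * TILE, items);
        // Other threads may still be reading temp_storage inside Load();
        // wait before BlockStore writes to the same bytes.
        __syncthreads();
        for (int i = 0; i < ITEMS_PER_THREAD; ++i) items[i] *= alpha;
        BlockStoreT(temp_storage.store).Store(out + tile * TILE, items);
        // Same hazard between this Store() and the next iteration's Load().
        __syncthreads();
    }
}

// Hypothetical host helper; assumes the input holds num_tiles * TILE floats.
bool run_scale(const float* h_in, float* h_out, int num_tiles, float alpha) {
    const size_t bytes = size_t(num_tiles) * TILE * sizeof(float);
    float* d_in  = nullptr;
    float* d_out = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // Two blocks, so a block reuses temp_storage across several tiles.
        scale_tiles<<<2, BLOCK_THREADS>>>(d_in, d_out, num_tiles, alpha);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

With three tiles and two blocks, block 0 processes tiles 0 and 2. Without the trailing barrier, its second `Load()` could overwrite `temp_storage` while slower threads are still inside the first `Store()`.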
* Add one more missing __syncthreads()
Double-buffering would also work, but the alternative is simply to ensure
that all threads have read smem* before it is written again in the next
loop iteration
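The second fix concerns a shared buffer that every thread writes and then reads on each loop iteration. The toy kernels here are hypothetical and are not the ssm_scan_f32 code. Each runs as one block of `N` threads, and `step_sums_single`, `step_sums_double`, and `run_steps` are illustrative names. They contrast the approach the commit message describes, a second `__syncthreads()` after the reads, with the double-buffering alternative it mentions. In the double-buffered version, alternating between two buffers lets a single barrier per iteration suffice.

```cuda
#include <cuda_runtime.h>

constexpr int N = 128;  // threads per block == values staged per step

// Single buffer: two barriers per iteration.
__global__ void step_sums_single(const float* x, float* out, int steps) {
    __shared__ float smem[N];
    float acc = 0.0f;
    for (int t = 0; t < steps; ++t) {
        smem[threadIdx.x] = x[t * N + threadIdx.x];
        __syncthreads();  // all writes visible before any thread reads
        for (int k = 0; k < N; ++k) acc += smem[k];
        __syncthreads();  // all reads finished before step t + 1 overwrites smem
    }
    out[threadIdx.x] = acc;
}

// Double buffer: step t uses smem[t % 2], so one barrier per iteration suffices.
__global__ void step_sums_double(const float* x, float* out, int steps) {
    __shared__ float smem[2][N];
    float acc = 0.0f;
    for (int t = 0; t < steps; ++t) {
        float* buf = smem[t & 1];
        // Safe to write: every thread reached step t - 1's barrier only after
        // finishing its reads of this same buffer in step t - 2.
        buf[threadIdx.x] = x[t * N + threadIdx.x];
        __syncthreads();  // all writes visible before any thread reads
        for (int k = 0; k < N; ++k) acc += buf[k];
    }
    out[threadIdx.x] = acc;
}

// Hypothetical host helper: one block of N threads over steps * N inputs.
bool run_steps(bool double_buffered, const float* h_x, float* h_out, int steps) {
    const size_t in_bytes  = size_t(steps) * N * sizeof(float);
    const size_t out_bytes = N * sizeof(float);
    float* d_x   = nullptr;
    float* d_out = nullptr;
    bool ok = cudaMalloc(&d_x, in_bytes) == cudaSuccess &&
              cudaMalloc(&d_out, out_bytes) == cudaSuccess &&
              cudaMemcpy(d_x, h_x, in_bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        if (double_buffered) step_sums_double<<<1, N>>>(d_x, d_out, steps);
        else                 step_sums_single<<<1, N>>>(d_x, d_out, steps);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_out, d_out, out_bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_x);
    cudaFree(d_out);
    return ok;
}
```

Double-buffering spends an extra `N` floats of shared memory to save one barrier per iteration. The single-buffer version keeps the shared-memory footprint unchanged, which matches the option the commit message chose.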
* Remove unused smem from ssm_scan_f32
1 parent 039e20a
1 file changed
Lines changed: 3 additions & 2 deletions
| Hunk | Original file lines | New file lines | Lines added (new numbering) | Lines removed (original numbering) |
|---|---|---|---|---|
| `@@ -67,6 +67,7 @@` | 67-72 | 67-73 | 70 | none |
| `@@ -105,6 +106,7 @@` | 105-110 | 106-112 | 109 | none |
| `@@ -249,9 +251,8 @@` | 249-257 | 251-258 | 255 | 252, 254 |
| |||