Commit a83c73a
[CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full (ggml-org#19042)
* [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full
With pipeline parallelism, the CPU-side CUDA command buffer fills up during prompt processing and stalls the CPU. As a result, not enough work is submitted to the GPU, which leaves bubbles in the GPU timeline.
Fix this by setting the CUDA environment variable CUDA_SCALE_LAUNCH_QUEUES to 4x, which increases the command buffer size.
* Set the env variable in the CUDA backend registry allocation
* Add link to PR in code comment
* Remove warning logs and update documentation
1 parent fc3cdf3 commit a83c73a
2 files changed
Lines changed: 18 additions & 0 deletions
| File (name not preserved) | Lines added | New line numbers | Inserted after original line |
|---|---|---|---|
| 1 | 8 | 251-258 | 250 |
| 2 | 10 | 4879-4888 | 4878 |