Commit 25c1aa5
Support multiple GPUs: one pipeline per GPU, --jobs applied per GPU
Previously every runner bound device 0, so --jobs>1 just piled onto one
GPU and extra GPUs sat idle. Now:
- cuda::device_count() enumerates GPUs; Gpu::load_first takes a device
ordinal and binds it on the calling thread.
- run_worker builds one independent pipeline (prefetch -> N runners ->
N uploaders) per selected GPU, so --jobs is the concurrency *per GPU*
(2 GPUs, --jobs 3 => 6 concurrent runs).
- New --gpus selector (e.g. "0,2"); default is every detected GPU.
- A shared per-content-hash download lock (Downloads) coordinates the
now-multiple prefetch threads so two GPUs fetching the same job's blobs
don't race on the same file in the shared cache; the existing shared
in-flight set already dedupes a fragment handed to two GPUs.
- Per-fragment "running" log and startup banner now name the GPU.
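The per-content-hash download lock in the list above could be sketched roughly as follows. This is a minimal illustration, not the commit's code: only the `Downloads` name comes from the message. `lock_for`, `simulate_prefetch`, and the in-memory `cache` set (standing in for files in the shared on-disk cache) are hypothetical.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the commit's `Downloads`: one mutex per
/// content hash, handed out from a shared map.
#[derive(Default)]
pub struct Downloads {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl Downloads {
    /// Returns the lock guarding `hash`, creating it on first use.
    pub fn lock_for(&self, hash: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(hash.to_owned()).or_default().clone()
    }
}

/// Simulates `threads` prefetch threads (one per GPU pipeline) that all want
/// the same blob, and returns how many of them actually performed the fetch.
pub fn simulate_prefetch(threads: usize, hash: &str) -> usize {
    let downloads = Arc::new(Downloads::default());
    // Assumption: an in-memory set stands in for the shared cache directory.
    let cache: Arc<Mutex<HashSet<String>>> = Arc::default();
    let fetches = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let downloads = Arc::clone(&downloads);
            let cache = Arc::clone(&cache);
            let fetches = Arc::clone(&fetches);
            let hash = hash.to_owned();
            thread::spawn(move || {
                // Hold the per-hash lock across the check and the fetch, so
                // only one thread ever writes a given blob.
                let lock = downloads.lock_for(&hash);
                let _guard = lock.lock().unwrap();
                let cached = cache.lock().unwrap().contains(&hash);
                if !cached {
                    fetches.fetch_add(1, Ordering::SeqCst);
                    cache.lock().unwrap().insert(hash.clone());
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    fetches.load(Ordering::SeqCst)
}
```

Locking per hash rather than globally means prefetch threads fetching different blobs do not wait on each other; only threads that want the same file are serialized.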
Unit tests for --gpus parsing/validation. Single-GPU path verified
end-to-end (claim -> run on GPU#0 -> next fragment prefetched under
--jobs 2). Bump version to 0.1.9.
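For illustration, the --gpus selector and the per-GPU meaning of --jobs might look like the sketch below. The names `parse_gpus` and `total_concurrency` are hypothetical, and the validation rules (reject out-of-range and duplicate ordinals) are assumptions about what "parsing/validation" covers, not taken from the diff.

```rust
/// Hypothetical helper: resolve a `--gpus` value against the number of
/// detected devices. `None` means the flag was omitted, which selects every
/// detected GPU (the default described in the commit message).
pub fn parse_gpus(arg: Option<&str>, detected: usize) -> Result<Vec<usize>, String> {
    let Some(spec) = arg else {
        return Ok((0..detected).collect());
    };

    let mut selected = Vec::new();
    for part in spec.split(',') {
        let part = part.trim();
        let ordinal: usize = part
            .parse()
            .map_err(|_| format!("invalid GPU ordinal {part:?}"))?;
        // Assumed rule: the ordinal must name a detected device.
        if ordinal >= detected {
            return Err(format!("GPU {ordinal} not found ({detected} detected)"));
        }
        // Assumed rule: listing a GPU twice is an error, not a no-op.
        if selected.contains(&ordinal) {
            return Err(format!("GPU {ordinal} listed more than once"));
        }
        selected.push(ordinal);
    }
    Ok(selected)
}

/// `--jobs` applies per GPU, so total concurrency is GPUs times jobs.
pub fn total_concurrency(gpus: &[usize], jobs: usize) -> usize {
    gpus.len() * jobs
}
```

With three detected devices, `parse_gpus(Some("0,2"), 3)` selects ordinals 0 and 2, and `total_concurrency(&[0, 2], 3)` gives the 6 concurrent runs from the example in the message.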
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 1f38213 commit 25c1aa5
4 files changed
Lines changed: 178 additions & 50 deletions