Commit 2b3ea9e
feat(training): Optimize DataLoader for performance
Implements several standard PyTorch optimizations to address a severe performance bottleneck in the training script.
- Enables cuDNN autotuning (`benchmark=True`, `deterministic=False`) for faster GPU kernel selection.
- Configures the DataLoader for high-performance GPU training by:
  - Setting `num_workers` to a reasonable maximum.
  - Enabling `pin_memory` for faster CPU-to-GPU data transfers.
  - Using `persistent_workers` to avoid worker respawn overhead between epochs.
- Adds the missing `os` import to the training script.
- Cleans up the debug configuration to be non-destructive and avoid conflicting settings.
1 parent c4f28e5 commit 2b3ea9e
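The settings listed in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: the helper names `build_loader_kwargs` and `enable_cudnn_autotune` and the worker cap of 8 are hypothetical, and `os.cpu_count()` is only a guess at why the `os` import was needed.

```python
import os


def build_loader_kwargs(cpu_count=None, max_workers=8, use_cuda=True):
    """Build keyword arguments for torch.utils.data.DataLoader.

    max_workers=8 is an illustrative cap, not the value used in the commit.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    num_workers = max(0, min(cpu_count, max_workers))
    return {
        "num_workers": num_workers,
        # Pinned host memory only helps when batches are copied to a GPU.
        "pin_memory": use_cuda,
        # DataLoader raises ValueError for persistent_workers=True
        # when num_workers is 0, so enable it only with real workers.
        "persistent_workers": num_workers > 0,
    }


def enable_cudnn_autotune():
    """Let cuDNN benchmark kernels, trading reproducibility for speed."""
    import torch  # imported lazily so the helper above runs without PyTorch

    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = False
```

A training script might then call `enable_cudnn_autotune()` once at startup and construct the loader as `DataLoader(dataset, batch_size=..., **build_loader_kwargs(use_cuda=torch.cuda.is_available()))`. Autotuning tends to pay off when input shapes stay the same across batches; with highly variable shapes the repeated benchmarking can cost more than it saves.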
2 files changed
Lines changed: 13 additions & 12 deletions