Commit fca6ead
make sure we initialize accelerator before model (#1132)
We need to initialize the `Accelerator` object before creating TE layers;
otherwise, they all end up on a single device.
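
The ordering issue can be illustrated with a toy model. This is only a sketch under two assumptions not stated in the commit: that distributed setup binds each process to its own device, and that a layer allocates on whichever device is current when it is constructed. `ToyProcess`, `init_accelerator`, `create_layer`, and `devices_used` are hypothetical stand-ins, not the real Accelerate or Transformer Engine APIs.

```python
# Toy model only: hypothetical stand-ins, not the real Accelerate or
# Transformer Engine APIs.


class ToyProcess:
    """One simulated rank with its own notion of a 'current device'."""

    def __init__(self, local_rank: int) -> None:
        self.local_rank = local_rank
        # Assumption: before any distributed setup, every rank points at device 0.
        self.current_device = 0

    def init_accelerator(self) -> None:
        # Stand-in for distributed setup: bind this rank to its own device.
        self.current_device = self.local_rank

    def create_layer(self) -> int:
        # Stand-in for a layer that allocates on the current device when built.
        return self.current_device


def devices_used(world_size: int, accelerator_first: bool) -> list[int]:
    """Return the device each rank's layer lands on for the given ordering."""
    placements = []
    for rank in range(world_size):
        proc = ToyProcess(rank)
        if accelerator_first:
            proc.init_accelerator()
            placements.append(proc.create_layer())
        else:
            placements.append(proc.create_layer())
            proc.init_accelerator()
    return placements


if __name__ == "__main__":
    print(devices_used(4, accelerator_first=False))  # [0, 0, 0, 0]
    print(devices_used(4, accelerator_first=True))   # [0, 1, 2, 3]
```

In this toy, building the layer first places every rank's layer on device 0, while running the setup first spreads the layers across devices 0 through 3.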
## Summary by CodeRabbit
- New Features
  - Added a ready-to-run performance test preset for the esm2 t48 15B model with sensible defaults: model tag, step cap, batch sizes, learning rate, weight decay, warmup steps, and Weights & Biases logging.
- Bug Fixes
  - Improved multi-GPU initialization by starting distributed state earlier, reducing setup issues and OOM risk without changing training behavior.
---------
Signed-off-by: Peter St. John <pstjohn@nvidia.com>

1 parent 0d30652 commit fca6ead
2 files changed
Lines changed: 24 additions & 2 deletions
Lines changed: 14 additions & 0 deletions
First file (diff content not preserved): lines 1-14 added.

Second file (diff content not preserved): 1 line added at new line 22; 9 lines added at new lines 39-47; 2 lines removed at original lines 60-61.