Commit 3c2e6ce
updating esm2 native recipe (#1078)
* Adds separate `train_ddp.py`, `train_fsdp2.py`, and `train_nvfsdp.py`
entrypoints
* Adds comparison against FA-2 based HF transformers model
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * Distributed ESM‑2 training entrypoints (DDP, MFSDP/FSDP2) and a shared linear warmup/decay LR scheduler.
* **Configuration**
  * Switched defaults to nvFSDP-style sharding; updated model identifiers and training hyperparameters (train steps, warmup, optimizer LR).
* **Documentation**
  * Added new ESM‑2 training README; removed an outdated README.
* **Build/Chores**
  * Install Transformers from the Git repo; Docker builds can use netrc credentials for installs.
* **Tests**
  * Added extensive single‑ and multi‑GPU training tests; removed obsolete tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
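For readers unfamiliar with the "linear warmup/decay LR scheduler" mentioned in the summary, the general technique can be sketched as a step-to-multiplier function. This is a minimal illustration, not the code added in this commit: the function name `linear_warmup_decay` and the parameters `num_warmup_steps`, `num_training_steps`, and `min_lr_ratio` are hypothetical.

```python
def linear_warmup_decay(step, num_warmup_steps, num_training_steps, min_lr_ratio=0.0):
    """Return a multiplier for the base learning rate at `step`.

    Hypothetical sketch: the multiplier rises linearly from 0 to 1 during
    warmup, then falls linearly toward `min_lr_ratio` by `num_training_steps`.
    """
    # Warmup phase: scale linearly from 0 up to 1.
    if num_warmup_steps > 0 and step < num_warmup_steps:
        return step / num_warmup_steps
    # Past the end of training: hold at the floor.
    if step >= num_training_steps:
        return min_lr_ratio
    # Decay phase: fall linearly from 1, never going below the floor.
    decay_steps = max(1, num_training_steps - num_warmup_steps)
    progress = (step - num_warmup_steps) / decay_steps
    return max(min_lr_ratio, 1.0 - progress)
```

A function of this shape can be bound with `functools.partial` and passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base LR by the returned value at each step. Whether the recipe's shared scheduler is built this way is an assumption.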
---------
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
1 parent 7c18697 commit 3c2e6ce
25 files changed
Lines changed: 596 additions & 259 deletions
File tree
- .devcontainer/recipes
- recipes
- amplify_accelerate_te_fp8
- esm2_native_te_mfsdp
- hydra_config
- esm2_native_te_nvfsdp_thd
- esm2_native_te_nvfsdp
File renamed without changes.
File renamed without changes.
Lines changed: 2 additions & 1 deletion
File renamed without changes.
Lines changed: 6 additions & 4 deletions