|
2 | 2 |
|
3 | 3 | This directory contains self-contained training examples that demonstrate best practices for scaling |
4 | 4 | biological foundation models using [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) |
5 | | -and [nvFSDP](https://github.com/NVIDIA-NeMo/nvFSDP). Each recipe is a complete Docker environment with |
| 5 | +and [megatron-fsdp](https://pypi.org/project/megatron-fsdp/). Each recipe is a complete Docker environment with |
6 | 6 | benchmarked training scripts that users can learn from and adapt for their own research. |
7 | 7 |
|
8 | 8 | ## Philosophy |
@@ -49,7 +49,7 @@ Follow this naming pattern to clearly communicate what your recipe demonstrates: |
49 | 49 |
|
50 | 50 | Examples: |
51 | 51 |
|
52 | | -- `esm2_native_te_nvfsdp/` - ESM-2 with vanilla PyTorch, TransformerEngine, and nvFSDP |
| 52 | +- `esm2_native_te_mfsdp/` - ESM-2 with vanilla PyTorch, TransformerEngine, and megatron-fsdp |
53 | 53 | - `amplify_accelerate_fp8/` - AMPLIFY with HuggingFace Accelerate and FP8 training |
54 | 54 | - `geneformer_lightning_context_parallel/` - Geneformer with PyTorch Lightning and context parallelism |
55 | 55 |
|
@@ -115,16 +115,16 @@ Your `train.py` should be educational and self-explanatory: |
115 | 115 | ```python |
116 | 116 | #!/usr/bin/env python3 |
117 | 117 | """ |
118 | | -ESM-2 training with TransformerEngine and nvFSDP. |
| 118 | +ESM-2 training with TransformerEngine and megatron-fsdp. |
119 | 119 |
|
120 | 120 | This script demonstrates how to: |
121 | 121 | 1. Load and prepare biological sequence data |
122 | 122 | 2. Initialize ESM-2 with TransformerEngine layers |
123 | | -3. Configure nvFSDP for memory-efficient multi-GPU training |
| 123 | +3. Configure megatron-fsdp for memory-efficient multi-GPU training |
124 | 124 | 4. Implement a training loop with proper checkpointing |
125 | 125 |
|
126 | 126 | Key design decisions: |
127 | | -- We use nvFSDP ZeRO-3 for maximum memory efficiency |
| 127 | +- We use megatron-fsdp ZeRO-3 for maximum memory efficiency |
128 | 128 | - TransformerEngine FP8 is enabled for H100+ hardware |
129 | 129 | - Context parallelism handles long biological sequences |
130 | 130 | """ |
@@ -197,7 +197,7 @@ optimizer: |
197 | 197 | # Distributed training |
198 | 198 | distributed: |
199 | 199 | backend: nccl |
200 | | - nvfsdp: |
| 200 | + mfsdp: |
201 | 201 | enable: true |
202 | 202 | sharding_strategy: zero3 |
203 | 203 |
|
@@ -242,7 +242,7 @@ training: |
242 | 242 | num_train_steps: 100 # Enough steps for stable metrics |
243 | 243 |
|
244 | 244 | wandb: |
245 | | - name: "esm2_nvfsdp_benchmark" |
| 245 | + name: "esm2_mfsdp_benchmark" |
246 | 246 | tags: ["L1", "benchmark", "performance"] |
247 | 247 | ``` |
248 | 248 |
|
@@ -411,7 +411,7 @@ docker run --rm -it --gpus all my_recipe pytest -v . |
411 | 411 |
|
412 | 412 | For reference implementations, examine existing recipes: |
413 | 413 |
|
414 | | -- **`esm2_native_te_nvfsdp/`**: Comprehensive example showing vanilla PyTorch with TE and nvFSDP |
| 414 | +- **`esm2_native_te_mfsdp/`**: Comprehensive example showing vanilla PyTorch with TE and megatron-fsdp |
415 | 415 | - **`amplify_accelerate_fp8/`**: HuggingFace Accelerate integration with FP8 training |
416 | 416 | - **`geneformer_lightning_context_parallel/`**: PyTorch Lightning with context parallelism for long sequences |
417 | 417 |
|
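The hunk at old line 200 renames the `distributed.nvfsdp` config key to `distributed.mfsdp`, so config files written against the old name would no longer match. The following is a minimal sketch of one way a recipe could keep accepting the legacy key, assuming the YAML config has already been loaded into plain nested dicts. The helper name `migrate_fsdp_key` is hypothetical and is not part of this change or of megatron-fsdp.

```python
import copy


def migrate_fsdp_key(config: dict) -> dict:
    """Return a copy of `config` with `distributed.nvfsdp` renamed to `distributed.mfsdp`.

    Hypothetical helper for illustration only; the input dict is left unmodified.
    """
    migrated = copy.deepcopy(config)
    distributed = migrated.get("distributed")
    if not isinstance(distributed, dict) or "nvfsdp" not in distributed:
        # Nothing to migrate: either no distributed section or already on the new key.
        return migrated
    if "mfsdp" in distributed:
        # Refuse to guess which block wins when both names are present.
        raise ValueError("Config sets both 'nvfsdp' and 'mfsdp'; keep only 'mfsdp'.")
    distributed["mfsdp"] = distributed.pop("nvfsdp")
    return migrated


# Legacy config mirroring the YAML block shown in the diff above.
legacy = {
    "distributed": {
        "backend": "nccl",
        "nvfsdp": {"enable": True, "sharding_strategy": "zero3"},
    }
}
current = migrate_fsdp_key(legacy)
```

Returning a copy rather than mutating in place keeps the original config available for logging, which can help when tracking down which files still use the old key.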