1 parent 75515f5 commit 6aded53
1 file changed
deepmd/utils/argcheck.py
@@ -3948,6 +3948,7 @@ def training_args(
  "but reduces optimizer memory to 1/N per GPU. "
  "2: FSDP2 stage-2, shards optimizer states and gradients; same communication "
  "volume as stage-1 but further reduces gradient memory to 1/N per GPU. "
+ "Stages 2 and 3 require FSDP2, which is available in PyTorch >= 2.6. "
  "Note: FSDP2 introduces DTensor dispatch overhead that can slow down "
  "models with many small layers; use torch.compile to mitigate. "
  "3: FSDP2 stage-3, shards parameters as well; maximum memory savings but "
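The added docstring line states a version requirement but the hunk does not show whether it is enforced at runtime. As a rough sketch of how such a requirement could be checked early, the helper below parses a PyTorch version string and rejects stages 2 and 3 on versions older than 2.6. The names `supports_fsdp2`, `check_stage`, and `MIN_FSDP2_VERSION` are hypothetical and are not taken from deepmd; only the ">= 2.6" threshold and the stage numbers come from the docstring.

```python
import re

# Hypothetical constant: the minimum PyTorch (major, minor) version that the
# docstring says is needed for FSDP2 (stages 2 and 3).
MIN_FSDP2_VERSION = (2, 6)


def supports_fsdp2(torch_version: str) -> bool:
    """Return True if a version string such as "2.6.0+cu124" is at least 2.6."""
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles cases like 2.10 > 2.6 correctly.
    return (major, minor) >= MIN_FSDP2_VERSION


def check_stage(stage: int, torch_version: str) -> None:
    """Raise early if the requested stage needs FSDP2 but PyTorch is too old."""
    if stage in (2, 3) and not supports_fsdp2(torch_version):
        raise RuntimeError(
            f"stage {stage} requires FSDP2 (PyTorch >= 2.6), "
            f"found PyTorch {torch_version}"
        )
```

In practice the version string would presumably come from `torch.__version__`, for example `check_stage(2, torch.__version__)`. Comparing integer tuples rather than raw strings avoids misordering versions such as 2.10 and 2.6.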