User problem
Currently, FSDP checkpoints are always saved to the local filesystem; see the `save_checkpoint` function or the example.
It would be nice to have an option to save checkpoints with MSC.
Moreover, the current behaviour is error-prone: if `checkpoint.save` is set to an MSC path (S3, for example), some files (such as `trainer_state.pt`) are saved to S3, but the checkpoint itself is saved on the local filesystem.
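Until MSC support exists, one way to make the split-storage problem fail loudly rather than silently is a guard that infers each artifact's storage backend from its URL scheme and raises when they differ. This is a minimal sketch, not existing project code: `storage_backend`, `check_same_backend`, and the set of "remote" schemes are hypothetical assumptions.

```python
from urllib.parse import urlparse

# Hypothetical: URL schemes assumed to be handled by remote (MSC-style) storage.
REMOTE_SCHEMES = {"s3", "gs", "msc", "ais"}


def storage_backend(path: str) -> str:
    """Return 'remote' if the path uses a known remote URL scheme, else 'local'."""
    scheme = urlparse(path).scheme.lower()
    return "remote" if scheme in REMOTE_SCHEMES else "local"


def check_same_backend(paths: dict[str, str]) -> None:
    """Raise ValueError if checkpoint artifacts would land on different backends."""
    backends = {name: storage_backend(p) for name, p in paths.items()}
    if len(set(backends.values())) > 1:
        raise ValueError(f"checkpoint artifacts split across storage backends: {backends}")
```

For example, `check_same_backend({"trainer_state": "s3://bucket/run/trainer_state.pt", "fsdp_checkpoint": "/tmp/run/checkpoint"})` raises, which matches the mismatch described above.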
Desired outcome
Option to save FSDP checkpoints with MSC.
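One possible shape for this option is to route every checkpoint write through a pluggable "opener" instead of calling the built-in `open` directly, so the same save path can target the local filesystem or MSC. This is a hedged sketch: `save_shard`, `local_opener`, and the `Opener` signature are hypothetical, and an MSC-backed opener would wrap whatever file-open API the MSC client actually provides (not shown here).

```python
import os
from typing import IO, Callable

# Hypothetical signature: (path, mode) -> binary file-like object.
Opener = Callable[[str, str], IO[bytes]]


def local_opener(path: str, mode: str) -> IO[bytes]:
    """Open a local file, creating parent directories when writing."""
    parent = os.path.dirname(path)
    if parent and "w" in mode:
        os.makedirs(parent, exist_ok=True)
    return open(path, mode)


def save_shard(path: str, data: bytes, opener: Opener = local_opener) -> None:
    """Write one serialized checkpoint shard through the given opener."""
    with opener(path, "wb") as f:
        f.write(data)
```

Keeping the opener as a parameter means `trainer_state.pt` and the FSDP shards can share one code path, so they cannot silently end up on different backends.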
Alternatives considered
No response
Affected area
area:ckpt
Urgency / use case
Important but not blocking
Extra context
No response