[feature] Support MultiStorageClient (MSC) for FSDP checkpoints #3261

@pavelgein

Description

User problem

Currently, FSDP checkpoints are always saved to the file system; see the save_checkpoint function or the example.
It would be nice to have an option to save checkpoints with MSC.

Moreover, the current behaviour is error-prone: if checkpoint.save is set to an MSC path (an S3 path, for example), some files (such as trainer_state.pt) are saved to S3, but the checkpoint itself is saved on the local filesystem.
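
To illustrate the split-destination problem, here is a minimal sketch of a guard that fails fast when the save path is remote but the checkpoint writer can only write locally. Everything in it is an assumption made for illustration: the names `REMOTE_SCHEMES`, `is_remote_path`, and `check_save_target` are hypothetical, the scheme list is a guess, and none of this is the project's or MSC's actual API.

```python
from urllib.parse import urlparse

# Hypothetical set of URL schemes treated as object-storage targets.
# The real list of schemes handled by MSC may differ.
REMOTE_SCHEMES = {"s3", "gs", "msc"}


def is_remote_path(path: str) -> bool:
    """Return True if `path` carries a URL scheme we treat as remote storage."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def check_save_target(save_path: str, writer_supports_remote: bool) -> str:
    """Raise instead of silently writing the checkpoint to the local file system.

    `writer_supports_remote` is a hypothetical flag describing whether the
    FSDP checkpoint writer in use can write to object storage.
    """
    if is_remote_path(save_path) and not writer_supports_remote:
        raise ValueError(
            f"checkpoint save path {save_path!r} is remote, but the FSDP "
            "checkpoint writer only supports the local file system"
        )
    return save_path
```

With a check like this, `check_save_target("s3://bucket/ckpt", writer_supports_remote=False)` raises a `ValueError` rather than leaving trainer_state.pt on S3 and the checkpoint on local disk. It is only a stopgap; the actual request is for the writer itself to support MSC.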

Desired outcome

An option to save FSDP checkpoints with MSC.

Alternatives considered

No response

Affected area

area:ckpt

Urgency / use case

Important but not blocking

Extra context

No response
