FSDP saving state improvement #3401

@nastya236

Currently, FSDP skips the intermediate reconstruction of the tree when saving state. This results in the following issues:

  • the grouping of parameters might be non-deterministic, so the loaded state might be incorrect
  • the state is tied to the number of groups, and therefore to the communication size, which is the wrong design

The current strategy works deterministically if everything is fixed from run to run, but an improvement is still needed.
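To illustrate the idea, here is a minimal sketch in plain Python of what reconstructing the tree before saving could look like. It is not the actual FSDP code: `group_params`, `state_to_tree`, `tree_to_state`, the parameter paths, and the sort-by-path grouping are all hypothetical and only serve to show that state keyed by parameter path can be loaded under a different grouping.

```python
# Hypothetical sketch: none of these names come from the FSDP implementation.

def group_params(sizes, max_group_size):
    """Greedily pack parameter paths into groups, visiting paths in sorted order."""
    groups, current, total = [], [], 0
    for path in sorted(sizes):  # sorting makes the grouping deterministic
        if current and total + sizes[path] > max_group_size:
            groups.append(current)
            current, total = [], 0
        current.append(path)
        total += sizes[path]
    if current:
        groups.append(current)
    return groups


def state_to_tree(group_state, groups, sizes):
    """Split each group's flat state buffer back into per-parameter entries."""
    tree = {}
    for flat, paths in zip(group_state, groups):
        offset = 0
        for path in paths:
            tree[path] = flat[offset:offset + sizes[path]]
            offset += sizes[path]
    return tree


def tree_to_state(tree, groups):
    """Rebuild flat per-group buffers from a per-parameter tree."""
    return [[x for path in paths for x in tree[path]] for paths in groups]


# Assumed toy model: parameter path -> number of elements.
sizes = {"layers.0.weight": 4, "layers.0.bias": 2, "layers.1.weight": 3}

# Save with one grouping (small communication size)...
groups_a = group_params(sizes, max_group_size=6)
state_a = [[float(i) for i in range(sum(sizes[p] for p in g))] for g in groups_a]
checkpoint = state_to_tree(state_a, groups_a, sizes)

# ...and load with a different grouping (large communication size).
groups_b = group_params(sizes, max_group_size=100)
state_b = tree_to_state(checkpoint, groups_b)
```

With this layout the checkpoint only depends on parameter paths, so changing the number of groups between saving and loading does not change which state belongs to which parameter.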
