fix(merger): re-tie weights to avoid duplicating tied parameters
FSDP saves tied parameters (e.g. lm_head <-> embed_tokens) as
independent shards. After load_state_dict(..., assign=True) they
become separate tensors, and save_pretrained writes both, bloating
the merged checkpoint. Re-tie when the model declares tying and the
saved tensors agree; otherwise, warn and skip re-tying.
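A minimal sketch of the re-tying step, assuming a Hugging Face causal LM whose config exposes `tie_word_embeddings`; the function name `load_and_retie` and the `merged_state_dict` argument are illustrative, not the merger's actual code.

```python
import logging

import torch
from transformers import AutoModelForCausalLM

logger = logging.getLogger(__name__)


def load_and_retie(model_name: str, merged_state_dict: dict):
    model = AutoModelForCausalLM.from_pretrained(model_name)
    # assign=True swaps the loaded tensors in directly, which breaks the
    # lm_head <-> embed_tokens tie that existed on the freshly built model.
    model.load_state_dict(merged_state_dict, strict=False, assign=True)

    if getattr(model.config, "tie_word_embeddings", False):
        embed = model.get_input_embeddings().weight
        head = model.get_output_embeddings().weight
        if embed is not head:
            if torch.equal(embed, head):
                # The saved shards agree, so re-tying is safe: make lm_head
                # share the embedding tensor so save_pretrained writes it once.
                model.tie_weights()
            else:
                logger.warning(
                    "tie_word_embeddings is set but the saved lm_head and "
                    "embed_tokens differ; keeping both tensors untied."
                )
    return model
```

The `torch.equal` check mirrors the "saved tensors agree" condition from the commit message: re-tying only when the two shards hold identical values avoids silently discarding a divergent lm_head.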