Is there an existing issue for this bug?
🐛 Describe the bug
I trained a reward model based on Llama3.1-70B-instruct on 48 H100s (3D parallelism, tp=8, pp=1).
When executing `booster.save_model(model, os.path.join(save_dir, "modeling"), shard=True)`, the saved `model.embed_tokens.weight` has shape [16064, 8192] rather than [128256, 8192]. The shapes of all other weights are correct.
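For context on the numbers, here is a back-of-the-envelope check (my own guess, not confirmed ColossalAI behavior; the multiple-of-64 row padding is an assumption): 16064 is exactly one tp=8 shard of the 128256-row embedding padded up to a multiple of 64, which would suggest the checkpoint contains a single tensor-parallel shard rather than the gathered full weight.

```python
import math

# Numbers reported above
full_vocab = 128256   # Llama-3.1 vocabulary size (rows of embed_tokens.weight)
tp_size = 8           # tensor-parallel degree used in training
saved_rows = 16064    # rows of embed_tokens.weight in the saved checkpoint

# One unpadded TP shard of the embedding
shard_rows = full_vocab // tp_size  # 16032 rows per rank

# Assumption: the shard is padded up to a multiple of 64 rows
padded_shard_rows = math.ceil(shard_rows / 64) * 64

print(shard_rows, padded_shard_rows, saved_rows)  # 16032 16064 16064
```

So the saved tensor matches one padded TP shard, not the full [128256, 8192] embedding.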
Please HELP ME!
Thank you!
Environment
transformers 4.44.1
colossalai 0.4.5
flash-attn 2.6.3