fix: update TE GroupedLinear integration for single-parameter mode #1680
SwekeR-463 wants to merge 4 commits into NVIDIA-NeMo:main
Conversation
Signed-off-by: SwekeR-463 <swekerswasti@gmail.com>
/ok to test 5346557
Hi @SwekeR-463, thanks a lot for your contribution. Do you have an example wandb run verifying the convergence before and after this change?
I haven't done any runs, will do and update. |
/ok to test bf85bc9 |
Hello @hemildesai, I attempted to run experiments to verify convergence before and after the change, but ran into repeated setup issues on my end. Rather than delay further, I wanted to let you know. 🙂
/ok to test f6f3fcd |
Hi @SwekeR-463, I restarted CI. I apologize for the long delay.
What does this PR do?
Update `GroupedExpertsTE` to use TE `GroupedLinear` single-parameter mode and keep AutoModel's MoE state dict format unchanged.

Changelog
- Enable `single_grouped_parameter=True`.
- Update `GroupedExpertsTE` weight handling to read and write the grouped weight parameter directly.

Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information
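As a rough sketch of the state-dict side of this change: with single-parameter mode, `GroupedLinear` holds one grouped weight tensor for all experts instead of one parameter per expert, so keeping the per-expert state dict format means splitting the grouped parameter on save and re-stacking it on load. The helper names and key layout below are hypothetical illustrations, not the actual code in this PR:

```python
import torch

num_experts, in_features, out_features = 4, 8, 16

# One grouped parameter holding all experts' weights, stacked on dim 0
# (stand-in for the single grouped weight in TE GroupedLinear).
grouped_weight = torch.randn(num_experts, out_features, in_features)

def export_per_expert(grouped: torch.Tensor) -> dict:
    """Split the grouped parameter into per-expert state dict entries
    so the external MoE state dict format stays unchanged."""
    return {f"experts.{i}.weight": grouped[i] for i in range(grouped.shape[0])}

def load_per_expert(state: dict, num_experts: int) -> torch.Tensor:
    """Re-stack per-expert entries back into the single grouped parameter."""
    return torch.stack(
        [state[f"experts.{i}.weight"] for i in range(num_experts)], dim=0
    )

sd = export_per_expert(grouped_weight)
restored = load_per_expert(sd, num_experts)
assert torch.equal(restored, grouped_weight)
```

The round trip is lossless, which is the property a convergence comparison before/after the change would be expected to confirm.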