
fix: update TE GroupedLinear integration for single-parameter mode #1680

Open
SwekeR-463 wants to merge 4 commits into NVIDIA-NeMo:main from SwekeR-463:fix/te-grp-linear-to-1-param

Conversation

@SwekeR-463
Contributor

What does this PR do ?

Update GroupedExpertsTE to use TE GroupedLinear single-parameter mode and keep AutoModel’s MoE state dict format unchanged.

Changelog

  • Switched TE MoE expert construction to single_grouped_parameter=True.
  • Updated GroupedExpertsTE weight handling to read and write the grouped weight parameter directly.
  • Kept AutoModel MoE checkpoint serialization in stacked tensor format.
  • Updated EP grad-scaling name matching for the grouped TE parameter layout.
  • Added and updated unit tests for the new TE path.
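To illustrate what the second and third changelog items imply, here is a minimal sketch of converting between the two layouts: the stacked per-expert checkpoint format that AutoModel keeps, and a single fused buffer like the one TE's single-parameter mode holds. All names and shapes here are hypothetical stand-ins (plain Python lists instead of tensors), not the actual NeMo or Transformer Engine implementation.

```python
def to_single_grouped(stacked):
    """Flatten a stacked [num_experts][out][in] layout into one contiguous
    buffer, mimicking a single grouped parameter. Hypothetical sketch."""
    flat = []
    for expert_weight in stacked:
        for row in expert_weight:
            flat.extend(row)
    return flat

def from_single_grouped(flat, num_experts, out_features, in_features):
    """Recover the stacked checkpoint layout from the fused buffer, so the
    serialized MoE state dict format stays unchanged. Hypothetical sketch."""
    per_expert = out_features * in_features
    stacked = []
    for e in range(num_experts):
        chunk = flat[e * per_expert:(e + 1) * per_expert]
        stacked.append([chunk[r * in_features:(r + 1) * in_features]
                        for r in range(out_features)])
    return stacked
```

The key property the PR relies on is that this conversion round-trips exactly, so checkpoints written before and after the change remain interchangeable.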

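The EP grad-scaling item means the name filter that selects expert parameters must now match the grouped layout as well as the per-expert one. A minimal sketch, assuming purely illustrative parameter names (the real NeMo naming may differ):

```python
import re

# Hypothetical parameter names -- per-expert layout has an expert index,
# the grouped TE layout does not.
PER_EXPERT = re.compile(r"experts\.(\d+)\.linear_fc[12]\.weight")
GROUPED = re.compile(r"experts\.linear_fc[12]\.weight")

def needs_ep_grad_scaling(name):
    """Return True if EP grad scaling applies to this parameter name,
    covering both the old per-expert and the new grouped layouts."""
    return bool(PER_EXPERT.fullmatch(name) or GROUPED.fullmatch(name))
```

With only the per-expert pattern, the grouped parameter would silently skip grad scaling, which is the failure mode the changelog entry addresses.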
Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

Signed-off-by: SwekeR-463 <swekerswasti@gmail.com>
@copy-pr-bot

copy-pr-bot bot commented Apr 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@SwekeR-463 SwekeR-463 changed the title (fix): update TE GroupedLinear integration for single-parameter mode fix: update TE GroupedLinear integration for single-parameter mode Apr 4, 2026
@akoumpa
Contributor

akoumpa commented Apr 4, 2026

/ok to test 5346557

@hemildesai
Contributor

Hi @SwekeR-463, thanks a lot for your contribution. Do you have an example wandb run verifying the convergence before and after this change?

@SwekeR-463
Contributor Author

> Hi @SwekeR-463, thanks a lot for your contribution. Do you have an example wandb run verifying the convergence before and after this change?

I haven't done any runs yet; I will run them and update here.

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Apr 10, 2026
@akoumpa
Contributor

akoumpa commented Apr 10, 2026

/ok to test bf85bc9

@SwekeR-463
Contributor Author

> Hi @SwekeR-463, thanks a lot for your contribution. Do you have an example wandb run verifying the convergence before and after this change?

Hello @hemildesai, I attempted to run experiments to verify convergence before and after the change, but ran into repeated setup issues on my end. Rather than delay further, I wanted to let you know. 🙂

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Apr 13, 2026
@akoumpa
Contributor

akoumpa commented Apr 16, 2026

/ok to test f6f3fcd

@akoumpa
Contributor

akoumpa commented Apr 16, 2026

Hi @SwekeR-463 I restarted CI, I apologize for the long delay.

@chtruong814 chtruong814 added waiting-on-customer Waiting on the original author to respond and removed needs-follow-up Issue needs follow-up labels Apr 17, 2026