
[bug] nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 cannot be imported/converted to Megatron format #3605

@OlegSudakov

Description

Problem

Importing nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 with examples/conversion/convert_checkpoints.py fails with the following error:

ValueError: Shape mismatch for megatron param decoder.layers.1.mlp.shared_experts.linear_fc2.weight:
  Expected shape: torch.Size([4096, 5376])
  Got shape: torch.Size([4096, 2688])
  Bridge type: AutoMapping
  HF mapping: backbone.layers.1.mixer.shared_experts.down_proj.weight
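
Note that 5376 is exactly 2 × 2688, which would be consistent with the NVFP4 checkpoint storing two packed 4-bit values per byte; the bridge may be comparing the packed tensor's shape against the unquantized Megatron shape. A minimal sketch to check what the checkpoint actually stores for the failing tensor (this assumes the standard sharded-safetensors layout and that the tensor name matches the index entry; neither is verified against this repo):

```python
import json
from huggingface_hub import hf_hub_download
from safetensors import safe_open

repo = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4"
name = "backbone.layers.1.mixer.shared_experts.down_proj.weight"

# Standard HF sharded layout: the index maps each tensor name to its shard file.
index_path = hf_hub_download(repo, "model.safetensors.index.json")
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

shard_path = hf_hub_download(repo, weight_map[name])
with safe_open(shard_path, framework="pt") as st:
    t = st.get_tensor(name)
    print(name, tuple(t.shape), t.dtype)
# If this prints (4096, 2688) with a uint8 dtype, the tensor is stored packed
# (two FP4 values per byte) and the shape check is seeing the packed width.
```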

Minimal repro

1. Run python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py import --hf-model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 --megatron-path NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4
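
For triage, the expected (unpacked) shared-expert width can also be cross-checked against the HF config. A minimal sketch; the attribute names below are guesses, not confirmed fields of the Nemotron config class:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained(
    "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
    trust_remote_code=True,  # in case the repo ships a custom config class
)
# Print whichever shared-expert width fields exist (names are guesses).
for attr in ("moe_shared_expert_intermediate_size", "shared_expert_intermediate_size"):
    print(attr, getattr(cfg, attr, "<not present>"))
```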

Expected behavior

The model is successfully imported and converted.

Affected area

area:ckpt

Regression?

Not sure

Environment

nvcr.io/nvidia/nemo:26.04.00

Logs

Loading from nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   2% 0:02:29 (1055/42665) NemotronHBridge
[rank0]: Traceback (most recent call last):
[rank0]:   File "/opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py", line 273, in <module>
[rank0]:     sys.exit(main())
[rank0]:              ^^^^^^
[rank0]:   File "/opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py", line 248, in main
[rank0]:     import_hf_to_megatron(
[rank0]:   File "/opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py", line 120, in import_hf_to_megatron
[rank0]:     AutoBridge.import_ckpt(
[rank0]:   File "/opt/Megatron-Bridge/src/megatron/bridge/models/conversion/auto_bridge.py", line 896, in import_ckpt
[rank0]:     megatron_model = bridge.to_megatron_model(wrap_with_ddp=False, use_cpu_initialization=True)
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/Megatron-Bridge/src/megatron/bridge/models/conversion/auto_bridge.py", line 1116, in to_megatron_model
[rank0]:     return provider.provide_distributed_model(**kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/Megatron-Bridge/src/megatron/bridge/models/model_provider.py", line 202, in provide_distributed_model
[rank0]:     model = get_model(
[rank0]:             ^^^^^^^^^^
[rank0]:   File "/opt/Megatron-Bridge/src/megatron/bridge/models/model_provider.py", line 565, in get_model
[rank0]:     _model = pre_wrap_hook(model)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/Megatron-Bridge/src/megatron/bridge/models/model_provider.py", line 284, in composed_hook
[rank0]:     model = hook(model)
[rank0]:             ^^^^^^^^^^^
[rank0]:   File "/opt/Megatron-Bridge/src/megatron/bridge/models/conversion/model_bridge.py", line 897, in load_weights_hf_to_megatron
[rank0]:     raise ValueError(
[rank0]: ValueError: Shape mismatch for megatron param decoder.layers.1.mlp.shared_experts.linear_fc2.weight:
[rank0]:   Expected shape: torch.Size([4096, 5376])
[rank0]:   Got shape: torch.Size([4096, 2688])
[rank0]:   Bridge type: AutoMapping
[rank0]:   HF mapping: backbone.layers.1.mixer.shared_experts.down_proj.weight

Labels

area:ckpt (Checkpoint conversion, loading, export, and save paths)
bug (Something isn't working)
community-request
waiting-on-customer (Waiting on the original author to respond)
