Problem
Importing nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 with examples/conversion/convert_checkpoints.py fails with the following error:
ValueError: Shape mismatch for megatron param decoder.layers.1.mlp.shared_experts.linear_fc2.weight:
[rank0]: Expected shape: torch.Size([4096, 5376])
[rank0]: Got shape: torch.Size([4096, 2688])
[rank0]: Bridge type: AutoMapping
[rank0]: HF mapping: backbone.layers.1.mixer.shared_experts.down_proj.weight
Minimal repro
1. Run python /opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py import --hf-model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 --megatron-path NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4
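For reference, the same import can also be triggered from Python: the traceback below shows the example script ultimately calling AutoBridge.import_ckpt. The snippet is only a minimal sketch; the keyword names are assumptions inferred from the script's --hf-model and --megatron-path flags, not a verified signature, so check examples/conversion/convert_checkpoints.py for the exact call.

from megatron.bridge.models.conversion.auto_bridge import AutoBridge

# Minimal sketch of the programmatic path exercised by convert_checkpoints.py.
# The keyword argument names below are assumptions based on the CLI flags;
# the real signature of AutoBridge.import_ckpt may differ.
AutoBridge.import_ckpt(
    hf_model_id="nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",  # assumed kwarg name
    megatron_path="NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",       # assumed kwarg name
)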
Expected behavior
The model is successfully imported and converted.
Affected area
area:ckpt
Regression?
Not sure
Environment
nvcr.io/nvidia/nemo:26.04.00
Logs
Loading from nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2% 0:02:29 (1055/42665) NemotronHBridge
[rank0]: Traceback (most recent call last):
[rank0]: File "/opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py", line 273, in <module>
[rank0]: sys.exit(main())
[rank0]: ^^^^^^
[rank0]: File "/opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py", line 248, in main
[rank0]: import_hf_to_megatron(
[rank0]: File "/opt/Megatron-Bridge/examples/conversion/convert_checkpoints.py", line 120, in import_hf_to_megatron
[rank0]: AutoBridge.import_ckpt(
[rank0]: File "/opt/Megatron-Bridge/src/megatron/bridge/models/conversion/auto_bridge.py", line 896, in import_ckpt
[rank0]: megatron_model = bridge.to_megatron_model(wrap_with_ddp=False, use_cpu_initialization=True)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/Megatron-Bridge/src/megatron/bridge/models/conversion/auto_bridge.py", line 1116, in to_megatron_model
[rank0]: return provider.provide_distributed_model(**kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/Megatron-Bridge/src/megatron/bridge/models/model_provider.py", line 202, in provide_distributed_model
[rank0]: model = get_model(
[rank0]: ^^^^^^^^^^
[rank0]: File "/opt/Megatron-Bridge/src/megatron/bridge/models/model_provider.py", line 565, in get_model
[rank0]: _model = pre_wrap_hook(model)
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/Megatron-Bridge/src/megatron/bridge/models/model_provider.py", line 284, in composed_hook
[rank0]: model = hook(model)
[rank0]: ^^^^^^^^^^^
[rank0]: File "/opt/Megatron-Bridge/src/megatron/bridge/models/conversion/model_bridge.py", line 897, in load_weights_hf_to_megatron
[rank0]: raise ValueError(
[rank0]: ValueError: Shape mismatch for megatron param decoder.layers.1.mlp.shared_experts.linear_fc2.weight:
[rank0]: Expected shape: torch.Size([4096, 5376])
[rank0]: Got shape: torch.Size([4096, 2688])
[rank0]: Bridge type: AutoMapping
[rank0]: HF mapping: backbone.layers.1.mixer.shared_experts.down_proj.weight