4 changes: 2 additions & 2 deletions examples/conversion/hf_megatron_roundtrip_multi_gpu.py
@@ -70,8 +70,8 @@
     # MiniMax-M2: QK norms stored as bf16 in HF, loaded as fp32 by Megatron config.params_dtype
     "q_norm.weight",
     "k_norm.weight",
-    # MiniMax-M2: router gate stored as fp32 in HF, loaded as bf16 via autocast_dtype
-    "block_sparse_moe.gate.weight",
+    # MoE router gate stored as fp32 in Megatron, may be bf16 in HF
+    "gate.weight",
 ]
 
 # FP8 dtypes whose dequantisation is inherently lossy — allclose is meaningless.
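The list above keys on weight-name substrings, so generalising "block_sparse_moe.gate.weight" to "gate.weight" widens the exemption to any MoE router gate, not just MiniMax-M2's. A minimal sketch of how such a list might be consumed during the roundtrip check (the comparison code itself is not shown in this diff; the helper name weights_match, the list name DTYPE_MISMATCH_FRAGMENTS, and the tolerance values are illustrative assumptions):

import torch

# Hypothetical: weight-name fragments whose dtypes differ between HF and
# Megatron, so bit-exact comparison is expected to fail (names taken from
# the exemption list in the diff above).
DTYPE_MISMATCH_FRAGMENTS = [
    "q_norm.weight",
    "k_norm.weight",
    "gate.weight",
]

def weights_match(name: str, hf_tensor: torch.Tensor, mg_tensor: torch.Tensor) -> bool:
    """Compare a roundtripped weight, relaxing the check for known dtype mismatches."""
    if any(fragment in name for fragment in DTYPE_MISMATCH_FRAGMENTS):
        # Dtypes differ across frameworks: cast both sides to fp32 and
        # allow bf16-level error instead of demanding bit equality.
        return torch.allclose(hf_tensor.float(), mg_tensor.float(), atol=1e-2, rtol=1e-2)
    # Everything else must survive the roundtrip exactly.
    return torch.equal(hf_tensor, mg_tensor)

FP8 weights (per the comment above) would be skipped entirely rather than compared with allclose, since dequantisation error swamps any tolerance.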
520 changes: 520 additions & 0 deletions examples/models/vlm/ernie_vl/ernie45_vl_fwd_bwd.py
Large diffs are not rendered by default.
