
Commit 6837d30

yaoyu-33 and claude authored
[doc] feat: add MiniMax M2.5 / M2.7 model support (#3291)
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 30defd7 commit 6837d30

4 files changed with 9 additions and 1 deletion


README.md

Lines changed: 3 additions & 1 deletion
@@ -12,6 +12,8 @@

 ## 📣 News

+- [04/12/2026] [**MiniMax-M2.5 / M2.7**](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/minimax_m2) are now supported! Both models share the same architecture as MiniMax-M2 and work with the existing bridge out of the box — checkpoint conversion and inference verified on real FP8 checkpoints.
+
 - [04/10/2026] [**Qwen3-ASR**](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/audio_lm/qwen3_asr) is now supported! Checkpoint conversion and inference for [Qwen3's ASR model](https://github.com/QwenLM/Qwen3-ASR) are available on **main**.

 - [04/09/2026] [**Bailing MoE V2**](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/examples/models/bailing) is now supported! Checkpoint conversion and inference for the Bailing MoE V2 model are available on **main**. Thank you to [@ccclyu](https://github.com/ccclyu) for the community contribution!
@@ -181,7 +183,7 @@ Megatron Bridge provides out-of-the-box bridges and training recipes for a wide
 - [Mamba](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/mamba)
 - [Ministral](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/ministral3) [recipes (3B/8B/14B)](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/ministral3/ministral3.py)
 - [Mistral](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/mistral)
-- [MiniMax-M2](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/minimax_m2)
+- [MiniMax-M2 / M2.5 / M2.7](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/minimax_m2)
 - [Moonlight](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/deepseek) [recipes (16B)](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/moonlight/moonlight_16b.py)
 - [OlMoE](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/olmoe) [recipes (7B)](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/olmoe/olmoe_7b.py)
 - [Qwen2 / Qwen2.5](https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/qwen) [recipes](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen/qwen2.py)

docs/models/llm/README.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ Megatron Bridge supports the following LLM families:
 | **Gemma 3** | [gemma3.md](gemma3.md) | Google Gemma 3 models |
 | **GLM-4.5** | [glm45.md](glm45.md) | GLM-4.5 model family |
 | **GPT-OSS** | [gpt-oss.md](gpt-oss.md) | Open-source GPT-style models |
+| **MiniMax-M2** | | MiniMax-M2 / M2.5 / M2.7 (456B MoE, FP8) |
 | **LLaMA 3** | [llama3.md](llama3.md) | Meta LLaMA 3 models |
 | **LLaMA Nemotron** | [llama-nemotron.md](llama-nemotron.md) | NVIDIA LLaMA Nemotron models |
 | **Mistral** | [mistral.md](mistral.md) | Mistral AI models |

examples/models/minimax_m2/README.md

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@

 This directory contains example scripts for [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2), a large sparse MoE model with 456B total parameters (45.9B active), 256 experts, and FP8 quantization.

+> **M2.5 / M2.7 compatibility:** [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) and [MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7) share the same architecture (`MiniMaxM2ForCausalLM`) and work with the same bridge. Replace the model ID in the scripts below (e.g. `MiniMaxAI/MiniMax-M2.5`).
+
 ## Hardware Requirements

 MiniMax-M2 requires **at least 2 nodes (16 GPUs)** for inference and conversion. The model cannot fit on a single 8-GPU node because:
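
The compatibility note added above amounts to a one-line change. Here is a minimal sketch, assuming the `AutoBridge` entry point shown in the project README (the conversion scripts in this directory are not reproduced here, and they accept the Hugging Face model ID the same way):

```python
from megatron.bridge import AutoBridge

# Because MiniMax-M2.5 / M2.7 reuse the MiniMaxM2ForCausalLM architecture,
# only the Hugging Face model ID changes; the same MiniMax-M2 bridge is
# dispatched automatically from the checkpoint's config.
model_id = "MiniMaxAI/MiniMax-M2.5"  # or "MiniMaxAI/MiniMax-M2.7"

bridge = AutoBridge.from_hf_pretrained(model_id)

# Build a Megatron model provider from the HF checkpoint. Per the hardware
# note above, actually materializing the model needs at least 2 nodes.
provider = bridge.to_megatron_provider()
```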

src/megatron/bridge/models/minimax_m2/minimax_m2_bridge.py

Lines changed: 3 additions & 0 deletions
@@ -98,6 +98,9 @@ class MiniMaxM2Bridge(MegatronModelBridge):
     """
     Megatron Bridge for MiniMax-M2 MoE Causal LM.

+    Also supports MiniMax-M2.5 and MiniMax-M2.7, which share the same
+    ``model_type`` (``minimax_m2``) and ``MiniMaxM2ForCausalLM`` architecture.
+
     MiniMax-M2 is a sparse MoE model (256 experts, top-8 routing with sigmoid
     scoring and expert bias correction). Use the native transformers >= 5.0
     implementation (no ``trust_remote_code`` required).
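
For readers unfamiliar with the routing scheme the docstring names, the following is a minimal sketch of top-8 routing with sigmoid scoring and expert bias correction. It is an illustrative reimplementation, not code from the bridge; the function name and tensor shapes are assumptions.

```python
import torch

def top8_sigmoid_routing(router_logits: torch.Tensor,
                         expert_bias: torch.Tensor,
                         top_k: int = 8):
    """Sketch: sigmoid-scored top-k expert selection with bias correction.

    router_logits: [num_tokens, num_experts] raw router outputs (256 experts).
    expert_bias:   [num_experts] load-balancing bias that shifts which experts
                   are selected but does not enter the gate weights themselves.
    """
    scores = torch.sigmoid(router_logits)  # sigmoid scoring, not softmax
    # Select experts on bias-corrected scores...
    _, topk_idx = torch.topk(scores + expert_bias, top_k, dim=-1)
    # ...but weight them by the unbiased scores, renormalized per token.
    topk_scores = torch.gather(scores, dim=-1, index=topk_idx)
    gates = topk_scores / topk_scores.sum(dim=-1, keepdim=True)
    return topk_idx, gates
```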
