@@ -102,8 +102,8 @@ A migration guide for the new CLI, the `recipe` YAML section, the SLURM
   Nemotron-3-Nano-30B-A3B, Nemotron Flash 1B, GLM-4.7,
   Devstral-Small-2-24B.
 - **MoE / VLM / Omni:** Qwen3-VL (4B/8B), Qwen3-VL-MoE (30B/235B),
-  Kimi-VL, Kimi-K2.5 VL, Qwen3-Omni, Nemotron-Parse VLM,
-  InternVL3.5-4B, Ministral3 (3B/8B/14B), Phi-4-multimodal.
+  Kimi-VL, Kimi-K2.5 VL, Nemotron-Parse VLM, InternVL3.5-4B,
+  Ministral3 (3B/8B/14B), Phi-4-multimodal.
 
 ### Distributed Training
 
@@ -165,13 +165,14 @@ A migration guide for the new CLI, the `recipe` YAML section, the SLURM
 - **MoE:** Qwen3 MoE custom implementation, Qwen3 Next, GPT-OSS (custom
   implementation, dequantization, DGX Spark recipe), GLM 4 / 4.5 / 4.6 MoE,
   GLM 4.5 Air, Moonlight 2L test, Phi 4 (TP plan).
+- **Omni / VLM:** Qwen3-Omni OOTB recipe and custom implementation.
 - **DeepSeek v3** with fp8 base checkpoint loading.
 - **Sequence classification:** Qwen3ForSequenceClassification registered;
   generic SFT sequence-classification recipe.
 
 ### Distributed Training
 
-- VLM EP and Qwen-Omni custom implementation.
+- VLM expert-parallel recipe support.
 - PP for VLM; PEFT with PP.
 - Sharding optimization for SP / LoRA.
 - `clip_grad_norm` across all parallelism modes.