 
 ### New Models
 
-- **LLM:** GLM 5, Minimax M2.5, Step-3.5-Flash, Devstral 24B, Nemotron Nano
-  4B/8B.
+- **LLM:** GLM-5, MiniMax-M2.5, Nemotron Super v3, Nemotron Nano 4B/8B.
 - **MoE / VLM:** Qwen3.5-MoE (397B-A17B, 35B-A3B).
 - **VLM:** Gemma 4, Mistral Small 4, Qwen3.5 small dense models.
-- **Multimodal / Omni:** Nemotron-3-Nano-Omni.
-- **Diffusion:** Wan multi-resolution, LoRA for diffusion.
+- **Diffusion:** FLUX.1-dev, Wan 2.1 T2V, HunyuanVideo 1.5; Wan
+  multi-resolution and LoRA recipes for diffusion.
 
 ### Distributed Training
 
@@ -99,11 +98,12 @@ A migration guide for the new CLI, the `recipe` YAML section, the SLURM |
 
 ### New Models
 
-- **LLM:** DeepSeek 3.2, Step3p5, Minimax M2, Nano v3 custom, Nemotron Flash,
-  GLM 4.7, Devstral (backported to v4).
-- **MoE / VLM:** Qwen3-VL custom implementation (235B, 30B, 4B/8B configs),
-  Kimi-VL, Kimi K2.5 VL, Qwen3-Omni port via `transformers omni`,
-  Nemotron-Parse VLM.
+- **LLM:** DeepSeek V3.2, Step-3.5-Flash, MiniMax-M2.1,
+  Nemotron-3-Nano-30B-A3B, Nemotron Flash 1B, GLM-4.7,
+  Devstral-Small-2-24B.
+- **MoE / VLM / Omni:** Qwen3-VL (4B/8B), Qwen3-VL-MoE (30B/235B),
+  Kimi-VL, Kimi-K2.5 VL, Qwen3-Omni, Nemotron-Parse VLM,
+  InternVL3.5-4B, Ministral3 (3B/8B/14B), Phi-4-multimodal.
 
 ### Distributed Training
 