Commit 0fd3fa2
feat(engine): add Muon optimizer support for FSDP and Megatron engines

Add Muon optimizer (Newton-Schulz orthogonalization) with distributed
FSDP support, ported from samsja/muon_fsdp_2 v0.3.0.

Core changes:
- areal/utils/optimizer.py: Full Muon implementation with Work pipeline
  for async NCCL overlap (Fsdp1dWork, SingleDeviceWork), Newton-Schulz
  iteration, Moonlight RMS scaling option, and AdamW fallback for
  non-2D parameters (embeddings, norms, biases)
- areal/api/cli_args.py: Add Muon-specific config fields (muon_momentum,
  muon_nesterov, muon_ns_steps, muon_backend_steps, muon_rms_scale)
- areal/engine/fsdp_engine.py: Integrate Muon into FSDP optimizer creation
- areal/experimental/engine/archon_engine.py + archon_utils.py: Integrate
  Muon into Archon engine optimizer creation
- pyproject.toml / pyproject.vllm.toml: Add muon_fsdp_2 dependency
- tests/test_muon_optimizer.py: Unit tests for Newton-Schulz, scaling,
  optimizer step, and config validation
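
For readers unfamiliar with the Newton-Schulz step named above, here is a minimal sketch in plain Python (no torch, so it runs anywhere). It is not the code from areal/utils/optimizer.py. The quintic coefficients are the ones commonly seen in public Muon implementations, and the 0.2 * sqrt(max(rows, cols)) factor is the scaling described for Moonlight. Both are assumptions about what this commit uses, and the function names are hypothetical.

```python
import math

# Assumption: quintic coefficients commonly used by public Muon implementations.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def newton_schulz(grad, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2D matrix given as a list of rows."""
    a, b, c = NS_COEFFS
    tall = len(grad) > len(grad[0])
    # Work on the wide orientation so the Gram matrix X X^T is the smaller one.
    x = transpose(grad) if tall else [row[:] for row in grad]
    # Normalize by the Frobenius norm so all singular values are at most 1.
    norm = math.sqrt(sum(v * v for row in x for v in row)) + eps
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        gram = matmul(x, transpose(x))  # A = X X^T
        gram_sq = matmul(gram, gram)    # A @ A
        poly = [[b * g + c * g2 for g, g2 in zip(r1, r2)]
                for r1, r2 in zip(gram, gram_sq)]  # B = b*A + c*A@A
        px = matmul(poly, x)
        # X <- a*X + B @ X
        x = [[a * xv + pv for xv, pv in zip(rx, rp)] for rx, rp in zip(x, px)]
    return transpose(x) if tall else x


def moonlight_rms_scale(rows, cols):
    """Assumed Moonlight-style factor applied to the orthogonalized update."""
    return 0.2 * math.sqrt(max(rows, cols))


update = newton_schulz([[3.0, 0.0], [0.0, 0.5]])
scale = moonlight_rms_scale(2, 2)
```

After a few steps the singular values of the result sit near 1 rather than exactly at 1. That is expected with these coefficients, which trade exactness for fast convergence.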

feat(megatron): enable Muon optimizer via Megatron-Core native dispatch

Megatron-Core natively supports Muon via _get_megatron_emerging_optimizer
when optimizer type is not in ('adam', 'sgd'). It handles TP-aware
Newton-Schulz, QKV splitting, and ChainedOptimizer (Muon for 2D weights,
AdamW for norms/biases/embeddings) out of the box.

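
The Muon/AdamW split mentioned in both parts of this commit can be illustrated with a small routing function. This is a sketch only: the name fragments used to exclude parameters and the example parameter names are hypothetical, and neither the FSDP path nor Megatron-Core is claimed to use this exact rule.

```python
# Hypothetical name fragments for parameters kept on AdamW even when they are 2D.
ADAMW_NAME_HINTS = ("embed", "norm", "bias")


def split_params(shapes):
    """Route parameter names to Muon or AdamW.

    shapes: mapping of parameter name -> shape tuple.
    """
    muon, adamw = [], []
    for name, shape in shapes.items():
        is_matrix = len(shape) == 2
        excluded = any(hint in name for hint in ADAMW_NAME_HINTS)
        (muon if is_matrix and not excluded else adamw).append(name)
    return muon, adamw


example = {
    "embed_tokens.weight": (32000, 1024),          # 2D, but an embedding
    "layers.0.attn.q_proj.weight": (1024, 1024),   # 2D weight matrix
    "layers.0.input_layernorm.weight": (1024,),    # 1D norm scale
    "layers.0.mlp.up_proj.bias": (4096,),          # 1D bias
}
muon_names, adamw_names = split_params(example)
```

Note that the embedding is two-dimensional, so a shape check alone would send it to Muon. The sketch adds a name check for that reason.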
- Allow 'muon' in _create_optimizer assertion
- Forward muon_momentum/muon_nesterov/muon_num_ns_steps from OptimizerConfig
  to MCoreOptimizerConfig (with hasattr guard for older Megatron-Core)
- Requires the 'emerging-optimizers' package to be installed at runtime

1 parent 95ca870 commit 0fd3fa2
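
The hasattr guard described in the Megatron bullets might look roughly like the sketch below. The dataclasses are hypothetical stand-ins for OptimizerConfig and for older and newer versions of MCoreOptimizerConfig. The default values and the helper name forward_muon_fields are made up for illustration and are not taken from the diff.

```python
from dataclasses import dataclass

MUON_FIELDS = ("muon_momentum", "muon_nesterov", "muon_num_ns_steps")


@dataclass
class OptimizerConfig:  # hypothetical stand-in; defaults are illustrative
    muon_momentum: float = 0.95
    muon_nesterov: bool = True
    muon_num_ns_steps: int = 5


@dataclass
class OldMCoreConfig:  # stand-in for an older config without Muon fields
    lr: float = 1e-3


@dataclass
class NewMCoreConfig:  # stand-in for a newer config that has Muon fields
    lr: float = 1e-3
    muon_momentum: float = 0.9
    muon_nesterov: bool = False
    muon_num_ns_steps: int = 3


def forward_muon_fields(src, dst):
    """Copy Muon settings onto dst, skipping fields dst does not define."""
    forwarded = []
    for name in MUON_FIELDS:
        if hasattr(dst, name):  # guard: older versions lack these attributes
            setattr(dst, name, getattr(src, name))
            forwarded.append(name)
    return forwarded


old_cfg, new_cfg = OldMCoreConfig(), NewMCoreConfig()
skipped_all = forward_muon_fields(OptimizerConfig(), old_cfg)
copied = forward_muon_fields(OptimizerConfig(), new_cfg)
```

With the guard in place, an older target config is left untouched instead of silently gaining attributes it never reads.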
9 files changed
Lines changed: 834 additions & 18 deletions
File tree
- areal
- api
- engine
- fsdp_utils
- experimental/engine
- tests