feat(opt): Muon optimizer (Newton-Schulz orthogonalization) for 2D weights #560

Merged: gHashTag merged 1 commit into main from feat/535-muon-optimizer on Apr 30, 2026

@gHashTag (Owner)

Summary

Implement the Muon optimizer with Newton-Schulz orthogonalization, targeting a 35% speedup vs AdamW (R12). The speedup is not measured in this PR; the wall-time and A/B gates are still open.

New file

  • src/tri/math/muon_optimizer.zig — 327 LOC

Gates addressed

  • G1: Newton-Schulz 5-iteration orthogonalization algorithm for 2D weight tensors
  • G2: Hybrid Muon+AdamW setup — Muon for attn/MLP (2D), AdamW for embed/norm (1D/3D)
  • Remaining gates (G3-G7: smoke test, wall-time, stability, A/B, NCA) are not addressed here; they require running training infrastructure
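For readers unfamiliar with G1, here is a minimal pure-Python sketch of the quintic Newton-Schulz iteration described in the referenced Muon post. It is not taken from `muon_optimizer.zig`: the function name `newton_schulz`, the list-of-rows matrix layout, and the `eps` guard are assumptions made for illustration. The coefficients (3.4445, -4.7750, 2.0315) come from the post. With them, singular values end up roughly near 1 rather than exactly 1.

```python
import math

# Quintic coefficients from the referenced Muon post.
A_COEF, B_COEF, C_COEF = 3.4445, -4.7750, 2.0315


def matmul(x, y):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def transpose(x):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*x)]


def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2D matrix g (hypothetical helper)."""
    tall = len(g) > len(g[0])
    # Work on the wide orientation so X X^T is the smaller Gram matrix.
    x = transpose(g) if tall else [row[:] for row in g]
    # Scale by the Frobenius norm so every singular value is at most 1.
    # The eps guard (an assumption) keeps a zero matrix at zero.
    norm = math.sqrt(sum(v * v for row in x for v in row))
    x = [[v / (norm + eps) for v in row] for row in x]
    for _ in range(steps):
        a = matmul(x, transpose(x))  # A = X X^T
        aa = matmul(a, a)            # A^2
        # B = b*A + c*A^2
        b = [[B_COEF * p + C_COEF * q for p, q in zip(ra, raa)]
             for ra, raa in zip(a, aa)]
        bx = matmul(b, x)
        # X <- a*X + B X
        x = [[A_COEF * p + q for p, q in zip(rx, rbx)]
             for rx, rbx in zip(x, bx)]
    return transpose(x) if tall else x
```

Calling `newton_schulz([[3.0, 0.0], [0.0, 0.5]])` keeps the matrix diagonal and pulls both diagonal entries toward 1, even though the inputs differ by a factor of six.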

Features

  • MuonOptimizer: velocity-based momentum with NS orthogonalization
  • AdamWState: bias-corrected AdamW with weight decay
  • HybridMuonAdamW: unified optimizer managing both paths
  • Configurable: lr, momentum, ns_iterations, weight_decay, nesterov
  • Cosine-decay ready: the optimizer takes lr as given, so the schedule is applied externally
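The momentum, Nesterov, and external-schedule features listed above can be sketched as follows. This is an illustrative Python outline, not the Zig implementation: `muon_step`, `cosine_lr`, and the default values (momentum 0.95, Nesterov on) are assumptions, with the defaults borrowed from the referenced post. To keep the block short, `frobenius_normalize` is a placeholder for the Newton-Schulz step and only rescales the update; a real orthogonalizer would be passed in through `orthogonalize`.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine decay from base_lr to min_lr, computed outside the optimizer."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def frobenius_normalize(m, eps=1e-7):
    """Placeholder for Newton-Schulz: scales m to unit Frobenius norm only."""
    norm = math.sqrt(sum(v * v for row in m for v in row))
    return [[v / (norm + eps) for v in row] for row in m]


def muon_step(weights, grads, velocity, lr, momentum=0.95, nesterov=True,
              weight_decay=0.0, orthogonalize=frobenius_normalize):
    """Update one 2D parameter and its velocity buffer in place."""
    n_rows, n_cols = len(weights), len(weights[0])
    # Velocity accumulation: v <- momentum * v + g
    for i in range(n_rows):
        for j in range(n_cols):
            velocity[i][j] = momentum * velocity[i][j] + grads[i][j]
    if nesterov:
        # Look-ahead direction: g + momentum * v
        direction = [[grads[i][j] + momentum * velocity[i][j]
                      for j in range(n_cols)] for i in range(n_rows)]
    else:
        direction = [row[:] for row in velocity]
    update = orthogonalize(direction)
    # Decoupled weight decay (an assumption), then the scaled update.
    for i in range(n_rows):
        for j in range(n_cols):
            weights[i][j] = weights[i][j] * (1.0 - lr * weight_decay) - lr * update[i][j]
```

In a training loop the caller would compute `lr = cosine_lr(step, total_steps, base_lr)` once per step and pass it to `muon_step`, so the optimizer itself holds no schedule state.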

Tests (6)

  • NS normalizes vector to unit norm
  • NS handles zero vector safely
  • NS preserves original direction
  • Muon weight update produces finite values
  • AdamW converges (params decrease with weight decay)
  • Hybrid handles both 2D and 1D tensors
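The AdamW test above checks that parameters shrink under weight decay. A minimal Python sketch of a bias-corrected AdamW step with decoupled weight decay shows why that holds even with zero gradients. The Python `AdamWState` class here is a hypothetical analogue of the Zig type named in this PR, and `adamw_step` and its default hyperparameters are assumptions for illustration.

```python
import math


class AdamWState:
    """Moment buffers and step counter for one flat tensor (hypothetical)."""

    def __init__(self, size):
        self.m = [0.0] * size  # first-moment estimate
        self.v = [0.0] * size  # second-moment estimate
        self.t = 0             # number of steps taken


def adamw_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """Apply one AdamW update to params in place."""
    state.t += 1
    # Bias-correction factors compensate for the zero-initialized moments.
    bc1 = 1.0 - beta1 ** state.t
    bc2 = 1.0 - beta2 ** state.t
    for i, g in enumerate(grads):
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * g
        state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * g * g
        m_hat = state.m[i] / bc1
        v_hat = state.v[i] / bc2
        # Decoupled weight decay shrinks the parameter independently of g.
        params[i] *= 1.0 - lr * weight_decay
        # eps guards against division by zero when v_hat is 0.
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
```

With all gradients at zero, the Adam term vanishes and each step multiplies the parameters by `1 - lr * weight_decay`, so their magnitude decreases monotonically.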

Reference: https://kellerjordan.github.io/posts/muon/

Closes #535

…ights

- Add src/tri/math/muon_optimizer.zig
- G1: Newton-Schulz 5-iteration orthogonalization for 2D weights
- G2: Hybrid Muon+AdamW setup (Muon for attn/MLP, AdamW for embed/norm)
- Momentum-based velocity accumulation with Nesterov option
- AdamW with bias correction, weight decay, epsilon guard
- Cosine decay-ready (externally scheduled)
- 6 tests: NS normalization, zero vector, direction preservation,
  Muon weight update, AdamW convergence, hybrid dual-tensor

Closes #535
Ref: R12
@gHashTag merged commit c128444 into main on Apr 30, 2026
9 of 19 checks passed
@gHashTag deleted the feat/535-muon-optimizer branch on April 30, 2026 00:41
Development

Successfully merging this pull request may close these issues.

🔬 R12: Muon Optimizer Integration
