Parent plan: #16
Goal: make the winning gradient-based Microplex weight path the standard path, with sparse L0 used only when it is intentionally part of the size-class product.
Deliverables:
- Disable hard-concrete gates automatically when
l0_lambda=0.
- Preserve L0 gates for
mp-3m-fast sparse artifacts or explicit experiments.
- Write training loss and held-out target loss curves for every serious run.
- Define epoch stopping from held-out target performance, not just training loss.
- Document one standard
mp-300k calibration command and one standard mp-3m-fast sparse calibration command.
Exit criterion: current best mp-300k and mp-3m-fast runs are reproducible from documented commands and indexed loss artifacts.
Parent plan: #16
Goal: make the winning gradient-based Microplex weight path the standard path, with sparse L0 used only when it is intentionally part of the size-class product.
Deliverables:
l0_lambda=0.mp-3m-fastsparse artifacts or explicit experiments.mp-300kcalibration command and one standardmp-3m-fastsparse calibration command.Exit criterion: current best
mp-300kandmp-3m-fastruns are reproducible from documented commands and indexed loss artifacts.