
Standardize GD calibration for mp-300k and mp-3m-fast #12

@MaxGhenis

Parent plan: #16

Goal: make the best-performing gradient-descent (GD) Microplex weight-calibration path the standard path, and use sparse L0 regularization only when sparsity is an intentional part of the size-class product.
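A minimal sketch of how L0 could be made opt-in, assuming a NumPy weight vector and per-record gate logits `log_alpha`. The hard-concrete constants and expected-L0 penalty follow Louizos, Welling & Kingma (2018); the function names (`hard_concrete_gates`, `expected_l0`, `gated_weights`) are hypothetical and are not Microplex's actual API. When `l0_lambda == 0` the gates are bypassed, so the run reduces to plain GD calibration.

```python
import numpy as np

# Hard-concrete constants from Louizos, Welling & Kingma (2018).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hard_concrete_gates(log_alpha, rng=None):
    """Stretched, clipped hard-concrete gates in [0, 1].

    With rng=None, return the deterministic (evaluation-time) gates;
    otherwise draw stochastic training-time gates.
    """
    if rng is None:
        s = _sigmoid(log_alpha)
    else:
        u = rng.uniform(1e-6, 1.0 - 1e-6, size=log_alpha.shape)
        s = _sigmoid((np.log(u) - np.log(1.0 - u) + log_alpha) / BETA)
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)


def expected_l0(log_alpha):
    """Expected number of non-zero gates (differentiable L0 surrogate)."""
    return float(np.sum(_sigmoid(log_alpha - BETA * np.log(-GAMMA / ZETA))))


def gated_weights(weights, log_alpha, l0_lambda, rng=None):
    """Return (effective weights, L0 penalty term).

    Hypothetical helper: with l0_lambda == 0 the gates are skipped entirely,
    so no hard-concrete parameters influence the calibrated weights.
    """
    if l0_lambda == 0:
        return weights, 0.0
    gates = hard_concrete_gates(log_alpha, rng)
    return weights * gates, l0_lambda * expected_l0(log_alpha)
```

Skipping the gates (rather than running them with a zero penalty) matters because stochastic gates still scale the weights even when the L0 term contributes nothing to the loss.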

Deliverables:

  • Disable hard-concrete gates automatically when l0_lambda=0.
  • Preserve L0 gates for mp-3m-fast sparse artifacts or explicit experiments.
  • Write training loss and held-out target loss curves for every serious run.
  • Choose the stopping epoch from held-out target performance, not just training loss.
  • Document one standard mp-300k calibration command and one standard mp-3m-fast sparse calibration command.
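A sketch of the loss-curve and stopping deliverables, assuming a household-by-target matrix `X`, a target vector, and a split of target indices into training and held-out sets. It runs gradient descent on log-weights (which keeps weights positive) against mean squared relative error, records both curves, and keeps the weights from the epoch with the lowest held-out loss, stopping after `patience` epochs without improvement. The loss, optimizer, split, and all names here are illustrative assumptions, not the documented Microplex commands.

```python
import csv

import numpy as np


def relative_loss(weights, X, targets):
    """Mean squared relative error of weighted totals against targets."""
    return float(np.mean((X.T @ weights / targets - 1.0) ** 2))


def calibrate_with_holdout(X, targets, train_idx, holdout_idx, lr=0.1,
                           max_epochs=500, patience=20, curve_path=None):
    """Gradient descent on log-weights with held-out early stopping.

    Returns (best_weights, best_epoch, curve), where curve is a list of
    (epoch, train_loss, holdout_loss) rows; optionally writes it as CSV.
    """
    X_tr, t_tr = X[:, train_idx], targets[train_idx]
    X_ho, t_ho = X[:, holdout_idx], targets[holdout_idx]
    log_w = np.zeros(X.shape[0])  # start from unit weights
    best_loss, best_epoch, best_w = np.inf, 0, np.exp(log_w)
    curve = []
    for epoch in range(max_epochs):
        w = np.exp(log_w)
        resid = X_tr.T @ w / t_tr - 1.0
        # d(loss)/d(log_w): chain rule through w = exp(log_w)
        grad = (2.0 / len(t_tr)) * (X_tr @ (resid / t_tr)) * w
        log_w -= lr * grad
        w = np.exp(log_w)
        train_loss = relative_loss(w, X_tr, t_tr)
        holdout_loss = relative_loss(w, X_ho, t_ho)
        curve.append((epoch, train_loss, holdout_loss))
        if holdout_loss < best_loss:
            best_loss, best_epoch, best_w = holdout_loss, epoch, w.copy()
        elif epoch - best_epoch >= patience:
            break  # held-out loss has not improved for `patience` epochs
    if curve_path is not None:
        with open(curve_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["epoch", "train_loss", "holdout_loss"])
            writer.writerows(curve)
    return best_w, best_epoch, curve
```

Returning the best held-out epoch instead of the last one means a long run cannot silently overfit the training targets; the `patience` window trades extra epochs for robustness to a noisy held-out curve.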

Exit criterion: current best mp-300k and mp-3m-fast runs are reproducible from documented commands and indexed loss artifacts.
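For the exit criterion, one hypothetical way to index loss artifacts is an append-only JSON-lines file with one record per run, which makes the current best run for each size class easy to look up. The file layout, field names, and helpers (`index_run`, `best_run`) are assumptions; `command` would hold whichever documented command produced the run.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def index_run(index_path, run_id, size_class, command, curve_path,
              best_epoch, best_holdout_loss):
    """Append one calibration run to a JSON-lines index of loss artifacts."""
    record = {
        "run_id": run_id,
        "size_class": size_class,  # e.g. "mp-300k" or "mp-3m-fast"
        "command": command,
        "curve_path": str(curve_path),
        "best_epoch": int(best_epoch),
        "best_holdout_loss": float(best_holdout_loss),
        "indexed_at": datetime.now(timezone.utc).isoformat(),
    }
    path = Path(index_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record


def best_run(index_path, size_class):
    """Return the indexed run with the lowest held-out loss, or None."""
    with Path(index_path).open() as f:
        runs = [json.loads(line) for line in f if line.strip()]
    candidates = [r for r in runs if r["size_class"] == size_class]
    return min(candidates, key=lambda r: r["best_holdout_loss"], default=None)
```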
