Add CLI entry point to materialize and save an initialized model checkpoint #443

@rrutmann

Feature request

It would be useful to add a dedicated entry point for creating a checkpoint directly after model initialization, before training starts.

Today, seeded weight initialization is reproducible within a fixed topology, but it becomes topology-dependent once pipeline parallelism is involved. Specifically, the same initialization seed can produce different initial weights when the parallelization strategy changes, for example between pure DP/FSDP and DP+PP. This is expected with the current approach, because pipeline stages intentionally use different effective seeds to avoid identical stage-local initialization.
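
To make the effect concrete, here is a minimal toy sketch in plain Python (no Modalities or PyTorch code). It assumes each pipeline stage derives its effective seed as `seed + stage_rank`; that rule, the function name `init_layers`, and the tiny list-based "layers" are illustrative assumptions, not the actual implementation.

```python
import random


def init_layers(seed: int, num_layers: int, num_stages: int) -> list[list[float]]:
    """Initialize toy layers that are split evenly across pipeline stages.

    Assumption for illustration: stage `r` seeds its RNG with `seed + r`.
    """
    layers_per_stage = num_layers // num_stages
    weights = []
    for stage_rank in range(num_stages):
        rng = random.Random(seed + stage_rank)  # stage-local effective seed
        for _ in range(layers_per_stage):
            weights.append([rng.gauss(0.0, 0.02) for _ in range(4)])
    return weights


no_pp = init_layers(seed=42, num_layers=4, num_stages=1)    # e.g. pure DP/FSDP
with_pp = init_layers(seed=42, num_layers=4, num_stages=2)  # e.g. DP+PP

print(no_pp == init_layers(seed=42, num_layers=4, num_stages=1))  # same topology
print(no_pp == with_pp)                                           # changed topology
```

In this sketch the first comparison holds and the second does not: the layers owned by stage 0 match in both runs, while the layers that move to stage 1 are drawn from a different RNG stream.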

A dedicated init-checkpoint workflow would provide a clean reproducibility boundary: initialize once, save once, and reuse that checkpoint across later runs and topologies.
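
The "initialize once, save once, reuse" idea could look roughly like the following toy sketch. JSON stands in for a real checkpoint format, and all names (`materialize`, `save_init_checkpoint`, `load_stage`) are hypothetical helpers for illustration rather than Modalities APIs.

```python
import json
import random
import tempfile
from pathlib import Path


def materialize(seed: int, num_layers: int) -> list[list[float]]:
    """Create all toy layer weights from a single RNG, independent of topology."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(4)] for _ in range(num_layers)]


def save_init_checkpoint(weights: list[list[float]], path: Path) -> None:
    """Persist the initialized weights once (JSON is a stand-in format)."""
    path.write_text(json.dumps(weights))


def load_stage(path: Path, stage_rank: int, num_stages: int) -> list[list[float]]:
    """Load only the layers owned by one pipeline stage."""
    weights = json.loads(path.read_text())
    per_stage = len(weights) // num_stages
    return weights[stage_rank * per_stage:(stage_rank + 1) * per_stage]


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "init_checkpoint.json"
    save_init_checkpoint(materialize(seed=42, num_layers=4), ckpt)

    # Reuse the same checkpoint under two different topologies.
    single = load_stage(ckpt, stage_rank=0, num_stages=1)
    two_stage = load_stage(ckpt, 0, num_stages=2) + load_stage(ckpt, 1, num_stages=2)

print(single == two_stage)
```

Because every later run reads from the saved file instead of re-running initialization, the weights no longer depend on how stages are assigned at runtime.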

Proposed CLI
modalities create_init_cp model_config.yaml
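
As a rough sketch of how such a subcommand might be wired up, the example below uses `argparse` from the standard library. The project's actual CLI framework and argument names may differ; only the command name `create_init_cp` and the positional config file come from the proposal above, and `config_file_path` is an assumed argument name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser exposing the proposed subcommand."""
    parser = argparse.ArgumentParser(prog="modalities")
    subparsers = parser.add_subparsers(dest="command", required=True)

    init_cp = subparsers.add_parser(
        "create_init_cp",
        help="Materialize the model from a config and save an initialized checkpoint.",
    )
    # Assumed argument name; the proposal only shows a positional YAML path.
    init_cp.add_argument("config_file_path")
    return parser


# Equivalent to running: modalities create_init_cp model_config.yaml
args = build_parser().parse_args(["create_init_cp", "model_config.yaml"])
print(args.command, args.config_file_path)
```

A real handler would then build the model from the config, run the initialization once, and write the checkpoint before any training component is constructed.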

Motivation

This would help with several use cases:

Topology-independent reproducibility.
Initial weights are created once and then reused, instead of being regenerated under each runtime topology.

Cleaner experiment setup.
It separates “how the model is initialized” from “how the model is trained”.

Better testing support.
Some unit and integration tests would benefit from a stable, precomputed initialized checkpoint instead of depending on runtime initialization behavior.

Easier debugging and benchmarking.
Multiple runs can start from the exact same initialized state without relying on implicit RNG behavior.
