Feature request
It would be useful to add a dedicated entry point for creating a checkpoint directly after model initialization, before training starts.
Today, seeded weight initialization is reproducible within a fixed topology, but it can still be topology-dependent when pipeline parallelism is involved. In particular, the same initialization seed can lead to different initialized weights when the parallelization strategy changes, for example between pure DP/FSDP and DP+PP. This is expected with the current approach, because pipeline stages intentionally use different effective seeds to avoid identical stage-local initialization.
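To make the topology dependence concrete, here is a minimal sketch of the described behavior. The function name and the seed-offset scheme are illustrative assumptions, not the actual modalities implementation; the point is only that stage-dependent effective seeds produce different initial weights for the same base seed:

```python
import random

def init_layer_weights(seed: int, stage: int, n: int = 4) -> list[float]:
    # Hypothetical illustration: each pipeline stage derives its own
    # effective seed so that stage-local layers are not initialized
    # identically. The offset scheme here is an assumption for
    # demonstration, not the real modalities seeding logic.
    rng = random.Random(seed + stage)
    return [rng.gauss(0.0, 0.02) for _ in range(n)]

base_seed = 42
# Pure DP/FSDP: the whole model effectively lives on one "stage".
dp_weights = init_layer_weights(base_seed, stage=0)
# DP+PP: the same layer may land on stage 1, so it sees a different
# effective seed and therefore different initial weights.
pp_weights = init_layer_weights(base_seed, stage=1)
assert dp_weights != pp_weights
```

The same base seed thus yields different initialized weights depending on where a layer lands in the pipeline, which is exactly the reproducibility gap the init checkpoint would close.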
A dedicated init-checkpoint workflow would provide a clean reproducibility boundary: initialize once, save once, and reuse that checkpoint across later runs and topologies.
Proposed CLI
modalities create_init_cp model_config.yaml
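A rough sketch of what the command could do, under the assumption that it initializes the model once from a single seed and persists the result before any training step. All names below are hypothetical placeholders for illustration, not the actual modalities API (real checkpoints would of course use the framework's checkpoint format rather than JSON):

```python
import json
import os
import random
import tempfile

def create_init_checkpoint(seed: int, n_params: int, path: str) -> None:
    # Hypothetical sketch: initialize all weights once from a single
    # seed, independent of any runtime topology, then save them.
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 0.02) for _ in range(n_params)]
    with open(path, "w") as f:
        json.dump({"seed": seed, "weights": weights}, f)

def load_init_checkpoint(path: str) -> list[float]:
    with open(path) as f:
        return json.load(f)["weights"]

# Initialize once, save once, reuse everywhere: any later run, under any
# parallelization strategy, loads the same weights instead of re-seeding.
path = os.path.join(tempfile.mkdtemp(), "init_cp.json")
create_init_checkpoint(seed=42, n_params=4, path=path)
assert load_init_checkpoint(path) == load_init_checkpoint(path)
```

Because the weights are materialized before any topology-specific seeding takes place, every subsequent run starts from a byte-identical initial state.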
Motivation
This would help with several use cases:
- Topology-independent reproducibility: initial weights are created once and then reused, instead of being regenerated under each runtime topology.
- Cleaner experiment setup: it separates “how the model is initialized” from “how the model is trained”.
- Better testing support: some unit and integration tests would benefit from a stable, precomputed initialized checkpoint instead of depending on runtime initialization behavior.
- Easier debugging and benchmarking: multiple runs can start from the exact same initialized state without relying on implicit RNG behavior.