Commit 83226f4
fix: auto-compute dp_replicate_size from world_size in ParallelismConfig
When dp_shard_size < world_size (e.g., dp_shard_size=4 on 8 GPUs),
ParallelismConfig raises "total_size does not match num_processes"
because dp_replicate_size defaults to 1.
Auto-compute dp_replicate_size = world_size // (dp_shard_size * cp_size)
so that intra-node FSDP2 sharding + inter-node data-parallel replication
works without requiring users to manually set dp_replicate_size.
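
A minimal sketch of the computation described above, not the actual diff. The helper name `infer_dp_replicate_size` is hypothetical, and the divisibility check is an added assumption: the commit message only specifies the floor division.

```python
def infer_dp_replicate_size(world_size: int, dp_shard_size: int, cp_size: int = 1) -> int:
    """Hypothetical helper: derive the replication degree from the world size.

    Mirrors the formula in the commit message:
        dp_replicate_size = world_size // (dp_shard_size * cp_size)
    """
    group_size = dp_shard_size * cp_size
    if group_size <= 0:
        raise ValueError("dp_shard_size and cp_size must be positive")
    # Assumption (not stated in the commit): reject layouts that do not
    # divide the world size evenly, instead of silently flooring.
    if world_size % group_size != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"dp_shard_size * cp_size = {group_size}"
        )
    return world_size // group_size


# The example from the commit message: dp_shard_size=4 on 8 GPUs.
print(infer_dp_replicate_size(world_size=8, dp_shard_size=4))  # 2
```

With `dp_shard_size=4` on 8 GPUs this gives two replicas of a 4-way sharded group, so the product of the sizes matches the number of processes again.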
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
1 parent 289a239 commit 83226f4
1 file changed
Lines changed: 8 additions & 1 deletion
[Diff body not recoverable. One hunk: original lines 212-219 become lines 212-226. Lines 215-219 and 221-223 are added; original line 216 is removed.]