Commit b02db5b
fix: correct gradient accumulation off-by-one and lr_scheduler over-stepping (#82)
* fix: correct gradient accumulation off-by-one and lr_scheduler over-stepping
* fix: align scheduler total_iters with optimizer steps under gradient accumulation
lr_scheduler total_iters was set to the micro-step count (total_steps), but
after moving lr_scheduler.step() to fire only on optimizer steps, the
scheduler would traverse only 1/backward_passes_per_step of its budget.
Divide total_iters by backward_passes_per_step so the full LR curve
(warmup + polynomial decay) completes over the actual optimizer steps.
No-op when backward_passes_per_step=1 (Stage-1).
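The fix described above can be sketched as follows. The variable names (total_steps, backward_passes_per_step, total_iters) follow the commit message, but the loop itself is a hypothetical illustration of the step-counting logic, not the repository's actual training code:

```python
# Sketch: with gradient accumulation, lr_scheduler.step() fires only on
# optimizer steps, so the scheduler's budget (total_iters) must be
# expressed in optimizer steps, not micro-steps (backward passes).

total_steps = 1000            # micro-step count (assumed example value)
backward_passes_per_step = 4  # gradient-accumulation factor (assumed)

# Before the fix: total_iters = total_steps, so the scheduler only
# covered 1/backward_passes_per_step of its LR curve.
# After the fix:
total_iters = total_steps // backward_passes_per_step

scheduler_steps = 0
for micro_step in range(1, total_steps + 1):
    # loss.backward() would run every micro-step; optimizer.step() and
    # lr_scheduler.step() run only on accumulation boundaries.
    if micro_step % backward_passes_per_step == 0:
        scheduler_steps += 1

# The scheduler now takes exactly total_iters steps, so the full
# warmup + polynomial-decay curve completes over the real optimizer steps.
assert scheduler_steps == total_iters
print(scheduler_steps)  # → 250
```

When backward_passes_per_step is 1, total_iters equals total_steps, which is why the change is a no-op for Stage-1.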
---------
Co-authored-by: Xiang An <anxiangsir@outlook.com>

1 parent 29826ef · commit b02db5b
1 file changed: +4 −4 lines changed

(Diff content not captured; three hunks were modified: around lines 350–357, 652–659, and 665–672.)