Commit 6052513
Pooya Moradi
get_optimizer: respect learning_rate_schedule_steps config knob
base.yml documents learning_rate_schedule_steps as the LR schedule shape
control ("By default the length of the schedule is set to the number of
steps", but configurable to a longer/different value). The post_train RL
get_optimizer ignored this knob and always used max_train_steps directly,
silently dropping any non-default value.
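The knob resolution described above can be sketched as follows. This is a minimal illustration and not the code from this commit: the helper name `resolve_schedule_steps` and the `SimpleNamespace` configs are hypothetical stand-ins, and the sentinel handling assumes that -1 or an unset value means "use max_train_steps".

```python
from types import SimpleNamespace


def resolve_schedule_steps(config):
    """Return the number of steps the LR schedule should span.

    Hypothetical helper: the real get_optimizer may structure this differently.
    """
    steps = getattr(config, "learning_rate_schedule_steps", -1)
    # Assumption: -1 (or a missing/None value) means "fall back to max_train_steps".
    if steps is None or steps == -1:
        return config.max_train_steps
    return steps


# Illustrative configs only; the field values are made up.
default_cfg = SimpleNamespace(max_train_steps=200)
sentinel_cfg = SimpleNamespace(max_train_steps=200, learning_rate_schedule_steps=-1)
custom_cfg = SimpleNamespace(max_train_steps=200, learning_rate_schedule_steps=500)

for cfg in (default_cfg, sentinel_cfg, custom_cfg):
    print(resolve_schedule_steps(cfg))
```

Keeping the fallback in one small function means the default and sentinel configs take the same path as before, while a non-default value is no longer dropped.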
This matters for GPU<->TPU recipe parity: when reproducing a GPU recipe
on TPU with a different NUM_BATCHES, the LR schedule shape must stay the
same (e.g., warmup=50, decay=500, like NeMo-RL's
lr_warmup_iters/lr_decay_iters) regardless of how many TPU steps are
run. Without this fix, the schedule stretches to the run length, so the
integrated LR scales linearly with NUM_BATCHES.
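To make the scaling claim concrete, here is a small pure-Python sketch. It assumes a linear-warmup plus cosine-decay schedule with warmup set to one tenth of the schedule length; the actual schedule used by get_optimizer may differ, and `warmup_cosine_lr` and `lr_curve` are hypothetical names introduced only for this illustration.

```python
import math


def warmup_cosine_lr(step, peak_lr, warmup_steps, schedule_steps):
    """Linear warmup to peak_lr, then cosine decay to 0 at schedule_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    if step >= schedule_steps:
        return 0.0
    progress = (step - warmup_steps) / (schedule_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


def lr_curve(num_train_steps, schedule_steps, peak_lr=1.0):
    """Per-step LR for a run of num_train_steps under a schedule of schedule_steps."""
    warmup_steps = schedule_steps // 10  # assumption: warmup is 10% of the schedule
    return [
        warmup_cosine_lr(s, peak_lr, warmup_steps, schedule_steps)
        for s in range(num_train_steps)
    ]


# Schedule length tied to run length (the pre-fix behavior as described above).
stretched_200 = lr_curve(200, schedule_steps=200)
stretched_400 = lr_curve(400, schedule_steps=400)

# Schedule shape pinned (warmup=50, decay=500), independent of run length.
pinned_200 = lr_curve(200, schedule_steps=500)
pinned_400 = lr_curve(400, schedule_steps=500)

print("stretched area ratio:", sum(stretched_400) / sum(stretched_200))
print("pinned curves agree on shared steps:", pinned_400[:200] == pinned_200)
```

In the stretched case, doubling the number of steps roughly doubles the area under the LR curve. In the pinned case, every step sees the same LR no matter how long the run is.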
Backward-compatible: default learning_rate_schedule_steps=-1 (or unset)
falls back to max_train_steps, identical to old behavior.
1 parent 493fba6 commit 6052513
1 file changed
Lines changed: 18 additions & 5 deletions
| Original file lines | Diff lines | Change |
|---|---|---|
| 531-533 | 531-533 | unchanged |
| 534 | | removed (1 line) |
| | 534-545 | added (12 lines) |
| 535-538 | 546-549 | unchanged |
| 539-542 | | removed (4 lines) |
| | 550-555 | added (6 lines) |
| 543-545 | 556-558 | unchanged |