Commit ff7eb45
Pooya Moradi
Plumb rl.loss_agg_mode to tunix GrpoConfig
tunix's GrpoConfig defaults loss_agg_mode to 'sequence-mean-token-mean',
but GPU NeMo-RL stacks use 'token-mean'. With group-normalized advantages
the two modes produce materially different losses, breaking GPU<->TPU
recipe parity.
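
A minimal sketch of why the two modes diverge, assuming the mode names carry their usual meaning: 'token-mean' averages over every unmasked token in the batch, while 'sequence-mean-token-mean' averages tokens within each sequence and then averages the per-sequence means. The helper names and the toy numbers below are illustrative only and are not taken from tunix or NeMo-RL.

```python
# Illustrative only: plain-Python versions of the two aggregation modes.
# Assumes every sequence has at least one unmasked token.

def token_mean(per_token_loss, mask):
    """Average the loss over all unmasked tokens in the batch."""
    total = sum(l * m
                for row_l, row_m in zip(per_token_loss, mask)
                for l, m in zip(row_l, row_m))
    count = sum(m for row in mask for m in row)
    return total / count


def sequence_mean_token_mean(per_token_loss, mask):
    """Average tokens within each sequence, then average across sequences."""
    seq_means = []
    for row_l, row_m in zip(per_token_loss, mask):
        n_tokens = sum(row_m)
        seq_means.append(sum(l * m for l, m in zip(row_l, row_m)) / n_tokens)
    return sum(seq_means) / len(seq_means)


# Two responses padded to length 6: a short one (2 tokens, loss 1.0 each)
# and a long one (6 tokens, loss 3.0 each).
losses = [
    [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
]
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

print(token_mean(losses, mask))                # 2.5: long response dominates
print(sequence_mean_token_mean(losses, mask))  # 2.0: each response weighs equally
```

The two results differ whenever response lengths vary within a batch: 'token-mean' weights long responses more heavily, while 'sequence-mean-token-mean' gives every response the same weight.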
Adds the field to the RL Pydantic schema, adds a default to rl.yml, and
passes it through to GrpoConfig construction so users can override it on the command line:
'rl.loss_agg_mode=token-mean'.
1 parent be4fd71 commit ff7eb45
3 files changed
Lines changed: 8 additions & 0 deletions
File 1 of 3 (name and diff content not preserved): @@ -54,6 +54,9 @@ 3 lines added at new lines 57-59
File 2 of 3 (name and diff content not preserved): @@ -1963,6 +1963,10 @@ 4 lines added at new lines 1966-1969
File 3 of 3 (name and diff content not preserved): @@ -571,6 +571,7 @@ 1 line added at new line 574