Commit 0d7759b
Pooya Moradi
Plumb rl.loss_agg_mode to tunix GrpoConfig
tunix's GrpoConfig defaults loss_agg_mode to 'sequence-mean-token-mean',
but GPU NeMo-RL stacks use 'token-mean'. With group-normalized advantages
the two modes produce materially different losses, breaking GPU<->TPU
recipe parity.
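
A minimal sketch of why the two modes diverge, assuming the mode names carry their usual meaning: 'token-mean' averages over every unmasked token in the batch, while 'sequence-mean-token-mean' averages tokens within each sequence and then averages across sequences. The function names and toy values below are illustrative only and are not taken from tunix or NeMo-RL. The sketch also assumes every sequence has at least one unmasked token.

```python
def token_mean(per_token_loss, mask):
    """Average over every unmasked token in the whole batch."""
    total = sum(
        loss * m
        for row_loss, row_mask in zip(per_token_loss, mask)
        for loss, m in zip(row_loss, row_mask)
    )
    count = sum(m for row_mask in mask for m in row_mask)
    return total / count


def sequence_mean_token_mean(per_token_loss, mask):
    """Average tokens within each sequence, then average the sequence means."""
    seq_means = []
    for row_loss, row_mask in zip(per_token_loss, mask):
        n_tokens = sum(row_mask)  # assumed > 0 for every sequence
        seq_means.append(sum(l * m for l, m in zip(row_loss, row_mask)) / n_tokens)
    return sum(seq_means) / len(seq_means)


# Toy group of two responses padded to length 4: a long one whose tokens all
# carry loss +1.0 and a short one whose single token carries loss -1.0,
# mimicking opposite-signed group-normalized advantages.
per_token_loss = [[1.0, 1.0, 1.0, 1.0], [-1.0, 0.0, 0.0, 0.0]]
mask = [[1, 1, 1, 1], [1, 0, 0, 0]]

print(token_mean(per_token_loss, mask))                # long response dominates
print(sequence_mean_token_mean(per_token_loss, mask))  # each response weighs equally
```

In this toy case the token-level average is pulled toward the longer response, whereas the sequence-level average weights both responses equally, so the same batch yields different loss values under the two modes.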
Adds the field to the RL Pydantic schema, sets a default in rl.yml, and
passes it through to GrpoConfig construction so users can override it on
the command line:
'rl.loss_agg_mode=token-mean'.
1 parent be4fd71 commit 0d7759b
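
The plumbing described above can be sketched roughly as follows. This is a hedged illustration, not the code from this commit: `RLSettings`, `GrpoConfigSketch`, `apply_overrides`, and `build_grpo_config` are hypothetical stand-ins written with plain dataclasses, and the default shown simply mirrors the tunix default named in the commit message.

```python
from dataclasses import dataclass


@dataclass
class RLSettings:
    """Hypothetical stand-in for the RL section of the config schema."""
    loss_agg_mode: str = "sequence-mean-token-mean"  # assumed default


@dataclass
class GrpoConfigSketch:
    """Hypothetical stand-in; NOT tunix's actual GrpoConfig."""
    loss_agg_mode: str = "sequence-mean-token-mean"


def apply_overrides(settings, argv):
    """Apply 'rl.loss_agg_mode=<value>' style command-line overrides."""
    for arg in argv:
        key, sep, value = arg.partition("=")
        if sep and key == "rl.loss_agg_mode":
            settings.loss_agg_mode = value
    return settings


def build_grpo_config(settings):
    """Pass the setting through explicitly instead of relying on the default."""
    return GrpoConfigSketch(loss_agg_mode=settings.loss_agg_mode)


settings = apply_overrides(RLSettings(), ["rl.loss_agg_mode=token-mean"])
config = build_grpo_config(settings)
print(config.loss_agg_mode)
```

The point of the explicit pass-through is that, without it, the downstream config silently keeps its own default regardless of what the user sets.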
3 files changed
Lines changed: 17 additions & 0 deletions
@@ -54,6 +54,12 @@ 6 additions at new lines 57-62
@@ -1963,6 +1963,16 @@ 10 additions at new lines 1966-1975
@@ -571,6 +571,7 @@ 1 addition at new line 574