feat: add --task-ids, --max-steps-per-episode, --max-new-tokens to standalone GRPO CLI
Without --task-ids, the trainer cycles through every task in --task-dir,
including hard ones (such as calc-formula) that base models can't complete.
You can now restrict training to specific tasks, for example: --task-ids custom-notepad-hello
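
As a rough illustration of the filtering behaviour described above, here is a minimal sketch. It is not the code from this commit: the function names `build_parser` and `select_tasks`, the space-separated form of --task-ids, and the defaults are all assumptions made for the example. Only the flag names come from the commit title.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the flag names are taken from the commit title.
    parser = argparse.ArgumentParser(description="Sketch of standalone GRPO CLI flags")
    parser.add_argument("--task-dir", required=True)
    # Assumption: task ids are passed space-separated; omitting the flag means "all tasks".
    parser.add_argument("--task-ids", nargs="*", default=None)
    parser.add_argument("--max-steps-per-episode", type=int, default=None)
    parser.add_argument("--max-new-tokens", type=int, default=None)
    return parser


def select_tasks(all_task_ids, wanted):
    """Return the tasks to train on, preserving the order of all_task_ids."""
    if not wanted:
        # No filter given: keep the old behaviour and cycle through everything.
        return list(all_task_ids)
    unknown = [task_id for task_id in wanted if task_id not in all_task_ids]
    if unknown:
        raise ValueError(f"unknown task ids: {unknown}")
    wanted_set = set(wanted)
    return [task_id for task_id in all_task_ids if task_id in wanted_set]
```

With this sketch, parsing `--task-dir tasks --task-ids custom-notepad-hello` and calling `select_tasks` on a directory containing both calc-formula and custom-notepad-hello would keep only custom-notepad-hello. Rejecting unknown ids up front is a design choice of the sketch, so that a typo does not silently produce an empty task list.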
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>