Commit fecf461
fix: increase max_new_tokens to 2048 and make configurable via GRPOConfig (#62)
* fix: align GRPO prompt format with SFT training format
The GRPO rollout prompt was missing the "Thought:" line and action
history that the SFT training uses. Models fine-tuned via SFT output
"Thought: ...\nAction: CLICK(...)" but the GRPO prompt didn't
prompt for this format, causing verbose free-form output that
couldn't be parsed → reward 0.0 → zero gradients.
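To make the mismatch concrete, here is a minimal sketch of a rollout prompt that asks for the SFT-style "Thought:" / "Action:" layout and includes the action history. The function name `build_prompt`, its parameters other than `action_history`, and the exact wording are assumptions for illustration. The repository's real prompt text is not shown in this commit page.

```python
from typing import List, Optional


def build_prompt(instruction: str, action_history: Optional[List[str]] = None) -> str:
    """Hypothetical prompt builder; wording and layout are illustrative only."""
    lines = [f"Task: {instruction}"]
    if action_history:
        # Step context: list the actions already taken in this episode.
        lines.append("Previous actions:")
        lines.extend(f"{i}. {a}" for i, a in enumerate(action_history, start=1))
    # Ask for the same two-line layout the SFT data uses.
    lines.append("Respond in this format:")
    lines.append("Thought: <reasoning>")
    lines.append("Action: <one action, e.g. CLICK(x, y)>")
    return "\n".join(lines)
```

Keeping the format request at the end of the prompt is a design choice of this sketch, not something the commit message specifies.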
Changes:
- Add "Thought:" and "Action:" prompt lines matching SFT format
- Add action_history parameter for step context
- Parser extracts action from "Action: ..." line before regex matching
- Parser handles JSON format {"action_type": "click", "coordinate": [x,y]}
- Debug logging of raw VLM output for zero-reward diagnosis
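The parser changes listed above might look roughly like the following sketch. It assumes a click-only action space and a plain dict as the return type. `parse_action` and both regexes are hypothetical, since the actual parser code is not visible here.

```python
import json
import re
from typing import Optional

# Hypothetical patterns; the real parser's regexes are not shown in this commit.
_ACTION_LINE = re.compile(r"^\s*Action:\s*(.+)$", re.MULTILINE)
_CLICK = re.compile(
    r"CLICK\(\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\)", re.IGNORECASE
)


def parse_action(raw: str) -> Optional[dict]:
    """Return {"type": "click", "x": ..., "y": ...} or None if nothing parses."""
    # Prefer the text after the last "Action:" line; otherwise use the whole output.
    matches = _ACTION_LINE.findall(raw)
    text = matches[-1].strip() if matches else raw.strip()

    # JSON form: {"action_type": "click", "coordinate": [x, y]}
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            obj = json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and obj.get("action_type") == "click":
            coord = obj.get("coordinate")
            if (
                isinstance(coord, list)
                and len(coord) == 2
                and all(isinstance(c, (int, float)) for c in coord)
            ):
                return {"type": "click", "x": float(coord[0]), "y": float(coord[1])}

    # Call form: CLICK(x, y)
    m = _CLICK.search(text)
    if m:
        return {"type": "click", "x": float(m.group(1)), "y": float(m.group(2))}
    return None
```

Returning `None` leaves the fallback behaviour to the caller, which is where logging the raw output for zero-reward diagnosis would naturally go.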
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: increase max_new_tokens to 2048 and make configurable
The default of 100 tokens truncated reasoning models mid-thought,
producing unparseable output → DONE → reward 0.0 → zero gradients.
Caused 4 failed training runs (~20 GPU-hours wasted).
- Add max_new_tokens to GRPOConfig (default 2048)
- Use config value instead of hardcoded 100
- Add truncation warning when generation hits the limit
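A minimal sketch of the configurable limit and the truncation warning, under stated assumptions: the `GRPOConfig` below is a stand-in that carries only the one field named in this commit, and `hit_token_limit` is a hypothetical helper. Treating "generated length reached the limit" as truncation is a heuristic, since a generation can also end naturally on the final allowed token.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class GRPOConfig:
    # Stand-in for the real config class; only the field from this commit is shown.
    max_new_tokens: int = 2048


def hit_token_limit(num_generated: int, config: GRPOConfig) -> bool:
    """Warn and return True when a generation used the full token budget."""
    truncated = num_generated >= config.max_new_tokens
    if truncated:
        logger.warning(
            "Generation reached max_new_tokens=%d; output may be truncated "
            "and fail to parse.",
            config.max_new_tokens,
        )
    return truncated
```

In a rollout loop, `num_generated` would be the count of newly generated tokens for one sample, excluding the prompt tokens.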
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 04e6e9f commit fecf461
2 files changed
Lines changed: 17 additions & 1 deletion
File 1 of 2 (file name not captured): 7 lines added at new lines 89-95.
File 2 of 2 (file name not captured): line 503 replaced; 9 lines added at new lines 520-528.
The content of the changed lines was not captured.