Commit 1d94899
fix: use build_agent_messages for TRL prompt + fix 4x over-generation (#247)
Two critical fixes:
1. Garbage output root cause: TRL constructed user messages differently
from the standalone trainer. Standalone wraps instruction with
"Goal:" prefix, format guidance, and {"type": "image"} placeholder.
TRL passed raw instruction text. Now imports build_agent_messages
from standalone.prompt so both paths produce identical messages.
2. 4x over-generation: batch_size=num_gen with padded dataset caused
4 identical prompts × 4 generations = 16 rollouts (standalone does 4).
Now: batch_size=1, generation_batch_size=num_gen. One unique prompt
per step with num_gen rollouts. No dataset padding needed.
Also adds one-time prompt logging for operator verification.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>1 parent 682b581 commit 1d94899
2 files changed
Lines changed: 37 additions & 45 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
415 | 415 | | |
416 | 416 | | |
417 | 417 | | |
| 418 | + | |
418 | 419 | | |
419 | 420 | | |
420 | 421 | | |
| |||
501 | 502 | | |
502 | 503 | | |
503 | 504 | | |
504 | | - | |
505 | | - | |
506 | | - | |
507 | | - | |
508 | | - | |
509 | | - | |
510 | | - | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
511 | 513 | | |
512 | 514 | | |
513 | 515 | | |
514 | 516 | | |
515 | 517 | | |
516 | 518 | | |
517 | 519 | | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
518 | 529 | | |
519 | 530 | | |
520 | 531 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
219 | 219 | | |
220 | 220 | | |
221 | 221 | | |
222 | | - | |
223 | | - | |
224 | | - | |
225 | | - | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
226 | 225 | | |
227 | | - | |
228 | | - | |
229 | | - | |
230 | | - | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
231 | 234 | | |
232 | | - | |
233 | 235 | | |
234 | 236 | | |
235 | 237 | | |
236 | | - | |
237 | | - | |
238 | | - | |
239 | | - | |
240 | | - | |
241 | | - | |
242 | | - | |
243 | | - | |
244 | | - | |
245 | 238 | | |
246 | 239 | | |
247 | 240 | | |
| |||
254 | 247 | | |
255 | 248 | | |
256 | 249 | | |
257 | | - | |
258 | | - | |
259 | | - | |
260 | | - | |
261 | | - | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
262 | 255 | | |
263 | 256 | | |
264 | | - | |
265 | | - | |
266 | | - | |
267 | | - | |
268 | | - | |
269 | | - | |
270 | | - | |
271 | | - | |
272 | | - | |
273 | | - | |
274 | | - | |
275 | | - | |
276 | | - | |
277 | | - | |
278 | | - | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
279 | 260 | | |
280 | 261 | | |
281 | 262 | | |
| |||
0 commit comments