Commit 321dcea
fix: configurable max_grad_norm, lower default lr, remove premature deprecation (#255)
Three changes based on client training results (grad_norm=101, 0.00 eval delta):
1. Add max_grad_norm to TrainingConfig (was hardcoded to 1.0). When
grad_norm >> max_grad_norm, gradients are clipped to a near-random
direction — training makes no progress despite non-zero loss.
Now warns when grad_norm > 10x the clip threshold.
2. Lower default learning_rate from 5e-6 to 1e-6. With grad_norm=101
and lr=5e-6, effective step size overshoots. lr=1e-6 with
max_grad_norm=1.0 gives stable updates.
3. Remove "standalone trainer is deprecated" warning. It was premature —
TRL's rollout_func doesn't support multimodal VLMs (issue #5120).
The standalone trainer is the production training path until TRL
PR #5323 merges.
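A minimal sketch of changes 1 and 2. The `TrainingConfig` shape and the `clip_gradients` helper are illustrative assumptions, not the repo's actual code; a PyTorch trainer would typically take the raw norm returned by `torch.nn.utils.clip_grad_norm_` rather than computing it by hand.

```python
import math
import warnings
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Field names follow the commit message; defaults reflect the new values.
    learning_rate: float = 1e-6   # lowered from 5e-6
    max_grad_norm: float = 1.0    # previously hardcoded, now configurable


def clip_gradients(grads, max_grad_norm=1.0):
    """Scale grads so their global L2 norm is at most max_grad_norm.

    Warns when the raw norm exceeds 10x the threshold: clipping that
    aggressively keeps only a small fraction of the gradient's magnitude,
    so updates barely move the weights and training appears to stall.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > 10 * max_grad_norm:
        warnings.warn(
            f"grad_norm={total_norm:.1f} is >10x max_grad_norm={max_grad_norm}; "
            "updates are heavily clipped and training may make no progress"
        )
    if total_norm > max_grad_norm:
        scale = max_grad_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm
```

With the reported grad_norm of 101 and max_grad_norm=1.0, every update is scaled by roughly 1/101, which is consistent with the observed 0.00 eval delta despite a non-zero loss.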
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent a5f290d
2 files changed: +22 −5 lines changed