1 | 1 | # CHANGELOG |
2 | 2 |
3 | 3 |
| 4 | +## v0.81.5 (2026-03-29) |
| 5 | + |
| 6 | +### Bug Fixes |
| 7 | + |
| 8 | +- Wire on_before_collect and on_rollout_complete callbacks through rollout_func |
| 9 | + ([#243](https://github.com/OpenAdaptAI/openadapt-evals/pull/243), |
| 10 | + [`fc40bf4`](https://github.com/OpenAdaptAI/openadapt-evals/commit/fc40bf40784482a20a49600dd95b151b1342d6b7)) |
| 11 | + |
| 12 | +* fix: add truncation warning to TRL generate paths |
| 13 | + |
| 14 | +Add a truncation check after both generation paths (Outlines constrained and HF unconstrained) in |
| 15 | + generate_fn. When the output length reaches max_new_tokens - 1, a warning is logged recommending |
| 16 | + increasing max_new_tokens or enabling constrained_decoding. This helps diagnose cases where the model |
| 17 | + generates excessively long reasoning that gets cut off before producing a parseable action. |
| 18 | + |
| 19 | +Also replaced the tautological truncation tests in test_trl_robustness.py (which reimplemented the |
| 20 | + check logic inline) with tests that exercise the actual generate_fn code path by calling it |
| 21 | + through the rollout function with mocked torch and model.generate. |
| 22 | + |
| 23 | +Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
| 24 | + |
| 25 | +* fix: wire on_before_collect and on_rollout_complete callbacks through rollout_func |
| 26 | + |
| 27 | +The GRPOTrainer wrapper accepted on_before_collect and on_rollout_complete callbacks but silently |
| 28 | + ignored them. HookBridge stored them but only implemented on_step_end (for on_step_complete). TRL |
| 29 | + has no pre-rollout callback event, so these must fire from within make_waa_rollout_func. |
| 30 | + |
| 31 | +Changes: - Add on_before_collect and on_rollout_complete params to make_waa_rollout_func - Fire |
| 32 | + on_before_collect(task_id, env) before each episode - Fire on_rollout_complete(rollout_dict, |
| 33 | + gen_idx) after each episode - Wrap both in try/except so broken callbacks cannot crash training - |
| 34 | + Pass callbacks from GRPOTrainer.train() to make_waa_rollout_func - Remove these two callbacks from |
| 35 | + HookBridge (keep only on_step_complete) |
| 36 | + |
| 37 | +--------- |
| 38 | + |
| 39 | +Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
| 40 | + |
| 41 | + |
4 | 42 | ## v0.81.4 (2026-03-29) |
5 | 43 |
6 | 44 | ### Bug Fixes |