Add intermediate eval hook: fire evaluate() every eval_interval outer steps
`eval_interval` was a silently-dead config: even though it's plumbed into
tunix's `RLTrainingConfig.eval_every_n_steps`, tunix's `_run_eval` is a
no-op unless an `eval_ds` is passed to `trainer.train()`. And even if you
do pass one, tunix's default GRPO eval re-runs the full sampled rollout
(num_generations responses per prompt), which is ~3hr/eval and impractical
for trajectory monitoring.
Install a `tunix.sft.hooks.TrainingHooks` subclass that hooks
`on_train_step_end`, checks whether `rl_cluster.global_steps` is a
multiple of `eval_interval`, and if so calls maxtext's own
`evaluate(...)` (greedy decode + the configured scoring pipeline). This
gives matched-step PRE / step_N / POST trajectory logging at near-zero
cost beyond the eval itself (which is already fast when
`eval_batch_size` is set, per commit d536d13).
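
A minimal sketch of the step-gating logic described above, not the code in this commit. To stay runnable without tunix installed it does not subclass `tunix.sft.hooks.TrainingHooks`; the class name `IntermediateEvalHook`, the `get_global_step` and `run_eval` callables, the `*args, **kwargs` callback signature, and the once-per-step guard are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict


class IntermediateEvalHook:
    """Hypothetical stand-in for a TrainingHooks subclass (base class omitted)."""

    def __init__(
        self,
        get_global_step: Callable[[], int],  # e.g. lambda: rl_cluster.global_steps
        eval_interval: int,
        run_eval: Callable[[], Any],  # e.g. a closure around evaluate(...)
    ) -> None:
        self._get_global_step = get_global_step
        self._eval_interval = eval_interval
        self._run_eval = run_eval
        self.results: Dict[str, Any] = {}

    def on_train_step_end(self, *args: Any, **kwargs: Any) -> None:
        # The real callback arguments are ignored here; their exact shape
        # is an assumption this sketch avoids depending on.
        if self._eval_interval <= 0:
            return
        step = self._get_global_step()
        if step % self._eval_interval != 0:
            return
        key = f"step_{step}"
        # Assumed guard: skip if the callback fires more than once while
        # the outer step counter is unchanged.
        if key in self.results:
            return
        self.results[key] = self._run_eval()


# Drive the hook with a fake step counter.
counter = {"step": 0}
demo_hook = IntermediateEvalHook(lambda: counter["step"], 2, lambda: {"score": 0.0})
for s in range(1, 5):
    counter["step"] = s
    demo_hook.on_train_step_end()
print(sorted(demo_hook.results))
```

Passing the step counter and the eval function as callables keeps the gating logic testable without a live `rl_cluster` or model.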
No-op when `eval_interval <= 0` or `num_test_batches <= 0`. Soft-skips
with a warning if `tunix.sft.hooks` isn't importable, so the launcher
still works against a stock-only tunix.
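
The two guards above could look roughly like the following sketch. The helper names `should_install_eval_hook` and `try_import_hooks` are hypothetical, and the warning text is illustrative; only the module path `tunix.sft.hooks` and the two config conditions come from this message.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional

log = logging.getLogger(__name__)


def should_install_eval_hook(eval_interval: int, num_test_batches: int) -> bool:
    """Return False in the no-op cases: nothing to schedule or nothing to score."""
    return eval_interval > 0 and num_test_batches > 0


def try_import_hooks(module_name: str = "tunix.sft.hooks") -> Optional[ModuleType]:
    """Import the hooks module, or warn and return None so the caller can skip."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package and a missing submodule both land here.
        log.warning(
            "%s is not importable; skipping intermediate eval hook", module_name
        )
        return None
```

A launcher would check `should_install_eval_hook(...)` first and attempt the import only when it returns True, so the disabled path never touches tunix.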