Skip to content

fix: use training-appropriate evaluate timeouts instead of reordering eval#246

Merged
abrichr merged 1 commit into
mainfrom
fix/revert-eval-reorder-use-timeouts
Mar 29, 2026
Merged

fix: use training-appropriate evaluate timeouts instead of reordering eval#246
abrichr merged 1 commit into
mainfrom
fix/revert-eval-reorder-use-timeouts

Conversation

@abrichr
Copy link
Copy Markdown
Member

@abrichr abrichr commented Mar 29, 2026

Summary

Reverts the evaluate_dense reordering from #245 and applies the correct fix: training-appropriate timeouts on the adapter.

Problem: #245 skipped binary eval entirely when local checks existed — too aggressive, loses the binary signal when port 5050 IS available.

Correct fix: 2-line change in trl_wrapper.py:

evaluate_timeout=15.0,  # was 180.0 (benchmark default)
evaluate_retries=1,      # was 3 (benchmark default)

The evaluate_dense logic stays correct (try binary first, local fallback, take max). Training speed comes from fast failure (15s vs 9 min), not from skipping evaluation paths.

Context Timeout Retries Worst case
Benchmarking 180s 3 9+ min (thorough)
Training (new) 15s 1 15s (fast feedback)

Test plan

  • test_binary_eval_called_first — binary always called
  • test_local_fallback_when_binary_returns_zero — local runs after binary fails
  • test_local_not_called_when_binary_succeeds — local skipped when binary works
  • 1471 passed in full suite

🤖 Generated with Claude Code

… eval

Reverts the evaluate_dense reordering from #245 (local-first was too
aggressive — skipped binary eval entirely, losing the signal when 5050
IS available).

The actual fix: set evaluate_timeout=15s and evaluate_retries=1 on the
WAALiveAdapter in the TRL wrapper. The evaluate_dense logic stays
correct (try binary first, local fallback, take max). Training speed
comes from fast failure, not from skipping evaluation paths.

- Benchmarking: 180s timeout, 3 retries (thorough, one-shot)
- Training: 15s timeout, 1 retry (fast feedback, thousands of evals)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abrichr abrichr merged commit 114ad0e into main Mar 29, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant