fix: use training-appropriate evaluate timeouts instead of reordering eval#246
Merged
Merged
Conversation
… eval Reverts the evaluate_dense reordering from #245 (local-first was too aggressive — skipped binary eval entirely, losing the signal when 5050 IS available). The actual fix: set evaluate_timeout=15s and evaluate_retries=1 on the WAALiveAdapter in the TRL wrapper. The evaluate_dense logic stays correct (try binary first, local fallback, take max). Training speed comes from fast failure, not from skipping evaluation paths. - Benchmarking: 180s timeout, 3 retries (thorough, one-shot) - Training: 15s timeout, 1 retry (fast feedback, thousands of evals) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Reverts the evaluate_dense reordering from #245 and applies the correct fix: training-appropriate timeouts on the adapter.
Problem: #245 skipped binary eval entirely when local checks existed — too aggressive, loses the binary signal when port 5050 IS available.
Correct fix: 2-line change in
trl_wrapper.py:The
evaluate_denselogic stays correct (try binary first, local fallback, take max). Training speed comes from fast failure (15s vs 9 min), not from skipping evaluation paths.Test plan
test_binary_eval_called_first— binary always calledtest_local_fallback_when_binary_returns_zero— local runs after binary failstest_local_not_called_when_binary_succeeds— local skipped when binary works🤖 Generated with Claude Code