fix: race conditions, re-staging bug, and parallel evaluator test suite #2127
Closed
KRRT7 wants to merge 4 commits into
This was referenced May 7, 2026
Covers the full stack: pool lifecycle/cleanup, file isolation between slots, subprocess stdout/stderr/timeout, and evaluator logic (failure with diffs, success routing, concurrent multi-candidate).
… evaluator

Critical fixes from code review:
- Deadlock: slots are now released after behavioral tests (Phase 1) and re-acquired for benchmarking (Phase 2). Previously, holding slots across phases caused a deadlock when passes >= pool_size.
- Pydantic ValidationError: behavior_test_results is now stored in _BehavioralPass and passed through to OptimizedCandidateResult.
- Slot leak on cancellation: catch BaseException in _behavioral_phase.

WorktreePool improvements:
- Graceful partial creation failure (one slot failing doesn't crash the pool).
- Cleanup resilience (one rmtree failure doesn't abort the others).
- Stream lifecycle: close send/receive in cleanup().
- Async-safe: use anyio.Path for exists() checks.
- Python 3.12+: use onexc instead of the deprecated onerror for rmtree.
- Remove dead code: PID file, unused restore_file method.

Other fixes:
- _run_line_profiler_for_winner: catch all exceptions.
- _dispatch_repair_if_possible: skip when diffs are empty.
- aiservice.py: pass language to _get_valid_candidates in the batch path.
- Remove unused AIServiceBatchRefinerRequest dataclass.
- Fix result file path collision: include slot.index in the filename.
- Remove _code_replace_lock (no longer needed since slots are released immediately and _replace_and_capture is serialized by the GIL).
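The two rmtree changes above (onexc on Python 3.12+, and one failure not aborting the rest of cleanup) can be sketched with the standard library alone. This is a minimal sketch, assuming each slot is a plain directory on disk; remove_tree and cleanup_slots are hypothetical names, not the PR's WorktreePool API.

```python
from __future__ import annotations

import shutil
import sys
from pathlib import Path


def remove_tree(path: Path) -> list[str]:
    """Delete `path` recursively, collecting per-entry errors instead of raising."""
    errors: list[str] = []

    def record(func, failed_path, exc: BaseException) -> None:
        errors.append(f"{getattr(func, '__name__', func)} {failed_path}: {exc!r}")

    if sys.version_info >= (3, 12):
        # 3.12+: `onexc` receives the exception instance; `onerror` is deprecated.
        shutil.rmtree(path, onexc=record)
    else:
        # Pre-3.12: `onerror` receives a (type, value, traceback) tuple.
        shutil.rmtree(path, onerror=lambda func, p, info: record(func, p, info[1]))
    return errors


def cleanup_slots(slot_dirs: list[Path]) -> dict[Path, list[str]]:
    """Try every slot directory, continuing past failures; return errors per slot."""
    failures: dict[Path, list[str]] = {}
    for slot_dir in slot_dirs:
        errors = remove_tree(slot_dir)
        if errors:
            failures[slot_dir] = errors
    return failures
```

Because the error handler records instead of raising, a slot that cannot be removed shows up in the returned mapping while the remaining slots are still deleted.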
…ession test
- Parallel path now checks if a successful candidate was previously refined (via path_to_root ancestry). If so, it dispatches adaptive optimization instead of batch refinement, matching sequential behavior.
- Adds regression test: 6 candidates with pool_size=2 all pass, proving no deadlock occurs when passes exceed available slots.
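The release-then-reacquire pattern from the deadlock fix, exercised with the same shape as the regression test (6 candidates, pool_size=2), can be sketched as below. This is a hypothetical model, not the PR's code: it swaps in an asyncio.Queue for the pool (the PR's pool appears to use anyio streams), simulates behavioral tests and benchmarking with asyncio.sleep(0), and relies on try/finally rather than an explicit BaseException handler to return slots on cancellation. Slot, SlotPool, behavioral_phase, benchmark_phase, and evaluate_all are illustrative names.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class Slot:
    index: int
    staged_code: str | None = None  # candidate source currently written into this worktree


class SlotPool:
    """Hypothetical fixed-size pool of worktree slots backed by an asyncio.Queue."""

    def __init__(self, size: int) -> None:
        self._free: asyncio.Queue[Slot] = asyncio.Queue()
        for index in range(size):
            self._free.put_nowait(Slot(index))

    async def acquire(self) -> Slot:
        return await self._free.get()  # waits until some slot is released

    def release(self, slot: Slot) -> None:
        slot.staged_code = None  # wipe the slot's state before it is reused
        self._free.put_nowait(slot)


async def behavioral_phase(pool: SlotPool, candidate: str) -> bool:
    slot = await pool.acquire()
    try:
        slot.staged_code = candidate
        await asyncio.sleep(0)  # stand-in for running the behavioral tests
        return not candidate.startswith("bad")  # toy pass/fail rule
    finally:
        # finally also runs on cancellation, so the slot is returned either way.
        pool.release(slot)


async def benchmark_phase(pool: SlotPool, candidate: str) -> tuple[str, int]:
    slot = await pool.acquire()  # a fresh slot, possibly not the one used in Phase 1
    try:
        slot.staged_code = candidate  # re-stage: the fresh slot does not hold the candidate yet
        await asyncio.sleep(0)  # stand-in for benchmarking the staged code
        return slot.staged_code, slot.index
    finally:
        pool.release(slot)


async def evaluate_all(candidates: list[str], pool_size: int) -> list[tuple[str, int]]:
    pool = SlotPool(pool_size)
    # Phase 1: each candidate holds a slot only while its behavioral tests run.
    verdicts = await asyncio.gather(*(behavioral_phase(pool, c) for c in candidates))
    passes = [c for c, ok in zip(candidates, verdicts) if ok]
    # Phase 2: passing candidates re-acquire slots, so passes > pool_size cannot deadlock.
    return await asyncio.gather(*(benchmark_phase(pool, c) for c in passes))
```

For example, asyncio.run(asyncio.wait_for(evaluate_all([f"cand{i}" for i in range(6)], pool_size=2), timeout=5)) completes with six benchmark results. Wrapping the call in wait_for means a regression back into the deadlock surfaces as a TimeoutError rather than a hung test run.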
- Add replace_lock to serialize main-tree access in _replace_and_capture
- Fix Phase 2 benchmark not writing candidate code to fresh worktree slot
- Add _closed flag and ClosedResourceError suppression in pool release
- Broaden exception handling and protect finally restore block
- Remove unused eval_ctx/exp_type params from run_parallel_evaluation
- Add tests for re-staging, partial pool init, restore-on-failure, empty candidates
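Serializing access to the shared main tree, with the original contents restored in a finally block, might look like the following minimal sketch. It assumes the main tree can be modeled as a single file and that callers are asyncio tasks; MainTree and replace_and_capture are hypothetical stand-ins for the PR's _replace_and_capture, and the ValueError on empty code is an invented failure used only to exercise the restore path.

```python
import asyncio
from pathlib import Path


class MainTree:
    """Hypothetical stand-in for the shared main working tree (one file here)."""

    def __init__(self, target: Path) -> None:
        self.target = target
        self._replace_lock = asyncio.Lock()  # only one task may mutate the tree at a time

    async def replace_and_capture(self, new_code: str) -> str:
        """Write `new_code`, capture the resulting contents, then restore the original."""
        async with self._replace_lock:
            original = self.target.read_text()
            try:
                self.target.write_text(new_code)
                await asyncio.sleep(0)  # yield point where, without the lock, another task could write
                captured = self.target.read_text()
                if not captured.strip():
                    raise ValueError("empty candidate")  # invented failure to exercise restore
                return captured
            finally:
                # Runs on success, error, or cancellation, so the main tree is never left modified.
                self.target.write_text(original)
```

Without the lock, two tasks could interleave at the await, and one would capture (or restore over) the other's code.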
fixes 3 bugs found during testing + adds the full test suite (15 tests).
bugs fixed:
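- _replace_lock serializes concurrent _replace_and_capture calls that mutate the main tree
- Phase 2 benchmarking now writes the candidate code into its fresh worktree slot (re-staging)
- pool release checks a _closed flag and suppresses ClosedResourceError

also: broadened exception handling, protected the finally restore block, removed dead params.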
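The _closed flag fix can be modeled without anyio. Assuming the pool returns slots through an anyio memory object stream, sending into a stream that cleanup() has already closed raises anyio's ClosedResourceError; in this hypothetical sketch an early _closed check in release() plays that suppression role. SlotPool and its methods are illustrative, not the PR's WorktreePool.

```python
import asyncio


class SlotPool:
    """Hypothetical pool whose release() is safe to call after cleanup()."""

    def __init__(self, size: int) -> None:
        self._free: asyncio.Queue[int] = asyncio.Queue()
        for index in range(size):
            self._free.put_nowait(index)
        self._closed = False

    async def acquire(self) -> int:
        if self._closed:
            raise RuntimeError("pool is closed")
        return await self._free.get()

    def release(self, index: int) -> None:
        # A task finishing after cleanup() must not crash while returning its slot,
        # so releasing into a closed pool is a silent no-op.
        if self._closed:
            return
        self._free.put_nowait(index)

    def cleanup(self) -> None:
        self._closed = True
        while not self._free.empty():  # drop any slots still waiting in the pool
            self._free.get_nowait()
```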
depends on #2126 (integration).
this is PR 4/4 in a stack. review and merge in order: