fix: race conditions, re-staging bug, and parallel evaluator test suite #2127

Closed
KRRT7 wants to merge 4 commits into parallel-eval/03-integration from parallel-eval/04-tests-and-fixes

KRRT7 (Contributor) commented on May 7, 2026

fixes 3 bugs found during testing + adds the full test suite (15 tests).

bugs fixed:

  • race condition: _replace_lock serializes concurrent _replace_and_capture calls that mutate the main tree
  • re-staging bug: Phase 2 acquired a fresh slot but never wrote the candidate code to it, so benchmarks ran against the original code
  • cleanup crash: tasks releasing slots after the pool streams were closed raised ClosedResourceError
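
The race-condition fix can be illustrated with a minimal sketch. It assumes an asyncio-style lock and an in-memory stand-in for the main tree; the names `main_tree`, `replace_and_capture`, and `run_all` are hypothetical and are not taken from the PR's code.

```python
import asyncio

# Hypothetical stand-in for the main source tree: one "file" held in memory.
main_tree = {"module.py": "original"}


async def replace_and_capture(lock: asyncio.Lock, candidate_code: str) -> str:
    """Swap candidate code into the shared tree, read it back, then restore."""
    async with lock:  # only one task may mutate the shared tree at a time
        saved = main_tree["module.py"]
        try:
            main_tree["module.py"] = candidate_code
            await asyncio.sleep(0)  # yield point where another task could interleave
            return main_tree["module.py"]
        finally:
            main_tree["module.py"] = saved  # always restore the original content


async def run_all(candidates: list[str]) -> list[str]:
    lock = asyncio.Lock()  # created inside the running event loop
    return await asyncio.gather(*(replace_and_capture(lock, c) for c in candidates))


captured = asyncio.run(run_all(["cand-a", "cand-b", "cand-c"]))
```

Without the lock, a second task could overwrite the file during the yield point, so the first task would capture another candidate's code and restore the wrong content.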

also: broadened exception handling, protected finally restore block, removed dead params.

depends on #2126 (integration).

this is PR 4/4 in a stack. review and merge in order:

  1. feat: parallel evaluation infrastructure (worktree pool, async subprocess, shared types) #2124 — infrastructure
  2. feat: two-phase parallel candidate evaluator and batch_refine API #2125 — evaluator algorithm
  3. feat: integrate parallel evaluator into FunctionOptimizer #2126 — optimizer integration
  4. fix: race conditions, re-staging bug, and parallel evaluator test suite #2127 (this) — bug fixes + tests

KRRT7 added 4 commits May 6, 2026 20:53
Covers the full stack: pool lifecycle/cleanup, file isolation between
slots, subprocess stdout/stderr/timeout, and evaluator logic (failure
with diffs, success routing, concurrent multi-candidate).
… evaluator

Critical fixes from code review:
- Deadlock: slots are now released after behavioral tests (Phase 1),
  re-acquired for benchmarking (Phase 2). Previously, holding slots
  across phases caused deadlock when passes >= pool_size.
- Pydantic ValidationError: behavior_test_results is now stored in
  _BehavioralPass and passed through to OptimizedCandidateResult.
- Slot leak on cancellation: catch BaseException in _behavioral_phase.
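
A rough sketch of the release-then-re-acquire pattern described above, using a stdlib `asyncio.Queue` as a stand-in for the worktree pool. The function names and the dictionary-based slot are assumptions made for illustration, not the PR's implementation.

```python
import asyncio


async def behavioral_phase(pool: asyncio.Queue, candidate: str) -> bool:
    """Phase 1: hold a slot only for the duration of the behavioral tests."""
    slot = await pool.get()
    try:
        slot["code"] = candidate
        await asyncio.sleep(0)  # placeholder for running behavioral tests
        return True  # every candidate passes in this sketch
    finally:
        pool.put_nowait(slot)  # released even on cancellation, before Phase 2


async def benchmark_phase(pool: asyncio.Queue, candidate: str) -> tuple[int, str]:
    """Phase 2: take a fresh slot and write the candidate code to it again."""
    slot = await pool.get()
    try:
        slot["code"] = candidate  # re-stage: the fresh slot may hold other code
        await asyncio.sleep(0)  # placeholder for running benchmarks
        return slot["index"], slot["code"]
    finally:
        pool.put_nowait(slot)


async def evaluate(candidates: list[str], pool_size: int) -> list[tuple[int, str]]:
    pool: asyncio.Queue = asyncio.Queue()
    for index in range(pool_size):
        pool.put_nowait({"index": index, "code": "original"})
    passed = await asyncio.gather(*(behavioral_phase(pool, c) for c in candidates))
    survivors = [c for c, ok in zip(candidates, passed) if ok]
    return await asyncio.gather(*(benchmark_phase(pool, c) for c in survivors))


candidates = [f"cand-{n}" for n in range(6)]
# The timeout turns a deadlock into a visible failure instead of a hang.
results = asyncio.run(asyncio.wait_for(evaluate(candidates, pool_size=2), timeout=5))
```

Because no task holds a slot while waiting for the next phase, six passing candidates can share two slots without blocking each other.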

WorktreePool improvements:
- Graceful partial creation failure (one slot failing doesn't crash pool).
- Cleanup resilience (one rmtree failure doesn't abort others).
- Stream lifecycle: close send/receive in cleanup().
- Async-safe: use anyio.Path for exists() checks.
- Python 3.12+: use onexc instead of deprecated onerror for rmtree.
- Remove dead code: PID file, unused restore_file method.
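
The cleanup-resilience and `onexc` items can be sketched as follows. `shutil.rmtree` accepts `onexc` from Python 3.12 and deprecates `onerror` from that version; the helper names `remove_tree` and `cleanup_all` and the temporary directory layout are hypothetical.

```python
import shutil
import sys
import tempfile
from pathlib import Path


def remove_tree(path: Path) -> bool:
    """Remove a directory tree, returning False if any error was reported."""
    errors: list[BaseException] = []
    if sys.version_info >= (3, 12):
        # onexc receives the exception instance directly.
        shutil.rmtree(path, onexc=lambda func, p, exc: errors.append(exc))
    else:
        # onerror receives a sys.exc_info() tuple instead.
        shutil.rmtree(path, onerror=lambda func, p, info: errors.append(info[1]))
    return not errors


def cleanup_all(paths: list[Path]) -> list[Path]:
    """Try every path; one failure must not stop the remaining removals."""
    failed = []
    for path in paths:
        try:
            if not remove_tree(path):
                failed.append(path)
        except Exception:
            failed.append(path)
    return failed


base = Path(tempfile.mkdtemp())
slots = [base / "slot0", base / "missing", base / "slot2"]
slots[0].mkdir()
slots[2].mkdir()
failed = cleanup_all(slots)  # the missing directory fails; the others are removed
shutil.rmtree(base, ignore_errors=True)
```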

Other fixes:
- _run_line_profiler_for_winner: catch all exceptions.
- _dispatch_repair_if_possible: skip when diffs are empty.
- aiservice.py: pass language to _get_valid_candidates in batch path.
- Remove unused AIServiceBatchRefinerRequest dataclass.
- Fix result file path collision: include slot.index in filename.
- Remove _code_replace_lock (no longer needed since slots are released
  immediately and _replace_and_capture is serialized by GIL).
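
As a small illustration of the result-file collision fix, the sketch below derives one file name per slot. The function name, directory, and file name pattern are assumptions; the PR only states that slot.index is included in the filename.

```python
from pathlib import Path


def result_path(results_dir: Path, slot_index: int) -> Path:
    """Build a per-slot result file path so parallel runs never share a file."""
    return results_dir / f"benchmark_results_{slot_index}.json"  # hypothetical name


paths = [result_path(Path("results"), index) for index in range(3)]
```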
…ession test

- Parallel path now checks if a successful candidate was previously
  refined (via path_to_root ancestry). If so, dispatches adaptive
  optimization instead of batch refinement — matching sequential behavior.
- Adds regression test: 6 candidates with pool_size=2 all pass, proving
  no deadlock occurs when passes exceed available slots.
- Add replace_lock to serialize main-tree access in _replace_and_capture
- Fix Phase 2 benchmark not writing candidate code to fresh worktree slot
- Add _closed flag and ClosedResourceError suppression in pool release
- Broaden exception handling and protect finally restore block
- Remove unused eval_ctx/exp_type params from run_parallel_evaluation
- Add tests for re-staging, partial pool init, restore-on-failure, empty candidates
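
The `_closed` flag and error suppression in pool release might look roughly like this. The sketch is stdlib-only: `PoolClosedError` and `FakeStream` are hypothetical stand-ins for a closed-stream error such as anyio's ClosedResourceError and for the pool's stream, and `SlotPool` is not the PR's WorktreePool.

```python
class PoolClosedError(Exception):
    """Stand-in for a closed-stream error such as anyio's ClosedResourceError."""


class FakeStream:
    """Minimal stream that refuses sends once closed."""

    def __init__(self) -> None:
        self.items: list[int] = []
        self.closed = False

    def send_nowait(self, item: int) -> None:
        if self.closed:
            raise PoolClosedError("stream is closed")
        self.items.append(item)

    def close(self) -> None:
        self.closed = True


class SlotPool:
    def __init__(self, size: int) -> None:
        self._stream = FakeStream()
        self._closed = False
        for index in range(size):
            self._stream.send_nowait(index)

    def acquire(self) -> int:
        return self._stream.items.pop(0)

    def release(self, slot: int) -> bool:
        """Return the slot to the pool; a release after cleanup is a no-op."""
        if self._closed:
            return False  # fast path: the pool was already cleaned up
        try:
            self._stream.send_nowait(slot)
        except PoolClosedError:
            return False  # the stream closed between the check and the send
        return True

    def cleanup(self) -> None:
        self._closed = True
        self._stream.close()


pool = SlotPool(2)
held = pool.acquire()
pool.cleanup()
late_release = pool.release(held)  # does not raise after cleanup
```
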
KRRT7 force-pushed the parallel-eval/04-tests-and-fixes branch from ee15f84 to d0586fd on May 7, 2026 01:53
KRRT7 force-pushed the parallel-eval/03-integration branch from 31be51f to a47cfad on May 7, 2026 01:53
KRRT7 closed this on May 7, 2026