
feat: two-phase parallel candidate evaluator and batch_refine API #2125

Closed
KRRT7 wants to merge 3 commits into parallel-eval/01-infrastructure from parallel-eval/02-evaluator


KRRT7 (Contributor) commented May 7, 2026

This PR adds the core evaluation algorithm:

  • ParallelCandidateEvaluator, a two-phase design: Phase 1 runs behavioral correctness tests concurrently (one worktree slot per candidate); Phase 2 runs benchmarks sequentially so that timing measurements are not skewed by CPU contention
  • a batch_refine endpoint on AiServiceClient for submitting multiple candidates for refinement in a single request
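The two-phase flow can be sketched with asyncio. This is a minimal illustration, not the actual ParallelCandidateEvaluator: it assumes tests are run by an async callable and benchmarks by a synchronous one, and the names Candidate, evaluate_two_phase, run_tests, benchmark and slots are hypothetical. An asyncio.Semaphore stands in for the worktree slot pool.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Candidate:
    # Hypothetical candidate record: an id plus the code to evaluate.
    cid: str
    code: str


async def evaluate_two_phase(
    candidates: list[Candidate],
    run_tests: Callable[[Candidate], Awaitable[bool]],
    benchmark: Callable[[Candidate], object],
    slots: int,
) -> dict[str, float]:
    """Return wall-clock benchmark time (seconds) for each candidate that passes."""
    # A semaphore stands in for the worktree pool: at most `slots` test runs at once.
    sem = asyncio.Semaphore(slots)

    async def phase1(c: Candidate) -> tuple[Candidate, bool]:
        async with sem:
            return c, await run_tests(c)

    # Phase 1 (concurrent): behavioral tests for every candidate.
    results = await asyncio.gather(*(phase1(c) for c in candidates))
    passing = [c for c, ok in results if ok]

    # Phase 2 (sequential): benchmark survivors one at a time, after all tests
    # have finished, so measurements do not compete with each other for CPU.
    timings: dict[str, float] = {}
    for c in passing:
        start = time.perf_counter()
        benchmark(c)
        timings[c.cid] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    async def fake_tests(c: Candidate) -> bool:
        await asyncio.sleep(0.01)  # simulate a subprocess test run
        return int(c.cid[1:]) % 2 == 0  # pretend odd-numbered candidates fail

    cands = [Candidate(f"c{i}", "") for i in range(4)]
    timings = asyncio.run(
        evaluate_two_phase(cands, fake_tests, lambda c: sum(range(10_000)), slots=2)
    )
    print(sorted(timings))  # ['c0', 'c2']
```

Gathering every Phase 1 result before starting Phase 2 is the key design choice in this sketch: no benchmark ever overlaps with a test run or another benchmark.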

Depends on #2124 (infrastructure).

This is PR 2/4 in a stack. Review and merge in order:

  1. feat: parallel evaluation infrastructure (worktree pool, async subprocess, shared types) #2124 — infrastructure
  2. feat: two-phase parallel candidate evaluator and batch_refine API #2125 (this) — evaluator algorithm
  3. feat: integrate parallel evaluator into FunctionOptimizer #2126 — optimizer integration
  4. fix: race conditions, re-staging bug, and parallel evaluator test suite #2127 — bug fixes + tests

KRRT7 added 3 commits May 6, 2026 20:52

Phase 1 (concurrent): behavioral correctness tests run in parallel. Failed candidates release their worktree slot immediately.
Phase 2 (sequential): only passing candidates get benchmarked, one at a time, for accurate timing without CPU contention.

EvalFailure carries test diffs for repair context.

Adds the API method for submitting multiple candidates for refinement in a single request, used by the parallel evaluator to dispatch refinement/repair after evaluation completes.
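A hedged sketch of what a batching method might look like: one request carries every candidate instead of one round trip per candidate. Only the batching idea comes from the commit message; the RefineItem fields, the "/refine/batch" path, the JSON payload shape, and the injected post callable are all hypothetical, and BatchRefineClient is an illustrative stand-in rather than AiServiceClient.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable


@dataclass
class RefineItem:
    # Hypothetical per-candidate payload: the candidate's code plus repair context.
    candidate_id: str
    code: str
    test_diffs: list[str]


class BatchRefineClient:
    """Illustrative stand-in for a client exposing a batch_refine-style method."""

    def __init__(self, post: Callable[[str, str], dict[str, Any]]) -> None:
        # `post(path, json_body)` is injected so the sketch runs without a network.
        self._post = post

    def batch_refine(self, items: list[RefineItem]) -> dict[str, str]:
        if not items:
            return {}  # nothing to send, so skip the round trip entirely
        body = json.dumps({"candidates": [asdict(i) for i in items]})
        response = self._post("/refine/batch", body)  # one request for all items
        # Assumed response shape: one refined code string per candidate id.
        return {r["candidate_id"]: r["code"] for r in response["results"]}
```

Batching this way means the evaluator pays the request overhead once per evaluation round rather than once per candidate.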

dataclass(slots=True) requires Python 3.10+.

KRRT7 force-pushed the parallel-eval/02-evaluator branch from ec123fd to 22736d3 May 7, 2026 01:53
KRRT7 closed this May 7, 2026