Skip to content

scripts(evals): run evals in parallel#4180

Open
aconchillo wants to merge 1 commit into
mainfrom
aleix/parallel-evals
Open

scripts(evals): run evals in parallel#4180
aconchillo wants to merge 1 commit into
mainfrom
aleix/parallel-evals

Conversation

@aconchillo
Copy link
Copy Markdown
Contributor

Please describe the changes in your PR. If it is addressing an issue, please reference that as well.

@aconchillo aconchillo requested a review from markbackman March 27, 2026 22:37
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@JiwaniZakir JiwaniZakir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In run_eval, the Daily room is created near the top of the function but deleted only at the bottom — outside any try/finally block. If an exception escapes after room creation (e.g., during task setup or the asyncio.wait_for call), the room will leak. Wrapping the body after room = await helper.create_room(...) in a try/finally that always calls delete_room_by_name would make the cleanup unconditional.

Additionally, two separate aiohttp.ClientSession instances are opened per eval (one for creation, one for deletion). Since these sessions are very short-lived and the client is stateless, this works correctly, but it's worth either reusing a single session within run_eval or sharing a session at the EvalRunner level across all concurrent evals to reduce connection overhead when many evals run in parallel.

The refactor to _create_eval_callbacks returning closures over a per-eval result_future is clean and correctly removes the shared self._result_future state that would have caused race conditions under parallel execution. The result_future.done() guard in assert_eval is a good addition for the case where both EndFrame and CancelFrame could fire close together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants