scripts(evals): run evals in parallel#4180
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 🚀 New features to boost your workflow:
|
JiwaniZakir
left a comment
There was a problem hiding this comment.
In run_eval, the Daily room is created near the top of the function but deleted only at the bottom — outside any try/finally block. If an exception escapes after room creation (e.g., during task setup or the asyncio.wait_for call), the room will leak. Wrapping the body after room = await helper.create_room(...) in a try/finally that always calls delete_room_by_name would make the cleanup unconditional.
Additionally, two separate aiohttp.ClientSession instances are opened per eval (one for creation, one for deletion). Since these sessions are very short-lived and the client is stateless, this works correctly, but it's worth either reusing a single session within run_eval or sharing a session at the EvalRunner level across all concurrent evals to reduce connection overhead when many evals run in parallel.
The refactor to _create_eval_callbacks returning closures over a per-eval result_future is clean and correctly removes the shared self._result_future state that would have caused race conditions under parallel execution. The result_future.done() guard in assert_eval is a good addition for the case where both EndFrame and CancelFrame could fire close together.
Please describe the changes in your PR. If it is addressing an issue, please reference that as well.