feat(evaluators): add native run_async support to LLMEvaluator #11581

Merged
sjrl merged 7 commits into deepset-ai:main from GovindhKishore:feat/llm-evaluator-run-async
Jun 15, 2026

Conversation

@GovindhKishore

@GovindhKishore GovindhKishore commented Jun 10, 2026

Contributor

Related Issues

Proposed Changes:

Added native asynchronous support (run_async) to LLMEvaluator, FaithfulnessEvaluator, and ContextRelevanceEvaluator. This allows evaluation pipelines to run concurrently inside asynchronous environments (like FastMCP or FastAPI) without stalling the main event loop during LLM network requests.

How it works:

  • Runtime check for async support: Verifies whether the assigned chat generator natively supports async execution. If it only supports synchronous execution, the evaluator falls back to running the synchronous call in a worker thread via asyncio.to_thread instead of blocking the event loop.
  • Post-processing loop mirroring: FaithfulnessEvaluator and ContextRelevanceEvaluator explicitly mirror their synchronous counterparts, calling the parent async engine and running their unique metrics parsing loops over the concurrent batch results.
  • Non-blocking progress bars: Processes inputs concurrently while using async_tqdm to keep tracking completely non-blocking.
  • Monitored failure tracking: Matches the existing sync error handling (raise_on_failure). When raise_on_failure is disabled, a failure on an individual row is recorded as NaN instead of being raised.
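
The runtime check, thread fallback, and failure handling described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: SyncOnlyGenerator, AsyncCapableGenerator, call_generator, and evaluate_all are hypothetical names, the generators return a canned score instead of calling an LLM, and the progress bar is left out.

```python
import asyncio
import inspect
import math
from typing import Any


class SyncOnlyGenerator:
    """Hypothetical generator that only exposes a blocking run()."""

    def run(self, prompt: str) -> dict[str, Any]:
        if "fail" in prompt:
            raise RuntimeError("simulated API failure")
        return {"score": 1.0}


class AsyncCapableGenerator(SyncOnlyGenerator):
    """Hypothetical generator that also exposes a native run_async()."""

    async def run_async(self, prompt: str) -> dict[str, Any]:
        await asyncio.sleep(0)  # stands in for a network request
        return self.run(prompt)


async def call_generator(generator: Any, prompt: str) -> dict[str, Any]:
    # Runtime check: prefer a native coroutine method when one exists.
    run_async = getattr(generator, "run_async", None)
    if run_async is not None and inspect.iscoroutinefunction(run_async):
        return await run_async(prompt)
    # Fallback: run the blocking call in a worker thread.
    return await asyncio.to_thread(generator.run, prompt)


async def evaluate_all(
    generator: Any, prompts: list[str], raise_on_failure: bool = True
) -> list[float]:
    # Run every row concurrently and collect exceptions instead of aborting.
    results = await asyncio.gather(
        *(call_generator(generator, p) for p in prompts),
        return_exceptions=True,
    )
    scores: list[float] = []
    for result in results:
        if isinstance(result, BaseException):
            if raise_on_failure:
                raise result
            scores.append(math.nan)  # failed row is tracked as NaN
        else:
            scores.append(result["score"])
    return scores
```

In this sketch a failure is only re-raised after all rows have finished, which is a simplification; the actual implementation may surface errors earlier.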

How did you test it?

Added complete async test suites across all three evaluator test modules (TestLLMEvaluatorAsync, TestFaithfulnessEvaluatorAsync, and TestContextRelevanceEvaluatorAsync). The tests cover:

  • Standard async flow: Verifying that concurrent evaluation batches run successfully and parse the returned JSON data accurately.
  • Thread-pool fallback execution: Confirming that when a sync-only chat generator is passed, the engine safely offloads the execution to a worker thread via asyncio.to_thread without stalling the event loop.
  • Metric post-processing math: Validating that the custom scoring and statement parsing loops in FaithfulnessEvaluator and ContextRelevanceEvaluator aggregate data identically to their synchronous counterparts.
  • Error isolation and NaN propagation: Testing that when raise_on_failure=False, individual API failures are safely skipped, logged as warnings, and tracked as NaN values without breaking the final average score calculation.
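
As an illustration of the thread-pool fallback check, one way to observe that the event loop stays responsive is to run a counter coroutine alongside a blocking call that has been moved to a worker thread. This is a standalone sketch rather than one of the tests added in this PR; blocking_call, count_ticks, and check_loop_stays_responsive are hypothetical names, and time.sleep stands in for a synchronous generator call.

```python
import asyncio
import time


def blocking_call() -> str:
    # Stands in for a sync-only chat generator call.
    time.sleep(0.2)
    return "done"


async def count_ticks(stop: asyncio.Event) -> int:
    # Increments only while the event loop is free to schedule this task.
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def check_loop_stays_responsive() -> tuple[str, int]:
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    # The blocking call runs in a worker thread, so the ticker keeps running.
    result = await asyncio.to_thread(blocking_call)
    stop.set()
    return result, await ticker
```

If blocking_call were invoked directly inside the coroutine instead of through asyncio.to_thread, the ticker would not get a chance to run until the call returned.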

Notes for the reviewer

The implementation mirrors the structure of the synchronous run method to keep the component straightforward to maintain. The async test fixtures use standard monkeypatch strategies, matching the existing testing conventions throughout the file.

Checklist

@GovindhKishore GovindhKishore requested a review from a team as a code owner June 10, 2026 17:25
@GovindhKishore GovindhKishore requested review from sjrl and removed request for a team June 10, 2026 17:25
@vercel

vercel Bot commented Jun 10, 2026

@GovindhKishore is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions Bot added the topic:tests and type:documentation labels Jun 10, 2026
@GovindhKishore GovindhKishore marked this pull request as draft June 10, 2026 17:53
@GovindhKishore

Contributor Author

@sjrl The CI is failing because ContextRelevanceEvaluator and FaithfulnessEvaluator inherit from LLMEvaluator but override the sync run method with extra output fields. This clashes with the inherited base run_async schema and throws a ComponentError. Before touching those components, I want to confirm this is still within scope for this PR. Since most of the underlying generators have async support either directly or inherited from OpenAIChatGenerator, implementing run_async for these child components too makes sense and keeps things consistent.

Is this approach fine and within scope?

I plan to add matching run_async overrides to both child classes. They will call await super().run_async(**inputs) to use the new async engine, then handle their specific metrics to match their sync counterparts.

Let me know if this works or if you prefer a different pattern!
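
A rough sketch of the proposed pattern, under stated assumptions: BaseEvaluator and ChildEvaluator are hypothetical stand-ins rather than the Haystack classes, the parent returns canned per-statement scores, and add_metrics is an illustrative helper for the child-specific post-processing. The point is that run and run_async share the same post-processing, so both return the same output fields.

```python
import asyncio
from typing import Any


class BaseEvaluator:
    """Hypothetical stand-in for the parent evaluator."""

    def _evaluate(self, **inputs: Any) -> dict[str, Any]:
        # Pretend each input row was already scored per statement.
        return {"results": [{"statement_scores": s} for s in inputs["statements"]]}

    def run(self, **inputs: Any) -> dict[str, Any]:
        return self._evaluate(**inputs)

    async def run_async(self, **inputs: Any) -> dict[str, Any]:
        await asyncio.sleep(0)  # stands in for awaiting the LLM calls
        return self._evaluate(**inputs)


def add_metrics(result: dict[str, Any]) -> dict[str, Any]:
    # Child-specific post-processing shared by the sync and async paths.
    for row in result["results"]:
        scores = row["statement_scores"]
        row["score"] = sum(scores) / len(scores) if scores else 0.0
    individual = [row["score"] for row in result["results"]]
    result["individual_scores"] = individual
    result["score"] = sum(individual) / len(individual) if individual else 0.0
    return result


class ChildEvaluator(BaseEvaluator):
    def run(self, **inputs: Any) -> dict[str, Any]:
        return add_metrics(super().run(**inputs))

    async def run_async(self, **inputs: Any) -> dict[str, Any]:
        # Reuse the parent's async engine, then apply the same post-processing.
        return add_metrics(await super().run_async(**inputs))
```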

Comment thread haystack/components/evaluators/llm_evaluator.py Outdated
@sjrl

sjrl commented Jun 11, 2026

Contributor

> I plan to add matching run_async overrides to both child classes. They will call await super().run_async(**inputs) to use the new async engine, then handle their specific metrics to match their sync counterparts.
>
> Let me know if this works or if you prefer a different pattern!

Yes, please do! Your approach sounds good, and I'll double-check it once you have it in.

@GovindhKishore GovindhKishore force-pushed the feat/llm-evaluator-run-async branch 3 times, most recently from fa29ca2 to 8367773 Compare June 11, 2026 09:44
@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Coverage report

Columns: File, Statements, Missing, Coverage, Coverage (new stmts), Lines missing

  haystack/components/evaluators
    context_relevance.py
    faithfulness.py
    llm_evaluator.py (lines missing: 306-307, 312)
  Project Total

This report was generated by python-coverage-comment-action

@GovindhKishore GovindhKishore marked this pull request as ready for review June 11, 2026 09:56
@GovindhKishore

Contributor Author

> > I plan to add matching run_async overrides to both child classes. They will call await super().run_async(**inputs) to use the new async engine, then handle their specific metrics to match their sync counterparts.
> > Let me know if this works or if you prefer a different pattern!
>
> Yes please do! Your approach sounds good and I'll double check it once you have it in

@sjrl I've updated the implementation. Kindly take your time to leave a review.

Comment thread haystack/components/evaluators/context_relevance.py
Comment thread haystack/components/evaluators/faithfulness.py
Comment thread haystack/components/evaluators/llm_evaluator.py
Comment thread test/components/evaluators/test_faithfulness_evaluator.py
Comment thread test/components/evaluators/test_context_relevance_evaluator.py
@GovindhKishore GovindhKishore force-pushed the feat/llm-evaluator-run-async branch from d82fc9a to 58ddbbf Compare June 15, 2026 09:23
Comment thread test/components/evaluators/test_context_relevance_evaluator.py Outdated
Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

@sjrl sjrl left a comment

Contributor

Thanks!

@sjrl sjrl merged commit 4dd018a into deepset-ai:main Jun 15, 2026
24 of 25 checks passed
@GovindhKishore

Contributor Author

> Thanks!

Thanks a lot for your guidance and for merging, @sjrl.

@GovindhKishore GovindhKishore deleted the feat/llm-evaluator-run-async branch June 15, 2026 11:52

Labels

topic:tests, type:documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Add asynchronous support (run_async) to LLMEvaluator

2 participants