feat(evaluators): add native run_async support to LLMEvaluator by GovindhKishore · Pull Request #11581 · deepset-ai/haystack

GovindhKishore · 2026-06-10T17:25:57Z

Related Issues

fixes feat: Add asynchronous support (run_async) to LLMEvaluator #11579

Proposed Changes:

Added native asynchronous support (run_async) to LLMEvaluator, FaithfulnessEvaluator, and ContextRelevanceEvaluator. This allows evaluation pipelines to run concurrently inside asynchronous environments (like FastMCP or FastAPI) without stalling the main event loop during LLM network requests.

How it works:

Runtime check for async support: Verifies whether the assigned chat generator natively supports async execution. If it only supports synchronous execution, it automatically drops down to a safe thread-pool fallback via asyncio.to_thread instead of blocking the event loop.
Post-processing loop mirroring: FaithfulnessEvaluator and ContextRelevanceEvaluator explicitly mirror their synchronous counterparts, calling the parent async engine and running their unique metrics parsing loops over the concurrent batch results.
Non-blocking progress bars: Processes inputs concurrently while using async_tqdm to keep tracking completely non-blocking.
Monitored failure tracking: Matches the existing sync error handling (raise_on_failure). When raise_on_failure is disabled, an exception on an individual row is recorded as NaN instead of being raised.
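
The runtime check and thread-pool fallback described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: SyncOnlyGenerator, AsyncCapableGenerator, call_generator, and evaluate_rows are hypothetical stand-ins, and real chat generators take chat messages rather than a plain string.

```python
import asyncio
import inspect


class SyncOnlyGenerator:
    """Hypothetical generator that only offers a blocking run()."""

    def run(self, prompt: str) -> dict:
        return {"replies": [f"sync:{prompt}"]}


class AsyncCapableGenerator(SyncOnlyGenerator):
    """Hypothetical generator that also offers a native run_async()."""

    async def run_async(self, prompt: str) -> dict:
        await asyncio.sleep(0)  # stands in for a non-blocking network call
        return {"replies": [f"async:{prompt}"]}


async def call_generator(generator, prompt: str) -> dict:
    """Use run_async when it exists, otherwise offload run() to a worker thread."""
    run_async = getattr(generator, "run_async", None)
    if run_async is not None and inspect.iscoroutinefunction(run_async):
        return await run_async(prompt)
    # Assumed fallback: the blocking call runs in a thread so the event loop stays free.
    return await asyncio.to_thread(generator.run, prompt)


async def evaluate_rows(generator, prompts: list) -> list:
    """Evaluate all rows concurrently; gather keeps the input order."""
    return await asyncio.gather(*(call_generator(generator, p) for p in prompts))


if __name__ == "__main__":
    print(asyncio.run(evaluate_rows(SyncOnlyGenerator(), ["a", "b"])))
    print(asyncio.run(evaluate_rows(AsyncCapableGenerator(), ["a", "b"])))
```

The caller sees the same awaitable interface in both cases, which is why the fallback can be chosen at runtime without changing the calling code.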

How did you test it?

Added complete async test suites across all three evaluator test modules (TestLLMEvaluatorAsync, TestFaithfulnessEvaluatorAsync, and TestContextRelevanceEvaluatorAsync). The tests cover:

Standard async flow: Verifying that concurrent evaluation batches run successfully and parse the returned JSON data accurately.
Thread-pool fallback execution: Confirming that when a sync-only chat generator is passed, the engine safely offloads the execution to a worker thread via asyncio.to_thread without stalling the event loop.
Metric post-processing math: Validating that the custom scoring and statement parsing loops in FaithfulnessEvaluator and ContextRelevanceEvaluator aggregate data identically to their synchronous counterparts.
Error isolation and NaN propagation: Testing that when raise_on_failure=False, individual API failures are safely skipped, logged as warnings, and tracked as NaN values without breaking the final average score calculation.
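
The error-isolation case can be illustrated with a small self-contained sketch. It is not taken from the PR's test suite: FlakyGenerator and score_rows are hypothetical names, and the real evaluators parse JSON replies rather than reading a ready-made score.

```python
import asyncio
import math


class FlakyGenerator:
    """Hypothetical async generator that fails for prompts containing 'bad'."""

    async def run_async(self, prompt: str) -> dict:
        if "bad" in prompt:
            raise RuntimeError("simulated API failure")
        return {"score": 1.0}


async def score_rows(generator, prompts: list, raise_on_failure: bool) -> list:
    """Score rows concurrently; a failing row becomes NaN unless raise_on_failure is set."""

    async def score_one(prompt: str) -> float:
        try:
            reply = await generator.run_async(prompt)
            return reply["score"]
        except Exception:
            if raise_on_failure:
                raise
            return float("nan")  # assumed placeholder for a failed row

    return await asyncio.gather(*(score_one(p) for p in prompts))


if __name__ == "__main__":
    scores = asyncio.run(score_rows(FlakyGenerator(), ["ok", "bad", "ok"], False))
    print(scores)
```

With raise_on_failure=True the same call lets the RuntimeError propagate to the caller, which is the behavior a test of the strict mode would assert on.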

Notes for the reviewer

The implementation mirrors the structure of the synchronous run method to keep component maintenance straightforward. The async test fixtures use the same monkeypatch approach as the existing tests in each file.

Checklist

I have read the contributors guidelines and the code of conduct.
I have updated the related issue with new insights and changes.
I have added unit tests and updated the docstrings.
I've used one of the conventional commit types for my PR title: feat: add run_async support to LLMEvaluator
I have documented my code.
I have added a release note file, following the contributors guidelines.
I have run pre-commit hooks and fixed any issue.

vercel · 2026-06-10T17:26:03Z

@GovindhKishore is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

GovindhKishore · 2026-06-10T18:07:42Z

@sjrl The CI is failing because ContextRelevanceEvaluator and FaithfulnessEvaluator inherit from LLMEvaluator but override the sync run method with extra output fields. This clashes with the inherited base run_async schema and throws a ComponentError. Before touching those components, I want to confirm this is still within scope for this PR. Since most of the underlying generators have async support either directly or inherited from OpenAIChatGenerator, implementing run_async for these child components too makes sense and keeps things consistent.

Is this approach fine and within scope?

I plan to add matching run_async overrides to both child classes. They will call await super().run_async(**inputs) to use the new async engine, then handle their specific metrics to match their sync counterparts.
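
The override pattern described above might look roughly like this. It is a sketch under assumptions rather than the merged code: BaseEvaluator stands in for LLMEvaluator, ChildEvaluator for one of the two subclasses, and _evaluate_row and _add_scores are hypothetical helpers with a made-up scoring rule.

```python
import asyncio


class BaseEvaluator:
    """Hypothetical stand-in for the parent evaluator: one raw result per input row."""

    def _evaluate_row(self, statements: list) -> dict:
        # Made-up rule: a statement "passes" when it ends with '!'.
        return {"statement_scores": [1 if s.endswith("!") else 0 for s in statements]}

    def run(self, **inputs) -> dict:
        return {"results": [self._evaluate_row(row) for row in inputs["rows"]]}

    async def run_async(self, **inputs) -> dict:
        tasks = (asyncio.to_thread(self._evaluate_row, row) for row in inputs["rows"])
        return {"results": list(await asyncio.gather(*tasks))}


class ChildEvaluator(BaseEvaluator):
    """Adds extra output fields on top of the parent's results."""

    @staticmethod
    def _add_scores(result: dict) -> dict:
        # Shared post-processing, so run and run_async cannot drift apart.
        for res in result["results"]:
            scores = res["statement_scores"]
            res["score"] = sum(scores) / len(scores) if scores else 0.0
        result["individual_scores"] = [res["score"] for res in result["results"]]
        return result

    def run(self, **inputs) -> dict:
        return self._add_scores(super().run(**inputs))

    async def run_async(self, **inputs) -> dict:
        return self._add_scores(await super().run_async(**inputs))


if __name__ == "__main__":
    evaluator = ChildEvaluator()
    print(asyncio.run(evaluator.run_async(rows=[["a!", "b"], ["c!"]])))
```

Because both overrides return the same extra fields, the sync and async methods expose one output shape.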

Let me know if this works or if you prefer a different pattern!

sjrl · 2026-06-11T05:49:03Z

> I plan to add matching run_async overrides to both child classes. They will call await super().run_async(**inputs) to use the new async engine, then handle their specific metrics to match their sync counterparts.
>
> Let me know if this works or if you prefer a different pattern!

Yes please do! Your approach sounds good and I'll double check it once you have it in

github-actions · 2026-06-11T09:54:43Z

Coverage report

File	Statements	Missing	Coverage	Coverage (new stmts)	Lines missing
haystack/components/evaluators
context_relevance.py
faithfulness.py
llm_evaluator.py					306-307, 312
Project Total

This report was generated by python-coverage-comment-action

GovindhKishore · 2026-06-11T10:01:36Z

> > I plan to add matching run_async overrides to both child classes. They will call await super().run_async(**inputs) to use the new async engine, then handle their specific metrics to match their sync counterparts.
> > Let me know if this works or if you prefer a different pattern!
>
> Yes please do! Your approach sounds good and I'll double check it once you have it in

@sjrl I've updated the implementation. Kindly take your time to leave a review.

sjrl

Thanks!

GovindhKishore · 2026-06-15T11:49:43Z

> Thanks!

Thanks a lot for your guidance and merge @sjrl

GovindhKishore requested a review from a team as a code owner June 10, 2026 17:25

GovindhKishore requested review from sjrl and removed request for a team June 10, 2026 17:25

github-actions Bot added the topic:tests and type:documentation labels Jun 10, 2026

GovindhKishore marked this pull request as draft June 10, 2026 17:53

sjrl reviewed Jun 11, 2026

Comment thread haystack/components/evaluators/llm_evaluator.py Outdated

GovindhKishore force-pushed the feat/llm-evaluator-run-async branch 3 times, most recently from fa29ca2 to 8367773 Compare June 11, 2026 09:44

GovindhKishore marked this pull request as ready for review June 11, 2026 09:56

sjrl reviewed Jun 11, 2026

Comment thread haystack/components/evaluators/context_relevance.py

sjrl reviewed Jun 11, 2026

Comment thread haystack/components/evaluators/faithfulness.py

sjrl reviewed Jun 11, 2026

Comment thread haystack/components/evaluators/llm_evaluator.py

sjrl reviewed Jun 15, 2026

Comment thread test/components/evaluators/test_faithfulness_evaluator.py

sjrl reviewed Jun 15, 2026

Comment thread test/components/evaluators/test_context_relevance_evaluator.py

GovindhKishore added 6 commits June 15, 2026 14:52

feat(evaluators): add native run_async support to LLMEvaluator

ff3818e

docs: fix reStructuredText formatting in release note

029993d

feat(evaluators): add run_async with thread fallback for sync only generators to LLMEvaluator

9e928f9

feat(evaluators): add run_async for faithfulness and context relevance with tests

077f73e

refactor: extract private helper methods for evaluator post-processing

c2083d5

test: add async integration tests for faithfulness and context relevance

58ddbbf

GovindhKishore force-pushed the feat/llm-evaluator-run-async branch from d82fc9a to 58ddbbf Compare June 15, 2026 09:23

sjrl reviewed Jun 15, 2026

Comment thread test/components/evaluators/test_context_relevance_evaluator.py Outdated

test: fix method invocation from run to run_async

63e4f07

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

sjrl approved these changes Jun 15, 2026

sjrl merged commit 4dd018a into deepset-ai:main Jun 15, 2026
24 of 25 checks passed

GovindhKishore deleted the feat/llm-evaluator-run-async branch June 15, 2026 11:52

sjrl merged 7 commits into deepset-ai:main from GovindhKishore:feat/llm-evaluator-run-async