Update Groundedness Evaluator to support multi-turn evaluation by salma-elshafey · Pull Request #4942 · Azure/azureml-assets

salma-elshafey · 2026-04-20T11:17:46Z

Updated the Groundedness evaluator to support multi-turn evaluation.

Multi-turn quality report: https://msdata.visualstudio.com/Vienna/_git/evaluators?path=/experiments/evaluator_quality_analysis/reports/azure_mt_evaluator_quality_report.md&_a=preview

github-actions · 2026-04-20T11:18:17Z

Test Results for assets-test

92 tests in 1 suite (1 file): 92 passed ✅, 0 skipped 💤, 0 failed ❌, in 2s ⏱️

Results for commit 97e8668.

♻️ This comment has been updated with latest results.

…ess_multiturn

…#4957)

Data-driven prompt improvements validated on 300 FaithDial traces:
- Added 5-step evaluation procedure with tool-result parsing guidance
- Introduced 'substantive hallucination' vs 'NOT hallucination' taxonomy
- Added score-4 example showing tool-call + paraphrase pattern
- Tightened score-3 to require zero factual propositions
- Added CRITICAL rule: any substantive hallucination -> score <= 2

Results (v1 -> v5):
- gpt-5.4-mini: 76.3% -> 83.7% accuracy, F1 88.2%
- gpt-5.4: 65.7% -> 79.7% accuracy, F1 83.1%

Co-authored-by: Ali Mahmoudzadeh <amah@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
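The CRITICAL rule in the commit message (any substantive hallucination -> score <= 2) is stated as a prompt instruction. As a rough illustration only, the same constraint could be checked on judge outputs after the fact. The sketch below is not the evaluator's code: the `TurnJudgement` class, its field names, and the 1-5 score range are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Taken from the commit message: a substantive hallucination caps the score at 2.
MAX_SCORE_WITH_HALLUCINATION = 2

# Assumed score range; the PR only mentions scores 2, 3 and 4 explicitly.
MIN_SCORE, MAX_SCORE = 1, 5


@dataclass
class TurnJudgement:
    """Hypothetical per-turn judge output; names are illustrative only."""
    turn_index: int
    score: int
    substantive_hallucinations: List[str] = field(default_factory=list)


def is_consistent(judgement: TurnJudgement) -> bool:
    """Return True if the score respects the range and the hallucination cap."""
    if not MIN_SCORE <= judgement.score <= MAX_SCORE:
        return False
    has_hallucination = bool(judgement.substantive_hallucinations)
    if has_hallucination and judgement.score > MAX_SCORE_WITH_HALLUCINATION:
        return False
    return True


def inconsistent_turns(judgements: List[TurnJudgement]) -> List[int]:
    """List the turn indices whose score contradicts the cap rule."""
    return [j.turn_index for j in judgements if not is_consistent(j)]


if __name__ == "__main__":
    sample = [
        TurnJudgement(0, 5),
        TurnJudgement(1, 4, ["invented a release date"]),
        TurnJudgement(2, 2, ["unsupported statistic"]),
    ]
    print(inconsistent_turns(sample))
```

A check like this would only flag turns where the judge's score and its own hallucination findings disagree; it does not rescore anything.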

salma-elshafey added 2 commits April 19, 2026 17:45

Add groundedness multi-turn v0

d48eb8f

update output format and prompty

a4c0f3b

salma-elshafey temporarily deployed to Testing April 20, 2026 11:18 — with GitHub Actions Inactive

salma-elshafey had a problem deploying to Testing April 20, 2026 11:21 — with GitHub Actions Failure

update prompty

2313436

salma-elshafey temporarily deployed to Testing April 20, 2026 15:30 — with GitHub Actions Inactive

salma-elshafey had a problem deploying to Testing April 20, 2026 15:31 — with GitHub Actions Failure

salma-elshafey added 2 commits April 23, 2026 12:31

rename trace to turn

7026a5d

update groundedness post processing

cf81d6f

salma-elshafey temporarily deployed to Testing April 23, 2026 11:44 — with GitHub Actions Inactive

salma-elshafey temporarily deployed to Testing April 23, 2026 11:45 — with GitHub Actions Inactive

Merge remote-tracking branch 'upstream/main' into selshafey/groundedness_multiturn

5bd08f8

salma-elshafey temporarily deployed to Testing April 23, 2026 11:46 — with GitHub Actions Inactive

salma-elshafey had a problem deploying to Testing April 23, 2026 11:48 — with GitHub Actions Failure

salma-elshafey added 2 commits April 23, 2026 14:25

update test

495a512

re-add details field

ee334c4

salma-elshafey temporarily deployed to Testing April 23, 2026 14:52 — with GitHub Actions Inactive

salma-elshafey temporarily deployed to Testing April 23, 2026 14:54 — with GitHub Actions Inactive

Merge remote-tracking branch 'upstream/main' into selshafey/groundedness_multiturn

ad60f8f

salma-elshafey temporarily deployed to Testing April 23, 2026 17:17 — with GitHub Actions Inactive

salma-elshafey temporarily deployed to Testing April 23, 2026 17:19 — with GitHub Actions Inactive

AliMahmoudzadeh temporarily deployed to Testing April 23, 2026 23:36 — with GitHub Actions Inactive

AliMahmoudzadeh temporarily deployed to Testing April 23, 2026 23:38 — with GitHub Actions Inactive

Merge branch 'main' into selshafey/groundedness_multiturn

97e8668

AliMahmoudzadeh marked this pull request as ready for review April 24, 2026 18:00

AliMahmoudzadeh requested review from a team as code owners April 24, 2026 18:00

AliMahmoudzadeh temporarily deployed to Testing April 24, 2026 18:02 — with GitHub Actions Inactive

AliMahmoudzadeh temporarily deployed to Testing April 24, 2026 18:03 — with GitHub Actions Inactive

AliMahmoudzadeh temporarily deployed to Testing April 24, 2026 18:04 — with GitHub Actions Inactive

lykelly19 approved these changes Apr 24, 2026

View reviewed changes

AliMahmoudzadeh merged commit 9971fb5 into main Apr 24, 2026
38 checks passed

AliMahmoudzadeh deleted the selshafey/groundedness_multiturn branch April 24, 2026 20:18

AliMahmoudzadeh merged 11 commits into main from selshafey/groundedness_multiturn


3 participants
