
[WIP] Port changes from upstream PR to standardize output schema #5037

Draft
Copilot wants to merge 1 commit into Standardize-Eval-Output from copilot/standardize-eval-output


Conversation

Contributor

Copilot AI commented May 13, 2026

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.

Original prompt

Goal

Port the in-scope changes from upstream PR Azure/azure-sdk-for-python#46436 ("Standardize Output Schema for Evaluators") onto the existing branch Standardize-Eval-Output in this repository, then open a Pull Request from Standardize-Eval-Output into main.

The upstream PR lives in a different repo with a different layout. Only files under sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/ are in scope. They must be re-targeted to this repo's layout under assets/evaluators/builtin/<evaluator>/evaluator/.
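The re-targeting described above can be sketched in Python as follows. This is only an illustration, not part of the upstream PR: the function name `retarget` and the `DIR_OVERRIDES` table are hypothetical, and the renames in that table are read off the visible rows of the mapping table in this prompt (for example `_bleu` to `bleu_score`). Any other evaluator directory is assumed to simply drop its leading underscore.

```python
from pathlib import PurePosixPath
from typing import Optional

# Roots named in the prompt: only files under UPSTREAM_ROOT are in scope.
UPSTREAM_ROOT = PurePosixPath("sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation")
TARGET_ROOT = PurePosixPath("assets/evaluators/builtin")

# Hypothetical override table: directory renames seen in the visible mapping rows.
# Assumption: every other directory just loses its leading underscore.
DIR_OVERRIDES = {
    "_bleu": "bleu_score",
    "_gleu": "gleu_score",
    "_meteor": "meteor_score",
    "_rouge": "rouge_score",
}


def retarget(upstream_path: str) -> Optional[str]:
    """Map an upstream file path to this repo's layout, or None if out of scope."""
    try:
        rel = PurePosixPath(upstream_path).relative_to(UPSTREAM_ROOT)
    except ValueError:
        return None  # not under the in-scope upstream root

    parts = rel.parts
    # Expect exactly _evaluators/<evaluator_dir>/<filename>.
    if len(parts) != 3 or parts[0] != "_evaluators":
        return None

    _, evaluator_dir, filename = parts
    evaluator = DIR_OVERRIDES.get(evaluator_dir, evaluator_dir.lstrip("_"))
    return str(TARGET_ROOT / evaluator / "evaluator" / filename)
```

Paths outside the in-scope root return `None`, so a caller can skip them without special-casing. The mapping table remains the source of truth; a helper like this is only useful for cross-checking it.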

Important: Work directly on the existing Standardize-Eval-Output branch. Do not create a new branch. Open the PR from Standardize-Eval-Output into main.


Scope — files to port

The mapping below lists every upstream source file (in sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/...) and its destination in this repo. Apply the upstream patches to the destination files. Use the upstream PR diff as the source of truth.

| # | Upstream file (in Azure/azure-sdk-for-python PR #46436) | Target file in Azure/azureml-assets (branch Standardize-Eval-Output) |
|---|---|---|
| 1 | `_evaluators/_bleu/_bleu.py` | `assets/evaluators/builtin/bleu_score/evaluator/_bleu.py` |
| 2 | `_evaluators/_coherence/coherence.prompty` | `assets/evaluators/builtin/coherence/evaluator/coherence.prompty` |
| 3 | `_evaluators/_document_retrieval/_document_retrieval.py` | `assets/evaluators/builtin/document_retrieval/evaluator/_document_retrieval.py` |
| 4 | `_evaluators/_f1_score/_f1_score.py` | `assets/evaluators/builtin/f1_score/evaluator/_f1_score.py` |
| 5 | `_evaluators/_fluency/fluency.prompty` | `assets/evaluators/builtin/fluency/evaluator/fluency.prompty` |
| 6 | `_evaluators/_gleu/_gleu.py` | `assets/evaluators/builtin/gleu_score/evaluator/_gleu.py` |
| 7 | `_evaluators/_groundedness/_groundedness.py` | `assets/evaluators/builtin/groundedness/evaluator/_groundedness.py` |
| 8 | `_evaluators/_groundedness/groundedness_with_query.prompty` | `assets/evaluators/builtin/groundedness/evaluator/groundedness_with_query.prompty` |
| 9 | `_evaluators/_groundedness/groundedness_without_query.prompty` | `assets/evaluators/builtin/groundedness/evaluator/groundedness_without_query.prompty` |
| 10 | `_evaluators/_intent_resolution/_intent_resolution.py` | `assets/evaluators/builtin/intent_resolution/evaluator/_intent_resolution.py` |
| 11 | `_evaluators/_intent_resolution/intent_resolution.prompty` | `assets/evaluators/builtin/intent_resolution/evaluator/intent_resolution.prompty` |
| 12 | `_evaluators/_meteor/_meteor.py` | `assets/evaluators/builtin/meteor_score/evaluator/_meteor.py` |
| 13 | `_evaluators/_relevance/_relevance.py` | `assets/evaluators/builtin/relevance/evaluator/_relevance.py` |
| 14 | `_evaluators/_relevance/relevance.prompty` | `assets/evaluators/builtin/relevance/evaluator/relevance.prompty` |
| 15 | `_evaluators/_response_completeness/_response_completeness.py` | `assets/evaluators/builtin/response_completeness/evaluator/_response_completeness.py` |
| 16 | `_evaluators/_response_completeness/response_completeness.prompty` | `assets/evaluators/builtin/response_completeness/evaluator/response_completeness.prompty` |
| 17 | `_evaluators/_retrieval/retrieval.prompty` | `assets/evaluators/builtin/retrieval/evaluator/retrieval.prompty` |
| 18 | `_evaluators/_rouge/_rouge.py` | `assets/evaluators/builtin/rouge_score/evaluator/_rouge.py` |
| 19 | `_evaluators/_similarity/similarity.prompty` | `assets/evaluators/builtin/similarity/evaluator/similarity.prompty` |
| 20 | `_evaluators/_task_adherence/_task_adherence.py` | `assets/evaluators/builtin/task_adherence/evaluator/_task_adherence.py` |
| 21 | `_evaluators/_task_adherence/task_adherence.prompty` | `assets/evaluators/builtin/task_adherence/evaluator/task_adherence.prompty` |
| 22 | `_evaluators/_task_completion/_task_completion.py` | `assets/evaluators/builtin/task_completion/evaluator/_task_completion.py` |
| 23 | `_evaluators/_task_completion/task_completion.prompty` | `assets/evaluators/builtin/task_completion/evaluator/task_completion.prompty` |
| 24 | `_evaluators/_task_navigation_efficiency/_task_navigation_efficiency.py` | `assets/evaluators/builtin/task_navigation_efficiency/evaluator/_task_navigation_efficiency.py` |
| 25 | `_evaluators/_tool_call_accuracy/_tool_call_accuracy.py` | `assets/evaluators/builtin/tool_call_accuracy/evaluator/_tool_call_accuracy.py` |
| 26 | `_evaluators/_tool_call_accuracy/tool_call_accuracy.prompty` | `assets/evaluators/builtin/tool_call_accuracy/evaluator/tool_call_accuracy.prompty` |
| 27 | `_evaluators/_tool_call_success/_tool_call_success.py` | `assets/evaluators/builtin/tool_call_success/evaluator/_tool_call_success.py` |
| 28 | `_evaluators/_tool_call_success/tool_call_success.prompty` | `assets/evaluators/builtin/tool_call_success/evaluator/tool_call_success.prompty` |
| 29 | `_evaluators/_tool_input_accuracy/_tool...

This pull request was created from Copilot chat.

@github-actions

This pull request has been marked as stale because it has been inactive for 14 days.

github-actions bot added the Stale label on May 28, 2026

2 participants