Skipped status and standardize output by ashaabansoliman · Pull Request #5042 · Azure/azureml-assets

ashaabansoliman · 2026-05-15T00:05:16Z

This PR is a collaborative effort between Mohamed Hussein and Ahmed Shaaban to:
- Update and standardize the evaluator output schema
- Add a new "status" field indicating whether the evaluation completed, resulted in an error, or was skipped

Note: The commits on this branch were created in multiple stages, on multiple branches, by both authors using Copilot.
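To make the intended output shape concrete, here is a minimal Python sketch of a standardized evaluator result. It is not code from this PR: `EvaluationStatus`, `EvaluatorOutput` and `skipped` are hypothetical names, the field names (score, reason, status, properties) are taken from the commit message, and the status strings "completed", "error" and "skipped" are assumed from the description above rather than confirmed. Leaving `score` as `None` for skipped results is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional


class EvaluationStatus(str, Enum):
    # Assumed string values: the PR only says the status distinguishes
    # completed, errored and skipped evaluations.
    COMPLETED = "completed"
    ERROR = "error"
    SKIPPED = "skipped"


@dataclass
class EvaluatorOutput:
    """Hypothetical container for the standardized evaluator output fields."""

    status: EvaluationStatus
    reason: str
    score: Optional[float] = None  # assumption: no score when skipped or errored
    properties: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        # Serialize with the status enum flattened to its string value.
        return {
            "score": self.score,
            "reason": self.reason,
            "status": self.status.value,
            "properties": self.properties,
        }


def skipped(reason: str) -> EvaluatorOutput:
    """Build a result for an evaluation that was not run."""
    return EvaluatorOutput(status=EvaluationStatus.SKIPPED, reason=reason)
```

For example, `skipped("not applicable").to_dict()` yields a dict with `"status": "skipped"`, `"score": None` and an empty `properties` object, so downstream consumers can filter on `status` instead of guessing from a missing score.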

…mpty files

Agent-Logs-Url: https://github.com/Azure/azureml-assets/sessions/50beeb9d-8306-4f00-ab60-7924ef98ecd4
Co-authored-by: ashaabansoliman <109526961+ashaabansoliman@users.noreply.github.com>

… output fields

- intent_resolution, relevance: rename 'explanation' -> 'reason', add 'status' field, add skipped handling
- response_completeness: add skipped handling to task section (already had json_object + reason/status)
- task_adherence: replace flagged/reasoning schema with score/reason/status, update all flag/unflag language to score 0/1
- task_completion: (already up to date)
- tool_call_accuracy: increase max_tokens from 3000 to 5000
- tool_call_success: rename explanation->reason, details->properties, success->score+status, add skipped handling
- tool_input_accuracy: rename chain_of_thought->reason, details->properties, result->score, add skipped handling
- tool_output_utilization: wrap faulty_details in properties object, rename label->score (pass/fail -> 1/0), add status field
- tool_selection: rename explanation->reason, details->properties, add status field and skipped handling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: ashaabansoliman <109526961+ashaabansoliman@users.noreply.github.com>
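The per-evaluator renames in the commit message above can be illustrated with a hypothetical migration helper that maps a legacy result dict onto the new field names. This is a sketch under stated assumptions, not part of the PR: `upgrade_legacy_output`, `RENAMES` and `LABEL_TO_SCORE` are invented for illustration. It covers only the plain key renames, the pass/fail -> 1/0 conversion and the faulty_details wrapping; the success->score+status and flagged->score changes are left out because their exact value mapping is not described.

```python
from typing import Any

# Hypothetical rename table reconstructed from the commit message:
# evaluator name -> {legacy key: new key}.
RENAMES: dict[str, dict[str, str]] = {
    "intent_resolution": {"explanation": "reason"},
    "relevance": {"explanation": "reason"},
    "tool_call_success": {"explanation": "reason", "details": "properties"},
    "tool_input_accuracy": {
        "chain_of_thought": "reason",
        "details": "properties",
        "result": "score",
    },
    "tool_selection": {"explanation": "reason", "details": "properties"},
    "tool_output_utilization": {"label": "score"},
}

# tool_output_utilization: pass/fail labels become 1/0 scores.
LABEL_TO_SCORE = {"pass": 1, "fail": 0}


def upgrade_legacy_output(evaluator: str, legacy: dict[str, Any]) -> dict[str, Any]:
    """Return a new (shallow) dict with legacy keys mapped to the new names."""
    renames = RENAMES.get(evaluator, {})
    upgraded: dict[str, Any] = {renames.get(k, k): v for k, v in legacy.items()}

    if evaluator == "tool_output_utilization":
        # Convert a pass/fail label (any casing) into a 1/0 score.
        if isinstance(upgraded.get("score"), str):
            upgraded["score"] = LABEL_TO_SCORE[upgraded["score"].lower()]
        # faulty_details moves inside a properties object.
        if "faulty_details" in upgraded:
            properties = upgraded.setdefault("properties", {})
            properties["faulty_details"] = upgraded.pop("faulty_details")
    return upgraded
```

Evaluators without an entry in the table are passed through unchanged, which keeps the helper safe to call on outputs that were already up to date (such as task_completion).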

github-actions · 2026-05-15T00:07:08Z

Test Results for assets-test

1,499 tests, 1,499 passed ✅, 59s ⏱️
23 suites, 0 skipped 💤
23 files, 0 failed ❌

Results for commit 32b9d44.

♻️ This comment has been updated with latest results.

ashaabansoliman · 2026-05-15T17:07:41Z


Copilot AI and others added 3 commits May 13, 2026 22:27

- Initial plan (182b6a0)
- Port upstream PR #46436: update Python evaluator files and simple pro… (ad98a19)


ashaabansoliman requested review from a team as code owners May 15, 2026 00:05

ashaabansoliman temporarily deployed to Testing with GitHub Actions on May 15, 2026 00:06 (Inactive)

ashaabansoliman temporarily deployed to Testing with GitHub Actions on May 15, 2026 00:07 (Inactive)

ashaabansoliman had a problem deploying to Testing with GitHub Actions on May 15, 2026 00:12 (Failure)