
Skipped status and standardize output #5042

Merged
ashaabansoliman merged 7 commits into main from Skipped_Status_and_Standardize_Output
May 15, 2026

Conversation

@ashaabansoliman
Contributor

@ashaabansoliman ashaabansoliman commented May 15, 2026

This PR is a collaborative effort between Mohamed Hussein and Ahmed Shaaban to:
- Update and standardize the evaluator output schema
- Add a new "status" field that indicates whether the evaluation completed, resulted in an error, or was skipped
Note: The commits on this branch were created in multiple stages across multiple branches by both authors using Copilot.
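
As a rough illustration of the standardized schema described above, the sketch below models a result carrying `score`, `reason`, `status`, and `properties`. The class names `Status` and `EvaluatorOutput`, the exact status strings, and the default values are assumptions made for this example; the PR description only states that a status distinguishes completed, error, and skipped evaluations.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class Status(str, Enum):
    # The string values are assumed; the PR only names the three outcomes.
    COMPLETED = "completed"
    ERROR = "error"
    SKIPPED = "skipped"


@dataclass
class EvaluatorOutput:
    """Hypothetical container for one standardized evaluator result."""

    status: Status
    score: Optional[int] = None  # assumed to be absent when skipped or errored
    reason: str = ""
    properties: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # Emit plain JSON-friendly values, with status as its string form.
        return {
            "score": self.score,
            "reason": self.reason,
            "status": self.status.value,
            "properties": self.properties,
        }


skipped = EvaluatorOutput(status=Status.SKIPPED, reason="No tool calls to evaluate")
print(skipped.to_dict())
```

With a shape like this, a consumer could branch on `status` first and read `score` only when the evaluation completed.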

Copilot AI and others added 3 commits May 13, 2026 22:27
…mpty files

Agent-Logs-Url: https://github.com/Azure/azureml-assets/sessions/50beeb9d-8306-4f00-ab60-7924ef98ecd4

Co-authored-by: ashaabansoliman <109526961+ashaabansoliman@users.noreply.github.com>
… output fields

- intent_resolution, relevance: rename 'explanation' -> 'reason', add 'status' field, add skipped handling
- response_completeness: add skipped handling to task section (already had json_object + reason/status)
- task_adherence: replace flagged/reasoning schema with score/reason/status, update all flag/unflag language to score 0/1
- task_completion: (already up to date)
- tool_call_accuracy: increase max_tokens from 3000 to 5000
- tool_call_success: rename explanation->reason, details->properties, success->score+status, add skipped handling
- tool_input_accuracy: rename chain_of_thought->reason, details->properties, result->score, add skipped handling
- tool_output_utilization: wrap faulty_details in properties object, rename label->score (pass/fail -> 1/0), add status field
- tool_selection: rename explanation->reason, details->properties, add status field and skipped handling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: ashaabansoliman <109526961+ashaabansoliman@users.noreply.github.com>
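
The field renames listed in the commit message above could be pictured as a mapping from legacy keys to the standardized ones. This is only a sketch: the PR appears to change the evaluators' own prompts and output handling rather than add a converter, and the `RENAMES` table, the `standardize` helper, the boolean form of `success`, and the `"completed"` default are all hypothetical. `task_adherence` is left out because the commit message does not say how `flagged` maps onto a 0/1 score.

```python
from typing import Any, Dict

# Legacy key -> standardized key, taken from the commit message bullets.
RENAMES: Dict[str, Dict[str, str]] = {
    "intent_resolution": {"explanation": "reason"},
    "relevance": {"explanation": "reason"},
    "tool_call_success": {"explanation": "reason", "details": "properties", "success": "score"},
    "tool_input_accuracy": {"chain_of_thought": "reason", "details": "properties", "result": "score"},
    "tool_output_utilization": {"label": "score"},
    "tool_selection": {"explanation": "reason", "details": "properties"},
}


def standardize(evaluator: str, legacy: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rename legacy keys and normalize score to 0/1."""
    renames = RENAMES.get(evaluator, {})
    out = {renames.get(key, key): value for key, value in legacy.items()}

    # tool_output_utilization: wrap faulty_details in a properties object.
    if "faulty_details" in out:
        out["properties"] = {"faulty_details": out.pop("faulty_details")}

    # Assumed legacy encodings: booleans or "pass"/"fail" labels become 1/0.
    score = out.get("score")
    if isinstance(score, bool):
        out["score"] = int(score)
    elif score in ("pass", "fail"):
        out["score"] = 1 if score == "pass" else 0

    # Assumed default when the legacy output carried no status.
    out.setdefault("status", "completed")
    return out


print(standardize("tool_output_utilization", {"label": "pass", "faulty_details": []}))
```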
@github-actions

github-actions Bot commented May 15, 2026

Test Results for assets-test

1 499 tests   1 499 ✅  59s ⏱️
   23 suites      0 💤
   23 files        0 ❌

Results for commit 32b9d44.

♻️ This comment has been updated with latest results.


