feat(evaluation): unify validators with azureml-assets by m7md7sien · Pull Request #47526 · Azure/azure-sdk-for-python

m7md7sien · 2026-06-16T16:34:27Z

Description

add DEVELOPER role, EvaluationLevel, MessagesOrQueryResponseInputValidator + level utils
support actions/expected_actions aliases in TaskNavigationEfficiencyValidator
align check_for_unsupported_tools flags in tool_call/input/output evaluators

All SDK Contribution checklist:

The pull request does not introduce [breaking changes]
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

- add DEVELOPER role, EvaluationLevel, MessagesOrQueryResponseInputValidator + level utils - support actions/expected_actions aliases in TaskNavigationEfficiencyValidator - align check_for_unsupported_tools flags in tool_call/input/output evaluators

Copilot

Pull request overview

This PR updates azure-ai-evaluation’s internal evaluator input validation layer to better align with azureml-assets naming and behavior, while expanding supported conversation roles and adding utilities for evaluation-level handling.

Changes:

Added DEVELOPER message role support and introduced EvaluationLevel plus evaluation-level utility helpers.
Added MessagesOrQueryResponseInputValidator to support both multi-turn (messages) and single-turn (query/response) input shapes.
Added actions/expected_actions aliases for task navigation efficiency inputs, and aligned check_for_unsupported_tools behavior across tool-related evaluators/validators.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_output_utilization/_tool_output_utilization.py	Enables unsupported-tool checking for tool output utilization inputs.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_input_accuracy/_tool_input_accuracy.py	Adjusts unsupported-tool checking behavior for tool input accuracy validation.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_call_accuracy.py	Adjusts unsupported-tool checking behavior for tool call accuracy validation.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_validators/_validation_constants.py	Adds `DEVELOPER` role and introduces the `EvaluationLevel` enum.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_validators/_task_navigation_efficiency_validator.py	Adds normalization to accept `actions`/`expected_actions` aliases.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_validators/_messages_or_query_response_validator.py	New validator supporting either `messages` or `query`/`response` input formats.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_validators/_evaluation_level_utils.py	New helper utilities for resolving evaluation levels and reshaping message inputs.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_validators/_conversation_validator.py	Adds developer-role validation handling and minor error-message cleanup.
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_validators/init.py	Exposes new enums/validators/utilities from the validators package.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

…luated Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

mmkawale · 2026-06-17T23:33:37Z

+                    target=self.error_target,
+                )
+            # The final assistant message must contain text
+            last_content = messages[-1].get("content", "")


Here we assume that the last message will have a role as assistant, but that may not be the case. Can we explicitly check that the last message's role is assistant before moving on to content check?

mmkawale · 2026-06-17T23:36:31Z

        self._validator = ToolCallsValidator(
            error_target=ErrorTarget.TOOL_CALL_ACCURACY_EVALUATOR,
-            check_for_unsupported_tools=True,
+            check_for_unsupported_tools=False,


This is the same change I am making in my sdk pr: https://github.com/Azure/azure-sdk-for-python/pull/47462/changes#diff-f0cd98f94f077616907714246b399d03dcc97bde3cde5dbe0ff1dac8c5253869

mmkawale · 2026-06-17T23:49:56Z

-            error_target=ErrorTarget.TOOL_OUTPUT_UTILIZATION_EVALUATOR, optional_tool_definitions=False
+            error_target=ErrorTarget.TOOL_OUTPUT_UTILIZATION_EVALUATOR,
+            optional_tool_definitions=False,
+            check_for_unsupported_tools=True,


In assets we pass this flag check_for_unsupported_tools correctly. It would be great to create a matrix for all the built in evals with the expected inputs and outputs along with the values for these flags.

mmkawale · 2026-06-18T00:16:00Z

+from ._tool_definitions_validator import ToolDefinitionsValidator
+
+
+class MessagesOrQueryResponseInputValidator(ToolDefinitionsValidator):


Let's add unit tests for these new validators.

github-actions Bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Jun 16, 2026

m7md7sien requested a review from Copilot June 16, 2026 17:06

Copilot started reviewing on behalf of m7md7sien June 16, 2026 17:07 View session

Copilot AI reviewed Jun 16, 2026

View reviewed changes

Potential fix for pull request finding

2808e5a

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot started work on behalf of m7md7sien June 16, 2026 17:23 View session

Copilot AI and others added 2 commits June 16, 2026 17:28

Add unit tests for actions/expected_actions alias input normalization

106ac42

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

Remove redundant assertions from test_both_aliases_normalized_and_eva…

aadd11c

…luated Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

Copilot finished work on behalf of m7md7sien June 16, 2026 17:29

mmkawale reviewed Jun 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(evaluation): unify validators with azureml-assets#47526

feat(evaluation): unify validators with azureml-assets#47526
m7md7sien wants to merge 4 commits into
mainfrom
mohessie/update_eval_validators

m7md7sien commented Jun 16, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mmkawale Jun 17, 2026

Uh oh!

mmkawale Jun 17, 2026

Uh oh!

mmkawale Jun 17, 2026

Uh oh!

mmkawale Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		from ._tool_definitions_validator import ToolDefinitionsValidator


		class MessagesOrQueryResponseInputValidator(ToolDefinitionsValidator):

Conversation

m7md7sien commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

All SDK Contribution checklist:

General Guidelines and Best Practices

Testing Guidelines

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mmkawale Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

mmkawale Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

mmkawale Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

mmkawale Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

m7md7sien commented Jun 16, 2026 •

edited

Loading