Commit 89f4129

Export _ToolInputAccuracyEvaluator from azure.ai.evaluation top-level namespace
Brings _ToolInputAccuracyEvaluator in line with its three sibling tool evaluators (ToolCallAccuracyEvaluator, _ToolCallSuccessEvaluator, _ToolOutputUtilizationEvaluator) which are already exposed on the top-level package. Consumers (notably the Foundry evaluations service catalog) can now import it from azure.ai.evaluation directly instead of reaching into the private _evaluators._tool_input_accuracy submodule.
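For consumers that must work across releases on either side of this change, one option is to try the top-level import first and fall back to the private submodule. The sketch below is only an illustration: `load_tool_input_accuracy_evaluator` is a hypothetical helper, not part of the SDK, and the module paths are the ones named in this commit. It returns `None` when the package is not installed, so it runs either way.

```python
import importlib

# Module paths and class name as given in the commit message and diff.
TOP_LEVEL = "azure.ai.evaluation"
PRIVATE_SUBMODULE = "azure.ai.evaluation._evaluators._tool_input_accuracy"
CLASS_NAME = "_ToolInputAccuracyEvaluator"


def load_tool_input_accuracy_evaluator():
    """Return the evaluator class, or None if it cannot be imported.

    Hypothetical helper for illustration; not provided by the SDK.
    """
    for module_name in (TOP_LEVEL, PRIVATE_SUBMODULE):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # package missing, or this path absent in this release
        evaluator = getattr(module, CLASS_NAME, None)
        if evaluator is not None:
            return evaluator
    return None


evaluator_cls = load_tool_input_accuracy_evaluator()
print(evaluator_cls.__name__ if evaluator_cls else "azure-ai-evaluation not available")
```

On a release that includes this commit the first loop iteration should succeed, so the private path is never touched.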
1 parent af7b07a commit 89f4129

2 files changed

Lines changed: 3 additions & 0 deletions

sdk/evaluation/azure-ai-evaluation/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
### Features Added

- Enabled `ToolCallAccuracyEvaluator`, `_ToolInputAccuracyEvaluator`, and `_ToolCallSuccessEvaluator` to run on conversations that include built-in restricted tools (`bing_grounding`, `bing_custom_search`, `azure_ai_search`, `azure_fabric`, `sharepoint_grounding`). These three evaluators grade the agent's tool selection, input arguments, and call status — none of which require the (redacted) tool output body — so the previous unconditional rejection of conversations containing restricted tools is now lifted. Achieved by setting `check_for_unsupported_tools=False` on each evaluator's input validator. `GroundednessEvaluator` and `ToolOutputUtilizationEvaluator` continue to reject restricted tools because they consume the tool output body.
+- Exported `_ToolInputAccuracyEvaluator` from the top-level `azure.ai.evaluation` namespace so consumers no longer need to reach into the private `_evaluators._tool_input_accuracy` submodule. The other tool evaluators were already exposed there; this brings the four siblings in line.
- `_ToolCallSuccessEvaluator` now deterministically returns `fail` (score `0`, `_passed=False`) without invoking the LLM when any `tool_call` or `tool_result` in the response carries a known-failure `status` (`failed`, `error`, `incomplete`, `cancelled`/`canceled`). This matches the evaluator's binary contract ("FALSE: at least one tool call failed") and prevents the prompty rubric -- which doesn't see the `status` field -- from mis-grading conversations whose only failure signal is the runtime-reported execution status. Behavior is unchanged for responses where no `status` is populated.

## 1.17.0 (2026-06-03)

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -34,6 +34,7 @@
from ._evaluators._document_retrieval import DocumentRetrievalEvaluator
from ._evaluators._tool_output_utilization import _ToolOutputUtilizationEvaluator
from ._evaluators._tool_call_success import _ToolCallSuccessEvaluator
+from ._evaluators._tool_input_accuracy import _ToolInputAccuracyEvaluator
from ._model_configurations import (
    AzureAIProject,
    AzureOpenAIModelConfiguration,
@@ -135,6 +136,7 @@ def lazy_import():
    "ToolCallAccuracyEvaluator",
    "_ToolOutputUtilizationEvaluator",
    "_ToolCallSuccessEvaluator",
+    "_ToolInputAccuracyEvaluator",
    "AzureOpenAIGrader",
    "AzureOpenAILabelGrader",
    "AzureOpenAIStringCheckGrader",
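The change touches two places: the import and the list of exported names. Assuming that list is the package's `__all__` (the hunk does not show the assignment), the second edit matters because the name starts with an underscore: `from package import *` skips underscore-prefixed names unless `__all__` lists them explicitly. A minimal sketch of a check for that, using a stand-in module so it runs without the SDK; `is_star_exported` and `demo_pkg` are hypothetical names:

```python
import types


def is_star_exported(module, name):
    """Return True if `from module import *` would bind `name`.

    Mirrors Python's rule: with `__all__`, only listed names are exported;
    without it, every attribute not starting with an underscore is.
    """
    if not hasattr(module, name):
        return False
    declared = getattr(module, "__all__", None)
    if declared is None:
        return not name.startswith("_")
    return name in declared


# Stand-in for the real package: the class is imported, but the
# __all__ entry has been forgotten.
demo = types.ModuleType("demo_pkg")
demo._ToolInputAccuracyEvaluator = type("_ToolInputAccuracyEvaluator", (), {})
demo.__all__ = ["_ToolCallSuccessEvaluator"]

before = is_star_exported(demo, "_ToolInputAccuracyEvaluator")
demo.__all__.append("_ToolInputAccuracyEvaluator")  # the second hunk's edit
after = is_star_exported(demo, "_ToolInputAccuracyEvaluator")
print(before, after)  # False True
```

A direct `from azure.ai.evaluation import _ToolInputAccuracyEvaluator` needs only the first hunk; the `__all__` entry affects star imports and tools that read the declared public surface.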
