Commit ffd4b61

Copilot and m7md7sien authored
Fix TaskNavigationEfficiencyEvaluator threshold defaulting to 3.0 for binary metric (#46542)
* Initial plan

* Fix TaskNavigationEfficiencyEvaluator threshold: use 1.0 instead of default 3.0

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/e376f26a-4cd6-44a9-b271-81eb2b6e86d9

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

* Update sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_task_navigation_efficiency/_task_navigation_efficiency.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Use self._threshold in return dict and add result/threshold test assertions

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/28756174-3e26-4ea2-849c-9d5c0a28d6c3

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
Co-authored-by: Mohamed Hessien <mohessie@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 472cc19 commit ffd4b61

2 files changed

Lines changed: 4 additions & 2 deletions

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_task_navigation_efficiency/_task_navigation_efficiency.py

Lines changed: 2 additions & 2 deletions
@@ -137,7 +137,7 @@ def __init__(
                 error_target=ErrorTarget.TASK_NAVIGATION_EFFICIENCY_EVALUATOR
             )
 
-        super().__init__()
+        super().__init__(threshold=1.0)
 
     @override
     async def _real_call(self, **kwargs):
@@ -345,7 +345,7 @@ async def _do_eval(self, eval_input: Dict) -> Dict[str, Union[float, str, Dict[s
                 "task_navigation_efficiency_passed": match_result,
                 "task_navigation_efficiency_reason": None,
                 "task_navigation_efficiency_status": "completed",
-                "task_navigation_efficiency_threshold": None,
+                "task_navigation_efficiency_threshold": float(self._threshold),
                 "task_navigation_efficiency_properties": additional_properties_metrics,
             }
         else:
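
The reason for the change can be illustrated with a small sketch. It is not the SDK's code: `BaseEvaluatorSketch`, `BinaryEvaluatorBefore`, `BinaryEvaluatorAfter`, and `evaluate` are hypothetical names, and the rule "pass when score >= threshold" is an assumption about how the base class turns a score into a pass/fail result. Only the 3.0 default and the new 1.0 value come from the commit itself. Under those assumptions, a binary metric that scores at most 1.0 could never reach an inherited threshold of 3.0:

```python
from typing import Dict, Union


class BaseEvaluatorSketch:
    """Hypothetical stand-in for the evaluator base class (not the real SDK class)."""

    def __init__(self, threshold: float = 3.0) -> None:
        # The 3.0 default mirrors the value named in the commit title.
        self._threshold = threshold

    def result_for(self, score: float) -> str:
        # Assumed rule: a score passes when it reaches the threshold.
        return "pass" if score >= self._threshold else "fail"


class BinaryEvaluatorBefore(BaseEvaluatorSketch):
    def __init__(self) -> None:
        super().__init__()  # silently inherits threshold=3.0


class BinaryEvaluatorAfter(BaseEvaluatorSketch):
    def __init__(self) -> None:
        super().__init__(threshold=1.0)  # matches the metric's maximum score


def evaluate(evaluator: BaseEvaluatorSketch, matched: bool) -> Dict[str, Union[bool, str, float]]:
    # A binary metric only ever produces 0.0 or 1.0.
    score = 1.0 if matched else 0.0
    return {
        "passed": matched,
        "result": evaluator.result_for(score),
        "threshold": float(evaluator._threshold),
    }


before = evaluate(BinaryEvaluatorBefore(), matched=True)
after = evaluate(BinaryEvaluatorAfter(), matched=True)
print(before["result"], before["threshold"])
print(after["result"], after["threshold"])
```

In this sketch an exact match is reported as a failure before the change and as a pass after it, which is the inconsistency the new test assertions on `task_navigation_efficiency_result` and `task_navigation_efficiency_threshold` are meant to catch.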

sdk/evaluation/azure-ai-evaluation/tests/unittests/test_task_navigation_efficiency_evaluators.py

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ def test_exact_match_scenario(self):
 
         result = evaluator(response=response, ground_truth=ground_truth)
         assert result["task_navigation_efficiency_passed"] is True
+        assert result["task_navigation_efficiency_result"] == "pass"
+        assert result["task_navigation_efficiency_threshold"] == 1.0
         assert "task_navigation_efficiency_properties" in result
         assert result["task_navigation_efficiency_properties"]["precision_score"] == 1.0
         assert result["task_navigation_efficiency_properties"]["recall_score"] == 1.0
