assets/evaluators/builtin/tool_call_accuracy/spec.yaml
+1 -1 (1 addition & 1 deletion)
@@ -1,6 +1,6 @@
type: "evaluator"
name: "builtin.tool_call_accuracy"
-version: 7
+version: 8
displayName: "Tool-Call-Accuracy-Evaluator"
description: "Measures whether the agent selects the correct tool calls, applies the correct parameters, and tracks inefficient or missing tool calls, in order to resolve a user's request. This is an umbrella evaluator that assesses overall tool call quality. Use this metric in agent-based systems and AI assistants that rely on tool integration."
assets/evaluators/builtin/tool_input_accuracy/spec.yaml
+1 -1 (1 addition & 1 deletion)
@@ -1,6 +1,6 @@
type: "evaluator"
name: "builtin.tool_input_accuracy"
-version: 9
+version: 10
displayName: "Tool-Input-Accuracy-Evaluator"
description: "A binary evaluator (0 or 1) that checks whether all parameters in an agent’s tool call are correct, validating grounding, type, format, completeness, and contextual appropriateness using LLM-based analysis. Use it to verify agent tool usage, API integration tests, or to ensure tool call parameters are fully correct in AI workflows."
assets/evaluators/builtin/tool_selection/spec.yaml
+1 -1 (1 addition & 1 deletion)
@@ -1,6 +1,6 @@
type: "evaluator"
name: "builtin.tool_selection"
-version: 7
+version: 8
displayName: "Tool-Selection-Evaluator"
description: "Evaluates whether an AI agent selected the most appropriate and efficient tools for a given task, avoiding redundancy or missing essentials. Use it to assess tool choice quality in agent-based systems, orchestration platforms, and AI assistants that must pick the right tools from available options."