Commit 2c37dca

Updated version number for evaluators (#5099)
1 parent: eccf238; commit: 2c37dca

14 files changed

Lines changed: 14 additions & 14 deletions

assets/evaluators/builtin/customer_satisfaction/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 type: "evaluator"
 name: "builtin.customer_satisfaction"
-version: 11
+version: 12
 displayName: "Customer-Satisfaction-Evaluator"
 description: "Evaluates the predicted customer satisfaction level of an AI agent interaction on a 1-5 Likert scale. This evaluator assesses whether the agent's response would likely result in a satisfied customer based on helpfulness, completeness, tone, and resolution of the user's needs. Useful for measuring customer support quality, chatbot effectiveness, and overall user experience."
 evaluatorType: "builtin"

assets/evaluators/builtin/fluency/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 type: "evaluator"
 name: "builtin.fluency"
-version: 7
+version: 8
 displayName: "Fluency-Evaluator"
 description: "Evaluates how natural and grammatically correct the response sounds. Higher scores indicate smoother and clearer language. It’s best used for generative business writing such as summarizing meeting notes, creating marketing materials, and drafting email."
 evaluatorType: "builtin"

assets/evaluators/builtin/groundedness/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 type: "evaluator"
 name: "builtin.groundedness"
-version: 13
+version: 14
 displayName: "Groundedness-Evaluator"
 description: "Assesses whether the response stays true to the given context in a retrieval-augmented generation scenario. It’s best used for retrieval-augmented generation (RAG) scenarios, including question and answering and summarization. Use the groundedness metric when you need to verify that ai-generated responses align with and are validated by the provided context."
 evaluatorType: "builtin"

assets/evaluators/builtin/intent_resolution/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 type: "evaluator"
 name: "builtin.intent_resolution"
-version: 6
+version: 7
 displayName: "Intent-Resolution-Evaluator-(Preview)"
 description: "Checks whether the model correctly interprets and resolves user intent. Ensures the response aligns with what the user asked. Use this metric in conversational AI assistants, and customer support bots where understanding user intent is essential."
 evaluatorType: "builtin"

assets/evaluators/builtin/relevance/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 type: "evaluator"
 name: "builtin.relevance"
-version: 9
+version: 10
 displayName: "Relevance-Evaluator"
 description: "Assesses how well the response matches the user’s intent or question. Higher scores mean better alignment with the prompt. It’s best used for generative business writing such as summarizing meeting notes, creating marketing materials, and drafting email."
 evaluatorType: "builtin"

assets/evaluators/builtin/response_completeness/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 type: "evaluator"
 name: "builtin.response_completeness"
-version: 7
+version: 8
 displayName: "Response-Completeness-Evaluator-(Preview)"
 description: "Assesses whether the response covers all key aspects of the question. Higher scores indicate more thorough and complete answers. This evaluator is useful when evaluating chatbots, virtual assistants, and QA systems where full and informative responses are critical."
 evaluatorType: "builtin"

assets/evaluators/builtin/retrieval/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 type: "evaluator"
 name: "builtin.retrieval"
-version: 9
+version: 10
 displayName: "Retrieval-Evaluator"
 description: "Measures how effectively the system retrieves relevant data or content. Higher scores mean better recall of useful information. It’s best used for the quality of search in information retrieval and retrieval augmented generation, when you don't have ground truth for chunk retrieval rankings. Use the retrieval score when you want to assess to what extent the context chunks retrieved are highly relevant and ranked at the top for answering your users' queries."
 evaluatorType: "builtin"

assets/evaluators/builtin/similarity/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 type: "evaluator"
 name: "builtin.similarity"
-version: 4
+version: 5
 displayName: "Similarity-Evaluator"
 description: "Measures how closely two pieces of text resemble each other in meaning. Higher scores indicate greater semantic similarity. It’s best used for NLP tasks with a user query. Use it when you want an objective evaluation of an AI model's performance, particularly in text generation tasks where you have access to ground truth responses."
 evaluatorType: "builtin"

assets/evaluators/builtin/task_adherence/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 type: "evaluator"
 name: "builtin.task_adherence"
-version: 12
+version: 13
 displayName: "Task-Adherence-Evaluator-(Preview)"
 description: "Evaluates whether the agent completed the task within the confines of the instructions given to the agentic system. Higher scores indicate better compliance with the instructions. This evaluator is useful when useful for end-to-end system-level task evaluation for agents. Example outputs include actions such as updating a database and textual responses such as writing a report."
 evaluatorType: "builtin"

assets/evaluators/builtin/task_completion/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 type: "evaluator"
 name: "builtin.task_completion"
-version: 15
+version: 16
 displayName: "Task-Completion-Evaluator-(Preview)"
 description: "Evaluates whether an AI agent successfully completed the requested task end to end by analyzing the conversation history and agent response to determine if all task requirements were met, ignoring rule adherence or intent understanding. This evaluator is useful for assessing agent effectiveness in task-oriented scenarios, workflow automation, and goal-oriented AI interactions."
 evaluatorType: "builtin"
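
Every hunk in this commit makes the same one-line change: the integer on the `version:` line of a `spec.yaml` goes up by one. The sketch below shows one way such a bump could be applied to the text of a spec file. It is not taken from the repository; `bump_version` is a hypothetical helper, and it assumes the version is a bare integer on its own unindented line, as in the files shown here.

```python
import re

# Hypothetical helper, not part of the commit. Matches an unindented
# `version: <integer>` line, which is the shape seen in the specs above.
_VERSION_LINE = re.compile(r"^(version:[ \t]*)(\d+)[ \t]*$", re.MULTILINE)


def bump_version(spec_text: str) -> str:
    """Return spec_text with the integer on its `version:` line increased by one."""
    match = _VERSION_LINE.search(spec_text)
    if match is None:
        raise ValueError("no unindented integer `version:` line found")
    # Rebuild only the matched line; every other character is left untouched.
    new_line = f"{match.group(1)}{int(match.group(2)) + 1}"
    return spec_text[: match.start()] + new_line + spec_text[match.end():]


if __name__ == "__main__":
    sample = 'type: "evaluator"\nname: "builtin.fluency"\nversion: 7\n'
    print(bump_version(sample))
```

Editing the text with a regular expression, rather than parsing and re-serializing the YAML, keeps the diff to a single line per file, which matches the "1 addition & 1 deletion" shape of each file in this commit.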
