From 181c592bf874db0aa7ebc8fb60bde200ebfa0505 Mon Sep 17 00:00:00 2001 From: Chang Liu Date: Fri, 4 Apr 2025 01:23:55 -0400 Subject: [PATCH 01/11] added samples for agent eval --- ...AI_Judge_Evaluator_Intent_Resolution.ipynb | 487 +++++++++++++++++ .../AI_Judge_Evaluator_Task_Adherence.ipynb | 395 ++++++++++++++ ...I_Judge_Evaluator_Tool_Call_Accuracy.ipynb | 495 ++++++++++++++++++ ...dge_Evaluators_Response_Completeness.ipynb | 240 +++++++++ .../Evaluate_Azure_AI_Agent_Quality.ipynb | 386 ++++++++++++++ .../Agent_Evaluation/README.md | 33 ++ .../Agent_Evaluation/agent_messages.jsonl | 10 + .../Agent_Evaluation/user_functions.py | 268 ++++++++++ 8 files changed, 2314 insertions(+) create mode 100644 scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb create mode 100644 scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb create mode 100644 scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb create mode 100644 scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb create mode 100644 scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb create mode 100644 scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/README.md create mode 100644 scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/agent_messages.jsonl create mode 100644 scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb new file mode 100644 index 00000000..5aaa7aae --- /dev/null +++ 
b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb @@ -0,0 +1,487 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Intent Resolution Evaluator\n", + "\n", + "## Objective\n", + "This sample demonstrates how to use the intent resolution evaluator on agent data. The supported input formats include:\n", + "- simple data such as strings;\n", + "- user-agent conversations in the form of a list of agent messages.\n", + "\n", + "## Time\n", + "\n", + "You should expect to spend about 20 minutes running this notebook.\n", + "\n", + "## Before you begin\n", + "For quality evaluation, you need to deploy a `gpt` model supporting JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities.\n", + "\n", + "### Prerequisites\n", + "```bash\n", + "pip install azure-ai-projects azure-identity azure-ai-evaluation\n", + "```\n", + "Set these environment variables with your own values:\n", + "1) **PROJECT_CONNECTION_STRING** - The project connection string, as found in the overview page of your Azure AI Foundry project.\n", + "2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the \"Name\" column in the \"Models + endpoints\" tab in your Azure AI Foundry project.\n", + "3) **AZURE_OPENAI_ENDPOINT** - Azure OpenAI endpoint to be used for evaluation.\n", + "4) **AZURE_OPENAI_API_KEY** - Azure OpenAI API key to be used for evaluation.\n", + "5) **AZURE_OPENAI_API_VERSION** - Azure OpenAI API version to be used for evaluation.\n", + "6) **AZURE_SUBSCRIPTION_ID** - Azure subscription ID of the Azure AI project\n", + "7) **PROJECT_NAME** - Azure AI project name\n", + "8) **RESOURCE_GROUP_NAME** - Resource group name of the Azure AI project\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Intent Resolution evaluator measures how well an agent has identified and resolved the user
intent.\n", + "The scoring is on a 1-5 integer scale and is as follows:\n", + "\n", + " - Score 1: Response completely unrelated to user intent\n", + " - Score 2: Response minimally relates to user intent\n", + " - Score 3: Response partially addresses the user intent but lacks complete details\n", + " - Score 4: Response addresses the user intent with moderate accuracy but has minor inaccuracies or omissions\n", + " - Score 5: Response directly addresses the user intent and fully resolves it\n", + "\n", + "The evaluation requires the following inputs:\n", + "\n", + " - Query: The user query. Either a string with a user request or a list of messages with previous requests from the user and responses from the assistant, potentially including a system message.\n", + " - Response: The response to be evaluated. Either a string or a message with the response from the agent to the last user query.\n", + "\n", + "There is a third, optional parameter:\n", + " - Tool definitions: The list of tool definitions the agent can call. These can help the evaluator better assess whether the right tool was called to resolve a given intent."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize Intent Resolution Evaluator\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from azure.ai.evaluation import AzureOpenAIModelConfiguration\n", + "from azure.identity import DefaultAzureCredential\n", + "from azure.ai.evaluation import IntentResolutionEvaluator\n", + "from pprint import pprint\n", + "\n", + "model_config = AzureOpenAIModelConfiguration(\n", + " azure_endpoint=os.environ[\"AZURE_OPENAI_ENDPOINT\"],\n", + " api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n", + " api_version=os.environ[\"AZURE_OPENAI_API_VERSION\"],\n", + " azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", + ")\n", + "\n", + "intent_resolution_evaluator = IntentResolutionEvaluator(model_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Samples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Evaluating query and response as string" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Success example. Intent is identified and understood and the response correctly resolves user intent\n", + "result = intent_resolution_evaluator(query=\"What are the opening hours of the Eiffel Tower?\",\n", + " response=\"Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.\",\n", + " )\n", + "pprint(result)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Failure example. 
Even though intent is correctly identified, the response does not resolve the user intent\n", + "result = intent_resolution_evaluator(query=\"What are the opening hours of the Eiffel Tower?\",\n", + " response=\"Please check the official website for the up-to-date information on Eiffel Tower opening hours.\",\n", + " )\n", + "pprint(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Evaluating query and response as a list of messages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": \"You are a friendly and helpful customer service agent.\"\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T06:14:20Z\",\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"Hi, I need help with the last 2 orders on my account #888. Could you please update me on their status?\"\n", + " }\n", + " ]\n", + " }\n", + "]\n", + "\n", + "response = [\n", + " {\n", + " \"createdAt\": \"2025-03-14T06:14:30Z\",\n", + " \"run_id\": \"0\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"Hello!
Let me quickly look up your account details.\"\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T06:14:35Z\",\n", + " \"run_id\": \"0\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"tool_call_20250310_001\",\n", + " \"name\": \"get_orders\",\n", + " \"arguments\": {\n", + " \"account_number\": \"888\"\n", + " }\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T06:14:40Z\",\n", + " \"run_id\": \"0\",\n", + " \"tool_call_id\": \"tool_call_20250310_001\",\n", + " \"role\": \"tool\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_result\",\n", + " \"tool_result\": \"[{ \\\"order_id\\\": \\\"123\\\" }, { \\\"order_id\\\": \\\"124\\\" }]\"\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T06:14:45Z\",\n", + " \"run_id\": \"0\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"Thanks for your patience. I see two orders on your account. 
Let me fetch the details for both.\"\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T06:14:50Z\",\n", + " \"run_id\": \"0\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"tool_call_20250310_002\",\n", + " \"name\": \"get_order\",\n", + " \"arguments\": {\n", + " \"order_id\": \"123\"\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"tool_call_20250310_003\",\n", + " \"name\": \"get_order\",\n", + " \"arguments\": {\n", + " \"order_id\": \"124\"\n", + " }\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T06:14:55Z\",\n", + " \"run_id\": \"0\",\n", + " \"tool_call_id\": \"tool_call_20250310_002\",\n", + " \"role\": \"tool\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_result\",\n", + " \"tool_result\": \"{ \\\"order\\\": { \\\"id\\\": \\\"123\\\", \\\"status\\\": \\\"shipped\\\", \\\"delivery_date\\\": \\\"2025-03-15\\\" } }\"\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T06:15:00Z\",\n", + " \"run_id\": \"0\",\n", + " \"tool_call_id\": \"tool_call_20250310_003\",\n", + " \"role\": \"tool\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_result\",\n", + " \"tool_result\": \"{ \\\"order\\\": { \\\"id\\\": \\\"124\\\", \\\"status\\\": \\\"delayed\\\", \\\"expected_delivery\\\": \\\"2025-03-20\\\" } }\"\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T06:15:05Z\",\n", + " \"run_id\": \"0\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025. 
Is there anything else I can help you with?\"\n", + " }\n", + " ]\n", + " }\n", + "]\n", + "\n", + "# Note: the tool definitions are optional, and some of the tools below are not used in the example above; that is fine.\n", + "# If context length is a concern, you can remove the unused tool definitions, or even all tool definitions, since their impact on the intent resolution evaluation is usually minimal.\n", + "tool_definitions = [\n", + " {\n", + " \"name\": \"get_orders\",\n", + " \"description\": \"Get the list of orders for a given account number.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"account_number\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The account number to get the orders for.\"\n", + " }\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"name\": \"get_order\",\n", + " \"description\": \"Get the details of a specific order.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"order_id\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The order ID to get the details for.\"\n", + " }\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"name\": \"initiate_return\",\n", + " \"description\": \"Initiate the return process for an order.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"order_id\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The order ID for the return process.\"\n", + " }\n", + " }\n", + " }\n", + " },\n", + " {\n", + " \"name\": \"update_shipping_address\",\n", + " \"description\": \"Update the shipping address for a given account.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"account_number\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The account number to update.\"\n", + " },\n", + " \"new_address\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The
new shipping address.\"\n", + " }\n", + " }\n", + " }\n", + " }\n", + "]\n", + "\n", + "result = intent_resolution_evaluator(query=query,\n", + " response=response,\n", + " tool_definitions=tool_definitions,\n", + " )\n", + "pprint(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Evaluating an agent conversation loaded from disk\n", + "\n", + "We use `AIAgentConverter` to convert agent messages into the required evaluation data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from azure.ai.evaluation import AIAgentConverter\n", + "\n", + "def load_conversations(filename):\n", + "    with open(filename, 'r') as file:\n", + "        lines = file.readlines()\n", + "    parsed_conversations = [json.loads(line) for line in lines]\n", + "    print(f\"Loaded {len(parsed_conversations)} conversations from {filename}.\")\n", + "    return parsed_conversations\n", + "\n", + "conversations_filename = r'agent_messages.jsonl'\n", + "\n", + "sample_conversations = load_conversations(conversations_filename)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Evaluate a single conversation from the loaded conversations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "conversation = sample_conversations[0]\n", + "\n", + "run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", + "print(f\"Run IDs in conversation: {run_ids}\")\n", + "run_id = str(run_ids[0]) # convert run_id to string in case it is some other type, e.g.
an int\n", + "converted_conv = AIAgentConverter.convert_from_conversation(conversation, run_id)\n", + "# Extract the query, response, and tool definitions from the converted run\n", + "query = converted_conv['query']\n", + "response = converted_conv['response']\n", + "tool_definitions = converted_conv['tool_definitions']\n", + "\n", + "print(f\"Run ID: {run_id}\")\n", + "print(f\"Query: {query}\")\n", + "print(f\"Response: {response}\")\n", + "print(f\"Tool Definitions: {tool_definitions}\")\n", + "\n", + "result = intent_resolution_evaluator(query=query, response=response, tool_definitions=tool_definitions)\n", + "print(\"Evaluation result:\")\n", + "pprint(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Batch evaluate and visualize results on Azure AI Foundry\n", + "Batch evaluation lets you evaluate an entire dataset asynchronously.\n", + "\n", + "Optionally, you can open the AI Foundry URL for rich data visualization in Azure AI Foundry. You can inspect the evaluation scores and reasoning to quickly identify bugs and issues in your agent to fix and improve.
Make sure to authenticate to Azure using `az login` in your terminal before running this cell.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Write one line per agent run in sample_conversations to a .jsonl file\n", + "file_name = \"evaluation_data.jsonl\"\n", + "\n", + "with open(file_name, 'w') as file:\n", + "    for conversation in sample_conversations:\n", + "        run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", + "        for run_id in run_ids:\n", + "            converted_conv = AIAgentConverter.convert_from_conversation(conversation, str(run_id))\n", + "            line = {\n", + "                \"query\": converted_conv['query'],\n", + "                \"response\": converted_conv['response'],\n", + "                \"tool_definitions\": converted_conv['tool_definitions']\n", + "            }\n", + "            file.write(json.dumps(line) + '\\n')\n", + "\n", + "print(f\"Sample conversations have been written to {file_name}.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.evaluation import evaluate\n", + "\n", + "\n", + "response = evaluate(\n", + "    data=file_name,\n", + "    evaluation_name=\"Intent Resolution Evaluation\",\n", + "    evaluators={\n", + "        \"intent_resolution\": intent_resolution_evaluator,\n", + "    },\n", + "    azure_ai_project={\n", + "        \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", + "        \"project_name\": os.environ[\"PROJECT_NAME\"],\n", + "        \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", + "    }\n", + ")\n", + "print(f'AI Foundry URL: {response.get(\"studio_url\")}')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name":
"python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb new file mode 100644 index 00000000..86c6b31e --- /dev/null +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb @@ -0,0 +1,371 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Task Adherence Evaluator\n", + "\n", + "## Objective\n", + "This sample demonstrates how to use the task adherence evaluator on agent data. The supported input formats include:\n", + "- simple data such as strings;\n", + "- user-agent conversations in the form of a list of agent messages.\n", + "\n", + "## Time\n", + "\n", + "You should expect to spend about 10 minutes running this notebook.\n", + "\n", + "## Before you begin\n", + "For quality evaluation, you need to deploy a `gpt` model supporting JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities.\n",
+ "\n", + "### Prerequisites\n", + "```bash\n", + "pip install azure-ai-projects azure-identity azure-ai-evaluation\n", + "```\n", + "Set these environment variables with your own values:\n", + "1) **PROJECT_CONNECTION_STRING** - The project connection string, as found in the overview page of your Azure AI Foundry project.\n", + "2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the \"Name\" column in the \"Models + endpoints\" tab in your Azure AI Foundry project.\n", + "3) **AZURE_OPENAI_ENDPOINT** - Azure OpenAI endpoint to be used for evaluation.\n", + "4) **AZURE_OPENAI_API_KEY** - Azure OpenAI API key to be used for evaluation.\n", + "5) **AZURE_OPENAI_API_VERSION** - Azure OpenAI API version to be used for evaluation.\n", + "6) **AZURE_SUBSCRIPTION_ID** - Azure subscription ID of the Azure AI project\n", + "7) **PROJECT_NAME** - Azure AI project name\n", + "8) **RESOURCE_GROUP_NAME** - Resource group name of the Azure AI project\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Task Adherence evaluator measures how well the agent adheres to its assigned tasks or predefined goals.\n", + "\n", + "The scoring is on a 1-5 integer scale and is as follows:\n", + "\n", + " - Score 1: Fully Inadherent\n", + " - Score 2: Barely Adherent\n", + " - Score 3: Moderately Adherent\n", + " - Score 4: Mostly Adherent\n", + " - Score 5: Fully Adherent\n", + "\n", + "The evaluation requires the following inputs:\n", + "\n", + " - Query: The user query. Either a string with a user request or a list of messages with previous requests from the user and responses from the assistant, potentially including a system message.\n", + " - Response: The response to be evaluated. Either a string or a message with the response from the agent to the last user query.\n", + "\n", + "There is a third, optional parameter:\n", + " - Tool definitions: The list of tool definitions the agent can call. These can help the evaluator better assess whether the right tool was called to adhere to the user's intent."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize Task Adherence Evaluator\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from azure.ai.evaluation import TaskAdherenceEvaluator, AzureOpenAIModelConfiguration\n", + "from pprint import pprint\n", + "\n", + "model_config = AzureOpenAIModelConfiguration(\n", + " azure_endpoint=os.environ[\"AZURE_OPENAI_ENDPOINT\"],\n", + " api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n", + " api_version=os.environ[\"AZURE_OPENAI_API_VERSION\"],\n", + " azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", + ")\n", + "task_adherence_evaluator = TaskAdherenceEvaluator(model_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Samples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Evaluating query and response as string" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Failure example, there's only a vague adherence to the task\n", + "result = task_adherence_evaluator(query=\"What are the best practices for maintaining a healthy rose garden during the summer?\",\n", + " response=\"Make sure to water your roses regularly and trim them occasionally.\",\n", + " )\n", + "pprint(result)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Success example, full adherence to the task\n", + "result = task_adherence_evaluator(query=\"What are the best practices for maintaining a healthy rose garden during the summer?\",\n", + " response=\"For optimal summer care of your rose garden, start by watering deeply early in the morning to ensure the roots are well-hydrated without encouraging fungal growth. Apply a 2-3 inch layer of organic mulch around the base of the plants to conserve moisture and regulate soil temperature. 
Fertilize with a balanced rose fertilizer every 4–6 weeks to support healthy growth. Prune away any dead or diseased wood to promote good air circulation, and inspect regularly for pests such as aphids or spider mites, treating them promptly with an appropriate organic insecticidal soap. Finally, ensure that your roses receive at least 6 hours of direct sunlight daily for robust flowering.\",\n", + " )\n", + "pprint(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Evaluating query and response as a list of messages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query = [\n", + " {\n", + " \"role\": \"system\",\n", + " \"content\": \"You are an expert in literature and can provide book recommendations.\"\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T08:00:00Z\",\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"I love historical fiction.
Can you recommend a good book from that genre?\"\n", + " }\n", + " ]\n", + " }\n", + " ]\n", + "\n", + "response = [\n", + " {\n", + " \"createdAt\": \"2025-03-14T08:00:05Z\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"Let me fetch a recommendation for historical fiction.\"\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T08:00:10Z\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"tool_call_20250314_001\",\n", + " \"name\": \"get_book\",\n", + " \"arguments\": {\n", + " \"genre\": \"historical fiction\"\n", + " }\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T08:00:15Z\",\n", + " \"role\": \"tool\",\n", + " \"tool_call_id\": \"tool_call_20250314_001\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_result\",\n", + " \"tool_result\": \"{ \\\"book\\\": { \\\"title\\\": \\\"The Pillars of the Earth\\\", \\\"author\\\": \\\"Ken Follett\\\", \\\"summary\\\": \\\"A captivating tale set in medieval England that weaves historical events with personal drama.\\\" } }\"\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T08:00:20Z\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"Based on our records, I recommend 'The Pillars of the Earth' by Ken Follett. This novel is an excellent example of historical fiction with a rich narrative and well-developed characters. 
Would you like more details or another suggestion?\"\n", + " }\n", + " ]\n", + " }\n", + " ]\n", + "\n", + "tool_definitions = [\n", + " {\n", + " \"name\": \"get_book\",\n", + " \"description\": \"Retrieve a book recommendation for a specified genre.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"genre\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The genre for which a book recommendation is requested.\"\n", + " }\n", + " }\n", + " }\n", + " }\n", + " ]\n", + "\n", + "result = task_adherence_evaluator(query=query,\n", + " response=response,\n", + " tool_definitions=tool_definitions,\n", + " )\n", + "pprint(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Evaluating an agent conversation loaded from disk\n", + "\n", + "We use `AIAgentConverter` to convert agent messages into the required evaluation data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from azure.ai.evaluation import AIAgentConverter\n", + "\n", + "def load_conversations(filename):\n", + "    with open(filename, 'r') as file:\n", + "        lines = file.readlines()\n", + "    parsed_conversations = [json.loads(line) for line in lines]\n", + "    print(f\"Loaded {len(parsed_conversations)} conversations from {filename}.\")\n", + "    return parsed_conversations\n", + "\n", + "conversations_filename = r'agent_messages.jsonl'\n", + "\n", + "sample_conversations = load_conversations(conversations_filename)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Evaluate a single conversation from the loaded conversations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "conversation = sample_conversations[0]\n", + "\n", + "run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", + "print(f\"Run IDs in conversation: {run_ids}\")\n", +
"run_id = str(run_ids[0]) # convert run_id to string in case it is some other type, e.g. an int\n", + "converted_conv = AIAgentConverter.convert_from_conversation(conversation, run_id)\n", + "# Extract the query, response, and tool definitions from the converted run\n", + "query = converted_conv['query']\n", + "response = converted_conv['response']\n", + "tool_definitions = converted_conv['tool_definitions']\n", + "\n", + "print(f\"Run ID: {run_id}\")\n", + "print(f\"Query: {query}\")\n", + "print(f\"Response: {response}\")\n", + "print(f\"Tool Definitions: {tool_definitions}\")\n", + "\n", + "result = task_adherence_evaluator(query=query, response=response, tool_definitions=tool_definitions)\n", + "print(\"Evaluation result:\")\n", + "pprint(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Batch evaluate and visualize results on Azure AI Foundry\n", + "Batch evaluation lets you evaluate an entire dataset asynchronously.\n", + "\n", + "Optionally, you can open the AI Foundry URL for rich data visualization in Azure AI Foundry. You can inspect the evaluation scores and reasoning to quickly identify bugs and issues in your agent to fix and improve. Make sure to authenticate to Azure using `az login` in your terminal before running this cell."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Write one line per agent run in sample_conversations to a .jsonl file\n", + "file_name = \"evaluation_data.jsonl\"\n", + "\n", + "with open(file_name, 'w') as file:\n", + "    for conversation in sample_conversations:\n", + "        run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", + "        for run_id in run_ids:\n", + "            converted_conv = AIAgentConverter.convert_from_conversation(conversation, str(run_id))\n", + "            line = {\n", + "                \"query\": converted_conv['query'],\n", + "                \"response\": converted_conv['response'],\n", + "                \"tool_definitions\": converted_conv['tool_definitions']\n", + "            }\n", + "            file.write(json.dumps(line) + '\\n')\n", + "\n", + "print(f\"Sample conversations have been written to {file_name}.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.evaluation import evaluate\n", + "\n", + "\n", + "response = evaluate(\n", + "    data=file_name,\n", + "    evaluation_name=\"Task Adherence Evaluation\",\n", + "    evaluators={\n", + "        \"task_adherence\": task_adherence_evaluator,\n", + "    },\n", + "    azure_ai_project={\n", + "        \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", + "        \"project_name\": os.environ[\"PROJECT_NAME\"],\n", + "        \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", + "    }\n", + ")\n", + "print(f'AI Foundry URL: {response.get(\"studio_url\")}')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "test_agent_evaluator_prp", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git
a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb new file mode 100644 index 00000000..d9e16ab8 --- /dev/null +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb @@ -0,0 +1,495 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tool Call Accuracy Evaluator\n", + "\n", + "## Objective\n", + "This sample demonstrates how to use the Tool Call Accuracy evaluator on agent data. The supported input formats include:\n", + "- simple data such as strings and `dict` describing tool calls;\n", + "- user-agent conversations in the form of a list of agent messages. \n", + "\n", + "## Time\n", + "\n", + "You should expect to spend about 20 minutes running this notebook. \n", + "\n", + "## Before you begin\n", + "For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities.
\n", + "\n", + "### Prerequisite\n", + "```bash\n", + "pip install azure-ai-projects azure-identity azure-ai-evaluation\n", + "```\n", + "Set these environment variables with your own values:\n", + "1) **PROJECT_CONNECTION_STRING** - The project connection string, as found in the overview page of your Azure AI Foundry project.\n", + "2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the \"Name\" column in the \"Models + endpoints\" tab in your Azure AI Foundry project.\n", + "3) **AZURE_OPENAI_ENDPOINT** - Azure OpenAI endpoint to be used for evaluation.\n", + "4) **AZURE_OPENAI_API_KEY** - Azure OpenAI key to be used for evaluation.\n", + "5) **AZURE_OPENAI_API_VERSION** - Azure OpenAI API version to be used for evaluation.\n", + "6) **AZURE_SUBSCRIPTION_ID** - Azure subscription ID of the Azure AI project\n", + "7) **PROJECT_NAME** - Azure AI project name\n", + "8) **RESOURCE_GROUP_NAME** - Resource group name of the Azure AI project\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Tool Call Accuracy evaluator assesses how accurately an AI agent uses tools by examining:\n", + "- Relevance to the conversation\n", + "- Parameter correctness according to tool definitions\n", + "- Parameter value extraction from the conversation\n", + "- Potential usefulness of the tool call\n", + "\n", + "The evaluator assigns a binary score (0 or 1) to each tool call:\n", + "\n", + " - Score 0: The tool call is irrelevant or contains information not in the conversation/definition\n", + " - Score 1: The tool call is relevant with properly extracted parameters from the conversation\n", + "\n", + "If there are multiple calls, the final score is the **average** of the individual tool call scores, which can be interpreted as the **passing rate** of tool calls.\n", + "\n", + "This evaluation focuses on measuring whether tool calls meaningfully contribute to addressing the query while properly following tool definitions
and using information present in the conversation history." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Tool Call Accuracy requires the following inputs:\n", + "- Query - This can be a single query or a list of messages (conversation history with the agent). The latter helps determine whether the agent used information from the history to make the right tool calls.\n", + "- Tool Calls - Tool call(s) made by the agent to answer the query. Optional if the response contains the tool calls; if not provided, the evaluator looks for tool calls in the response.\n", + "- Response - (Optional) Response from the agent (or any GenAI app). This can be a single text response or a list of messages generated as part of the agent response. If tool calls are not provided, the Tool Call Accuracy evaluator looks for tool calls in the response.\n", + "- Tool Definitions - Definition(s) of the tool(s) used by the agent to answer the query.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize Tool Call Accuracy Evaluator\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from azure.ai.evaluation import ToolCallAccuracyEvaluator, AzureOpenAIModelConfiguration\n", + "from pprint import pprint\n", + "\n", + "model_config = AzureOpenAIModelConfiguration(\n", + " azure_endpoint=os.environ[\"AZURE_OPENAI_ENDPOINT\"],\n", + " api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n", + " api_version=os.environ[\"AZURE_OPENAI_API_VERSION\"],\n", + " azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", + ")\n", + "\n", + "\n", + "tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Samples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Evaluating Single Tool Call" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query
= \"How is the weather in Seattle ?\"\n", + "tool_call = {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", + " \"name\": \"fetch_weather\",\n", + " \"arguments\": {\n", + " \"location\": \"Seattle\"\n", + " }\n", + " }\n", + "\n", + "tool_definition = {\n", + " \"id\": \"fetch_weather\",\n", + " \"name\": \"fetch_weather\",\n", + " \"description\": \"Fetches the weather information for the specified location.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"location\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The location to fetch weather for.\"\n", + " }\n", + " }\n", + " }\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "response = tool_call_accuracy(query=query, tool_calls=tool_call, tool_definitions=tool_definition)\n", + "pprint(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Multiple Tool Calls used by Agent to respond" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query = \"How is the weather in Seattle ?\"\n", + "tool_calls = [{\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", + " \"name\": \"fetch_weather\",\n", + " \"arguments\": {\n", + " \"location\": \"Seattle\"\n", + " }\n", + " },\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", + " \"name\": \"fetch_weather\",\n", + " \"arguments\": {\n", + " \"location\": \"London\"\n", + " }\n", + " }]\n", + "\n", + "tool_definition = {\n", + " \"id\": \"fetch_weather\",\n", + " \"name\": \"fetch_weather\",\n", + " \"description\": \"Fetches the weather information for the specified location.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"location\": {\n", + " 
\"type\": \"string\",\n", + " \"description\": \"The location to fetch weather for.\"\n", + " }\n", + " }\n", + " }\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "response = tool_call_accuracy(query=query, tool_calls=tool_calls, tool_definitions=tool_definition)\n", + "pprint(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Tool Calls passed as part of `Response` (common for agent case)\n", + "- Tool Call Accuracy Evaluator extracts tool calls from response" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "query = \"Can you send me an email with weather information for Seattle?\"\n", + "response = [\n", + " {\n", + " \"createdAt\": \"2025-03-26T17:27:35Z\",\n", + " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", + " \"name\": \"fetch_weather\",\n", + " \"arguments\": {\n", + " \"location\": \"Seattle\"\n", + " }\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-26T17:27:37Z\",\n", + " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", + " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", + " \"role\": \"tool\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_result\",\n", + " \"tool_result\": {\n", + " \"weather\": \"Rainy, 14\\u00b0C\"\n", + " }\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-26T17:27:38Z\",\n", + " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"call_iq9RuPxqzykebvACgX8pqRW2\",\n", + " \"name\": \"send_email\",\n", + " \"arguments\": {\n", + " \"recipient\": \"your_email@example.com\",\n", + " \"subject\": \"Weather 
Information for Seattle\",\n", + " \"body\": \"The current weather in Seattle is rainy with a temperature of 14\\u00b0C.\"\n", + " }\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-26T17:27:41Z\",\n", + " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", + " \"tool_call_id\": \"call_iq9RuPxqzykebvACgX8pqRW2\",\n", + " \"role\": \"tool\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_result\",\n", + " \"tool_result\": {\n", + " \"message\": \"Email successfully sent to your_email@example.com.\"\n", + " }\n", + " }\n", + " ]\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-26T17:27:42Z\",\n", + " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"I have successfully sent you an email with the weather information for Seattle. The current weather is rainy with a temperature of 14\\u00b0C.\"\n", + " }\n", + " ]\n", + " }\n", + " ]\n", + "\n", + "tool_definitions = [\n", + " {\n", + "\t\t\"name\": \"fetch_weather\",\n", + "\t\t\"description\": \"Fetches the weather information for the specified location.\",\n", + "\t\t\"parameters\": {\n", + "\t\t\t\"type\": \"object\",\n", + "\t\t\t\"properties\": {\n", + "\t\t\t\t\"location\": {\n", + "\t\t\t\t\t\"type\": \"string\",\n", + "\t\t\t\t\t\"description\": \"The location to fetch weather for.\"\n", + "\t\t\t\t}\n", + "\t\t\t}\n", + "\t\t}\n", + "\t},\n", + " {\n", + "\t\t\"name\": \"send_email\",\n", + "\t\t\"description\": \"Sends an email with the specified subject and body to the recipient.\",\n", + "\t\t\"parameters\": {\n", + "\t\t\t\"type\": \"object\",\n", + "\t\t\t\"properties\": {\n", + "\t\t\t\t\"recipient\": {\n", + "\t\t\t\t\t\"type\": \"string\",\n", + "\t\t\t\t\t\"description\": \"Email address of the recipient.\"\n", + "\t\t\t\t},\n", + "\t\t\t\t\"subject\": {\n", + "\t\t\t\t\t\"type\": \"string\",\n", + "\t\t\t\t\t\"description\": \"Subject of the email.\"\n", 
+ "\t\t\t\t},\n", + "\t\t\t\t\"body\": {\n", + "\t\t\t\t\t\"type\": \"string\",\n", + "\t\t\t\t\t\"description\": \"Body content of the email.\"\n", + "\t\t\t\t}\n", + "\t\t\t}\n", + "\t\t}\n", + "\t}\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "response = tool_call_accuracy(query=query, response=response, tool_definitions=tool_definitions)\n", + "pprint(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Evaluating an agent conversation loaded from disk\n", + "\n", + "We use `AIAgentConverter` to convert agent messages into the required evaluation data. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from azure.ai.evaluation import AIAgentConverter\n", + "\n", + "def load_conversations(filename):\n", + " with open(filename, 'r') as file:\n", + " lines = file.readlines()\n", + " parsed_conversations = [json.loads(line) for line in lines]\n", + " print(f\"Loaded {len(parsed_conversations)} conversations from {filename}.\")\n", + " return parsed_conversations\n", + "\n", + "conversations_filename = r'agent_messages.jsonl'\n", + "\n", + "sample_conversations = load_conversations(conversations_filename)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Evaluate a single conversation from the loaded conversations" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "conversation = sample_conversations[0]\n", + "\n", + "run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", + "print(f\"Run IDs in conversation: {run_ids}\")\n", + "run_id = str(run_ids[0]) # convert run_id to string in case it is some other type, e.g.
an int\n", + "converted_conv = AIAgentConverter.convert_from_conversation(conversation, run_id)\n", + "# Extract the query, response, and tool definitions from the converted conversation\n", + "query = converted_conv['query']\n", + "response = converted_conv['response']\n", + "tool_definitions = converted_conv['tool_definitions']\n", + "\n", + "print(f\"Run ID: {run_id}\")\n", + "print(f\"Query: {query}\")\n", + "print(f\"Response: {response}\")\n", + "print(f\"Tool Definitions: {tool_definitions}\")\n", + "\n", + "result = tool_call_accuracy(query=query, response=response, tool_definitions=tool_definitions)\n", + "print(\"Evaluation result:\")\n", + "pprint(result)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Batch evaluate and visualize results on Azure AI Foundry\n", + "Use batch evaluation to evaluate a whole dataset asynchronously. \n", + "\n", + "Optionally, open the AI Foundry URL to explore the results with rich data visualization in Azure AI Foundry. You can inspect the evaluation scores and reasoning to quickly identify and fix issues in your agent.
Make sure to authenticate to Azure using `az login` in your terminal before running this cell.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Write the sample_conversations variable into a .jsonl file\n", + "file_name = \"evaluation_data.jsonl\"\n", + "\n", + "with open(file_name, 'w') as file:\n", + " for conversation in sample_conversations:\n", + " run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", + " for run_id in run_ids:\n", + " converted_conv = AIAgentConverter.convert_from_conversation(conversation, str(run_id))\n", + " line = {\n", + " \"query\": converted_conv['query'],\n", + " \"response\": converted_conv['response'],\n", + " \"tool_definitions\": converted_conv['tool_definitions']\n", + " }\n", + " file.write(json.dumps(line) + '\\n')\n", + "\n", + "print(f\"Sample conversations have been written to {file_name}.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.evaluation import evaluate\n", + "\n", + "\n", + "response = evaluate(\n", + " data=file_name,\n", + " evaluation_name=\"Tool Call Accuracy Evaluation\",\n", + " evaluators={\n", + " \"tool_call_accuracy\": tool_call_accuracy,\n", + " },\n", + " azure_ai_project={\n", + " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", + " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", + " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", + " }\n", + ")\n", + "pprint(f'AI Foundry URL: {response.get(\"studio_url\")}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "test_agent_evaluator_prp", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python",
"nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb new file mode 100644 index 00000000..a4377ea2 --- /dev/null +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb @@ -0,0 +1,240 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Response Completeness Evaluator" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Getting Started\n", + "\n", + "This sample demonstrates how to use the Response Completeness evaluator on an agent's response when ground truth is available. It helps you assess the quality of the agent's final response against that ground truth. \n", + "\n", + "## Time\n", + "\n", + "You should expect to spend about 20 minutes running this notebook. \n", + "\n", + "## Before you begin\n", + "For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities.
\n", + "\n", + "### Prerequisite\n", + "```bash\n", + "pip install azure-ai-projects azure-identity azure-ai-evaluation\n", + "```\n", + "Set these environment variables with your own values:\n", + "1) **PROJECT_CONNECTION_STRING** - The project connection string, as found in the overview page of your Azure AI Foundry project.\n", + "2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the \"Name\" column in the \"Models + endpoints\" tab in your Azure AI Foundry project.\n", + "3) **AZURE_OPENAI_ENDPOINT** - Azure OpenAI endpoint to be used for evaluation.\n", + "4) **AZURE_OPENAI_API_KEY** - Azure OpenAI key to be used for evaluation.\n", + "5) **AZURE_OPENAI_API_VERSION** - Azure OpenAI API version to be used for evaluation.\n", + "6) **AZURE_SUBSCRIPTION_ID** - Azure subscription ID of the Azure AI project\n", + "7) **PROJECT_NAME** - Azure AI project name\n", + "8) **RESOURCE_GROUP_NAME** - Resource group name of the Azure AI project\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The Response Completeness evaluator assesses the quality of an agent response by examining how well it aligns with the provided ground truth. The evaluation is based on the following scoring system:\n", + "\n", + "
\n",
+    "Score 1: Fully incomplete: The response misses all necessary and relevant information compared to the ground truth.\n",
+    "Score 2: Barely complete: The response contains only a small percentage of the necessary information.\n",
+    "Score 3: Moderately complete: The response includes about half of the necessary information.\n",
+    "Score 4: Mostly complete: The response contains most of the necessary information, with only minor omissions.\n",
+    "Score 5: Fully complete: The response perfectly matches all necessary and relevant information from the ground truth.\n",
+    "
\n", + "\n", + "The evaluation requires the following inputs:\n", + "\n", + "- Response: The response to be evaluated. (string)\n", + "- Ground Truth: The correct and complete information against which the response is compared. (string)\n", + "\n", + "The evaluator compares the response against the ground truth to determine the completeness score: how much of the necessary information in the ground truth the response covers." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize Completeness Evaluator\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.evaluation import ResponseCompletenessEvaluator, AzureOpenAIModelConfiguration\n", + "from pprint import pprint\n", + "import os\n", + "\n", + "model_config = AzureOpenAIModelConfiguration(\n", + " azure_endpoint=os.environ[\"AZURE_OPENAI_ENDPOINT\"],\n", + " api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n", + " api_version=os.environ[\"AZURE_OPENAI_API_VERSION\"],\n", + " azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.evaluation import ResponseCompletenessEvaluator, AzureOpenAIModelConfiguration\n", + "from pprint import pprint\n", + "\n", + "response_completeness_evaluator = ResponseCompletenessEvaluator(model_config=model_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Samples" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Evaluating a response against ground truth" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# agent response is complete \n", + "result = response_completeness_evaluator(\n", + " response=\"Itinerary: Day 1 check out the downtown district of the city by train; for Day 2, we can rest in
hotel.\",\n", + " ground_truth=\"Itinerary: Day 1 take a train to visit the downtown area for city sightseeing; Day 2 rest in hotel.\"\n", + ")\n", + "result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# agent response is incomplete\n", + "result = response_completeness_evaluator(\n", + " response=\"The order with ID 124 is delayed and should now arrive by March 20, 2025.\",\n", + " ground_truth=\"The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025.\"\n", + ")\n", + "result" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare ground truth for agent response\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "data = [\n", + " {\n", + " \"response\": \"The temperature of Seattle now is 70 degrees. Based on the temperature, having an outdoor office party is recommended.\",\n", + " \"ground_truth\": \"The temperature of Seattle now is 50 degrees. It will be recommended to bring a jacket in the evening.\",\n", + " },\n", + " {\n", + " \"response\": \"The email draft \\\"Project Plan\\\" is attached. Please review and provide feedback.\",\n", + " \"ground_truth\": \"The email draft \\\"Project Plan\\\" is attached. Please review and provide feedback by EOD.\",\n", + " },\n", + " {\n", + " \"response\": \"Based on the retrieved documents, the shareholder meeting discussed the operational efficiency of the company and financing options.\",\n", + " \"ground_truth\": \"The shareholder meeting discussed the compensation package of the company CEO.\",\n", + " },\n", + " {\n", + " \"response\": \"The calendar API returns an error code 500. Please check the server logs for more details.\",\n", + " \"ground_truth\": \"The meeting is scheduled for 2 PM tomorrow.
Please confirm your availability by EOD.\",\n", + " }\n", + "]\n", + "\n", + "file_path = \"response_completeness_data.jsonl\"\n", + "\n", + "with open(file_path, 'w') as file:\n", + " for line in data:\n", + " file.write(json.dumps(line) + '\\n')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Batch evaluate and visualize results on Azure AI Foundry\n", + "Use batch evaluation to evaluate a whole dataset asynchronously. \n", + "\n", + "Optionally, open the AI Foundry URL to explore the results with rich data visualization in Azure AI Foundry. You can inspect the evaluation scores and reasoning to quickly identify and fix issues in your agent. Make sure to authenticate to Azure using `az login` in your terminal before running this cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "from azure.ai.evaluation import evaluate\n", + "\n", + "azure_ai_project={\n", + " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", + " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", + " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", + "}\n", + "\n", + "response = evaluate(\n", + " data=file_path,\n", + " evaluators={\n", + " \"response_completeness\": response_completeness_evaluator,\n", + " },\n", + " azure_ai_project=azure_ai_project,\n", + ")\n", + "\n", + "pprint(f'AI Foundry URL: {response.get(\"studio_url\")}')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "test_agent_evaluator_prp", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git
a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb new file mode 100644 index 00000000..adeb4c50 --- /dev/null +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb @@ -0,0 +1,386 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Evaluate AI agents (Azure AI Agent Service) in Azure AI Foundry" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Objective\n", + "\n", + "\n", + "This sample demonstrates how to evaluate an AI agent (Azure AI Agent Service) on these important aspects of your agentic workflow:\n", + "\n", + "- Intent Resolution: Measures how well the agent identifies the user’s request, including how well it scopes the user’s intent, asks clarifying questions, and reminds end users of its scope of capabilities.\n", + "- Tool Call Accuracy: Evaluates the agent's ability to select the appropriate tools and extract the correct parameters from previous steps.\n", + "- Task Adherence: Measures how well the agent’s response adheres to its assigned tasks, according to its system message and prior steps.\n", + "\n", + "For AI agents outside of Azure AI Agent Service, you can still provide the agent data in either of the two formats (simple data or agent messages) described in the individual evaluator samples:\n", + "- [Intent resolution](https://aka.ms/intentresolution-sample)\n", + "- [Tool call accuracy](https://aka.ms/toolcallaccuracy-sample)\n", + "- [Task adherence](https://aka.ms/taskadherence-sample)\n", + "- [Response Completeness](https://aka.ms/rescompleteness-sample)\n", + "\n", + "\n", + "\n", + "## Time \n", + "\n", + "You should expect to spend about 20 minutes running this notebook.
\n", + "\n", + "## Before you begin\n", + "Creating an agent using Azure AI Agent Service requires an Azure AI Foundry project and a deployed, supported model. See more details in [Create a new agent](https://learn.microsoft.com/azure/ai-services/agents/quickstart?pivots=ai-foundry-portal).\n", + "\n", + "For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities. \n", + "\n", + "Important: Make sure to authenticate to Azure using `az login` in your terminal before running this notebook.\n", + "\n", + "### Prerequisite\n", + "\n", + "Before running the sample:\n", + "```bash\n", + "pip install azure-ai-projects azure-identity azure-ai-evaluation\n", + "```\n", + "Set these environment variables with your own values:\n", + "1) **PROJECT_CONNECTION_STRING** - The project connection string, as found in the overview page of your Azure AI Foundry project.\n", + "2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for AI-assisted evaluators, as found under the \"Name\" column in the \"Models + endpoints\" tab in your Azure AI Foundry project.\n", + "3) **AZURE_OPENAI_ENDPOINT** - Azure OpenAI endpoint to be used for evaluation.\n", + "4) **AZURE_OPENAI_API_KEY** - Azure OpenAI key to be used for evaluation.\n", + "5) **AZURE_OPENAI_API_VERSION** - Azure OpenAI API version to be used for evaluation.\n", + "6) **AZURE_SUBSCRIPTION_ID** - Azure subscription ID of the Azure AI project\n", + "7) **PROJECT_NAME** - Azure AI project name\n", + "8) **RESOURCE_GROUP_NAME** - Resource group name of the Azure AI project\n", + "9) **AGENT_MODEL_DEPLOYMENT_NAME** - The deployment name of the model for your Azure AI agent, as found under the \"Name\" column in the \"Models + endpoints\" tab in your Azure AI Foundry project."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initializing Project Client" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os, json\n", + "from azure.ai.projects import AIProjectClient\n", + "from azure.identity import DefaultAzureCredential\n", + "from azure.ai.projects.models import FunctionTool, ToolSet\n", + "\n", + "# Import your custom functions to be used as tools for the agent\n", + "from user_functions import user_functions\n", + "\n", + "project_client = AIProjectClient.from_connection_string(\n", + " credential=DefaultAzureCredential(),\n", + " conn_str=os.environ[\"PROJECT_CONNECTION_STRING\"],\n", + ")\n", + "\n", + "AGENT_NAME = \"Seattle Tourist Assistant\"\n", + "\n", + "# Add tools to be used by the agent\n", + "functions = FunctionTool(user_functions)\n", + "\n", + "toolset = ToolSet()\n", + "toolset.add(functions)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an AI agent (Azure AI Agent Service)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "agent = project_client.agents.create_agent(\n", + " model=os.environ[\"AGENT_MODEL_DEPLOYMENT_NAME\"],\n", + " name=AGENT_NAME,\n", + " instructions=\"You are a helpful assistant\",\n", + " toolset=toolset,\n", + ")\n", + "\n", + "print(f\"Created agent, ID: {agent.id}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Thread" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "thread = project_client.agents.create_thread()\n", + "print(f\"Created thread, ID: {thread.id}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conversation with Agent\n", + "Use the cells below to have a conversation with the agent:\n", + "- `Create Message[1]`\n", + "- `Execute[2]`" + ] + }, + {
"cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Message[1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create message to thread\n", + "\n", + "MESSAGE = \"Can you send me an email with weather information for Seattle?\"\n", + "\n", + "message = project_client.agents.create_message(\n", + " thread_id=thread.id,\n", + " role=\"user\",\n", + " content=MESSAGE,\n", + ")\n", + "print(f\"Created message, ID: {message.id}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Execute[2]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = project_client.agents.create_and_process_run(thread_id=thread.id, agent_id=agent.id)\n", + "\n", + "print(f\"Run finished with status: {run.status}\")\n", + "\n", + "if run.status == \"failed\":\n", + " print(f\"Run failed: {run.last_error}\")\n", + "\n", + "print(f\"Run ID: {run.id}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### List Messages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for message in project_client.agents.list_messages(thread.id, order=\"asc\").data:\n", + " print(f\"Role: {message.role}\")\n", + " print(f\"Content: {message.content[0].text.value}\")\n", + " print(\"-\" * 40)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Evaluate" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get data from agent" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from azure.ai.evaluation import AIAgentConverter\n", + "\n", + "# Initialize the converter that will be backed by the project.\n", + "converter = AIAgentConverter(project_client)\n", + "\n", + "thread_id = thread.id\n", + "run_id = 
run.id\n", + "\n", + "converted_data = converter.convert(thread_id, run_id)\n", + "print(json.dumps(converted_data, indent=4))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Save the agent thread data to a JSONL file\n", + "\n", + "file_name = \"agent_thread_data.jsonl\"\n", + "evaluation_data = converter.prepare_evaluation_data(thread_ids=thread.id, filename=file_name)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setting up evaluators\n", + "\n", + "We use the following evaluators to assess different aspects of agent quality:\n", + "\n", + "- [Intent resolution](https://aka.ms/intentresolution-sample): measures the extent to which an agent identifies the correct intent from a user query. Scale: integer 1-5. Higher is better.\n", + "- [Tool call accuracy](https://aka.ms/toolcallaccuracy-sample): evaluates the agent’s ability to select the appropriate tools and to extract and process the correct parameters from previous steps. Scale: float 0-1. Higher is better.\n", + "- [Task adherence](https://aka.ms/taskadherence-sample): measures the extent to which an agent’s final response adheres to the task, based on its system message and a user query. Scale: integer 1-5. 
Higher is better.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.evaluation import ToolCallAccuracyEvaluator, AzureOpenAIModelConfiguration, IntentResolutionEvaluator, TaskAdherenceEvaluator\n", + "from pprint import pprint\n", + "\n", + "model_config = AzureOpenAIModelConfiguration(\n", + " azure_endpoint=os.environ[\"AZURE_OPENAI_ENDPOINT\"],\n", + " api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n", + " api_version=os.environ[\"AZURE_OPENAI_API_VERSION\"],\n", + " azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", + ")\n", + "# Needed to use content safety evaluators\n", + "azure_ai_project = {\n", + " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", + " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", + " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", + "}\n", + "\n", + "\n", + "# Measures how well an agent identifies a user’s request, including how well it scopes the user’s intent, asks clarifying questions, and reminds end users of its scope of capabilities\n", + "intent_resolution = IntentResolutionEvaluator(model_config=model_config)\n", + "\n", + "tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)\n", + "\n", + "task_adherence = TaskAdherenceEvaluator(model_config=model_config)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Test on a single agent run\n", + "evaluation_result = intent_resolution(**converted_data)\n", + "print(f\"Results: {evaluation_result}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run Evaluators" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azure.ai.evaluation import evaluate\n", + "\n", + "response = evaluate(\n", + " data=file_name,\n", + " evaluators={\n", + " \"tool_call_accuracy\": 
tool_call_accuracy,\n", + " \"intent_resolution\": intent_resolution,\n", + " \"task_adherence\": task_adherence,\n", + " },\n", + " azure_ai_project={\n", + " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", + " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", + " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", + " }\n", + ")\n", + "print(f'AI Foundry URL: {response.get(\"studio_url\")}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Inspect results on Azure AI Foundry\n", + "\n", + "Open the AI Foundry URL printed above to inspect the evaluation scores and reasoning in Azure AI Foundry's rich visualizations, so you can quickly identify issues in your agent to fix and improve." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Alternatively, you can get the evaluation results in memory\n", + "\n", + "# Average scores across all runs\n", + "pprint(response['metrics'])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/README.md b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/README.md new file mode 100644 index 00000000..080c92b1 --- /dev/null +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/README.md @@ -0,0 +1,33 @@ +--- +page_type: sample +languages: +- python +products: +- ai-services +- azure-openai +description: Evaluate agents with the Azure AI Evaluation SDK +--- + + +## Evaluate AI agents with the Azure AI Evaluation SDK + +### Overview 
+ +A typical AI agent workflow consists, at a minimum, of a linear sequence of intent resolution, tool calling, and a final response. We abstracted these steps into evaluation aspects to give users observability into an agent system. You can seamlessly evaluate agents built with Azure AI Agent Service via converter support. You can also evaluate other agents by following the input schema of each evaluator, detailed in each notebook. We support evaluating AI agents on these aspects: +- [Intent resolution](https://aka.ms/intentresolution-sample): measures the extent to which an agent identifies the correct intent from a user query. +- [Tool call accuracy](https://aka.ms/toolcallaccuracy-sample): evaluates the agent’s ability to select the appropriate tools and to extract and process the correct parameters from previous steps. +- [Task adherence](https://aka.ms/taskadherence-sample): measures the extent to which an agent’s final response adheres to the task, based on its system message and a user query. +- [Response completeness](https://aka.ms/rescompleteness-sample): measures the extent to which an agent or RAG response is complete (does not miss critical information) compared to the ground truth. +- [End-to-end Azure AI agent evaluation](https://aka.ms/e2e-agent-eval-sample): create an agent using Azure AI Agent Service and seamlessly evaluate its thread and run data via converter support. + +### Objective + +This tutorial provides a step-by-step guide on how to evaluate AI agents using quality evaluators. 
By the end of this tutorial, you should be able to: + + - Learn about evaluators relevant for AI agents + - Evaluate Azure AI agents and other agents using these evaluators + +### Programming Languages + - Python + +### Estimated Runtime: 15 mins \ No newline at end of file diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/agent_messages.jsonl b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/agent_messages.jsonl new file mode 100644 index 00000000..bc2ceb73 --- /dev/null +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/agent_messages.jsonl @@ -0,0 +1,10 @@ +{"messages": [{"role": "system", "createdAt": 1700000000, "content": "You are a friendly and helpful customer service agent."}, {"role": "user", "createdAt": 1700000060, "content": [{"type": "text", "text": "Hi, I need help with the last 2 orders on my account #888. Could you please update me on their status?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700000070, "content": [{"type": "text", "text": "Hello! Let me quickly look up your account details."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700000075, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_001", "type": "function", "function": {"name": "get_orders", "arguments": {"account_number": "888"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700000080, "tool_call_id": "tool_call_20250310_001", "content": [{"type": "tool_result", "tool_result": "[{ \"order_id\": \"123\" }, { \"order_id\": \"124\" }]"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700000085, "content": [{"type": "text", "text": "Thanks for your patience. I see two orders on your account. 
Let me fetch the details for both."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700000090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_002", "type": "function", "function": {"name": "get_order", "arguments": {"order_id": "123"}}}}, {"type": "tool_call", "tool_call": {"id": "tool_call_20250310_003", "type": "function", "function": {"name": "get_order", "arguments": {"order_id": "124"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700000095, "tool_call_id": "tool_call_20250310_002", "content": [{"type": "tool_result", "tool_result": "{ \"order\": { \"id\": \"123\", \"status\": \"shipped\", \"delivery_date\": \"2025-03-15\" } }"}]}, {"run_id": 0, "role": "tool", "createdAt": 1700000100, "tool_call_id": "tool_call_20250310_003", "content": [{"type": "tool_result", "tool_result": "{ \"order\": { \"id\": \"124\", \"status\": \"delayed\", \"expected_delivery\": \"2025-03-20\" } }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700000105, "content": [{"type": "text", "text": "The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025. Is there anything else I can help you with?"}]}, {"role": "user", "createdAt": 1700000200, "content": [{"type": "text", "text": "Yes, I need help with returning an item from order 124."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700000205, "content": [{"type": "text", "text": "I can assist you with the return process. 
Let me check the details of order 124 for you."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700000210, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_004", "type": "function", "function": {"name": "get_order", "arguments": {"order_id": "124"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700000215, "tool_call_id": "tool_call_20250310_004", "content": [{"type": "tool_result", "tool_result": "{ \"order\": { \"id\": \"124\", \"status\": \"delayed\", \"return_eligible\": true } }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700000220, "content": [{"type": "text", "text": "Order 124 is eligible for a return. I will initiate the return process. Please confirm if you want to proceed with the return."}]}, {"role": "user", "createdAt": 1700000225, "content": [{"type": "text", "text": "Yes, please proceed."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700000230, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_005", "type": "function", "function": {"name": "initiate_return", "arguments": {"order_id": "124"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700000235, "tool_call_id": "tool_call_20250310_005", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"return initiated\", \"return_id\": \"R789\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700000240, "content": [{"type": "text", "text": "Your return for order 124 has been initiated. Your return ID is R789. Is there anything else I can help you with?"}]}, {"role": "user", "createdAt": 1700000300, "content": [{"type": "text", "text": "Actually, I want to update my shipping address on my account."}]}, {"run_id": 3, "role": "assistant", "createdAt": 1700000305, "content": [{"type": "text", "text": "Let me update your shipping address. 
Can you please provide your new address?"}]}, {"role": "user", "createdAt": 1700000310, "content": [{"type": "text", "text": "It's 123 New Street, Lisbon, Portugal."}]}, {"run_id": 4, "role": "assistant", "createdAt": 1700000315, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_006", "type": "function", "function": {"name": "update_shipping_address", "arguments": {"account_number": "888", "new_address": "123 New Street, Lisbon, Portugal"}}}}]}, {"run_id": 4, "role": "tool", "createdAt": 1700000320, "tool_call_id": "tool_call_20250310_006", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"address updated\" }"}]}, {"run_id": 4, "role": "assistant", "createdAt": 1700000325, "content": [{"type": "text", "text": "Your shipping address has been updated successfully. Is there anything else I can help you with?"}]}, {"role": "user", "createdAt": 1700000330, "content": [{"type": "text", "text": "No, that's all. Thank you for your help!"}]}, {"run_id": 5, "role": "assistant", "createdAt": 1700000335, "content": [{"type": "text", "text": "You're welcome! 
Have a great day."}]}], "tools": [{"id": "tool_123", "name": "get_orders", "description": "Get the list of orders for a given account number.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number to get the orders for."}}}}, {"id": "tool_124", "name": "get_order", "description": "Get the details of a specific order.", "parameters": {"type": "object", "properties": {"order_id": {"type": "string", "description": "The order ID to get the details for."}}}}, {"id": "tool_125", "name": "initiate_return", "description": "Initiate the return process for an order.", "parameters": {"type": "object", "properties": {"order_id": {"type": "string", "description": "The order ID for the return process."}}}}, {"id": "tool_126", "name": "update_shipping_address", "description": "Update the shipping address for a given account.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number to update."}, "new_address": {"type": "string", "description": "The new shipping address."}}}}]} +{"messages": [{"role": "system", "createdAt": 1700010000, "content": "You are a knowledgeable banking support agent with expertise in transaction inquiries and fund management."}, {"role": "user", "createdAt": 1700010050, "content": [{"type": "text", "text": "Hello, I need a detailed update on my account #777. Please provide my current balance and recent transaction history."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700010055, "content": [{"type": "text", "text": "Hi there! I\u2019m checking your account details right now. 
One moment please."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700010060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_032", "type": "function", "function": {"name": "get_balance", "arguments": {"account_number": "777"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700010065, "tool_call_id": "tool_call_20250310_032", "content": [{"type": "tool_result", "tool_result": "{ \"balance\": \"$5,678.90\" }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700010070, "content": [{"type": "text", "text": "Your current balance is $5,678.90. Would you also like to see your last 10 transactions?"}]}, {"role": "user", "createdAt": 1700010080, "content": [{"type": "text", "text": "Yes, please provide the details."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700010085, "content": [{"type": "text", "text": "Retrieving your last 10 transactions now..."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700010090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_033", "type": "function", "function": {"name": "get_transactions", "arguments": {"account_number": "777", "limit": 10}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700010095, "tool_call_id": "tool_call_20250310_033", "content": [{"type": "tool_result", "tool_result": "[ {\"date\": \"2025-03-09\", \"description\": \"Grocery Store\", \"amount\": \"-$123.45\"}, {\"date\": \"2025-03-08\", \"description\": \"Salary Credit\", \"amount\": \"+$3,000.00\"}, {\"date\": \"2025-03-07\", \"description\": \"Online Shopping\", \"amount\": \"-$250.00\"}, {\"date\": \"2025-03-06\", \"description\": \"Restaurant\", \"amount\": \"-$75.00\"}, {\"date\": \"2025-03-05\", \"description\": \"Utility Bill\", \"amount\": \"-$150.00\"}, {\"date\": \"2025-03-04\", \"description\": \"Transfer Received\", \"amount\": \"+$500.00\"}, {\"date\": \"2025-03-03\", \"description\": \"ATM Withdrawal\", \"amount\": \"-$200.00\"}, {\"date\": \"2025-03-02\", \"description\": 
\"Fuel\", \"amount\": \"-$60.00\"}, {\"date\": \"2025-03-01\", \"description\": \"Coffee Shop\", \"amount\": \"-$15.00\"}, {\"date\": \"2025-02-28\", \"description\": \"Book Store\", \"amount\": \"-$30.00\"} ]"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700010100, "content": [{"type": "text", "text": "Here are your last 10 transactions. Would you like to perform any further actions, such as transferring funds or reviewing a specific transaction?"}]}, {"role": "user", "createdAt": 1700010110, "content": [{"type": "text", "text": "Actually, I\u2019d like to transfer $200 to account #888."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700010115, "content": [{"type": "text", "text": "Initiating a transfer of $200 to account #888 now. Please hold on."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700010120, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_034", "type": "function", "function": {"name": "transfer_funds", "arguments": {"from_account": "777", "to_account": "888", "amount": "200"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700010125, "tool_call_id": "tool_call_20250310_034", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"transfer successful\", \"transaction_id\": \"TX987\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700010130, "content": [{"type": "text", "text": "The transfer was successful. Your transaction ID is TX987. 
Is there anything else I can assist you with regarding your account?"}]}], "tools": [{"id": "tool_127", "name": "get_balance", "description": "Retrieve the account balance for a given account number.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number to check the balance for."}}}}, {"id": "tool_138", "name": "get_transactions", "description": "Retrieve recent transactions for a given account.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number to retrieve transactions for."}, "limit": {"type": "integer", "description": "The number of transactions to retrieve."}}}}, {"id": "tool_139", "name": "transfer_funds", "description": "Transfer funds between accounts.", "parameters": {"type": "object", "properties": {"from_account": {"type": "string", "description": "The source account number."}, "to_account": {"type": "string", "description": "The destination account number."}, "amount": {"type": "string", "description": "The amount to transfer."}}}}]} +{"messages": [{"role": "system", "createdAt": 1700020000, "content": "You are a technical support agent specializing in connectivity and device troubleshooting."}, {"role": "user", "createdAt": 1700020050, "content": [{"type": "text", "text": "My home WiFi has been dropping intermittently and my laptop won\u2019t reconnect automatically. What can I do?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700020055, "content": [{"type": "text", "text": "Hello, I understand how frustrating connectivity issues can be. Let\u2019s run through some troubleshooting steps. 
First, I\u2019ll check the status of your router remotely."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700020060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_040", "type": "function", "function": {"name": "check_router_status", "arguments": {"device_id": "router_002"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700020065, "tool_call_id": "tool_call_20250310_040", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"online\", \"firmware_version\": \"1.2.3\", \"signal\": \"moderate\" }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700020070, "content": [{"type": "text", "text": "Your router appears to be online, though the signal strength is moderate. I recommend rebooting your router. Would you like me to initiate a remote reboot?"}]}, {"role": "user", "createdAt": 1700020080, "content": [{"type": "text", "text": "Yes, please reboot it and let me know if that fixes the problem."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700020085, "content": [{"type": "text", "text": "Rebooting your router now."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700020090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_041", "type": "function", "function": {"name": "reboot_router", "arguments": {"device_id": "router_002"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700020095, "tool_call_id": "tool_call_20250310_041", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"rebooted\", \"reboot_time\": \"2025-03-10T02:00:00Z\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700020100, "content": [{"type": "text", "text": "Your router has been rebooted. Please try reconnecting your laptop. Also, I suggest updating your WiFi adapter drivers if the issue persists."}]}, {"role": "user", "createdAt": 1700020150, "content": [{"type": "text", "text": "The connection seems stable now, but my laptop still sometimes fails to reconnect automatically. 
Can you assist with that?"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700020155, "content": [{"type": "text", "text": "Certainly. I\u2019ll check if there are any firmware updates available for your laptop\u2019s WiFi adapter."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700020160, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_042", "type": "function", "function": {"name": "check_adapter_firmware", "arguments": {"device_id": "laptop_wifi_001"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700020165, "tool_call_id": "tool_call_20250310_042", "content": [{"type": "tool_result", "tool_result": "{ \"firmware_update\": \"available\", \"version\": \"2.0.1\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700020170, "content": [{"type": "text", "text": "A firmware update (version 2.0.1) is available for your WiFi adapter. I recommend installing it. Would you like instructions on how to update?"}]}, {"role": "user", "createdAt": 1700020180, "content": [{"type": "text", "text": "Yes, please provide detailed update instructions."}]}, {"run_id": 3, "role": "assistant", "createdAt": 1700020185, "content": [{"type": "text", "text": "Sure. First, visit the manufacturer\u2019s support website, download the latest driver, and then run the installer as an administrator. After installation, restart your laptop. 
Please let me know if you need any more help."}]}], "tools": [{"id": "tool_140", "name": "check_router_status", "description": "Check the status of a router device.", "parameters": {"type": "object", "properties": {"device_id": {"type": "string", "description": "The ID of the router to check."}}}}, {"id": "tool_141", "name": "reboot_router", "description": "Reboot the specified router.", "parameters": {"type": "object", "properties": {"device_id": {"type": "string", "description": "The ID of the router to reboot."}}}}, {"id": "tool_142", "name": "check_adapter_firmware", "description": "Check if a firmware update is available for a WiFi adapter.", "parameters": {"type": "object", "properties": {"device_id": {"type": "string", "description": "The ID of the WiFi adapter."}}}}]} +{"messages": [{"role": "system", "createdAt": 1700030000, "content": "You are a subscription support agent. Your role is to help customers with subscription cancellations, renewals, and upgrades."}, {"role": "user", "createdAt": 1700030050, "content": [{"type": "text", "text": "Hi, I want to cancel my premium magazine subscription. I've been thinking about switching to a digital-only option. Can you help me with the cancellation and also provide details on the digital plan?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700030055, "content": [{"type": "text", "text": "I\u2019m sorry to hear you\u2019re considering cancellation. 
I\u2019ll first process your cancellation request for the premium subscription and then provide you with details about our digital plan."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700030060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_050", "type": "function", "function": {"name": "cancel_subscription", "arguments": {"subscription_id": "PREMIUM789"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700030065, "tool_call_id": "tool_call_20250310_050", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"cancelled\", \"refund\": \"$10.00 credited\" }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700030070, "content": [{"type": "text", "text": "Your premium subscription has been cancelled and a refund of $10.00 has been credited. Now, regarding our digital plan, it offers full access to all online content at a reduced rate. Would you like me to send you more detailed information about the digital plan and how to subscribe?"}]}, {"role": "user", "createdAt": 1700030080, "content": [{"type": "text", "text": "Yes, please send me all the details including pricing and any special offers."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700030085, "content": [{"type": "text", "text": "Our digital plan costs $4.99 per month, and if you subscribe within the next 7 days, you'll get an additional 10% discount for the first three months. It includes access to digital issues, exclusive online articles, and interactive content. Would you like to subscribe to the digital plan now?"}]}, {"role": "user", "createdAt": 1700030090, "content": [{"type": "text", "text": "I\u2019m interested. 
Please sign me up for the digital plan."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700030095, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_051", "type": "function", "function": {"name": "subscribe_digital", "arguments": {"account_number": "555", "plan": "digital", "discount": "10%"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700030100, "tool_call_id": "tool_call_20250310_051", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"subscribed\", \"subscription_id\": \"DIGI123\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700030105, "content": [{"type": "text", "text": "You are now subscribed to the digital plan. Your subscription ID is DIGI123. Is there anything else I can help you with today?"}]}], "tools": [{"id": "tool_130", "name": "cancel_subscription", "description": "Cancel a subscription given its ID.", "parameters": {"type": "object", "properties": {"subscription_id": {"type": "string", "description": "The subscription ID to cancel."}}}}, {"id": "tool_131", "name": "subscribe_digital", "description": "Subscribe to a digital plan.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number of the customer."}, "plan": {"type": "string", "description": "The subscription plan to subscribe to."}, "discount": {"type": "string", "description": "Any discount applied to the subscription."}}}}]} +{"messages": [{"role": "system", "createdAt": 1700040000, "content": "You are an appointment scheduling assistant for a car service center."}, {"role": "user", "createdAt": 1700040050, "content": [{"type": "text", "text": "Good morning, I need to book an appointment for my car service. My preferred date is 2025-04-05 at 09:30 AM, but I\u2019m flexible if that slot isn\u2019t available."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700040055, "content": [{"type": "text", "text": "Good morning! 
I\u2019ll check our available slots for that day and get back to you shortly. Please hold on a moment."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700040060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_060", "type": "function", "function": {"name": "check_availability", "arguments": {"service": "car_service", "date": "2025-04-05", "preferred_time": "09:30 AM"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700040065, "tool_call_id": "tool_call_20250310_060", "content": [{"type": "tool_result", "tool_result": "{ \"available_slots\": [\"09:30 AM\", \"10:00 AM\", \"10:30 AM\"] }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700040070, "content": [{"type": "text", "text": "We have several slots available on April 5th: 09:30 AM, 10:00 AM, and 10:30 AM. Which one would you prefer?"}]}, {"role": "user", "createdAt": 1700040080, "content": [{"type": "text", "text": "I\u2019d like the 10:00 AM slot, please."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700040085, "content": [{"type": "text", "text": "Great choice! I am now scheduling your appointment for car service on April 5, 2025 at 10:00 AM."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700040090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_061", "type": "function", "function": {"name": "schedule_appointment", "arguments": {"service": "car_service", "date": "2025-04-05", "time": "10:00 AM", "account_number": "777"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700040095, "tool_call_id": "tool_call_20250310_061", "content": [{"type": "tool_result", "tool_result": "{ \"appointment_id\": \"APT456\", \"status\": \"scheduled\", \"confirmation\": \"Your appointment is confirmed for 10:00 AM on 2025-04-05.\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700040100, "content": [{"type": "text", "text": "Your appointment is confirmed. Your appointment ID is APT456. 
Would you like me to send you a reminder email or SMS?"}]}, {"role": "user", "createdAt": 1700040110, "content": [{"type": "text", "text": "Yes, please send me an email reminder."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700040115, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_062", "type": "function", "function": {"name": "send_reminder", "arguments": {"appointment_id": "APT456", "method": "email", "email": "user@example.com"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700040120, "tool_call_id": "tool_call_20250310_062", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"reminder sent\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700040125, "content": [{"type": "text", "text": "Your email reminder has been sent. Is there anything else I can help you with regarding your appointment?"}]}], "tools": [{"id": "tool_131", "name": "check_availability", "description": "Check available appointment slots for a given service and date.", "parameters": {"type": "object", "properties": {"service": {"type": "string", "description": "The service type."}, "date": {"type": "string", "description": "The desired appointment date."}, "preferred_time": {"type": "string", "description": "The preferred appointment time."}}}}, {"id": "tool_132", "name": "schedule_appointment", "description": "Schedule an appointment for a specified service.", "parameters": {"type": "object", "properties": {"service": {"type": "string", "description": "The type of service."}, "date": {"type": "string", "description": "The date of the appointment."}, "time": {"type": "string", "description": "The time of the appointment."}, "account_number": {"type": "string", "description": "Customer account number."}}}}, {"id": "tool_133", "name": "send_reminder", "description": "Send a reminder for an appointment.", "parameters": {"type": "object", "properties": {"appointment_id": {"type": "string", "description": "The appointment ID."}, 
"method": {"type": "string", "description": "The reminder method (email or SMS)."}, "email": {"type": "string", "description": "Email address if method is email."}}}}]} +{"messages": [{"role": "system", "createdAt": 1700050000, "content": "You are a travel support agent handling flight bookings and modifications."}, {"role": "user", "createdAt": 1700050050, "content": [{"type": "text", "text": "I booked flight AB123 a few days ago. Can you check its status and help me with a few modifications?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700050055, "content": [{"type": "text", "text": "Certainly, let me pull up your flight details for AB123."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700050060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_070", "type": "function", "function": {"name": "get_flight_status", "arguments": {"flight_number": "AB123"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700050065, "tool_call_id": "tool_call_20250310_070", "content": [{"type": "tool_result", "tool_result": "{ \"flight\": \"AB123\", \"status\": \"on time\", \"departure\": \"2025-04-01T14:00:00Z\" }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700050070, "content": [{"type": "text", "text": "Your flight AB123 is currently on time and is scheduled to depart at 2:00 PM UTC. What modifications would you like to make?"}]}, {"role": "user", "createdAt": 1700050080, "content": [{"type": "text", "text": "I want to change my seat to an aisle seat, and also add an extra baggage allowance."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700050085, "content": [{"type": "text", "text": "Okay, I will update your seat preference to an aisle seat and request an extra baggage allowance. 
Please hold on."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700050090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_071", "type": "function", "function": {"name": "update_seat", "arguments": {"flight_number": "AB123", "seat": "aisle"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700050095, "tool_call_id": "tool_call_20250310_071", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"seat updated\", \"seat\": \"aisle\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700050100, "content": [{"type": "text", "text": "Next, I will add the extra baggage allowance to your booking."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700050105, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_072", "type": "function", "function": {"name": "add_baggage", "arguments": {"flight_number": "AB123", "baggage": "extra"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700050110, "tool_call_id": "tool_call_20250310_072", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"baggage updated\", \"allowance\": \"extra\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700050115, "content": [{"type": "text", "text": "Your seat preference and extra baggage allowance have been updated. Is there anything else you\u2019d like to adjust for your flight booking?"}]}, {"role": "user", "createdAt": 1700050120, "content": [{"type": "text", "text": "No, that\u2019s all. Thank you for your help."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700050125, "content": [{"type": "text", "text": "You're welcome! 
Have a safe and pleasant journey."}]}], "tools": [{"id": "tool_140", "name": "get_flight_status", "description": "Retrieve the status of a flight.", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string", "description": "Flight number to check."}}}}, {"id": "tool_141", "name": "update_seat", "description": "Update the seat preference for a flight booking.", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string", "description": "Flight number."}, "seat": {"type": "string", "description": "Desired seat type."}}}}, {"id": "tool_142", "name": "add_baggage", "description": "Add extra baggage allowance to a flight booking.", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string", "description": "Flight number."}, "baggage": {"type": "string", "description": "Type of baggage update."}}}}]} +{"messages": [{"role": "system", "createdAt": 1700060000, "content": "You are a product information specialist with detailed knowledge about electronics."}, {"role": "user", "createdAt": 1700060050, "content": [{"type": "text", "text": "I\u2019m considering buying the new X100 smartphone. Can you give me a detailed breakdown of its features, warranty, and how it compares to the previous model?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700060055, "content": [{"type": "text", "text": "Certainly! 
I will retrieve all the technical specifications, warranty details, and comparison data for the X100 smartphone."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700060060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_080", "type": "function", "function": {"name": "get_product_details", "arguments": {"product_id": "X100"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700060065, "tool_call_id": "tool_call_20250310_080", "content": [{"type": "tool_result", "tool_result": "{ \"product_id\": \"X100\", \"features\": [\"6.5-inch display\", \"108MP camera\", \"5G support\", \"5000mAh battery\", \"AI-enhanced photography\"], \"warranty\": \"2 years\", \"price\": \"$999\" }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700060070, "content": [{"type": "text", "text": "The X100 smartphone features a 6.5-inch display, a 108MP camera, 5G connectivity, and a robust 5000mAh battery. It comes with a 2-year warranty and is priced at $999. Compared to the previous model, the improvements include a higher resolution camera and faster processing speeds. Would you like additional details on any specific feature?"}]}, {"role": "user", "createdAt": 1700060080, "content": [{"type": "text", "text": "Yes, please tell me more about the camera improvements and any available promotions."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700060085, "content": [{"type": "text", "text": "The X100\u2019s 108MP camera uses advanced AI algorithms for better low-light performance and faster image processing. Additionally, there is a promotional offer providing a free wireless charger with purchase until the end of the month. Do you need assistance with placing an order or more technical specs?"}]}, {"role": "user", "createdAt": 1700060090, "content": [{"type": "text", "text": "That sounds great. I\u2019ll think about it. 
Can you email me the full spec sheet?"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700060095, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_082", "type": "function", "function": {"name": "send_product_spec", "arguments": {"product_id": "X100", "email": "user@example.com"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700060100, "tool_call_id": "tool_call_20250310_082", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"spec sheet sent\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700060105, "content": [{"type": "text", "text": "The full specification sheet has been sent to your email. Let me know if you have any further questions."}]}], "tools": [{"id": "tool_134", "name": "get_product_details", "description": "Retrieve detailed information for a given product.", "parameters": {"type": "object", "properties": {"product_id": {"type": "string", "description": "The ID of the product."}}}}, {"id": "tool_135", "name": "send_product_spec", "description": "Send the product specification sheet to the specified email.", "parameters": {"type": "object", "properties": {"product_id": {"type": "string", "description": "The ID of the product."}, "email": {"type": "string", "description": "The email address to send the spec sheet to."}}}}]} +{"messages": [{"role": "system", "createdAt": 1700070000, "content": "You are a billing support agent with a focus on resolving disputes and processing refunds."}, {"role": "user", "createdAt": 1700070050, "content": [{"type": "text", "text": "I\u2019ve noticed several discrepancies on my bill for account #777. There\u2019s a $50 charge for a service fee and a $30 charge I don\u2019t recognize. Can you review my billing details and help me dispute these charges?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700070055, "content": [{"type": "text", "text": "I understand your concern. 
I\u2019ll review your billing details now and identify the charges in question."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700070060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_090", "type": "function", "function": {"name": "get_billing_details", "arguments": {"account_number": "777"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700070065, "tool_call_id": "tool_call_20250310_090", "content": [{"type": "tool_result", "tool_result": "{ \"billing\": { \"charges\": [{ \"item\": \"service fee\", \"amount\": \"$50\" }, { \"item\": \"mystery charge\", \"amount\": \"$30\" }, { \"item\": \"subscription\", \"amount\": \"$15\" }] } }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700070070, "content": [{"type": "text", "text": "I see the $50 service fee and the $30 mystery charge. I can initiate a dispute for these two charges. Would you like me to process refunds for both, or would you like further clarification first?"}]}, {"role": "user", "createdAt": 1700070080, "content": [{"type": "text", "text": "Please process refunds for both charges."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700070085, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_091", "type": "function", "function": {"name": "initiate_refund", "arguments": {"account_number": "777", "charges": ["$50", "$30"]}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700070090, "tool_call_id": "tool_call_20250310_091", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"refunds initiated\", \"refund_ids\": [\"RF555\", \"RF556\"] }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700070095, "content": [{"type": "text", "text": "Refunds have been initiated for the disputed charges. Your refund IDs are RF555 and RF556. Please allow 3-5 business days for processing. 
Is there anything else you need help with regarding your billing?"}]}], "tools": [{"id": "tool_136", "name": "get_billing_details", "description": "Retrieve billing details for a given account number.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "Account number to check billing details for."}}}}, {"id": "tool_137", "name": "initiate_refund", "description": "Initiate refunds for disputed charges.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "Account number associated with the charges."}, "charges": {"type": "array", "description": "List of charge amounts to refund."}}}}]} +{"messages": [{"role": "system", "createdAt": 1700080000, "content": "You are a returns support agent specializing in faulty or damaged product returns."}, {"role": "user", "createdAt": 1700080050, "content": [{"type": "text", "text": "I received my order #555 last week, but the product (Widget Pro) is defective. It keeps malfunctioning and shows error codes. I\u2019d like to return it and get either a replacement or a refund."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700080055, "content": [{"type": "text", "text": "I\u2019m sorry to hear about the issue with your Widget Pro. 
Let me review your order details and check if it\u2019s eligible for return."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700080060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_100", "type": "function", "function": {"name": "get_order", "arguments": {"order_id": "555"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700080065, "tool_call_id": "tool_call_20250310_100", "content": [{"type": "tool_result", "tool_result": "{ \"order\": { \"id\": \"555\", \"status\": \"delivered\", \"item\": \"Widget Pro\", \"defective\": true, \"purchase_date\": \"2025-03-03\" } }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700080070, "content": [{"type": "text", "text": "I have verified that your Widget Pro is defective and eligible for return. Would you prefer a replacement or a refund?"}]}, {"role": "user", "createdAt": 1700080080, "content": [{"type": "text", "text": "I\u2019d like a refund, please. Also, please guide me through the return shipping process."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700080085, "content": [{"type": "text", "text": "Understood. I will initiate the refund process and send you the return shipping label via email."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700080090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_101", "type": "function", "function": {"name": "initiate_return", "arguments": {"order_id": "555", "return_type": "refund"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700080095, "tool_call_id": "tool_call_20250310_101", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"return initiated\", \"return_id\": \"R7890\", \"shipping_label_url\": \"http://example.com/label/R7890\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700080100, "content": [{"type": "text", "text": "Your return has been initiated and your refund will be processed once we receive the product. A shipping label has been sent to your email. 
Please print it and attach it to the return package. Do you need further assistance?"}]}, {"role": "user", "createdAt": 1700080110, "content": [{"type": "text", "text": "No, that covers everything. Thank you for your help."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700080115, "content": [{"type": "text", "text": "You're welcome! If you have any more issues in the future, feel free to reach out."}]}], "tools": [{"id": "tool_138", "name": "get_order", "description": "Retrieve order details for a given order ID.", "parameters": {"type": "object", "properties": {"order_id": {"type": "string", "description": "Order ID to retrieve details for."}}}}, {"id": "tool_139", "name": "initiate_return", "description": "Initiate the return process for an order, specifying the type of return.", "parameters": {"type": "object", "properties": {"order_id": {"type": "string", "description": "Order ID for the return process."}, "return_type": {"type": "string", "description": "Type of return (refund or replacement)."}}}}]} +{"messages": [{"role": "system", "createdAt": 1700090000, "content": "You are an account settings support agent with expertise in updating personal details and security configurations."}, {"role": "user", "createdAt": 1700090050, "content": [{"type": "text", "text": "Hi, I need to update my personal details including my phone number, email, and also change my password. I\u2019d like to ensure that my security questions are also updated."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700090055, "content": [{"type": "text", "text": "Certainly. Let\u2019s start by updating your contact details. 
Could you please provide your new phone number and email address?"}]}, {"role": "user", "createdAt": 1700090060, "content": [{"type": "text", "text": "My new phone number is 123-456-7890 and my new email is newemail@example.com."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700090065, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_110", "type": "function", "function": {"name": "update_contact_details", "arguments": {"account_number": "777", "new_details": {"phone": "123-456-7890", "email": "newemail@example.com"}}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700090070, "tool_call_id": "tool_call_20250310_110", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"contact details updated\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700090075, "content": [{"type": "text", "text": "Your contact details have been updated. Now, please provide your current password and the new password you wish to use."}]}, {"role": "user", "createdAt": 1700090080, "content": [{"type": "text", "text": "My current password is OldPass123 and I\u2019d like to change it to NewSecurePass456."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700090085, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_111", "type": "function", "function": {"name": "change_password", "arguments": {"account_number": "777", "current_password": "OldPass123", "new_password": "NewSecurePass456"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700090090, "tool_call_id": "tool_call_20250310_111", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"password changed\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700090095, "content": [{"type": "text", "text": "Lastly, let\u2019s update your security questions. 
Please provide a new question and answer."}]}, {"role": "user", "createdAt": 1700090100, "content": [{"type": "text", "text": "My new security question is: 'What is the name of your first pet?' and the answer is 'Fluffy'."}]}, {"run_id": 3, "role": "assistant", "createdAt": 1700090105, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_112", "type": "function", "function": {"name": "update_security_question", "arguments": {"account_number": "777", "security_question": "What is the name of your first pet?", "security_answer": "Fluffy"}}}}]}, {"run_id": 3, "role": "tool", "createdAt": 1700090110, "tool_call_id": "tool_call_20250310_112", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"security question updated\" }"}]}, {"run_id": 3, "role": "assistant", "createdAt": 1700090115, "content": [{"type": "text", "text": "Your account settings have been fully updated. If you need any more changes, feel free to ask."}]}, {"role": "user", "createdAt": 1700090120, "content": [{"type": "text", "text": "Thank you very much for your assistance."}]}, {"run_id": 4, "role": "assistant", "createdAt": 1700090125, "content": [{"type": "text", "text": "You're welcome! 
Have a great day."}]}], "tools": [{"id": "tool_140", "name": "update_contact_details", "description": "Update the contact details (phone, email) for a given account.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number."}, "new_details": {"type": "object", "description": "New contact details."}}}}, {"id": "tool_141", "name": "change_password", "description": "Change the password for a given account.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number."}, "current_password": {"type": "string", "description": "The current password."}, "new_password": {"type": "string", "description": "The new password."}}}}, {"id": "tool_142", "name": "update_security_question", "description": "Update the security question and answer for an account.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number."}, "security_question": {"type": "string", "description": "The new security question."}, "security_answer": {"type": "string", "description": "The answer to the security question."}}}}]} \ No newline at end of file diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py new file mode 100644 index 00000000..2f720de6 --- /dev/null +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py @@ -0,0 +1,268 @@ +# pylint: disable=line-too-long,useless-suppression +# ------------------------------------ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT License. +# ------------------------------------ + +import json +import datetime +from typing import Any, Callable, Set, Dict, List, Optional + +# These are the user-defined functions that can be called by the agent. 
+ + +def fetch_current_datetime(format: Optional[str] = None) -> str: + """ + Get the current time as a JSON string, optionally formatted. + + :param format (Optional[str]): The format in which to return the current time. Defaults to None, which uses a standard format. + :return: The current time in JSON format. + :rtype: str + """ + current_time = datetime.datetime.now() + + # Use the provided format if available, else use a default format + if format: + time_format = format + else: + time_format = "%Y-%m-%d %H:%M:%S" + + time_json = json.dumps({"current_time": current_time.strftime(time_format)}) + return time_json + + +def fetch_weather(location: str) -> str: + """ + Fetches the weather information for the specified location. + + :param location (str): The location to fetch weather for. + :return: Weather information as a JSON string. + :rtype: str + """ + # In a real-world scenario, you'd integrate with a weather API. + # Here, we'll mock the response. + mock_weather_data = {"New York": "Sunny, 25°C", "London": "Cloudy, 18°C", "Tokyo": "Rainy, 22°C", "Seattle": "Rainy, 14°C"} + weather = mock_weather_data.get(location, "Weather data not available for this location.") + weather_json = json.dumps({"weather": weather}) + return weather_json + +def opening_hours(tourist_destination: str) -> str: + """ + Fetches the opening hours of a tourist destination in Seattle. + + :param tourist_destination (str): The tourist destination to fetch opening hours for. + :return: Opening hours as a JSON string. + :rtype: str + """ + # In a real-world scenario, you'd integrate with an API or database. + # Here, we'll mock the response. 
+ mock_opening_hours_data = { + "Space Needle": "9 AM - 11 PM", + "Pike Place Market": "9 AM - 6 PM", + "Museum of Pop Culture": "10 AM - 5 PM", + "Seattle Aquarium": "9:30 AM - 6 PM" + } + hours = mock_opening_hours_data.get(tourist_destination, "Opening hours not available for this location.") + opening_hours_json = json.dumps({"opening_hours": hours}) + return opening_hours_json + +def send_email(recipient: str, subject: str, body: str) -> str: + """ + Sends an email with the specified subject and body to the recipient. + + :param recipient (str): Email address of the recipient. + :param subject (str): Subject of the email. + :param body (str): Body content of the email. + :return: Confirmation message as a JSON string. + :rtype: str + """ + # In a real-world scenario, you'd use an SMTP server or an email service API. + # Here, we'll mock the email sending. + print(f"Sending email to {recipient}...") + print(f"Subject: {subject}") + print(f"Body:\n{body}") + + message_json = json.dumps({"message": f"Email successfully sent to {recipient}."}) + return message_json + + +def send_email_using_recipient_name(recipient: str, subject: str, body: str) -> str: + """ + Sends an email with the specified subject and body to the recipient identified by name. + + :param recipient (str): Name of the recipient. + :param subject (str): Subject of the email. + :param body (str): Body content of the email. + :return: Confirmation message as a JSON string. + :rtype: str + """ + # In a real-world scenario, you'd use an SMTP server or an email service API. + # Here, we'll mock the email sending. + print(f"Sending email to {recipient}...") + print(f"Subject: {subject}") + print(f"Body:\n{body}") + + message_json = json.dumps({"message": f"Email successfully sent to {recipient}."}) + return message_json + + +def calculate_sum(a: int, b: int) -> str: + """Calculates the sum of two integers. + + :param a (int): First integer. + :type a: int + :param b (int): Second integer.
+ :type b: int + + :return: The sum of the two integers as a JSON string. + :rtype: str + """ + result = a + b + return json.dumps({"result": result}) + + +def convert_temperature(celsius: float) -> str: + """Converts temperature from Celsius to Fahrenheit. + + :param celsius (float): Temperature in Celsius. + :type celsius: float + + :return: Temperature in Fahrenheit as a JSON string. + :rtype: str + """ + fahrenheit = (celsius * 9 / 5) + 32 + return json.dumps({"fahrenheit": fahrenheit}) + + +def toggle_flag(flag: bool) -> str: + """Toggles a boolean flag. + + :param flag (bool): The flag to toggle. + :type flag: bool + + :return: The toggled flag as a JSON string. + :rtype: str + """ + toggled = not flag + return json.dumps({"toggled_flag": toggled}) + + +def merge_dicts(dict1: Dict[str, Any], dict2: Dict[str, Any]) -> str: + """Merges two dictionaries; values from dict2 override those in dict1. + + :param dict1 (Dict[str, Any]): First dictionary. + :type dict1: dict + :param dict2 (Dict[str, Any]): Second dictionary. + :type dict2: dict + + :return: The merged dictionary as a JSON string. + :rtype: str + """ + merged = dict1.copy() + merged.update(dict2) + return json.dumps({"merged_dict": merged}) + + +def get_user_info(user_id: int) -> str: + """Retrieves user information based on user ID. + + :param user_id (int): ID of the user. + :type user_id: int + + :return: User information as a JSON string. + :rtype: str + """ + mock_users = { + 1: {"name": "Alice", "email": "alice@example.com"}, + 2: {"name": "Bob", "email": "bob@example.com"}, + 3: {"name": "Charlie", "email": "charlie@example.com"}, + } + user_info = mock_users.get(user_id, {"error": "User not found."}) + return json.dumps({"user_info": user_info}) + + +def longest_word_in_sentences(sentences: List[str]) -> str: + """Finds the longest word in each sentence. + + :param sentences (List[str]): A list of sentences. + :return: A JSON string mapping each sentence to its longest word.
+ :rtype: str + """ + if not sentences: + return json.dumps({"error": "The list of sentences is empty"}) + + longest_words = {} + for sentence in sentences: + # Split sentence into words + words = sentence.split() + if words: + # Find the longest word + longest_word = max(words, key=len) + longest_words[sentence] = longest_word + else: + longest_words[sentence] = "" + + return json.dumps({"longest_words": longest_words}) + + +def process_records(records: List[Dict[str, int]]) -> str: + """ + Processes a list of records, where each record is a dictionary with string keys and integer values. + + :param records: A list containing dictionaries that map strings to integers. + :return: A JSON string containing the list of sums of the integer values in each record. + """ + sums = [] + for record in records: + # Sum up all the values in each dictionary and append the result to the sums list + total = sum(record.values()) + sums.append(total) + return json.dumps({"sums": sums}) + + +# Example User Input for Each Function +# 1. Fetch Current DateTime +# User Input: "What is the current date and time?" +# User Input: "What is the current date and time in '%Y-%m-%d %H:%M:%S' format?" + +# 2. Fetch Weather +# User Input: "Can you provide the weather information for New York?" + +# 3. Send Email +# User Input: "Send an email to john.doe@example.com with the subject 'Meeting Reminder' and body 'Don't forget our meeting at 3 PM.'" + +# 4. Calculate Sum +# User Input: "What is the sum of 45 and 55?" + +# 5. Convert Temperature +# User Input: "Convert 25 degrees Celsius to Fahrenheit." + +# 6. Toggle Flag +# User Input: "Toggle the flag True." + +# 7. Merge Dictionaries +# User Input: "Merge these two dictionaries: {'name': 'Alice'} and {'age': 30}." + +# 8. Get User Info +# User Input: "Retrieve user information for user ID 1." + +# 9.
Longest Word in Sentences +# User Input: "Find the longest word in each of these sentences: ['The quick brown fox jumps over the lazy dog', 'Python is an amazing programming language', 'Azure AI capabilities are impressive']." + +# 10. Process Records +# User Input: "Process the following records: [{'a': 10, 'b': 20}, {'x': 5, 'y': 15, 'z': 25}, {'m': 30}]." + +# Statically defined user functions for fast reference +user_functions: Set[Callable[..., Any]] = { + fetch_current_datetime, + fetch_weather, + send_email, + calculate_sum, + convert_temperature, + toggle_flag, + merge_dicts, + get_user_info, + longest_word_in_sentences, + process_records, + opening_hours, +} From f493e631db6e4c45d0555903fd7217f24c89788c Mon Sep 17 00:00:00 2001 From: Chang Liu Date: Fri, 4 Apr 2025 14:03:56 -0400 Subject: [PATCH 02/11] minor update on notebook comment --- .../Evaluate_Azure_AI_Agent_Quality.ipynb | 32 ++++++++++--------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb index adeb4c50..991d11f5 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb @@ -70,7 +70,8 @@ "metadata": {}, "outputs": [], "source": [ - "import os, json\n", + "import os\n", + "import json\n", "from azure.ai.projects import AIProjectClient\n", "from azure.identity import DefaultAzureCredential\n", "from azure.ai.projects.models import FunctionTool, ToolSet\n", @@ -85,7 +86,7 @@ "\n", "AGENT_NAME = \"Seattle Tourist Assistant\"\n", "\n", - "# Add Tools to be used by Agent \n", + "# Add Tools to be used by Agent\n", "functions = FunctionTool(user_functions)\n", "\n", "toolset = ToolSet()\n", @@ -229,7 +230,6 @@ 
"metadata": {}, "outputs": [], "source": [ - "import json\n", "from azure.ai.evaluation import AIAgentConverter\n", "\n", "# Initialize the converter that will be backed by the project.\n", @@ -251,7 +251,7 @@ "# Save the agent thread data to a JSONL file\n", "\n", "file_name = \"agent_thread_data.jsonl\"\n", - "evaluation_data = converter.prepare_evaluation_data(thread_ids=thread.id, filename=file_name)\n" + "evaluation_data = converter.prepare_evaluation_data(thread_ids=thread.id, filename=file_name)" ] }, { @@ -273,7 +273,12 @@ "metadata": {}, "outputs": [], "source": [ - "from azure.ai.evaluation import ToolCallAccuracyEvaluator , AzureOpenAIModelConfiguration, IntentResolutionEvaluator, TaskAdherenceEvaluator, ViolenceEvaluator\n", + "from azure.ai.evaluation import (\n", + " ToolCallAccuracyEvaluator,\n", + " AzureOpenAIModelConfiguration,\n", + " IntentResolutionEvaluator,\n", + " TaskAdherenceEvaluator,\n", + ")\n", "from pprint import pprint\n", "\n", "model_config = AzureOpenAIModelConfiguration(\n", @@ -283,19 +288,17 @@ " azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", ")\n", "# Needed to use content safety evaluators\n", - "azure_ai_project={\n", + "azure_ai_project = {\n", " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", "}\n", "\n", "\n", - "# measures how well an agent identifies a user’s request, including how well it scopes the user’s intent, asks clarifying questions, and reminds end users of its scope of capabilities\n", + "# select evaluators relevant for your agent workflow\n", "intent_resolution = IntentResolutionEvaluator(model_config=model_config)\n", - "\n", "tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)\n", - "\n", - "task_adherence = TaskAdherenceEvaluator(model_config=model_config)\n" + "task_adherence = TaskAdherenceEvaluator(model_config=model_config)" ] }, { 
@@ -304,7 +307,7 @@ "metadata": {}, "outputs": [], "source": [ - "# test on a single agent run \n", + "# test on a single agent run\n", "evaluation_result = intent_resolution(**converted_data)\n", "print(f\"Results: {evaluation_result}\")" ] @@ -335,7 +338,7 @@ " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", - " }\n", + " },\n", ")\n", "pprint(f'AI Foundary URL: {response.get(\"studio_url\")}')" ] @@ -358,7 +361,7 @@ "# alternatively, you can use the following to get the evaluation results in memory\n", "\n", "# average scores across all runs\n", - "pprint(response['metrics'])" + "pprint(response[\"metrics\"])" ] } ], @@ -377,8 +380,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.7" + "pygments_lexer": "ipython3" } }, "nbformat": 4, From de2139f9682f78a90f79a2d875a6f86eafaebf69 Mon Sep 17 00:00:00 2001 From: Chang Liu Date: Fri, 4 Apr 2025 15:46:52 -0400 Subject: [PATCH 03/11] minor update on notebook comment --- .../Supported_Evaluation_Metrics/Agent_Evaluation/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/README.md b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/README.md index 080c92b1..2698a618 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/README.md +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/README.md @@ -5,7 +5,7 @@ languages: products: - ai-services - azure-openai -description: Evaluate agents with Azure AI Evaluation SDK +description: Evaluate AI agents with Azure AI Evaluation SDK --- From dec5d61f25e862aea6c91c781d6564395c2f4797 Mon Sep 17 00:00:00 2001 From: Ankit Singhal Date: Fri, 4 Apr 2025 15:35:35 -0700 Subject: [PATCH 04/11] Updating notebooks for agent 
evaluators --- ...AI_Judge_Evaluator_Intent_Resolution.ipynb | 96 +---------------- .../AI_Judge_Evaluator_Task_Adherence.ipynb | 94 +--------------- ...I_Judge_Evaluator_Tool_Call_Accuracy.ipynb | 102 +----------------- .../Evaluate_Azure_AI_Agent_Quality.ipynb | 37 ++----- .../Agent_Evaluation/evaluation_data.jsonl | 5 + 5 files changed, 29 insertions(+), 305 deletions(-) create mode 100644 scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/evaluation_data.jsonl diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb index 5aaa7aae..87b7413f 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb @@ -332,71 +332,6 @@ "pprint(result)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Evaluating an agent conversation loaded from disk\n", - "\n", - "We will use convert to convert agent messages into required evaluation data. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "from azure.ai.evaluation import AIAgentConverter\n", - "\n", - "def load_conversations(filename):\n", - " with open(filename, 'r') as file:\n", - " lines = file.readlines()\n", - " parsed_conversations = [json.loads(line) for line in lines]\n", - " print(f\"Loaded {len(parsed_conversations)} conversations from {filename}.\")\n", - " return parsed_conversations\n", - "\n", - "conversations_filename = r'agent_messages.jsonl'\n", - "\n", - "sample_conversations = load_conversations(conversations_filename)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Evaluate a single conversation from the loaded conversations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "conversation = sample_conversations[0]\n", - "\n", - "run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", - "print(f\"Run IDs in conversation: {run_ids}\")\n", - "run_id = str(run_ids[0]) # convert run_id to string in case it is some other type, e.g. 
an int\n", - "converted_conv = AIAgentConverter.convert_from_conversation(conversation, run_id)\n", - "# Extract the query and response from the conversation\n", - "query = converted_conv['query']\n", - "response = converted_conv['response']\n", - "tool_definitions = converted_conv['tool_definitions']\n", - "\n", - "print(f\"Run ID: {run_id}\")\n", - "print(f\"Query: {query}\")\n", - "print(f\"Response: {response}\")\n", - "print(f\"Tool Definitions: {tool_definitions}\")\n", - "\n", - "result = intent_resolution_evaluator(query = query, response = response, tool_definitions = tool_definitions)\n", - "print(f\"Evaluation result\")\n", - "pprint(result)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -407,30 +342,6 @@ "Optionally, you can go to AI Foundry URL for rich Azure AI Foundry data visualization. You can inspect the evaluation scores and reasoning to quickly identify bugs and issues of your agent to fix and improve. Make sure to authenticate to Azure using `az login` in your terminal before running this cell.\n" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write the sample_conversations variable into a .jsonl file\n", - "file_name = \"evaluation_data.jsonl\"\n", - "\n", - "with open(file_name, 'w') as file:\n", - " for conversation in sample_conversations:\n", - " run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", - " for run_id in run_ids:\n", - " converted_conv = AIAgentConverter.convert_from_conversation(conversation, str(run_id))\n", - " line = {\n", - " \"query\": converted_conv['query'],\n", - " \"response\": converted_conv['response'],\n", - " \"tool_definitions\": converted_conv['tool_definitions']\n", - " }\n", - " file.write(json.dumps(line) + '\\n')\n", - "\n", - "print(f\"Sample conversations have been written to {file_name}.\")" - ] - }, { "cell_type": "code", "execution_count": null, @@ -439,6 +350,9 @@ "source": [ "from azure.ai.evaluation import 
evaluate\n", "\n", + "# This sample files contains the evaluation data in JSONL format. Where each line is a run from agent.\n", + "# This was saved using agent thread and converter.\n", + "file_name = \"evaluation_data.jsonl\"\n", "\n", "response = evaluate(\n", " data=file_name,\n", @@ -465,7 +379,7 @@ ], "metadata": { "kernelspec": { - "display_name": ".venv", + "display_name": "test_agent_eval_sample", "language": "python", "name": "python3" }, @@ -479,7 +393,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.7" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb index 86c6b31e..e0ab3fcf 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb @@ -247,71 +247,6 @@ "pprint(result)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Evaluating an agent conversation loaded from disk\n", - "\n", - "We will use convert to convert agent messages into required evaluation data. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "from azure.ai.evaluation import AIAgentConverter\n", - "\n", - "def load_conversations(filename):\n", - " with open(filename, 'r') as file:\n", - " lines = file.readlines()\n", - " parsed_conversations = [json.loads(line) for line in lines]\n", - " print(f\"Loaded {len(parsed_conversations)} conversations from {filename}.\")\n", - " return parsed_conversations\n", - "\n", - "conversations_filename = r'agent_messages.jsonl'\n", - "\n", - "sample_conversations = load_conversations(conversations_filename)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Evaluate a single conversation from the loaded conversations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "conversation = sample_conversations[0]\n", - "\n", - "run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", - "print(f\"Run IDs in conversation: {run_ids}\")\n", - "run_id = str(run_ids[0]) # convert run_id to string in case it is some other type, e.g. an int\n", - "converted_conv = AIAgentConverter.convert_from_conversation(conversation, run_id)\n", - "# Extract the query and response from the conversation\n", - "query = converted_conv['query']\n", - "response = converted_conv['response']\n", - "tool_definitions = converted_conv['tool_definitions']\n", - "\n", - "print(f\"Run ID: {run_id}\")\n", - "print(f\"Query: {query}\")\n", - "print(f\"Response: {response}\")\n", - "print(f\"Tool Definitions: {tool_definitions}\")\n", - "\n", - "result = task_adherence_evaluator(query = query, response = response, tool_definitions = tool_definitions)\n", - "print(f\"Evaluation result\")\n", - "pprint(result)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -322,30 +257,6 @@ "Optionally, you can go to AI Foundry URL for rich Azure AI Foundry data visualization. 
You can inspect the evaluation scores and reasoning to quickly identify bugs and issues of your agent to fix and improve. Make sure to authenticate to Azure using `az login` in your terminal before running this cell." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write the sample_conversations variable into a .jsonl file\n", - "file_name = \"evaluation_data.jsonl\"\n", - "\n", - "with open(file_name, 'w') as file:\n", - " for conversation in sample_conversations:\n", - " run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", - " for run_id in run_ids:\n", - " converted_conv = AIAgentConverter.convert_from_conversation(conversation, str(run_id))\n", - " line = {\n", - " \"query\": converted_conv['query'],\n", - " \"response\": converted_conv['response'],\n", - " \"tool_definitions\": converted_conv['tool_definitions']\n", - " }\n", - " file.write(json.dumps(line) + '\\n')\n", - "\n", - "print(f\"Sample conversations have been written to {file_name}.\")" - ] - }, { "cell_type": "code", "execution_count": null, @@ -354,6 +265,9 @@ "source": [ "from azure.ai.evaluation import evaluate\n", "\n", + "# This sample files contains the evaluation data in JSONL format. 
Where each line is a run from agent.\n", + "# This was saved using agent thread and converter.\n", + "file_name = \"evaluation_data.jsonl\"\n", "\n", "response = evaluate(\n", " data=file_name,\n", @@ -373,7 +287,7 @@ ], "metadata": { "kernelspec": { - "display_name": "test_agent_evaluator_prp", + "display_name": "test_agent_eval_sample", "language": "python", "name": "python3" }, diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb index d9e16ab8..5cc3edea 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb @@ -342,71 +342,6 @@ "pprint(response)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Evaluating an agent conversation loaded from disk\n", - "\n", - "We will use convert to convert agent messages into required evaluation data. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "from azure.ai.evaluation import AIAgentConverter\n", - "\n", - "def load_conversations(filename):\n", - " with open(filename, 'r') as file:\n", - " lines = file.readlines()\n", - " parsed_conversations = [json.loads(line) for line in lines]\n", - " print(f\"Loaded {len(parsed_conversations)} conversations from {filename}.\")\n", - " return parsed_conversations\n", - "\n", - "conversations_filename = r'agent_messages.jsonl'\n", - "\n", - "sample_conversations = load_conversations(conversations_filename)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Evaluate a single conversation from the loaded conversations" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "conversation = sample_conversations[0]\n", - "\n", - "run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", - "print(f\"Run IDs in conversation: {run_ids}\")\n", - "run_id = str(run_ids[0]) # convert run_id to string in case it is some other type, e.g. an int\n", - "converted_conv = AIAgentConverter.convert_from_conversation(conversation, run_id)\n", - "# Extract the query and response from the conversation\n", - "query = converted_conv['query']\n", - "response = converted_conv['response']\n", - "tool_definitions = converted_conv['tool_definitions']\n", - "\n", - "print(f\"Run ID: {run_id}\")\n", - "print(f\"Query: {query}\")\n", - "print(f\"Response: {response}\")\n", - "print(f\"Tool Definitions: {tool_definitions}\")\n", - "\n", - "result = tool_call_accuracy(query = query, response = response, tool_definitions = tool_definitions)\n", - "print(f\"Evaluation result\")\n", - "pprint(result)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -417,30 +352,6 @@ "Optionally, you can go to AI Foundry URL for rich Azure AI Foundry data visualization. 
You can inspect the evaluation scores and reasoning to quickly identify bugs and issues of your agent to fix and improve. Make sure to authenticate to Azure using `az login` in your terminal before running this cell.\n" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Write the sample_conversations variable into a .jsonl file\n", - "file_name = \"evaluation_data.jsonl\"\n", - "\n", - "with open(file_name, 'w') as file:\n", - " for conversation in sample_conversations:\n", - " run_ids = AIAgentConverter.run_ids_from_conversation(conversation)\n", - " for run_id in run_ids:\n", - " converted_conv = AIAgentConverter.convert_from_conversation(conversation, str(run_id))\n", - " line = {\n", - " \"query\": converted_conv['query'],\n", - " \"response\": converted_conv['response'],\n", - " \"tool_definitions\": converted_conv['tool_definitions']\n", - " }\n", - " file.write(json.dumps(line) + '\\n')\n", - "\n", - "print(f\"Sample conversations have been written to {file_name}.\")" - ] - }, { "cell_type": "code", "execution_count": null, @@ -448,13 +359,15 @@ "outputs": [], "source": [ "from azure.ai.evaluation import evaluate\n", - "\n", + "# This sample files contains the evaluation data in JSONL format. 
Where each line is a run from agent.\n", + "# This was saved using agent thread and converter.\n", + "file_name = \"evaluation_data.jsonl\"\n", "\n", "response = evaluate(\n", " data=file_name,\n", " evaluation_name=\"Tool Call Accuracy Evaluation\",\n", " evaluators={\n", - " \"intent_resolution\": tool_call_accuracy,\n", + " \"tool_call_accuracy\": tool_call_accuracy,\n", " },\n", " azure_ai_project={\n", " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", @@ -464,16 +377,11 @@ ")\n", "pprint(f'AI Foundary URL: {response.get(\"studio_url\")}')" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] } ], "metadata": { "kernelspec": { - "display_name": "test_agent_evaluator_prp", + "display_name": "test_agent_eval_sample", "language": "python", "name": "python3" }, diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb index adeb4c50..09772b97 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb @@ -157,7 +157,7 @@ "source": [ "# Create message to thread\n", "\n", - "MESSAGE = \"Can you send me an email with weather information for Seattle?\"\n", + "MESSAGE = \"Can you email me weather info for Seattle ?\"\n", "\n", "message = project_client.agents.create_message(\n", " thread_id=thread.id,\n", @@ -237,21 +237,15 @@ "\n", "thread_id = thread.id\n", "run_id = run.id\n", + "file_name = \"evaluation_data.jsonl\"\n", "\n", - "converted_data = converter.convert(thread_id, run_id)\n", - "print(json.dumps(converted_data, indent=4))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Save the agent thread data to a JSONL file\n", + "# Get a single 
agent run data\n", + "evaluation_data_single_run = converter.convert(thread_id=thread_id, run_id=run_id)\n", "\n", - "file_name = \"agent_thread_data.jsonl\"\n", - "evaluation_data = converter.prepare_evaluation_data(thread_ids=thread.id, filename=file_name)\n" + "# Run this to save thread data to a JSONL file for evaluation\n", + "# Save the agent thread data to a JSONL file\n", + "# evaluation_data = converter.prepare_evaluation_data(thread_ids=thread_id, filename=<>)\n", + "# print(json.dumps(evaluation_data, indent=4))" ] }, { @@ -298,17 +292,6 @@ "task_adherence = TaskAdherenceEvaluator(model_config=model_config)\n" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# test on a single agent run \n", - "evaluation_result = intent_resolution(**converted_data)\n", - "print(f\"Results: {evaluation_result}\")" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -364,7 +347,7 @@ ], "metadata": { "kernelspec": { - "display_name": ".venv", + "display_name": "test_agent_eval_sample", "language": "python", "name": "python3" }, @@ -378,7 +361,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.7" + "version": "3.12.9" } }, "nbformat": 4, diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/evaluation_data.jsonl b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/evaluation_data.jsonl new file mode 100644 index 00000000..ecccf1e3 --- /dev/null +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/evaluation_data.jsonl @@ -0,0 +1,5 @@ +{"query": [{"role": "system", "content": "You are a helpful assistant"}, {"createdAt": "2025-04-04T20:48:01Z", "role": "user", "content": [{"type": "text", "text": "How is the weather in London ?"}]}], "response": [{"createdAt": "2025-04-04T20:48:07Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": 
"call_a2XxwMTijdIclDEdZMjEaePv", "name": "fetch_weather", "arguments": {"location": "London"}}]}, {"createdAt": "2025-04-04T20:48:08Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "tool_call_id": "call_a2XxwMTijdIclDEdZMjEaePv", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"weather": "Cloudy, 18\u00b0C"}}]}, {"createdAt": "2025-04-04T20:48:09Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "role": "assistant", "content": [{"type": "text", "text": "The weather in London is currently cloudy with a temperature of 18\u00b0C."}]}], "tool_definitions": [{"name": "convert_temperature", "description": "Converts temperature from Celsius to Fahrenheit.", "parameters": {"type": "object", "properties": {"celsius": {"type": "number", "description": "Temperature in Celsius."}}}}, {"name": "send_email", "description": "Sends an email with the specified subject and body to the recipient.", "parameters": {"type": "object", "properties": {"recipient": {"type": "string", "description": "Email address of the recipient."}, "subject": {"type": "string", "description": "Subject of the email."}, "body": {"type": "string", "description": "Body content of the email."}}}}, {"name": "get_user_info", "description": "Retrieves user information based on user ID.", "parameters": {"type": "object", "properties": {"user_id": {"type": "integer", "description": "ID of the user."}}}}, {"name": "opening_hours", "description": "Fetches the opening hours of a tourist destination in Seattle.", "parameters": {"type": "object", "properties": {"tourist_destination": {"type": "string", "description": "The tourist destination to fetch opening hours for."}}}}, {"name": "toggle_flag", "description": "Toggles a boolean flag.", "parameters": {"type": "object", "properties": {"flag": {"type": "boolean", "description": "The flag to toggle."}}}}, {"name": "longest_word_in_sentences", "description": "Finds the longest word in each sentence.", "parameters": {"type": "object", "properties": 
{"sentences": {"type": "array", "items": {"type": "string"}, "description": "A list of sentences."}}}}, {"name": "merge_dicts", "description": "Merges two dictionaries.", "parameters": {"type": "object", "properties": {"dict1": {"type": "object", "description": "First dictionary."}, "dict2": {"type": "object", "description": "Second dictionary."}}}}, {"name": "calculate_sum", "description": "Calculates the sum of two integers.", "parameters": {"type": "object", "properties": {"a": {"type": "integer", "description": "First integer."}, "b": {"type": "integer", "description": "Second integer."}}}}, {"name": "process_records", "description": "Process a list of records, where each record is a dictionary with string keys and integer values.", "parameters": {"type": "object", "properties": {"records": {"type": "array", "items": {"type": "object"}, "description": "A list containing dictionaries that map strings to integers."}}}}, {"name": "fetch_current_datetime", "description": "Get the current time as a JSON string, optionally formatted.", "parameters": {"type": "object", "properties": {"format": {"type": ["string", "null"], "description": "The format in which to return the current time. 
Defaults to None, which uses a standard format."}}}}, {"name": "fetch_weather", "description": "Fetches the weather information for the specified location.", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}}}}]} +{"query": [{"role": "system", "content": "You are a helpful assistant"}, {"createdAt": "2025-04-04T20:48:01Z", "role": "user", "content": [{"type": "text", "text": "How is the weather in London ?"}]}, {"createdAt": "2025-04-04T20:48:07Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_a2XxwMTijdIclDEdZMjEaePv", "name": "fetch_weather", "arguments": {"location": "London"}}]}, {"createdAt": "2025-04-04T20:48:08Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "tool_call_id": "call_a2XxwMTijdIclDEdZMjEaePv", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"weather": "Cloudy, 18\u00b0C"}}]}, {"createdAt": "2025-04-04T20:48:09Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "role": "assistant", "content": [{"type": "text", "text": "The weather in London is currently cloudy with a temperature of 18\u00b0C."}]}, {"createdAt": "2025-04-04T20:48:31Z", "role": "user", "content": [{"type": "text", "text": "How about Seattle ?"}]}], "response": [{"createdAt": "2025-04-04T20:48:33Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_aS0HwOu4vYRFvlEqcrKLmSIN", "name": "fetch_weather", "arguments": {"location": "Seattle"}}]}, {"createdAt": "2025-04-04T20:48:34Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "tool_call_id": "call_aS0HwOu4vYRFvlEqcrKLmSIN", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"weather": "Rainy, 14\u00b0C"}}]}, {"createdAt": "2025-04-04T20:48:36Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "role": "assistant", "content": [{"type": "text", "text": "The weather in Seattle is currently rainy with a 
temperature of 14\u00b0C."}]}], "tool_definitions": [{"name": "convert_temperature", "description": "Converts temperature from Celsius to Fahrenheit.", "parameters": {"type": "object", "properties": {"celsius": {"type": "number", "description": "Temperature in Celsius."}}}}, {"name": "send_email", "description": "Sends an email with the specified subject and body to the recipient.", "parameters": {"type": "object", "properties": {"recipient": {"type": "string", "description": "Email address of the recipient."}, "subject": {"type": "string", "description": "Subject of the email."}, "body": {"type": "string", "description": "Body content of the email."}}}}, {"name": "get_user_info", "description": "Retrieves user information based on user ID.", "parameters": {"type": "object", "properties": {"user_id": {"type": "integer", "description": "ID of the user."}}}}, {"name": "opening_hours", "description": "Fetches the opening hours of a tourist destination in Seattle.", "parameters": {"type": "object", "properties": {"tourist_destination": {"type": "string", "description": "The tourist destination to fetch opening hours for."}}}}, {"name": "toggle_flag", "description": "Toggles a boolean flag.", "parameters": {"type": "object", "properties": {"flag": {"type": "boolean", "description": "The flag to toggle."}}}}, {"name": "longest_word_in_sentences", "description": "Finds the longest word in each sentence.", "parameters": {"type": "object", "properties": {"sentences": {"type": "array", "items": {"type": "string"}, "description": "A list of sentences."}}}}, {"name": "merge_dicts", "description": "Merges two dictionaries.", "parameters": {"type": "object", "properties": {"dict1": {"type": "object", "description": "First dictionary."}, "dict2": {"type": "object", "description": "Second dictionary."}}}}, {"name": "calculate_sum", "description": "Calculates the sum of two integers.", "parameters": {"type": "object", "properties": {"a": {"type": "integer", "description": "First 
integer."}, "b": {"type": "integer", "description": "Second integer."}}}}, {"name": "process_records", "description": "Process a list of records, where each record is a dictionary with string keys and integer values.", "parameters": {"type": "object", "properties": {"records": {"type": "array", "items": {"type": "object"}, "description": "A list containing dictionaries that map strings to integers."}}}}, {"name": "fetch_current_datetime", "description": "Get the current time as a JSON string, optionally formatted.", "parameters": {"type": "object", "properties": {"format": {"type": ["string", "null"], "description": "The format in which to return the current time. Defaults to None, which uses a standard format."}}}}, {"name": "fetch_weather", "description": "Fetches the weather information for the specified location.", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}}}}]} +{"query": [{"role": "system", "content": "You are a helpful assistant"}, {"createdAt": "2025-04-04T20:48:01Z", "role": "user", "content": [{"type": "text", "text": "How is the weather in London ?"}]}, {"createdAt": "2025-04-04T20:48:07Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_a2XxwMTijdIclDEdZMjEaePv", "name": "fetch_weather", "arguments": {"location": "London"}}]}, {"createdAt": "2025-04-04T20:48:08Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "tool_call_id": "call_a2XxwMTijdIclDEdZMjEaePv", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"weather": "Cloudy, 18\u00b0C"}}]}, {"createdAt": "2025-04-04T20:48:09Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "role": "assistant", "content": [{"type": "text", "text": "The weather in London is currently cloudy with a temperature of 18\u00b0C."}]}, {"createdAt": "2025-04-04T20:48:31Z", "role": "user", "content": [{"type": "text", "text": "How about Seattle ?"}]}, 
{"createdAt": "2025-04-04T20:48:33Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_aS0HwOu4vYRFvlEqcrKLmSIN", "name": "fetch_weather", "arguments": {"location": "Seattle"}}]}, {"createdAt": "2025-04-04T20:48:34Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "tool_call_id": "call_aS0HwOu4vYRFvlEqcrKLmSIN", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"weather": "Rainy, 14\u00b0C"}}]}, {"createdAt": "2025-04-04T20:48:36Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "role": "assistant", "content": [{"type": "text", "text": "The weather in Seattle is currently rainy with a temperature of 14\u00b0C."}]}, {"createdAt": "2025-04-04T20:49:25Z", "role": "user", "content": [{"type": "text", "text": "How are popular places to visit in Seattle ?"}]}, {"createdAt": "2025-04-04T20:50:00Z", "run_id": "run_HNZQylSFiNSNTC4rDwDDbZOb", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_o5oYob00xO5OFWiNEOyax8Xh", "name": "opening_hours", "arguments": {"tourist_destination": "Space Needle"}}]}, {"createdAt": "2025-04-04T20:50:02Z", "run_id": "run_HNZQylSFiNSNTC4rDwDDbZOb", "tool_call_id": "call_o5oYob00xO5OFWiNEOyax8Xh", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"opening_hours": "9 AM - 11 PM"}}]}, {"createdAt": "2025-04-04T20:50:30Z", "run_id": "run_xgvcILwpONBHBXRXHfayWpII", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_DnaETwKobebPZnVttxZzQnAM", "name": "send_email", "arguments": {"recipient": "your_email@example.com", "subject": "Seattle and London Weather & Attractions Info", "body": "Weather in:\n\nLondon: Cloudy, 18\u00b0C\nSeattle: Rainy, 14\u00b0C\n\nPopular Places to Visit in Seattle:\n1. Space Needle\n2. Pike Place Market\n3. Chihuly Garden and Glass\n4. Seattle Aquarium\n5. Museum of Pop Culture\n6. Seattle Art Museum\n7. Kerry Park\n8. 
Seattle Great Wheel\n\nOpening Hours:\n- Space Needle: 9 AM - 11 PM"}}]}, {"createdAt": "2025-04-04T20:50:33Z", "run_id": "run_xgvcILwpONBHBXRXHfayWpII", "tool_call_id": "call_DnaETwKobebPZnVttxZzQnAM", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"message": "Email successfully sent to your_email@example.com."}}]}], "response": [{"createdAt": "2025-04-04T20:49:28Z", "run_id": "run_Hf1BY0vcUXPM72lYpOCgjm8s", "role": "assistant", "content": [{"type": "text", "text": "Seattle is known for various popular tourist attractions, including:\n\n1. **Space Needle**: An iconic observation tower that offers panoramic views of the city and surrounding mountains.\n2. **Pike Place Market**: A bustling market famous for its fresh produce, gourmet foods, and unique shops, as well as the original Starbucks.\n3. **Chihuly Garden and Glass**: An exhibition showcasing the stunning glass works of artist Dale Chihuly, including a beautiful indoor and outdoor garden.\n4. **Seattle Aquarium**: A family-friendly attraction that features marine life native to the Pacific Northwest.\n5. **Museum of Pop Culture**: A museum dedicated to contemporary popular culture, featuring exhibits on music, science fiction, and more.\n6. **Seattle Art Museum**: A museum that houses an extensive collection of art from various cultures and time periods.\n7. **Kerry Park**: A small park with a renowned viewpoint of the Seattle skyline.\n8. 
**Seattle Great Wheel**: A large Ferris wheel located on Pier 57 that offers great views of Elliott Bay and the city.\n\nIf you need information about the opening hours for any specific attractions, let me know!"}]}], "tool_definitions": [{"name": "convert_temperature", "description": "Converts temperature from Celsius to Fahrenheit.", "parameters": {"type": "object", "properties": {"celsius": {"type": "number", "description": "Temperature in Celsius."}}}}, {"name": "send_email", "description": "Sends an email with the specified subject and body to the recipient.", "parameters": {"type": "object", "properties": {"recipient": {"type": "string", "description": "Email address of the recipient."}, "subject": {"type": "string", "description": "Subject of the email."}, "body": {"type": "string", "description": "Body content of the email."}}}}, {"name": "get_user_info", "description": "Retrieves user information based on user ID.", "parameters": {"type": "object", "properties": {"user_id": {"type": "integer", "description": "ID of the user."}}}}, {"name": "opening_hours", "description": "Fetches the opening hours of a tourist destination in Seattle.", "parameters": {"type": "object", "properties": {"tourist_destination": {"type": "string", "description": "The tourist destination to fetch opening hours for."}}}}, {"name": "toggle_flag", "description": "Toggles a boolean flag.", "parameters": {"type": "object", "properties": {"flag": {"type": "boolean", "description": "The flag to toggle."}}}}, {"name": "longest_word_in_sentences", "description": "Finds the longest word in each sentence.", "parameters": {"type": "object", "properties": {"sentences": {"type": "array", "items": {"type": "string"}, "description": "A list of sentences."}}}}, {"name": "merge_dicts", "description": "Merges two dictionaries.", "parameters": {"type": "object", "properties": {"dict1": {"type": "object", "description": "First dictionary."}, "dict2": {"type": "object", "description": "Second 
dictionary."}}}}, {"name": "calculate_sum", "description": "Calculates the sum of two integers.", "parameters": {"type": "object", "properties": {"a": {"type": "integer", "description": "First integer."}, "b": {"type": "integer", "description": "Second integer."}}}}, {"name": "process_records", "description": "Process a list of records, where each record is a dictionary with string keys and integer values.", "parameters": {"type": "object", "properties": {"records": {"type": "array", "items": {"type": "object"}, "description": "A list containing dictionaries that map strings to integers."}}}}, {"name": "fetch_current_datetime", "description": "Get the current time as a JSON string, optionally formatted.", "parameters": {"type": "object", "properties": {"format": {"type": ["string", "null"], "description": "The format in which to return the current time. Defaults to None, which uses a standard format."}}}}, {"name": "fetch_weather", "description": "Fetches the weather information for the specified location.", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}}}}]} +{"query": [{"role": "system", "content": "You are a helpful assistant"}, {"createdAt": "2025-04-04T20:48:01Z", "role": "user", "content": [{"type": "text", "text": "How is the weather in London ?"}]}, {"createdAt": "2025-04-04T20:48:07Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_a2XxwMTijdIclDEdZMjEaePv", "name": "fetch_weather", "arguments": {"location": "London"}}]}, {"createdAt": "2025-04-04T20:48:08Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "tool_call_id": "call_a2XxwMTijdIclDEdZMjEaePv", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"weather": "Cloudy, 18\u00b0C"}}]}, {"createdAt": "2025-04-04T20:48:09Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "role": "assistant", "content": [{"type": "text", "text": "The weather in 
London is currently cloudy with a temperature of 18\u00b0C."}]}, {"createdAt": "2025-04-04T20:48:31Z", "role": "user", "content": [{"type": "text", "text": "How about Seattle ?"}]}, {"createdAt": "2025-04-04T20:48:33Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_aS0HwOu4vYRFvlEqcrKLmSIN", "name": "fetch_weather", "arguments": {"location": "Seattle"}}]}, {"createdAt": "2025-04-04T20:48:34Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "tool_call_id": "call_aS0HwOu4vYRFvlEqcrKLmSIN", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"weather": "Rainy, 14\u00b0C"}}]}, {"createdAt": "2025-04-04T20:48:36Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "role": "assistant", "content": [{"type": "text", "text": "The weather in Seattle is currently rainy with a temperature of 14\u00b0C."}]}, {"createdAt": "2025-04-04T20:49:25Z", "role": "user", "content": [{"type": "text", "text": "How are popular places to visit in Seattle ?"}]}, {"createdAt": "2025-04-04T20:49:28Z", "run_id": "run_Hf1BY0vcUXPM72lYpOCgjm8s", "role": "assistant", "content": [{"type": "text", "text": "Seattle is known for various popular tourist attractions, including:\n\n1. **Space Needle**: An iconic observation tower that offers panoramic views of the city and surrounding mountains.\n2. **Pike Place Market**: A bustling market famous for its fresh produce, gourmet foods, and unique shops, as well as the original Starbucks.\n3. **Chihuly Garden and Glass**: An exhibition showcasing the stunning glass works of artist Dale Chihuly, including a beautiful indoor and outdoor garden.\n4. **Seattle Aquarium**: A family-friendly attraction that features marine life native to the Pacific Northwest.\n5. **Museum of Pop Culture**: A museum dedicated to contemporary popular culture, featuring exhibits on music, science fiction, and more.\n6. 
**Seattle Art Museum**: A museum that houses an extensive collection of art from various cultures and time periods.\n7. **Kerry Park**: A small park with a renowned viewpoint of the Seattle skyline.\n8. **Seattle Great Wheel**: A large Ferris wheel located on Pier 57 that offers great views of Elliott Bay and the city.\n\nIf you need information about the opening hours for any specific attractions, let me know!"}]}, {"createdAt": "2025-04-04T20:49:58Z", "role": "user", "content": [{"type": "text", "text": "What are the opening hours of Space Needle ?"}]}], "response": [{"createdAt": "2025-04-04T20:50:00Z", "run_id": "run_HNZQylSFiNSNTC4rDwDDbZOb", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_o5oYob00xO5OFWiNEOyax8Xh", "name": "opening_hours", "arguments": {"tourist_destination": "Space Needle"}}]}, {"createdAt": "2025-04-04T20:50:02Z", "run_id": "run_HNZQylSFiNSNTC4rDwDDbZOb", "tool_call_id": "call_o5oYob00xO5OFWiNEOyax8Xh", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"opening_hours": "9 AM - 11 PM"}}]}, {"createdAt": "2025-04-04T20:50:03Z", "run_id": "run_HNZQylSFiNSNTC4rDwDDbZOb", "role": "assistant", "content": [{"type": "text", "text": "The Space Needle is open from 9 AM to 11 PM."}]}], "tool_definitions": [{"name": "convert_temperature", "description": "Converts temperature from Celsius to Fahrenheit.", "parameters": {"type": "object", "properties": {"celsius": {"type": "number", "description": "Temperature in Celsius."}}}}, {"name": "send_email", "description": "Sends an email with the specified subject and body to the recipient.", "parameters": {"type": "object", "properties": {"recipient": {"type": "string", "description": "Email address of the recipient."}, "subject": {"type": "string", "description": "Subject of the email."}, "body": {"type": "string", "description": "Body content of the email."}}}}, {"name": "get_user_info", "description": "Retrieves user information based on user ID.", 
"parameters": {"type": "object", "properties": {"user_id": {"type": "integer", "description": "ID of the user."}}}}, {"name": "opening_hours", "description": "Fetches the opening hours of a tourist destination in Seattle.", "parameters": {"type": "object", "properties": {"tourist_destination": {"type": "string", "description": "The tourist destination to fetch opening hours for."}}}}, {"name": "toggle_flag", "description": "Toggles a boolean flag.", "parameters": {"type": "object", "properties": {"flag": {"type": "boolean", "description": "The flag to toggle."}}}}, {"name": "longest_word_in_sentences", "description": "Finds the longest word in each sentence.", "parameters": {"type": "object", "properties": {"sentences": {"type": "array", "items": {"type": "string"}, "description": "A list of sentences."}}}}, {"name": "merge_dicts", "description": "Merges two dictionaries.", "parameters": {"type": "object", "properties": {"dict1": {"type": "object", "description": "First dictionary."}, "dict2": {"type": "object", "description": "Second dictionary."}}}}, {"name": "calculate_sum", "description": "Calculates the sum of two integers.", "parameters": {"type": "object", "properties": {"a": {"type": "integer", "description": "First integer."}, "b": {"type": "integer", "description": "Second integer."}}}}, {"name": "process_records", "description": "Process a list of records, where each record is a dictionary with string keys and integer values.", "parameters": {"type": "object", "properties": {"records": {"type": "array", "items": {"type": "object"}, "description": "A list containing dictionaries that map strings to integers."}}}}, {"name": "fetch_current_datetime", "description": "Get the current time as a JSON string, optionally formatted.", "parameters": {"type": "object", "properties": {"format": {"type": ["string", "null"], "description": "The format in which to return the current time. 
Defaults to None, which uses a standard format."}}}}, {"name": "fetch_weather", "description": "Fetches the weather information for the specified location.", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}}}}]} +{"query": [{"role": "system", "content": "You are a helpful assistant"}, {"createdAt": "2025-04-04T20:48:01Z", "role": "user", "content": [{"type": "text", "text": "How is the weather in London ?"}]}, {"createdAt": "2025-04-04T20:48:07Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_a2XxwMTijdIclDEdZMjEaePv", "name": "fetch_weather", "arguments": {"location": "London"}}]}, {"createdAt": "2025-04-04T20:48:08Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "tool_call_id": "call_a2XxwMTijdIclDEdZMjEaePv", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"weather": "Cloudy, 18\u00b0C"}}]}, {"createdAt": "2025-04-04T20:48:09Z", "run_id": "run_l32Z3SXA5g2bxrs6IXbnRolz", "role": "assistant", "content": [{"type": "text", "text": "The weather in London is currently cloudy with a temperature of 18\u00b0C."}]}, {"createdAt": "2025-04-04T20:48:31Z", "role": "user", "content": [{"type": "text", "text": "How about Seattle ?"}]}, {"createdAt": "2025-04-04T20:48:33Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_aS0HwOu4vYRFvlEqcrKLmSIN", "name": "fetch_weather", "arguments": {"location": "Seattle"}}]}, {"createdAt": "2025-04-04T20:48:34Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "tool_call_id": "call_aS0HwOu4vYRFvlEqcrKLmSIN", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"weather": "Rainy, 14\u00b0C"}}]}, {"createdAt": "2025-04-04T20:48:36Z", "run_id": "run_r889lwQqezJlcSkYhD3thurj", "role": "assistant", "content": [{"type": "text", "text": "The weather in Seattle is currently rainy with a temperature of 
14\u00b0C."}]}, {"createdAt": "2025-04-04T20:49:25Z", "role": "user", "content": [{"type": "text", "text": "How are popular places to visit in Seattle ?"}]}, {"createdAt": "2025-04-04T20:49:28Z", "run_id": "run_Hf1BY0vcUXPM72lYpOCgjm8s", "role": "assistant", "content": [{"type": "text", "text": "Seattle is known for various popular tourist attractions, including:\n\n1. **Space Needle**: An iconic observation tower that offers panoramic views of the city and surrounding mountains.\n2. **Pike Place Market**: A bustling market famous for its fresh produce, gourmet foods, and unique shops, as well as the original Starbucks.\n3. **Chihuly Garden and Glass**: An exhibition showcasing the stunning glass works of artist Dale Chihuly, including a beautiful indoor and outdoor garden.\n4. **Seattle Aquarium**: A family-friendly attraction that features marine life native to the Pacific Northwest.\n5. **Museum of Pop Culture**: A museum dedicated to contemporary popular culture, featuring exhibits on music, science fiction, and more.\n6. **Seattle Art Museum**: A museum that houses an extensive collection of art from various cultures and time periods.\n7. **Kerry Park**: A small park with a renowned viewpoint of the Seattle skyline.\n8. 
**Seattle Great Wheel**: A large Ferris wheel located on Pier 57 that offers great views of Elliott Bay and the city.\n\nIf you need information about the opening hours for any specific attractions, let me know!"}]}, {"createdAt": "2025-04-04T20:49:58Z", "role": "user", "content": [{"type": "text", "text": "What are the opening hours of Space Needle ?"}]}, {"createdAt": "2025-04-04T20:50:00Z", "run_id": "run_HNZQylSFiNSNTC4rDwDDbZOb", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_o5oYob00xO5OFWiNEOyax8Xh", "name": "opening_hours", "arguments": {"tourist_destination": "Space Needle"}}]}, {"createdAt": "2025-04-04T20:50:02Z", "run_id": "run_HNZQylSFiNSNTC4rDwDDbZOb", "tool_call_id": "call_o5oYob00xO5OFWiNEOyax8Xh", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"opening_hours": "9 AM - 11 PM"}}]}, {"createdAt": "2025-04-04T20:50:03Z", "run_id": "run_HNZQylSFiNSNTC4rDwDDbZOb", "role": "assistant", "content": [{"type": "text", "text": "The Space Needle is open from 9 AM to 11 PM."}]}, {"createdAt": "2025-04-04T20:50:27Z", "role": "user", "content": [{"type": "text", "text": "Can you email me all this info please ?"}]}], "response": [{"createdAt": "2025-04-04T20:50:30Z", "run_id": "run_xgvcILwpONBHBXRXHfayWpII", "role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_DnaETwKobebPZnVttxZzQnAM", "name": "send_email", "arguments": {"recipient": "your_email@example.com", "subject": "Seattle and London Weather & Attractions Info", "body": "Weather in:\n\nLondon: Cloudy, 18\u00b0C\nSeattle: Rainy, 14\u00b0C\n\nPopular Places to Visit in Seattle:\n1. Space Needle\n2. Pike Place Market\n3. Chihuly Garden and Glass\n4. Seattle Aquarium\n5. Museum of Pop Culture\n6. Seattle Art Museum\n7. Kerry Park\n8. 
Seattle Great Wheel\n\nOpening Hours:\n- Space Needle: 9 AM - 11 PM"}}]}, {"createdAt": "2025-04-04T20:50:33Z", "run_id": "run_xgvcILwpONBHBXRXHfayWpII", "tool_call_id": "call_DnaETwKobebPZnVttxZzQnAM", "role": "tool", "content": [{"type": "tool_result", "tool_result": {"message": "Email successfully sent to your_email@example.com."}}]}, {"createdAt": "2025-04-04T20:50:34Z", "run_id": "run_xgvcILwpONBHBXRXHfayWpII", "role": "assistant", "content": [{"type": "text", "text": "I have successfully emailed you the information about the weather in London and Seattle, popular places to visit in Seattle, and the opening hours of the Space Needle. If you need anything else, feel free to ask!"}]}], "tool_definitions": [{"name": "convert_temperature", "description": "Converts temperature from Celsius to Fahrenheit.", "parameters": {"type": "object", "properties": {"celsius": {"type": "number", "description": "Temperature in Celsius."}}}}, {"name": "send_email", "description": "Sends an email with the specified subject and body to the recipient.", "parameters": {"type": "object", "properties": {"recipient": {"type": "string", "description": "Email address of the recipient."}, "subject": {"type": "string", "description": "Subject of the email."}, "body": {"type": "string", "description": "Body content of the email."}}}}, {"name": "get_user_info", "description": "Retrieves user information based on user ID.", "parameters": {"type": "object", "properties": {"user_id": {"type": "integer", "description": "ID of the user."}}}}, {"name": "opening_hours", "description": "Fetches the opening hours of a tourist destination in Seattle.", "parameters": {"type": "object", "properties": {"tourist_destination": {"type": "string", "description": "The tourist destination to fetch opening hours for."}}}}, {"name": "toggle_flag", "description": "Toggles a boolean flag.", "parameters": {"type": "object", "properties": {"flag": {"type": "boolean", "description": "The flag to toggle."}}}}, {"name": 
"longest_word_in_sentences", "description": "Finds the longest word in each sentence.", "parameters": {"type": "object", "properties": {"sentences": {"type": "array", "items": {"type": "string"}, "description": "A list of sentences."}}}}, {"name": "merge_dicts", "description": "Merges two dictionaries.", "parameters": {"type": "object", "properties": {"dict1": {"type": "object", "description": "First dictionary."}, "dict2": {"type": "object", "description": "Second dictionary."}}}}, {"name": "calculate_sum", "description": "Calculates the sum of two integers.", "parameters": {"type": "object", "properties": {"a": {"type": "integer", "description": "First integer."}, "b": {"type": "integer", "description": "Second integer."}}}}, {"name": "process_records", "description": "Process a list of records, where each record is a dictionary with string keys and integer values.", "parameters": {"type": "object", "properties": {"records": {"type": "array", "items": {"type": "object"}, "description": "A list containing dictionaries that map strings to integers."}}}}, {"name": "fetch_current_datetime", "description": "Get the current time as a JSON string, optionally formatted.", "parameters": {"type": "object", "properties": {"format": {"type": ["string", "null"], "description": "The format in which to return the current time. 
Defaults to None, which uses a standard format."}}}}, {"name": "fetch_weather", "description": "Fetches the weather information for the specified location.", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}}}}]} From fda57c849125c944592f94172259d58fc95b6f93 Mon Sep 17 00:00:00 2001 From: Ankit Singhal Date: Fri, 4 Apr 2025 16:22:04 -0700 Subject: [PATCH 05/11] Fixing agent overall notebook --- .../Evaluate_Azure_AI_Agent_Quality.ipynb | 27 +++++++++---------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb index 580232b4..09772b97 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb @@ -70,8 +70,7 @@ "metadata": {}, "outputs": [], "source": [ - "import os\n", - "import json\n", + "import os, json\n", "from azure.ai.projects import AIProjectClient\n", "from azure.identity import DefaultAzureCredential\n", "from azure.ai.projects.models import FunctionTool, ToolSet\n", @@ -86,7 +85,7 @@ "\n", "AGENT_NAME = \"Seattle Tourist Assistant\"\n", "\n", - "# Add Tools to be used by Agent\n", + "# Add Tools to be used by Agent \n", "functions = FunctionTool(user_functions)\n", "\n", "toolset = ToolSet()\n", @@ -230,6 +229,7 @@ "metadata": {}, "outputs": [], "source": [ + "import json\n", "from azure.ai.evaluation import AIAgentConverter\n", "\n", "# Initialize the converter that will be backed by the project.\n", @@ -267,12 +267,7 @@ "metadata": {}, "outputs": [], "source": [ - "from azure.ai.evaluation import (\n", - " ToolCallAccuracyEvaluator,\n", - " AzureOpenAIModelConfiguration,\n", - " 
IntentResolutionEvaluator,\n", - " TaskAdherenceEvaluator,\n", - ")\n", + "from azure.ai.evaluation import ToolCallAccuracyEvaluator , AzureOpenAIModelConfiguration, IntentResolutionEvaluator, TaskAdherenceEvaluator, ViolenceEvaluator\n", "from pprint import pprint\n", "\n", "model_config = AzureOpenAIModelConfiguration(\n", @@ -282,17 +277,19 @@ " azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", ")\n", "# Needed to use content safety evaluators\n", - "azure_ai_project = {\n", + "azure_ai_project={\n", " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", "}\n", "\n", "\n", - "# select evaluators relevant for your agent workflow\n", + "# measures how well an agent identifies a user’s request, including how well it scopes the user’s intent, asks clarifying questions, and reminds end users of its scope of capabilities\n", "intent_resolution = IntentResolutionEvaluator(model_config=model_config)\n", + "\n", "tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)\n", - "task_adherence = TaskAdherenceEvaluator(model_config=model_config)" + "\n", + "task_adherence = TaskAdherenceEvaluator(model_config=model_config)\n" ] }, { @@ -321,7 +318,7 @@ " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", - " },\n", + " }\n", ")\n", "pprint(f'AI Foundary URL: {response.get(\"studio_url\")}')" ] @@ -344,7 +341,7 @@ "# alternatively, you can use the following to get the evaluation results in memory\n", "\n", "# average scores across all runs\n", - "pprint(response[\"metrics\"])" + "pprint(response['metrics'])" ] } ], @@ -363,6 +360,8 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" } }, "nbformat": 4, 
From 0a5f06f1c52ecb4db57f4aba8d868253416023b3 Mon Sep 17 00:00:00 2001 From: Ankit Singhal Date: Fri, 4 Apr 2025 16:29:10 -0700 Subject: [PATCH 06/11] Fixing linting issues --- ...AI_Judge_Evaluator_Intent_Resolution.ipynb | 137 ++++----- .../AI_Judge_Evaluator_Task_Adherence.ipynb | 161 +++++------ ...I_Judge_Evaluator_Tool_Call_Accuracy.ipynb | 269 ++++++++---------- ...dge_Evaluators_Response_Completeness.ipynb | 31 +- .../Evaluate_Azure_AI_Agent_Quality.ipynb | 18 +- .../Agent_Evaluation/user_functions.py | 11 +- 6 files changed, 275 insertions(+), 352 deletions(-) diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb index 87b7413f..bcf3eb57 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb @@ -104,10 +104,11 @@ "metadata": {}, "outputs": [], "source": [ - "#Success example. Intent is identified and understood and the response correctly resolves user intent\n", - "result = intent_resolution_evaluator(query=\"What are the opening hours of the Eiffel Tower?\",\n", - " response=\"Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.\",\n", - " )\n", + "# Success example. Intent is identified and understood and the response correctly resolves user intent\n", + "result = intent_resolution_evaluator(\n", + " query=\"What are the opening hours of the Eiffel Tower?\",\n", + " response=\"Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.\",\n", + ")\n", "pprint(result)" ] }, @@ -117,10 +118,11 @@ "metadata": {}, "outputs": [], "source": [ - "#Failure example. 
Even though intent is correctly identified, the response does not resolve the user intent\n", - "result = intent_resolution_evaluator(query=\"What is the opening hours of the Eiffel Tower?\",\n", - " response=\"Please check the official website for the up-to-date information on Eiffel Tower opening hours.\",\n", - " )\n", + "# Failure example. Even though intent is correctly identified, the response does not resolve the user intent\n", + "result = intent_resolution_evaluator(\n", + " query=\"What is the opening hours of the Eiffel Tower?\",\n", + " response=\"Please check the official website for the up-to-date information on Eiffel Tower opening hours.\",\n", + ")\n", "pprint(result)" ] }, @@ -138,20 +140,17 @@ "outputs": [], "source": [ "query = [\n", - " {\n", - " \"role\": \"system\",\n", - " \"content\": \"You are a friendly and helpful customer service agent.\"\n", - " },\n", + " {\"role\": \"system\", \"content\": \"You are a friendly and helpful customer service agent.\"},\n", " {\n", " \"createdAt\": \"2025-03-14T06:14:20Z\",\n", " \"role\": \"user\",\n", " \"content\": [\n", " {\n", " \"type\": \"text\",\n", - " \"text\": \"Hi, I need help with the last 2 orders on my account #888. Could you please update me on their status?\"\n", + " \"text\": \"Hi, I need help with the last 2 orders on my account #888. Could you please update me on their status?\",\n", " }\n", - " ]\n", - " }\n", + " ],\n", + " },\n", "]\n", "\n", "response = [\n", @@ -159,12 +158,7 @@ " \"createdAt\": \"2025-03-14T06:14:30Z\",\n", " \"run_id\": \"0\",\n", " \"role\": \"assistant\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"text\",\n", - " \"text\": \"Hello! Let me quickly look up your account details.\"\n", - " }\n", - " ]\n", + " \"content\": [{\"type\": \"text\", \"text\": \"Hello! 
Let me quickly look up your account details.\"}],\n", " },\n", " {\n", " \"createdAt\": \"2025-03-14T06:14:35Z\",\n", @@ -175,23 +169,16 @@ " \"type\": \"tool_call\",\n", " \"tool_call_id\": \"tool_call_20250310_001\",\n", " \"name\": \"get_orders\",\n", - " \"arguments\": {\n", - " \"account_number\": \"888\"\n", - " }\n", + " \"arguments\": {\"account_number\": \"888\"},\n", " }\n", - " ]\n", + " ],\n", " },\n", " {\n", " \"createdAt\": \"2025-03-14T06:14:40Z\",\n", " \"run_id\": \"0\",\n", " \"tool_call_id\": \"tool_call_20250310_001\",\n", " \"role\": \"tool\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"tool_result\",\n", - " \"tool_result\": \"[{ \\\"order_id\\\": \\\"123\\\" }, { \\\"order_id\\\": \\\"124\\\" }]\"\n", - " }\n", - " ]\n", + " \"content\": [{\"type\": \"tool_result\", \"tool_result\": '[{ \"order_id\": \"123\" }, { \"order_id\": \"124\" }]'}],\n", " },\n", " {\n", " \"createdAt\": \"2025-03-14T06:14:45Z\",\n", @@ -200,9 +187,9 @@ " \"content\": [\n", " {\n", " \"type\": \"text\",\n", - " \"text\": \"Thanks for your patience. I see two orders on your account. Let me fetch the details for both.\"\n", + " \"text\": \"Thanks for your patience. I see two orders on your account. 
Let me fetch the details for both.\",\n", " }\n", - " ]\n", + " ],\n", " },\n", " {\n", " \"createdAt\": \"2025-03-14T06:14:50Z\",\n", @@ -213,19 +200,15 @@ " \"type\": \"tool_call\",\n", " \"tool_call_id\": \"tool_call_20250310_002\",\n", " \"name\": \"get_order\",\n", - " \"arguments\": {\n", - " \"order_id\": \"123\"\n", - " }\n", + " \"arguments\": {\"order_id\": \"123\"},\n", " },\n", " {\n", " \"type\": \"tool_call\",\n", " \"tool_call_id\": \"tool_call_20250310_003\",\n", " \"name\": \"get_order\",\n", - " \"arguments\": {\n", - " \"order_id\": \"124\"\n", - " }\n", - " }\n", - " ]\n", + " \"arguments\": {\"order_id\": \"124\"},\n", + " },\n", + " ],\n", " },\n", " {\n", " \"createdAt\": \"2025-03-14T06:14:55Z\",\n", @@ -235,9 +218,9 @@ " \"content\": [\n", " {\n", " \"type\": \"tool_result\",\n", - " \"tool_result\": \"{ \\\"order\\\": { \\\"id\\\": \\\"123\\\", \\\"status\\\": \\\"shipped\\\", \\\"delivery_date\\\": \\\"2025-03-15\\\" } }\"\n", + " \"tool_result\": '{ \"order\": { \"id\": \"123\", \"status\": \"shipped\", \"delivery_date\": \"2025-03-15\" } }',\n", " }\n", - " ]\n", + " ],\n", " },\n", " {\n", " \"createdAt\": \"2025-03-14T06:15:00Z\",\n", @@ -247,9 +230,9 @@ " \"content\": [\n", " {\n", " \"type\": \"tool_result\",\n", - " \"tool_result\": \"{ \\\"order\\\": { \\\"id\\\": \\\"124\\\", \\\"status\\\": \\\"delayed\\\", \\\"expected_delivery\\\": \\\"2025-03-20\\\" } }\"\n", + " \"tool_result\": '{ \"order\": { \"id\": \"124\", \"status\": \"delayed\", \"expected_delivery\": \"2025-03-20\" } }',\n", " }\n", - " ]\n", + " ],\n", " },\n", " {\n", " \"createdAt\": \"2025-03-14T06:15:05Z\",\n", @@ -258,14 +241,14 @@ " \"content\": [\n", " {\n", " \"type\": \"text\",\n", - " \"text\": \"The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025. 
Is there anything else I can help you with?\"\n", + " \"text\": \"The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025. Is there anything else I can help you with?\",\n", " }\n", - " ]\n", - " }\n", + " ],\n", + " },\n", "]\n", "\n", - "#please note that the tool definitions are not strictly required, and that some of the tools below are not used in the example above and that is ok.\n", - "#if context length is a concern you can remove the unused tool definitions or even the tool definitions altogether as the impact to the intent resolution evaluation is usual minimal.\n", + "# please note that the tool definitions are not strictly required, and that some of the tools below are not used in the example above and that is ok.\n", + "# if context length is a concern you can remove the unused tool definitions or even the tool definitions altogether as the impact to the intent resolution evaluation is usual minimal.\n", "tool_definitions = [\n", " {\n", " \"name\": \"get_orders\",\n", @@ -273,38 +256,25 @@ " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", - " \"account_number\": {\n", - " \"type\": \"string\",\n", - " \"description\": \"The account number to get the orders for.\"\n", - " }\n", - " }\n", - " }\n", + " \"account_number\": {\"type\": \"string\", \"description\": \"The account number to get the orders for.\"}\n", + " },\n", + " },\n", " },\n", " {\n", " \"name\": \"get_order\",\n", " \"description\": \"Get the details of a specific order.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", - " \"properties\": {\n", - " \"order_id\": {\n", - " \"type\": \"string\",\n", - " \"description\": \"The order ID to get the details for.\"\n", - " }\n", - " }\n", - " }\n", + " \"properties\": {\"order_id\": {\"type\": \"string\", \"description\": \"The order ID to get the details for.\"}},\n", + " },\n", " },\n", " 
{\n", " \"name\": \"initiate_return\",\n", " \"description\": \"Initiate the return process for an order.\",\n", " \"parameters\": {\n", " \"type\": \"object\",\n", - " \"properties\": {\n", - " \"order_id\": {\n", - " \"type\": \"string\",\n", - " \"description\": \"The order ID for the return process.\"\n", - " }\n", - " }\n", - " }\n", + " \"properties\": {\"order_id\": {\"type\": \"string\", \"description\": \"The order ID for the return process.\"}},\n", + " },\n", " },\n", " {\n", " \"name\": \"update_shipping_address\",\n", @@ -312,23 +282,18 @@ " \"parameters\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", - " \"account_number\": {\n", - " \"type\": \"string\",\n", - " \"description\": \"The account number to update.\"\n", - " },\n", - " \"new_address\": {\n", - " \"type\": \"string\",\n", - " \"description\": \"The new shipping address.\"\n", - " }\n", - " }\n", - " }\n", - " }\n", + " \"account_number\": {\"type\": \"string\", \"description\": \"The account number to update.\"},\n", + " \"new_address\": {\"type\": \"string\", \"description\": \"The new shipping address.\"},\n", + " },\n", + " },\n", + " },\n", "]\n", "\n", - "result = intent_resolution_evaluator(query = query,\n", - " response = response,\n", - " tool_definitions = tool_definitions,\n", - " )\n", + "result = intent_resolution_evaluator(\n", + " query=query,\n", + " response=response,\n", + " tool_definitions=tool_definitions,\n", + ")\n", "pprint(result)" ] }, @@ -364,7 +329,7 @@ " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", - " }\n", + " },\n", ")\n", "pprint(f'AI Foundary URL: {response.get(\"studio_url\")}')" ] diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb 
index e0ab3fcf..1748414f 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb @@ -126,10 +126,11 @@ "metadata": {}, "outputs": [], "source": [ - "#Failure example, there's only a vague adherence to the task\n", - "result = task_adherence_evaluator(query=\"What are the best practices for maintaining a healthy rose garden during the summer?\",\n", - " response=\"Make sure to water your roses regularly and trim them occasionally.\",\n", - " )\n", + "# Failure example, there's only a vague adherence to the task\n", + "result = task_adherence_evaluator(\n", + " query=\"What are the best practices for maintaining a healthy rose garden during the summer?\",\n", + " response=\"Make sure to water your roses regularly and trim them occasionally.\",\n", + ")\n", "pprint(result)" ] }, @@ -139,10 +140,11 @@ "metadata": {}, "outputs": [], "source": [ - "#Success example, full adherence to the task\n", - "result = task_adherence_evaluator(query=\"What are the best practices for maintaining a healthy rose garden during the summer?\",\n", - " response=\"For optimal summer care of your rose garden, start by watering deeply early in the morning to ensure the roots are well-hydrated without encouraging fungal growth. Apply a 2-3 inch layer of organic mulch around the base of the plants to conserve moisture and regulate soil temperature. Fertilize with a balanced rose fertilizer every 4–6 weeks to support healthy growth. Prune away any dead or diseased wood to promote good air circulation, and inspect regularly for pests such as aphids or spider mites, treating them promptly with an appropriate organic insecticidal soap. 
Finally, ensure that your roses receive at least 6 hours of direct sunlight daily for robust flowering.\",\n", - " )\n", + "# Success example, full adherence to the task\n", + "result = task_adherence_evaluator(\n", + " query=\"What are the best practices for maintaining a healthy rose garden during the summer?\",\n", + " response=\"For optimal summer care of your rose garden, start by watering deeply early in the morning to ensure the roots are well-hydrated without encouraging fungal growth. Apply a 2-3 inch layer of organic mulch around the base of the plants to conserve moisture and regulate soil temperature. Fertilize with a balanced rose fertilizer every 4–6 weeks to support healthy growth. Prune away any dead or diseased wood to promote good air circulation, and inspect regularly for pests such as aphids or spider mites, treating them promptly with an appropriate organic insecticidal soap. Finally, ensure that your roses receive at least 6 hours of direct sunlight daily for robust flowering.\",\n", + ")\n", "pprint(result)" ] }, @@ -160,90 +162,75 @@ "outputs": [], "source": [ "query = [\n", + " {\"role\": \"system\", \"content\": \"You are an expert in literature and at provid can provide book recommendations.\"},\n", + " {\n", + " \"createdAt\": \"2025-03-14T08:00:00Z\",\n", + " \"role\": \"user\",\n", + " \"content\": [\n", + " {\"type\": \"text\", \"text\": \"I love historical fiction. 
Can you recommend a good book from that genre?\"}\n", + " ],\n", + " },\n", + "]\n", + "\n", + "response = [\n", + " {\n", + " \"createdAt\": \"2025-03-14T08:00:05Z\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [{\"type\": \"text\", \"text\": \"Let me fetch a recommendation for historical fiction.\"}],\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T08:00:10Z\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", " {\n", - " \"role\": \"system\",\n", - " \"content\": \"You are an expert in literature and at provid can provide book recommendations.\"\n", - " },\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"tool_call_20250314_001\",\n", + " \"name\": \"get_book\",\n", + " \"arguments\": {\"genre\": \"historical fiction\"},\n", + " }\n", + " ],\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T08:00:15Z\",\n", + " \"role\": \"tool\",\n", + " \"tool_call_id\": \"tool_call_20250314_001\",\n", + " \"content\": [\n", " {\n", - " \"createdAt\": \"2025-03-14T08:00:00Z\",\n", - " \"role\": \"user\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"text\",\n", - " \"text\": \"I love historical fiction. 
Can you recommend a good book from that genre?\"\n", - " }\n", - " ]\n", + " \"type\": \"tool_result\",\n", + " \"tool_result\": '{ \"book\": { \"title\": \"The Pillars of the Earth\", \"author\": \"Ken Follett\", \"summary\": \"A captivating tale set in medieval England that weaves historical events with personal drama.\" } }',\n", " }\n", - " ]\n", - "\n", - "response = [\n", - " {\n", - " \"createdAt\": \"2025-03-14T08:00:05Z\",\n", - " \"role\": \"assistant\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"text\",\n", - " \"text\": \"Let me fetch a recommendation for historical fiction.\"\n", - " }\n", - " ]\n", - " },\n", - " {\n", - " \"createdAt\": \"2025-03-14T08:00:10Z\",\n", - " \"role\": \"assistant\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"tool_call\",\n", - " \"tool_call_id\": \"tool_call_20250314_001\",\n", - " \"name\": \"get_book\",\n", - " \"arguments\": {\n", - " \"genre\": \"historical fiction\"\n", - " }\n", - " }\n", - " ]\n", - " },\n", - " {\n", - " \"createdAt\": \"2025-03-14T08:00:15Z\",\n", - " \"role\": \"tool\",\n", - " \"tool_call_id\": \"tool_call_20250314_001\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"tool_result\",\n", - " \"tool_result\": \"{ \\\"book\\\": { \\\"title\\\": \\\"The Pillars of the Earth\\\", \\\"author\\\": \\\"Ken Follett\\\", \\\"summary\\\": \\\"A captivating tale set in medieval England that weaves historical events with personal drama.\\\" } }\"\n", - " }\n", - " ]\n", - " },\n", - " {\n", - " \"createdAt\": \"2025-03-14T08:00:20Z\",\n", - " \"role\": \"assistant\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"text\",\n", - " \"text\": \"Based on our records, I recommend 'The Pillars of the Earth' by Ken Follett. This novel is an excellent example of historical fiction with a rich narrative and well-developed characters. 
Would you like more details or another suggestion?\"\n", - " }\n", - " ]\n", - " }\n", - " ]\n", + " ],\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-14T08:00:20Z\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"Based on our records, I recommend 'The Pillars of the Earth' by Ken Follett. This novel is an excellent example of historical fiction with a rich narrative and well-developed characters. Would you like more details or another suggestion?\",\n", + " }\n", + " ],\n", + " },\n", + "]\n", "\n", "tool_definitions = [\n", - " {\n", - " \"name\": \"get_book\",\n", - " \"description\": \"Retrieve a book recommendation for a specified genre.\",\n", - " \"parameters\": {\n", - " \"type\": \"object\",\n", - " \"properties\": {\n", - " \"genre\": {\n", - " \"type\": \"string\",\n", - " \"description\": \"The genre for which a book recommendation is requested.\"\n", - " }\n", - " }\n", - " }\n", - " }\n", - " ]\n", + " {\n", + " \"name\": \"get_book\",\n", + " \"description\": \"Retrieve a book recommendation for a specified genre.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"genre\": {\"type\": \"string\", \"description\": \"The genre for which a book recommendation is requested.\"}\n", + " },\n", + " },\n", + " }\n", + "]\n", "\n", - "result = task_adherence_evaluator(query=query,\n", - " response=response,\n", - " tool_definitions=tool_definitions,\n", - " )\n", + "result = task_adherence_evaluator(\n", + " query=query,\n", + " response=response,\n", + " tool_definitions=tool_definitions,\n", + ")\n", "pprint(result)" ] }, @@ -279,7 +266,7 @@ " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", - " }\n", + " },\n", ")\n", "pprint(f'AI Foundary URL: {response.get(\"studio_url\")}')" ] diff --git 
a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb index 5cc3edea..910dad7e 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb @@ -78,7 +78,7 @@ "outputs": [], "source": [ "import os\n", - "from azure.ai.evaluation import ToolCallAccuracyEvaluator , AzureOpenAIModelConfiguration\n", + "from azure.ai.evaluation import ToolCallAccuracyEvaluator, AzureOpenAIModelConfiguration\n", "from pprint import pprint\n", "\n", "model_config = AzureOpenAIModelConfiguration(\n", @@ -114,28 +114,21 @@ "source": [ "query = \"How is the weather in Seattle ?\"\n", "tool_call = {\n", - " \"type\": \"tool_call\",\n", - " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", - " \"name\": \"fetch_weather\",\n", - " \"arguments\": {\n", - " \"location\": \"Seattle\"\n", - " }\n", - " }\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", + " \"name\": \"fetch_weather\",\n", + " \"arguments\": {\"location\": \"Seattle\"},\n", + "}\n", "\n", "tool_definition = {\n", - " \"id\": \"fetch_weather\",\n", - " \"name\": \"fetch_weather\",\n", - " \"description\": \"Fetches the weather information for the specified location.\",\n", - " \"parameters\": {\n", - " \"type\": \"object\",\n", - " \"properties\": {\n", - " \"location\": {\n", - " \"type\": \"string\",\n", - " \"description\": \"The location to fetch weather for.\"\n", - " }\n", - " }\n", - " }\n", - " }" + " \"id\": \"fetch_weather\",\n", + " \"name\": \"fetch_weather\",\n", + " \"description\": \"Fetches the weather information for the specified location.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": 
{\"location\": {\"type\": \"string\", \"description\": \"The location to fetch weather for.\"}},\n", + " },\n", + "}" ] }, { @@ -162,37 +155,30 @@ "outputs": [], "source": [ "query = \"How is the weather in Seattle ?\"\n", - "tool_calls = [{\n", - " \"type\": \"tool_call\",\n", - " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", - " \"name\": \"fetch_weather\",\n", - " \"arguments\": {\n", - " \"location\": \"Seattle\"\n", - " }\n", - " },\n", - " {\n", - " \"type\": \"tool_call\",\n", - " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", - " \"name\": \"fetch_weather\",\n", - " \"arguments\": {\n", - " \"location\": \"London\"\n", - " }\n", - " }]\n", + "tool_calls = [\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", + " \"name\": \"fetch_weather\",\n", + " \"arguments\": {\"location\": \"Seattle\"},\n", + " },\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", + " \"name\": \"fetch_weather\",\n", + " \"arguments\": {\"location\": \"London\"},\n", + " },\n", + "]\n", "\n", "tool_definition = {\n", - " \"id\": \"fetch_weather\",\n", - " \"name\": \"fetch_weather\",\n", - " \"description\": \"Fetches the weather information for the specified location.\",\n", - " \"parameters\": {\n", - " \"type\": \"object\",\n", - " \"properties\": {\n", - " \"location\": {\n", - " \"type\": \"string\",\n", - " \"description\": \"The location to fetch weather for.\"\n", - " }\n", - " }\n", - " }\n", - " }" + " \"id\": \"fetch_weather\",\n", + " \"name\": \"fetch_weather\",\n", + " \"description\": \"Fetches the weather information for the specified location.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\"location\": {\"type\": \"string\", \"description\": \"The location to fetch weather for.\"}},\n", + " },\n", + "}" ] }, { @@ -221,114 +207,86 @@ "source": [ "query = \"Can you send me an email with weather 
information for Seattle?\"\n", "response = [\n", - " {\n", - " \"createdAt\": \"2025-03-26T17:27:35Z\",\n", - " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", - " \"role\": \"assistant\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"tool_call\",\n", - " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", - " \"name\": \"fetch_weather\",\n", - " \"arguments\": {\n", - " \"location\": \"Seattle\"\n", - " }\n", - " }\n", - " ]\n", - " },\n", - " {\n", - " \"createdAt\": \"2025-03-26T17:27:37Z\",\n", - " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", - " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", - " \"role\": \"tool\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"tool_result\",\n", - " \"tool_result\": {\n", - " \"weather\": \"Rainy, 14\\u00b0C\"\n", - " }\n", - " }\n", - " ]\n", - " },\n", - " {\n", - " \"createdAt\": \"2025-03-26T17:27:38Z\",\n", - " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", - " \"role\": \"assistant\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"tool_call\",\n", - " \"tool_call_id\": \"call_iq9RuPxqzykebvACgX8pqRW2\",\n", - " \"name\": \"send_email\",\n", - " \"arguments\": {\n", - " \"recipient\": \"your_email@example.com\",\n", - " \"subject\": \"Weather Information for Seattle\",\n", - " \"body\": \"The current weather in Seattle is rainy with a temperature of 14\\u00b0C.\"\n", - " }\n", - " }\n", - " ]\n", - " },\n", - " {\n", - " \"createdAt\": \"2025-03-26T17:27:41Z\",\n", - " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", - " \"tool_call_id\": \"call_iq9RuPxqzykebvACgX8pqRW2\",\n", - " \"role\": \"tool\",\n", - " \"content\": [\n", - " {\n", - " \"type\": \"tool_result\",\n", - " \"tool_result\": {\n", - " \"message\": \"Email successfully sent to your_email@example.com.\"\n", - " }\n", - " }\n", - " ]\n", - " },\n", - " {\n", - " \"createdAt\": \"2025-03-26T17:27:42Z\",\n", - " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", - " \"role\": \"assistant\",\n", - " \"content\": 
[\n", - " {\n", - " \"type\": \"text\",\n", - " \"text\": \"I have successfully sent you an email with the weather information for Seattle. The current weather is rainy with a temperature of 14\\u00b0C.\"\n", - " }\n", - " ]\n", - " }\n", - " ]\n", + " {\n", + " \"createdAt\": \"2025-03-26T17:27:35Z\",\n", + " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", + " \"name\": \"fetch_weather\",\n", + " \"arguments\": {\"location\": \"Seattle\"},\n", + " }\n", + " ],\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-26T17:27:37Z\",\n", + " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", + " \"tool_call_id\": \"call_CUdbkBfvVBla2YP3p24uhElJ\",\n", + " \"role\": \"tool\",\n", + " \"content\": [{\"type\": \"tool_result\", \"tool_result\": {\"weather\": \"Rainy, 14\\u00b0C\"}}],\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-26T17:27:38Z\",\n", + " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"tool_call\",\n", + " \"tool_call_id\": \"call_iq9RuPxqzykebvACgX8pqRW2\",\n", + " \"name\": \"send_email\",\n", + " \"arguments\": {\n", + " \"recipient\": \"your_email@example.com\",\n", + " \"subject\": \"Weather Information for Seattle\",\n", + " \"body\": \"The current weather in Seattle is rainy with a temperature of 14\\u00b0C.\",\n", + " },\n", + " }\n", + " ],\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-26T17:27:41Z\",\n", + " \"run_id\": \"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", + " \"tool_call_id\": \"call_iq9RuPxqzykebvACgX8pqRW2\",\n", + " \"role\": \"tool\",\n", + " \"content\": [\n", + " {\"type\": \"tool_result\", \"tool_result\": {\"message\": \"Email successfully sent to your_email@example.com.\"}}\n", + " ],\n", + " },\n", + " {\n", + " \"createdAt\": \"2025-03-26T17:27:42Z\",\n", + " \"run_id\": 
\"run_zblZyGCNyx6aOYTadmaqM4QN\",\n", + " \"role\": \"assistant\",\n", + " \"content\": [\n", + " {\n", + " \"type\": \"text\",\n", + " \"text\": \"I have successfully sent you an email with the weather information for Seattle. The current weather is rainy with a temperature of 14\\u00b0C.\",\n", + " }\n", + " ],\n", + " },\n", + "]\n", "\n", "tool_definitions = [\n", " {\n", - "\t\t\"name\": \"fetch_weather\",\n", - "\t\t\"description\": \"Fetches the weather information for the specified location.\",\n", - "\t\t\"parameters\": {\n", - "\t\t\t\"type\": \"object\",\n", - "\t\t\t\"properties\": {\n", - "\t\t\t\t\"location\": {\n", - "\t\t\t\t\t\"type\": \"string\",\n", - "\t\t\t\t\t\"description\": \"The location to fetch weather for.\"\n", - "\t\t\t\t}\n", - "\t\t\t}\n", - "\t\t}\n", - "\t},\n", + " \"name\": \"fetch_weather\",\n", + " \"description\": \"Fetches the weather information for the specified location.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\"location\": {\"type\": \"string\", \"description\": \"The location to fetch weather for.\"}},\n", + " },\n", + " },\n", " {\n", - "\t\t\"name\": \"send_email\",\n", - "\t\t\"description\": \"Sends an email with the specified subject and body to the recipient.\",\n", - "\t\t\"parameters\": {\n", - "\t\t\t\"type\": \"object\",\n", - "\t\t\t\"properties\": {\n", - "\t\t\t\t\"recipient\": {\n", - "\t\t\t\t\t\"type\": \"string\",\n", - "\t\t\t\t\t\"description\": \"Email address of the recipient.\"\n", - "\t\t\t\t},\n", - "\t\t\t\t\"subject\": {\n", - "\t\t\t\t\t\"type\": \"string\",\n", - "\t\t\t\t\t\"description\": \"Subject of the email.\"\n", - "\t\t\t\t},\n", - "\t\t\t\t\"body\": {\n", - "\t\t\t\t\t\"type\": \"string\",\n", - "\t\t\t\t\t\"description\": \"Body content of the email.\"\n", - "\t\t\t\t}\n", - "\t\t\t}\n", - "\t\t}\n", - "\t}\n", + " \"name\": \"send_email\",\n", + " \"description\": \"Sends an email with the specified subject and body to the 
recipient.\",\n", + " \"parameters\": {\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"recipient\": {\"type\": \"string\", \"description\": \"Email address of the recipient.\"},\n", + " \"subject\": {\"type\": \"string\", \"description\": \"Subject of the email.\"},\n", + " \"body\": {\"type\": \"string\", \"description\": \"Body content of the email.\"},\n", + " },\n", + " },\n", + " },\n", "]" ] }, @@ -359,6 +317,7 @@ "outputs": [], "source": [ "from azure.ai.evaluation import evaluate\n", + "\n", "# This sample files contains the evaluation data in JSONL format. Where each line is a run from agent.\n", "# This was saved using agent thread and converter.\n", "file_name = \"evaluation_data.jsonl\"\n", @@ -373,7 +332,7 @@ " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", - " }\n", + " },\n", ")\n", "pprint(f'AI Foundary URL: {response.get(\"studio_url\")}')" ] diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb index a4377ea2..b13e0e23 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb @@ -72,7 +72,7 @@ "metadata": {}, "outputs": [], "source": [ - "from azure.ai.evaluation import ResponseCompletenessEvaluator , AzureOpenAIModelConfiguration\n", + "from azure.ai.evaluation import ResponseCompletenessEvaluator, AzureOpenAIModelConfiguration\n", "from pprint import pprint\n", "import os\n", "\n", @@ -81,7 +81,7 @@ " api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n", " api_version=os.environ[\"AZURE_OPENAI_API_VERSION\"],\n", " 
azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", - ")\n" + ")" ] }, { @@ -90,7 +90,7 @@ "metadata": {}, "outputs": [], "source": [ - "from azure.ai.evaluation import ResponseCompletenessEvaluator , AzureOpenAIModelConfiguration\n", + "from azure.ai.evaluation import ResponseCompletenessEvaluator, AzureOpenAIModelConfiguration\n", "from pprint import pprint\n", "\n", "response_completeness_evaluator = ResponseCompletenessEvaluator(model_config=model_config)" @@ -116,10 +116,10 @@ "metadata": {}, "outputs": [], "source": [ - "# agent response is complete \n", + "# agent response is complete\n", "result = response_completeness_evaluator(\n", " response=\"Itinery: Day 1 check out the downtown district of the city on train; for Day 2, we can rest in hotel.\",\n", - " ground_truth=\"Itinery: Day 1 take a train to visit the downtown area for city sightseeing; Day 2 rests in hotel.\"\n", + " ground_truth=\"Itinery: Day 1 take a train to visit the downtown area for city sightseeing; Day 2 rests in hotel.\",\n", ")\n", "result" ] @@ -133,7 +133,7 @@ "# agent response is incomplete\n", "result = response_completeness_evaluator(\n", " response=\"The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025.\",\n", - " ground_truth=\"The order with ID 124 is delayed and should now arrive by March 20, 2025.\"\n", + " ground_truth=\"The order with ID 124 is delayed and should now arrive by March 20, 2025.\",\n", ")\n", "result" ] @@ -159,8 +159,8 @@ " \"ground_truth\": \"The temperature of Seattle now is 50 degrees. It will be recommended to bring a jacket in the evening.\",\n", " },\n", " {\n", - " \"response\": \"The email draft \\\"Project Plan\\\" is attached. Please review and provide feedback.\",\n", - " \"ground_truth\": \"The email draft \\\"Project Plan\\\" is attached. 
Please review and provide feedback by EOD.\",\n", + " \"response\": 'The email draft \"Project Plan\" is attached. Please review and provide feedback.',\n", + " \"ground_truth\": 'The email draft \"Project Plan\" is attached. Please review and provide feedback by EOD.',\n", " },\n", " {\n", " \"response\": \"Based on the retrieved documents, the shareholder meeting discussed the operational efficiency of the company and financing options.\",\n", @@ -169,14 +169,14 @@ " {\n", " \"response\": \"The calendar API returns an error code 500. Please check the server logs for more details.\",\n", " \"ground_truth\": \"The meeting is scheduled for 2 PM tomorrow. Please confirm your availability by EOD.\",\n", - " }\n", + " },\n", "]\n", "\n", "file_path = \"response_completeness_data.jsonl\"\n", "\n", - "with open(file_path, 'w') as file:\n", + "with open(file_path, \"w\") as file:\n", " for line in data:\n", - " file.write(json.dumps(line) + '\\n')\n" + " file.write(json.dumps(line) + \"\\n\")" ] }, { @@ -195,13 +195,12 @@ "metadata": {}, "outputs": [], "source": [ - "\n", "from azure.ai.evaluation import evaluate\n", "\n", - "azure_ai_project={\n", - " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", - " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", - " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", + "azure_ai_project = {\n", + " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", + " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", + " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", "}\n", "\n", "response = evaluate(\n", diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb index 09772b97..0a5a8cdf 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb +++ 
b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb @@ -85,7 +85,7 @@ "\n", "AGENT_NAME = \"Seattle Tourist Assistant\"\n", "\n", - "# Add Tools to be used by Agent \n", + "# Add Tools to be used by Agent\n", "functions = FunctionTool(user_functions)\n", "\n", "toolset = ToolSet()\n", @@ -267,7 +267,13 @@ "metadata": {}, "outputs": [], "source": [ - "from azure.ai.evaluation import ToolCallAccuracyEvaluator , AzureOpenAIModelConfiguration, IntentResolutionEvaluator, TaskAdherenceEvaluator, ViolenceEvaluator\n", + "from azure.ai.evaluation import (\n", + " ToolCallAccuracyEvaluator,\n", + " AzureOpenAIModelConfiguration,\n", + " IntentResolutionEvaluator,\n", + " TaskAdherenceEvaluator,\n", + " ViolenceEvaluator,\n", + ")\n", "from pprint import pprint\n", "\n", "model_config = AzureOpenAIModelConfiguration(\n", @@ -277,7 +283,7 @@ " azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", ")\n", "# Needed to use content safety evaluators\n", - "azure_ai_project={\n", + "azure_ai_project = {\n", " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", @@ -289,7 +295,7 @@ "\n", "tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)\n", "\n", - "task_adherence = TaskAdherenceEvaluator(model_config=model_config)\n" + "task_adherence = TaskAdherenceEvaluator(model_config=model_config)" ] }, { @@ -318,7 +324,7 @@ " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", - " }\n", + " },\n", ")\n", "pprint(f'AI Foundary URL: {response.get(\"studio_url\")}')" ] @@ -341,7 +347,7 @@ "# alternatively, you can use the following to get the evaluation results in memory\n", "\n", "# average scores across all runs\n", - "pprint(response['metrics'])" 
+ "pprint(response[\"metrics\"])" ] } ], diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py index 2f720de6..1b869500 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py @@ -41,11 +41,17 @@ def fetch_weather(location: str) -> str: """ # In a real-world scenario, you'd integrate with a weather API. # Here, we'll mock the response. - mock_weather_data = {"New York": "Sunny, 25°C", "London": "Cloudy, 18°C", "Tokyo": "Rainy, 22°C", "Seattle": "Rainy, 14°C"} + mock_weather_data = { + "New York": "Sunny, 25°C", + "London": "Cloudy, 18°C", + "Tokyo": "Rainy, 22°C", + "Seattle": "Rainy, 14°C", + } weather = mock_weather_data.get(location, "Weather data not available for this location.") weather_json = json.dumps({"weather": weather}) return weather_json + def opening_hours(tourist_destination: str) -> str: """ Fetches the opening hours of a tourist destination in Seattle. @@ -60,12 +66,13 @@ def opening_hours(tourist_destination: str) -> str: "Space Needle": "9 AM - 11 PM", "Pike Place Market": "9 AM - 6 PM", "Museum of Pop Culture": "10 AM - 5 PM", - "Seattle Aquarium": "9:30 AM - 6 PM" + "Seattle Aquarium": "9:30 AM - 6 PM", } opening_hours = mock_opening_hours_data.get(tourist_destination, "Opening hours not available for this location.") opening_hours_json = json.dumps({"opening_hours": opening_hours}) return opening_hours_json + def send_email(recipient: str, subject: str, body: str) -> str: """ Sends an email with the specified subject and body to the recipient. 
From ccad7a35c0aae5e453376effa0e6151651473919 Mon Sep 17 00:00:00 2001 From: Ankit Singhal Date: Fri, 4 Apr 2025 16:56:03 -0700 Subject: [PATCH 07/11] Fixing issues --- ...AI_Judge_Evaluator_Intent_Resolution.ipynb | 7 ------- .../AI_Judge_Evaluator_Task_Adherence.ipynb | 2 +- .../Evaluate_Azure_AI_Agent_Quality.ipynb | 2 +- .../Agent_Evaluation/user_functions.py | 20 ++++++------------- 4 files changed, 8 insertions(+), 23 deletions(-) diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb index bcf3eb57..4f72d75e 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb @@ -333,13 +333,6 @@ ")\n", "pprint(f'AI Foundary URL: {response.get(\"studio_url\")}')" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb index 1748414f..11834a5c 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb @@ -143,7 +143,7 @@ "# Success example, full adherence to the task\n", "result = task_adherence_evaluator(\n", " query=\"What are the best practices for maintaining a healthy rose garden during the summer?\",\n", - " response=\"For optimal summer care of your rose garden, start by watering deeply early in the morning to ensure the roots are well-hydrated without 
encouraging fungal growth. Apply a 2-3 inch layer of organic mulch around the base of the plants to conserve moisture and regulate soil temperature. Fertilize with a balanced rose fertilizer every 4–6 weeks to support healthy growth. Prune away any dead or diseased wood to promote good air circulation, and inspect regularly for pests such as aphids or spider mites, treating them promptly with an appropriate organic insecticidal soap. Finally, ensure that your roses receive at least 6 hours of direct sunlight daily for robust flowering.\",\n", + " response=\"For optimal summer care of your rose garden, start by watering deeply early in the morning to ensure the roots are well-hydrated without encouraging fungal growth. Apply a 2-3 inch layer of organic mulch around the base of the plants to conserve moisture and regulate soil temperature. Fertilize with a balanced rose fertilizer every 4 to 6 weeks to support healthy growth. Prune away any dead or diseased wood to promote good air circulation, and inspect regularly for pests such as aphids or spider mites, treating them promptly with an appropriate organic insecticidal soap. 
Finally, ensure that your roses receive at least 6 hours of direct sunlight daily for robust flowering.\",\n", ")\n", "pprint(result)" ] diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb index 0a5a8cdf..604ea7fc 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb @@ -290,7 +290,7 @@ "}\n", "\n", "\n", - "# measures how well an agent identifies a user’s request, including how well it scopes the user’s intent, asks clarifying questions, and reminds end users of its scope of capabilities\n", + "# measures how well an agent identifies a user’s request, including how well it scopes the users intent, asks clarifying questions, and reminds end users of its scope of capabilities\n", "intent_resolution = IntentResolutionEvaluator(model_config=model_config)\n", "\n", "tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)\n", diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py index 1b869500..cac3f4f1 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/user_functions.py @@ -22,13 +22,9 @@ def fetch_current_datetime(format: Optional[str] = None) -> str: current_time = datetime.datetime.now() # Use the provided format if available, else use a default format - if format: - time_format = format - else: - time_format = "%Y-%m-%d %H:%M:%S" + time_format = format if format else "%Y-%m-%d %H:%M:%S" - time_json = json.dumps({"current_time": current_time.strftime(time_format)}) - return 
time_json + return json.dumps({"current_time": current_time.strftime(time_format)}) def fetch_weather(location: str) -> str: @@ -48,8 +44,7 @@ def fetch_weather(location: str) -> str: "Seattle": "Rainy, 14°C", } weather = mock_weather_data.get(location, "Weather data not available for this location.") - weather_json = json.dumps({"weather": weather}) - return weather_json + return json.dumps({"weather": weather}) def opening_hours(tourist_destination: str) -> str: @@ -69,8 +64,7 @@ def opening_hours(tourist_destination: str) -> str: "Seattle Aquarium": "9:30 AM - 6 PM", } opening_hours = mock_opening_hours_data.get(tourist_destination, "Opening hours not available for this location.") - opening_hours_json = json.dumps({"opening_hours": opening_hours}) - return opening_hours_json + return json.dumps({"opening_hours": opening_hours}) def send_email(recipient: str, subject: str, body: str) -> str: @@ -89,8 +83,7 @@ def send_email(recipient: str, subject: str, body: str) -> str: print(f"Subject: {subject}") print(f"Body:\n{body}") - message_json = json.dumps({"message": f"Email successfully sent to {recipient}."}) - return message_json + return json.dumps({"message": f"Email successfully sent to {recipient}."}) def send_email_using_recipient_name(recipient: str, subject: str, body: str) -> str: @@ -109,8 +102,7 @@ def send_email_using_recipient_name(recipient: str, subject: str, body: str) -> print(f"Subject: {subject}") print(f"Body:\n{body}") - message_json = json.dumps({"message": f"Email successfully sent to {recipient}."}) - return message_json + return json.dumps({"message": f"Email successfully sent to {recipient}."}) def calculate_sum(a: int, b: int) -> str: From 6e12d29e81483b453841faf96a663dbed4b19c7e Mon Sep 17 00:00:00 2001 From: Ankit Singhal Date: Fri, 4 Apr 2025 17:18:25 -0700 Subject: [PATCH 08/11] Fixing ruff issues --- .../AI_Judge_Evaluators_Response_Completeness.ipynb | 5 +++-- .../Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb | 2 -- 2 
files changed, 3 insertions(+), 4 deletions(-) diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb index b13e0e23..c1de2a38 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb @@ -152,6 +152,7 @@ "outputs": [], "source": [ "import json\n", + "from pathlib import Path\n", "\n", "data = [\n", " {\n", @@ -174,7 +175,7 @@ "\n", "file_path = \"response_completeness_data.jsonl\"\n", "\n", - "with open(file_path, \"w\") as file:\n", + "with Path(file_path).open(\"w\") as file:\n", " for line in data:\n", " file.write(json.dumps(line) + \"\\n\")" ] @@ -217,7 +218,7 @@ ], "metadata": { "kernelspec": { - "display_name": "test_agent_evaluator_prp", + "display_name": "test_agent_eval_sample", "language": "python", "name": "python3" }, diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb index 604ea7fc..9562f21e 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb @@ -289,8 +289,6 @@ " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", "}\n", "\n", - "\n", - "# measures how well an agent identifies a user’s request, including how well it scopes the users intent, asks clarifying questions, and reminds end users of its scope of capabilities\n", "intent_resolution = 
IntentResolutionEvaluator(model_config=model_config)\n", "\n", "tool_call_accuracy = 
ToolCallAccuracyEvaluator(model_config=model_config)\n", From 21616dcdcb7c3b0f3f3ba58ded99f1bf7f87d09b Mon Sep 17 00:00:00 2001 From: Ankit Singhal Date: Fri, 4 Apr 2025 17:28:20 -0700 Subject: [PATCH 09/11] Fixing nb clean issues --- .../AI_Judge_Evaluator_Intent_Resolution.ipynb | 3 +-- .../Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb | 3 +-- .../AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb | 3 +-- .../AI_Judge_Evaluators_Response_Completeness.ipynb | 3 +-- .../Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb | 3 +-- 5 files changed, 5 insertions(+), 10 deletions(-) diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb index 4f72d75e..b86aab62 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb @@ -350,8 +350,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.9" + "pygments_lexer": "ipython3" } }, "nbformat": 4, diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb index 11834a5c..8a1f9350 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Task_Adherence.ipynb @@ -287,8 +287,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.9" + "pygments_lexer": "ipython3" } }, "nbformat": 4, diff --git 
a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb index 910dad7e..c43b1aca 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Tool_Call_Accuracy.ipynb @@ -353,8 +353,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.9" + "pygments_lexer": "ipython3" } }, "nbformat": 4, diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb index c1de2a38..a41712d3 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb @@ -231,8 +231,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.9" + "pygments_lexer": "ipython3" } }, "nbformat": 4, diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb index 9562f21e..ace73d22 100644 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb +++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb @@ -364,8 +364,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": 
"ipython3", - "version": "3.12.9" + "pygments_lexer": "ipython3" } }, "nbformat": 4, From 7b8ef40a2404bb8a0613dd675ca30c5b18a32627 Mon Sep 17 00:00:00 2001 From: Ankit Singhal Date: Mon, 7 Apr 2025 10:34:59 -0700 Subject: [PATCH 10/11] Removing old data file --- .../Agent_Evaluation/agent_messages.jsonl | 10 ---------- 1 file changed, 10 deletions(-) delete mode 100644 scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/agent_messages.jsonl diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/agent_messages.jsonl b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/agent_messages.jsonl deleted file mode 100644 index bc2ceb73..00000000 --- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/agent_messages.jsonl +++ /dev/null @@ -1,10 +0,0 @@ -{"messages": [{"role": "system", "createdAt": 1700000000, "content": "You are a friendly and helpful customer service agent."}, {"role": "user", "createdAt": 1700000060, "content": [{"type": "text", "text": "Hi, I need help with the last 2 orders on my account #888. Could you please update me on their status?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700000070, "content": [{"type": "text", "text": "Hello! Let me quickly look up your account details."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700000075, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_001", "type": "function", "function": {"name": "get_orders", "arguments": {"account_number": "888"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700000080, "tool_call_id": "tool_call_20250310_001", "content": [{"type": "tool_result", "tool_result": "[{ \"order_id\": \"123\" }, { \"order_id\": \"124\" }]"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700000085, "content": [{"type": "text", "text": "Thanks for your patience. I see two orders on your account. 
Let me fetch the details for both."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700000090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_002", "type": "function", "function": {"name": "get_order", "arguments": {"order_id": "123"}}}}, {"type": "tool_call", "tool_call": {"id": "tool_call_20250310_003", "type": "function", "function": {"name": "get_order", "arguments": {"order_id": "124"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700000095, "tool_call_id": "tool_call_20250310_002", "content": [{"type": "tool_result", "tool_result": "{ \"order\": { \"id\": \"123\", \"status\": \"shipped\", \"delivery_date\": \"2025-03-15\" } }"}]}, {"run_id": 0, "role": "tool", "createdAt": 1700000100, "tool_call_id": "tool_call_20250310_003", "content": [{"type": "tool_result", "tool_result": "{ \"order\": { \"id\": \"124\", \"status\": \"delayed\", \"expected_delivery\": \"2025-03-20\" } }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700000105, "content": [{"type": "text", "text": "The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025. Is there anything else I can help you with?"}]}, {"role": "user", "createdAt": 1700000200, "content": [{"type": "text", "text": "Yes, I need help with returning an item from order 124."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700000205, "content": [{"type": "text", "text": "I can assist you with the return process. 
Let me check the details of order 124 for you."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700000210, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_004", "type": "function", "function": {"name": "get_order", "arguments": {"order_id": "124"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700000215, "tool_call_id": "tool_call_20250310_004", "content": [{"type": "tool_result", "tool_result": "{ \"order\": { \"id\": \"124\", \"status\": \"delayed\", \"return_eligible\": true } }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700000220, "content": [{"type": "text", "text": "Order 124 is eligible for a return. I will initiate the return process. Please confirm if you want to proceed with the return."}]}, {"role": "user", "createdAt": 1700000225, "content": [{"type": "text", "text": "Yes, please proceed."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700000230, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_005", "type": "function", "function": {"name": "initiate_return", "arguments": {"order_id": "124"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700000235, "tool_call_id": "tool_call_20250310_005", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"return initiated\", \"return_id\": \"R789\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700000240, "content": [{"type": "text", "text": "Your return for order 124 has been initiated. Your return ID is R789. Is there anything else I can help you with?"}]}, {"role": "user", "createdAt": 1700000300, "content": [{"type": "text", "text": "Actually, I want to update my shipping address on my account."}]}, {"run_id": 3, "role": "assistant", "createdAt": 1700000305, "content": [{"type": "text", "text": "Let me update your shipping address. 
Can you please provide your new address?"}]}, {"role": "user", "createdAt": 1700000310, "content": [{"type": "text", "text": "It's 123 New Street, Lisbon, Portugal."}]}, {"run_id": 4, "role": "assistant", "createdAt": 1700000315, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_006", "type": "function", "function": {"name": "update_shipping_address", "arguments": {"account_number": "888", "new_address": "123 New Street, Lisbon, Portugal"}}}}]}, {"run_id": 4, "role": "tool", "createdAt": 1700000320, "tool_call_id": "tool_call_20250310_006", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"address updated\" }"}]}, {"run_id": 4, "role": "assistant", "createdAt": 1700000325, "content": [{"type": "text", "text": "Your shipping address has been updated successfully. Is there anything else I can help you with?"}]}, {"role": "user", "createdAt": 1700000330, "content": [{"type": "text", "text": "No, that's all. Thank you for your help!"}]}, {"run_id": 5, "role": "assistant", "createdAt": 1700000335, "content": [{"type": "text", "text": "You're welcome! 
Have a great day."}]}], "tools": [{"id": "tool_123", "name": "get_orders", "description": "Get the list of orders for a given account number.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number to get the orders for."}}}}, {"id": "tool_124", "name": "get_order", "description": "Get the details of a specific order.", "parameters": {"type": "object", "properties": {"order_id": {"type": "string", "description": "The order ID to get the details for."}}}}, {"id": "tool_125", "name": "initiate_return", "description": "Initiate the return process for an order.", "parameters": {"type": "object", "properties": {"order_id": {"type": "string", "description": "The order ID for the return process."}}}}, {"id": "tool_126", "name": "update_shipping_address", "description": "Update the shipping address for a given account.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number to update."}, "new_address": {"type": "string", "description": "The new shipping address."}}}}]} -{"messages": [{"role": "system", "createdAt": 1700010000, "content": "You are a knowledgeable banking support agent with expertise in transaction inquiries and fund management."}, {"role": "user", "createdAt": 1700010050, "content": [{"type": "text", "text": "Hello, I need a detailed update on my account #777. Please provide my current balance and recent transaction history."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700010055, "content": [{"type": "text", "text": "Hi there! I\u2019m checking your account details right now. 
One moment please."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700010060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_032", "type": "function", "function": {"name": "get_balance", "arguments": {"account_number": "777"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700010065, "tool_call_id": "tool_call_20250310_032", "content": [{"type": "tool_result", "tool_result": "{ \"balance\": \"$5,678.90\" }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700010070, "content": [{"type": "text", "text": "Your current balance is $5,678.90. Would you also like to see your last 10 transactions?"}]}, {"role": "user", "createdAt": 1700010080, "content": [{"type": "text", "text": "Yes, please provide the details."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700010085, "content": [{"type": "text", "text": "Retrieving your last 10 transactions now..."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700010090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_033", "type": "function", "function": {"name": "get_transactions", "arguments": {"account_number": "777", "limit": 10}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700010095, "tool_call_id": "tool_call_20250310_033", "content": [{"type": "tool_result", "tool_result": "[ {\"date\": \"2025-03-09\", \"description\": \"Grocery Store\", \"amount\": \"-$123.45\"}, {\"date\": \"2025-03-08\", \"description\": \"Salary Credit\", \"amount\": \"+$3,000.00\"}, {\"date\": \"2025-03-07\", \"description\": \"Online Shopping\", \"amount\": \"-$250.00\"}, {\"date\": \"2025-03-06\", \"description\": \"Restaurant\", \"amount\": \"-$75.00\"}, {\"date\": \"2025-03-05\", \"description\": \"Utility Bill\", \"amount\": \"-$150.00\"}, {\"date\": \"2025-03-04\", \"description\": \"Transfer Received\", \"amount\": \"+$500.00\"}, {\"date\": \"2025-03-03\", \"description\": \"ATM Withdrawal\", \"amount\": \"-$200.00\"}, {\"date\": \"2025-03-02\", \"description\": 
\"Fuel\", \"amount\": \"-$60.00\"}, {\"date\": \"2025-03-01\", \"description\": \"Coffee Shop\", \"amount\": \"-$15.00\"}, {\"date\": \"2025-02-28\", \"description\": \"Book Store\", \"amount\": \"-$30.00\"} ]"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700010100, "content": [{"type": "text", "text": "Here are your last 10 transactions. Would you like to perform any further actions, such as transferring funds or reviewing a specific transaction?"}]}, {"role": "user", "createdAt": 1700010110, "content": [{"type": "text", "text": "Actually, I\u2019d like to transfer $200 to account #888."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700010115, "content": [{"type": "text", "text": "Initiating a transfer of $200 to account #888 now. Please hold on."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700010120, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_034", "type": "function", "function": {"name": "transfer_funds", "arguments": {"from_account": "777", "to_account": "888", "amount": "200"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700010125, "tool_call_id": "tool_call_20250310_034", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"transfer successful\", \"transaction_id\": \"TX987\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700010130, "content": [{"type": "text", "text": "The transfer was successful. Your transaction ID is TX987. 
Is there anything else I can assist you with regarding your account?"}]}], "tools": [{"id": "tool_127", "name": "get_balance", "description": "Retrieve the account balance for a given account number.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number to check the balance for."}}}}, {"id": "tool_138", "name": "get_transactions", "description": "Retrieve recent transactions for a given account.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number to retrieve transactions for."}, "limit": {"type": "integer", "description": "The number of transactions to retrieve."}}}}, {"id": "tool_139", "name": "transfer_funds", "description": "Transfer funds between accounts.", "parameters": {"type": "object", "properties": {"from_account": {"type": "string", "description": "The source account number."}, "to_account": {"type": "string", "description": "The destination account number."}, "amount": {"type": "string", "description": "The amount to transfer."}}}}]} -{"messages": [{"role": "system", "createdAt": 1700020000, "content": "You are a technical support agent specializing in connectivity and device troubleshooting."}, {"role": "user", "createdAt": 1700020050, "content": [{"type": "text", "text": "My home WiFi has been dropping intermittently and my laptop won\u2019t reconnect automatically. What can I do?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700020055, "content": [{"type": "text", "text": "Hello, I understand how frustrating connectivity issues can be. Let\u2019s run through some troubleshooting steps. 
First, I\u2019ll check the status of your router remotely."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700020060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_040", "type": "function", "function": {"name": "check_router_status", "arguments": {"device_id": "router_002"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700020065, "tool_call_id": "tool_call_20250310_040", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"online\", \"firmware_version\": \"1.2.3\", \"signal\": \"moderate\" }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700020070, "content": [{"type": "text", "text": "Your router appears to be online, though the signal strength is moderate. I recommend rebooting your router. Would you like me to initiate a remote reboot?"}]}, {"role": "user", "createdAt": 1700020080, "content": [{"type": "text", "text": "Yes, please reboot it and let me know if that fixes the problem."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700020085, "content": [{"type": "text", "text": "Rebooting your router now."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700020090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_041", "type": "function", "function": {"name": "reboot_router", "arguments": {"device_id": "router_002"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700020095, "tool_call_id": "tool_call_20250310_041", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"rebooted\", \"reboot_time\": \"2025-03-10T02:00:00Z\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700020100, "content": [{"type": "text", "text": "Your router has been rebooted. Please try reconnecting your laptop. Also, I suggest updating your WiFi adapter drivers if the issue persists."}]}, {"role": "user", "createdAt": 1700020150, "content": [{"type": "text", "text": "The connection seems stable now, but my laptop still sometimes fails to reconnect automatically. 
Can you assist with that?"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700020155, "content": [{"type": "text", "text": "Certainly. I\u2019ll check if there are any firmware updates available for your laptop\u2019s WiFi adapter."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700020160, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_042", "type": "function", "function": {"name": "check_adapter_firmware", "arguments": {"device_id": "laptop_wifi_001"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700020165, "tool_call_id": "tool_call_20250310_042", "content": [{"type": "tool_result", "tool_result": "{ \"firmware_update\": \"available\", \"version\": \"2.0.1\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700020170, "content": [{"type": "text", "text": "A firmware update (version 2.0.1) is available for your WiFi adapter. I recommend installing it. Would you like instructions on how to update?"}]}, {"role": "user", "createdAt": 1700020180, "content": [{"type": "text", "text": "Yes, please provide detailed update instructions."}]}, {"run_id": 3, "role": "assistant", "createdAt": 1700020185, "content": [{"type": "text", "text": "Sure. First, visit the manufacturer\u2019s support website, download the latest driver, and then run the installer as an administrator. After installation, restart your laptop. 
Please let me know if you need any more help."}]}], "tools": [{"id": "tool_140", "name": "check_router_status", "description": "Check the status of a router device.", "parameters": {"type": "object", "properties": {"device_id": {"type": "string", "description": "The ID of the router to check."}}}}, {"id": "tool_141", "name": "reboot_router", "description": "Reboot the specified router.", "parameters": {"type": "object", "properties": {"device_id": {"type": "string", "description": "The ID of the router to reboot."}}}}, {"id": "tool_142", "name": "check_adapter_firmware", "description": "Check if a firmware update is available for a WiFi adapter.", "parameters": {"type": "object", "properties": {"device_id": {"type": "string", "description": "The ID of the WiFi adapter."}}}}]} -{"messages": [{"role": "system", "createdAt": 1700030000, "content": "You are a subscription support agent. Your role is to help customers with subscription cancellations, renewals, and upgrades."}, {"role": "user", "createdAt": 1700030050, "content": [{"type": "text", "text": "Hi, I want to cancel my premium magazine subscription. I've been thinking about switching to a digital-only option. Can you help me with the cancellation and also provide details on the digital plan?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700030055, "content": [{"type": "text", "text": "I\u2019m sorry to hear you\u2019re considering cancellation. 
I\u2019ll first process your cancellation request for the premium subscription and then provide you with details about our digital plan."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700030060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_050", "type": "function", "function": {"name": "cancel_subscription", "arguments": {"subscription_id": "PREMIUM789"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700030065, "tool_call_id": "tool_call_20250310_050", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"cancelled\", \"refund\": \"$10.00 credited\" }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700030070, "content": [{"type": "text", "text": "Your premium subscription has been cancelled and a refund of $10.00 has been credited. Now, regarding our digital plan, it offers full access to all online content at a reduced rate. Would you like me to send you more detailed information about the digital plan and how to subscribe?"}]}, {"role": "user", "createdAt": 1700030080, "content": [{"type": "text", "text": "Yes, please send me all the details including pricing and any special offers."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700030085, "content": [{"type": "text", "text": "Our digital plan costs $4.99 per month, and if you subscribe within the next 7 days, you'll get an additional 10% discount for the first three months. It includes access to digital issues, exclusive online articles, and interactive content. Would you like to subscribe to the digital plan now?"}]}, {"role": "user", "createdAt": 1700030090, "content": [{"type": "text", "text": "I\u2019m interested. 
Please sign me up for the digital plan."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700030095, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_051", "type": "function", "function": {"name": "subscribe_digital", "arguments": {"account_number": "555", "plan": "digital", "discount": "10%"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700030100, "tool_call_id": "tool_call_20250310_051", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"subscribed\", \"subscription_id\": \"DIGI123\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700030105, "content": [{"type": "text", "text": "You are now subscribed to the digital plan. Your subscription ID is DIGI123. Is there anything else I can help you with today?"}]}], "tools": [{"id": "tool_130", "name": "cancel_subscription", "description": "Cancel a subscription given its ID.", "parameters": {"type": "object", "properties": {"subscription_id": {"type": "string", "description": "The subscription ID to cancel."}}}}, {"id": "tool_131", "name": "subscribe_digital", "description": "Subscribe to a digital plan.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number of the customer."}, "plan": {"type": "string", "description": "The subscription plan to subscribe to."}, "discount": {"type": "string", "description": "Any discount applied to the subscription."}}}}]} -{"messages": [{"role": "system", "createdAt": 1700040000, "content": "You are an appointment scheduling assistant for a car service center."}, {"role": "user", "createdAt": 1700040050, "content": [{"type": "text", "text": "Good morning, I need to book an appointment for my car service. My preferred date is 2025-04-05 at 09:30 AM, but I\u2019m flexible if that slot isn\u2019t available."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700040055, "content": [{"type": "text", "text": "Good morning! 
I\u2019ll check our available slots for that day and get back to you shortly. Please hold on a moment."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700040060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_060", "type": "function", "function": {"name": "check_availability", "arguments": {"service": "car_service", "date": "2025-04-05", "preferred_time": "09:30 AM"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700040065, "tool_call_id": "tool_call_20250310_060", "content": [{"type": "tool_result", "tool_result": "{ \"available_slots\": [\"09:30 AM\", \"10:00 AM\", \"10:30 AM\"] }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700040070, "content": [{"type": "text", "text": "We have several slots available on April 5th: 09:30 AM, 10:00 AM, and 10:30 AM. Which one would you prefer?"}]}, {"role": "user", "createdAt": 1700040080, "content": [{"type": "text", "text": "I\u2019d like the 10:00 AM slot, please."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700040085, "content": [{"type": "text", "text": "Great choice! I am now scheduling your appointment for car service on April 5, 2025 at 10:00 AM."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700040090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_061", "type": "function", "function": {"name": "schedule_appointment", "arguments": {"service": "car_service", "date": "2025-04-05", "time": "10:00 AM", "account_number": "777"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700040095, "tool_call_id": "tool_call_20250310_061", "content": [{"type": "tool_result", "tool_result": "{ \"appointment_id\": \"APT456\", \"status\": \"scheduled\", \"confirmation\": \"Your appointment is confirmed for 10:00 AM on 2025-04-05.\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700040100, "content": [{"type": "text", "text": "Your appointment is confirmed. Your appointment ID is APT456. 
Would you like me to send you a reminder email or SMS?"}]}, {"role": "user", "createdAt": 1700040110, "content": [{"type": "text", "text": "Yes, please send me an email reminder."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700040115, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_062", "type": "function", "function": {"name": "send_reminder", "arguments": {"appointment_id": "APT456", "method": "email", "email": "user@example.com"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700040120, "tool_call_id": "tool_call_20250310_062", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"reminder sent\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700040125, "content": [{"type": "text", "text": "Your email reminder has been sent. Is there anything else I can help you with regarding your appointment?"}]}], "tools": [{"id": "tool_131", "name": "check_availability", "description": "Check available appointment slots for a given service and date.", "parameters": {"type": "object", "properties": {"service": {"type": "string", "description": "The service type."}, "date": {"type": "string", "description": "The desired appointment date."}, "preferred_time": {"type": "string", "description": "The preferred appointment time."}}}}, {"id": "tool_132", "name": "schedule_appointment", "description": "Schedule an appointment for a specified service.", "parameters": {"type": "object", "properties": {"service": {"type": "string", "description": "The type of service."}, "date": {"type": "string", "description": "The date of the appointment."}, "time": {"type": "string", "description": "The time of the appointment."}, "account_number": {"type": "string", "description": "Customer account number."}}}}, {"id": "tool_133", "name": "send_reminder", "description": "Send a reminder for an appointment.", "parameters": {"type": "object", "properties": {"appointment_id": {"type": "string", "description": "The appointment ID."}, 
"method": {"type": "string", "description": "The reminder method (email or SMS)."}, "email": {"type": "string", "description": "Email address if method is email."}}}}]}
-{"messages": [{"role": "system", "createdAt": 1700050000, "content": "You are a travel support agent handling flight bookings and modifications."}, {"role": "user", "createdAt": 1700050050, "content": [{"type": "text", "text": "I booked flight AB123 a few days ago. Can you check its status and help me with a few modifications?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700050055, "content": [{"type": "text", "text": "Certainly, let me pull up your flight details for AB123."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700050060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_070", "type": "function", "function": {"name": "get_flight_status", "arguments": {"flight_number": "AB123"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700050065, "tool_call_id": "tool_call_20250310_070", "content": [{"type": "tool_result", "tool_result": "{ \"flight\": \"AB123\", \"status\": \"on time\", \"departure\": \"2025-04-01T14:00:00Z\" }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700050070, "content": [{"type": "text", "text": "Your flight AB123 is currently on time and is scheduled to depart at 2:00 PM UTC. What modifications would you like to make?"}]}, {"role": "user", "createdAt": 1700050080, "content": [{"type": "text", "text": "I want to change my seat to an aisle seat, and also add an extra baggage allowance."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700050085, "content": [{"type": "text", "text": "Okay, I will update your seat preference to an aisle seat and request an extra baggage allowance. 
Please hold on."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700050090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_071", "type": "function", "function": {"name": "update_seat", "arguments": {"flight_number": "AB123", "seat": "aisle"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700050095, "tool_call_id": "tool_call_20250310_071", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"seat updated\", \"seat\": \"aisle\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700050100, "content": [{"type": "text", "text": "Next, I will add the extra baggage allowance to your booking."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700050105, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_072", "type": "function", "function": {"name": "add_baggage", "arguments": {"flight_number": "AB123", "baggage": "extra"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700050110, "tool_call_id": "tool_call_20250310_072", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"baggage updated\", \"allowance\": \"extra\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700050115, "content": [{"type": "text", "text": "Your seat preference and extra baggage allowance have been updated. Is there anything else you\u2019d like to adjust for your flight booking?"}]}, {"role": "user", "createdAt": 1700050120, "content": [{"type": "text", "text": "No, that\u2019s all. Thank you for your help."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700050125, "content": [{"type": "text", "text": "You're welcome! 
Have a safe and pleasant journey."}]}], "tools": [{"id": "tool_140", "name": "get_flight_status", "description": "Retrieve the status of a flight.", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string", "description": "Flight number to check."}}}}, {"id": "tool_141", "name": "update_seat", "description": "Update the seat preference for a flight booking.", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string", "description": "Flight number."}, "seat": {"type": "string", "description": "Desired seat type."}}}}, {"id": "tool_142", "name": "add_baggage", "description": "Add extra baggage allowance to a flight booking.", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string", "description": "Flight number."}, "baggage": {"type": "string", "description": "Type of baggage update."}}}}]}
-{"messages": [{"role": "system", "createdAt": 1700060000, "content": "You are a product information specialist with detailed knowledge about electronics."}, {"role": "user", "createdAt": 1700060050, "content": [{"type": "text", "text": "I\u2019m considering buying the new X100 smartphone. Can you give me a detailed breakdown of its features, warranty, and how it compares to the previous model?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700060055, "content": [{"type": "text", "text": "Certainly! I will retrieve all the technical specifications, warranty details, and comparison data for the X100 smartphone."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700060060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_080", "type": "function", "function": {"name": "get_product_details", "arguments": {"product_id": "X100"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700060065, "tool_call_id": "tool_call_20250310_080", "content": [{"type": "tool_result", "tool_result": "{ \"product_id\": \"X100\", \"features\": [\"6.5-inch display\", \"108MP camera\", \"5G support\", \"5000mAh battery\", \"AI-enhanced photography\"], \"warranty\": \"2 years\", \"price\": \"$999\" }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700060070, "content": [{"type": "text", "text": "The X100 smartphone features a 6.5-inch display, a 108MP camera, 5G connectivity, and a robust 5000mAh battery. It comes with a 2-year warranty and is priced at $999. Compared to the previous model, the improvements include a higher resolution camera and faster processing speeds. Would you like additional details on any specific feature?"}]}, {"role": "user", "createdAt": 1700060080, "content": [{"type": "text", "text": "Yes, please tell me more about the camera improvements and any available promotions."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700060085, "content": [{"type": "text", "text": "The X100\u2019s 108MP camera uses advanced AI algorithms for better low-light performance and faster image processing. 
Additionally, there is a promotional offer providing a free wireless charger with purchase until the end of the month. Do you need assistance with placing an order or more technical specs?"}]}, {"role": "user", "createdAt": 1700060090, "content": [{"type": "text", "text": "That sounds great. I\u2019ll think about it. Can you email me the full spec sheet?"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700060095, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_082", "type": "function", "function": {"name": "send_product_spec", "arguments": {"product_id": "X100", "email": "user@example.com"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700060100, "tool_call_id": "tool_call_20250310_082", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"spec sheet sent\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700060105, "content": [{"type": "text", "text": "The full specification sheet has been sent to your email. Let me know if you have any further questions."}]}], "tools": [{"id": "tool_134", "name": "get_product_details", "description": "Retrieve detailed information for a given product.", "parameters": {"type": "object", "properties": {"product_id": {"type": "string", "description": "The ID of the product."}}}}, {"id": "tool_135", "name": "send_product_spec", "description": "Send the product specification sheet to the specified email.", "parameters": {"type": "object", "properties": {"product_id": {"type": "string", "description": "The ID of the product."}, "email": {"type": "string", "description": "The email address to send the spec sheet to."}}}}]}
-{"messages": [{"role": "system", "createdAt": 1700070000, "content": "You are a billing support agent with a focus on resolving disputes and processing refunds."}, {"role": "user", "createdAt": 1700070050, "content": [{"type": "text", "text": "I\u2019ve noticed several discrepancies on my bill for account #777. There\u2019s a $50 charge for a service fee and a $30 charge I don\u2019t recognize. Can you review my billing details and help me dispute these charges?"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700070055, "content": [{"type": "text", "text": "I understand your concern. I\u2019ll review your billing details now and identify the charges in question."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700070060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_090", "type": "function", "function": {"name": "get_billing_details", "arguments": {"account_number": "777"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700070065, "tool_call_id": "tool_call_20250310_090", "content": [{"type": "tool_result", "tool_result": "{ \"billing\": { \"charges\": [{ \"item\": \"service fee\", \"amount\": \"$50\" }, { \"item\": \"mystery charge\", \"amount\": \"$30\" }, { \"item\": \"subscription\", \"amount\": \"$15\" }] } }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700070070, "content": [{"type": "text", "text": "I see the $50 service fee and the $30 mystery charge. I can initiate a dispute for these two charges. Would you like me to process refunds for both, or would you like further clarification first?"}]}, {"role": "user", "createdAt": 1700070080, "content": [{"type": "text", "text": "Please process refunds for both charges."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700070085, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_091", "type": "function", "function": {"name": "initiate_refund", "arguments": {"account_number": "777", "charges": ["$50", "$30"]}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700070090, "tool_call_id": "tool_call_20250310_091", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"refunds initiated\", \"refund_ids\": [\"RF555\", \"RF556\"] }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700070095, "content": [{"type": "text", "text": "Refunds have been initiated for the disputed charges. Your refund IDs are RF555 and RF556. Please allow 3-5 business days for processing. 
Is there anything else you need help with regarding your billing?"}]}], "tools": [{"id": "tool_136", "name": "get_billing_details", "description": "Retrieve billing details for a given account number.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "Account number to check billing details for."}}}}, {"id": "tool_137", "name": "initiate_refund", "description": "Initiate refunds for disputed charges.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "Account number associated with the charges."}, "charges": {"type": "array", "description": "List of charge amounts to refund."}}}}]}
-{"messages": [{"role": "system", "createdAt": 1700080000, "content": "You are a returns support agent specializing in faulty or damaged product returns."}, {"role": "user", "createdAt": 1700080050, "content": [{"type": "text", "text": "I received my order #555 last week, but the product (Widget Pro) is defective. It keeps malfunctioning and shows error codes. I\u2019d like to return it and get either a replacement or a refund."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700080055, "content": [{"type": "text", "text": "I\u2019m sorry to hear about the issue with your Widget Pro. Let me review your order details and check if it\u2019s eligible for return."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700080060, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_100", "type": "function", "function": {"name": "get_order", "arguments": {"order_id": "555"}}}}]}, {"run_id": 0, "role": "tool", "createdAt": 1700080065, "tool_call_id": "tool_call_20250310_100", "content": [{"type": "tool_result", "tool_result": "{ \"order\": { \"id\": \"555\", \"status\": \"delivered\", \"item\": \"Widget Pro\", \"defective\": true, \"purchase_date\": \"2025-03-03\" } }"}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700080070, "content": [{"type": "text", "text": "I have verified that your Widget Pro is defective and eligible for return. Would you prefer a replacement or a refund?"}]}, {"role": "user", "createdAt": 1700080080, "content": [{"type": "text", "text": "I\u2019d like a refund, please. Also, please guide me through the return shipping process."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700080085, "content": [{"type": "text", "text": "Understood. I will initiate the refund process and send you the return shipping label via email."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700080090, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_101", "type": "function", "function": {"name": "initiate_return", "arguments": {"order_id": "555", "return_type": "refund"}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700080095, "tool_call_id": "tool_call_20250310_101", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"return initiated\", \"return_id\": \"R7890\", \"shipping_label_url\": \"http://example.com/label/R7890\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700080100, "content": [{"type": "text", "text": "Your return has been initiated and your refund will be processed once we receive the product. A shipping label has been sent to your email. Please print it and attach it to the return package. 
Do you need further assistance?"}]}, {"role": "user", "createdAt": 1700080110, "content": [{"type": "text", "text": "No, that covers everything. Thank you for your help."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700080115, "content": [{"type": "text", "text": "You're welcome! If you have any more issues in the future, feel free to reach out."}]}], "tools": [{"id": "tool_138", "name": "get_order", "description": "Retrieve order details for a given order ID.", "parameters": {"type": "object", "properties": {"order_id": {"type": "string", "description": "Order ID to retrieve details for."}}}}, {"id": "tool_139", "name": "initiate_return", "description": "Initiate the return process for an order, specifying the type of return.", "parameters": {"type": "object", "properties": {"order_id": {"type": "string", "description": "Order ID for the return process."}, "return_type": {"type": "string", "description": "Type of return (refund or replacement)."}}}}]}
-{"messages": [{"role": "system", "createdAt": 1700090000, "content": "You are an account settings support agent with expertise in updating personal details and security configurations."}, {"role": "user", "createdAt": 1700090050, "content": [{"type": "text", "text": "Hi, I need to update my personal details including my phone number, email, and also change my password. I\u2019d like to ensure that my security questions are also updated."}]}, {"run_id": 0, "role": "assistant", "createdAt": 1700090055, "content": [{"type": "text", "text": "Certainly. Let\u2019s start by updating your contact details. Could you please provide your new phone number and email address?"}]}, {"role": "user", "createdAt": 1700090060, "content": [{"type": "text", "text": "My new phone number is 123-456-7890 and my new email is newemail@example.com."}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700090065, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_110", "type": "function", "function": {"name": "update_contact_details", "arguments": {"account_number": "777", "new_details": {"phone": "123-456-7890", "email": "newemail@example.com"}}}}}]}, {"run_id": 1, "role": "tool", "createdAt": 1700090070, "tool_call_id": "tool_call_20250310_110", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"contact details updated\" }"}]}, {"run_id": 1, "role": "assistant", "createdAt": 1700090075, "content": [{"type": "text", "text": "Your contact details have been updated. Now, please provide your current password and the new password you wish to use."}]}, {"role": "user", "createdAt": 1700090080, "content": [{"type": "text", "text": "My current password is OldPass123 and I\u2019d like to change it to NewSecurePass456."}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700090085, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_111", "type": "function", "function": {"name": "change_password", "arguments": {"account_number": "777", "current_password": "OldPass123", "new_password": "NewSecurePass456"}}}}]}, {"run_id": 2, "role": "tool", "createdAt": 1700090090, "tool_call_id": "tool_call_20250310_111", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"password changed\" }"}]}, {"run_id": 2, "role": "assistant", "createdAt": 1700090095, "content": [{"type": "text", "text": "Lastly, let\u2019s update your security questions. Please provide a new question and answer."}]}, {"role": "user", "createdAt": 1700090100, "content": [{"type": "text", "text": "My new security question is: 'What is the name of your first pet?' 
and the answer is 'Fluffy'."}]}, {"run_id": 3, "role": "assistant", "createdAt": 1700090105, "content": [{"type": "tool_call", "tool_call": {"id": "tool_call_20250310_112", "type": "function", "function": {"name": "update_security_question", "arguments": {"account_number": "777", "security_question": "What is the name of your first pet?", "security_answer": "Fluffy"}}}}]}, {"run_id": 3, "role": "tool", "createdAt": 1700090110, "tool_call_id": "tool_call_20250310_112", "content": [{"type": "tool_result", "tool_result": "{ \"status\": \"security question updated\" }"}]}, {"run_id": 3, "role": "assistant", "createdAt": 1700090115, "content": [{"type": "text", "text": "Your account settings have been fully updated. If you need any more changes, feel free to ask."}]}, {"role": "user", "createdAt": 1700090120, "content": [{"type": "text", "text": "Thank you very much for your assistance."}]}, {"run_id": 4, "role": "assistant", "createdAt": 1700090125, "content": [{"type": "text", "text": "You're welcome! Have a great day."}]}], "tools": [{"id": "tool_140", "name": "update_contact_details", "description": "Update the contact details (phone, email) for a given account.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number."}, "new_details": {"type": "object", "description": "New contact details."}}}}, {"id": "tool_141", "name": "change_password", "description": "Change the password for a given account.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number."}, "current_password": {"type": "string", "description": "The current password."}, "new_password": {"type": "string", "description": "The new password."}}}}, {"id": "tool_142", "name": "update_security_question", "description": "Update the security question and answer for an account.", "parameters": {"type": "object", "properties": {"account_number": {"type": "string", "description": "The account number."}, "security_question": {"type": "string", "description": "The new security question."}, "security_answer": {"type": "string", "description": "The answer to the security question."}}}}]}
\ No newline at end of file
From 956f6de9657e2377e7e129d3e98ac4f4dec5ac06 Mon Sep 17 00:00:00 2001
From: Ankit Singhal
Date: Mon, 7 Apr 2025 11:28:16 -0700
Subject: [PATCH 11/11] Running pre-commit hooks

---
 .../AI_Judge_Evaluator_Intent_Resolution.ipynb             | 1 -
 .../AI_Judge_Evaluators_Response_Completeness.ipynb        | 3 +--
 .../Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb | 4 +---
 3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb
index b86aab62..2dcd8d01 100644
--- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb
+++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluator_Intent_Resolution.ipynb
@@ -70,7 +70,6 @@
    "source": [
     "import os\n",
     "from azure.ai.evaluation import AzureOpenAIModelConfiguration\n",
-    "from azure.identity import DefaultAzureCredential\n",
     "from azure.ai.evaluation import IntentResolutionEvaluator\n",
     "from pprint import pprint\n",
     "\n",
diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb
index a41712d3..8139a32f 100644
--- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb
+++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/AI_Judge_Evaluators_Response_Completeness.ipynb
@@ -90,8 +90,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from azure.ai.evaluation import ResponseCompletenessEvaluator, AzureOpenAIModelConfiguration\n",
-    "from pprint import pprint\n",
+    "from azure.ai.evaluation import AzureOpenAIModelConfiguration\n",
     "\n",
     "response_completeness_evaluator = ResponseCompletenessEvaluator(model_config=model_config)"
    ]
diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb
index ace73d22..2cb0c29d 100644
--- a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb
+++ b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_Azure_AI_Agent_Quality.ipynb
@@ -70,7 +70,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import os, json\n",
+    "import os\n",
     "from azure.ai.projects import AIProjectClient\n",
     "from azure.identity import DefaultAzureCredential\n",
     "from azure.ai.projects.models import FunctionTool, ToolSet\n",
@@ -229,7 +229,6 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import json\n",
     "from azure.ai.evaluation import AIAgentConverter\n",
     "\n",
     "# Initialize the converter that will be backed by the project.\n",
@@ -272,7 +271,6 @@
     "    AzureOpenAIModelConfiguration,\n",
     "    IntentResolutionEvaluator,\n",
     "    TaskAdherenceEvaluator,\n",
-    "    ViolenceEvaluator,\n",
     ")\n",
     "from pprint import pprint\n",
     "\n",