Add SK AzureAIAgent notebook (#274)

ahibrahimm · web-flow · commit 214c8314a5cd · 2025-08-11T17:38:37.000Z
* add notebook

* fix unused arg

* clean nb
diff --git a/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_SK_Azure_AI_Agent.ipynb b/scenarios/evaluate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_SK_Azure_AI_Agent.ipynb
@@ -0,0 +1,312 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "bf5280e2",
+   "metadata": {},
+   "source": [
+    "# Evaluate Semantic Kernel Azure AI Agents in Azure AI Foundry"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0330c099",
+   "metadata": {},
+   "source": [
+    "## Objective\n",
+    "\n",
+    "This sample demonstrates how to evaluate a Semantic Kernel `AzureAIAgent` (backed by the Azure AI Foundry Agent Service) on three key aspects of your agentic workflow:\n",
+    "\n",
+    "- Intent Resolution: Measures how well the agent identifies the user’s request, including how well it scopes the user’s intent, asks clarifying questions, and reminds end users of its scope of capabilities.\n",
+    "- Tool Call Accuracy: Evaluates the agent's ability to select the appropriate tools and pass correct parameters derived from previous steps.\n",
+    "- Task Adherence: Measures how well the agent’s response adheres to its assigned tasks, according to its system message and prior steps."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b364c694",
+   "metadata": {},
+   "source": [
+    "## Time\n",
+    "You can expect to complete this sample in approximately 20 minutes."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bbf5ecbb",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "### Packages\n",
+    "- `semantic-kernel`\n",
+    "- `azure-ai-evaluation`, `azure-ai-projects`, and `azure-identity`\n",
+    "\n",
+    "Install them before running the sample:\n",
+    "```bash\n",
+    "pip install semantic-kernel azure-ai-projects azure-identity azure-ai-evaluation\n",
+    "```\n",
+    "\n",
+    "### Azure Resources\n",
+    "- An Azure OpenAI resource with a chat model deployment (used by the evaluators as the judge model)\n",
+    "- An Azure AI Foundry project\n",
+    "\n",
+    "### Environment Variables\n",
+    "\n",
+    "- For the **Azure AI Foundry Agent Service**:\n",
+    "  - **`AZURE_AI_AGENT_ENDPOINT`** – Endpoint of your Azure AI Foundry project.\n",
+    "  - **`AZURE_AI_AGENT_MODEL_DEPLOYMENT_NAME`** – Deployment name of the model used by the Foundry Agent.\n",
+    "\n",
+    "- For **evaluating agents**:\n",
+    "  - **`AZURE_OPENAI_ENDPOINT`** – Azure OpenAI endpoint used for evaluation.\n",
+    "  - **`AZURE_OPENAI_API_KEY`** – Azure OpenAI API key used for evaluation.\n",
+    "  - **`AZURE_OPENAI_CHAT_DEPLOYMENT_NAME`** – Deployment name of the chat model used for evaluation.\n",
+    "  - **`AZURE_OPENAI_API_VERSION`** – Azure OpenAI API version used for evaluation (e.g., `2024-05-01-preview`).\n",
+    "\n",
+    "- For **logging results to Azure AI Foundry** (bonus):\n",
+    "  - **`AZURE_AI_AGENT_ENDPOINT`** – The same project endpoint as above; it is passed to `evaluate()` as `azure_ai_project`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ba1d6576",
+   "metadata": {},
+   "source": [
+    "### Create an Azure AI Agent with a plugin - [reference](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-types/azure-ai-agent?pivots=programming-language-python)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7dc6ce40",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from typing import Annotated\n",
+    "\n",
+    "from azure.identity import DefaultAzureCredential\n",
+    "\n",
+    "from semantic_kernel.agents import AzureAIAgent, AzureAIAgentSettings\n",
+    "from semantic_kernel.functions import kernel_function\n",
+    "\n",
+    "\n",
+    "# Define a sample plugin whose functions the agent can call as tools\n",
+    "class MenuPlugin:\n",
+    "    \"\"\"A sample Menu Plugin used for the concept sample.\"\"\"\n",
+    "\n",
+    "    @kernel_function(description=\"Provides a list of specials from the menu.\")\n",
+    "    def get_specials(self) -> Annotated[str, \"Returns the specials from the menu.\"]:\n",
+    "        return \"\"\"\n",
+    "        Special Soup: Clam Chowder\n",
+    "        Special Salad: Cobb Salad\n",
+    "        Special Drink: Chai Tea\n",
+    "        \"\"\"\n",
+    "\n",
+    "    @kernel_function(description=\"Provides the price of the requested menu item.\")\n",
+    "    def get_item_price(\n",
+    "        self, menu_item: Annotated[str, \"The name of the menu item.\"]\n",
+    "    ) -> Annotated[str, \"Returns the price of the menu item.\"]:\n",
+    "        _ = menu_item  # Mark the argument as used; this sample returns a fixed price for every item.\n",
+    "        return \"$9.99\"\n",
+    "\n",
+    "\n",
+    "# Create the agent definition in the Foundry Agent Service, then wrap it in a Semantic Kernel agent with the plugin\n",
+    "creds = DefaultAzureCredential()\n",
+    "project_client = AzureAIAgent.create_client(credential=creds)\n",
+    "\n",
+    "deployment_name = AzureAIAgentSettings().model_deployment_name\n",
+    "agent_definition = await project_client.agents.create_agent(\n",
+    "    model=deployment_name,\n",
+    "    name=\"Host\",\n",
+    "    instructions=\"Answer questions about the menu.\",\n",
+    ")\n",
+    "\n",
+    "agent = AzureAIAgent(\n",
+    "    client=project_client,\n",
+    "    definition=agent_definition,\n",
+    "    plugins=[MenuPlugin()],\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ca0a35a0",
+   "metadata": {},
+   "source": [
+    "### Invoke the agent"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3b7b9ba3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "USER_INPUTS = [\n",
+    "    \"Hello\",\n",
+    "    \"What is the special soup?\",\n",
+    "    \"What is the special drink?\",\n",
+    "    \"How much is it?\",\n",
+    "    \"Thank you\",\n",
+    "]\n",
+    "\n",
+    "thread = None\n",
+    "for user_input in USER_INPUTS:\n",
+    "    print(f\"## User: {user_input}\")\n",
+    "    response = await agent.get_response(messages=user_input, thread=thread)\n",
+    "    print(f\"## {response.name}: {response.content}\")\n",
+    "    thread = response.thread"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2586d3e5",
+   "metadata": {},
+   "source": [
+    "### Convert the agent thread into evaluation data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7813b5eb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.ai.evaluation import AIAgentConverter\n",
+    "from azure.ai.projects import AIProjectClient\n",
+    "\n",
+    "# Print the thread ID for reference\n",
+    "print(thread.id)\n",
+    "\n",
+    "# The AIAgentConverter requires a sync project client\n",
+    "ai_agent_settings = AzureAIAgentSettings()\n",
+    "sync_project_client = AIProjectClient(\n",
+    "    endpoint=ai_agent_settings.endpoint,\n",
+    "    credential=DefaultAzureCredential(),\n",
+    ")\n",
+    "\n",
+    "converter = AIAgentConverter(sync_project_client)\n",
+    "\n",
+    "file_name = \"evaluation_data.jsonl\"\n",
+    "# Save the agent thread data to a JSONL file (all turns)\n",
+    "evaluation_data = converter.prepare_evaluation_data([thread.id], filename=file_name)\n",
+    "# To inspect the converted data, run: import json; print(json.dumps(evaluation_data, indent=4))\n",
+    "len(evaluation_data)  # number of turns in the thread"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8bf87cab",
+   "metadata": {},
+   "source": [
+    "### Set up the evaluators\n",
+    "\n",
+    "We use the following evaluators to assess different aspects of agent quality:\n",
+    "\n",
+    "- [Intent resolution](https://aka.ms/intentresolution-sample): measures the extent to which an agent identifies the correct intent from a user query. Scale: integer 1-5. Higher is better.\n",
+    "- [Tool call accuracy](https://aka.ms/toolcallaccuracy-sample): evaluates the agent’s ability to select the appropriate tools and pass correct parameters derived from previous steps. Scale: float 0-1. Higher is better.\n",
+    "- [Task adherence](https://aka.ms/taskadherence-sample): measures the extent to which an agent’s final response adheres to the task based on its system message and a user query. Scale: integer 1-5. Higher is better.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e6ee09df",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pprint import pprint\n",
+    "\n",
+    "from azure.ai.evaluation import (\n",
+    "    AzureOpenAIModelConfiguration,\n",
+    "    IntentResolutionEvaluator,\n",
+    "    TaskAdherenceEvaluator,\n",
+    "    ToolCallAccuracyEvaluator,\n",
+    ")\n",
+    "\n",
+    "from semantic_kernel.connectors.ai.open_ai import AzureOpenAISettings\n",
+    "\n",
+    "azure_openai_settings = AzureOpenAISettings()\n",
+    "if not azure_openai_settings.endpoint:\n",
+    "    raise ValueError(\"Azure OpenAI endpoint is not set in the environment variables.\")\n",
+    "if not azure_openai_settings.api_key:\n",
+    "    raise ValueError(\"Azure OpenAI API key is not set in the environment variables.\")\n",
+    "if not azure_openai_settings.chat_deployment_name:\n",
+    "    raise ValueError(\"Azure OpenAI chat deployment name is not set in the environment variables.\")\n",
+    "\n",
+    "\n",
+    "model_config = AzureOpenAIModelConfiguration(\n",
+    "    azure_endpoint=str(azure_openai_settings.endpoint),\n",
+    "    api_key=azure_openai_settings.api_key.get_secret_value(),\n",
+    "    api_version=azure_openai_settings.api_version,\n",
+    "    azure_deployment=azure_openai_settings.chat_deployment_name,\n",
+    ")\n",
+    "\n",
+    "intent_resolution = IntentResolutionEvaluator(model_config=model_config)\n",
+    "tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)\n",
+    "task_adherence = TaskAdherenceEvaluator(model_config=model_config)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7a3d235",
+   "metadata": {},
+   "source": [
+    "### Run the evaluation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "31eb7ecb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azure.ai.evaluation import evaluate\n",
+    "\n",
+    "result = evaluate(\n",
+    "    data=file_name,\n",
+    "    evaluators={\n",
+    "        \"tool_call_accuracy\": tool_call_accuracy,\n",
+    "        \"intent_resolution\": intent_resolution,\n",
+    "        \"task_adherence\": task_adherence,\n",
+    "    },\n",
+    "    azure_ai_project=ai_agent_settings.endpoint,\n",
+    ")\n",
+    "print(f\"Azure AI Foundry URL: {result.get('studio_url')}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ac38d924",
+   "metadata": {},
+   "source": [
+    "## Inspect results on Azure AI Foundry\n",
+    "\n",
+    "Open the Azure AI Foundry URL printed above to explore the evaluation scores and the evaluators' reasoning in a rich visualization, so you can quickly identify and fix issues in your agent."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "env",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
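
The notebook's last cell points to the Azure AI Foundry portal for inspecting results. For a quick local check, a minimal Python sketch can average each evaluator's per-row scores. It assumes that per-row output columns follow the pattern `outputs.<evaluator_name>.<evaluator_name>` (for example `outputs.intent_resolution.intent_resolution`); `summarize_scores` and `_is_score` are hypothetical helpers, not part of `azure-ai-evaluation`, and the sample rows are invented for illustration.

```python
import math
from statistics import mean
from typing import Any, Dict, Iterable, List


def _is_score(value: Any) -> bool:
    """Return True for real numeric scores (excludes bools, None, and NaN)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return not math.isnan(value)


def summarize_scores(rows: List[Dict[str, Any]], evaluator_names: Iterable[str]) -> Dict[str, float]:
    """Average each evaluator's per-row score across all evaluated turns.

    Assumption: each row stores an evaluator's score in the column
    ``outputs.<name>.<name>``. Evaluators with no numeric scores are omitted.
    """
    summary: Dict[str, float] = {}
    for name in evaluator_names:
        column = f"outputs.{name}.{name}"
        # Keep only usable numeric scores; missing columns and NaN values are skipped.
        scores = [float(row[column]) for row in rows if _is_score(row.get(column))]
        if scores:
            summary[name] = mean(scores)
    return summary


# Hypothetical rows for illustration only; real values come from your own run.
example_rows = [
    {"outputs.intent_resolution.intent_resolution": 5, "outputs.task_adherence.task_adherence": 4},
    {"outputs.intent_resolution.intent_resolution": 3, "outputs.task_adherence.task_adherence": float("nan")},
]
print(summarize_scores(example_rows, ["intent_resolution", "task_adherence", "tool_call_accuracy"]))
# prints {'intent_resolution': 4.0, 'task_adherence': 4.0}
```

To use it in the notebook, you could pass the per-row results from the dictionary returned by `evaluate()` (assumed here to be under the `rows` key) together with the three evaluator names. Because column naming can differ between SDK versions, print the keys of one row first and adjust the `column` pattern if needed.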