Add SK ChatCompletionAgent notebook #271
Merged: kdestin merged 5 commits into Azure-Samples:main from ahibrahimm:ahibrahim/sk-conv-sample on Aug 11, 2025
357 additions & 0 deletions
...ate/Supported_Evaluation_Metrics/Agent_Evaluation/Evaluate_SK_Chat_Completion_Agent.ipynb
| { | ||
| "cells": [ | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "bf5280e2", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Evaluate Semantic Kernel AI (ChatCompletion) Agents in Azure AI Foundry" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "0330c099", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Objective\n", | ||
| "\n", | ||
| "This sample demonstrates how to evaluate Semantic Kernel AI ChatCompletionAgents in Azure AI Foundry. It provides a step-by-step guide to set up the environment, create an agent, and evaluate its performance." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "b364c694", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Time\n", | ||
| "You can expect to complete this sample in approximately 20 minutes." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "919c6017", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Prerequisites\n", | ||
| "### Packages\n", | ||
| "- `semantic-kernel` installed (`pip install semantic-kernel`)\n", | ||
| "- `azure-ai-evaluation` SDK installed\n", | ||
| "- An Azure OpenAI resource with a deployment configured\n", | ||
| "\n", | ||
| "Before running the sample:\n", | ||
| "```bash\n", | ||
| "pip install semantic-kernel azure-ai-projects azure-identity azure-ai-evaluation\n", | ||
| "```\n", | ||
| "\n", | ||
| "### Environment Variables\n", | ||
| "- For **AzureChatService** (Semantic Kernel Agent):\n", | ||
| " - **`api_key`** – Azure OpenAI API key used by the agent.\n", | ||
| " - **`chat_deployment_name`** – Name of the deployed chat model (e.g., `gpt-35-turbo`) used by the agent.\n", | ||
| " - **`endpoint`** – Azure OpenAI endpoint URL (e.g., `https://<your-resource>.openai.azure.com/`).\n", | ||
| "- For **LLM Evaluation**:\n", | ||
| " - **`AZURE_OPENAI_ENDPOINT`** – Azure OpenAI endpoint to be used by the evaluation LLM.\n", | ||
| " - **`AZURE_OPENAI_API_KEY`** – Azure OpenAI API key for evaluation.\n", | ||
| " - **`AZURE_OPENAI_API_VERSION`** – API version (e.g., `2024-05-01-preview`) for the evaluation LLM.\n", | ||
| " - **`MODEL_DEPLOYMENT_NAME`** – Deployment name of the model used for evaluation*, as found under the \"Name\" column in the \"Models + endpoints\" tab in your Azure AI Foundry project*.\n", | ||
| "- For Azure AI Foundry (Bonus):\n", | ||
| " - **`AZURE_SUBSCRIPTION_ID`** – Your Azure subscription ID where the AI Foundry project is hosted.\n", | ||
| " - **`PROJECT_NAME`** – Name of the Azure AI Foundry project.\n", | ||
| " - **`RESOURCE_GROUP_NAME`** – Resource group containing your AI Foundry project." | ||
| ] | ||
| }, | ||
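The environment variables above can live in a `.env` file. If you would rather not add a dependency such as `python-dotenv`, a minimal loader is easy to sketch (the parsing rules here are simplified: comments and blank lines are skipped, but quoting edge cases and multi-line values are not handled):

```python
import os

def load_dotenv_minimal(path=".env"):
    """Parse KEY=VALUE lines from a .env file into os.environ (no overwrite)."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

# Example .env contents (all values are placeholders):
#   api_key="<your-azure-openai-key>"
#   chat_deployment_name="gpt-35-turbo"
#   endpoint="https://<your-resource>.openai.azure.com/"
```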
| { | ||
| "cell_type": "markdown", | ||
| "id": "ba1d6576", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "### Create an AzureChatCompletion service - [reference](https://learn.microsoft.com/en-us/semantic-kernel/concepts/ai-services/chat-completion/?tabs=csharp-AzureOpenAI%2Cpython-AzureOpenAI%2Cjava-AzureOpenAI&pivots=programming-language-python)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "7dc6ce40", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion\n", | ||
| "\n", | ||
| "# You can do the following if you have set the necessary environment variables or created a .env file\n", | ||
| "chat_completion_service = AzureChatCompletion(service_id=\"my-service-id\")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "ef319288", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "### Create a ChatCompletionAgent - [reference](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-types/chat-completion-agent?pivots=programming-language-python)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "76781359", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from semantic_kernel.functions import kernel_function\n", | ||
| "from typing import Annotated\n", | ||
| "\n", | ||
| "\n", | ||
| "# This is a sample plugin that provides tools\n", | ||
| "class MenuPlugin:\n", | ||
| " \"\"\"A sample Menu Plugin used for the concept sample.\"\"\"\n", | ||
| "\n", | ||
| " @kernel_function(description=\"Provides a list of specials from the menu.\")\n", | ||
| " def get_specials(self) -> Annotated[str, \"Returns the specials from the menu.\"]:\n", | ||
| " return \"\"\"\n", | ||
| " Special Soup: Clam Chowder\n", | ||
| " Special Salad: Cobb Salad\n", | ||
| " Special Drink: Chai Tea\n", | ||
| " \"\"\"\n", | ||
| "\n", | ||
| " @kernel_function(description=\"Provides the price of the requested menu item.\")\n", | ||
| " def get_item_price(\n", | ||
| " self, menu_item: Annotated[str, \"The name of the menu item.\"]\n", | ||
| " ) -> Annotated[str, \"Returns the price of the menu item.\"]:\n", | ||
| " _ = menu_item # This is just to simulate a function that uses the input.\n", | ||
| " return \"$9.99\"" | ||
| ] | ||
| }, | ||
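As an aside on how `@kernel_function` metadata becomes a tool definition: the decorator can read the function's name, its `description`, and the `Annotated` parameter docs. The sketch below is a plain-Python analogue built with `inspect`-style introspection from `typing`; `describe_tool` and its output shape are illustrative, not Semantic Kernel's actual schema:

```python
from typing import Annotated, get_args, get_origin, get_type_hints

def describe_tool(func, description=""):
    """Build a simple tool-definition dict from a function's type hints.

    A plain-Python analogue of the metadata a @kernel_function decorator
    could expose; the dict layout here is hypothetical.
    """
    hints = get_type_hints(func, include_extras=True)
    params = {}
    for name, hint in hints.items():
        if name == "return":
            continue
        # Pull the human-readable doc out of Annotated[str, "..."] if present
        doc = get_args(hint)[1] if get_origin(hint) is Annotated else ""
        params[name] = {"type": "string", "description": doc}
    return {"name": func.__name__, "description": description, "parameters": params}

def get_item_price(menu_item: Annotated[str, "The name of the menu item."]) -> str:
    return "$9.99"

print(describe_tool(get_item_price, "Provides the price of the requested menu item."))
```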
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "d6abead3", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from semantic_kernel.agents import ChatCompletionAgent\n", | ||
| "\n", | ||
| "# Create the agent by directly providing the chat completion service\n", | ||
| "agent = ChatCompletionAgent(\n", | ||
| " service=chat_completion_service,\n", | ||
| " name=\"Chef\",\n", | ||
| " instructions=\"Answer questions about the menu.\",\n", | ||
| " plugins=[MenuPlugin()],\n", | ||
| ")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "3b7b9ba3", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "thread = None\n", | ||
| "\n", | ||
| "user_inputs = [\n", | ||
| " \"Hello\",\n", | ||
| " \"What is the special drink today?\",\n", | ||
| " \"What does that cost?\",\n", | ||
| " \"Thank you\",\n", | ||
| "]\n", | ||
| "\n", | ||
| "for user_input in user_inputs:\n", | ||
| " response = await agent.get_response(messages=user_input, thread=thread)\n", | ||
| " print(f\"## User: {user_input}\")\n", | ||
| " print(f\"## {response.name}: {response}\\n\")\n", | ||
| " thread = response.thread" | ||
| ] | ||
| }, | ||
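The loop above carries conversation state by reassigning `thread = response.thread` after every turn. A toy stand-in (these classes are hypothetical, not Semantic Kernel's) shows why that reassignment gives the agent memory across turns:

```python
from dataclasses import dataclass, field

# Toy stand-ins (not Semantic Kernel classes) illustrating how a thread
# accumulates messages across turns.
@dataclass
class ToyThread:
    messages: list = field(default_factory=list)

@dataclass
class ToyResponse:
    name: str
    content: str
    thread: ToyThread

class EchoAgent:
    name = "Chef"

    def get_response(self, messages, thread=None):
        thread = thread or ToyThread()   # first turn starts a fresh thread
        thread.messages.append(("user", messages))
        reply = f"You said: {messages}"
        thread.messages.append(("assistant", reply))
        return ToyResponse(self.name, reply, thread)

agent = EchoAgent()
thread = None
for user_input in ["Hello", "What is the special drink today?"]:
    response = agent.get_response(messages=user_input, thread=thread)
    thread = response.thread             # carry the history forward

print(len(thread.messages))  # 4: two user + two assistant messages
```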
| { | ||
| "cell_type": "markdown", | ||
| "id": "2586d3e5", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "### Converter" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "fcd6ac41", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from azure.ai.evaluation import SKAgentConverter\n", | ||
| "\n", | ||
| "# Get the available turn indices for the thread,\n", | ||
| "# useful for selecting a specific turn for evaluation\n", | ||
| "turn_indices = await SKAgentConverter._get_thread_turn_indices(thread=thread)\n", | ||
| "print(f\"Available turn indices: {turn_indices}\")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "d1d4ae12", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "converter = SKAgentConverter()\n", | ||
| "\n", | ||
| "# Get a single agent run data\n", | ||
| "evaluation_data_single_run = await converter.convert(\n", | ||
| " thread=thread,\n", | ||
| " turn_index=2, # Specify the turn index you want to evaluate\n", | ||
| " agent=agent, # Pass it to include the instructions and plugins in the evaluation data\n", | ||
| ")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "7813b5eb", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "import json\n", | ||
| "\n", | ||
| "file_name = \"evaluation_data.jsonl\"\n", | ||
| "# Save the agent thread data to a JSONL file (all turns)\n", | ||
| "evaluation_data = await converter.prepare_evaluation_data(threads=[thread], filename=file_name, agent=agent)\n", | ||
| "# print(json.dumps(evaluation_data, indent=4))\n", | ||
| "len(evaluation_data) # number of turns in the thread" | ||
| ] | ||
| }, | ||
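`prepare_evaluation_data` writes one JSON object per line, one per turn, which is the shape `evaluate(data=...)` consumes later. A sketch of reading such a file back; the records here are synthetic and only mimic the `query`/`response`/`tool_definitions` keys noted in this notebook, while real output contains richer structures:

```python
import json

# Synthetic records mimicking the per-turn key names used in this notebook;
# real converter output is richer than shown here.
rows = [
    {"query": "What is the special drink today?", "response": "Chai Tea", "tool_definitions": []},
    {"query": "What does that cost?", "response": "$9.99", "tool_definitions": []},
]
with open("evaluation_data_demo.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Read it back: one JSON object per line, one evaluation record per turn
with open("evaluation_data_demo.jsonl") as f:
    turns = [json.loads(line) for line in f]

print(len(turns))  # 2
```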
| { | ||
| "cell_type": "markdown", | ||
| "id": "8bf87cab", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "### Setting up evaluator\n", | ||
| "\n", | ||
| "We will select the following evaluators to assess the different aspects relevant for agent quality: \n", | ||
| "\n", | ||
| "- [Intent resolution](https://aka.ms/intentresolution-sample): measures the extent to which an agent identifies the correct intent from a user query. Scale: integer 1-5. Higher is better.\n", | ||
| "- [Tool call accuracy](https://aka.ms/toolcallaccuracy-sample): evaluates the agent’s ability to select the appropriate tools and pass the correct parameters from previous steps. Scale: float 0-1. Higher is better.\n", | ||
| "- [Task adherence](https://aka.ms/taskadherence-sample): measures the extent to which an agent’s final response adheres to the task, based on its system message and the user query. Scale: integer 1-5. Higher is better.\n" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "e6ee09df", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "import os\n", | ||
| "from pprint import pprint\n", | ||
| "\n", | ||
| "from azure.ai.evaluation import (\n", | ||
| " ToolCallAccuracyEvaluator,\n", | ||
| " AzureOpenAIModelConfiguration,\n", | ||
| " IntentResolutionEvaluator,\n", | ||
| " TaskAdherenceEvaluator,\n", | ||
| ")\n", | ||
| "\n", | ||
| "model_config = AzureOpenAIModelConfiguration(\n", | ||
| " azure_endpoint=os.environ[\"AZURE_OPENAI_ENDPOINT\"],\n", | ||
| " api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n", | ||
| " api_version=os.environ[\"AZURE_OPENAI_API_VERSION\"],\n", | ||
| " azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n", | ||
| ")\n", | ||
| "\n", | ||
| "intent_resolution = IntentResolutionEvaluator(model_config=model_config)\n", | ||
| "\n", | ||
| "tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)\n", | ||
| "\n", | ||
| "task_adherence = TaskAdherenceEvaluator(model_config=model_config)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "80bd50ff", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Test a single evaluation run\n", | ||
| "evaluator = ToolCallAccuracyEvaluator(model_config=model_config)\n", | ||
| "\n", | ||
| "# evaluation_data_single_run.keys() # query, response, tool_definitions\n", | ||
| "res = evaluator(**evaluation_data_single_run)\n", | ||
| "print(json.dumps(res, indent=4))" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "06bab561", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "#### Bonus - run on the previously saved file for all turns" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "c0530c0d", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from azure.ai.evaluation import evaluate\n", | ||
| "\n", | ||
| "response = evaluate(\n", | ||
| " data=file_name,\n", | ||
| " evaluators={\n", | ||
| " \"tool_call_accuracy\": tool_call_accuracy,\n", | ||
| " \"intent_resolution\": intent_resolution,\n", | ||
| " \"task_adherence\": task_adherence,\n", | ||
| " },\n", | ||
| " azure_ai_project={\n", | ||
| " \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n", | ||
| " \"project_name\": os.environ[\"PROJECT_NAME\"],\n", | ||
| " \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n", | ||
| " },\n", | ||
| ")\n", | ||
| "\n", | ||
| "pprint(f'AI Foundry URL: {response.get(\"studio_url\")}')" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "ac38d924", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Inspect results on Azure AI Foundry\n", | ||
| "\n", | ||
| "Open the AI Foundry URL printed above to inspect the evaluation scores and their reasoning in Azure AI Foundry's rich visualizations, and quickly identify issues in your agent to fix and improve." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "225ae69a", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# alternatively, you can use the following to get the evaluation results in memory\n", | ||
| "\n", | ||
| "# average scores across all runs\n", | ||
| "pprint(response[\"metrics\"])" | ||
| ] | ||
| } | ||
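`response["metrics"]` aggregates scores across all evaluated rows. The aggregation is essentially a column-wise mean, sketched here on synthetic per-row scores (the key names below are hypothetical; the real column names come from the evaluators):

```python
from statistics import mean

# Hypothetical per-row scores; real column names come from the evaluators.
rows = [
    {"intent_resolution.score": 5, "task_adherence.score": 4},
    {"intent_resolution.score": 4, "task_adherence.score": 5},
]

# Column-wise mean across rows, mirroring an aggregate metrics dict
metrics = {key: mean(row[key] for row in rows) for key in rows[0]}
print(metrics)  # {'intent_resolution.score': 4.5, 'task_adherence.score': 4.5}
```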
| ], | ||
| "metadata": { | ||
| "kernelspec": { | ||
| "display_name": "Python 3", | ||
| "language": "python", | ||
| "name": "python3" | ||
| }, | ||
| "language_info": { | ||
| "codemirror_mode": { | ||
| "name": "ipython", | ||
| "version": 3 | ||
| }, | ||
| "file_extension": ".py", | ||
| "mimetype": "text/x-python", | ||
| "name": "python", | ||
| "nbconvert_exporter": "python", | ||
| "pygments_lexer": "ipython3" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 5 | ||
| } | ||