Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -0,0 +1,240 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Response Completeness Evaluator"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Getting Started\n",
"\n",
"This sample demonstrates how to use Response Completeness Evaluator on agent's response when ground truth is provided. This evaluator is helpful when you have ground truth to assess the quality of the agent's final response. \n",
"\n",
"## Time\n",
"\n",
"You should expect to spend about 20 minutes running this notebook. \n",
"\n",
"## Before you begin\n",
"For quality evaluation, you need to deploy a `gpt` model supporting JSON mode. We recommend a model `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities. \n",
"\n",
"### Prerequisite\n",
"```bash\n",
"pip install azure-ai-projects azure-identity azure-ai-evaluation\n",
"```\n",
"Set these environment variables with your own values:\n",
"1) **PROJECT_CONNECTION_STRING** - The project connection string, as found in the overview page of your Azure AI Foundry project.\n",
"2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the \"Name\" column in the \"Models + endpoints\" tab in your Azure AI Foundry project.\n",
"3) **AZURE_OPENAI_ENDPOINT** - Azure Open AI Endpoint to be used for evaluation.\n",
"4) **AZURE_OPENAI_API_KEY** - Azure Open AI Key to be used for evaluation.\n",
"5) **AZURE_OPENAI_API_VERSION** - Azure Open AI Api version to be used for evaluation.\n",
"6) **AZURE_SUBSCRIPTION_ID** - Azure Subscription Id of Azure AI Project\n",
"7) **PROJECT_NAME** - Azure AI Project Name\n",
"8) **RESOURCE_GROUP_NAME** - Azure AI Project Resource Group Name\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Response Completeness evaluator assesses the quality of an agent response by examining how well it aligns with the provided ground truth. The evaluation is based on the following scoring system:\n",
"\n",
"<pre>\n",
"Score 1: Fully incomplete: The response misses all necessary and relevant information compared to the ground truth.\n",
"Score 2: Barely complete: The response contains only a small percentage of the necessary information.\n",
"Score 3: Moderately complete: The response includes about half of the necessary information.\n",
"Score 4: Mostly complete: The response contains most of the necessary information, with only minor omissions.\n",
"Score 5: Fully complete: The response perfectly matches all necessary and relevant information from the ground truth.\n",
"</pre>\n",
"\n",
"The evaluation requires the following inputs:\n",
"\n",
"- Response: The response to be evaluated. (string)\n",
"- Ground Truth: The correct and complete information against which the response is compared. (string)\n",
"\n",
"The evaluator uses these inputs to determine the completeness score, ensuring that the response meaningfully addresses the query while adhering to the provided definitions and data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize Completeness Evaluator\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azure.ai.evaluation import ResponseCompletenessEvaluator , AzureOpenAIModelConfiguration\n",
"from pprint import pprint\n",
"import os\n",
"\n",
"model_config = AzureOpenAIModelConfiguration(\n",
" azure_endpoint=os.environ[\"AZURE_OPENAI_ENDPOINT\"],\n",
" api_key=os.environ[\"AZURE_OPENAI_API_KEY\"],\n",
" api_version=os.environ[\"AZURE_OPENAI_API_VERSION\"],\n",
" azure_deployment=os.environ[\"MODEL_DEPLOYMENT_NAME\"],\n",
")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azure.ai.evaluation import ResponseCompletenessEvaluator , AzureOpenAIModelConfiguration\n",
"from pprint import pprint\n",
"\n",
"response_completeness_evaluator = ResponseCompletenessEvaluator(model_config=model_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Samples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Evaluating for a ground_truth and response"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# agent response is complete \n",
"result = response_completeness_evaluator(\n",
" response=\"Itinery: Day 1 check out the downtown district of the city on train; for Day 2, we can rest in hotel.\",\n",
" ground_truth=\"Itinery: Day 1 take a train to visit the downtown area for city sightseeing; Day 2 rests in hotel.\"\n",
")\n",
"result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# agent response is incomplete\n",
"result = response_completeness_evaluator(\n",
" response=\"The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025.\",\n",
" ground_truth=\"The order with ID 124 is delayed and should now arrive by March 20, 2025.\"\n",
")\n",
"result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare ground truth for agent response\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"data = [\n",
" {\n",
" \"response\": \"The temperature of Seattle now is 70 degrees. Based on the temperature, having an outdoor office party is recommended.\",\n",
" \"ground_truth\": \"The temperature of Seattle now is 50 degrees. It will be recommended to bring a jacket in the evening.\",\n",
" },\n",
" {\n",
" \"response\": \"The email draft \\\"Project Plan\\\" is attached. Please review and provide feedback.\",\n",
" \"ground_truth\": \"The email draft \\\"Project Plan\\\" is attached. Please review and provide feedback by EOD.\",\n",
" },\n",
" {\n",
" \"response\": \"Based on the retrieved documents, the shareholder meeting discussed the operational efficiency of the company and financing options.\",\n",
" \"ground_truth\": \"The shareholder meeting discussed the compensation package of the company CEO.\",\n",
" },\n",
" {\n",
" \"response\": \"The calendar API returns an error code 500. Please check the server logs for more details.\",\n",
" \"ground_truth\": \"The meeting is scheduled for 2 PM tomorrow. Please confirm your availability by EOD.\",\n",
" }\n",
"]\n",
"\n",
"file_path = \"response_completeness_data.jsonl\"\n",
"\n",
"with open(file_path, 'w') as file:\n",
" for line in data:\n",
" file.write(json.dumps(line) + '\\n')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Batch evaluate and visualize results on Azure AI Foundry\n",
"Batch evaluate to leverage asynchronous evaluation on a dataset. \n",
"\n",
"Optionally, you can go to AI Foundry URL for rich Azure AI Foundry data visualization. You can inspect the evaluation scores and reasoning to quickly identify bugs and issues of your agent to fix and improve. Make sure to authenticate to Azure using `az login` in your terminal before running this cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"from azure.ai.evaluation import evaluate\n",
"\n",
"azure_ai_project={\n",
" \"subscription_id\": os.environ[\"AZURE_SUBSCRIPTION_ID\"],\n",
" \"project_name\": os.environ[\"PROJECT_NAME\"],\n",
" \"resource_group_name\": os.environ[\"RESOURCE_GROUP_NAME\"],\n",
"}\n",
"\n",
"response = evaluate(\n",
" data=file_path,\n",
" evaluators={\n",
" \"response_completeness\": response_completeness_evaluator,\n",
" },\n",
" azure_ai_project=azure_ai_project,\n",
")\n",
"\n",
"pprint(f'AI Foundry URL: {response.get(\"studio_url\")}')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "test_agent_evaluator_prp",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Loading
Loading