@@ -0,0 +1,16 @@
# Azure AI Search configuration
AZURE_AI_SEARCH_ENDPOINT=
AZURE_AI_SEARCH_KEY=

# Optional configuration for integrated vectorization
AZURE_AI_SEARCH_VECTORIZER_ENDPOINT=
AZURE_AI_SEARCH_VECTORIZER_KEY=
AZURE_AI_SEARCH_VECTORIZER_MODEL_NAME=
AZURE_AI_SEARCH_VECTORIZER_DEPLOYMENT_NAME=
AZURE_AI_SEARCH_VECTORIZER_MODEL_DIMENSIONS=

# Azure AI Project config
AZURE_SUBSCRIPTION_ID=
AZURE_RESOURCE_GROUP=
AZURE_PROJECT_NAME=
AZURE_PROJECT_CONNECTION_STRING=
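
The notebook in this change loads these settings with `load_dotenv()` and then reads `os.environ["AZURE_PROJECT_CONNECTION_STRING"]` directly, so a blank or missing value only surfaces later as a `KeyError` or a failed client call. A minimal sketch of an up-front check could look like the following. The helper name `find_missing_settings` and the decision to treat only the connection string as required are assumptions for illustration and are not part of the sample.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumption: the quick-start notebook only reads the project connection string.
REQUIRED_SETTINGS = ("AZURE_PROJECT_CONNECTION_STRING",)


def find_missing_settings(
    names: Iterable[str] = REQUIRED_SETTINGS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the setting names that are unset or blank in the environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]
```

Calling `find_missing_settings()` right after `load_dotenv()` returns an empty list when everything needed is present, and otherwise lists the names that still need a value in `.env`.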

@@ -0,0 +1,298 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8f372fc3-d2ad-49a8-87da-e3b092b6b5ae",
"metadata": {},
"source": [
"# Document Retrieval Evaluation in Azure AI Foundry\n",
"\n",
"## Summary\n",
"This notebook sample demonstrates how to evaluate an Azure AI Search index using Azure AI Evaluation. The evaluator used in this example, `DocumentRetrievalEvaluator`, takes two inputs for calculating the evaluation metrics: a list of ground truth labeled documents (sometimes referred to as \"qrels\") and a list of actual search results obtained from a search index.\n",
"\n",
"This sample uses data prepared in advance to show how to get started quickly with the Azure AI Evaluation service. To better understand the data processing workflow, take a look at the [full sample](Document_Retrieval_Evaluation_Full_Sample.ipynb).\n",
"\n",
"### Explanation of Document Retrieval Metrics\n",
"The metrics that will be generated in the output of the evaluator include:\n",
"\n",
"| Metric | Category | Description |\n",
"|-----------------------|---------------------|-------------------------------------------------------------------------------------------------|\n",
"| Fidelity | Search Fidelity | How well the top n retrieved chunks reflect the content for a given query; number of good documents returned out of the total number of known good documents in a dataset |\n",
"| NDCG | Search NDCG | How close the ranking is to an ideal order in which all relevant items are at the top of the list |\n",
"| XDCG | Search XDCG | How good the results are in the top-k documents regardless of scoring of other index documents |\n",
"| Max Relevance N | Search Max Relevance | Maximum relevance in the top-k chunks |\n",
"| Holes | Search Label Sanity | Number of documents with missing query relevance judgments (ground truth) |\n",
"\n",
"It's important to note that some metrics, particularly NDCG, XDCG and Fidelity, are sensitive to holes. Ideally, the count of holes for a given evaluation should be zero; otherwise, results for these metrics may not be accurate. To improve the accuracy of the evaluation metrics, iteratively check results against the current known ground truth and fill any holes. This process is not covered explicitly in this sample."
]
},
{
"cell_type": "markdown",
"id": "de22faf1-878a-4619-aaf0-e26a05a19867",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"id": "90c33b6e-be25-4331-a6ca-8156fe6c11b2",
"metadata": {},
"source": [
"### Prerequisites\n",
"Before running this notebook, be sure you have fulfilled the following prerequisites:\n",
"* Create or get access to an [Azure Subscription](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/initial-subscriptions), and assign yourself the Owner or Contributor role for creating resources in this subscription.\n",
"* Install the `az` CLI in the current environment, and run `az login` to gain access to your resources.\n",
"* Create an [Azure AI Foundry project](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/create-projects?tabs=ai-studio)."
]
},
{
"cell_type": "markdown",
"id": "d7da7c8c-e0bb-47d0-9f1e-72e1e545f95f",
"metadata": {},
"source": [
"### Install Python requirements\n",
"Run the following command to install the Python requirements for this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0137b5aa-9837-4d74-b9a3-96b36e6c301a",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"!pip install -r requirements.txt\n",
"!pip freeze"
]
},
{
"cell_type": "markdown",
"id": "e452701f-1faa-4715-8bd9-ad5a4231bd6f",
"metadata": {},
"source": [
"### Import all modules\n",
"For convenience, all modules needed for the rest of the notebook are imported at once in the following cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba11028d-33ca-49e9-b106-86f0e4316182",
"metadata": {},
"outputs": [],
"source": [
"# Standard library\n",
"import json\n",
"import logging\n",
"import os\n",
"import pathlib\n",
"import random\n",
"import string\n",
"import time\n",
"\n",
"# Azure SDK\n",
"from azure.ai.evaluation import DocumentRetrievalEvaluator\n",
"from azure.ai.projects.models import (\n",
"    Dataset,\n",
"    Evaluation,\n",
"    EvaluatorConfiguration,\n",
")\n",
"from azure.ai.projects import AIProjectClient\n",
"from azure.core.credentials import AzureKeyCredential\n",
"from azure.identity import DefaultAzureCredential, get_bearer_token_provider\n",
"\n",
"# Other open source packages\n",
"import pandas as pd\n",
"from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"id": "efa63828-44da-4b4a-9270-ac31288d9477",
"metadata": {},
"source": [
"### Load resource connection configuration\n",
"The following cell will load the necessary resource connection configuration for the sample. Copy the contents of `.env.sample` into a new file named `.env`, and fill in the values corresponding to your own service resources."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c802f08c-0423-49bc-a7b3-389e9628fd7e",
"metadata": {},
"outputs": [],
"source": [
"load_dotenv()"
]
},
{
"cell_type": "markdown",
"id": "a12d8a40-8a00-4979-9400-9b36ed9a6ee5",
"metadata": {},
"source": [
"### Create client objects for managing resources and set other helpful variables\n",
"We will also create all of the client objects and other variables needed for the rest of the notebook in the following cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50aab177-5a0b-48b8-b721-62d9197cc1a2",
"metadata": {},
"outputs": [],
"source": [
"# Create the Azure AI Project client\n",
"project_client = AIProjectClient.from_connection_string(\n",
"    credential=DefaultAzureCredential(),\n",
"    conn_str=os.environ[\"AZURE_PROJECT_CONNECTION_STRING\"]\n",
")\n",
"\n",
"# Set other helpful variables\n",
"data_directory = os.path.join(\".\")\n",
"data_ids = None"
]
},
{
"cell_type": "markdown",
"id": "3e4618d1-0015-4cc4-a2bd-d77cd85184f9",
"metadata": {},
"source": [
"## Dataset Preparation"
]
},
{
"cell_type": "markdown",
"id": "8b25e4e3-d5a4-41d8-8f41-daef9fd1f18f",
"metadata": {},
"source": [
"### Upload dataset to Azure AI Foundry\n",
"To run an evaluation in the cloud, we need to upload our evaluation data to the specified Azure AI Foundry project.\n",
"\n",
"This sample includes a few small dataset examples that were prepared in advance using a subset of the TREC-COVID open-source dataset. The sample files contain search results and ground truth labels for each query provided in the dataset, where search results were obtained using Azure AI Search. Each file contains results using a different search configuration: text-only, vector-only, semantic, and semantic-vector hybrid search.\n",
"\n",
"In the next cell, we'll upload each file to the Azure AI Foundry project, and later use them to run the evaluation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3a0102b-1738-4115-a25e-83a924c65d9e",
"metadata": {},
"outputs": [],
"source": [
"evaluation_configs = []\n",
"for config_name, sample_data_file_name in [\n",
"    (\"Text Search\", \"evaluate-trec-covid-text.jsonl\"),\n",
"    (\"Semantic Search\", \"evaluate-trec-covid-semantic.jsonl\"),\n",
"    (\"Vector Search\", \"evaluate-trec-covid-vector.jsonl\"),\n",
"    (\"Hybrid Search\", \"evaluate-trec-covid-hybrid.jsonl\")\n",
"]:\n",
"    print(f\"Uploading file '{sample_data_file_name}'\")\n",
"    data_id, _ = project_client.upload_file(sample_data_file_name)\n",
"\n",
"    print(f\"File {sample_data_file_name} was uploaded successfully!\")\n",
"    print(f\"Data ID: {data_id}\")\n",
"\n",
"    evaluation_configs.append((config_name, data_id))"
]
},
{
"cell_type": "markdown",
"id": "cb22d6c7-91f2-47f1-a07e-caf42855e6cb",
"metadata": {},
"source": [
"## Run document retrieval evaluation\n",
"After our datasets are uploaded, we will configure and run the document retrieval evaluator for each uploaded dataset. The init params `ground_truth_label_min` and `ground_truth_label_max` help us to configure the qrels scaling for some metrics which depend on a count of labels, such as Fidelity. In this case, the TREC-COVID dataset ground truth set has 0, 1, and 2 as possible labels, so we set the values of those init params accordingly."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92853ecf-61b2-4ce9-81d2-b2afcb784529",
"metadata": {},
"outputs": [],
"source": [
"def run_evaluation(evaluation_name, evaluation_description, dataset_id):\n",
"    # Define the evaluation\n",
"    evaluation = Evaluation(\n",
"        display_name=evaluation_name,\n",
"        description=evaluation_description,\n",
"        data=Dataset(id=dataset_id),\n",
"        evaluators={\n",
"            \"documentretrievalevaluator\": EvaluatorConfiguration(\n",
"                id=DocumentRetrievalEvaluator().id,\n",
"                data_mapping={\n",
"                    \"retrieval_ground_truth\": \"${data.retrieval_ground_truth}\",\n",
"                    \"retrieved_documents\": \"${data.retrieved_documents}\"\n",
"                },\n",
"                init_params={\n",
"                    \"ground_truth_label_min\": 0,\n",
"                    \"ground_truth_label_max\": 2\n",
"                }\n",
"            )\n",
"        },\n",
"    )\n",
"\n",
"    # Submit the evaluation to the project\n",
"    evaluation_response = project_client.evaluations.create(\n",
"        evaluation=evaluation,\n",
"    )\n",
"\n",
"    # Fetch the evaluation to report its current status\n",
"    get_evaluation_response = project_client.evaluations.get(evaluation_response.id)\n",
"\n",
"    print(\"----------------------------------------------------------------\")\n",
"    print(\"Created evaluation, evaluation ID: \", get_evaluation_response.id)\n",
"    print(\"Evaluation status: \", get_evaluation_response.status)\n",
"    print(\"AI project URI: \", get_evaluation_response.properties[\"AiStudioEvaluationUri\"])\n",
"    print(\"----------------------------------------------------------------\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "501552c3-722d-4912-9393-e8c63257f523",
"metadata": {},
"outputs": [],
"source": [
"for (config_name, data_id) in evaluation_configs:\n",
"    run_evaluation(f\"TREC-COVID evaluation - {config_name}\", \"Document retrieval evaluation using the TREC-COVID dataset from BeIR\", data_id)"
]
},
{
"cell_type": "markdown",
"id": "7c616e48-2fab-4493-977e-ec51c7e6951b",
"metadata": {},
"source": [
"## Comparing results\n",
"\n",
"Once the evaluations are complete, you can compare the results by clicking the \"Evaluations\" tab on the left side of the Azure AI Foundry project page, selecting the runs for comparison, and then clicking the \"Compare\" button to see metric results side by side.\n",
"\n",
"![Azure AI Foundry project evaluations page](eval-results-select.png)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
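
To make the metric descriptions in the notebook's summary cell more concrete, here is a small standalone sketch that counts holes and computes NDCG for a single query. It is not the `DocumentRetrievalEvaluator` implementation. The function names, the linear-gain DCG formula, the choice to score unlabeled documents as 0, and the `document_id`, `query_relevance_label`, and `relevance_score` field names are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def count_holes(qrels: List[Dict], retrieved: List[Dict]) -> int:
    """Count retrieved documents that have no ground truth label."""
    labeled_ids = {doc["document_id"] for doc in qrels}
    return sum(1 for doc in retrieved if doc["document_id"] not in labeled_ids)


def dcg(labels: List[float]) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(label / math.log2(rank + 2) for rank, label in enumerate(labels))


def ndcg_at_k(qrels: List[Dict], retrieved: List[Dict], k: int = 3) -> float:
    """NDCG@k for one query; holes are scored as 0 (an assumption)."""
    label_by_id = {doc["document_id"]: doc["query_relevance_label"] for doc in qrels}
    # Rank the search results by descending search score and keep the top k.
    ranked = sorted(retrieved, key=lambda doc: doc["relevance_score"], reverse=True)[:k]
    actual = [label_by_id.get(doc["document_id"], 0) for doc in ranked]
    # The ideal ordering places the highest labels first.
    ideal = sorted(label_by_id.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical data for one query, using labels in the 0..2 range.
qrels = [
    {"document_id": "doc1", "query_relevance_label": 2},
    {"document_id": "doc2", "query_relevance_label": 1},
    {"document_id": "doc3", "query_relevance_label": 0},
]
retrieved = [
    {"document_id": "doc2", "relevance_score": 0.9},
    {"document_id": "doc1", "relevance_score": 0.7},
    {"document_id": "doc9", "relevance_score": 0.4},  # unlabeled, so it is a hole
]

print("holes:", count_holes(qrels, retrieved))
print("ndcg@3:", round(ndcg_at_k(qrels, retrieved, k=3), 4))
```

In this toy data the unlabeled `doc9` is treated as irrelevant, which lowers NDCG even though its true relevance is unknown. That is the sense in which the notebook describes these metrics as sensitive to holes.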
@@ -0,0 +1,12 @@
azure-ai-ml
azure-ai-projects
azure-core
azure-identity
azure-search-documents==11.6.0b10
azure-ai-evaluation
beir
python-dotenv
jupyter
pandas
openai
tiktoken