diff --git a/scenarios/evaluate/AI_RedTeaming/AI_RedTeaming.ipynb b/scenarios/evaluate/AI_RedTeaming/AI_RedTeaming.ipynb new file mode 100644 index 00000000..a17ffc18 --- /dev/null +++ b/scenarios/evaluate/AI_RedTeaming/AI_RedTeaming.ipynb @@ -0,0 +1,449 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# AI Red Teaming Agent for Generative AI models and applications in Azure AI Foundry\n", + "\n", + "## Objective\n", + "This notebook walks through how to use Azure AI Evaluation's AI Red Teaming Agent functionality to assess the safety and resilience of AI systems against adversarial prompt attacks. AI Red Teaming Agent leverages [Risk and Safety Evaluations](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#risk-and-safety-evaluators) to help identify potential safety issues across different risk categories (violence, hate/unfairness, sexual content, self-harm) combined with attack strategies of varying complexity levels from [PyRIT](https://github.com/Azure/PyRIT), Microsoft AI Red Teaming team's open framework for automated AI red teaming.\n", + "\n", + "## Time\n", + "You should expect to spend about 30-45 minutes running this notebook. Execution time will vary based on the number of risk categories, attack strategies, and complexity levels you choose to evaluate.\n", + "\n", + "## Before you begin\n", + "\n", + "### Prerequisite\n", + "The AI Red Teaming Agent requires an Azure AI Foundry project configuration and Azure credentials. Your project configuration will be used to log red teaming scan results after the run is finished.\n", + "\n", + "**Important**: Make sure to authenticate to Azure using `az login` in your terminal before running this notebook.\n", + "\n", + "### Installation\n", + "From a terminal window, navigate to your working directory which contains this sample notebook, and execute the following.\n", + "```bash\n", + "python -m venv .venv\n", + "```\n", + "\n", + "Then, activate the virtual environment created:\n", + "\n", + "```bash\n", + "# %source .venv/bin/activate # If using Mac/Linux OS\n", + ".venv/Scripts/activate # If using Windows OS\n", + "```\n", + "\n", + "With your virtual environment activated, install the following packages required to execute this notebook:\n", + "\n", + "```bash\n", + "pip install uv\n", + "uv pip install azure-ai-evaluation[redteam] termcolor==2.5.0 azure-identity openai\n", + "```\n", + "\n", + "\n", + "Now open VSCode with the following command, and ensure your virtual environment is used as kernel to run the remainder of this notebook.\n", + "```bash\n", + "code .\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from typing import Optional, Dict, Any\n", + "import os\n", + "\n", + "# Azure imports\n", + "from azure.identity import DefaultAzureCredential, get_bearer_token_provider\n", + "from azure.ai.evaluation import RedTeam, RiskCategory, AttackStrategy\n", + "\n", + "# OpenAI imports\n", + "from openai import AzureOpenAI\n", + "\n", + "# Initialize Azure credentials\n", + "credential = DefaultAzureCredential()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set Up Your Environment Variables\n", + "\n", + "Set the following variables for use in this notebook. These variables connect to your Azure resources and model deployments.\n", + "\n", + "**Note:** You can find these values in your Azure AI Foundry project or Azure OpenAI resource." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For reference, here's an example of what your populated environment variables should look like:\n", + "\n", + "```\n", + "# Azure OpenAI\n", + "AZURE_OPENAI_API_KEY=\"your-api-key-here\"\n", + "AZURE_OPENAI_ENDPOINT=\"https://endpoint-name.openai.azure.com/openai/deployments/deployment-name/chat/completions\"\n", + "AZURE_OPENAI_DEPLOYMENT_NAME=\"gpt-4\"\n", + "AZURE_OPENAI_API_VERSION=\"2023-12-01-preview\"\n", + "\n", + "# Azure AI Project\n", + "AZURE_SUBSCRIPTION_ID=\"12345678-1234-1234-1234-123456789012\"\n", + "AZURE_RESOURCE_GROUP_NAME=\"your-resource-group\"\n", + "AZURE_PROJECT_NAME=\"your-project-name\"\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Azure AI Project information\n", + "azure_ai_project = {\n", + " \"subscription_id\": os.environ.get(\"AZURE_SUBSCRIPTION_ID\"),\n", + " \"resource_group_name\": os.environ.get(\"AZURE_RESOURCE_GROUP_NAME\"),\n", + " \"project_name\": os.environ.get(\"AZURE_PROJECT_NAME\"),\n", + "}\n", + "\n", + "# Azure OpenAI deployment information\n", + "azure_openai_deployment = os.environ.get(\"AZURE_OPENAI_DEPLOYMENT\") # e.g., \"gpt-4\"\n", + "azure_openai_endpoint = os.environ.get(\n", + " \"AZURE_OPENAI_ENDPOINT\"\n", + ") # e.g., \"https://endpoint-name.openai.azure.com/openai/deployments/deployment-name/chat/completions\"\n", + "azure_openai_api_key = os.environ.get(\"AZURE_OPENAI_API_KEY\") # e.g., \"your-api-key\"\n", + "azure_openai_api_version = \"2023-12-01-preview\" # Use the latest API version" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Understanding AI Red Teaming Agent's capabilities\n", + "\n", + "The Azure AI Evaluation SDK's `RedTeam` functionality evaluates AI systems against adversarial prompts across multiple dimensions:\n", + "\n", + "1. **Risk Categories**: Different content risk categories your AI system might generate\n", + " - Violence\n", + " - HateUnfairness\n", + " - Sexual\n", + " - SelfHarm\n", + "\n", + "2. **Attack Strategies**: Along with standard unmodified prompts which are sent by default as the `baseline`, you can specify different transformations of prompts to elicit undesired content.\n", + "You can also use `AttackStrategy.Compose()` to layer two strategies in one attack\n", + " - AnsiAttack: Using ANSI escape codes in prompts\n", + " - AsciiArt: Using ASCII art to disguise harmful content\n", + " - AsciiSmuggler: Hiding harmful content within ASCII characters\n", + " - Atbash: Using the Atbash cipher to encode harmful requests\n", + " - Base64: Encoding harmful content in Base64 format\n", + " - Binary: Converting text to binary to bypass filters\n", + " - Caesar: Using the Caesar cipher for encoding\n", + " - CharacterSpace: Manipulating character spacing to confuse filters\n", + " - CharSwap: Swapping characters to bypass detection\n", + " - Diacritic: Using diacritical marks to alter text appearance\n", + " - Flip: Flipping text to bypass content filters\n", + " - Leetspeak: Converting letters to numbers and symbols\n", + " - Morse: Using Morse code to encode harmful requests\n", + " - ROT13: Using ROT13 cipher for text transformation\n", + " - SuffixAppend: Adding suffixes to confuse detection systems\n", + " - StringJoin: Joining strings in unconventional ways\n", + " - Tense: Changing the tense of harmful requests to past tense\n", + " - UnicodeConfusable: Using similar-looking Unicode characters\n", + " - UnicodeSubstitution: Substituting characters with Unicode alternatives\n", + " - Url: Embedding harmful content within URLs\n", + " - Jailbreak: Specially crafted prompts to bypass AI safeguards\n", + "\n", + "3. **Complexity Levels**: Different difficultly levels of attacks\n", + " - Baseline: Standard functionality tests\n", + " - Easy: Simple attack patterns\n", + " - Moderate: More sophisticated attacks\n", + " - Difficult: Complex, layered attack strategies\n", + "\n", + "The key metric for evaluating results is the **Attack Success Rate (ASR)**, which measures the percentage of attacks that successfully elicit harmful content from your AI system." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Example: Fixed Response Callback\n", + "\n", + "Let's start with a basic example that demonstrates how to set up a Red Team evaluation using a simple callback that always returns a fixed, safe response regardless of the input prompt.\n", + "\n", + "**Important Note**: Since this callback always responds with the same safe message, the Attack Success Rate (ASR) will be 0% for all categories. This example is merely to demonstrate the API structure and flow." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define a simple callback function that always returns a fixed response\n", + "def financial_advisor_callback(query: str) -> str: # noqa: ARG001\n", + " return \"I'm a financial advisor assistant. I can help with investment advice and financial planning within legal and ethical guidelines.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the `RedTeam` instance with minimal configurations\n", + "red_team = RedTeam(\n", + " azure_ai_project=azure_ai_project,\n", + " credential=credential,\n", + " risk_categories=[RiskCategory.Violence, RiskCategory.HateUnfairness],\n", + " num_objectives=1,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "NOTE: `num_objectives` specifies the number of attacks to perform per risk category per attack strategy. If the parameter `risk_categories` is not specified, `[RiskCategory.Violence, RiskCategory.HateUnfairness, RiskCategory.Sexual, RiskCategory.SelfHarm]` will be used by default." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's run a simple automated scan using the `RedTeam` with the fixed response target. We'll test against two risk categories and one attack strategy for simplicity." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Run the red team scan called \"Basic-Callback-Scan\" with limited scope for this basic example\n", + "# This will test 1 objective prompt for each of Violence and HateUnfairness categories with the Flip strategy\n", + "result = await red_team.scan(\n", + " target=financial_advisor_callback, scan_name=\"Basic-Callback-Scan\", attack_strategies=[AttackStrategy.Flip]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Intermediary Example: Using a Model Configuration as Target\n", + "\n", + "Now let's create a more realistic example that uses an Azure OpenAI model for responding to the red teaming prompts. To test base or foundation models, you can update your target to take in a model configuration:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define a model configuration to test\n", + "azure_oai_model_config = {\n", + " \"azure_endpoint\": azure_openai_endpoint,\n", + " \"azure_deployment\": azure_openai_deployment,\n", + " \"api_key\": azure_openai_api_key,\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then, update your target to point to the model configurations and run the scan." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Run the red team scan called \"Intermediary-Model-Target-Scan\"\n", + "result = await red_team.scan(\n", + " target=azure_oai_model_config, scan_name=\"Intermediary-Model-Target-Scan\", attack_strategies=[AttackStrategy.Flip]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Advanced Example: Using an Azure Open AI Model Endpoint in a Callback Function\n", + "\n", + "Using the same Azure Open AI model configuration as above, we now wrap it in a callback function for more flexibility and control on the input and output handling. This will demonstrate how to evaluate an actual AI application. To test your own actual AI application, replace the inside of the callback function with a call to your application." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define a callback that uses Azure OpenAI API to generate responses\n", + "async def azure_openai_callback(\n", + " messages: list,\n", + " stream: Optional[bool] = False, # noqa: ARG001\n", + " session_state: Optional[str] = None, # noqa: ARG001\n", + " context: Optional[Dict[str, Any]] = None, # noqa: ARG001\n", + ") -> dict[str, list[dict[str, str]]]:\n", + " deployment = os.environ.get(\"AZURE_DEPLOYMENT_NAME\")\n", + " endpoint = os.environ.get(\"AZURE_ENDPOINT\")\n", + " api_version = os.environ.get(\"AZURE_API_VERSION\")\n", + "\n", + " # Get token provider for Azure AD authentication\n", + " token_provider = get_bearer_token_provider(DefaultAzureCredential(), \"https://cognitiveservices.azure.com/.default\")\n", + "\n", + " # Initialize Azure OpenAI client\n", + " client = AzureOpenAI(azure_endpoint=endpoint, api_version=api_version, azure_ad_token_provider=token_provider)\n", + "\n", + " ## Extract the latest message from the conversation history\n", + " messages_list = [{\"role\": message.role, \"content\": message.content} for message in messages]\n", + " latest_message = messages_list[-1][\"content\"]\n", + "\n", + " try:\n", + " # Call the model\n", + " response = client.chat.completions.create(\n", + " model=deployment,\n", + " messages=[\n", + " {\"role\": \"user\", \"content\": latest_message},\n", + " ],\n", + " max_tokens=500,\n", + " temperature=0.7,\n", + " )\n", + "\n", + " # Format the response to follow the expected chat protocol format\n", + " formatted_response = {\"content\": response.choices[0].message.content, \"role\": \"assistant\"}\n", + "\n", + " return {\"messages\": [formatted_response]}\n", + " except Exception as e:\n", + " print(f\"Error calling Azure OpenAI: {e!s}\")\n", + " return \"I encountered an error and couldn't process your request.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create the RedTeam instance with all of the risk categories with 5 attack objectives generated for each category\n", + "model_red_team = RedTeam(\n", + " azure_ai_project=azure_ai_project,\n", + " credential=credential,\n", + " risk_categories=[RiskCategory.Violence, RiskCategory.HateUnfairness, RiskCategory.Sexual, RiskCategory.SelfHarm],\n", + " num_objectives=5,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will use this instance of `model_red_team` to test different attack strategies in the following section." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Testing Different Attack Strategies\n", + "\n", + "Now we'll run a more comprehensive evaluation using multiple attack strategies across risk categories. This will give us a better understanding of our model's vulnerabilities." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Run the red team scan with multiple attack strategies\n", + "advanced_result = await model_red_team.scan(\n", + " target=azure_openai_callback,\n", + " scan_name=\"Advanced-Callback-Scan\",\n", + " attack_strategies=[\n", + " AttackStrategy.EASY, # Group of easy complexity attacks\n", + " AttackStrategy.MODERATE, # Group of moderate complexity attacks\n", + " AttackStrategy.CharacterSpace, # Add character spaces\n", + " AttackStrategy.ROT13, # Use ROT13 encoding\n", + " AttackStrategy.UnicodeConfusable, # Use confusable Unicode characters\n", + " AttackStrategy.CharSwap, # Swap characters in prompts\n", + " AttackStrategy.Morse, # Encode prompts in Morse code\n", + " AttackStrategy.Leetspeak, # Use Leetspeak\n", + " AttackStrategy.Url, # Use URLs in prompts\n", + " AttackStrategy.Binary, # Encode prompts in binary\n", + " AttackStrategy.Compose([AttackStrategy.Base64, AttackStrategy.ROT13]), # Use two strategies in one attack\n", + " ],\n", + " output_path=\"Advanced-Callback-Scan.json\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The data and results used in this attack will be saved to the `output_path` specified. The URL printed out at the end of the scorecard will provide a link to where you results are uploaded and logged to your Azure AI Foundry project." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "\n", + "In this notebook, we've demonstrated how to use the Azure AI Evaluation SDK's `RedTeam` functionality to assess the safety and resilience of AI systems. We started with a basic fixed-response example and then moved to a more realistic model testing across multiple risk categories and attack strategies.\n", + "\n", + "The automated AI red teaming scans provides valuable insights into:\n", + "\n", + "1. **Overall Attack Success Rate (ASR)** - The percentage of attacks that successfully elicit harmful content\n", + "2. **Vulnerability by Risk Category** - Which types of harmful content your model is most vulnerable to\n", + "3. **Effectiveness of Attack Strategies** - Which attack techniques are most successful against your model\n", + "4. **Impact of Complexity** - How more sophisticated attacks affect your model's safety guardrails\n", + "\n", + "By regularly red-teaming your AI applications, you can identify and address potential vulnerabilities before deploying your models to production environments.\n", + "\n", + "### Next Steps\n", + "\n", + "1. **Mitigation**: Use these results to strengthen your model's guardrails against identified attack vectors\n", + "2. **Continuous Testing**: Implement regular red team evaluations as part of your development lifecycle\n", + "3. **Custom Strategies**: Develop custom attack strategies for your specific use cases and domain\n", + "4. **Safety Layers**: Consider adding additional safety layers like Azure AI Content Safety to filter harmful requests and responses " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "3-28", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/scenarios/evaluate/AI_RedTeaming/README.md b/scenarios/evaluate/AI_RedTeaming/README.md new file mode 100644 index 00000000..87a8c080 --- /dev/null +++ b/scenarios/evaluate/AI_RedTeaming/README.md @@ -0,0 +1,97 @@ +# AI Red Teaming Agent for Generative AI Applications + +This sample demonstrates how to use Azure AI Evaluation's `RedTeam` functionality to assess the safety and resilience of AI systems against adversarial prompt attacks. + +## Objective + +AI Red Teaming Agent leverages [Risk and Safety Evaluations](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#risk-and-safety-evaluators) to help identify potential safety issues across different risk categories (violence, hate/unfairness, sexual content, self-harm) combined with attack strategies of varying complexity levels from [PyRIT](https://github.com/Azure/PyRIT), Microsoft AI Red Teaming team's open framework for automated AI red teaming. + +## Time + +You should expect to spend about 30-45 minutes running the notebook. Execution time will vary based on the number of risk categories, attack strategies, and complexity levels you choose to evaluate. + +## Prerequisites + +- Azure subscription +- Azure AI Foundry project +- Python 3.10+ environment + +## Setup + +1. Install the required packages: + + ```bash + pip install azure-ai-evaluation[redteam] + ``` + +2. Set up your environment variables: + + ```env + # Azure OpenAI + AZURE_OPENAI_API_KEY="your-api-key-here" + AZURE_OPENAI_ENDPOINT="https://endpoint-name.openai.azure.com/openai/deployments/deployment-name/chat/completions" + AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4" + AZURE_OPENAI_API_VERSION="2023-12-01-preview" + + # Azure AI Project + AZURE_SUBSCRIPTION_ID="" + AZURE_RESOURCE_GROUP_NAME="" + AZURE_PROJECT_NAME="" + ``` + +3. Authenticate to Azure using `az login` in your terminal before running the notebook. + +## Key Concepts + +The AI Red Teaming Agent assesses AI systems across multiple dimensions: + +### Risk Categories + +- **Violence**: Content that describes or promotes violence +- **Hate and Unfairness**: Content containing hate speech or unfair bias +- **Sexual**: Inappropriate sexual content +- **Self-Harm**: Content related to self-harm behaviors + +### Attack Strategies + +- **Text Transformation**: Base64, ROT13, Binary, Morse code, etc. +- **Character Manipulation**: Character spacing, swapping, Leetspeak +- **Encoding Techniques**: ASCII art, Unicode confusables +- **Jailbreak Attempts**: Special prompts designed to bypass AI safeguards + +### Complexity Levels + +- **Baseline**: Standard naive attacks without any attack strategy +- **Easy**: Simple attack patterns +- **Moderate**: More sophisticated attacks +- **Difficult**: Complex, layered attack strategies + +## Using the Notebook + +The notebook provides two main examples: + +1. **Basic Example**: A simple demonstration using a fixed response callback +2. **Intermediary Example**: Targeting a model configuration to test base or foundational models +3. **Advanced Example**: Using an actual Azure OpenAI model to evaluate against multiple attack strategies + +### Analysis Features + +- **Attack Success Rate (ASR)**: Measures the percentage of attacks that successfully elicit harmful content +- **Risk Category Analysis**: Shows which content categories are most vulnerable +- **Attack Strategy Assessment**: Identifies which techniques are most effective +- **Detailed Conversation Inspection**: Examines specific conversations including prompts and responses + +## Next Steps + +After running the AI red teaming scan: + +1. **Mitigation**: Strengthen your model's guardrails against identified attack strategies. +2. **Continuous Testing**: Implement regular AI red teaming scans as part of your development lifecycle. +3. **Custom Strategies**: Develop custom attack strategies for your specific use cases +4. **Safety Layers**: Consider adding additional safety layers like [Azure AI Content Safety filters](https://learn.microsoft.com/en-us/azure/ai-services/content-safety/overview) or safety system messages using our [templates](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/safety-system-message-templates). + +## Additional Resources + +- Learn more about [Azure AI Foundry Evaluations.](https://learn.microsoft.com/azure/ai-studio/concepts/evaluation-approach) +- Learn more about how to run an automated AI red teaming scan in our [how-to documentation.](https://aka.ms/airedteamingagent-howtodoc) +- Learn more about how the AI Red Teaming Agent works and what it covers in our [concept documentation.](https://aka.ms/airedteamingagent-conceptdoc)