[evaluation] Fix multi-turn red team attacks broken by PyRIT 0.11 (#46444)

slister1001 · Copilot · web-flow · commit 439f471e0688 · 2026-04-21T16:05:45.000-04:00
* [evaluation] Fix multi-turn red team attacks broken by PyRIT 0.11

  PyRIT 0.11 introduced two bugs in RedTeamingAttack that prevent multi-turn red team attacks from running:

  1. RedTeamingAttack._setup_async() adds prepended_conversation messages to the adversarial chat memory BEFORE calling set_system_prompt(). The default PromptChatTarget.set_system_prompt then raises `RuntimeError: Conversation already exists, system prompt needs to be set at the beginning`. CrescendoAttack avoids this by embedding context in the system prompt template, so it does not regress.
  2. RedTeamingAttack._generate_next_prompt_async returns context.next_message directly without calling .duplicate_message(). PromptNormalizer.send_prompt_async then deepcopies the message but preserves the MessagePiece id, so re-inserting the same id into memory raises `sqlite3.IntegrityError: UNIQUE constraint failed: PromptMemoryEntries.id`. Notably, PromptSendingAttack._build_message uses .duplicate_message(), which is the intended pattern.

  Together these two bugs cause Foundry red team eval runs with the MultiTurn (and other RedTeamingAttack-based) strategies to silently fail and surface only baseline results in the UI.

  Workaround applied at the SDK level to avoid bumping PyRIT:

  - Bug #1: instance-scoped patch of set_system_prompt on the adversarial chat target. The patched version inserts the system message into memory via add_message_to_memory when prior messages exist, instead of raising. Scope is limited to the AzureRAIServiceTarget instance created by the scan, so no global PyRIT class is mutated.
  - Bug #2: module-level monkey-patch of RedTeamingAttack._generate_next_prompt_async that wraps the returned message in .duplicate_message(). The patch is idempotent and applied once at SDK module load.

  Verified locally with both a callback target and an Azure OpenAI model target (gpt-4o-mini): multi-turn attacks now execute the full conversation loop and produce expected results in violence_multi_turn_results.jsonl and final_results.json.

  Work item: https://msdata.visualstudio.com/Vienna/_workitems/edit/5166253

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update CHANGELOG for multi-turn PyRIT fix

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Move changelog entry to 1.16.6 and bump version

  1.16.5 is already released (2026-04-08). Put this fix under 1.16.6 (Unreleased) to match in-flight version increment PR #46222.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address Copilot review: version-guard PyRIT 0.11 patches and add unit tests

  - Add _is_affected_pyrit_version() check; both patches early-return for pyrit versions other than 0.11.x so a future fix or signature change isn't masked.
  - Add tests/unittests/test_redteam/test_pyrit_workarounds.py covering:
    * patch is applied (marker on RedTeamingAttack._generate_next_prompt_async)
    * patched method calls .duplicate_message() on the returned message
    * None pass-through (no AttributeError)
    * idempotent re-application (no double-wrapping)

  The set_system_prompt instance patch remains covered by the local E2E reproductions (callback target + gpt-4o-mini); unit testing it would require constructing an AzureRAIServiceTarget with full memory plumbing and offers little value beyond E2E.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Strengthen multi-turn E2E test + re-record on PyRIT 0.11

  Previous assertion (len(conversation) >= 2) was too weak to catch the PyRIT 0.11 set_system_prompt bug: any attack silently dropped from attack_details still passed. Now require multi_turn attack present + >= 4 messages per multi-turn conversation. Re-recorded against patched SDK so playback validates the fix.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
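The idempotent wrapper described for Bug #2 can be illustrated with a minimal, self-contained sketch. It does not import PyRIT: `FakeMessage`, `FakeAttack`, `patch_duplicate_message`, and the `_patched` marker are hypothetical stand-ins chosen for illustration, and the "fresh id" is simulated by incrementing a counter. The sketch only shows the wrap-once pattern, not the real PyRIT behaviour.

```python
import asyncio


class FakeMessage:
    """Hypothetical stand-in for a PyRIT message; not the real class."""

    def __init__(self, piece_id: int) -> None:
        self.piece_id = piece_id

    def duplicate_message(self) -> "FakeMessage":
        # Assumption: a duplicate carries a new id. Simulated here with +1.
        return FakeMessage(self.piece_id + 1)


class FakeAttack:
    """Hypothetical stand-in for RedTeamingAttack."""

    async def _generate_next_prompt_async(self, context):
        # Mirrors the reported bug: the stored message is returned as-is.
        return context["next_message"]


def patch_duplicate_message(cls) -> None:
    """Wrap cls._generate_next_prompt_async once; repeat calls are no-ops."""
    original = getattr(cls, "_generate_next_prompt_async", None)
    if original is None or getattr(original, "_patched", False):
        return  # nothing to patch, or already wrapped

    async def wrapper(self, context):
        msg = await original(self, context)
        if msg is not None and hasattr(msg, "duplicate_message"):
            return msg.duplicate_message()
        return msg  # None (or an unexpected type) passes through untouched

    wrapper._patched = True  # marker checked on the next application
    cls._generate_next_prompt_async = wrapper


patch_duplicate_message(FakeAttack)
first_wrapper = FakeAttack._generate_next_prompt_async
patch_duplicate_message(FakeAttack)  # second application must not re-wrap
same_wrapper = FakeAttack._generate_next_prompt_async is first_wrapper

original_msg = FakeMessage(piece_id=1)
returned_msg = asyncio.run(
    FakeAttack()._generate_next_prompt_async({"next_message": original_msg})
)
none_result = asyncio.run(
    FakeAttack()._generate_next_prompt_async({"next_message": None})
)
```

Storing the marker on the wrapper function itself means the check survives repeated imports of the patching module without needing a separate global flag.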
diff --git a/sdk/evaluation/azure-ai-evaluation/CHANGELOG.md b/sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
@@ -1,6 +1,18 @@
 # Release History
 
-## 1.16.5 (Unreleased)
+## 1.16.6 (Unreleased)
+
+### Features Added
+
+### Breaking Changes
+
+### Bugs Fixed
+
+- Fixed multi-turn red team attacks (`RedTeamingAttack`-based strategies like `MultiTurn`) failing silently with PyRIT 0.11. Two bugs were patched at the SDK level: (1) `RedTeamingAttack._setup_async` raised `RuntimeError: Conversation already exists` because it seeded prepended conversation messages before calling `set_system_prompt`; now patched per-instance on the adversarial chat target to tolerate existing conversation history. (2) `RedTeamingAttack._generate_next_prompt_async` returned `context.next_message` without calling `.duplicate_message()`, causing `sqlite3.IntegrityError: UNIQUE constraint failed: PromptMemoryEntries.id` on the second turn; now patched at module load with an idempotent wrapper that duplicates the message before returning.
+
+### Other Changes
+
+## 1.16.5 (2026-04-08)
 
 ### Features Added
 
diff --git a/sdk/evaluation/azure-ai-evaluation/assets.json b/sdk/evaluation/azure-ai-evaluation/assets.json
@@ -2,5 +2,5 @@
   "AssetsRepo": "Azure/azure-sdk-assets",
   "AssetsRepoPrefixPath": "python",
   "TagPrefix": "python/evaluation/azure-ai-evaluation",
-  "Tag": "python/evaluation/azure-ai-evaluation_baead44c3f"
+  "Tag": "python/evaluation/azure-ai-evaluation_67d91b0617"
 }
diff --git a/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_version.py b/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_version.py
@@ -3,4 +3,4 @@
 # ---------------------------------------------------------
 # represents upcoming version
 
-VERSION = "1.16.5"
+VERSION = "1.16.6"
diff --git a/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_red_team.py b/sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_red_team.py
@@ -25,6 +25,117 @@ def _safe_tqdm_write(msg: str) -> None:
         tqdm.write(msg.encode(sys.stdout.encoding or "utf-8", errors="replace").decode(sys.stdout.encoding or "utf-8"))
 
 
+def _is_affected_pyrit_version() -> bool:
+    """Return True if the installed PyRIT version contains the bugs these patches work around.
+
+    The bugs targeted by ``_patch_set_system_prompt_for_prepended_conversations`` and
+    ``_patch_red_teaming_attack_duplicate_message`` were introduced in PyRIT 0.11. If a future
+    PyRIT release fixes the bugs (or changes the method semantics/signature), we should not silently
+    keep patching — that would risk masking a real fix or breaking on a renamed/refactored method.
+    """
+    try:
+        from importlib.metadata import version as _pkg_version, PackageNotFoundError
+    except ImportError:  # pragma: no cover - importlib.metadata is stdlib on 3.8+
+        return False
+    try:
+        installed = _pkg_version("pyrit")
+    except PackageNotFoundError:
+        return False
+    return installed.startswith("0.11.")
+
+
+def _patch_set_system_prompt_for_prepended_conversations(target, logger) -> None:
+    """Patch ``set_system_prompt`` on a PromptChatTarget instance to tolerate existing conversations.
+
+    Workaround for PyRIT 0.11 bug in ``RedTeamingAttack._setup_async()``: it adds
+    ``prepended_conversation`` messages to the adversarial chat target's memory BEFORE calling
+    ``set_system_prompt()``. The default ``PromptChatTarget.set_system_prompt`` then raises
+    ``RuntimeError("Conversation already exists, system prompt needs to be set at the beginning")``,
+    which kills multi-turn red teaming attacks that have any context (e.g. seed-based context items).
+
+    ``CrescendoAttack`` avoids this by embedding prepended conversation as text inside the system
+    prompt template, so it never triggers the bug.
+
+    This patch replaces ``set_system_prompt`` on the given instance with a tolerant version that
+    inserts the system message into memory even when prior messages exist, instead of raising.
+    Scope is limited to the instance passed in (no global monkey-patch of PyRIT classes).
+    """
+    if not _is_affected_pyrit_version():
+        return
+    try:
+        from pyrit.models import MessagePiece
+    except ImportError:
+        logger.warning("Could not import MessagePiece from pyrit.models; skipping set_system_prompt patch.")
+        return
+
+    def _tolerant_set_system_prompt(
+        *,
+        system_prompt: str,
+        conversation_id: str,
+        attack_identifier=None,
+        labels=None,
+    ) -> None:
+        existing = target._memory.get_conversation(conversation_id=conversation_id)
+        if existing:
+            logger.debug(
+                "Adversarial chat conversation %s already has %d message(s) (prepended_conversation). "
+                "Inserting system prompt without raising (PyRIT 0.11 RedTeamingAttack workaround).",
+                conversation_id,
+                len(existing),
+            )
+        target._memory.add_message_to_memory(
+            request=MessagePiece(
+                role="system",
+                conversation_id=conversation_id,
+                original_value=system_prompt,
+                converted_value=system_prompt,
+                prompt_target_identifier=target.get_identifier(),
+                attack_identifier=attack_identifier,
+                labels=labels,
+            ).to_message()
+        )
+
+    target.set_system_prompt = _tolerant_set_system_prompt
+
+
+def _patch_red_teaming_attack_duplicate_message() -> None:
+    """Module-level monkey-patch for PyRIT 0.11 ``RedTeamingAttack._generate_next_prompt_async``.
+
+    In PyRIT 0.11, ``RedTeamingAttack._generate_next_prompt_async`` returns ``context.next_message``
+    directly without calling ``.duplicate_message()``. ``PromptSendingAttack._build_message`` (the
+    single-turn counterpart) uses ``context.next_message.duplicate_message()`` — the intended
+    pattern. The missing duplicate causes ``sqlite3.IntegrityError: UNIQUE constraint failed:
+    PromptMemoryEntries.id`` when ``PromptNormalizer.send_prompt_async`` deepcopies the message
+    (preserving the ``MessagePiece`` id) and the normalizer inserts it into memory — any repeat
+    insertion of an identical id hits the UNIQUE constraint and crashes the attack.
+
+    This patch wraps the method so the returned message always carries fresh piece ids.
+    The patch is idempotent and applied once at SDK module load.
+    """
+    if not _is_affected_pyrit_version():
+        return
+    try:
+        from pyrit.executor.attack.multi_turn.red_teaming import RedTeamingAttack
+    except ImportError:
+        return
+
+    original = getattr(RedTeamingAttack, "_generate_next_prompt_async", None)
+    if original is None or getattr(original, "_az_eval_patched", False):
+        return
+
+    async def _patched_generate_next_prompt_async(self, context):
+        msg = await original(self, context)
+        if msg is not None and hasattr(msg, "duplicate_message"):
+            return msg.duplicate_message()
+        return msg
+
+    _patched_generate_next_prompt_async._az_eval_patched = True  # type: ignore[attr-defined]
+    RedTeamingAttack._generate_next_prompt_async = _patched_generate_next_prompt_async
+
+
+_patch_red_teaming_attack_duplicate_message()
+
+
 # Azure AI Evaluation imports
 from azure.ai.evaluation._constants import TokenScope
 from azure.ai.evaluation._common._experimental import experimental
@@ -1765,6 +1876,12 @@ async def _execute_attacks_with_foundry(
                 crescendo_format=is_crescendo,
             )
 
+            # Workaround for PyRIT 0.11 RedTeamingAttack._setup_async bug: it adds
+            # prepended_conversation to memory before calling set_system_prompt(),
+            # which then raises "Conversation already exists". Without this patch,
+            # multi-turn attacks fail whenever the seed has context items.
+            _patch_set_system_prompt_for_prepended_conversations(adversarial_chat, self.logger)
+
             foundry_manager = FoundryExecutionManager(
                 credential=self.credential,
                 azure_ai_project=self.azure_ai_project,
diff --git a/sdk/evaluation/azure-ai-evaluation/tests/e2etests/test_red_team_foundry.py b/sdk/evaluation/azure-ai-evaluation/tests/e2etests/test_red_team_foundry.py
@@ -910,11 +910,30 @@ def defensive_target(query: str) -> str:
         assert result.attack_details is not None
         assert len(result.attack_details) > 0
 
+        # Require that at least one multi_turn attack actually produced a result.
+        # Without this check, a silent failure during scoring/generation that drops
+        # the multi_turn attack from `attack_details` would still let the test pass
+        # (the loop below would only see baseline). Regressions in the multi-turn
+        # code path must surface as test failures.
+        multi_turn_attacks = [a for a in result.attack_details if a.get("attack_technique") == "multi_turn"]
+        assert len(multi_turn_attacks) >= 1, (
+            "Expected at least one multi_turn attack in result.attack_details, "
+            f"got attack_techniques={[a.get('attack_technique') for a in result.attack_details]}"
+        )
+
         for attack in result.attack_details:
             conversation = attack["conversation"]
             if attack["attack_technique"] == "multi_turn":
-                # Multi-turn attacks attempt multiple turns but may terminate early
-                assert len(conversation) >= 2, "Multi-turn attack should have at least 2 messages"
+                # Multi-turn attacks must execute at least 2 full turns (>= 4 messages).
+                # The previous threshold (>= 2) was satisfied by a single user/assistant
+                # exchange, so it would silently pass even when the multi-turn loop
+                # crashed after the first turn (e.g. the PyRIT 0.11
+                # `_generate_next_prompt_async` deepcopy / sqlite UNIQUE-constraint
+                # regression). Requiring >= 4 messages ensures `_generate_next_prompt_async`
+                # was invoked successfully a second time.
+                assert (
+                    len(conversation) >= 4
+                ), f"Multi-turn attack should produce at least 2 turns (>=4 messages); got {len(conversation)}"
             else:
                 assert len(conversation) >= 2
 
diff --git a/sdk/evaluation/azure-ai-evaluation/tests/unittests/test_redteam/test_pyrit_workarounds.py b/sdk/evaluation/azure-ai-evaluation/tests/unittests/test_redteam/test_pyrit_workarounds.py
@@ -0,0 +1,101 @@
+"""Regression tests for the PyRIT 0.11 multi-turn red team workarounds.
+
+These guard the monkey-patches in ``azure.ai.evaluation.red_team._red_team`` against
+silent regressions. See PR #46444 / Vienna#5166253.
+"""
+
+import asyncio
+from types import SimpleNamespace
+from unittest.mock import MagicMock
+
+import pytest
+
+from azure.ai.evaluation.red_team._red_team import (
+    _is_affected_pyrit_version,
+    _patch_red_teaming_attack_duplicate_message,
+)
+
+
+pytestmark = pytest.mark.skipif(
+    not _is_affected_pyrit_version(),
+    reason="Workarounds only apply to PyRIT 0.11.x",
+)
+
+
+def test_duplicate_message_patch_is_applied_to_red_teaming_attack():
+    """The module-level patch should mark RedTeamingAttack._generate_next_prompt_async as patched."""
+    from pyrit.executor.attack.multi_turn.red_teaming import RedTeamingAttack
+
+    method = RedTeamingAttack._generate_next_prompt_async
+    assert getattr(method, "_az_eval_patched", False) is True, (
+        "Expected RedTeamingAttack._generate_next_prompt_async to be patched at SDK import time. "
+        "Without the patch, multi-turn attacks crash with sqlite3.IntegrityError on the second turn."
+    )
+
+
+def test_duplicate_message_patch_calls_duplicate_on_returned_message():
+    """The patched method must call .duplicate_message() on whatever the original returns.
+
+    PromptNormalizer.send_prompt_async deepcopies but preserves piece ids; without
+    .duplicate_message() the second turn re-inserts the same id and triggers
+    sqlite3.IntegrityError: UNIQUE constraint failed: PromptMemoryEntries.id.
+    """
+    from pyrit.executor.attack.multi_turn.red_teaming import RedTeamingAttack
+
+    duplicated_sentinel = MagicMock(name="duplicated_message")
+    fake_msg = MagicMock(name="original_next_message")
+    fake_msg.duplicate_message.return_value = duplicated_sentinel
+
+    captured = {}
+
+    async def fake_original(self, context):
+        captured["called"] = True
+        return fake_msg
+
+    fake_original._az_eval_patched = False  # type: ignore[attr-defined]
+
+    saved = RedTeamingAttack._generate_next_prompt_async
+    try:
+        RedTeamingAttack._generate_next_prompt_async = fake_original
+        _patch_red_teaming_attack_duplicate_message()
+        result = asyncio.run(
+            RedTeamingAttack._generate_next_prompt_async(SimpleNamespace(), SimpleNamespace())
+        )
+    finally:
+        RedTeamingAttack._generate_next_prompt_async = saved
+
+    assert captured.get("called") is True, "Patch should delegate to the original method."
+    fake_msg.duplicate_message.assert_called_once()
+    assert result is duplicated_sentinel, "Patched method must return the duplicated message, not the original."
+
+
+def test_duplicate_message_patch_passes_through_none():
+    """If the original returns None, the patch must not crash trying to duplicate it."""
+    from pyrit.executor.attack.multi_turn.red_teaming import RedTeamingAttack
+
+    async def fake_original(self, context):
+        return None
+
+    fake_original._az_eval_patched = False  # type: ignore[attr-defined]
+
+    saved = RedTeamingAttack._generate_next_prompt_async
+    try:
+        RedTeamingAttack._generate_next_prompt_async = fake_original
+        _patch_red_teaming_attack_duplicate_message()
+        result = asyncio.run(
+            RedTeamingAttack._generate_next_prompt_async(SimpleNamespace(), SimpleNamespace())
+        )
+    finally:
+        RedTeamingAttack._generate_next_prompt_async = saved
+
+    assert result is None
+
+
+def test_duplicate_message_patch_is_idempotent():
+    """Re-applying the patch on an already-patched method should be a no-op (no double-wrapping)."""
+    from pyrit.executor.attack.multi_turn.red_teaming import RedTeamingAttack
+
+    method_before = RedTeamingAttack._generate_next_prompt_async
+    _patch_red_teaming_attack_duplicate_message()
+    method_after = RedTeamingAttack._generate_next_prompt_async
+    assert method_before is method_after, "Re-running the patch must not wrap the already-patched method a second time."
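
The unit tests above cover only the module-level patch. As a rough illustration of the instance-scoped `set_system_prompt` patch (Bug #1), the following sketch uses hypothetical stand-ins (`FakeMemory`, `FakeChatTarget`, `patch_instance`) rather than the real `AzureRAIServiceTarget` or PyRIT memory classes. It assumes only what the commit message states: the default method raises when the conversation already has messages, and the patch should affect a single instance.

```python
class FakeMemory:
    """Hypothetical in-memory store keyed by conversation id."""

    def __init__(self) -> None:
        self.conversations = {}

    def get_conversation(self, *, conversation_id):
        return self.conversations.get(conversation_id, [])

    def add(self, *, conversation_id, role, text) -> None:
        self.conversations.setdefault(conversation_id, []).append((role, text))


class FakeChatTarget:
    """Hypothetical stand-in for a chat target with the strict default."""

    def __init__(self, memory: FakeMemory) -> None:
        self._memory = memory

    def set_system_prompt(self, *, system_prompt, conversation_id) -> None:
        if self._memory.get_conversation(conversation_id=conversation_id):
            raise RuntimeError(
                "Conversation already exists, system prompt needs to be set at the beginning"
            )
        self._memory.add(conversation_id=conversation_id, role="system", text=system_prompt)


def patch_instance(target: FakeChatTarget) -> None:
    """Shadow set_system_prompt on this one instance with a tolerant version."""

    def tolerant_set_system_prompt(*, system_prompt, conversation_id) -> None:
        # Insert the system message even if earlier messages exist.
        target._memory.add(conversation_id=conversation_id, role="system", text=system_prompt)

    # Assigning to the instance leaves the class attribute untouched.
    target.set_system_prompt = tolerant_set_system_prompt


memory = FakeMemory()
memory.add(conversation_id="c1", role="user", text="prepended context")

patched = FakeChatTarget(memory)
unpatched = FakeChatTarget(memory)
patch_instance(patched)

patched.set_system_prompt(system_prompt="be adversarial", conversation_id="c1")

try:
    unpatched.set_system_prompt(system_prompt="be adversarial", conversation_id="c1")
    unpatched_raised = False
except RuntimeError:
    unpatched_raised = True
```

Because the replacement lives in the instance's own attribute dictionary, other targets created from the same class keep the strict behaviour, which is the scoping property the commit message calls out.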
