Skip to content

Commit 439f471

Browse files
slister1001Copilot
andauthored
[evaluation] Fix multi-turn red team attacks broken by PyRIT 0.11 (#46444)
* [evaluation] Fix multi-turn red team attacks broken by PyRIT 0.11 PyRIT 0.11 introduced two bugs in RedTeamingAttack that prevent multi-turn red team attacks from running: 1. RedTeamingAttack._setup_async() adds prepended_conversation messages to the adversarial chat memory BEFORE calling set_system_prompt(). The default PromptChatTarget.set_system_prompt then raises `RuntimeError: Conversation already exists, system prompt needs to be set at the beginning`. CrescendoAttack avoids this by embedding context in the system prompt template, so it does not regress. 2. RedTeamingAttack._generate_next_prompt_async returns context.next_message directly without calling .duplicate_message(). PromptNormalizer.send_prompt_async then deepcopies the message but preserves the MessagePiece id, so re-inserting the same id into memory raises sqlite3.IntegrityError: UNIQUE constraint failed: PromptMemoryEntries.id. Notably PromptSendingAttack._build_message uses .duplicate_message() — the intended pattern. Together these two bugs cause Foundry red team eval runs with the MultiTurn (and other RedTeamingAttack-based) strategies to silently fail and surface only baseline results in the UI. Workaround applied at the SDK level to avoid bumping PyRIT: - Bug #1: instance-scoped patch of set_system_prompt on the adversarial chat target. The patched version inserts the system message into memory via add_message_to_memory when prior messages exist, instead of raising. Scope is limited to the AzureRAIServiceTarget instance created by the scan, so no global PyRIT class is mutated. - Bug #2: module-level monkey-patch of RedTeamingAttack._generate_next_prompt_async that wraps the returned message in .duplicate_message(). The patch is idempotent and applied once at SDK module load. Verified locally with both a callback target and an Azure OpenAI model target (gpt-4o-mini): multi-turn attacks now execute the full conversation loop and produce expected results in violence_multi_turn_results.jsonl and final_results.json. Work item: https://msdata.visualstudio.com/Vienna/_workitems/edit/5166253 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Update CHANGELOG for multi-turn PyRIT fix Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Move changelog entry to 1.16.6 and bump version 1.16.5 is already released (2026-04-08). Put this fix under 1.16.6 (Unreleased) to match in-flight version increment PR #46222. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address Copilot review: version-guard PyRIT 0.11 patches and add unit tests - Add _is_affected_pyrit_version() check; both patches early-return for pyrit versions other than 0.11.x so a future fix or signature change isn't masked. - Add tests/unittests/test_redteam/test_pyrit_workarounds.py covering: * patch is applied (marker on RedTeamingAttack._generate_next_prompt_async) * patched method calls .duplicate_message() on the returned message * None pass-through (no AttributeError) * idempotent re-application (no double-wrapping) The set_system_prompt instance patch remains covered by the local E2E reproductions (callback target + gpt-4o-mini); unit testing it would require constructing an AzureRAIServiceTarget with full memory plumbing and offers little value beyond E2E. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Strengthen multi-turn E2E test + re-record on PyRIT 0.11 Previous assertion (len(conversation) >= 2) was too weak to catch the PyRIT 0.11 set_system_prompt bug — any attack silently dropped from attack_details still passed. Now require multi_turn attack present + >= 4 messages per multi-turn conversation. Re-recorded against patched SDK so playback validates the fix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 481e01f commit 439f471

6 files changed

Lines changed: 254 additions & 5 deletions

File tree

sdk/evaluation/azure-ai-evaluation/CHANGELOG.md

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,18 @@
11
# Release History
22

3-
## 1.16.5 (Unreleased)
3+
## 1.16.6 (Unreleased)
4+
5+
### Features Added
6+
7+
### Breaking Changes
8+
9+
### Bugs Fixed
10+
11+
- Fixed multi-turn red team attacks (`RedTeamingAttack`-based strategies like `MultiTurn`) failing silently with PyRIT 0.11. Two bugs were patched at the SDK level: (1) `RedTeamingAttack._setup_async` raised `RuntimeError: Conversation already exists` because it seeded prepended conversation messages before calling `set_system_prompt`; now patched per-instance on the adversarial chat target to tolerate existing conversation history. (2) `RedTeamingAttack._generate_next_prompt_async` returned `context.next_message` without calling `.duplicate_message()`, causing `sqlite3.IntegrityError: UNIQUE constraint failed: PromptMemoryEntries.id` on the second turn; now patched at module load with an idempotent wrapper that duplicates the message before returning.
12+
13+
### Other Changes
14+
15+
## 1.16.5 (2026-04-08)
416

517
### Features Added
618

sdk/evaluation/azure-ai-evaluation/assets.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,5 +2,5 @@
22
"AssetsRepo": "Azure/azure-sdk-assets",
33
"AssetsRepoPrefixPath": "python",
44
"TagPrefix": "python/evaluation/azure-ai-evaluation",
5-
"Tag": "python/evaluation/azure-ai-evaluation_baead44c3f"
5+
"Tag": "python/evaluation/azure-ai-evaluation_67d91b0617"
66
}

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_version.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,4 +3,4 @@
33
# ---------------------------------------------------------
44
# represents upcoming version
55

6-
VERSION = "1.16.5"
6+
VERSION = "1.16.6"

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_red_team.py

Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,117 @@ def _safe_tqdm_write(msg: str) -> None:
2525
tqdm.write(msg.encode(sys.stdout.encoding or "utf-8", errors="replace").decode(sys.stdout.encoding or "utf-8"))
2626

2727

28+
def _is_affected_pyrit_version() -> bool:
29+
"""Return True if the installed PyRIT version contains the bugs these patches work around.
30+
31+
The bugs targeted by ``_patch_set_system_prompt_for_prepended_conversations`` and
32+
``_patch_red_teaming_attack_duplicate_message`` were introduced in PyRIT 0.11. If a future
33+
PyRIT release fixes the bugs (or changes the method semantics/signature), we should not silently
34+
keep patching — that would risk masking a real fix or breaking on a renamed/refactored method.
35+
"""
36+
try:
37+
from importlib.metadata import version as _pkg_version, PackageNotFoundError
38+
except ImportError: # pragma: no cover - importlib.metadata is stdlib on 3.8+
39+
return False
40+
try:
41+
installed = _pkg_version("pyrit")
42+
except PackageNotFoundError:
43+
return False
44+
return installed.startswith("0.11.")
45+
46+
47+
def _patch_set_system_prompt_for_prepended_conversations(target, logger) -> None:
48+
"""Patch ``set_system_prompt`` on a PromptChatTarget instance to tolerate existing conversations.
49+
50+
Workaround for PyRIT 0.11 bug in ``RedTeamingAttack._setup_async()``: it adds
51+
``prepended_conversation`` messages to the adversarial chat target's memory BEFORE calling
52+
``set_system_prompt()``. The default ``PromptChatTarget.set_system_prompt`` then raises
53+
``RuntimeError("Conversation already exists, system prompt needs to be set at the beginning")``,
54+
which kills multi-turn red teaming attacks that have any context (e.g. seed-based context items).
55+
56+
``CrescendoAttack`` avoids this by embedding prepended conversation as text inside the system
57+
prompt template, so it never triggers the bug.
58+
59+
This patch replaces ``set_system_prompt`` on the given instance with a tolerant version that
60+
inserts the system message into memory even when prior messages exist, instead of raising.
61+
Scope is limited to the instance passed in (no global monkey-patch of PyRIT classes).
62+
"""
63+
if not _is_affected_pyrit_version():
64+
return
65+
try:
66+
from pyrit.models import MessagePiece
67+
except ImportError:
68+
logger.warning("Could not import MessagePiece from pyrit.models; skipping set_system_prompt patch.")
69+
return
70+
71+
def _tolerant_set_system_prompt(
72+
*,
73+
system_prompt: str,
74+
conversation_id: str,
75+
attack_identifier=None,
76+
labels=None,
77+
) -> None:
78+
existing = target._memory.get_conversation(conversation_id=conversation_id)
79+
if existing:
80+
logger.debug(
81+
"Adversarial chat conversation %s already has %d message(s) (prepended_conversation). "
82+
"Inserting system prompt without raising (PyRIT 0.11 RedTeamingAttack workaround).",
83+
conversation_id,
84+
len(existing),
85+
)
86+
target._memory.add_message_to_memory(
87+
request=MessagePiece(
88+
role="system",
89+
conversation_id=conversation_id,
90+
original_value=system_prompt,
91+
converted_value=system_prompt,
92+
prompt_target_identifier=target.get_identifier(),
93+
attack_identifier=attack_identifier,
94+
labels=labels,
95+
).to_message()
96+
)
97+
98+
target.set_system_prompt = _tolerant_set_system_prompt
99+
100+
101+
def _patch_red_teaming_attack_duplicate_message() -> None:
102+
"""Module-level monkey-patch for PyRIT 0.11 ``RedTeamingAttack._generate_next_prompt_async``.
103+
104+
In PyRIT 0.11, ``RedTeamingAttack._generate_next_prompt_async`` returns ``context.next_message``
105+
directly without calling ``.duplicate_message()``. ``PromptSendingAttack._build_message`` (the
106+
single-turn counterpart) uses ``context.next_message.duplicate_message()`` — the intended
107+
pattern. The missing duplicate causes ``sqlite3.IntegrityError: UNIQUE constraint failed:
108+
PromptMemoryEntries.id`` when ``PromptNormalizer.send_prompt_async`` deepcopies the message
109+
(preserving the ``MessagePiece`` id) and the normalizer inserts it into memory — any repeat
110+
insertion of an identical id hits the UNIQUE constraint and crashes the attack.
111+
112+
This patch wraps the method so the returned message always carries fresh piece ids.
113+
The patch is idempotent and applied once at SDK module load.
114+
"""
115+
if not _is_affected_pyrit_version():
116+
return
117+
try:
118+
from pyrit.executor.attack.multi_turn.red_teaming import RedTeamingAttack
119+
except ImportError:
120+
return
121+
122+
original = getattr(RedTeamingAttack, "_generate_next_prompt_async", None)
123+
if original is None or getattr(original, "_az_eval_patched", False):
124+
return
125+
126+
async def _patched_generate_next_prompt_async(self, context):
127+
msg = await original(self, context)
128+
if msg is not None and hasattr(msg, "duplicate_message"):
129+
return msg.duplicate_message()
130+
return msg
131+
132+
_patched_generate_next_prompt_async._az_eval_patched = True # type: ignore[attr-defined]
133+
RedTeamingAttack._generate_next_prompt_async = _patched_generate_next_prompt_async
134+
135+
136+
_patch_red_teaming_attack_duplicate_message()
137+
138+
28139
# Azure AI Evaluation imports
29140
from azure.ai.evaluation._constants import TokenScope
30141
from azure.ai.evaluation._common._experimental import experimental
@@ -1765,6 +1876,12 @@ async def _execute_attacks_with_foundry(
17651876
crescendo_format=is_crescendo,
17661877
)
17671878

1879+
# Workaround for PyRIT 0.11 RedTeamingAttack._setup_async bug: it adds
1880+
# prepended_conversation to memory before calling set_system_prompt(),
1881+
# which then raises "Conversation already exists". Without this patch,
1882+
# multi-turn attacks fail whenever the seed has context items.
1883+
_patch_set_system_prompt_for_prepended_conversations(adversarial_chat, self.logger)
1884+
17681885
foundry_manager = FoundryExecutionManager(
17691886
credential=self.credential,
17701887
azure_ai_project=self.azure_ai_project,

sdk/evaluation/azure-ai-evaluation/tests/e2etests/test_red_team_foundry.py

Lines changed: 21 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -910,11 +910,30 @@ def defensive_target(query: str) -> str:
910910
assert result.attack_details is not None
911911
assert len(result.attack_details) > 0
912912

913+
# Require that at least one multi_turn attack actually produced a result.
914+
# Without this check, a silent failure during scoring/generation that drops
915+
# the multi_turn attack from `attack_details` would still let the test pass
916+
# (the loop below would only see baseline). Regressions in the multi-turn
917+
# code path must surface as test failures.
918+
multi_turn_attacks = [a for a in result.attack_details if a.get("attack_technique") == "multi_turn"]
919+
assert len(multi_turn_attacks) >= 1, (
920+
"Expected at least one multi_turn attack in result.attack_details, "
921+
f"got attack_techniques={[a.get('attack_technique') for a in result.attack_details]}"
922+
)
923+
913924
for attack in result.attack_details:
914925
conversation = attack["conversation"]
915926
if attack["attack_technique"] == "multi_turn":
916-
# Multi-turn attacks attempt multiple turns but may terminate early
917-
assert len(conversation) >= 2, "Multi-turn attack should have at least 2 messages"
927+
# Multi-turn attacks must execute at least 2 full turns (>= 4 messages).
928+
# The previous threshold (>= 2) was satisfied by a single user/assistant
929+
# exchange, so it would silently pass even when the multi-turn loop
930+
# crashed after the first turn (e.g. the PyRIT 0.11
931+
# `_generate_next_prompt_async` deepcopy / sqlite UNIQUE-constraint
932+
# regression). Requiring >= 4 messages ensures `_generate_next_prompt_async`
933+
# was invoked successfully a second time.
934+
assert (
935+
len(conversation) >= 4
936+
), f"Multi-turn attack should produce at least 2 turns (>=4 messages); got {len(conversation)}"
918937
else:
919938
assert len(conversation) >= 2
920939

Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,101 @@
1+
"""Regression tests for the PyRIT 0.11 multi-turn red team workarounds.
2+
3+
These guard the monkey-patches in ``azure.ai.evaluation.red_team._red_team`` against
4+
silent regressions. See PR #46444 / Vienna#5166253.
5+
"""
6+
7+
import asyncio
8+
from types import SimpleNamespace
9+
from unittest.mock import MagicMock
10+
11+
import pytest
12+
13+
from azure.ai.evaluation.red_team._red_team import (
14+
_is_affected_pyrit_version,
15+
_patch_red_teaming_attack_duplicate_message,
16+
)
17+
18+
19+
pytestmark = pytest.mark.skipif(
20+
not _is_affected_pyrit_version(),
21+
reason="Workarounds only apply to PyRIT 0.11.x",
22+
)
23+
24+
25+
def test_duplicate_message_patch_is_applied_to_red_teaming_attack():
26+
"""The module-level patch should mark RedTeamingAttack._generate_next_prompt_async as patched."""
27+
from pyrit.executor.attack.multi_turn.red_teaming import RedTeamingAttack
28+
29+
method = RedTeamingAttack._generate_next_prompt_async
30+
assert getattr(method, "_az_eval_patched", False) is True, (
31+
"Expected RedTeamingAttack._generate_next_prompt_async to be patched at SDK import time. "
32+
"Without the patch, multi-turn attacks crash with sqlite3.IntegrityError on the second turn."
33+
)
34+
35+
36+
def test_duplicate_message_patch_calls_duplicate_on_returned_message():
37+
"""The patched method must call .duplicate_message() on whatever the original returns.
38+
39+
PromptNormalizer.send_prompt_async deepcopies but preserves piece ids; without
40+
.duplicate_message() the second turn re-inserts the same id and triggers
41+
sqlite3.IntegrityError: UNIQUE constraint failed: PromptMemoryEntries.id.
42+
"""
43+
from pyrit.executor.attack.multi_turn.red_teaming import RedTeamingAttack
44+
45+
duplicated_sentinel = MagicMock(name="duplicated_message")
46+
fake_msg = MagicMock(name="original_next_message")
47+
fake_msg.duplicate_message.return_value = duplicated_sentinel
48+
49+
captured = {}
50+
51+
async def fake_original(self, context):
52+
captured["called"] = True
53+
return fake_msg
54+
55+
fake_original._az_eval_patched = False # type: ignore[attr-defined]
56+
57+
saved = RedTeamingAttack._generate_next_prompt_async
58+
try:
59+
RedTeamingAttack._generate_next_prompt_async = fake_original
60+
_patch_red_teaming_attack_duplicate_message()
61+
result = asyncio.get_event_loop().run_until_complete(
62+
RedTeamingAttack._generate_next_prompt_async(SimpleNamespace(), SimpleNamespace())
63+
)
64+
finally:
65+
RedTeamingAttack._generate_next_prompt_async = saved
66+
67+
assert captured.get("called") is True, "Patch should delegate to the original method."
68+
fake_msg.duplicate_message.assert_called_once()
69+
assert result is duplicated_sentinel, "Patched method must return the duplicated message, not the original."
70+
71+
72+
def test_duplicate_message_patch_passes_through_none():
73+
"""If the original returns None, the patch must not crash trying to duplicate it."""
74+
from pyrit.executor.attack.multi_turn.red_teaming import RedTeamingAttack
75+
76+
async def fake_original(self, context):
77+
return None
78+
79+
fake_original._az_eval_patched = False # type: ignore[attr-defined]
80+
81+
saved = RedTeamingAttack._generate_next_prompt_async
82+
try:
83+
RedTeamingAttack._generate_next_prompt_async = fake_original
84+
_patch_red_teaming_attack_duplicate_message()
85+
result = asyncio.get_event_loop().run_until_complete(
86+
RedTeamingAttack._generate_next_prompt_async(SimpleNamespace(), SimpleNamespace())
87+
)
88+
finally:
89+
RedTeamingAttack._generate_next_prompt_async = saved
90+
91+
assert result is None
92+
93+
94+
def test_duplicate_message_patch_is_idempotent():
95+
"""Re-applying the patch on an already-patched method should be a no-op (no double-wrapping)."""
96+
from pyrit.executor.attack.multi_turn.red_teaming import RedTeamingAttack
97+
98+
method_before = RedTeamingAttack._generate_next_prompt_async
99+
_patch_red_teaming_attack_duplicate_message()
100+
method_after = RedTeamingAttack._generate_next_prompt_async
101+
assert method_before is method_after, "Re-running the patch must not wrap the already-patched method a second time."

0 commit comments

Comments
 (0)