Commit 19378be

slister1001 and Copilot authored
Fix SDL false-pass by normalizing string context in attack objectives (#46445)
Pre-curated sensitive_data_leakage attack objectives store messages[0].context as a string (document text) with sibling context_type/tool_name fields. The _extract_context_items helper in the Foundry execution path only handled list and dict shapes, so the document was silently dropped. The context_type fallback then synthesized a context item from the user prompt, so the agent never saw the sensitive document content and could not leak it; the evaluator therefore scored every attempt as a pass (100% false-negative rate).

Fix:
- Handle str context at both the per-message and top-level blocks.
- Normalize raw string entries inside list-shaped context via a new _normalize_context_list helper.
- Gate the context_type fallback so it only runs when no usable context was produced, covering both missing-key and context:null cases.

Added unit tests covering string context, null fallback, list-of-strings normalization, top-level string context, and an integration test that runs the extracted items through DatasetConfigurationBuilder.add_objective_with_context and verifies the resulting context SeedPrompt carries the document text plus tool_name/context_type metadata.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
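The failure mode described in the commit message can be reproduced with a reduced stand-in for the pre-fix logic. This is only a sketch: `extract_context_pre_fix` is a hypothetical name, the objective is invented sample data, and the `context_type`/`tool_name` keys of the fallback dict are assumed because the diff elides them.

```python
from typing import Any, Dict, List


def extract_context_pre_fix(obj: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Reduced stand-in for the pre-fix per-message logic (hypothetical name)."""
    items: List[Dict[str, Any]] = []
    first_msg = obj["messages"][0]
    ctx = first_msg.get("context")
    if isinstance(ctx, list):
        items.extend(ctx)
    elif isinstance(ctx, dict):
        items.append(ctx)
    # A str context matches neither branch, so the document is dropped here.
    if "context_type" in first_msg:
        # Ungated fallback: content comes from the user prompt, not the document.
        items.append(
            {
                "content": first_msg.get("content", ""),
                "context_type": first_msg["context_type"],  # assumed key
                "tool_name": first_msg.get("tool_name"),  # assumed key
            }
        )
    return items


# Invented SDL-style objective: the context is a plain string.
objective = {
    "messages": [
        {
            "content": "Summarize the attached record.",
            "context": "Patient: Jane Doe, SSN 000-00-0000",
            "context_type": "document",
            "tool_name": "document_client",
        }
    ]
}

items = extract_context_pre_fix(objective)
document_present = any("SSN" in item["content"] for item in items)
print(items[0]["content"])
print(document_present)
```

The single item that comes back carries the user prompt, so nothing sensitive ever reaches the agent and there is nothing for it to leak.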
1 parent 1e6077d commit 19378be

3 files changed

Lines changed: 493 additions & 13 deletions

sdk/evaluation/azure-ai-evaluation/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@
 ### Bugs Fixed

 - Fixed multi-turn red team attacks (`RedTeamingAttack`-based strategies like `MultiTurn`) failing silently with PyRIT 0.11. Two bugs were patched at the SDK level: (1) `RedTeamingAttack._setup_async` raised `RuntimeError: Conversation already exists` because it seeded prepended conversation messages before calling `set_system_prompt`; now patched per-instance on the adversarial chat target to tolerate existing conversation history. (2) `RedTeamingAttack._generate_next_prompt_async` returned `context.next_message` without calling `.duplicate_message()`, causing `sqlite3.IntegrityError: UNIQUE constraint failed: PromptMemoryEntries.id` on the second turn; now patched at module load with an idempotent wrapper that duplicates the message before returning.
+- Fixed `sensitive_data_leakage` red team attacks producing 100% false-pass rates. `_extract_context_items` in the Foundry execution path only handled `list` or `dict` shapes for `messages[0].context`; pre-curated SDL attack objectives store the document text as a `str` with sibling `context_type`/`tool_name` fields, so the document was silently dropped and a fallback synthesized a context item from the user prompt. The agent never received the sensitive document content and could not leak it, causing the evaluator to score every attempt as a pass. Added `str` handling (both message-level and top-level), normalized raw string entries inside list-shaped context, and gated the `context_type` fallback so it only runs when no usable context was extracted (including the `context: null` case).

 ### Other Changes

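The gated fallback mentioned in the changelog entry can be sketched in isolation. The function below is a hypothetical reduction that covers only the `str` and `null` cases; the `context_type`/`tool_name` keys of the fallback dict are assumed, since the diff does not show them in full.

```python
from typing import Any, Dict, List


def extract_message_context(first_msg: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical reduction of the per-message branch (str and None only)."""
    items: List[Dict[str, Any]] = []
    produced = False
    ctx = first_msg.get("context")
    if isinstance(ctx, str) and ctx.strip():
        items.append(
            {
                "content": ctx,
                "context_type": first_msg.get("context_type"),
                "tool_name": first_msg.get("tool_name"),
            }
        )
        produced = True
    # The fallback runs only when nothing usable was extracted above.
    if not produced and "context_type" in first_msg:
        items.append(
            {
                "content": first_msg.get("content", ""),
                "context_type": first_msg["context_type"],  # assumed key
                "tool_name": first_msg.get("tool_name"),  # assumed key
            }
        )
    return items


prompt = "What is in the file?"
with_doc = extract_message_context(
    {"content": prompt, "context": "Secret memo", "context_type": "document"}
)
null_ctx = extract_message_context(
    {"content": prompt, "context": None, "context_type": "document"}
)
print(with_doc[0]["content"])
print(null_ctx[0]["content"])
```

With a string context the document text is the only item returned; with `context: None` the older prompt-based fallback still applies, which is the backward-compatible behavior the entry describes.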

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_execution_manager.py

Lines changed: 119 additions & 9 deletions
@@ -20,6 +20,21 @@
 from ._strategy_mapping import StrategyMapper


+def _has_usable_content(item: Any) -> bool:
+    """Return True when ``item`` is a context dict whose ``content`` survives
+    the downstream ``DatasetConfigurationBuilder.add_objective_with_context``
+    falsy-content filter (it skips items where ``not content`` evaluates True).
+    """
+    if not isinstance(item, dict):
+        return False
+    content = item.get("content")
+    if content is None:
+        return False
+    if isinstance(content, str):
+        return bool(content.strip())
+    return bool(content)
+
+
 class FoundryExecutionManager:
     """Manages Foundry-based red team execution.

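To make the helper's behavior concrete, here is a standalone copy of its logic run against a few invented inputs. The name `has_usable_content` (without the leading underscore) and the sample values are illustrative only.

```python
from typing import Any


def has_usable_content(item: Any) -> bool:
    """Standalone copy of the helper's logic from the hunk above."""
    if not isinstance(item, dict):
        return False
    content = item.get("content")
    if content is None:
        return False
    if isinstance(content, str):
        return bool(content.strip())
    return bool(content)


cases = {
    "document text": has_usable_content({"content": "Account 1234"}),
    "whitespace only": has_usable_content({"content": "   \n"}),
    "missing content": has_usable_content({"context_type": "email"}),
    "raw string": has_usable_content("Account 1234"),
    "non-empty list": has_usable_content({"content": ["a"]}),
}
for label, usable in cases.items():
    print(f"{label}: {usable}")
```

One detail worth noticing: a whitespace-only string is truthy in Python, so a plain `not content` check would keep it, whereas this helper rejects it via `strip()`.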
@@ -329,16 +344,60 @@ def _extract_context_items(self, obj: Dict[str, Any]) -> List[Dict[str, Any]]:
         if "messages" in obj and obj["messages"]:
             first_msg = obj["messages"][0]
             if isinstance(first_msg, dict):
-                # Check for context in message
+                # Check for context in message. Pre-curated attack objectives
+                # (e.g. sensitive_data_leakage) store the context document as a
+                # string alongside sibling ``context_type``/``tool_name`` fields
+                # on the same message, so a string value must be normalized into
+                # a context item that carries the document text (not the user
+                # prompt). Missing the str branch here caused the document to be
+                # silently dropped; the fallback below then synthesized an item
+                # whose ``content`` was the user prompt, so downstream tool
+                # injections returned the prompt instead of real context.
+                # ``produced_message_context`` gates the ``context_type``
+                # fallback below. We only set it when at least one extracted
+                # item carries non-empty ``content`` — otherwise the downstream
+                # ``DatasetConfigurationBuilder.add_objective_with_context``
+                # would silently drop the empty item AND the suppressed
+                # fallback would leave the agent with no context at all.
+                # Drop unusable entries at extraction time (rather than
+                # appending and letting ``_dataset_builder`` filter them
+                # downstream) so the output of this method is the actual set
+                # of context items the agent will receive — no junk dict/empty
+                # string entries that confuse later consumers.
+                produced_message_context = False
                 if "context" in first_msg:
                     ctx = first_msg["context"]
                     if isinstance(ctx, list):
-                        context_items.extend(ctx)
-                    elif isinstance(ctx, dict):
+                        normalized = [
+                            item
+                            for item in self._normalize_context_list(
+                                ctx,
+                                first_msg.get("context_type"),
+                                first_msg.get("tool_name"),
+                            )
+                            if _has_usable_content(item)
+                        ]
+                        context_items.extend(normalized)
+                        produced_message_context = bool(normalized)
+                    elif isinstance(ctx, dict) and _has_usable_content(ctx):
                         context_items.append(ctx)
-
-                # Also check for separate context fields
-                if "context_type" in first_msg:
+                        produced_message_context = True
+                    elif isinstance(ctx, str) and ctx.strip():
+                        context_items.append(
+                            {
+                                "content": ctx,
+                                "context_type": first_msg.get("context_type"),
+                                "tool_name": first_msg.get("tool_name"),
+                            }
+                        )
+                        produced_message_context = True
+                # Fall back to synthesizing a context item from sibling fields
+                # only when no usable ``context`` value was found above. This
+                # preserves backward compatibility for objectives that carry
+                # only ``context_type`` (or have ``context: null``) while
+                # avoiding the duplicate-with-wrong-content append that caused
+                # the SDL false-pass bug.
+                if not produced_message_context and "context_type" in first_msg:
                     context_items.append(
                         {
                             "content": first_msg.get("content", ""),
@@ -347,16 +406,67 @@ def _extract_context_items(self, obj: Dict[str, Any]) -> List[Dict[str, Any]]:
                         }
                     )

-        # Top-level context
+        # Top-level context. Mirrors the per-message handling above: drop
+        # unusable entries (empty strings, dicts/list items without usable
+        # ``content``) so this method's output reflects only items the agent
+        # will actually receive.
         if "context" in obj:
             ctx = obj["context"]
             if isinstance(ctx, list):
-                context_items.extend(ctx)
-            elif isinstance(ctx, dict):
+                context_items.extend(
+                    item
+                    for item in self._normalize_context_list(ctx, obj.get("context_type"), obj.get("tool_name"))
+                    if _has_usable_content(item)
+                )
+            elif isinstance(ctx, dict) and _has_usable_content(ctx):
                 context_items.append(ctx)
+            elif isinstance(ctx, str) and ctx.strip():
+                context_items.append(
+                    {
+                        "content": ctx,
+                        "context_type": obj.get("context_type"),
+                        "tool_name": obj.get("tool_name"),
+                    }
+                )

         return context_items

+    def _normalize_context_list(
+        self,
+        items: List[Any],
+        default_context_type: Optional[str],
+        default_tool_name: Optional[str],
+    ) -> List[Dict[str, Any]]:
+        """Normalize a list of context entries to dict shape.
+
+        Some objective shapes provide ``context`` as a list whose entries may
+        be either dicts (already in the expected shape) or raw strings. Raw
+        strings would otherwise propagate through and be dropped by downstream
+        consumers (e.g. ``_dataset_builder``) that only iterate dict items.
+
+        :param items: Raw context entries.
+        :type items: List[Any]
+        :param default_context_type: ``context_type`` to apply to string entries.
+        :type default_context_type: Optional[str]
+        :param default_tool_name: ``tool_name`` to apply to string entries.
+        :type default_tool_name: Optional[str]
+        :return: Normalized list of dict context items.
+        :rtype: List[Dict[str, Any]]
+        """
+        normalized: List[Dict[str, Any]] = []
+        for item in items:
+            if isinstance(item, dict):
+                normalized.append(item)
+            elif isinstance(item, str) and item.strip():
+                normalized.append(
+                    {
+                        "content": item,
+                        "context_type": default_context_type,
+                        "tool_name": default_tool_name,
+                    }
+                )
+        return normalized
+
     def _group_results_by_strategy(
         self,
         orchestrator: ScenarioOrchestrator,
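As a rough illustration of the list normalization added in this file, the sketch below lifts the loop out of the class into a free function and feeds it a mixed list. The function name, the sample entries, and the `"document"`/`"file_reader"` defaults are invented for the example.

```python
from typing import Any, Dict, List, Optional


def normalize_context_list(
    items: List[Any],
    default_context_type: Optional[str],
    default_tool_name: Optional[str],
) -> List[Dict[str, Any]]:
    """Free-function copy of the normalization loop shown in the diff."""
    normalized: List[Dict[str, Any]] = []
    for item in items:
        if isinstance(item, dict):
            # Dicts pass through unchanged, even if their content is empty.
            normalized.append(item)
        elif isinstance(item, str) and item.strip():
            # Non-blank strings are wrapped with the sibling metadata.
            normalized.append(
                {
                    "content": item,
                    "context_type": default_context_type,
                    "tool_name": default_tool_name,
                }
            )
        # Anything else (blank strings, numbers, None) is skipped.
    return normalized


mixed = [
    {"content": "already a dict", "context_type": "email"},
    "raw document text",
    "   ",
    42,
]
result = normalize_context_list(mixed, "document", "file_reader")
for entry in result:
    print(entry)
```

Because dict entries are passed through without inspection, the callers in the diff still filter the result with `_has_usable_content` before extending `context_items`.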
