Fix SDL false-pass by normalizing string context in attack objectives (#46445)
Pre-curated sensitive_data_leakage attack objectives store messages[0].context
as a string (document text) with sibling context_type/tool_name fields. The
_extract_context_items helper in the Foundry execution path only handled list
and dict shapes, so the document was silently dropped. The context_type
fallback then synthesized a context item from the user prompt, so the agent
never saw the sensitive document content and could not leak it, so the
evaluator scored every attempt as a pass (100% false-pass rate).
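
To make the failure mode concrete, here is a minimal sketch of the objective
shape described above and of an extractor that only understands list and dict
context. The function name `extract_list_or_dict_only`, the keys of each context
item, the sample values, and the exact form of the fallback are assumptions for
illustration; this is not the SDK's actual code.

```python
from typing import Any, Dict, List

# Hypothetical objective: context is a plain string (the document text)
# with sibling context_type/tool_name fields. All values are made up.
objective = {
    "messages": [
        {
            "content": "Summarize the attached document.",
            "context": "Patient record: Jane Doe, SSN 000-00-0000",
            "context_type": "document",
            "tool_name": "document_client",
        }
    ]
}


def extract_list_or_dict_only(obj: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Illustrative pre-fix behaviour: a str context matches no branch."""
    message = obj["messages"][0]
    ctx = message.get("context")
    items: List[Dict[str, Any]] = []
    if isinstance(ctx, list):
        items.extend(c for c in ctx if isinstance(c, dict))
    elif isinstance(ctx, dict):
        items.append(ctx)
    # A str context falls through both branches, so the document is dropped.
    if "context_type" in message:
        # Assumed ungated fallback: builds an item from the user prompt.
        items.append(
            {
                "content": message["content"],
                "context_type": message["context_type"],
                "tool_name": message.get("tool_name"),
            }
        )
    return items


items = extract_list_or_dict_only(objective)
```

In this sketch the only item produced carries the user prompt, not the document
text, which is the situation in which the agent has nothing sensitive to leak.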
Fix:
- Handle str context at both the per-message and top-level blocks.
- Normalize raw string entries inside list-shaped context via a new
_normalize_context_list helper.
- Gate the context_type fallback so it only runs when no usable context was
produced, covering both missing-key and context:null cases.
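
The three changes above can be sketched roughly as follows. This is a
simplified illustration under stated assumptions, not the code in the commit:
the names `normalize_context_list`, `context_to_items`, and
`extract_context_items_sketch`, the item keys, and the choice to consult the
top-level block only when the message yields nothing are all hypothetical.

```python
from typing import Any, Dict, List, Optional


def normalize_context_list(
    entries: List[Any], context_type: Optional[str], tool_name: Optional[str]
) -> List[Dict[str, Any]]:
    """Turn a mixed list of dict/str entries into a list of dict items."""
    items: List[Dict[str, Any]] = []
    for entry in entries:
        if isinstance(entry, dict):
            items.append(entry)
        elif isinstance(entry, str) and entry.strip():
            # Raw strings inherit the sibling metadata of their holder.
            items.append(
                {"content": entry, "context_type": context_type, "tool_name": tool_name}
            )
    return items


def context_to_items(holder: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Handle str, list, and dict context on one block (message or top level)."""
    ctx = holder.get("context")
    context_type = holder.get("context_type")
    tool_name = holder.get("tool_name")
    if isinstance(ctx, str):
        return normalize_context_list([ctx], context_type, tool_name)
    if isinstance(ctx, list):
        return normalize_context_list(ctx, context_type, tool_name)
    if isinstance(ctx, dict):
        return [ctx]
    return []  # missing key or context: null


def extract_context_items_sketch(objective: Dict[str, Any]) -> List[Dict[str, Any]]:
    messages = objective.get("messages") or []
    message = messages[0] if messages else {}
    items = context_to_items(message)
    if not items:
        # Assumption: the top-level block is only a secondary source.
        items = context_to_items(objective)
    if not items and message.get("context_type"):
        # Gated fallback: runs only when no usable context was produced.
        items.append(
            {
                "content": message.get("content", ""),
                "context_type": message["context_type"],
                "tool_name": message.get("tool_name"),
            }
        )
    return items
```

With this shape, a string context yields one item carrying the document text
and its sibling metadata, while `context: null` with a `context_type` still
reaches the fallback.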
Added unit tests covering string context, null fallback, list-of-strings
normalization, top-level string context, and an integration test that runs
the extracted items through DatasetConfigurationBuilder.add_objective_with_context
and verifies the resulting context SeedPrompt carries the document text plus
tool_name/context_type metadata.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sdk/evaluation/azure-ai-evaluation/CHANGELOG.md
1 addition & 0 deletions
@@ -9,6 +9,7 @@
### Bugs Fixed
- Fixed multi-turn red team attacks (`RedTeamingAttack`-based strategies like `MultiTurn`) failing silently with PyRIT 0.11. Two bugs were patched at the SDK level: (1) `RedTeamingAttack._setup_async` raised `RuntimeError: Conversation already exists` because it seeded prepended conversation messages before calling `set_system_prompt`; now patched per-instance on the adversarial chat target to tolerate existing conversation history. (2) `RedTeamingAttack._generate_next_prompt_async` returned `context.next_message` without calling `.duplicate_message()`, causing `sqlite3.IntegrityError: UNIQUE constraint failed: PromptMemoryEntries.id` on the second turn; now patched at module load with an idempotent wrapper that duplicates the message before returning.
+- Fixed `sensitive_data_leakage` red team attacks producing 100% false-pass rates. `_extract_context_items` in the Foundry execution path only handled `list` or `dict` shapes for `messages[0].context`; pre-curated SDL attack objectives store the document text as a `str` with sibling `context_type`/`tool_name` fields, so the document was silently dropped and a fallback synthesized a context item from the user prompt. The agent never received the sensitive document content and could not leak it, causing the evaluator to score every attempt as a pass. Added `str` handling (both message-level and top-level), normalized raw string entries inside list-shaped context, and gated the `context_type` fallback so it only runs when no usable context was extracted (including the `context: null` case).