FEAT capability-aware multimodal feedback loop for Crescendo/RedTeaming/TAP by fitzpr · Pull Request #1377 · microsoft/PyRIT

fitzpr · 2026-02-19T14:40:30Z

Summary

This PR makes multimodal feedback routing capability-aware across multi-turn attacks so media can be forwarded end-to-end when targets support it.

Core behavior

Introduces a shared ModalityFeedbackRouter component for multi-turn attacks.
Uses each target's declared TargetCapabilities.input_modalities to decide when media should be forwarded.
Validates the first-turn seed for edit-only objective targets, which have no text-only input path.
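
The first-turn check can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the PR's implementation: capabilities are modeled as a set of frozensets of plain strings, and the two function names only echo the identifiers mentioned in this PR.

```python
from typing import Optional

# Assumption: each frozenset is one input combination a target accepts.
Modalities = set[frozenset[str]]


def requires_media_on_first_turn(input_modalities: Modalities) -> bool:
    """An edit-only target accepts no bare-text input, so turn 0 needs seed media."""
    return frozenset({"text"}) not in input_modalities


def validate_first_turn_seed(
    input_modalities: Modalities, seed_types: Optional[list[str]]
) -> None:
    """Raise early if an edit-only target would receive a text-only first turn."""
    if not requires_media_on_first_turn(input_modalities):
        return
    has_media = any(t != "text" for t in (seed_types or []))
    if not has_media:
        raise ValueError("Objective target requires seed media on the first turn.")


# Hypothetical edit-only profile: text+image in, but never text alone.
edit_only = {frozenset({"text", "image_path"})}
validate_first_turn_seed(edit_only, ["text", "image_path"])  # passes silently
```

Failing before the first request is sent keeps the error close to the misconfiguration instead of surfacing it mid-attack.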

Attack integrations

Applied the router-driven logic to:

CrescendoAttack
RedTeamingAttack
TreeOfAttacksWithPruningAttack (TAP)

What this enables

Objective target media output can flow back to the adversarial chat with score text when the adversarial chat supports {text, <media_type>}.
Previous objective media can flow forward into the next objective turn when the objective target supports {text, <media_type>}.
Text-only paths remain unchanged when capability combinations do not support media forwarding.
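
The three rules above reduce to one membership test per receiver. The sketch below is illustrative only: `can_forward_media`, `build_payload`, and the (data_type, value) tuples are hypothetical stand-ins, not the router's real interface.

```python
from typing import Optional


def can_forward_media(input_modalities: set[frozenset[str]], media_type: str) -> bool:
    """True when the receiver accepts text together with the given media type."""
    return frozenset({"text", media_type}) in input_modalities


def build_payload(
    text: str,
    media: Optional[tuple[str, str]],
    receiver_modalities: set[frozenset[str]],
) -> list[tuple[str, str]]:
    """Attach media only if the receiver declares support; otherwise stay text-only."""
    payload = [("text", text)]
    if media is not None and can_forward_media(receiver_modalities, media[0]):
        payload.append(media)
    return payload


# Hypothetical capability profiles for two adversarial chats.
vision_chat = {frozenset({"text"}), frozenset({"text", "image_path"})}
text_chat = {frozenset({"text"})}
image = ("image_path", "out/turn1.png")

with_media = build_payload("score: false", image, vision_chat)
text_only = build_payload("score: false", image, text_chat)
```

The same check applies in both directions: once with the adversarial chat as receiver, once with the objective target.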

Notebook / docs

Added and refreshed the multimodal executor demo:

doc/code/executor/8_modality_feedback.py
doc/code/executor/8_modality_feedback.ipynb
doc/code/executor/assets/three_masted_ship_color.jpg

Notebook now demonstrates:

seeded two-image Crescendo flow,
ordered turn-by-turn rendering (input pieces, generated image, score),
stronger identity constraints for the seeded raccoon anchor.

Prompt-template updates

pyrit/datasets/executors/crescendo/image_generation.yaml:

aligns response key names with executor expectations (next_message, rationale, last_response_summary),
strengthens seeded-anchor preservation guidance,
adds explicit non-human anchor preservation guidance (avoid humanization/substitution).

Naming/API consistency

For consistency with existing attack terminology:

ModalityFeedbackRouter(..., adversarial_target=...) ➜ adversarial_chat=...
objective_requires_media_on_first_turn ➜ objective_target_requires_media_on_first_turn

Tests

Updated/added tests covering router behavior and multi-turn integration:

tests/unit/executor/attack/component/test_modality_router.py
tests/unit/executor/attack/multi_turn/test_crescendo.py
tests/unit/executor/attack/multi_turn/test_red_teaming.py
tests/unit/executor/attack/multi_turn/test_supports_multi_turn_attacks.py
tests/unit/executor/attack/multi_turn/test_tree_of_attacks.py
tests/unit/executor/attack/test_attack_parameter_consistency.py
tests/unit/models/test_message_piece.py

When the objective target returns non-text content (images, video, etc.), the adversarial chat now receives a multimodal message containing both the scorer's textual feedback AND the actual generated media. This enables vision-capable adversarial LLMs (e.g. GPT-4o) to see what the target produced and craft more informed follow-up prompts.

Changes:
- _handle_adversarial_file_response: returns (feedback_text, media_piece) tuple instead of just the feedback string
- _build_adversarial_prompt: returns Union[str, tuple] to propagate media
- _generate_next_prompt_async: constructs multimodal Message with text + media pieces when file response detected; text-only path unchanged

Tests:
- Updated 2 existing tests for new tuple return type
- Added 5 new tests in TestMultimodalFeedbackLoop:
  - image response produces multimodal message to adversarial chat
  - video response produces multimodal message to adversarial chat
  - text response stays text-only (no regression)
  - _build_adversarial_prompt returns tuple for image
  - _build_adversarial_prompt returns str for text

When a target response has data_type='error' (e.g. content filter block), treat it as text in OpenAIChatTarget's multimodal message builder instead of raising ValueError. This prevents crashes when conversation history contains error responses from prior turns.
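
A rough picture of that fallback, assuming a simplified builder: `to_content_part` and the (data_type, value) tuples are hypothetical, and the dictionaries only imitate the general shape of chat content parts rather than reproducing OpenAIChatTarget's code.

```python
def to_content_part(data_type: str, value: str) -> dict:
    """Map one conversation piece to a chat content entry (illustrative shape)."""
    if data_type in ("text", "error"):
        # Error pieces (e.g. a content-filter block) are passed along as plain text
        # so that history containing them no longer aborts message construction.
        return {"type": "text", "text": value}
    if data_type == "image_path":
        return {"type": "image_url", "image_url": {"url": value}}
    raise ValueError(f"Unsupported data type: {data_type}")


# A history in which the previous turn was blocked.
history = [("text", "Draw a ship"), ("error", "Blocked by content filter")]
parts = [to_content_part(data_type, value) for data_type, value in history]
```

Genuinely unsupported types still raise, so the change narrows the failure rather than hiding it.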

romanlutz

This is a great contribution! Exactly what I was thinking of. We are missing something fundamental in PyRIT for this to work, though.

Co-authored-by: Roman Lutz <romanlutz13@gmail.com>

- Add SUPPORTED_INPUT_MODALITIES class attribute to PromptTarget base class
- Add input_modality_supported() and supports_multimodal_input() methods
- Add supported_input_modalities property that returns list of supported modalities
- Add supported_input_modalities and supports_conversation_history fields to TargetIdentifier
- Update PromptTarget._create_identifier() to populate new fields
- Implement modality declarations in OpenAIChatTarget (text, image_path), TextTarget (text), and HuggingFaceChatTarget (text)
- Add comprehensive tests for modality support detection

This system enables attacks to detect whether targets support multimodal input (text + other modalities) and route accordingly, addressing the limitation mentioned in PR microsoft#1377 where multimodal attacks need to know target capabilities.
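
As a loose illustration of the declaration pattern this commit message describes: the attribute and method names are taken from the message, but the classes and method bodies below are assumptions, not PyRIT's PromptTarget.

```python
class SketchTarget:
    """Hypothetical base class; text-only unless a subclass says otherwise."""

    SUPPORTED_INPUT_MODALITIES: tuple[str, ...] = ("text",)

    def input_modality_supported(self, modality: str) -> bool:
        return modality in self.SUPPORTED_INPUT_MODALITIES

    def supports_multimodal_input(self) -> bool:
        # Here "multimodal" means text plus at least one other modality.
        return any(m != "text" for m in self.SUPPORTED_INPUT_MODALITIES)

    @property
    def supported_input_modalities(self) -> list[str]:
        return list(self.SUPPORTED_INPUT_MODALITIES)


class SketchVisionTarget(SketchTarget):
    """Hypothetical subclass that overrides the class-level declaration."""

    SUPPORTED_INPUT_MODALITIES = ("text", "image_path")


plain = SketchTarget()
vision = SketchVisionTarget()
```

Declaring support at class level lets an attack query a target before sending anything to it.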

Address Roman's feedback items #2 and #3:
- Change _build_adversarial_prompt to return Message instead of Union type
- Extract message construction logic into separate helper methods
- Add _build_text_message() for simple text prompts
- Add _build_multimodal_message() for media responses
- Simplify caller code by removing tuple handling logic
- Improve logging to work with Message objects

These architectural improvements prepare the code to integrate with the modality support detection system from separate PR.
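
The shape of that refactor can be sketched as below. `Piece` and `Message` are simplified stand-ins for PyRIT's message types, and the helper bodies are assumptions; only the idea of returning a single type comes from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Piece:
    data_type: str
    value: str


@dataclass
class Message:
    pieces: list[Piece] = field(default_factory=list)


def build_text_message(text: str) -> Message:
    return Message([Piece("text", text)])


def build_multimodal_message(feedback: str, media: Piece) -> Message:
    # Scorer feedback first, then the media the target produced.
    return Message([Piece("text", feedback), media])


def build_adversarial_prompt(response: Piece, feedback: str) -> Message:
    """Always return a Message so callers need no str/tuple branching."""
    if response.data_type == "text":
        return build_text_message(response.value)
    return build_multimodal_message(feedback, response)


image_msg = build_adversarial_prompt(Piece("image_path", "out.png"), "score: false")
text_msg = build_adversarial_prompt(Piece("text", "I cannot help."), "unused")
```

With one return type, the caller sends whatever comes back without inspecting it.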

Addresses all Roman's feedback from PR microsoft#1377:
- Uses set[frozenset[PromptDataType]] instead of tuples
- Exact frozenset matching prevents ordering issues
- Implemented across all target types (OpenAI, HuggingFace, TextTarget)
- Future-proof pattern matching for new OpenAI models
- Optional verification utility for runtime testing
- Comprehensive test suite with 8 passing tests
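
Why frozensets avoid the ordering problem can be shown with plain Python. The modality strings stand in for PromptDataType values, and `supports` is a hypothetical helper.

```python
# Tuples compare element by element, so order matters.
declared_as_tuples = {("text", "image_path")}

# Frozensets compare by membership, so order does not.
declared_as_sets = {frozenset({"text", "image_path"})}


def supports(declared: set[frozenset[str]], combination: list[str]) -> bool:
    """Exact match: the requested combination must equal a declared one."""
    return frozenset(combination) in declared


tuple_hit = ("image_path", "text") in declared_as_tuples
set_hit = supports(declared_as_sets, ["image_path", "text"])
subset_hit = supports(declared_as_sets, ["text"])
```

Exact matching also means that declaring {text, image_path} does not imply bare {text}; a target that accepts text alone must declare it separately.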

The objective target's TargetCapabilities are now the single source of truth for whether prior media (image, audio, video) is forwarded between the adversarial chat and the objective target across all multi-turn attacks.

A shared ModalityFeedbackRouter is composed into RedTeamingAttack, CrescendoAttack, TreeOfAttacksWithPruningAttack, and PAIRAttack. It decides per turn whether to attach prior response media on either side based on each target's declared input_modalities, and fills MessagePiece.adversarial_placeholder() slots in AttackParameters.next_message so callers can mix seed media (e.g. a base image to edit) with adversarial-generated text on turn 1.

Three usage scenarios fall out naturally:
* default (target advertises text-to-image and text+image-to-image): turn 1 sends text only, turns 2+ pass the previous image back along with adversarial text;
* text-to-image only (narrow the target's input_modalities via custom_configuration): every turn is text-only;
* image-editing only (narrow the target's input_modalities to text+image, pass next_message=Message([MessagePiece.adversarial_placeholder(), seed_image])): turn 1 sends adversarial text plus seed, turns 2+ refine the previous image.

Same logic is generic across image_path / audio_path / video_path.

Notes:
* The PR branch's history was disjoint from current origin/main (an orphaned past commit). Branch was reset to origin/main and the feature rebuilt on top; the prior PR-only red_teaming helpers are superseded by the router, and the prior changes to refusal scorer YAMLs / video target / message.py are already in main via separate merges.
* TreeOfAttacksWithPruningAttack._TreeOfAttacksNode.last_response widens from Optional[str] to Optional[Message] so the router can introspect the data_type of prior pieces; readers were updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
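
The placeholder filling described in this commit message might look roughly like this. `Piece`, `PLACEHOLDER`, and `fill_placeholders` are hypothetical; the actual placeholder is created by MessagePiece.adversarial_placeholder(), whose representation is not shown in this PR page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    data_type: str
    value: str


# Assumption: a sentinel piece marks where adversarial text should go.
PLACEHOLDER = Piece("text", "<adversarial-placeholder>")


def fill_placeholders(template: list[Piece], adversarial_text: str) -> list[Piece]:
    """Swap every placeholder for generated text; leave seed media untouched."""
    return [
        Piece("text", adversarial_text) if piece == PLACEHOLDER else piece
        for piece in template
    ]


# Image-editing scenario: turn 1 mixes generated text with a caller-supplied seed.
seed_image = Piece("image_path", "assets/seed.jpg")
first_turn = fill_placeholders([PLACEHOLDER, seed_image], "Add a storm behind the ship.")
```

Keeping the seed in the caller's template means the adversarial chat only ever supplies text, while piece order stays under the caller's control.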

romanlutz · 2026-06-13T12:50:01Z

@fitzpr heads up I'll make some substantial updates here shortly.

- Reset doc/ to match origin/main (flat numbered notebook structure)
- Remove old attack/, workflow/, benchmark/, promptgen/ subdirectory notebooks
- Add doc/code/executor/8_modality_feedback.py/.ipynb: two-seed Crescendo modality-feedback example (roakey + sailboat, hybrid capability profile)
- Update 0_executor.md and myst.yml to include notebook #8 in navigation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…icter scorer for multi-turn demo
- Section 6 now uses IPyImage(data=bytes) to embed all images directly in the notebook so they render without re-running (no more unresolvable paths).
- Replaced custom adversarial system_prompt with SeedPrompt loaded from the built-in crescendo/image_generation.yaml, which has proper multi-turn escalation (starts simple, builds up) and forces 2-4 turns instead of 1.
- Fixed image_generation.yaml JSON response keys: renamed generated_question -> next_message and rationale_behind_jailbreak -> rationale to match what CrescendoAttack expects.
- Tightened SelfAskTrueFalseScorer true_description to require ALL five visual elements simultaneously, making single-turn success unlikely.
- Added EXECUTOR_SEED_PROMPT_PATH and SeedPrompt imports.
- Removed unused MarkdownConversationMemoryPrinter and IPythonMarkdownSink.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…l naming
- Tighten modality notebook objective + scorer criteria to preserve the seeded raccoon identity.
- Regenerate 8_modality_feedback.ipynb outputs from the updated notebook source.
- Strengthen Crescendo image_generation guidance for seeded non-human anchors and aligned rationale key naming.
- Rename ModalityFeedbackRouter constructor keyword from adversarial_target to adversarial_chat.
- Rename property objective_requires_media_on_first_turn to objective_target_requires_media_on_first_turn.
- Update all affected multi-turn attack callsites and unit tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rlundeen2 · 2026-06-30T21:21:22Z


+        # Fail fast if the objective target requires media on turn 0 but
+        # ``next_message`` does not supply any (i.e. edit-only mode without a seed).
+        self._modality_router.validate_first_turn_seed(next_message=context.next_message)


Rich agrees with this comment, but it is Copilot-generated: validate_first_turn_seed only guarantees that node 0 can be built. In _initialize_first_level_nodes_async the seed is passed only to i == 0 (line 1814); the other first-level nodes get initial_prompt=None, so on turn 0 they go through normal generation and call build_objective_input_message(turn_index=0) (line 569), which raises ValueError when the objective target doesn't advertise text-only input.

So for tree_width > 1 combined with an edit-only objective (one that advertises {text, image_path} but not bare {text}), this passes validation and then crashes on the first turn for the sibling nodes. No shipped target trips it today (OpenAIImageTarget also advertises {text}), but that's exactly the edit-only case this validation exists to support. Suggest either propagating the seed to all first-level nodes, or having the router/validation account for the non-seeded siblings, and adding a tree_width=2 + media-required-objective regression test.
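
The gap described in this review comment, and the first suggested fix, can be modeled in a few lines. This is a toy model with hypothetical function names; it does not touch the real _TreeOfAttacksNode or router code.

```python
from typing import Optional


def first_level_prompts(
    seed: str, tree_width: int, propagate_seed: bool
) -> list[Optional[str]]:
    """Initial prompt handed to each first-level node."""
    return [seed if (propagate_seed or i == 0) else None for i in range(tree_width)]


def unseeded_nodes(prompts: list[Optional[str]]) -> list[int]:
    """Nodes that would fall back to normal generation on turn 0."""
    return [i for i, prompt in enumerate(prompts) if prompt is None]


# Behavior described in the comment: only node 0 receives the seed.
described = unseeded_nodes(first_level_prompts("seed", 3, propagate_seed=False))

# Suggested fix: every first-level node receives the seed.
fixed = unseeded_nodes(first_level_prompts("seed", 3, propagate_seed=True))
```

With an edit-only objective target, each index in `described` is a node that would hit the ValueError path on turn 0, which is the case the proposed tree_width=2 regression test would pin down.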

rlundeen2 · 2026-06-30T21:21:22Z

+  - objective
+  - max_turns
+  - conversation_context
+data_type: text


Rich agrees with this comment, but it is Copilot-generated: Unlike every crescendo_variant_*.yaml (which set response_json_schema_name: adversarial_chat right above data_type), this prompt omits it. The schema is inlined in prose, but without the name self._adversarial_chat_system_prompt_template.response_json_schema resolves to None, so the JSON_SCHEMA_METADATA_KEY plumbing this PR relies on never forwards a schema and schema-aware targets won't natively constrain the response shape. If that's intentional, worth a comment explaining why; otherwise add the key for parity with the other variants.

rlundeen2 · 2026-06-30T21:21:22Z

+        self._adversarial_chat = adversarial_chat
+        self._objective_target = objective_target
+
+        adv_input = adversarial_chat.configuration.capabilities.input_modalities


Rich agrees with this comment, but it is Copilot-generated: The router snapshots input_modalities for both targets here at construction time. If a target's capabilities are refined later (e.g. via _discover_input_modalities_async), the router keeps using the stale snapshot. Probably fine given typical construction ordering, but worth a docstring note that capabilities are read once at __init__.
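
The staleness concern is easy to reproduce in isolation. The classes below are hypothetical and only contrast copying capabilities at construction time with reading them on each call; whether the PR should switch approaches is left to the reviewers.

```python
class SketchTarget:
    def __init__(self, input_modalities: set[frozenset[str]]) -> None:
        self.input_modalities = set(input_modalities)


class SnapshotRouter:
    """Copies the capabilities once in __init__, as the comment describes."""

    def __init__(self, target: SketchTarget) -> None:
        self._modalities = frozenset(target.input_modalities)

    def supports(self, combination: set[str]) -> bool:
        return frozenset(combination) in self._modalities


class LiveRouter:
    """Reads the target's capabilities on every call."""

    def __init__(self, target: SketchTarget) -> None:
        self._target = target

    def supports(self, combination: set[str]) -> bool:
        return frozenset(combination) in self._target.input_modalities


target = SketchTarget({frozenset({"text"})})
snapshot, live = SnapshotRouter(target), LiveRouter(target)

# Capabilities are refined after both routers were built.
target.input_modalities.add(frozenset({"text", "image_path"}))

snapshot_sees_image = snapshot.supports({"text", "image_path"})
live_sees_image = live.supports({"text", "image_path"})
```

A snapshot is simpler and cheaper; the docstring note the comment asks for would make that trade-off explicit.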

rlundeen2 · 2026-06-30T21:24:56Z

+logger = logging.getLogger(__name__)
+
+
+class ModalityFeedbackRouter:


We may want to make this private, because I think it will be refactored with the adversarial conversation manager.

Robert Fitzpatrick and others added 4 commits February 18, 2026 18:42

Merge branch 'main' into feature/media-feedback-loop-v2

1d27bc4

Merge branch 'main' into feature/media-feedback-loop-v2

1bc245b

romanlutz reviewed Feb 19, 2026

Comment thread pyrit/executor/attack/multi_turn/red_teaming.py Outdated

romanlutz changed the title from "FEAT plumb media output through adversarial feedback loop (#6a)" to "FEAT plumb media output through adversarial feedback loop in RedTeamingAttack" on Feb 19, 2026

Update pyrit/executor/attack/multi_turn/red_teaming.py

5cb97e1

Co-authored-by: Roman Lutz <romanlutz13@gmail.com>

fitzpr mentioned this pull request Feb 19, 2026

FEAT: Add modality support detection system for prompt targets #1381

Closed

Copilot AI added 7 commits June 18, 2026 17:20

Merge remote-tracking branch 'pr-fitzpr-PyRIT/feature/media-feedback-loop-v2' into feature/media-feedback-loop-v2

4135eee

Revert "Merge remote-tracking branch 'pr-fitzpr-PyRIT/feature/media-feedback-loop-v2' into feature/media-feedback-loop-v2"

3ec2684

This reverts commit 4135eee, reversing changes made to 7d5721a.

Merge remote-tracking branch 'origin/main' into feature/media-feedback-loop-v2

57e2f28

Merge remote-tracking branch 'origin/main' into feature/media-feedback-loop-v2

e0c80d0

romanlutz changed the title from "FEAT plumb media output through adversarial feedback loop in RedTeamingAttack" to "FEAT capability-aware multimodal feedback loop for Crescendo/RedTeaming/TAP" on Jun 22, 2026

Copilot AI added 2 commits June 22, 2026 13:19

DOC: add ship seed image source and CC BY-SA 2.5 attribution

b49b379

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: add missing modality_router arg to _TreeOfAttacksNode in TAP schema test

47444d7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz mentioned this pull request Jun 27, 2026

Draft: AdversarialConversationManager #2053

Draft

rlundeen2 reviewed Jun 30, 2026

rlundeen2 self-assigned this Jun 30, 2026

rlundeen2 approved these changes Jun 30, 2026

fitzpr wants to merge 16 commits into microsoft:main from fitzpr:feature/media-feedback-loop-v2

fitzpr commented Feb 19, 2026 (edited by romanlutz)

4 participants