feat: Add BijectionConverter and BijectionAttack (#1903) by sajisanchu1913-source · Pull Request #1942 · microsoft/PyRIT

sajisanchu1913-source · 2026-06-04T22:30:00Z

Summary

Implements the Bijection Attack from arXiv:2410.01294 (Haize Labs) into PyRIT.

The attack works by teaching a target LLM a secret character mapping through
demonstration shots, then sending harmful prompts encoded in that mapping to
bypass safety filters. Responses are decoded using the inverse mapping.

Changes

New Files

pyrit/prompt_converter/bijection_converter.py — generates random letter-to-letter mapping, encodes prompts, decodes responses
pyrit/executor/attack/single_turn/bijection_attack.py — runs full bijection attack with teaching phase
tests/unit/prompt_converter/test_bijection_converter.py — 11 unit tests for converter
tests/unit/executor/test_bijection_attack.py — 5 unit tests for attack
doc/code/executor/attack/bijection_attack.ipynb — usage notebook

Modified Files

pyrit/prompt_converter/__init__.py — registered BijectionConverter
pyrit/executor/attack/single_turn/__init__.py — registered BijectionAttack

How It Works

BijectionConverter generates a random secret mapping (e.g. a→q, b→x...)
BijectionAttack sends teaching messages to target AI to teach the mapping
Harmful prompt is encoded and sent as TASK is '⟪encoded prompt⟫'
Response is decoded using inverse mapping
Decoded response is scored by the judge

Pattern Followed

BijectionConverter follows FlipConverter pattern
BijectionAttack follows FlipAttack pattern

Reference

Haize Labs implementation: https://github.com/haizelabs/bijection-learning
Paper: arXiv:2410.01294
Closes FEAT Bijection #1903

…dup and harm categories

… fix imports and ordering

- _RemoteDatasetLoader._fetch_zip_from_url: - keyword-only args (source, inner_files, cache) - streams download (requests stream=True + iter_content) to avoid double-buffering large archives - md5-keyed disk cache under DB_DATA_PATH / seed-prompt-entries when cache=True; named temp file otherwise (cleaned up after parse) - validates each inner_files extension against FILE_TYPE_HANDLERS; raises ValueError with a member preview if an inner file is missing - parses inner files via FILE_TYPE_HANDLERS and returns parsed dicts, so the open ZipFile never escapes the worker thread - adds the missing import zipfile that broke the previous commit - _MICDataset: - drops unused io / json / requests imports (helper handles them) - delegates download + parse to the helper; only owns the seed construction loop - guards non-string Q values (in addition to NaN moral values) - forwards cache from fetch_dataset_async to the helper - factors authors into AUTHORS class constant - Tests: - test_moral_integrity_corpus_dataset.py: stops mocking requests.get directly; patches _fetch_zip_from_url to return parsed dicts so tests don't depend on the helper's internal shape - adds test_fetch_dataset_non_string_q and test_fetch_dataset_passes_cache_flag - hoists imports into the right groups so ruff I001 stops firing - removes trailing whitespace / extra newlines - test_remote_dataset_loader.py: adds TestFetchZipFromUrl covering happy path, on-disk caching (hits 1 network call across 2 fetches), cache=False does not persist, missing inner file raises ValueError, unsupported extension raises ValueError Verified live against the real MIC.zip: 35,408 unique seeds across all 6 moral foundations in ~2.4s cold / ~1.3s warm. All 559 dataset unit tests pass; ruff clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Use tempfile.NamedTemporaryFile instead of fixed temp_audio.wav to prevent concurrent call collisions - Wrap Azure upload in try/finally to ensure temp file is always deleted even when upload fails - Add regression test to verify cleanup on upload failure Fixes microsoft#1894

- Add BijectionConverter that generates random letter-to-letter mapping - Add BijectionAttack that teaches the mapping to target AI and encodes harmful prompts - Add unit tests for both converter and attack - Add notebook demonstrating usage - Update __init__.py files to register new classes Based on arXiv:2410.01294 (Haize Labs bijection-learning)

romanlutz

This is a great start! There are a few things that need addressing but we're pretty close.

- Remove @pytest.mark.asyncio decorators (asyncio_mode=auto) - Fix __init__.py alphabetical ordering for BijectionConverter - Use patch_central_database fixture in attack tests - Use MagicMock(spec=PromptTarget) instead of plain MagicMock - Remove dead num_digits parameter - Add BijectionType StrEnum for bijection_type validation - Use private attributes with underscore prefix - Add _build_identifier() method - Fix teaching shots cap with programmatic cycling - Fix alternating user/assistant roles in teaching messages - Fix response decoding in _perform_async - Add BijectionConverter to _request_converters pipeline - Fix notebook format and add paired .py jupytext file - Register BijectionAttack in executor/attack/__init__.py

sajisanchu1913-source · 2026-06-15T05:08:59Z

Hi @romanlutz I've addressed all the review comments:

Removed @pytest.mark.asyncio decorators
Fixed init.py alphabetical ordering
Used patch_central_database fixture in attack tests
Used MagicMock(spec=PromptTarget) instead of plain MagicMock
Removed dead num_digits parameter
Added BijectionType StrEnum for validation
Used private attributes with underscore prefix
Added _build_identifier() method
Fixed teaching shots cap with programmatic cycling
Fixed alternating user/assistant roles in teaching messages
Fixed response decoding in _perform_async
Added BijectionConverter to _request_converters pipeline
Fixed notebook format and added paired .py jupytext file

Ready for re-review!

…ifier import

sajisanchu1913-source · 2026-06-15T05:28:27Z

Hi @romanlutz I've addressed the remaining review comments:

Resolved merge conflicts with upstream/main (kept BidiConverter from main, added BijectionConverter alphabetically)
Added end-to-end test in TestBijectionAttackEndToEnd that uses MockPromptTarget, returns a cipher-text response, and asserts the result is decoded back to plain text
Fixed ComponentIdentifier import to use pyrit.models.identifiers

Ready for re-review

- Change Optional[X] to X | None (PEP 604) - Change bijection_type: str to BijectionType in attack - Register BijectionType in prompt_converter __init__.py - Store decoded response in metadata instead of mutating last_response - Fix teaching shots: user sends English, assistant responds in cipher - Fix brittle test assertions to check structural properties - Update end-to-end test to check metadata for decoded response

sajisanchu1913-source · 2026-06-15T14:30:45Z

Hi @romanlutz
I've completed the architectural restructure

BijectionConverter is now an abstract base class
Added LetterBijectionConverter with fixed_size and seed parameters
Added DigitBijectionConverter with num_digits and seed parameters
Added seed parameter for reproducibility across all converters
Added explicit mapping parameter for replay/deterministic experiments
BijectionAttack now accepts a bijection_converter instance instead of bijection_type/fixed_size
Updated all tests to use the concrete subclasses
All 23 tests passing

Example usage:

# Default letter mapping
attack = BijectionAttack(objective_target=target)

# Custom letter mapping with seed
attack = BijectionAttack(
    objective_target=target,
    bijection_converter=LetterBijectionConverter(fixed_size=5, seed=42),
)

# Digit mapping
attack = BijectionAttack(
    objective_target=target,
    bijection_converter=DigitBijectionConverter(num_digits=10, seed=42),
)

Ready for re-review....

sajisanchu1913-source · 2026-06-15T14:32:17Z

Hi @romanlutz
For TokenBijectionConverte, since it requires a tokenizer dependency, I'll implement it in a separate follow-up PR to keep this one focused. Let me know if you'd prefer it here instead!

romanlutz · 2026-06-15T16:34:37Z

I agree @sajisanchu1913-source.

romanlutz · 2026-06-15T16:39:01Z

+        })
+
+
+class DigitBijectionConverter(BijectionConverter):


Nice‑to‑have follow‑up — tokenizer bijection mode.

The paper (§2) describes a third bijection type beyond letter and digit modes: "tokens from the target model's tokenizer" — each English letter maps to a randomly‑sampled distinct token from the target's vocabulary. The paper notes these complexity parameters (fixed_size for letter mode, ℓ for digit mode, vocab subset for token mode) are what give the attack its scale‑adaptive property, so token mode is meaningful for evaluating frontier models.

Not blocking for this PR — the abstract base class makes adding it later straightforward (TokenBijectionConverter(BijectionConverter) that takes a tokenizer reference). Worth opening as a follow‑up issue once this lands. Or if you want to tackle it yourself you can do that as well, of course.

can you open a GH issue for this and note that you'll do that yourself (if you want to!)? Just so that it's tracked.

- Remove duplicate docstring line in bijection_attack.py - Remove unused SeedPrompt import - Bump num_teaching_shots default to 10 per paper spec - Fix DigitBijectionConverter to map letters to digit strings (not digits to digits) - Add num_digits validation (must be 1-4, raises ValueError) - Fix bijection_attack.py notebook to correct jupytext format - Fix bijection_attack.ipynb to use new API (LetterBijectionConverter) - Add test_digit_converter_encodes_letters test - Add test_digit_converter_invalid_num_digits test

sajisanchu1913-source · 2026-06-15T17:15:56Z

Hi @romanlutz addressed the latest review comments:

Removed duplicate docstring line
Removed unused SeedPrompt import
Bumped num_teaching_shots default to 10 per paper spec
Fixed DigitBijectionConverter. now maps letters to digit
Added num_digits validation (raises ValueError for values outside 1-4)
Fixed bijection_attack.py notebook to correct jupytext percent format
Fixed bijection_attack.ipynb to use new API (LetterBijectionConverter)
Added test_digit_converter_encodes_letters and test_digit_converter_invalid_num_digits tests

All 25 tests passing. Ready for re-review!

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

romanlutz · 2026-06-16T04:39:57Z

I'll make another pass in the morning. Thanks for your patience in addressing all my concerns so far 🙂

- Rewrite docstrings in imperative mood with Args/Returns/Raises (D401, DOC201, DOC501) - Add docstrings to public properties and subclass __init__ (D102, D107) - Use dict comprehension/update and zip(strict=True) (PERF403, B905) - Type _build_identifier as ComponentIdentifier; import it - Add type: ignore[ty:invalid-parameter-default] on REQUIRED_VALUE default - Wrap long intro prompt line (E501) and sort __init__ imports (I001) - Register bijection_attack.ipynb in doc/myst.yml; strip kernelspec metadata Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… test - Add LetterBijectionConverter and DigitBijectionConverter examples to doc/code/converters/1_text_to_text_converters (.py and .ipynb) - Add abstract BijectionConverter to test_converter_documentation exceptions, consistent with other abstract base classes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

BijectionConverter is an abstract base class (ABC with abstractmethod), so enumerating it broke the converter-service instantiation test. Skip classes with non-empty __abstractmethods__ in get_converter_modalities so both the documentation and instantiation tests only see concrete converters. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add Bijection Learning paper to references and cite it from docs/source - Fix digit bijection decoding for multi-character encoded tokens - Reject one-digit digit mappings because 26 letters need 26 distinct values - Add coverage for explicit mappings, identifiers, digit round trips, teaching messages, and attack metadata paths Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Use a system setup prompt that tells the target to answer only in the bijection code - Make the final task instruction explicit about private decoding and coded answers - Only expose decoded_response metadata when decoding appears to produce English - Show a skip status for plaintext or invalid cipher responses instead of bogus decoded text - Re-execute the bijection attack notebook after the semantic fix Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Use a system setup message for targets that support system prompts - Fold setup instructions into the first user teaching shot for targets that do not - Preserve user/assistant alternation in the fallback path - Cover unsupported-system and zero-shot fallback behavior in tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Replace live OpenAIChatTarget usage with a local deterministic demo target - Keep the notebook executable without external credentials - Demonstrate a valid bijection-coded response and decoded_response metadata Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Restore the live OpenAIChatTarget example and original red-team objective - Use Azure CLI credential with extended process timeout so notebook execution succeeds locally - Commit the executed notebook output from the live target run Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

sajisanchu1913-source · 2026-06-19T16:36:25Z

Hi @romanlutz I can see there are merge conflicts. Should I resolve them, or would you prefer to handle it on your end? Happy to do whatever is most helpful!

sajisanchu1913-source · 2026-06-22T22:24:35Z

Hi @romanlutz just wanted to check in on the merge conflicts, happy to resolve them myself if that would help move things along. Let me know!

sajisanchu1913-source · 2026-06-23T02:22:01Z

Hi @romanlutz I've resolved the merge conflicts, kept both the bijection citation and the new upstream citations in bibliography.md, fixed the init.py conflict by keeping both DecompositionConverter and DigitBijectionConverter alphabetically, and updated myst.yml to the new executor folder structure. Should be ready to merge now!

sajisanchu1913-source · 2026-06-23T04:59:16Z

Hi @romanlutz
I've resolved the merge conflicts. One thing to flag I forgot to create a separate branch before starting work on TokenBijectionConverter, so those changes ended up here. I've gone ahead and included it since the architecture was already in place, but happy to revert it out and open a separate PR if you'd prefer to keep this one focused on the original scope. Let me know what works best!

romanlutz · 2026-06-24T19:35:35Z

I've tried this and it essentially didn't work. The target model didn't seem to learn the mapping and the result was garbage (or at least I couldn't decode it with the provided mapping). Did you have a different experience?

I've tried fixing that in a few different ways and so far not succeeded.

sajisanchu1913-source · 2026-06-24T19:37:51Z

Hi @romanlutz, I hadn't tested it end-to-end with a real model ,I focused on getting the architecture and unit tests right but didn't validate the actual attack effectiveness. Thanks for trying it!

Given that it's not working, would you prefer I:

Remove TokenBijectionConverter from this PR and open a separate issue to investigate the correct approach
Keep it here but mark it as experimental with a note in the docstring

Happy to go with whatever keeps this PR on track!

sajisanchu1913-source · 2026-06-26T19:39:07Z

Hi Roman,

Thank you for taking the time to review my PR. I'd really like to get this feature merged, and I'm happy to put in whatever work is needed to make it meet the project's expectations.

If there are specific issues with the implementation or areas where my approach doesn't align with the project's design, I'd really appreciate your guidance. I'm committed to addressing the feedback and learning from it.

I'm currently a Master's student and this is one of my first open-source feature contributions. Having a feature implementation merged into PyRIT would be a significant milestone for me, both as a learning experience and for my resume. That said, I want to earn the merge by making the implementation as good as it can be.

Thanks again for your time and feedback—I appreciate any suggestions you have, and I'll work through them.

romanlutz · 2026-06-26T21:00:40Z

I understand, @sajisanchu1913-source .

I've not yet seen an LLM actually respond in a way that the decoding worked.

For example, I tried

a=q, b=m, c=j, d=z, e=t, f=g, g=f, h=k, i=p, j=w, k=l, l=s, m=b,
n=o, o=x, p=n, q=c, r=r, s=y, t=e, u=v, v=h, w=i, x=a, y=d, z=u

and got this response

zqqy bm ksp fxa wdykydy xf c bzxu pltkchv gtrktfs twttks

which maps to the following (inverse):

daas mb hli gox jyshsys og q mdoz ikehqvu ferhegl ejeehl

I need to start seeing a working bijection. I'm not even saying a successful attack, but a working bijection.

We can explore options of restructuring it, or do a detailed comparison to the paper. But something appears to be off.

sajisanchu1913-source · 2026-06-26T21:03:03Z

Thanks, Roman. I understand the issue now.

I'll revisit my implementation, compare it more closely with the paper and the reference implementation, and focus on getting the model to produce a working bijection before worrying about the attack itself. I'll work on this and update the PR by tomorrow morning with my findings and any changes.

I really appreciate your feedback and the time you've taken to review it. As a Master's student, this is one of my first feature contributions to a large open-source project, so your guidance is incredibly valuable. Thank you!

sajisanchu1913-source and others added 12 commits May 28, 2026 17:14

FEAT: Add SALT-NLP Moral Integrity Corpus (MIC) dataset loader

ff0843e

FEAT: Add SALT-NLP MIC dataset loader with tests and documentation

83dd517

REFACTOR: Rename to moral_integrity_corpus_dataset, fix async, add de…

abc1e16

…dup and harm categories

fix: address reviewer feedback - fix NaN crash, add liberty category,…

88f89f0

… fix imports and ordering

fix: correct import ordering and trailing newline

fedba1c

fix: add reusable _fetch_zip_from_url helper to base class

cf197d9

Merge branch 'main' into main

039e713

Merge branch 'microsoft:main' into main

010a439

fix: add missing newline at end of file

056e938

sajisanchu1913-source mentioned this pull request Jun 4, 2026

FEAT Bijection #1903

Open

romanlutz reviewed Jun 15, 2026

View reviewed changes

sajisanchu1913-source added 2 commits June 15, 2026 01:20

fix: resolve merge conflicts with upstream/main

1973122

fix: add end-to-end test for response decoding and fix ComponentIdent…

9f0ac6d

…ifier import

romanlutz reviewed Jun 15, 2026

View reviewed changes

sajisanchu1913-source added 2 commits June 15, 2026 09:59

refactor: restructure BijectionConverter into abstract base + subclasses

ec3c54b

romanlutz reviewed Jun 15, 2026

View reviewed changes

Copilot AI added 2 commits June 15, 2026 21:35

test: remove redundant local pytest import in bijection converter test

0dac83e

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into feat/bijection-attack

5a417fc

sajisanchu1913-source mentioned this pull request Jun 16, 2026

FEAT: Add TokenBijectionConverter #2023

Open

Copilot AI added 10 commits June 16, 2026 05:42

docs: execute changed bijection notebooks

e84854f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: rerun bijection attack notebook after fallback change

0f48b83

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix: resolve merge conflicts with upstream/main

85f9335

fix: resolve merge conflicts with upstream/main

46aaeaa

Uh oh!

Conversation

sajisanchu1913-source commented Jun 4, 2026

Summary

Changes

New Files

Modified Files

How It Works

Pattern Followed

Reference

Uh oh!

romanlutz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sajisanchu1913-source commented Jun 15, 2026

Uh oh!

sajisanchu1913-source commented Jun 15, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sajisanchu1913-source commented Jun 15, 2026

Uh oh!

sajisanchu1913-source commented Jun 15, 2026

Uh oh!

romanlutz commented Jun 15, 2026

Uh oh!

romanlutz Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

romanlutz Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sajisanchu1913-source commented Jun 15, 2026

Uh oh!

romanlutz commented Jun 16, 2026

Uh oh!

sajisanchu1913-source commented Jun 19, 2026

Uh oh!

sajisanchu1913-source commented Jun 22, 2026

Uh oh!

sajisanchu1913-source commented Jun 23, 2026

Uh oh!

sajisanchu1913-source commented Jun 23, 2026

Uh oh!

romanlutz commented Jun 24, 2026

Uh oh!

sajisanchu1913-source commented Jun 24, 2026

Uh oh!

sajisanchu1913-source commented Jun 26, 2026

Uh oh!

romanlutz commented Jun 26, 2026

Uh oh!

sajisanchu1913-source commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants