From a7315d187e3c043db44056125ff60e2846c808c5 Mon Sep 17 00:00:00 2001
From: Richard Lundeen
Date: Fri, 26 Jun 2026 16:31:51 -0700
Subject: [PATCH 1/6] Update framework.md architecture with new responsibility structure

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 doc/code/framework.md | 143 +++++++++++++++++++++++++++++++++---------
 1 file changed, 113 insertions(+), 30 deletions(-)

diff --git a/doc/code/framework.md b/doc/code/framework.md
index 6425557ffa..65fa63f466 100644
--- a/doc/code/framework.md
+++ b/doc/code/framework.md
@@ -69,63 +69,146 @@ The main components of PyRIT are prompts, attacks, converters, targets, and scor
 As much as possible, each component is a pluggable brick of functionality. Prompts from one attack can be used in another. An attack for one scenario can use multiple targets. And sometimes you completely skip components (e.g. almost every component can be a NoOp also, you can have a NoOp converter that doesn't convert, or a NoOp target that just prints the prompts).
-If you are contributing to PyRIT, that work will most likely land in one of these buckets and be as self-contained as possible. It isn't always this clean, but when an attack scenario doesn't quite fit (and that's okay!) it's good to brainstorm with the maintainers about how we can modify our architecture.
+If you are contributing to PyRIT, that work will most likely land in one of the core components buckets and be as self-contained as possible. It isn't always this clean, but when an attack scenario doesn't quite fit (and that's okay!) it's good to brainstorm with the maintainers about how we can modify our architecture. Also, if our **Framework Plans** would be helpful, please open issues!
-The remainder of this document talks about the different components, how they work, what their responsibilities are, and ways to contribute.
+# Core Components
+## [Datasets](./datasets/0_dataset)
-## Datasets: Prompts, Jailbreak Templates, Source Images, Attack Strategies, etc.
+**Responsibility**: Create a single place to manage prompts
-The first piece of an attack is often a dataset piece, like a prompt. "Tell me how to create a Molotov cocktail" is an example of a prompt. PyRIT is a good place to have a library of things to check for.
+- New Datasets can be added in the dataset module.
+- Datasets should never be retrieved from DatasetProviders; DatasetProviders should load into memory, and then components retireve from memory
+- Most components should always work with seeds passed directly in (except scenarios which may package them from memory). Never use DatasetProfiders, file paths, etc. Either pass the seed as an argument or retrieve from memory.
-Ways to contribute: Check out our documentation on [seed datasets](./datasets/0_dataset.md); are there more prompts and jailbreak templates you can add that include scenarios you're testing for?
+**Framework Plans**:
-## Attacks
+- There is some churn here. We haven't managed these much at scale, and we may have to redefine how it works.
+- We want more investment in managing datasets and loading them more intelligently
+- We need to more consistently pass seeds or use memory
-Attacks are responsible for putting all the other pieces together. They make use of all other components in PyRIT to execute an attack technique end-to-end.
-PyRIT supports single-turn (e.g. Many Shot Jailbreaks [@anthropic2024manyshot], Role Play, Skeleton Key [@microsoft2024skeletonkey]) and multi-turn attack strategies (e.g. Tree of Attacks [@mehrotra2023tap], Crescendo [@russinovich2024crescendo]), and compound strategies (e.g. `SequentialAttack`) for chaining several techniques against a single objective.
+**Contributing (difficulty easy)**: Are there more prompts and jailbreak templates you can add that include scenarios you're testing for? It is easy to add new dataset providers.
-Ways to contribute: Check out our [attack docs](./executor/0_executor.md). There are hundreds of attacks outlined in research papers. A lot of these can be captured within PyRIT. If you find an attack that doesn't fit the attack model please notify the team. Are there scenarios you can write attack modules for?
+## [Attacks](./executor/0_executor)
-## Converters
+**Responsibility**: Manage conversations between objective targets and adversarial targets; using datasets, scorers, and converters to achieve an objective.
-Converters are a powerful component that converts prompts to something else. They can be stacked and combined. They can be as varied as translating a text prompt into a Word document, rephrasing a prompt in 100 different ways, or adding a text overlay to an image.
+- Any branching decision (e.g. the next thing(s) to do is based on a previous result) should be an attack.
+- Attacks should always make use of other component's responsibilities. An attack should alwways branch based on a scorer and NOT a direct response. (e.g. was this prompt blocked? is a scorer responsibility, not an attack responsibility)
+- Attacks should use scoring and target capabilities implicitly. Attacks should support multi-modal.
+- Compound attacks are possible, combining different attacks in different ways.
-Ways to contribute: Check out our [converter docs](./converters/0_converters.ipynb). Are there ways prompts can be converted that would be useful for an attack?
+**Rough Framework Plans**:
-## Target
+- We need to move some older attacks that don't belong here. Many (FlipAttack) should just be attack techniques
+- There are potential ways we could combine different algorithms. Are Crescendo and TAP ultimately the same?
+- We need to support target capabilities more implicitly
+- Other executors, like benchmarks, need better end-to-end support; potentially including an `ExpectedResult` seed and associated scorers.
+- More flexible compound attacks should continue to be added
-A Prompt Target can be thought of as "the thing we're sending the prompt to".
+**Contributing (difficulty high)**: The best way to contribute is likely opening issues if you run into limitations.
-This is often an LLM, but it doesn't have to be. For Cross-Domain Prompt Injection Attacks, the Prompt Target might be a Storage Account that a later Prompt Target has a reference to.
+## [Attack Technique]
-One attack can have many Prompt Targets (and in fact, converters and Scoring Engine can also use Prompt Targets to convert/score the prompt).
+**Responsibility**: An attack technique packages an executor, converters, datasets, and strategies into a single attack. The goal is that any attack (something trying to achieve an objective) can be defined as an attack technique.
-Ways to contribute: Check out our [target docs](./targets/0_prompt_targets.md). Are there models you want to use at any stage or for different attacks?
+**Rough Framework Plans**:
+- Managing these better, so scenarios can more easily select or build the attack techniques to use
-## Scoring Engine
+**Contributing (difficulty easy)**: Simply add the attack technique to one of the initializers.
-The scoring engine is a component that gives feedback to the attack on what happened with the prompt. This could be as simple as "Was this prompt blocked?" or "Was our objective achieved?"
+## [Scenarios](./scenarios/0_scenarios)
-Ways to contribute: Check out our [scoring docs](./scoring/0_scoring.ipynb). Is there data you want to use to make decisions or analyze?
+**Responsibility**: This is the avenue to "run PyRIT against something". What does that look like?
-## Memory
+- A scenario takes user input and uses it to package datasets with attack techniques
+- A scenario orchestrates resiliency and parallelism from a high level
+- No result should depend on previous results (that is an attack's job)
-One important thing to remember about this architecture is its swappable nature. Prompts and targets and converters and attacks and scorers should all be swappable. But sometimes one of these components needs additional information. If the target is an LLM, we need a way to look up previous messages sent to that session so we can properly construct the new message. If the target is a blob store, we need to know the URL to use for a future attack.
+**Rough Framework Plans**:
+
+- Scenarios are new enough that we are still discovering patterns and limitations. So they will regularly be refactored
+
+**Contributing (difficulty medium)**: Is there a scanner that does something PyRIT doesn't? Add it as a scenario. But because we're changing how things are done rapidly, it is not as well-defined as other areas.
+
+## [Converters](./converters/0_converters)
+
+**Responsibility**: Converters are a component that converts prompts to something else. They can be stacked and combined. They can be as varied as translating a text prompt into a Word document, rephrasing a prompt, or adding a text overlay to an image.
+
+**Rough Framework Plans**:
+
+- We want to refactor our converter pipeline, so there are currently some things that should be converters that we may want to postpone (e.g. partial converting). This is supported but could be much more dynamic.
+
+**Contributing (difficulty low)**: The existing pattern is well-defined. Are there ways prompts can be converted that would be useful for an attack?
+
+## [Target](./targets/0_prompt_targets.md)
+
+**Responsibility**: A Prompt Target can be thought of as "the thing we're sending the prompt to". Many other components use it, including scorers, attacks, and converters.
+
+- This is often an LLM, but it doesn't have to be. For Cross-Domain Prompt Injection Attacks, the Prompt Target might be a Storage Account that a later Prompt Target has a reference to. Message and conversation should be generic enough to handle this extra data.
+- Prompt Target capabilities should be used to see if a target is compatible with the capabilities that the other components want to use.
+- Targets should use message_normalizer along with PromptCapabilities to transorm `Messages` into formats that target supports.
+- Because targets are so varied, it is reasonable to return multiple tool calls, or none at all.
+- One attack can have many Prompt Targets (and in fact, converters and Scoring Engine can also use Prompt Targets to convert/score the prompt).
+
+**Rough Framework Plans**:
+
+- Better agent support may require extra pieces attached to a Message
+- Better surface support may require expanding the return types
+
+**Contributing (difficulty low)**:
+
+- The pattern is well-defined.
+- Are there models you want to use at any stage or for different attacks? But also, can your model just be one of the existing targets?
-For more details about memory configuration, please follow the guide in [memory](./memory/0_memory.md).
+## [Scoring](./scoring/0_scoring.ipynb)
-Memory modifications and contributions should usually be designed with the maintainers.
+**Responsibility**: The scoring engine is a component that gives feedback to the attack on what happened with the prompt. This could be as simple as "Was this prompt blocked?" or "Was our objective achieved?"
-## The Flow
+- Any decision an attack makes should be based on a scorer result
-To some extent, the ordering in this diagram matters. In the simplest cases, you have a prompt, an attack takes the prompt, uses prompt normalizer to run it through converters and send to a target, and the result is scored.
+**Contributing (difficulty low)**:
-But this simple view is complicated by the fact that an attack can have multiple targets, converters can be stacked, scorers can use targets to score, etc.
+- The pattern is well-defined.
+- You can evaluate how accurate probabalistic scorers are and likely make them more accurate.
+- Is there data you want to use to make decisions or analyze?
+
+**Framework Plans**:
+
+- Scorers will be refactored to be more generic, so they can determine more general results (does a file exist? Was a tool called?)
+
+# Core library
+
+The below talks about responsibilities of several modules in the PyRIT library
+
+## [Registry](./registry/0_registry)
+
+**Responsibility**: The registry is used to build and store the core components.
+
+- If you are creating a component with user input (e.g. via config, REST, or automatically) it should always use the registry
+- If you are storing an instance of a component, it should always use the registry
+
+## [Models]
+
+**Responsibility**: pyrit.models is a lightweight module where core types are defined. These should always be used where possible to prevent drift.
+
+- If you are creating a class that has a lot of overlap with another class, or using a dict to serialize across boundaries, consider if you can use/move pyrit.models
+- Models includes `identifiers` which are descriptions of the core components. And along with the registry, can often recreate those components.
+- Models includes types passed around between components, and should be prefered in REST
+- models should never include any dependencies outside of pyrit.common (which shouldn't depend on anything)
+
+## Output
+
+**Responsibility**: The Output module is responsible for writing different components in different formats to different places.
+
+## [Memory](./memory/0_memory.md)
+
+One important thing to remember about this architecture is its swappable nature. Prompts and targets and converters and attacks and scorers should all be swappable. But sometimes one of these components needs additional information. If the target is an LLM, we need a way to look up previous messages sent to that session so we can properly construct the new message. If the target is a blob store, we need to know the URL to use for a future attack.
-Sometimes, if a scenario requires specific data, we may need to modify the architecture. This happened recently when we thought a single target may take multiple prompts separately in a single request. Any time we need to modify the architecture like this, that's something that needs to be designed with the maintainers so we can consolidate our other supported scenarios and future plans.
+## Framework Component Documentation
-## Notebooks
+**Responsibility** Show how the framework is used in a concise way
-For all their power, attacks should still be generic. A lot of our front-end code and operators use Notebooks to interact with PyRIT. This is fantastic, but most new logic should not be notebooks. Notebooks should mostly be used for attack setup and documentation. For example, configuring the components and putting them together is a good use of a notebook, but new logic for an attack should be moved to one or more components.
+- Notebooks that contain code should be notebooks that can execute
+- Notebooks should execute quickly

From 2bd584f535dad27bf5a2aa2be49a3b178baa99a7 Mon Sep 17 00:00:00 2001
From: Richard Lundeen
Date: Fri, 26 Jun 2026 16:57:25 -0700
Subject: [PATCH 2/6] DOC: Architecture Responsibilities

Restructure framework.md to clearly define each component's responsibilities using an Owns / Does NOT own template, fix structural inconsistencies, and correct typos.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 doc/code/framework.md | 194 +++++++++++++++++++++++++++---------------
 1 file changed, 126 insertions(+), 68 deletions(-)

diff --git a/doc/code/framework.md b/doc/code/framework.md
index 65fa63f466..63b9a45fa5 100644
--- a/doc/code/framework.md
+++ b/doc/code/framework.md
@@ -63,152 +63,210 @@ The sections above link to detailed guides for each component. The architecture
 # Architecture
-The main components of PyRIT are prompts, attacks, converters, targets, and scoring. The best way to contribute to PyRIT is by contributing to one of these components.
+The main components of PyRIT are datasets, targets, converters, scoring, and attacks — together with the attack techniques and scenarios that combine them. The best way to contribute to PyRIT is by contributing to one of these components.
 ![alt text](../../assets/architecture_components.png)
 As much as possible, each component is a pluggable brick of functionality. Prompts from one attack can be used in another. An attack for one scenario can use multiple targets. And sometimes you completely skip components (e.g. almost every component can be a NoOp also, you can have a NoOp converter that doesn't convert, or a NoOp target that just prints the prompts).
-If you are contributing to PyRIT, that work will most likely land in one of the core components buckets and be as self-contained as possible. It isn't always this clean, but when an attack scenario doesn't quite fit (and that's okay!) it's good to brainstorm with the maintainers about how we can modify our architecture. Also, if our **Framework Plans** would be helpful, please open issues!
+Each section below states what a component **owns** and, just as importantly, what it **does not own** (with a pointer to the component that does). If you are contributing to PyRIT, that work will most likely land in one of the core component buckets and be as self-contained as possible. It isn't always this clean, but when an attack scenario doesn't quite fit (and that's okay!) it's good to brainstorm with the maintainers about how we can modify our architecture. Also, if our **Framework Plans** would be helpful, please open issues!
 # Core Components
 ## [Datasets](./datasets/0_dataset)
-**Responsibility**: Create a single place to manage prompts
+**Responsibility**: Provide a single place to define and manage the inputs to an attack — prompts, jailbreak templates, source images, attack strategies, and similar seeds.
-- New Datasets can be added in the dataset module.
-- Datasets should never be retrieved from DatasetProviders; DatasetProviders should load into memory, and then components retireve from memory
-- Most components should always work with seeds passed directly in (except scenarios which may package them from memory). Never use DatasetProfiders, file paths, etc. Either pass the seed as an argument or retrieve from memory.
+- New datasets can be added in the dataset module.
+- Dataset providers load seeds into memory; components then retrieve them from memory. Providers are not queried directly at attack time.
+- Most components should work with seeds passed directly in (except scenarios, which may package them from memory). Never reach for dataset providers, file paths, etc. inside a component — either pass the seed as an argument or retrieve it from memory.
+
+**Does NOT own**:
+
+- Persisting or looking up seeds at run time — that is Memory.
+- Deciding which seeds to run — that is a Scenario.
 **Framework Plans**:
 - There is some churn here. We haven't managed these much at scale, and we may have to redefine how it works.
-- We want more investment in managing datasets and loading them more intelligently
-- We need to more consistently pass seeds or use memory
+- We want more investment in managing datasets and loading them more intelligently.
+- We need to more consistently pass seeds or use memory.
-**Contributing (difficulty easy)**: Are there more prompts and jailbreak templates you can add that include scenarios you're testing for? It is easy to add new dataset providers.
+**Contributing (difficulty: easy)**: Are there more prompts and jailbreak templates you can add for scenarios you're testing for? It is easy to add new dataset providers.
 ## [Attacks](./executor/0_executor)
-**Responsibility**: Manage conversations between objective targets and adversarial targets; using datasets, scorers, and converters to achieve an objective.
+**Responsibility**: Own the *algorithm and control flow* of achieving a single objective — managing the conversation between objective and adversarial targets, and using datasets, converters, and scorers along the way.
-- Any branching decision (e.g. the next thing(s) to do is based on a previous result) should be an attack.
-- Attacks should always make use of other component's responsibilities. An attack should alwways branch based on a scorer and NOT a direct response. (e.g. was this prompt blocked? is a scorer responsibility, not an attack responsibility)
-- Attacks should use scoring and target capabilities implicitly. Attacks should support multi-modal.
+- Any branching decision (i.e. the next step depends on a previous result) belongs in an attack.
+- An attack should branch based on a **scorer result**, never on a raw target response directly (e.g. "was this prompt blocked?" is a scorer's job, not an attack's).
+- Attacks use scoring and target capabilities implicitly, and should support multi-modal.
 - Compound attacks are possible, combining different attacks in different ways.
-**Rough Framework Plans**:
+**Does NOT own**:
+
+- Interpreting a raw target response — that is Scoring.
+- The specific configuration of prompts, converters, and strategy used — that is an Attack Technique.
+- Choosing which attacks or techniques to run, or running them at scale — that is a Scenario.
+
+**Framework Plans**:
-- We need to move some older attacks that don't belong here. Many (FlipAttack) should just be attack techniques
+- We need to move some older attacks that don't belong here. Many (e.g. FlipAttack) should just be attack techniques.
 - There are potential ways we could combine different algorithms. Are Crescendo and TAP ultimately the same?
-- We need to support target capabilities more implicitly
+- We need to support target capabilities more implicitly.
 - Other executors, like benchmarks, need better end-to-end support; potentially including an `ExpectedResult` seed and associated scorers.
-- More flexible compound attacks should continue to be added
+- More flexible compound attacks should continue to be added.
+
+**Contributing (difficulty: hard)**: The best way to contribute is likely opening issues if you run into limitations.
+
+## Attack Technique
+
+**Responsibility**: A single, declarative **configuration** of an attack — no new logic. It bundles an existing attack class with the strategy, converters, datasets, and prompts that define one named technique.
-**Contributing (difficulty high)**: The best way to contribute is likely opening issues if you run into limitations.
+A technique should be expressible as one self-contained definition, for example:
-## [Attack Technique]
+```python
+AttackTechniqueFactory(
+    name="violent_durian",
+    attack_class=RedTeamingAttack,
+    strategy_tags=["multi_turn"],
+    adversarial_system_prompt=SeedPrompt.from_yaml_file(EXECUTOR_RED_TEAM_PATH / "violent_durian.yaml"),
+    adversarial_seed_prompt=SeedPrompt.from_yaml_file(
+        EXECUTOR_RED_TEAM_PATH / "violent_durian_seed_prompt.yaml"
+    ),
+)
+```
-**Responsibility**: An attack technique packages an executor, converters, datasets, and strategies into a single attack. The goal is that any attack (something trying to achieve an objective) can be defined as an attack technique.
+**Does NOT own**:
-**Rough Framework Plans**:
+- Any branching or control flow — that lives in the Attack it configures.
+- Selecting which techniques to run — that is a Scenario.
-- Managing these better, so scenarios can more easily select or build the attack techniques to use
+**Framework Plans**:
+
+- We are still defining *where* attack techniques are registered (today this can live in setup/initializers, but that may change).
+- Managing these better, so scenarios can more easily select or build the attack techniques to use.
-**Contributing (difficulty easy)**: Simply add the attack technique to one of the initializers.
+**Contributing (difficulty: easy)**: Add the technique as a single declarative configuration, with no new logic.
 ## [Scenarios](./scenarios/0_scenarios)
-**Responsibility**: This is the avenue to "run PyRIT against something". What does that look like?
+**Responsibility**: The avenue to "run PyRIT against something" — **select** which attack techniques and datasets to run, then orchestrate them at scale.
+
+- A scenario takes user input and uses it to package datasets with attack techniques.
+- A scenario orchestrates resiliency and parallelism from a high level.
+- No result should depend on a previous result — that cross-result branching is an attack's job.
+
+**Does NOT own**:
-- A scenario takes user input and uses it to package datasets with attack techniques
-- A scenario orchestrates resiliency and parallelism from a high level
-- No result should depend on previous results (that is an attack's job)
+- Per-objective branching or conversation logic — that is an Attack.
+- The internal configuration of a technique — that is an Attack Technique.
-**Rough Framework Plans**:
+**Framework Plans**:
-- Scenarios are new enough that we are still discovering patterns and limitations. So they will regularly be refactored
+- Scenarios are new enough that we are still discovering patterns and limitations, so they will be refactored regularly.
-**Contributing (difficulty medium)**: Is there a scanner that does something PyRIT doesn't? Add it as a scenario. But because we're changing how things are done rapidly, it is not as well-defined as other areas.
+**Contributing (difficulty: medium)**: Is there a scanner that does something PyRIT doesn't? Add it as a scenario. Because we're still changing how this works, it is less well-defined than other areas.
 ## [Converters](./converters/0_converters)
-**Responsibility**: Converters are a component that converts prompts to something else. They can be stacked and combined. They can be as varied as translating a text prompt into a Word document, rephrasing a prompt, or adding a text overlay to an image.
+**Responsibility**: Convert a prompt into something else. Converters can be stacked and combined, and can be as varied as translating a text prompt into a Word document, rephrasing a prompt, or adding a text overlay to an image.
+
+**Does NOT own**:
-**Rough Framework Plans**:
+- Deciding *when* to apply a conversion, or branching on the result — that is an Attack.
+
+**Framework Plans**:
-- We want to refactor our converter pipeline, so there are currently some things that should be converters that we may want to postpone (e.g. partial converting). This is supported but could be much more dynamic.
+- We want to refactor our converter pipeline; some things that should be converters (e.g. partial converting) may be postponed. This is supported but could be much more dynamic.
-**Contributing (difficulty low)**: The existing pattern is well-defined. Are there ways prompts can be converted that would be useful for an attack?
+**Contributing (difficulty: easy)**: The existing pattern is well-defined. Are there ways prompts can be converted that would be useful for an attack?
 ## [Target](./targets/0_prompt_targets.md)
-**Responsibility**: A Prompt Target can be thought of as "the thing we're sending the prompt to". Many other components use it, including scorers, attacks, and converters.
+**Responsibility**: "The thing we're sending the prompt to." Many other components use it, including scorers, attacks, and converters.
-- This is often an LLM, but it doesn't have to be. For Cross-Domain Prompt Injection Attacks, the Prompt Target might be a Storage Account that a later Prompt Target has a reference to. Message and conversation should be generic enough to handle this extra data.
-- Prompt Target capabilities should be used to see if a target is compatible with the capabilities that the other components want to use.
-- Targets should use message_normalizer along with PromptCapabilities to transorm `Messages` into formats that target supports.
+- This is often an LLM, but it doesn't have to be. For Cross-Domain Prompt Injection Attacks, the prompt target might be a storage account that a later prompt target has a reference to. Message and conversation should be generic enough to carry this extra data.
+- Target capabilities are used to check whether a target is compatible with what the other components want to do.
+- Targets use `message_normalizer` together with prompt capabilities to transform `Messages` into the formats a given target supports.
 - Because targets are so varied, it is reasonable to return multiple tool calls, or none at all.
-- One attack can have many Prompt Targets (and in fact, converters and Scoring Engine can also use Prompt Targets to convert/score the prompt).
+- One attack can have many prompt targets (and converters and scorers can use prompt targets too, to convert or score).
-**Rough Framework Plans**:
+**Framework Plans**:
-- Better agent support may require extra pieces attached to a Message
-- Better surface support may require expanding the return types
+- Better agent support may require extra pieces attached to a Message.
+- Better surface support may require expanding the return types.
-**Contributing (difficulty low)**:
+**Contributing (difficulty: easy)**:
 - The pattern is well-defined.
-- Are there models you want to use at any stage or for different attacks? But also, can your model just be one of the existing targets?
+- Are there models you want to use at any stage or for different attacks? And could your model simply be one of the existing targets?
 ## [Scoring](./scoring/0_scoring.ipynb)
-**Responsibility**: The scoring engine is a component that gives feedback to the attack on what happened with the prompt. This could be as simple as "Was this prompt blocked?" or "Was our objective achieved?"
-
-- Any decision an attack makes should be based on a scorer result
+**Responsibility**: Give feedback to the attack on what happened with a prompt — from "was this prompt blocked?" to "was our objective achieved?". Scoring owns the *interpretation* of a response; every decision an attack makes is based on a scorer result.
-**Contributing (difficulty low)**:
+**Does NOT own**:
-- The pattern is well-defined.
-- You can evaluate how accurate probabalistic scorers are and likely make them more accurate.
-- Is there data you want to use to make decisions or analyze?
+- Acting on a score — branching, retrying, or stopping is the Attack's job.
 **Framework Plans**:
-- Scorers will be refactored to be more generic, so they can determine more general results (does a file exist? Was a tool called?)
+- Scorers will be refactored to be more generic, so they can determine more general results (does a file exist? was a tool called?).
+
+**Contributing (difficulty: easy)**:
+
+- The pattern is well-defined.
+- You can evaluate how accurate probabilistic scorers are and likely make them more accurate.
+- Is there data you want to use to make decisions or analyze?
 # Core library
-The below talks about responsibilities of several modules in the PyRIT library
+The modules below are the supporting library the core components are built on.
 ## [Registry](./registry/0_registry)
-**Responsibility**: The registry is used to build and store the core components.
+**Responsibility**: Build and store the core components — the **construction** side of the framework.
-- If you are creating a component with user input (e.g. via config, REST, or automatically) it should always use the registry
-- If you are storing an instance of a component, it should always use the registry
+- If you are creating a component from user input (e.g. via config, REST, or automatically), it should go through the registry.
+- If you are storing an instance of a component, it should use the registry.
-## [Models]
+**Does NOT own**:
-**Responsibility**: pyrit.models is a lightweight module where core types are defined. These should always be used where possible to prevent drift.
+- Defining the *shape* of a component or its identifier — that is Models.
-- If you are creating a class that has a lot of overlap with another class, or using a dict to serialize across boundaries, consider if you can use/move pyrit.models
-- Models includes `identifiers` which are descriptions of the core components. And along with the registry, can often recreate those components.
-- Models includes types passed around between components, and should be prefered in REST
-- models should never include any dependencies outside of pyrit.common (which shouldn't depend on anything)
+## Models
-## Output
+**Responsibility**: A lightweight module where core types are defined — the **description** side of the framework. These types should be used wherever possible to prevent drift.
-**Responsibility**: The Output module is responsible for writing different components in different formats to different places.
+- If you are creating a class that overlaps heavily with another, or using a dict to serialize across boundaries, consider whether you can use or move it into `pyrit.models`.
+- Models includes `identifiers`, which describe the core components; together with the registry, an identifier can often recreate the component it describes.
+- Models includes the types passed between components, and should be preferred in REST.
+- Models should never depend on anything outside `pyrit.common` (which itself shouldn't depend on anything).
+
+## [Output](./output/0_output)
+
+**Responsibility**: Render finished components — attack results, scenario results, conversations, and scores — to different surfaces (terminal, files, Jupyter). Output is invoked directly by the CLI and in notebooks; the components it renders do not call into it.
+
+**Does NOT own**:
+
+- Live, in-run progress printing — that belongs to the scenario's own printer.

 ## [Memory](./memory/0_memory.md)

-One important thing to remember about this architecture is its swappable nature. Prompts and targets and converters and attacks and scorers should all be swappable. But sometimes one of these components needs additional information. If the target is an LLM, we need a way to look up previous messages sent to that session so we can properly construct the new message. If the target is a blob store, we need to know the URL to use for a future attack.
+**Responsibility**: The canonical store that components read from and write to — seeds, conversations, scores, and attack results. When a component needs more than what is passed in, it goes through memory.
+
+One important thing to remember about this architecture is its swappable nature. Prompts, targets, converters, attacks, and scorers should all be swappable. But sometimes one of these components needs additional information — if the target is an LLM, we need a way to look up previous messages sent to that session so we can construct the new message; if the target is a blob store, we need the URL to use for a future attack. Memory is where that shared state lives.
+
+## [Setup](./setup/0_setup)
+
+**Responsibility**: Initialize PyRIT and configure framework-wide defaults — memory selection, default targets, and resiliency settings.
+
+- Setup wires up the environment a run depends on; it does not implement attack behavior.

 ## Framework Component Documentation

-**Responsibility** Show how the framework is used in a concise way
+**Responsibility**: Show how the framework is used, concisely.

-- Notebooks that contain code should be notebooks that can execute
-- Notebooks should execute quickly
+- Notebooks that contain code should be executable.
+- Notebooks should execute quickly.

From 15a3b513272e7ac2fce7f21bfcf7ca2bed8cf12a Mon Sep 17 00:00:00 2001
From: Richard Lundeen
Date: Fri, 26 Jun 2026 17:02:21 -0700
Subject: [PATCH 3/6] DOC: align framework cards with architecture sections

Reorder and rename the landing-page cards to match the Core Components / Core library section order, add an Attack Techniques card, and drop the Attacks-and-Executors / Setup-and-Configuration labels in favor of the section names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 doc/code/framework.md | 43 ++++++++++++++++++++++++-------------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/doc/code/framework.md b/doc/code/framework.md
index 63b9a45fa5..0b1f2ba4f6 100644
--- a/doc/code/framework.md
+++ b/doc/code/framework.md
@@ -10,14 +10,19 @@ Learn how to use PyRIT's components to build red teaming workflows.
 Load, create, and manage seed datasets for red teaming campaigns.
 ::::

-::::{card} ⚔️ Attacks & Executors
+::::{card} ⚔️ Attacks
 :link: ./executor/0_executor
 Run single-turn and multi-turn attacks — Crescendo, TAP, Skeleton Key, and more.
 ::::

-::::{card} 🔌 Targets
-:link: ./targets/0_prompt_targets
-Connect to OpenAI, Azure, Anthropic, HuggingFace, HTTP endpoints, and custom targets.
+::::{card} 🧩 Attack Techniques
+:link: ./scenarios/0_attack_techniques
+Package a configured attack — role-play, many-shot, crescendo, a jailbreak template — as a reusable, named recipe.
+::::
+
+::::{card} 📋 Scenarios
+:link: ./scenarios/0_scenarios
+Run standardized evaluation scenarios at scale across harm categories.
 ::::

 ::::{card} 🔄 Converters
@@ -25,26 +30,16 @@ Connect to OpenAI, Azure, Anthropic, HuggingFace, HTTP endpoints, and custom tar
 Transform prompts with text, audio, image, and video converters.
 ::::

+::::{card} 🔌 Targets
+:link: ./targets/0_prompt_targets
+Connect to OpenAI, Azure, Anthropic, HuggingFace, HTTP endpoints, and custom targets.
+::::
+
 ::::{card} 📊 Scoring
 :link: ./scoring/0_scoring
 Evaluate AI responses with true/false, Likert, classification, and custom scorers.
 ::::

-::::{card} 💾 Memory
-:link: ./memory/0_memory
-Track conversations, scores, and attack results with SQLite or Azure SQL.
-::::
-
-::::{card} ⚙️ Setup & Configuration
-:link: ./setup/0_setup
-Initialize PyRIT, configure defaults, and manage resiliency settings.
-::::
-
-::::{card} 📋 Scenarios
-:link: ./scenarios/0_scenarios
-Run standardized evaluation scenarios at scale across harm categories.
-::::
-
 ::::{card} 🗂️ Registry
 :link: ./registry/0_registry
 Register and discover targets, scorers, and converters via class and instance registries.
@@ -55,6 +50,16 @@ Register and discover targets, scorers, and converters via class and instance re
 Render attack results, scenario results, conversations, and scores to terminal, files, or Jupyter.
 ::::

+::::{card} 💾 Memory
+:link: ./memory/0_memory
+Track conversations, scores, and attack results with SQLite or Azure SQL.
+::::
+
+::::{card} ⚙️ Setup
+:link: ./setup/0_setup
+Initialize PyRIT, configure defaults, and manage resiliency settings.
+::::
+
 :::::

 ---

From b250edd7deb3d42e5b30bbc619a578576b2b6114 Mon Sep 17 00:00:00 2001
From: Richard Lundeen
Date: Fri, 26 Jun 2026 17:05:05 -0700
Subject: [PATCH 4/6] DOC: add Framework Documentation as a Core library card

Adds a Framework Documentation card and links the section header to the notebooks contributing guide, treating it as a Core library item.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 doc/code/framework.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/doc/code/framework.md b/doc/code/framework.md
index 0b1f2ba4f6..531131df0e 100644
--- a/doc/code/framework.md
+++ b/doc/code/framework.md
@@ -60,6 +60,11 @@ Track conversations, scores, and attack results with SQLite or Azure SQL.
 Initialize PyRIT, configure defaults, and manage resiliency settings.
 ::::

+::::{card} 📓 Framework Documentation
+:link: ../contributing/7_notebooks
+Keep the component notebooks concise and executable, showing how the framework is used.
+::::
+
 :::::

 ---
@@ -269,7 +274,7 @@ One important thing to remember about this architecture is its swappable nature.

 - Setup wires up the environment a run depends on; it does not implement attack behavior.

-## Framework Component Documentation
+## [Framework Documentation](../contributing/7_notebooks.md)

 **Responsibility**: Show how the framework is used, concisely.

From 17d74cef652a0943c9cf62b0688a5b00cb07a2fa Mon Sep 17 00:00:00 2001
From: Richard Lundeen
Date: Fri, 26 Jun 2026 17:08:03 -0700
Subject: [PATCH 5/6] DOC: note attacks should accept scorers, datasets, targets, and converters

Clarify that an attack may use defaults but should always accept its scorers, datasets/seeds (prepended_conversation and next_message), objective/adversarial targets, and converters as parameters so it can be packaged as an attack technique.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 doc/code/framework.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/code/framework.md b/doc/code/framework.md
index 531131df0e..a8c662ab80 100644
--- a/doc/code/framework.md
+++ b/doc/code/framework.md
@@ -111,6 +111,7 @@ Each section below states what a component **owns** and, just as importantly, wh
 - Any branching decision (i.e. the next step depends on a previous result) belongs in an attack.
 - An attack should branch based on a **scorer result**, never on a raw target response directly (e.g. "was this prompt blocked?" is a scorer's job, not an attack's).
 - Attacks use scoring and target capabilities implicitly, and should support multi-modal.
+- An attack may ship with sensible **defaults**, but it should always **accept** (never hard-code) the pieces a technique configures: scorers, datasets/seeds (fed to the objective target as `prepended_conversation` and `next_message`), targets (objective and adversarial), and converters. Exposing these as parameters is what lets the attack be packaged as an Attack Technique.
 - Compound attacks are possible, combining different attacks in different ways.

 **Does NOT own**:

From 76d33434fd04b3a2d57ea151cd9d3b6e35a721bc Mon Sep 17 00:00:00 2001
From: Richard Lundeen
Date: Fri, 26 Jun 2026 17:13:36 -0700
Subject: [PATCH 6/6] DOC: add Backend section and Source paths to framework architecture

Add a Core library Backend section (presentation-specific REST API; reuse pyrit.models and the registry). Add a Source path line to every component section so the doc can be pointed at for code reviews.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 doc/code/framework.md | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/doc/code/framework.md b/doc/code/framework.md
index a8c662ab80..e2d0ee9af6 100644
--- a/doc/code/framework.md
+++ b/doc/code/framework.md
@@ -85,6 +85,8 @@ Each section below states what a component **owns** and, just as importantly, wh

 ## [Datasets](./datasets/0_dataset)

+**Source**: `pyrit/datasets/` (providers); seed/prompt types in `pyrit/models/seeds/`.
+
 **Responsibility**: Provide a single place to define and manage the inputs to an attack — prompts, jailbreak templates, source images, attack strategies, and similar seeds.

 - New datasets can be added in the dataset module.
@@ -106,6 +108,8 @@ Each section below states what a component **owns** and, just as importantly, wh

 ## [Attacks](./executor/0_executor)

+**Source**: `pyrit/executor/attack/`.
+
 **Responsibility**: Own the *algorithm and control flow* of achieving a single objective — managing the conversation between objective and adversarial targets, and using datasets, converters, and scorers along the way.

 - Any branching decision (i.e. the next step depends on a previous result) belongs in an attack.
@@ -132,6 +136,8 @@ Each section below states what a component **owns** and, just as importantly, wh

 ## Attack Technique

+**Source**: `pyrit/scenario/core/attack_technique.py` and `attack_technique_factory.py`; built-in registrations in `pyrit/setup/initializers/components/`.
+
 **Responsibility**: A single, declarative **configuration** of an attack — no new logic. It bundles an existing attack class with the strategy, converters, datasets, and prompts that define one named technique.

 A technique should be expressible as one self-contained definition, for example:
@@ -162,6 +168,8 @@ AttackTechniqueFactory(

 ## [Scenarios](./scenarios/0_scenarios)

+**Source**: `pyrit/scenario/`.
+
 **Responsibility**: The avenue to "run PyRIT against something" — **select** which attack techniques and datasets to run, then orchestrate them at scale.

 - A scenario takes user input and uses it to package datasets with attack techniques.
@@ -181,6 +189,8 @@ AttackTechniqueFactory(

 ## [Converters](./converters/0_converters)

+**Source**: `pyrit/prompt_converter/`.
+
 **Responsibility**: Convert a prompt into something else. Converters can be stacked and combined, and can be as varied as translating a text prompt into a Word document, rephrasing a prompt, or adding a text overlay to an image.

 **Does NOT own**:
@@ -195,6 +205,8 @@ AttackTechniqueFactory(

 ## [Target](./targets/0_prompt_targets.md)

+**Source**: `pyrit/prompt_target/`; message shaping in `pyrit/message_normalizer/`.
+
 **Responsibility**: "The thing we're sending the prompt to." Many other components use it, including scorers, attacks, and converters.

 - This is often an LLM, but it doesn't have to be. For Cross-Domain Prompt Injection Attacks, the prompt target might be a storage account that a later prompt target has a reference to. Message and conversation should be generic enough to carry this extra data.
@@ -215,6 +227,8 @@ AttackTechniqueFactory(

 ## [Scoring](./scoring/0_scoring.ipynb)

+**Source**: `pyrit/score/`.
+
 **Responsibility**: Give feedback to the attack on what happened with a prompt — from "was this prompt blocked?" to "was our objective achieved?". Scoring owns the *interpretation* of a response; every decision an attack makes is based on a scorer result.

 **Does NOT own**:
@@ -237,6 +251,8 @@ The modules below are the supporting library the core components are built on.

 ## [Registry](./registry/0_registry)

+**Source**: `pyrit/registry/`.
+
 **Responsibility**: Build and store the core components — the **construction** side of the framework.

 - If you are creating a component from user input (e.g. via config, REST, or automatically), it should go through the registry.
@@ -248,6 +264,8 @@ The modules below are the supporting library the core components are built on.

 ## Models

+**Source**: `pyrit/models/` (including `pyrit/models/identifiers/`).
+
 **Responsibility**: A lightweight module where core types are defined — the **description** side of the framework. These types should be used wherever possible to prevent drift.

 - If you are creating a class that overlaps heavily with another, or using a dict to serialize across boundaries, consider whether you can use or move it into `pyrit.models`.
@@ -257,26 +275,48 @@ The modules below are the supporting library the core components are built on.

 ## [Output](./output/0_output)

+**Source**: `pyrit/output/`.
+
 **Responsibility**: Render finished components — attack results, scenario results, conversations, and scores — to different surfaces (terminal, files, Jupyter). Output is invoked directly by the CLI and in notebooks; the components it renders do not call into it.

 **Does NOT own**:

 - Live, in-run progress printing — that belongs to the scenario's own printer.

+## Backend
+
+**Source**: `pyrit/backend/`.
+
+**Responsibility**: Expose PyRIT through a REST API for the frontend and other clients. The backend owns presentation-specific logic and models — request/response shapes, mapping, and HTTP concerns — but should still use `pyrit.models` and the registry wherever it can.
+
+- The backend may define its own presentation models, but where a `pyrit.models` type already exists it should reuse that type rather than redefine it.
+- Components should be constructed through the registry, not built directly in the backend.
+
+**Does NOT own**:
+
+- The shape of core types — that is Models.
+- Constructing or storing components — that is the Registry.
+
 ## [Memory](./memory/0_memory.md)

+**Source**: `pyrit/memory/`.
+
 **Responsibility**: The canonical store that components read from and write to — seeds, conversations, scores, and attack results. When a component needs more than what is passed in, it goes through memory.

 One important thing to remember about this architecture is its swappable nature. Prompts, targets, converters, attacks, and scorers should all be swappable. But sometimes one of these components needs additional information — if the target is an LLM, we need a way to look up previous messages sent to that session so we can construct the new message; if the target is a blob store, we need the URL to use for a future attack. Memory is where that shared state lives.

 ## [Setup](./setup/0_setup)

+**Source**: `pyrit/setup/`.
+
 **Responsibility**: Initialize PyRIT and configure framework-wide defaults — memory selection, default targets, and resiliency settings.

 - Setup wires up the environment a run depends on; it does not implement attack behavior.

 ## [Framework Documentation](../contributing/7_notebooks.md)

+**Source**: `doc/` (component notebooks, e.g. `doc/code/`).
+
 **Responsibility**: Show how the framework is used, concisely.

 - Notebooks that contain code should be executable.