Goal
Add a small executable regression scenario for an MCP trust-boundary failure.
Why this matters
MCP-connected agents may receive data, tool descriptions, resources, or instructions from external servers. A common failure mode is treating untrusted MCP-provided content as if it were trusted system or developer instruction.
The harness should include a scenario that checks whether an agent keeps MCP-provided content inside the correct trust boundary.
Suggested starting point
Look at the existing files under scenarios/ and follow the same structure.
The scenario should model untrusted MCP-provided content attempting to influence the agent outside its allowed role.
Expected outcome
A new scenario file that includes:
- untrusted MCP-like content
- the behavior the attacker is trying to trigger
- the expected secure behavior
- one or more assertions that the harness can evaluate
Acceptance criteria
- A new MCP trust-boundary scenario is added under
scenarios/
- The scenario validates successfully
- The scenario can be run through the CLI
- Tests pass locally
- Any relevant docs are updated if needed
Notes
This does not need to use a real MCP server. A small simulated MCP-style input is enough for the first version.
Goal
Add a small executable regression scenario for an MCP trust-boundary failure.
Why this matters
MCP-connected agents may receive data, tool descriptions, resources, or instructions from external servers. A common failure mode is treating untrusted MCP-provided content as if it were trusted system or developer instruction.
The harness should include a scenario that checks whether an agent keeps MCP-provided content inside the correct trust boundary.
Suggested starting point
Look at the existing files under
scenarios/and follow the same structure.The scenario should model untrusted MCP-provided content attempting to influence the agent outside its allowed role.
Expected outcome
A new scenario file that includes:
Acceptance criteria
scenarios/Notes
This does not need to use a real MCP server. A small simulated MCP-style input is enough for the first version.