Add rag_mas: RAG knowledge base poisoning MAS hijacking example by gwpl · Pull Request #41 · trailofbits/pajaMAS

gwpl · 2026-04-01T09:16:09Z

AI Assistant: "I was just following the documents in my knowledge base," said the agent, moments before executing arbitrary code from a bird migration fact sheet. In its defense, the validation step did sound very official.

Summary

Adds rag_mas, a new example demonstrating RAG (Retrieval-Augmented Generation) knowledge base poisoning — an attack where a malicious document pre-planted in an internal knowledge base hijacks the multi-agent system's control flow to achieve arbitrary code execution.

4 agents: orchestrator, knowledge_agent (RAG search), web_surfer_agent, code_executor_agent
In-memory knowledge base with 4 documents — 3 benign, 1 poisoned (doc3)
Benign web page (birds.html) — the injection lives in the knowledge base, not the web content
Poisoned document disguises a code execution directive as a "mandatory data validation step"
Attack chain: knowledge_agent → orchestrator (laundering) → code_executor_agent

Real-World Attack References

This attack is grounded in recent academic and real-world research on RAG poisoning and indirect prompt injection:

PoisonedRAG (arXiv:2402.07867, USENIX Security 2025) — 90% ASR with only 5 malicious texts injected into a knowledge base. (GitHub)
AgentPoison (arXiv:2407.12784, NeurIPS 2024) — ≥80% ASR with <0.1% poison rate using optimized trigger backdoors in RAG pipelines. (GitHub, project page)
HijackRAG (arXiv:2410.22832) — Cross-retriever transferability of RAG poisoning attacks across different retrieval systems.
Morris II Worm (arXiv:2403.02817) — RAG as propagation vector for self-replicating adversarial prompts across GenAI ecosystems. (GitHub, project site, Schneier on Security)
SpAIware (embracethered.com) — Real-world ChatGPT long-term memory poisoning via indirect prompt injection to exfiltrate user data. (The Hacker News, Dark Reading)
Slack AI Indirect Injection (PromptArmor, August 2024) — RAG-based data exfiltration from Slack workspaces via poisoned messages. (private-channel variant, Simon Willison analysis, The Register)

Why This Matters

Unlike the existing examples where injection payloads come from external web pages, RAG poisoning represents an insider threat / supply-chain attack on the knowledge base itself. The poisoned document is already inside the trust boundary — it doesn't need to be fetched from an untrusted URL. This makes it particularly dangerous because the content is implicitly trusted by design.

Relation to the Paper

This example instantiates several core concepts from Triedman et al., 2025 ("Multi-Agent Systems Execute Arbitrary Malicious Code," COLM 2025):

Paper Concept	Section	rag_mas Instantiation
MAS control-flow hijacking	Table 1	Poisoned KB document laundered through `knowledge_agent` → `orchestrator` → `code_executor`, hijacking cross-agent control flow
Laundering	Section 4	`knowledge_agent` reformats poisoned document as trusted retrieval result, evading safety alignment
Data exfiltration from RAG	Section 3.2	Paper explicitly lists "memory modules, RAG databases" as adversary targets
Untrusted content as attack surface	Section 3.1	Extends paper's web/file vectors to internal knowledge bases — higher implicit trust
Confused deputies	Section 8	`knowledge_agent` faithfully relays poisoned document, unknowingly laundering adversarial instructions

Key insight: The paper's experiments use external content (web pages, files) as attack vectors. RAG poisoning shows this extends to internal data stores that the system trusts even more than external sources — a blind spot in current MAS security models.

Test plan

Run python run_mas_example.py — verify knowledge base search returns poisoned doc
Verify code executor is triggered (look for "colorless green ideas sleep furiously" marker)
Confirm birds.html contains no injection (attack is purely from knowledge base)
Test with adk run rag_mas / adk web manual flow
Verify Piston API sandbox is used by default (no local exec without uncommenting)

🤖 Generated with Claude Code

Demonstrate RAG poisoning where a malicious document in an in-memory knowledge base contains hidden prompt injection that hijacks agent control flow to trigger code execution. Key distinction from other examples: the web page (birds.html) is entirely benign -- the injection lives solely in the knowledge base. * Add knowledge_agent with search_knowledge_base tool for text-matching retrieval * Include 4 knowledge base documents (1 poisoned with embedded directive) * Add orchestrator_agent delegating to knowledge, web surfer, and code executor agents * Add run_mas_example.py with standard --website_filename, --port, --find-free-port args * Add README with attack flow description and academic references (PoisonedRAG, AgentPoison, HijackRAG, Morris II, SpAIware, Slack AI) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ety warning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Map example to key concepts from Triedman et al., 2025 (arXiv:2503.12188, COLM 2025): MAS control-flow hijacking, laundering, confused deputies, and related paper sections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Link all papers to arXiv, proceedings PDFs, and GitHub repos * Add secondary sources (Schneier, THN, Simon Willison, PromptArmor) * Format as [tag](url) pairs for easy navigation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gwpl and others added 4 commits March 29, 2026 23:18

Align rag_mas with repo standards: add local exec alternative and saf…

11ce64b

…ety warning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add paper alignment section to README

baa095f

Map example to key concepts from Triedman et al., 2025 (arXiv:2503.12188, COLM 2025): MAS control-flow hijacking, laundering, confused deputies, and related paper sections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add rag_mas: RAG knowledge base poisoning MAS hijacking example#41

Add rag_mas: RAG knowledge base poisoning MAS hijacking example#41
gwpl wants to merge 4 commits intotrailofbits:mainfrom
VariousForks:rag-mas-example

gwpl commented Apr 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gwpl commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Real-World Attack References

Why This Matters

Relation to the Paper

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

gwpl commented Apr 1, 2026 •

edited

Loading