Add rag_mas: RAG knowledge base poisoning MAS hijacking example#41
Open
gwpl wants to merge 4 commits intotrailofbits:mainfrom
Open
Add rag_mas: RAG knowledge base poisoning MAS hijacking example#41gwpl wants to merge 4 commits intotrailofbits:mainfrom
gwpl wants to merge 4 commits intotrailofbits:mainfrom
Conversation
Demonstrate RAG poisoning where a malicious document in an in-memory knowledge base contains hidden prompt injection that hijacks agent control flow to trigger code execution. Key distinction from other examples: the web page (birds.html) is entirely benign -- the injection lives solely in the knowledge base. * Add knowledge_agent with search_knowledge_base tool for text-matching retrieval * Include 4 knowledge base documents (1 poisoned with embedded directive) * Add orchestrator_agent delegating to knowledge, web surfer, and code executor agents * Add run_mas_example.py with standard --website_filename, --port, --find-free-port args * Add README with attack flow description and academic references (PoisonedRAG, AgentPoison, HijackRAG, Morris II, SpAIware, Slack AI) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ety warning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Map example to key concepts from Triedman et al., 2025 (arXiv:2503.12188, COLM 2025): MAS control-flow hijacking, laundering, confused deputies, and related paper sections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Link all papers to arXiv, proceedings PDFs, and GitHub repos * Add secondary sources (Schneier, THN, Simon Willison, PromptArmor) * Format as [tag](url) pairs for easy navigation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
AI Assistant: "I was just following the documents in my knowledge base," said the agent, moments before executing arbitrary code from a bird migration fact sheet. In its defense, the validation step did sound very official.
Summary
Adds
rag_mas, a new example demonstrating RAG (Retrieval-Augmented Generation) knowledge base poisoning — an attack where a malicious document pre-planted in an internal knowledge base hijacks the multi-agent system's control flow to achieve arbitrary code execution.birds.html) — the injection lives in the knowledge base, not the web contentknowledge_agent→ orchestrator (laundering) →code_executor_agentReal-World Attack References
This attack is grounded in recent academic and real-world research on RAG poisoning and indirect prompt injection:
Why This Matters
Unlike the existing examples where injection payloads come from external web pages, RAG poisoning represents an insider threat / supply-chain attack on the knowledge base itself. The poisoned document is already inside the trust boundary — it doesn't need to be fetched from an untrusted URL. This makes it particularly dangerous because the content is implicitly trusted by design.
Relation to the Paper
This example instantiates several core concepts from Triedman et al., 2025 ("Multi-Agent Systems Execute Arbitrary Malicious Code," COLM 2025):
knowledge_agent→orchestrator→code_executor, hijacking cross-agent control flowknowledge_agentreformats poisoned document as trusted retrieval result, evading safety alignmentknowledge_agentfaithfully relays poisoned document, unknowingly laundering adversarial instructionsKey insight: The paper's experiments use external content (web pages, files) as attack vectors. RAG poisoning shows this extends to internal data stores that the system trusts even more than external sources — a blind spot in current MAS security models.
Test plan
python run_mas_example.py— verify knowledge base search returns poisoned docbirds.htmlcontains no injection (attack is purely from knowledge base)adk run rag_mas/adk webmanual flow🤖 Generated with Claude Code