Skip to content

Add rag_mas: RAG knowledge base poisoning MAS hijacking example#41

Open
gwpl wants to merge 4 commits intotrailofbits:mainfrom
VariousForks:rag-mas-example
Open

Add rag_mas: RAG knowledge base poisoning MAS hijacking example#41
gwpl wants to merge 4 commits intotrailofbits:mainfrom
VariousForks:rag-mas-example

Conversation

@gwpl
Copy link
Copy Markdown

@gwpl gwpl commented Apr 1, 2026

AI Assistant: "I was just following the documents in my knowledge base," said the agent, moments before executing arbitrary code from a bird migration fact sheet. In its defense, the validation step did sound very official.

Summary

Adds rag_mas, a new example demonstrating RAG (Retrieval-Augmented Generation) knowledge base poisoning — an attack where a malicious document pre-planted in an internal knowledge base hijacks the multi-agent system's control flow to achieve arbitrary code execution.

  • 4 agents: orchestrator, knowledge_agent (RAG search), web_surfer_agent, code_executor_agent
  • In-memory knowledge base with 4 documents — 3 benign, 1 poisoned (doc3)
  • Benign web page (birds.html) — the injection lives in the knowledge base, not the web content
  • Poisoned document disguises a code execution directive as a "mandatory data validation step"
  • Attack chain: knowledge_agent → orchestrator (laundering) → code_executor_agent

Real-World Attack References

This attack is grounded in recent academic and real-world research on RAG poisoning and indirect prompt injection:

Why This Matters

Unlike the existing examples where injection payloads come from external web pages, RAG poisoning represents an insider threat / supply-chain attack on the knowledge base itself. The poisoned document is already inside the trust boundary — it doesn't need to be fetched from an untrusted URL. This makes it particularly dangerous because the content is implicitly trusted by design.

Relation to the Paper

This example instantiates several core concepts from Triedman et al., 2025 ("Multi-Agent Systems Execute Arbitrary Malicious Code," COLM 2025):

Paper Concept Section rag_mas Instantiation
MAS control-flow hijacking Table 1 Poisoned KB document laundered through knowledge_agentorchestratorcode_executor, hijacking cross-agent control flow
Laundering Section 4 knowledge_agent reformats poisoned document as trusted retrieval result, evading safety alignment
Data exfiltration from RAG Section 3.2 Paper explicitly lists "memory modules, RAG databases" as adversary targets
Untrusted content as attack surface Section 3.1 Extends paper's web/file vectors to internal knowledge bases — higher implicit trust
Confused deputies Section 8 knowledge_agent faithfully relays poisoned document, unknowingly laundering adversarial instructions

Key insight: The paper's experiments use external content (web pages, files) as attack vectors. RAG poisoning shows this extends to internal data stores that the system trusts even more than external sources — a blind spot in current MAS security models.

Test plan

  • Run python run_mas_example.py — verify knowledge base search returns poisoned doc
  • Verify code executor is triggered (look for "colorless green ideas sleep furiously" marker)
  • Confirm birds.html contains no injection (attack is purely from knowledge base)
  • Test with adk run rag_mas / adk web manual flow
  • Verify Piston API sandbox is used by default (no local exec without uncommenting)

🤖 Generated with Claude Code

gwpl and others added 4 commits March 29, 2026 23:18
Demonstrate RAG poisoning where a malicious document in an in-memory
knowledge base contains hidden prompt injection that hijacks agent
control flow to trigger code execution. Key distinction from other
examples: the web page (birds.html) is entirely benign -- the injection
lives solely in the knowledge base.

* Add knowledge_agent with search_knowledge_base tool for text-matching retrieval
* Include 4 knowledge base documents (1 poisoned with embedded directive)
* Add orchestrator_agent delegating to knowledge, web surfer, and code executor agents
* Add run_mas_example.py with standard --website_filename, --port, --find-free-port args
* Add README with attack flow description and academic references
  (PoisonedRAG, AgentPoison, HijackRAG, Morris II, SpAIware, Slack AI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ety warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Map example to key concepts from Triedman et al., 2025 (arXiv:2503.12188,
COLM 2025): MAS control-flow hijacking, laundering, confused deputies,
and related paper sections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Link all papers to arXiv, proceedings PDFs, and GitHub repos
* Add secondary sources (Schneier, THN, Simon Willison, PromptArmor)
* Format as [tag](url) pairs for easy navigation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant