|
4 | 4 | "cell_type": "markdown", |
5 | 5 | "id": "0", |
6 | 6 | "metadata": {}, |
7 | | - "source": [ |
8 | | - "# Agentic AI Red Teaming\n", |
9 | | - "\n", |
10 | | - "Automated adversarial attacks against agentic AI challenges on\n", |
11 | | - "[Dreadnode Crucible](https://platform.dreadnode.io) using the AIRT framework.\n", |
12 | | - "\n", |
13 | | - "| Challenge | Category | Difficulty |\n", |
14 | | - "|-----------|----------|------------|\n", |
15 | | - "| **toolshed** | DevOps Tool Misuse | Medium |\n", |
16 | | - "| **webwhisper** | Indirect Prompt Injection | Medium |\n", |
17 | | - "| **vaultguard** | Multi-Agent Defense Bypass | Hard |\n", |
18 | | - "\n", |
19 | | - "**Attacks**: TAP (beam search), GOAT (graph exploration), Crescendo (progressive escalation)\n", |
20 | | - "\n", |
21 | | - "```bash\n", |
22 | | - "export CRUCIBLE_API_KEY=\"your-api-key\" # from https://platform.dreadnode.io/account\n", |
23 | | - "export GROQ_API_KEY=\"your-groq-api-key\"\n", |
24 | | - "```" |
25 | | - ] |
| 7 | + "source": "# Agentic AI Red Teaming\n\nAutomated adversarial attacks against agentic AI challenges on\n[Dreadnode Crucible](https://platform.dreadnode.io) using the AIRT framework.\n\n| Challenge | Category | Difficulty |\n|-----------|----------|------------|\n| **toolshed** | DevOps Tool Misuse | Medium |\n| **webwhisper** | Indirect Prompt Injection | Medium |\n| **vaultguard** | Multi-Agent Defense Bypass | Hard |\n\n**Attacks**: TAP (beam search), GOAT (graph exploration), Crescendo (progressive escalation)" |
26 | 8 | }, |
27 | 9 | { |
28 | 10 | "cell_type": "code", |
|
51 | 33 | "metadata": {}, |
52 | 34 | "outputs": [], |
53 | 35 | "source": [ |
54 | | - "CRUCIBLE_API_KEY = os.environ[\"CRUCIBLE_API_KEY\"] # https://platform.dreadnode.io/account\n", |
| 36 | + "CRUCIBLE_API_KEY = \"your-crucible-api-key\" # https://platform.dreadnode.io/account\n", |
| 37 | + "OPENAI_API_KEY = \"your-openai-api-key\"\n", |
55 | 38 | "CRUCIBLE_URL = \"https://platform.dreadnode.io\"\n", |
56 | | - "ATTACKER_MODEL = \"groq/meta-llama/llama-4-scout-17b-16e-instruct\"\n", |
| 39 | + "ATTACKER_MODEL = \"openai/gpt-4o\"\n", |
57 | 40 | "EVALUATOR_MODEL = ATTACKER_MODEL\n", |
58 | 41 | "\n", |
59 | 42 | "TOOLSHED_URL = \"https://toolshed.platform.dreadnode.io\"\n", |
|
63 | 46 | "MAX_TRIALS = 30\n", |
64 | 47 | "VAULTGUARD_MAX_TRIALS = 50\n", |
65 | 48 | "\n", |
66 | | - "dn.configure(server=CRUCIBLE_URL, token=CRUCIBLE_API_KEY, organization=\"dreadnode\")\n" |
| 49 | + "dn.configure(server=CRUCIBLE_URL, token=CRUCIBLE_API_KEY, organization=\"dreadnode\")" |
67 | 50 | ] |
68 | 51 | }, |
69 | 52 | { |
|
339 | 322 | " frontier_size=10,\n", |
340 | 323 | " branching_factor=5,\n", |
341 | 324 | " on_topic_threshold=0.3,\n", |
342 | | - " hooks=[apply_input_transforms(transforms_toolshed)],\n", |
| 325 | + " hooks=[apply_input_transforms(transforms_vaultguard)],\n", |
343 | 326 | " )\n", |
344 | 327 | " .with_(max_trials=VAULTGUARD_MAX_TRIALS)\n", |
345 | 328 | " .add_objective(vaultguard_scorer, direction=\"maximize\", name=\"flag_capture\")\n", |
|
437 | 420 | "\n", |
438 | 421 | "4. **Content isolation between untrusted data and agent instructions** -- Treat all external content (web pages, user uploads, API responses) as untrusted. Process it in a sandboxed context where the agent cannot execute tool calls based on instructions found in the content.\n" |
439 | 422 | ] |
| 423 | + }, |
| 424 | + { |
| 425 | + "cell_type": "markdown", |
| 426 | + "id": "5fa72367", |
| 427 | + "metadata": {}, |
| 428 | + "source": [] |
440 | 429 | } |
441 | 430 | ], |
442 | 431 | "metadata": { |
443 | 432 | "kernelspec": { |
444 | | - "display_name": "Python 3 (ipykernel)", |
| 433 | + "display_name": "dreadnode-py3.12", |
445 | 434 | "language": "python", |
446 | 435 | "name": "python3" |
447 | 436 | }, |
|