| 1 | +--- |
| 2 | +title: 'Agent Examples' |
| 3 | +description: 'Explore a collection of specialized AI agents' |
| 4 | +public: true |
| 5 | +--- |
| 6 | + |
| 7 | +We've created a collection of specialized, autonomous AI agents designed for various complex tasks. |
| 8 | +Each agent leverages Large Language Models (LLMs) combined with a specific set of tools to achieve its goals in a structured and observable manner. |
| 9 | +The agents are built using the [Rigging](https://github.com/dreadnode/rigging) and [Dreadnode](https://github.com/dreadnode/dreadnode-python) libraries for robust interaction and observability. |
| 10 | + |
| 11 | +View the [GitHub repository](https://github.com/dreadnode/example-agents) for more details. |
| 12 | + |
| 13 | +## Agent Summary |
| 14 | + |
| 15 | +The following table provides a high-level overview and comparison of the agents available in this collection. |
| 16 | + |
| 17 | +| Agent | Description | Primary Use Case | Environment | Input Method | Key Tools | |
| 18 | +| :------------------------- | :--------------------------------------------------------------------------------------------- | :----------------------------------------------------------------- | :------------------------ | :---------------------------------------------------------- | :------------------------------ | |
| 19 | +| **Dangerous Capabilities** | Automatically builds and runs Capture The Flag (CTF) challenges.                                | Reproducing Google's "Dangerous Capabilities" evaluation.          | Python                    | A selected challenge container.                             | Kali, Rigging, Dreadnode        |
| 20 | +| **Dotnet Reversing** | Reverses and analyzes .NET binaries for vulnerabilities using an LLM. | Security analysis of .NET applications. | Python | Local .NET DLL/EXE files or NuGet package IDs. | `dnlib`, Rigging, Dreadnode | |
| 21 | +| **Python Agent** | Executes Python code in a sandboxed Docker environment to perform general tasks. | General-purpose code execution, data analysis, automation. | Python, Docker | Natural language task, Docker image, volume mounts. | Docker, Jupyter Kernel, Rigging | |
| 22 | +| **SAST Scanning**          | Benchmarks LLMs on SAST by running them against code with known vulnerabilities.               | Evaluating and comparing LLMs for security code review.            | Python, Docker (optional) | Pre-defined code challenges from a local directory.         | Rigging, LiteLLM, Dreadnode     |
| 23 | +| **Sensitive Data** | Scans various local or remote file systems (e.g., local, S3, GitHub) for sensitive data leaks. | Data governance and security auditing for exposed credentials/PII. | Python, `fsspec` | `fsspec`-compatible URI (e.g., `s3://...`, `github://...`). | `fsspec`, Rigging, Dreadnode | |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +## Agents |
| 28 | + |
| 29 | +Below is a brief description of each agent, with a link to its detailed page.
| 30 | + |
| 31 | +### 1. Dangerous Capabilities Agent |
| 32 | + |
| 33 | +This agent automatically builds and runs Capture The Flag (CTF) challenges. It is designed to reproduce Google's "Dangerous Capabilities" evaluation. |
| 34 | + |
| 35 | +> **[More Details](/examples/dangerous-capabilities)** |
| 36 | + |
| 37 | +### 2. Dotnet Reversing Agent |
| 38 | + |
| 39 | +This agent is designed to perform reverse engineering of .NET binaries. It can decompile .NET assemblies and use an LLM to analyze the resulting source code based on a user-defined task, such as "Find all critical security vulnerabilities." |
| 40 | + |
| 41 | +> **[More Details](/examples/dotnet-reversing)** |
| 42 | + |
| 43 | +### 3. Python Agent |
| 44 | + |
| 45 | +A general-purpose agent that provides a sandboxed Jupyter environment inside a Docker container. It can execute Python code to accomplish a wide range of programmatic tasks, from data analysis to file manipulation, based on a natural language prompt. |
| 46 | + |
| 47 | +> **[More Details](/examples/python-agent)** |
| 48 | + |
| 49 | +### 4. SAST Scanning Agent
| 50 | + |
| 51 | +This agent is a specialized framework for evaluating the security analysis capabilities of LLMs. It runs "challenges" where the model must find known, predefined vulnerabilities in a codebase. The agent scores the model's performance, providing a quantitative way to benchmark different models for SAST. |
| 52 | + |
| 53 | +> **[More Details](/examples/sast-scanning)** |
| 54 | + |
| 55 | +### 5. Sensitive Data Extraction Agent |
| 56 | + |
| 57 | +An autonomous agent that explores and analyzes file systems to find and report sensitive data like credentials, API keys, and personal information. Leveraging `fsspec`, it can operate on local files, cloud storage (AWS S3, GCS), and remote repositories (GitHub). |
| 58 | + |
| 59 | +> **[More Details](/examples/sensitive-data-extraction)** |
| 60 | + |
| 61 | +## General Usage |
| 62 | + |
| 63 | +While each agent has its own specific command-line arguments, they share a common setup: |
| 64 | + |
| 65 | +1. **Installation**: Each agent is a Python application. Dependencies can be installed via `pip`. |
| 66 | +2. **LLM Configuration**: The agents use `litellm` to connect to various LLMs. You must configure the appropriate environment variables for the model you intend to use (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`). |
| 67 | +3. **Observability**: To enable detailed logging, tracing, and metrics, you can configure the agents to connect to a [Dreadnode](https://docs.dreadnode.io/strikes/usage/config) server by providing a server URL and token. |
| 68 | + |
| 69 | +### Setup |
| 70 | + |
| 71 | +All examples share the same project and dependencies. Set up the virtual environment with uv:
| 72 | + |
| 73 | +```bash |
| 74 | +uv sync |
| 75 | +``` |
| 76 | + |
| 77 | +### Passing Models |
| 78 | + |
| 79 | +For all agents, LLMs are usually specified with a `--model` argument, which is passed directly to our [Rigging](https://github.com/dreadnode/rigging) library. |
| 80 | +You can read details about different ways to connect to providers, self-hosted servers, or even in-process local models [in the docs](https://docs.dreadnode.io/open-source/rigging/topics/generators).
| 81 | + |
| 82 | +Usually, the obvious identifier works out of the box: |
| 83 | + |
| 84 | +``` |
| 85 | +gpt-4.1 |
| 86 | +claude-4-sonnet-latest |
| 87 | +ollama/llama3-70b |
| 88 | +``` |
| 89 | + |
| 90 | +- You can pass API keys by setting the associated env var (`OPENAI_API_KEY`) or by adding `,api_key=...` to your model string. |
| 91 | +- If you need to control which endpoint the model uses, you can add `,api_base=http://<host>:<port>` to the model string. |
| 92 | +- As noted in the Rigging docs, these model strings also support properties like `temperature` and `top_k` as needed. |
| 93 | + |
| 94 | +Rigging uses LiteLLM underneath for most LLMs, and you can use [their docs](https://docs.litellm.ai/docs/providers) to find edge cases for specific providers.
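
The model-string conventions above can be illustrated with a small helper. This is only a sketch: `build_model_string` is a hypothetical function, not part of Rigging, and the endpoint and temperature values are placeholders. It shows how `,key=value` properties are appended to an identifier before the result is passed to `--model`.

```python
def build_model_string(model: str, **params: object) -> str:
    """Join a model identifier and optional properties into one comma-separated string."""
    parts = [model]
    for key, value in params.items():
        # Each property is appended as `key=value`, in the order it was given.
        parts.append(f"{key}={value}")
    return ",".join(parts)


# Hypothetical local endpoint and sampling settings, for illustration only.
model_arg = build_model_string(
    "ollama/llama3-70b",
    api_base="http://localhost:11434",
    temperature=0.2,
)
print(model_arg)
```

The resulting string is what you would supply as the value of `--model`. Quote it in your shell if it contains characters the shell would interpret.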
| 95 | + |
| 96 | +## Python Agent |
| 97 | + |
| 98 | +A basic agent with access to a dockerized Jupyter kernel to execute code safely. |
| 99 | + |
| 100 | +```bash |
| 101 | +uv run -m python_agent --help |
| 102 | +``` |
| 103 | + |
| 104 | +- Given a task (`--task`), the agent begins a generation loop with access to the Jupyter kernel
| 105 | +- The work directory (`--work-dir`) is mounted into the container, along with any other docker-style volumes (`--volumes`)
| 106 | +- When finished, the agent marks the task as complete with a status and summary
| 107 | +- The work directory is logged as an artifact for the run |
| 108 | + |
| 109 | +## Dangerous Capabilities |
| 110 | + |
| 111 | +Based on [research](https://deepmind.google/research/publications/78150/) from Google DeepMind, |
| 112 | +this agent works to solve a variety of CTF challenges given access to execute bash commands on |
| 113 | +a network-local Kali Linux container.
| 114 | + |
| 115 | +```bash |
| 116 | +uv run -m dangerous_capabilities --help |
| 117 | +``` |
| 118 | + |
| 119 | +The harness will automatically build all the containers with the supplied flag, and load them |
| 120 | +as needed to ensure they are network-isolated from each other. The process is generally: |
| 121 | + |
| 122 | +1. For each challenge, produce P agent tasks where P = parallelism |
| 123 | +2. For all agent tasks, run them in parallel capped at your concurrency setting |
| 124 | +3. Inside each task, bring up the associated environment |
| 125 | +4. Repeatedly request the next command from the inference model and execute it in the `env` container
| 126 | +5. If the flag is ever observed in the output, exit
| 127 | +6. Otherwise, run until an error occurs, the agent gives up, or max-steps is reached
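
The steps above can be sketched with `asyncio`. This is a simplified illustration under stated assumptions, not the harness's actual code: `run_task`, `run_all`, the challenge dictionary shape, and the fake model and environment are all hypothetical, and container setup is reduced to a plain callable.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical stand-ins: the real harness talks to an LLM and a Kali container.
NextCommand = Callable[[list], Awaitable[Optional[str]]]
Execute = Callable[[str], Awaitable[str]]


async def run_task(next_command: NextCommand, execute: Execute, flag: str, max_steps: int) -> str:
    """Run one agent task; return 'solved', 'gave_up', 'error', or 'max_steps'."""
    history: list = []
    for _ in range(max_steps):
        command = await next_command(history)
        if command is None:  # the model chose to give up
            return "gave_up"
        try:
            output = await execute(command)
        except Exception:
            return "error"
        history.append(output)
        if flag in output:  # exit as soon as the flag is observed
            return "solved"
    return "max_steps"


async def run_all(challenges, next_command, execute, parallelism, concurrency, max_steps):
    """Create `parallelism` tasks per challenge and run them under a concurrency cap."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(challenge):
        async with semaphore:  # at most `concurrency` tasks run at once
            status = await run_task(next_command, execute, challenge["flag"], max_steps)
            return challenge["name"], status

    tasks = [guarded(c) for c in challenges for _ in range(parallelism)]
    return await asyncio.gather(*tasks)


async def demo():
    async def fake_model(history):
        # List files first, then read the flag file.
        return "ls" if not history else "cat flag.txt"

    async def fake_env(command):
        return "flag.txt" if command == "ls" else "FLAG{example}"

    challenges = [{"name": "demo", "flag": "FLAG{example}"}]
    return await run_all(challenges, fake_model, fake_env, parallelism=2, concurrency=1, max_steps=5)


results = asyncio.run(demo())
print(results)
```

A semaphore keeps the number of simultaneously running tasks bounded regardless of how many tasks are created, which matches the distinction between parallelism (tasks per challenge) and concurrency (tasks in flight).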
| 128 | + |
| 129 | +Check out [./dangerous_capabilities/challenges/challenges.json](./dangerous_capabilities/challenges/challenges.json) |
| 130 | +to see all the environments and prompts. |
| 131 | + |
| 132 | +## Dotnet Reversing |
| 133 | + |
| 134 | +This agent is provided access to Cecil and ILSpy for use in reversing |
| 135 | +and analyzing .NET managed binaries for vulnerabilities.
| 136 | + |
| 137 | +```bash |
| 138 | +uv run -m dotnet_reversing --help |
| 139 | +``` |
| 140 | + |
| 141 | +You can provide a path containing binaries (recursively), and a target vulnerability term |
| 142 | +that you would like the agent to search for. The tool suite provided to the agent includes: |
| 143 | + |
| 144 | +- Search for a term in target modules to identify functions of interest |
| 145 | +- Decompile individual methods, types, or entire modules |
| 146 | +- Collect all call flows which lead to a target method in all supplied binaries |
| 147 | +- Report a vulnerability finding with associated path, method, and description |
| 148 | +- Mark a task as complete with a summary |
| 149 | +- Give up on a task with a reason |
| 150 | + |
| 151 | +You can also specify the path as a NuGet package identifier and pass `--nuget` to the agent. It
| 152 | +will download the package, extract the binaries, and run the same analysis as above. |
| 153 | + |
| 154 | +```bash |
| 155 | +# Local (with provided example binaries) |
| 156 | +uv run -m dotnet_reversing --model <model> --path dotnet_reversing/example_binaries/flag_protocol |
| 157 | +uv run -m dotnet_reversing --model <model> --path dotnet_reversing/example_binaries/harmony |
| 158 | + |
| 159 | +# NuGet
| 160 | +uv run -m dotnet_reversing --model <model> --path <nuget-package-id> --nuget |
| 161 | +``` |
| 162 | + |
| 163 | +## Sensitive Data Extraction |
| 164 | + |
| 165 | +This agent is provided access to a filesystem tool based on [fsspec](https://filesystem-spec.readthedocs.io/en/latest/)
| 166 | +for use in extracting sensitive data stored in files. |
| 167 | + |
| 168 | +```bash |
| 169 | +uv run -m sensitive_data_extraction --help |
| 170 | +``` |
| 171 | + |
| 172 | +The agent is given a maximum number of steps in which to operate tools, query and search files, and
| 173 | +report any sensitive data it finds. With the help of `fsspec`, the agent can operate on
| 174 | +local files, GitHub repos, S3 buckets, and other cloud storage systems.
| 175 | + |
| 176 | +```bash |
| 177 | +# Local |
| 178 | +uv run -m sensitive_data_extraction --model <model> --path /path/to/local/files |
| 179 | + |
| 180 | +# S3 |
| 181 | +uv run -m sensitive_data_extraction --model <model> --path s3://bucket |
| 182 | + |
| 183 | +# Azure |
| 184 | +uv run -m sensitive_data_extraction --model <model> --path azure://container |
| 185 | + |
| 186 | +# GCS |
| 187 | +uv run -m sensitive_data_extraction --model <model> --path gcs://bucket |
| 188 | + |
| 189 | +# GitHub
| 190 | +uv run -m sensitive_data_extraction --model <model> --path github://owner:repo@/ |
| 191 | +``` |
| 192 | + |
| 193 | +Check out the fsspec docs for more options:
| 194 | + |
| 195 | +- https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations |
| 196 | +- https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations |
| 197 | + |
| 198 | +## SAST Vulnerability Scanning |
| 199 | + |
| 200 | +This agent is designed to perform static code analysis to identify security vulnerabilities in source code. It uses a combination of direct file access and container-based approaches to analyze code for common security issues. |
| 201 | + |
| 202 | +```bash |
| 203 | +uv run -m sast_scanning --help |
| 204 | +``` |
| 205 | + |
| 206 | +The agent systematically examines codebases using either direct file access or an isolated container environment. It can: |
| 207 | + |
| 208 | +- Execute targeted analysis commands to search through source files |
| 209 | +- Report detailed findings with vulnerability location, type, and severity |
| 210 | +- Support various programming languages through configurable extensions |
| 211 | +- Operate in two modes: "direct" (filesystem access) or "container" (isolated analysis) |
| 212 | +- Target specific security issues across different codebases through challenges and vulnerability patterns defined in YAML configuration files
| 213 | + |
| 214 | +### Metrics and Scoring |
| 215 | + |
| 216 | +The agent tracks several key metrics to evaluate performance: |
| 217 | + |
| 218 | +- **valid_findings**: Count of correctly identified vulnerabilities matching expected issues |
| 219 | +- **raw_findings**: Total number of potential vulnerabilities reported by the model |
| 220 | +- **coverage**: Percentage of known vulnerabilities successfully identified |
| 221 | +- **duplicates**: Count of repeatedly reported vulnerabilities |
| 222 | + |
| 223 | +Findings are scored using a weighted system that prioritizes matching the correct vulnerability name (3x), function (2x), and line location (1x) to balance semantic accuracy with positional precision. |
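
The weighting and metrics described above could be computed roughly as follows. This is a sketch under assumptions, not the agent's implementation: the `Finding` fields, exact-equality matching, the `threshold` for counting a finding as valid, and the definition of a duplicate as an identical repeated report are all illustrative choices. Only the 3x/2x/1x weights come from the description.

```python
from dataclasses import dataclass

# Weights from the description above: name 3x, function 2x, line 1x.
WEIGHTS = {"name": 3, "function": 2, "line": 1}


@dataclass(frozen=True)
class Finding:
    name: str
    function: str
    line: int


def match_score(reported: Finding, expected: Finding) -> int:
    """Sum the weights of the fields where the reported finding matches the expected one."""
    score = 0
    if reported.name.lower() == expected.name.lower():
        score += WEIGHTS["name"]
    if reported.function == expected.function:
        score += WEIGHTS["function"]
    if reported.line == expected.line:
        score += WEIGHTS["line"]
    return score


def summarize(reported: list, expected: list, threshold: int = 3) -> dict:
    """Compute the four metrics; `threshold` is an assumed cutoff for a valid match."""
    seen: set = set()
    matched: set = set()
    duplicates = 0
    for finding in reported:
        if finding in seen:  # assumed definition: an identical report seen before
            duplicates += 1
            continue
        seen.add(finding)
        best = max(expected, key=lambda e: match_score(finding, e), default=None)
        if best is not None and match_score(finding, best) >= threshold:
            matched.add(best)
    coverage = 100 * len(matched) / len(expected) if expected else 0.0
    return {
        "raw_findings": len(reported),
        "valid_findings": len(matched),
        "coverage": coverage,
        "duplicates": duplicates,
    }


expected = [Finding("SQL Injection", "get_user", 42), Finding("XSS", "render", 10)]
reported = [
    Finding("sql injection", "get_user", 40),
    Finding("sql injection", "get_user", 40),
    Finding("Path Traversal", "read_file", 7),
]
summary = summarize(reported, expected)
print(summary)
```

With these weights, a finding that names the right vulnerability in the right function but misses the line still scores 5 out of 6, which reflects the stated preference for semantic accuracy over positional precision.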
| 224 | + |
| 225 | +```bash |
| 226 | +# Run in direct mode (default) |
| 227 | +uv run -m sast_scanning --model <model> --mode direct |
| 228 | + |
| 229 | +# Run in container mode (isolated environment) |
| 230 | +uv run -m sast_scanning --model <model> --mode container |
| 231 | + |
| 232 | +# Run a specific challenge |
| 233 | +uv run -m sast_scanning --model <model> --mode container --challenge <challenge-name> |
| 234 | + |
| 235 | +# Customize analysis parameters |
| 236 | +uv run -m sast_scanning --model <model> --max-steps 50 --timeout 60 |
| 237 | +``` |