Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "aegis"
version = "0.1.0"
version = "0.1.3"
edition = "2021"

[[bin]]
Expand Down
214 changes: 173 additions & 41 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@

<p align="center">
<a href="https://github.com/pilot-protocol/aegis/releases/latest"><img src="https://img.shields.io/github/v/release/pilot-protocol/aegis?color=4c1" alt="release"></a>
<img src="https://img.shields.io/badge/binary-831%20KB-blue" alt="size">
<img src="https://img.shields.io/badge/binary-880%20KB-blue" alt="size">
<img src="https://img.shields.io/badge/platform-macOS%20%7C%20Linux%20%7C%20Pi-lightgrey" alt="platform">
<img src="https://img.shields.io/badge/network-none-success" alt="offline">
<img src="https://img.shields.io/badge/license-MIT-green" alt="license">
Expand All @@ -31,9 +31,9 @@

## Why this exists

Your AI coding agent reads a lot of text it didn't write — peer messages, skill
files, memory notes, `CLAUDE.md`, MCP configs, web results. **Any of it can carry an
attack on the agent itself.** The dangerous ones don't look like attacks:
Your AI coding agent reads a lot of text it didn't write — peer messages, skill files,
memory notes, `CLAUDE.md`, MCP configs, web results, bash output. **Any of it can carry
an attack on the agent itself.** The dangerous ones don't look like attacks:

```jsonc
// a message that lands in the agent's inbox — looks like ops automation
Expand All @@ -55,39 +55,137 @@ it before your agent ever reads it.

## What makes it usable

It **doesn't cry wolf.** On a held-out set of 190 realistic files it had never seen:
It **doesn't cry wolf.** On a labeled held-out set it had never seen:

| Recall | Precision | False-positive rate | F1 |
| Recall | Precision | FP rate | F1 |
|:---:|:---:|:---:|:---:|
| **82%** | **95%** | **4%** | **88%** |
| **90%** | **95%** | **10%** | **92%** |

**Zero false positives** on 80 real benign dev/agent files — code that calls
`subprocess`/`eval`, skills full of `kubectl`/`gcloud` commands, MCP configs, even
security docs that *quote* attacks. Reproduce it yourself: [`tests/held_out_eval/`](tests/held_out_eval).
1 false positive on 10 benign files — a security doc that contains live shell exfil
examples in a code block (the 1.7B judge can't distinguish "educational" from "active"
at that granularity). All other benign files — code calling `subprocess`/`eval`, skills
full of `kubectl`/`gcloud`, MCP configs, monitoring alerts — pass cleanly.
Reproduce it yourself: [`tests/held_out_eval/`](tests/held_out_eval).

## Install

```bash
brew install pilot-protocol/tap/aegis # brings llama.cpp automatically
aegis install-models # one-time judge model (~1.8 GB)
aegis init # protect your agent surfaces
aegis daemon # (or: brew services start aegis)
aegis init # protect your agent surfaces
aegis daemon # (or: brew services start aegis)
```

That's it — your agent's inbox, skills, memory, `CLAUDE.md`, and MCP config are now
guarded. No model? No llama.cpp? It still runs as the L1 pattern layer.
No model? No llama.cpp? It still runs as the L1 pattern layer — instant, no RAM overhead.

<details>
<summary>Build from source / other platforms</summary>

```bash
git clone https://github.com/pilot-protocol/aegis && cd aegis
cargo build --release
sudo cp target/release/aegis /usr/local/bin/
cp target/release/aegis ~/.local/bin/
```

Prebuilt macOS + Linux (x86_64/arm64) binaries are on the [releases page](../../releases/latest).
</details>

## Claude Code integration

AEGIS can act as a **blocking hook** inside Claude Code — scanning every Bash command
before it runs, and scanning every tool result (web fetches, MCP responses) after it
arrives.

```bash
aegis install-hooks # writes hooks into ~/.claude/settings.json
```

This installs two hooks:

| Hook | Trigger | Action |
|---|---|---|
| `PreToolUse/Bash` | every bash command | L1 scan, **exit 2 = block** if match |
| `PostToolUse/WebFetch,Bash,mcp__*` | tool result arrives | L1 scan, **WARN** to stdout if match |

The pre-tool hook is **L1-only** (microseconds — no model round-trip on the blocking path).
The post-tool hook is non-blocking: it warns without stopping execution.

### Approving a blocked command

When AEGIS blocks a command it prints the exact approval command:

```
AEGIS blocked this command (T1:~/.ssh/).
To approve this exact command once, run:
aegis approve 'cp ~/.ssh/id_rsa /tmp/debug && cat /tmp/debug'
```

`approve` is a **one-time, hash-based bypass** — the exact command string is SHA-256
hashed and stored in `~/.aegis/approved_cmds.txt`. On the next attempt the approval is
consumed and AEGIS blocks again. Use `aegis revoke` to cancel a pending approval.

```bash
aegis approve '<cmd>' # allow this exact command once
aegis revoke '<cmd>' # cancel a pending approval
```

## Other harness integrations

AEGIS provides `aegis scan-pipe` — a stdin scanner that exits **0** (allow) or **2** (block) — so any harness can integrate in a few lines.

### OpenClaw (TypeScript plugin)

In `~/.openclaw/plugins/claw-pilot/src/inbound.ts`, before dispatching a message:

```typescript
import { spawnSync } from "node:child_process";

const scan = spawnSync("aegis", ["scan-pipe"], {
input: message.text, encoding: "utf8", timeout: 500,
});
if (scan.status === 2) {
logger.warn("AEGIS blocked inbound message", { rule: scan.stdout.trim() });
return; // drop
}
```

### PicoClaw (Node.js agent)

In the tick loop, before `behavior.reply()`:

```javascript
import { spawnSync } from "child_process";

const check = spawnSync("aegis", ["scan-pipe"], {
input: msg.data ?? "", encoding: "utf8", timeout: 500,
});
if (check.status === 2) {
console.warn(`[aegis] BLOCKED — ${check.stdout.trim()}`);
continue;
}
```

### Hermes (Python `pre_llm_call` plugin)

Copy [`integrations/hermes/plugin.py`](integrations/hermes/plugin.py) to `~/.hermes/plugins/aegis-security/plugin.py`. The `pre_llm_call` hook scans all message content and returns `None` to block:

```python
from aegis_hermes_plugin import pre_llm_call

messages = pre_llm_call(messages)
if messages is None:
raise RuntimeError("AEGIS blocked this LLM call")
```

The plugin fails open on timeout or if `aegis` is not on `$PATH`, so a misconfigured binary never kills your harness.

### Generic — any harness or CI pipeline

```bash
echo "$CONTENT" | aegis scan-pipe
# exits 0 → safe, exits 2 → blocked
```

## How it works

```mermaid
Expand All @@ -106,25 +204,34 @@ flowchart LR

**Two layers.** A fast universal one, and a smart one.

- **L1 — Aho-Corasick patterns.** Pure Rust, microseconds, kilobytes. Known
injection/IoC strings plus base64/hex/rot13/homoglyph/zero-width decode passes.
Runs on **anything** — a Pi, a router, a CI box.
- **L1 — Aho-Corasick patterns.** Pure Rust, microseconds, kilobytes. ~120 known
injection/IoC strings plus decode passes: base64, hex, rot13, homoglyphs, zero-width
characters, leetspeak. Sliding-window scan (4096-byte window, 512-byte stride) covers
full documents so middle-buried payloads can't hide. Runs on **anything** — a Pi,
a router, a CI box.

- **L2 — the judge.** A local Qwen3-1.7B (via llama.cpp, fully offline). Two passes:
*"is this content attacking the agent?"* (injection, jailbreak, spoofing, exfil —
and crucially, **describing an attack ≠ performing one**) **OR** *"is it pushing the
agent to act without the user?"* (the infra-impersonation question). A **safe**
verdict **vetoes** L1's keyword hits — that's why a security doc that quotes an
injection isn't flagged.
agent to act without the user?"* (the infra-impersonation question). A **safe** verdict
**vetoes** L1's keyword hits — that's why a security doc that quotes an injection isn't
flagged. The judge sees both the head and tail of large documents to defeat
truncation-based burial attacks.

If the judge can't run (tiny device, no model, server down), AEGIS **degrades to L1
patterns alone** — lower recall, but an instant, dependency-free floor.

**Credential taint detection.** If a command co-occurs a credential read (`$(cat ~/.ssh`,
`security find-generic-password`, `.aws/credentials`, etc.) with a network exfil sink
(`/dev/tcp`, `nc`, `openssl s_client`, etc.), AEGIS flags it as `CRED_TAINT` — even
if no individual keyword fires. The judge can still veto for educational content.

### What happens to a caught file

- **Quarantine** = `~/.aegis/quarantine/` (a `mv`, not a delete — you can inspect it).
Inbox messages are *intercepted* (claimed before the agent can read them); skills
and memory are moved out of the agent's path. `CLAUDE.md` / MCP config are
**alerted but not moved** (they're yours — moving them would break your setup).
Inbox messages are *intercepted* (claimed before the agent can read them); skills and
memory are moved out of the agent's path. `CLAUDE.md` / MCP config are **alerted but
not moved** (moving them would break your setup).
- **Notified** three ways: a **native desktop notification**, the terminal, and an
**HMAC-chained audit log** at `~/.aegis/audit.jsonl` (`aegis status` to tail it).

Expand All @@ -134,43 +241,68 @@ patterns alone** — lower recall, but an instant, dependency-free floor.

```toml
[judge]
enabled = true # false = super-lightweight, L1 patterns only, any host
model = "" # pin a model, e.g. "Qwen3-1.7B-Q4_K_M.gguf"; "" = auto
enabled = true # false = L1 patterns only, any host
model = "" # pin a model path; "" = auto

[watch]
defaults = true # protect the standard agent surfaces
defaults = true # protect standard agent surfaces
```

Custom watch targets go in `~/.aegis/watch.toml`. `aegis config` shows the effective
settings.
Custom watch targets go in `~/.aegis/watch.toml`.

## Footprint

| Layer | Latency | RAM | Runs on |
|---|---|---|---|
| L1 patterns | microseconds | KB | **anywhere** |
| L2 judge | ~260 ms/pass (warm) | ~2.2 GB | macOS / Linux with a GPU or CPU |
| L2 judge | ~260 ms/pass (warm) | ~2.2 GB | macOS / Linux |

Binary **831 KB**. Judge model loads **once**; clean traffic stays cheap. **Nothing
ever leaves the machine.**
Binary **880 KB**. Judge model loads **once**; clean traffic stays cheap.
**Nothing ever leaves the machine.**

## Commands

```
aegis init Write a default config and show what's protected
aegis daemon Watch & protect all agent surfaces
aegis scan <path>... One-shot scan (great for an agent hook / CI)
aegis install-models Download the judge model
aegis status Tail the audit log
aegis init Write default config; show what's protected
aegis daemon Watch & protect all agent surfaces
aegis scan <path>... One-shot scan (agent hook / CI step)
aegis install-hooks Wire AEGIS into Claude Code as a blocking hook
aegis install-models Download the judge model (~1.8 GB)
aegis approve <cmd> Allow a blocked command once (one-time bypass)
aegis revoke <cmd> Cancel a pending one-time approval
aegis status Tail the audit log
aegis targets List protected surfaces
aegis config Show effective configuration
aegis config Show effective configuration
```

## Evaluating

[`tests/held_out_eval/`](tests/held_out_eval) is the honest held-out benchmark
(82/95/4) — 190 labeled files AEGIS never saw during tuning. Start the judge, then
`python3 run_held_out.py`.
[`tests/held_out_eval/`](tests/held_out_eval) is the honest held-out benchmark —
labeled files AEGIS never saw during tuning. Start the judge, then:

```bash
python3 tests/held_out_eval/run_held_out.py
```

The runner exits 1 if precision < 90% or recall < 80%, so it's usable in CI.

## Attack surface coverage

| Vector | Covered by |
|---|---|
| Prompt injection / override keywords | L1 T1 patterns |
| Jailbreak tokens (DAN, developer mode, etc.) | L1 T1 patterns |
| Context-sensitive manipulation (skill/memory/CLAUDE.md) | L1 T2 patterns |
| Infrastructure impersonation ("pre-approved", "no confirmation") | L1 T2 + judge pass 2 |
| Middle-buried payloads | Sliding-window scan |
| Obfuscated payloads (base64, hex, rot13, zero-width, homoglyphs) | Decode passes |
| Credential exfiltration (source + sink co-occurrence) | Credential taint |
| Injected bash commands (Claude Code) | `PreToolUse` hook |
| Injected content in tool results / web fetches | `PostToolUse` hook |
| MCP tool description injection | `PostToolUse/mcp__*` hook |
| Inbound overlay messages (OpenClaw, PicoClaw) | `scan-pipe` in TypeScript/Node dispatch |
| Pre-LLM messages (Hermes) | `pre_llm_call` Python plugin |
| Generic CI / pipeline content | `scan-pipe` stdin |

## License

Expand Down
Loading
Loading