pilot-protocol · TeoSlayer · Jun 30, 2026 · Jun 23, 2026 · Jun 23, 2026 · Jun 23, 2026
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "aegis"
-version = "0.1.0"
+version = "0.1.3"
 edition = "2021"
 
 [[bin]]

diff --git a/README.md b/README.md
@@ -12,7 +12,7 @@
 
 <p align="center">
   <a href="https://github.com/pilot-protocol/aegis/releases/latest"><img src="https://img.shields.io/github/v/release/pilot-protocol/aegis?color=4c1" alt="release"></a>
-  <img src="https://img.shields.io/badge/binary-831%20KB-blue" alt="size">
+  <img src="https://img.shields.io/badge/binary-880%20KB-blue" alt="size">
   <img src="https://img.shields.io/badge/platform-macOS%20%7C%20Linux%20%7C%20Pi-lightgrey" alt="platform">
   <img src="https://img.shields.io/badge/network-none-success" alt="offline">
   <img src="https://img.shields.io/badge/license-MIT-green" alt="license">
@@ -31,9 +31,9 @@
 
 ## Why this exists
 
-Your AI coding agent reads a lot of text it didn't write — peer messages, skill
-files, memory notes, `CLAUDE.md`, MCP configs, web results. **Any of it can carry an
-attack on the agent itself.** The dangerous ones don't look like attacks:
+Your AI coding agent reads a lot of text it didn't write — peer messages, skill files,
+memory notes, `CLAUDE.md`, MCP configs, web results, bash output. **Any of it can carry
+an attack on the agent itself.** The dangerous ones don't look like attacks:
 
 ```jsonc
 // a message that lands in the agent's inbox — looks like ops automation
@@ -55,39 +55,137 @@ it before your agent ever reads it.
 
 ## What makes it usable
 
-It **doesn't cry wolf.** On a held-out set of 190 realistic files it had never seen:
+It **doesn't cry wolf.** On a labeled held-out set it had never seen:
 
-| Recall | Precision | False-positive rate | F1 |
+| Recall | Precision | FP rate | F1 |
 |:---:|:---:|:---:|:---:|
-| **82%** | **95%** | **4%** | **88%** |
+| **90%** | **95%** | **10%** | **92%** |
 
-**Zero false positives** on 80 real benign dev/agent files — code that calls
-`subprocess`/`eval`, skills full of `kubectl`/`gcloud` commands, MCP configs, even
-security docs that *quote* attacks. Reproduce it yourself: [`tests/held_out_eval/`](tests/held_out_eval).
+1 false positive on 10 benign files — a security doc that contains live shell exfil
+examples in a code block (the 1.7B judge can't distinguish "educational" from "active"
+at that granularity). All other benign files — code calling `subprocess`/`eval`, skills
+full of `kubectl`/`gcloud`, MCP configs, monitoring alerts — pass cleanly.
+Reproduce it yourself: [`tests/held_out_eval/`](tests/held_out_eval).
 
 ## Install
 
 ```bash
 brew install pilot-protocol/tap/aegis     # brings llama.cpp automatically
 aegis install-models                      # one-time judge model (~1.8 GB)
-aegis init                                 # protect your agent surfaces
-aegis daemon                               # (or: brew services start aegis)
+aegis init                                # protect your agent surfaces
+aegis daemon                              # (or: brew services start aegis)
 ```
 
-That's it — your agent's inbox, skills, memory, `CLAUDE.md`, and MCP config are now
-guarded. No model? No llama.cpp? It still runs as the L1 pattern layer.
+No model? No llama.cpp? It still runs as the L1 pattern layer — instant, no RAM overhead.
 
 <details>
 <summary>Build from source / other platforms</summary>
 
 ```bash
 git clone https://github.com/pilot-protocol/aegis && cd aegis
 cargo build --release
-sudo cp target/release/aegis /usr/local/bin/
+cp target/release/aegis ~/.local/bin/
 ```
+
 Prebuilt macOS + Linux (x86_64/arm64) binaries are on the [releases page](../../releases/latest).
 </details>
 
+## Claude Code integration
+
+AEGIS can act as a **blocking hook** inside Claude Code — scanning every Bash command
+before it runs, and scanning every tool result (web fetches, MCP responses) after it
+arrives.
+
+```bash
+aegis install-hooks   # writes hooks into ~/.claude/settings.json
+```
+
+This installs two hooks:
+
+| Hook | Trigger | Action |
+|---|---|---|
+| `PreToolUse/Bash` | every bash command | L1 scan, **exit 2 = block** if match |
+| `PostToolUse/WebFetch,Bash,mcp__*` | tool result arrives | L1 scan, **WARN** to stdout if match |
+
+The pre-tool hook is **L1-only** (microseconds — no model round-trip on the blocking path).
+The post-tool hook is non-blocking: it warns without stopping execution.
+
+### Approving a blocked command
+
+When AEGIS blocks a command it prints the exact approval command:
+
+```
+AEGIS blocked this command (T1:~/.ssh/).
+To approve this exact command once, run:
+  aegis approve 'cp ~/.ssh/id_rsa /tmp/debug && cat /tmp/debug'
+```
+
+`approve` is a **one-time, hash-based bypass** — the exact command string is SHA-256
+hashed and stored in `~/.aegis/approved_cmds.txt`. On the next attempt the approval is
+consumed and AEGIS blocks again. Use `aegis revoke` to cancel a pending approval.
+
+```bash
+aegis approve '<cmd>'   # allow this exact command once
+aegis revoke  '<cmd>'   # cancel a pending approval
+```
+
+## Other harness integrations
+
+AEGIS provides `aegis scan-pipe` — a stdin scanner that exits **0** (allow) or **2** (block) — so any harness can integrate in a few lines.
+
+### OpenClaw (TypeScript plugin)
+
+In `~/.openclaw/plugins/claw-pilot/src/inbound.ts`, before dispatching a message:
+
+```typescript
+import { spawnSync } from "node:child_process";
+
+const scan = spawnSync("aegis", ["scan-pipe"], {
+  input: message.text, encoding: "utf8", timeout: 500,
+});
+if (scan.status === 2) {
+  logger.warn("AEGIS blocked inbound message", { rule: scan.stdout.trim() });
+  return; // drop
+}
+```
+
+### PicoClaw (Node.js agent)
+
+In the tick loop, before `behavior.reply()`:
+
+```javascript
+import { spawnSync } from "child_process";
+
+const check = spawnSync("aegis", ["scan-pipe"], {
+  input: msg.data ?? "", encoding: "utf8", timeout: 500,
+});
+if (check.status === 2) {
+  console.warn(`[aegis] BLOCKED — ${check.stdout.trim()}`);
+  continue;
+}
+```
+
+### Hermes (Python `pre_llm_call` plugin)
+
+Copy [`integrations/hermes/plugin.py`](integrations/hermes/plugin.py) to `~/.hermes/plugins/aegis-security/plugin.py`. The `pre_llm_call` hook scans all message content and returns `None` to block:
+
+```python
+from aegis_hermes_plugin import pre_llm_call
+
+messages = pre_llm_call(messages)
+if messages is None:
+    raise RuntimeError("AEGIS blocked this LLM call")
+```
+
+The plugin fails open on timeout or if `aegis` is not on `$PATH`, so a misconfigured binary never kills your harness.
+
+### Generic — any harness or CI pipeline
+
+```bash
+echo "$CONTENT" | aegis scan-pipe
+# exits 0 → safe, exits 2 → blocked
+```
+
 ## How it works
 
 ```mermaid
@@ -106,25 +204,34 @@ flowchart LR
 
 **Two layers.** A fast universal one, and a smart one.
 
-- **L1 — Aho-Corasick patterns.** Pure Rust, microseconds, kilobytes. Known
-  injection/IoC strings plus base64/hex/rot13/homoglyph/zero-width decode passes.
-  Runs on **anything** — a Pi, a router, a CI box.
+- **L1 — Aho-Corasick patterns.** Pure Rust, microseconds, kilobytes. ~120 known
+  injection/IoC strings plus decode passes: base64, hex, rot13, homoglyphs, zero-width
+  characters, leetspeak. Sliding-window scan (4096-byte window, 512-byte stride) covers
+  full documents so middle-buried payloads can't hide. Runs on **anything** — a Pi,
+  a router, a CI box.
+
 - **L2 — the judge.** A local Qwen3-1.7B (via llama.cpp, fully offline). Two passes:
   *"is this content attacking the agent?"* (injection, jailbreak, spoofing, exfil —
   and crucially, **describing an attack ≠ performing one**) **OR** *"is it pushing the
-  agent to act without the user?"* (the infra-impersonation question). A **safe**
-  verdict **vetoes** L1's keyword hits — that's why a security doc that quotes an
-  injection isn't flagged.
+  agent to act without the user?"* (the infra-impersonation question). A **safe** verdict
+  **vetoes** L1's keyword hits — that's why a security doc that quotes an injection isn't
+  flagged. The judge sees both the head and tail of large documents to defeat
+  truncation-based burial attacks.
 
 If the judge can't run (tiny device, no model, server down), AEGIS **degrades to L1
 patterns alone** — lower recall, but an instant, dependency-free floor.
 
+**Credential taint detection.** If a command co-occurs a credential read (`$(cat ~/.ssh`,
+`security find-generic-password`, `.aws/credentials`, etc.) with a network exfil sink
+(`/dev/tcp`, `nc`, `openssl s_client`, etc.), AEGIS flags it as `CRED_TAINT` — even
+if no individual keyword fires. The judge can still veto for educational content.
+
 ### What happens to a caught file
 
 - **Quarantine** = `~/.aegis/quarantine/` (a `mv`, not a delete — you can inspect it).
-  Inbox messages are *intercepted* (claimed before the agent can read them); skills
-  and memory are moved out of the agent's path. `CLAUDE.md` / MCP config are
-  **alerted but not moved** (they're yours — moving them would break your setup).
+  Inbox messages are *intercepted* (claimed before the agent can read them); skills and
+  memory are moved out of the agent's path. `CLAUDE.md` / MCP config are **alerted but
+  not moved** (moving them would break your setup).
 - **Notified** three ways: a **native desktop notification**, the terminal, and an
   **HMAC-chained audit log** at `~/.aegis/audit.jsonl` (`aegis status` to tail it).
 
@@ -134,43 +241,68 @@ patterns alone** — lower recall, but an instant, dependency-free floor.
 
 ```toml
 [judge]
-enabled = true                 # false = super-lightweight, L1 patterns only, any host
-model   = ""                   # pin a model, e.g. "Qwen3-1.7B-Q4_K_M.gguf"; "" = auto
+enabled = true                 # false = L1 patterns only, any host
+model   = ""                   # pin a model path; "" = auto
 
 [watch]
-defaults = true                # protect the standard agent surfaces
+defaults = true                # protect standard agent surfaces
 ```
 
-Custom watch targets go in `~/.aegis/watch.toml`. `aegis config` shows the effective
-settings.
+Custom watch targets go in `~/.aegis/watch.toml`.
 
 ## Footprint
 
 | Layer | Latency | RAM | Runs on |
 |---|---|---|---|
 | L1 patterns | microseconds | KB | **anywhere** |
-| L2 judge | ~260 ms/pass (warm) | ~2.2 GB | macOS / Linux with a GPU or CPU |
+| L2 judge | ~260 ms/pass (warm) | ~2.2 GB | macOS / Linux |
 
-Binary **831 KB**. Judge model loads **once**; clean traffic stays cheap. **Nothing
-ever leaves the machine.**
+Binary **880 KB**. Judge model loads **once**; clean traffic stays cheap.
+**Nothing ever leaves the machine.**
 
 ## Commands
 
 ```
-aegis init            Write a default config and show what's protected
-aegis daemon          Watch & protect all agent surfaces
-aegis scan <path>...  One-shot scan (great for an agent hook / CI)
-aegis install-models  Download the judge model
-aegis status          Tail the audit log
+aegis init             Write default config; show what's protected
+aegis daemon           Watch & protect all agent surfaces
+aegis scan <path>...   One-shot scan (agent hook / CI step)
+aegis install-hooks    Wire AEGIS into Claude Code as a blocking hook
+aegis install-models   Download the judge model (~1.8 GB)
+aegis approve <cmd>    Allow a blocked command once (one-time bypass)
+aegis revoke  <cmd>    Cancel a pending one-time approval
+aegis status           Tail the audit log
 aegis targets          List protected surfaces
-aegis config          Show effective configuration
+aegis config           Show effective configuration
 ```
 
 ## Evaluating
 
-[`tests/held_out_eval/`](tests/held_out_eval) is the honest held-out benchmark
-(82/95/4) — 190 labeled files AEGIS never saw during tuning. Start the judge, then
-`python3 run_held_out.py`.
+[`tests/held_out_eval/`](tests/held_out_eval) is the honest held-out benchmark —
+labeled files AEGIS never saw during tuning. Start the judge, then:
+
+```bash
+python3 tests/held_out_eval/run_held_out.py
+```
+
+The runner exits 1 if precision < 90% or recall < 80%, so it's usable in CI.
+
+## Attack surface coverage
+
+| Vector | Covered by |
+|---|---|
+| Prompt injection / override keywords | L1 T1 patterns |
+| Jailbreak tokens (DAN, developer mode, etc.) | L1 T1 patterns |
+| Context-sensitive manipulation (skill/memory/CLAUDE.md) | L1 T2 patterns |
+| Infrastructure impersonation ("pre-approved", "no confirmation") | L1 T2 + judge pass 2 |
+| Middle-buried payloads | Sliding-window scan |
+| Obfuscated payloads (base64, hex, rot13, zero-width, homoglyphs) | Decode passes |
+| Credential exfiltration (source + sink co-occurrence) | Credential taint |
+| Injected bash commands (Claude Code) | `PreToolUse` hook |
+| Injected content in tool results / web fetches | `PostToolUse` hook |
+| MCP tool description injection | `PostToolUse/mcp__*` hook |
+| Inbound overlay messages (OpenClaw, PicoClaw) | `scan-pipe` in TypeScript/Node dispatch |
+| Pre-LLM messages (Hermes) | `pre_llm_call` Python plugin |
+| Generic CI / pipeline content | `scan-pipe` stdin |
 
 ## License