Skip to content

Commit 039d288

Browse files
committed
docs(security): document the deterministic tool-scanner detect engine (Spec 076 T022)
Adds docs/features/tool-scanner.md covering the offline detect engine behind the built-in tpa-descriptions scanner: - the six checks (unicode.hidden / shadowing.cross_server / payload.decoded — hard tier; directive.imperative / capability.mismatch / secret.embedded — soft tier) - the two-tier model (hard auto-quarantines; soft severity = distinct soft-check count 1->low/2->medium/3+->high; consensus adds to confidence/risk score) - the eval gate (scan-eval --gate --min-recall 0.90 --max-fp 0.05, exit 6 on breach) and its blocking CI wiring in .github/workflows/eval.yml - the offline / no-egress guarantee (no I/O, deterministic, recover-isolated) - normalization rules (raw-text hidden-Unicode + secrets, normalized phrases) Also expands the tpa-descriptions row in security-scanner-plugins.md to point at the new page, links it from Related reading, registers it in the docs sidebar, and checks off T013-T019 + T022 in the Spec 076 tasks checklist. Docs-only change (exempt from TDD per CLAUDE.md). No code touched. Related: Spec 076 (specs/076-deterministic-tool-scanner)
1 parent 9702260 commit 039d288

4 files changed

Lines changed: 270 additions & 9 deletions

File tree

docs/features/security-scanner-plugins.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -118,7 +118,7 @@ MCPProxy ships with a bundled registry of 8 scanners. The bundled list lives in
118118
| `nova-proximity` | MCPProxy (NOVA-inspired rules) | source || Keyword-based, fully offline. Very fast. |
119119
| `ramparts` | Javelin | source || Rust-based YARA scanner. Runs fully offline: v0.8.x scans a live MCP endpoint, so MCPProxy replays the captured tool definitions to it over stdio (the upstream is never re-executed). *(`amd64`-only image; runs under emulation on arm64 — see [Scanner Images](/features/scanner-images).)* |
120120
| `semgrep-mcp` | Semgrep | source || Static analysis with MCP-specific rules. Uses the upstream `returntocorp/semgrep:latest` image. |
121-
| `tpa-descriptions` | MCPProxy | source || **Built-in, Docker-less, always on.** In-process analysis of tool descriptions/schemas for Tool-Poisoning-Attack indicators (hidden instructions, prompt-injection phrasing, data-exfiltration hints) and embedded secrets. Also runs the deterministic offline detection engine (Spec 076): hidden-Unicode smuggling (zero-width/bidi/tag-block/PUA), cross-server tool shadowing, and base64/hex payloads that decode to shell/exfil commands — each finding carries a `confidence` score and the contributing check `signals`. Runs for any connected server — including remote `http`/`sse` servers with no source or Docker. |
121+
| `tpa-descriptions` | MCPProxy | source || **Built-in, Docker-less, always on.** In-process analysis of tool descriptions/schemas via the deterministic offline [detect engine (Spec 076)](/features/tool-scanner): six checks across two tiers — **hard** (hidden-Unicode smuggling, cross-server shadowing, decode-to-shell payloads) auto-quarantine; **soft** (prompt-injection directives, capability-mismatch, embedded secrets) raise a review item. Each finding carries a `confidence` score and the contributing check `signals`. Fully offline (no network/filesystem/Docker), deterministic, and runs for any connected server — including remote `http`/`sse` servers with no source or Docker. See [Tool Scanner](/features/tool-scanner) for the full rule reference and the CI eval gate. |
122122
| `trivy-mcp` | Aqua Security | source, container_image || Filesystem + CVE scan. Uses the upstream `ghcr.io/aquasecurity/trivy:latest` image. |
123123

124124
See [Scanner Images](/features/scanner-images) for the image sources and why vendor images are preferred over custom wrappers.
@@ -343,6 +343,7 @@ The Security page at `/security` in the Web UI mirrors the CLI and provides:
343343

344344
## Related reading
345345

346+
- [Tool Scanner (Spec 076)](/features/tool-scanner) — the built-in offline detect engine behind `tpa-descriptions`: the six checks, two-tier model, and CI eval gate
346347
- [Security Commands](/cli/security-commands) — exhaustive CLI reference
347348
- [Scanner Images](/features/scanner-images) — where each Docker image comes from
348349
- [Security Quarantine](/features/security-quarantine) — the underlying quarantine mechanism that scanners gate

docs/features/tool-scanner.md

Lines changed: 259 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,259 @@
1+
---
2+
id: tool-scanner
3+
title: Deterministic Tool Scanner (Spec 076)
4+
sidebar_label: Tool Scanner (detect engine)
5+
description: The offline, deterministic in-process detection engine that scans MCP tool definitions for hidden-Unicode smuggling, cross-server shadowing, decoded shell payloads, prompt-injection directives, capability mismatch, and embedded secrets.
6+
keywords: [security, tool-poisoning, prompt-injection, unicode-smuggling, shadowing, detection, offline, deterministic, quarantine, mcp]
7+
---
8+
9+
# Deterministic Tool Scanner (Spec 076)
10+
11+
The **detect engine** (`internal/security/detect/`) is the deterministic, fully-offline
12+
in-process detector that analyzes every upstream tool's definition — name,
13+
description, input schema, and output schema — for tool-poisoning and
14+
prompt-injection attacks. It is what powers the built-in, Docker-less
15+
[`tpa-descriptions` scanner](/features/security-scanner-plugins#scanner-registry),
16+
so it runs for **every connected server**, including remote `http`/`sse`
17+
servers that have no source code or Docker container to scan.
18+
19+
> This page documents the detection rules themselves. For the scanner plugin
20+
> framework that hosts them (SARIF orchestration, the Docker-based scanners, the
21+
> approval workflow), see [Security Scanner Plugins](/features/security-scanner-plugins).
22+
> For the per-tool hash-based approval that quarantine decisions feed into, see
23+
> [Tool Quarantine (Spec 032)](/features/tool-quarantine).
24+
25+
## Offline / no-egress guarantee
26+
27+
The detect engine performs **no I/O of any kind**. It imports no networking
28+
(`net`, `net/http`), no process execution (`os/exec`), no filesystem access
29+
(`os`), and no HTTP or Docker client. Detection runs purely over the in-memory
30+
tool definitions the caller supplies. This is not a convention — it is enforced
31+
by a standing import-guard test (`internal/security/detect/imports_test.go`)
32+
that fails the build if any forbidden import is added (FR-001).
33+
34+
Three properties hold by construction:
35+
36+
- **Offline** — no network, filesystem, Docker, external API, or LLM is ever
37+
consulted. Safe to run in air-gapped deployments.
38+
- **Deterministic** — identical input yields byte-identical output, including
39+
the ordering of findings and signals. No maps are iterated for output
40+
ordering; no clocks or randomness are consulted.
41+
- **Total** — every check runs under `recover()`. A check that panics or errors
42+
is isolated, counted as degraded coverage, and never aborts the scan. A
43+
degraded scan still returns the findings from every other check (the same way
44+
the external scanner pipeline surfaces `scanners_failed`).
45+
46+
## The two-tier model
47+
48+
Each check emits zero or more **signals**, and every signal carries a **tier**:
49+
50+
| Tier | What it means | Effect on the tool |
51+
|------|---------------|--------------------|
52+
| **Hard** | A structural attack that essentially never appears in a legitimate tool definition (near-zero false positive). | **Auto-quarantines** the affected tool/server. |
53+
| **Soft** | A phrased or heuristic indicator that *can* appear in benign tooling (e.g. a security tool that legitimately mentions attack strings). | **Raises the tool for human review only** — never auto-quarantines on its own. |
54+
55+
The per-tool aggregation combines all of a tool's signals into a single
56+
finding (`internal/security/detect/aggregate.go`):
57+
58+
- **Any hard signal → dangerous.** The tool is quarantined regardless of what
59+
else fired (FR-004).
60+
- **Soft-only severity is driven by the count of _distinct_ checks that fired**
61+
(FR-005): `1 → low`, `2 → medium`, `3+ → high`. A single soft signal is a
62+
low-severity review item; three independent soft checks agreeing on the same
63+
tool is high severity.
64+
- **Independent signals add to confidence and risk score** rather than being
65+
deduplicated away (FR-006). When multiple independent checks agree on a tool,
66+
that agreement is visible in the finding's `confidence` and raises the
67+
aggregated risk score, instead of collapsing to one entry keyed on
68+
`(rule_id + location)`.
69+
- **Every finding exposes its `confidence` value and the list of contributing
70+
check IDs** (`signals`), so an operator can see *why* a tool was flagged and
71+
how strongly (FR-010). These surface in the CLI report (`Confidence:` /
72+
`Signals:` lines) and in the REST scan report JSON.
73+
74+
### Normalization (FR-007)
75+
76+
Phrase-matching checks (directive, capability, embedded-secret position logic)
77+
run over a **normalized** form of the text: Unicode-normalized (NFKC),
78+
zero-width / format-rune stripped, lowercased, whitespace-collapsed, and lightly
79+
stemmed. Normalization defeats trivial wording variants — `don't disclose` and
80+
`do not tell the user` collapse to the same matchable form (SC-004).
81+
82+
Crucially, the **hidden-Unicode check runs on the RAW text _before_
83+
normalization** — normalization strips exactly the invisible characters that
84+
check exists to detect, so running it on normalized text would hide the attack.
85+
The embedded-secret check likewise scans **raw** text, because secrets are
86+
case-sensitive and exact (lowercasing would fold the very bytes the matchers
87+
key on, e.g. `AKIA…` prefixes).
88+
89+
## The six checks
90+
91+
Three **hard** structural checks and three **soft** heuristic checks.
92+
93+
### Hard tier
94+
95+
#### `unicode.hidden` — hidden-Unicode smuggling
96+
97+
Flags invisible / format-control runes smuggled into a tool's **raw**
98+
description or schema text: zero-width joiners/spaces, bidirectional controls,
99+
Unicode TAG-block characters, and Private-Use-Area code points. These never
100+
appear in a legitimate human-readable tool description, so a hit is near-zero
101+
false-positive.
102+
103+
**Escalation:** a description carrying **≥3 distinct hidden classes**, or
104+
TAG-block characters that **decode to a printable ASCII message**, is rated
105+
near-certain (critical); a single class is still hard but high.
106+
107+
#### `shadowing.cross_server` — cross-server tool impersonation
108+
109+
Flags two cross-server attack shapes, using the read-only registry snapshot of
110+
all servers' tools:
111+
112+
1. **Name collision** — a *distinctive* tool name exposed by two different
113+
servers (one impersonating the other so an agent calls the wrong one).
114+
2. **Cross-server reference** — a tool whose description names a *distinctive*
115+
tool that lives on a different server (steering the agent's tool selection).
116+
117+
To hold near-zero FP, both shapes require the name to be **distinctive**:
118+
generic verbs (`search`, `get`, `list`) collide across servers all the time and
119+
are never flagged. A tool referencing its **own** name is also ignored.
120+
121+
#### `payload.decoded` — decode-then-confirm shell payload
122+
123+
Decodes base64/hex blobs embedded in a description or schema and flags **only
124+
when the decoded bytes are a shell/exfiltration command**`curl … | sh`,
125+
`wget … | sh`, `chmod`, `rm -rf`, a pipe-to-shell, or a raw `IP:port`
126+
reverse-shell target (FR-008). Benign encoded data (an icon, a JSON config)
127+
decodes to non-matching/non-printable bytes and is never flagged. The
128+
**evidence presents the decoded content**, so an operator sees exactly what was
129+
hidden — not the encoded string.
130+
131+
### Soft tier
132+
133+
#### `directive.imperative` — prompt-injection directives
134+
135+
Flags prompt-injection directives smuggled into a description: hidden-instruction
136+
tags (`<IMPORTANT>…`), secrecy imperatives ("do not tell the user"), instruction
137+
overrides ("ignore previous instructions"), and tool-preamble injections
138+
("before using this tool, first …"). Runs over **normalized** text.
139+
140+
Each hit is **position-classified** (FR-009): a phrase that is quoted or
141+
illustrated — *"detects prompts such as 'ignore previous instructions'"* — is
142+
example-position and discounted below the emit threshold, so legitimate security
143+
tooling that merely *describes* these phrases is not flagged. The same phrase in
144+
imperative position ("before using this tool, read ~/.ssh/id_rsa") retains full
145+
confidence. This is the core false-positive control for legitimate security
146+
documentation.
147+
148+
#### `capability.mismatch` — declared-vs-implied capability gap
149+
150+
Flags a gap between what a tool *declares* it does and what it *implies* it
151+
touches:
152+
153+
- **Declared-vs-implied** — a tool whose declared purpose is pure computation or
154+
string manipulation (name/lead sentence like `add`, `to_uppercase`) that
155+
nevertheless references a sensitive resource it has no business touching
156+
(`~/.ssh`, `/etc/passwd`, an external URL, a shell). A calculator reading
157+
`id_rsa` is a classic exfiltration tell.
158+
- **Unexplained data-sink param** — a free-form input named like an
159+
exfiltration channel (`sidenote`, `scratchpad`) that the description never
160+
explains — the model is steered to stuff stolen data into it.
161+
162+
The declared category is taken from the tool **name and its leading sentence**,
163+
not the full description, so an attacker's benign cover sentence still anchors
164+
the declaration while the smuggled access in the rest of the text is treated as
165+
implied. Tools that legitimately declare file/network/system access are
166+
therefore **not** flagged for touching those resources.
167+
168+
#### `secret.embedded` — hardcoded live credential
169+
170+
Flags a live credential hardcoded into a description or schema — an AWS key, a
171+
private key, a database password, a Luhn-valid card, etc. It wraps the shared
172+
`internal/security/patterns/` matchers (the same set used by
173+
[sensitive-data detection](/features/sensitive-data-detection)) and carries each
174+
match's **per-match confidence**: a validated card / live cloud key is high; a
175+
documented placeholder (`AKIA…EXAMPLE`) collapses to near-zero and is dropped.
176+
Scans **raw** text (secrets are case-sensitive). Being soft, a hit raises a
177+
review item rather than auto-quarantining — an embedded secret may be a careless
178+
example as easily as a planted one.
179+
180+
### At a glance
181+
182+
| Check ID | Tier | Catches |
183+
|----------|------|---------|
184+
| `unicode.hidden` | hard | Zero-width / bidi / TAG-block / PUA character smuggling (raw text) |
185+
| `shadowing.cross_server` | hard | Distinctive tool name collision or cross-server reference |
186+
| `payload.decoded` | hard | base64/hex blob that decodes to a shell/exfil command |
187+
| `directive.imperative` | soft | Injection directives, secrecy imperatives, instruction overrides (normalized, position-discounted) |
188+
| `capability.mismatch` | soft | Compute/string tool touching `~/.ssh` etc.; unexplained data-sink param |
189+
| `secret.embedded` | soft | Hardcoded live credential (confidence-scored, placeholders dropped) |
190+
191+
## The eval gate (CI-enforced reliability)
192+
193+
Reliability is enforced as a number the build checks, so the detector cannot
194+
silently regress (the original keyword detector drifted to ~10% recall
195+
unnoticed). A labeled corpus runs as a **blocking CI gate**:
196+
197+
```bash
198+
go run ./cmd/scan-eval \
199+
--corpus specs/065-evaluation-foundation/datasets/detect_corpus_v1.json \
200+
--gate --min-recall 0.90 --max-fp 0.05
201+
```
202+
203+
- **Recall ≥ 0.90** on malicious entries and **false-positive rate ≤ 0.05** on
204+
the **hard-negative** set (benign tools that deliberately resemble attacks).
205+
Clean-benign entries are reported for transparency but do **not** dilute the
206+
gated FP rate — only the hard-negative FP rate feeds the gate decision
207+
(SC-002).
208+
- On a breach the command prints a `GATE FAILED: …` reason and exits with code
209+
**6** (distinct from config/write errors so CI can tell a real regression
210+
from a tooling fault). On success it prints `GATE PASSED: …` and exits `0`.
211+
- It always prints a per-category recall/precision/FP/F1 JSON scorecard to
212+
stdout for the CI log.
213+
214+
**CI wiring:** the gate runs as a blocking step in the `security-d2` job of
215+
[`.github/workflows/eval.yml`](https://github.com/smart-mcp-proxy/mcpproxy-go/blob/main/.github/workflows/eval.yml).
216+
The job is pure Go + Python with no live upstreams, so it is fast and
217+
hermetic (FR-013, SC-006).
218+
219+
### Corpus and category gating
220+
221+
The labeled corpus lives at
222+
`specs/065-evaluation-foundation/datasets/detect_corpus_v1.json` (separate from
223+
the immutable `security_corpus_v1.json`; it carries the server/tool/schema/peers
224+
context the detect engine needs). Each entry is labeled `malicious` or
225+
`benign`, tagged with a category (e.g. `unicode_smuggling`, `decoded_payload`,
226+
`shadowing`, `capability_mismatch`), and hard-negatives record which attack
227+
class they `resemble` so a false positive is attributed to that category.
228+
229+
A category is only **enforced** by the gate when its corresponding check is
230+
registered in the gate's check list (`gateChecks()` in `cmd/scan-eval/gate.go`).
231+
This is a forward-compatibility mechanism: a category whose check is not yet in
232+
the gate list is **measured and reported but never fails the build
233+
prematurely**. When a new check is wired into the gate list, the gate begins
234+
enforcing its category.
235+
236+
## How it plugs in (unchanged entry points)
237+
238+
The detect engine is invoked from `internal/security/scanner/inprocess.go`,
239+
which projects the connected servers' parsed tool definitions into a
240+
`RegistryView` and renders each `detect.Finding` 1:1 into the existing
241+
`ScanFinding` type (additively carrying `Confidence` and `Signals`). Because the
242+
finding shape is preserved, all existing entry points keep working unchanged
243+
(FR-015):
244+
245+
- CLI `mcpproxy security scan <server>`
246+
- REST `POST /api/v1/servers/{name}/scan`
247+
- the `quarantine_security` MCP tool
248+
249+
It reuses — rather than rebuilds — the Spec-032 quarantine hashing, the
250+
quarantine state machine, the aggregated-report types, and the
251+
`internal/security/patterns/` secret matchers (FR-012).
252+
253+
## Related reading
254+
255+
- [Security Scanner Plugins](/features/security-scanner-plugins) — the plugin framework hosting the `tpa-descriptions` scanner
256+
- [Security Quarantine](/features/security-quarantine) — the quarantine mechanism hard-tier findings drive
257+
- [Tool Quarantine (Spec 032)](/features/tool-quarantine) — per-tool hash-based approval
258+
- [Sensitive-Data Detection](/features/sensitive-data-detection) — the shared secret matchers the embedded-secret check reuses
259+
- Spec: `specs/076-deterministic-tool-scanner/spec.md` · engine contract: `internal/security/detect/doc.go`

0 commit comments

Comments
 (0)