Commit 171249d

feat(security): Spec 076 US3 — detect-engine eval corpus + CI recall/FP gate (T017-T019) (#777)

* feat(security): Spec 076 US3 — detect-engine eval corpus + CI recall/FP gate (T017-T019)

Make detector reliability a blocking CI number for the offline Spec-076 detect.Engine.

T017 — new labeled corpus specs/065-evaluation-foundation/datasets/detect_corpus_v1.json (32 self-authored entries) carrying the full ToolView fields the structural checks need (server, tool name/description/schema, cross-server peers). Categories map to detect checks: unicode_smuggling, decoded_payload, shadowing (US1, gated today) plus capability_mismatch (US2, reported but not yet gated) and attack-resembling hard-negatives. Validated by detect_corpus_test.go (coherent labels, redistributable provenance, per-category coverage). README documents the file + counts.

T018 — `scan-eval --gate --min-recall --max-fp` runs detect.Engine over the corpus, prints per-category recall/precision/FP/F1 JSON, and exits non-zero on a breach. A category is only enforced when its check is registered, so future checks (capability.mismatch) begin gating automatically with no corpus change.

T019 — blocking step in the eval.yml security-d2 job: `scan-eval --gate --min-recall 0.90 --max-fp 0.05` (pure Go, offline, runs first so a detector regression fails fast).

TDD: gate_test.go (incl. a committed-corpus regression anchor) written first. Committed corpus passes at recall 1.0 (16/16 gated), FP 0/14.

Related #MCP-3579

* fix(security): gate FP rate on hard-negatives only (Spec 076 SC-002)

CodexReviewer re-review of #777: the gated false-positive rate computed its denominator over every non-malicious entry (benign + hard_negative). SC-002 (spec.md:48,52,114) requires the ≤5% FP threshold to be measured on the hard-negative set specifically — otherwise adding clean-benign corpus entries dilutes the rate and the gate can pass while hard-negatives regress.

- fp_rate denominator = hard_negative entries only (the gated SC-002 metric).
- Report benign_total / benign_false_positives separately for transparency (SC-003 still expects zero FP across benign + hard-negatives), but only the hard-negative fp_rate feeds the gate decision.
- Precision now uses all-benign FPs; recall accounting unchanged.
- Guard: a corpus with zero hard-negatives fails the gate as vacuous (mirrors the zero-gated-malicious guard) rather than silently passing the FP side.
- New test TestGateFP_HardNegativeDenominatorOnly proves benign-corpus growth does not move the gated fp_rate (old code would dilute 1/3 -> 1/23).
- README documents the hard-negative denominator.

Committed corpus still passes: recall 1.0 (16/16 gated), fp_rate 0/9 hard-negs.

Related #MCP-3579

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* feat(security): per-category precision/FP/F1 in scan-eval gate (T018)

CodexReviewer re-review of #777: T018 (tasks.md:75) requires `scan-eval --gate` to print per-category recall/precision/FP/F1, but categoryMetric only carried recall (precision/FP/F1 existed only as overall metrics).

- categoryMetric now carries hard_negatives, false_positives, fp_rate, precision, and f1 per category, populated in the gate computation and JSON.
- Per-category FP is attributed via a new `resembles` field on hard_negative corpus entries (the attack class a benign mimics — the SC-003 framing): a flagged hard-negative lowers its resembled category's precision. Clean-benign entries carry no `resembles` and affect only the overall benign FP count.
- detect_corpus_v1.json: every hard_negative now declares `resembles` (consistent with its hn_<class> id); validator asserts it is set, names a gated category, and matches the id prefix.
- Extracted an f1() helper; overall F1 reuses it.
- Tests: TestGateMetrics_PerCategoryShapeAndFPAttribution proves the per-category JSON exposes recall/precision/FP/F1 and that a resembling hard-negative FP drops that category's precision (1 TP + 1 FP -> precision 0.5); TestEvaluateGateCorpus asserts per-category recall/precision/f1 = 1.0.

Committed corpus: recall 1.0 (16/16 gated), fp_rate 0/9; every gated category reports recall/precision/f1 = 1.0, FP 0.

Related #MCP-3579

Co-Authored-By: Paperclip <noreply@paperclip.ing>

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
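The dilution that the second commit fixes can be reproduced with a small standalone sketch. This is not the code from gate.go: the `sample` type and the `fpRates` and `demoCorpus` helpers are hypothetical names, and the counts (3 hard-negatives with 1 flagged, plus 20 clean-benign entries) are chosen only to match the 1/3 -> 1/23 figures quoted in the message.

```go
package main

import "fmt"

// sample is a hypothetical, simplified stand-in for a non-malicious corpus entry.
type sample struct {
	hardNegative bool // true for attack-resembling benign entries
	flagged      bool // true if the detector raised a finding
}

// fpRates returns the FP rate over hard-negatives only (the gated metric)
// and the rate over every non-malicious entry (the diluted, pre-fix metric).
func fpRates(samples []sample) (hardNegOnly, diluted float64) {
	var hn, hnFP, all, allFP int
	for _, s := range samples {
		all++
		if s.flagged {
			allFP++
		}
		if s.hardNegative {
			hn++
			if s.flagged {
				hnFP++
			}
		}
	}
	if hn > 0 {
		hardNegOnly = float64(hnFP) / float64(hn)
	}
	if all > 0 {
		diluted = float64(allFP) / float64(all)
	}
	return hardNegOnly, diluted
}

// demoCorpus builds 3 hard-negatives (1 flagged) plus 20 clean-benign entries.
func demoCorpus() []sample {
	samples := []sample{
		{hardNegative: true, flagged: true},
		{hardNegative: true},
		{hardNegative: true},
	}
	for i := 0; i < 20; i++ {
		samples = append(samples, sample{})
	}
	return samples
}

func main() {
	const maxFP = 0.05
	hn, diluted := fpRates(demoCorpus())
	// prints: hard-negative only: 0.3333 (breach=true)
	fmt.Printf("hard-negative only: %.4f (breach=%v)\n", hn, hn > maxFP)
	// prints: diluted:            0.0435 (breach=false)
	fmt.Printf("diluted:            %.4f (breach=%v)\n", diluted, diluted > maxFP)
}
```

With the diluted denominator the same regression slips under the 0.05 threshold, which is the failure mode the hard-negative-only denominator is meant to rule out.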
1 parent 86634b3 commit 171249d

7 files changed

Lines changed: 1405 additions & 0 deletions

.github/workflows/eval.yml

Lines changed: 11 additions & 0 deletions
@@ -58,6 +58,17 @@ jobs:
           go-version: "1.25"
           cache: true
 
+      # Spec 076 / US3 (FR-013, SC-006): pure-Go, offline regression gate over the
+      # labeled detect corpus. It runs the production detect.Engine and fails the
+      # build if malicious recall drops below 0.90 or the hard-negative
+      # false-positive rate climbs above 0.05. No mcp-eval/Python needed — runs
+      # first so a detector regression fails fast.
+      - name: Run Spec-076 detect-engine gate (offline, blocking)
+        run: |
+          go run ./cmd/scan-eval \
+            --corpus specs/065-evaluation-foundation/datasets/detect_corpus_v1.json \
+            --gate --min-recall 0.90 --max-fp 0.05
+
       - name: Checkout mcp-eval (public, pinned)
         uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
         with:
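To see what the two thresholds in this step mean at the committed corpus size (16 gated malicious samples and 9 hard-negatives, per the commit message), here is a minimal sketch of the comparison rules. The `breaches` helper is a hypothetical name that follows the strict `<` and `>` comparisons of `decide` in cmd/scan-eval/gate.go; it is an illustration, not the gate itself.

```go
package main

import "fmt"

// breaches is a hypothetical helper: recall must be >= minRecall and the
// hard-negative FP rate must be <= maxFP, otherwise a reason is reported.
func breaches(recall, fpRate, minRecall, maxFP float64) []string {
	var reasons []string
	if recall < minRecall {
		reasons = append(reasons, fmt.Sprintf("recall %.4f < min-recall %.4f", recall, minRecall))
	}
	if fpRate > maxFP {
		reasons = append(reasons, fmt.Sprintf("false-positive rate %.4f > max-fp %.4f", fpRate, maxFP))
	}
	return reasons
}

func main() {
	// Sitting exactly on both thresholds is a pass.
	fmt.Println(len(breaches(0.90, 0.05, 0.90, 0.05))) // prints: 0

	// One missed sample out of 16 still passes (15/16 = 0.9375).
	fmt.Println(len(breaches(15.0/16.0, 0, 0.90, 0.05))) // prints: 0

	// Two misses out of 16 and one flagged hard-negative out of 9 both breach.
	for _, r := range breaches(14.0/16.0, 1.0/9.0, 0.90, 0.05) {
		fmt.Println(r)
	}
	// prints: recall 0.8750 < min-recall 0.9000
	// prints: false-positive rate 0.1111 > max-fp 0.0500
}
```

Under these assumptions the gate tolerates a single missed malicious sample but no flagged hard-negative, since 1/9 already exceeds 0.05.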

cmd/scan-eval/gate.go

Lines changed: 351 additions & 0 deletions
@@ -0,0 +1,351 @@
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"io"
+	"os"
+	"sort"
+
+	"github.com/smart-mcp-proxy/mcpproxy-go/internal/security/detect"
+	"github.com/smart-mcp-proxy/mcpproxy-go/internal/security/detect/checks"
+)
+
+// exitGateBreach is returned when --gate fails its recall/FP thresholds. It is
+// distinct from config (4) / write (1) so CI can tell a real regression from a
+// tooling error. Any non-zero value fails the CI step (FR-013, SC-006).
+const exitGateBreach = 6
+
+// gateTool is the minimal projection of a tool the detect engine needs.
+type gateTool struct {
+	Name         string          `json:"name"`
+	Description  string          `json:"description"`
+	InputSchema  json.RawMessage `json:"input_schema,omitempty"`
+	OutputSchema json.RawMessage `json:"output_schema,omitempty"`
+}
+
+// gatePeer is another server's tool supplied as cross-server context so the
+// shadowing check can fire (it only emits when a collision/reference points at a
+// DIFFERENT server). Non-shadowing entries leave Peers empty.
+type gatePeer struct {
+	Server string   `json:"server"`
+	Tool   gateTool `json:"tool"`
+}
+
+// gateEntry is one labeled sample: a tool, its owning server, optional peers,
+// the ground-truth label/category, and redistributable provenance.
+type gateEntry struct {
+	ID       string `json:"id"`
+	Label    string `json:"label"`    // "malicious" | "benign"
+	Category string `json:"category"` // detect taxonomy or benign|hard_negative
+	// Resembles names the attack class a hard_negative mimics (e.g.
+	// "unicode_smuggling"), so a false positive on it counts toward that
+	// category's precision/FP (SC-003). Empty for clean-benign entries.
+	Resembles  string     `json:"resembles,omitempty"`
+	Server     string     `json:"server"`
+	Tool       gateTool   `json:"tool"`
+	Peers      []gatePeer `json:"peers,omitempty"`
+	Provenance struct {
+		Source  string `json:"source"`
+		License string `json:"license"`
+	} `json:"provenance"`
+}
+
+// gateCorpus is the Spec-076 detect-engine labeled evaluation corpus.
+type gateCorpus struct {
+	Version     string      `json:"version"`
+	Description string      `json:"description"`
+	Entries     []gateEntry `json:"entries"`
+}
+
+// categoryCheck maps each malicious category to the detect Check ID expected to
+// catch it. A category is only enforced by the gate when its check is actually
+// registered (see gateChecks) — so categories whose checks land in a later user
+// story are measured and reported but never fail the build prematurely. Add the
+// mapping when a new check is registered so the gate begins enforcing it.
+var categoryCheck = map[string]string{
+	"unicode_smuggling":   "unicode.hidden",
+	"decoded_payload":     "payload.decoded",
+	"shadowing":           "shadowing.cross_server",
+	"capability_mismatch": "capability.mismatch", // US2 (T016) — not yet registered
+}
+
+// gateChecks is the canonical set of detect checks the gate runs. It MUST mirror
+// the checks registered in the live scanner (internal/security/scanner/
+// inprocess.go); when a soft check (US2) or any new check is registered there,
+// add it here too so the gate measures the same detector the product ships.
+func gateChecks() []detect.Check {
+	return []detect.Check{
+		&checks.UnicodeHidden{},
+		&checks.Shadowing{},
+		&checks.PayloadDecoded{},
+	}
+}
+
+// categoryMetric is one category's per-run scorecard (T018: per-category
+// recall/precision/FP/F1). Precision and FP are attributed via hard-negatives
+// that resemble this category (SC-003); a category with no resembling
+// hard-negatives reports zero FP.
+type categoryMetric struct {
+	Category       string  `json:"category"`
+	Gated          bool    `json:"gated"`           // is this category's check registered?
+	Malicious      int     `json:"malicious"`       // malicious samples in this category
+	Detected       int     `json:"detected"`        // malicious samples the engine flagged (TP)
+	Recall         float64 `json:"recall"`
+	HardNegatives  int     `json:"hard_negatives"`  // resembling hard-negatives
+	FalsePositives int     `json:"false_positives"` // resembling hard-negatives flagged (FP)
+	FPRate         float64 `json:"fp_rate"`
+	Precision      float64 `json:"precision"` // TP / (TP + FP)
+	F1             float64 `json:"f1"`
+}
+
+// gateMetrics is the full metrics report emitted for the CI log.
+type gateMetrics struct {
+	Corpus         string           `json:"corpus_version"`
+	Checks         []string         `json:"checks"`
+	Categories     []categoryMetric `json:"categories"`
+	GatedMalicious int              `json:"gated_malicious"`
+	GatedDetected  int              `json:"gated_detected"`
+	OverallRecall  float64          `json:"overall_recall"`
+	// FP rate is gated over the HARD-NEGATIVE set only (Spec 076 SC-002): clean
+	// benign entries must not dilute it, or growing the corpus could mask a
+	// hard-negative regression. BenignTotal/BenignFalsePositives are reported for
+	// transparency (SC-003 expects zero FP across benign + hard-negatives), but
+	// only FPRate (hard-negative) feeds the gate decision.
+	HardNegatives         int     `json:"hard_negatives"`
+	HardNegFalsePositives int     `json:"hard_negative_false_positives"`
+	FPRate                float64 `json:"fp_rate"` // hard-neg FP / hard-neg total (SC-002, gated)
+	BenignTotal           int     `json:"benign_total"`
+	BenignFalsePositives  int     `json:"benign_false_positives"`
+	Precision             float64 `json:"precision"`
+	F1                    float64 `json:"f1"`
+}
+
+// evaluateGateCorpus runs the detect engine over every entry and tallies recall
+// (over categories whose checks are registered), the false-positive rate over
+// the HARD-NEGATIVE set (Spec 076 SC-002), precision, and F1. Each entry is
+// scanned in a RegistryView of its own tool plus its declared peers, so
+// shadowing fires deterministically and entries never cross-contaminate.
+func evaluateGateCorpus(c *gateCorpus, checkList []detect.Check) gateMetrics {
+	engine := detect.NewEngine(detect.Options{Checks: checkList})
+
+	registered := make(map[string]struct{}, len(checkList))
+	for _, ch := range checkList {
+		registered[ch.ID()] = struct{}{}
+	}
+	gatedCategory := func(cat string) bool {
+		id, ok := categoryCheck[cat]
+		if !ok {
+			return false
+		}
+		_, reg := registered[id]
+		return reg
+	}
+
+	type catTally struct {
+		gated              bool
+		malicious, flagged int
+		hardNeg, hardNegFP int
+	}
+	cats := map[string]*catTally{}
+	order := []string{}
+	getCat := func(cat string) *catTally {
+		ct := cats[cat]
+		if ct == nil {
+			ct = &catTally{gated: gatedCategory(cat)}
+			cats[cat] = ct
+			order = append(order, cat)
+		}
+		return ct
+	}
+
+	var gatedMal, gatedDet, truePos int
+	var benignTotal, benignFP, hardNegTotal, hardNegFP int
+
+	for i := range c.Entries {
+		e := c.Entries[i]
+		flagged := scanEntryFlagged(engine, e)
+
+		switch e.Label {
+		case "malicious":
+			ct := getCat(e.Category)
+			ct.malicious++
+			if flagged {
+				ct.flagged++
+			}
+			if ct.gated {
+				gatedMal++
+				if flagged {
+					gatedDet++
+					truePos++
+				}
+			}
+		default: // benign / hard_negative
+			benignTotal++
+			if flagged {
+				benignFP++
+			}
+			// SC-002 gates the FP rate on the hard-negative set specifically;
+			// SC-003 attributes each hard-negative FP to the attack class it
+			// resembles for the per-category precision/FP.
+			if e.Category == "hard_negative" {
+				hardNegTotal++
+				if flagged {
+					hardNegFP++
+				}
+				if e.Resembles != "" {
+					ct := getCat(e.Resembles)
+					ct.hardNeg++
+					if flagged {
+						ct.hardNegFP++
+					}
+				}
+			}
+		}
+	}
+
+	m := gateMetrics{
+		Corpus:                c.Version,
+		Checks:                sortedCheckIDs(checkList),
+		GatedMalicious:        gatedMal,
+		GatedDetected:         gatedDet,
+		HardNegatives:         hardNegTotal,
+		HardNegFalsePositives: hardNegFP,
+		BenignTotal:           benignTotal,
+		BenignFalsePositives:  benignFP,
+	}
+	for _, cat := range order {
+		ct := cats[cat]
+		recall := ratio(ct.flagged, ct.malicious)
+		precision := ratio(ct.flagged, ct.flagged+ct.hardNegFP)
+		m.Categories = append(m.Categories, categoryMetric{
+			Category:       cat,
+			Gated:          ct.gated,
+			Malicious:      ct.malicious,
+			Detected:       ct.flagged,
+			Recall:         recall,
+			HardNegatives:  ct.hardNeg,
+			FalsePositives: ct.hardNegFP,
+			FPRate:         ratio(ct.hardNegFP, ct.hardNeg),
+			Precision:      precision,
+			F1:             f1(precision, recall),
+		})
+	}
+	m.OverallRecall = ratio(gatedDet, gatedMal)
+	m.FPRate = ratio(hardNegFP, hardNegTotal)
+	m.Precision = ratio(truePos, truePos+benignFP)
+	m.F1 = f1(m.Precision, m.OverallRecall)
+	return m
+}
+
+// scanEntryFlagged builds the entry's RegistryView (its tool + peers), scans it,
+// and reports whether the engine produced any finding for the entry's own tool.
+func scanEntryFlagged(engine *detect.Engine, e gateEntry) bool {
+	views := []detect.ToolView{toGateView(e.Server, e.Tool)}
+	for _, p := range e.Peers {
+		views = append(views, toGateView(p.Server, p.Tool))
+	}
+	res := engine.Scan(detect.NewRegistryView(views))
+	want := e.Server + ":" + e.Tool.Name
+	for _, f := range res.Findings {
+		if f.Location == want {
+			return true
+		}
+	}
+	return false
+}
+
+func toGateView(server string, t gateTool) detect.ToolView {
+	return detect.ToolView{
+		Server:       server,
+		Name:         t.Name,
+		Description:  t.Description,
+		InputSchema:  t.InputSchema,
+		OutputSchema: t.OutputSchema,
+	}
+}
+
+// decide applies the gate thresholds. It returns ok=false plus a human-readable
+// reason per breached metric.
+func (m gateMetrics) decide(minRecall, maxFP float64) (ok bool, reasons []string) {
+	if m.OverallRecall < minRecall {
+		reasons = append(reasons, fmt.Sprintf("recall %.4f < min-recall %.4f", m.OverallRecall, minRecall))
+	}
+	if m.FPRate > maxFP {
+		reasons = append(reasons, fmt.Sprintf("false-positive rate %.4f > max-fp %.4f", m.FPRate, maxFP))
+	}
+	return len(reasons) == 0, reasons
+}
+
+// runGate evaluates the corpus, prints the metrics JSON, and returns the process
+// exit code: exitOK on pass, exitGateBreach on a recall/FP breach.
+func runGate(c *gateCorpus, minRecall, maxFP float64, stdout, stderr io.Writer) int {
+	m := evaluateGateCorpus(c, gateChecks())
+
+	out, err := json.MarshalIndent(m, "", " ")
+	if err != nil {
+		fmt.Fprintf(stderr, "error: marshaling gate metrics: %v\n", err)
+		return exitWriteError
+	}
+	fmt.Fprintln(stdout, string(out))
+
+	if m.GatedMalicious == 0 {
+		fmt.Fprintln(stderr, "error: no malicious samples in a gated category — the gate would be vacuous")
+		return exitConfigError
+	}
+	if m.HardNegatives == 0 {
+		fmt.Fprintln(stderr, "error: no hard-negative samples — the FP gate (SC-002) would be vacuous")
+		return exitConfigError
+	}
+
+	ok, reasons := m.decide(minRecall, maxFP)
+	if !ok {
+		for _, r := range reasons {
+			fmt.Fprintf(stderr, "GATE FAILED: %s\n", r)
+		}
+		return exitGateBreach
+	}
+	fmt.Fprintf(stderr, "GATE PASSED: recall=%.4f (>=%.4f), fp=%.4f (<=%.4f)\n", m.OverallRecall, minRecall, m.FPRate, maxFP)
+	return exitOK
+}
+
+// loadGateCorpus reads and decodes the detect-engine eval corpus.
+func loadGateCorpus(path string) (*gateCorpus, error) {
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return nil, fmt.Errorf("reading gate corpus %q: %w", path, err)
+	}
+	var c gateCorpus
+	if err := json.Unmarshal(data, &c); err != nil {
+		return nil, fmt.Errorf("parsing gate corpus %q: %w", path, err)
+	}
+	if len(c.Entries) == 0 {
+		return nil, fmt.Errorf("gate corpus %q has no entries", path)
+	}
+	return &c, nil
+}
+
+func sortedCheckIDs(checkList []detect.Check) []string {
+	ids := make([]string, 0, len(checkList))
+	for _, ch := range checkList {
+		ids = append(ids, ch.ID())
+	}
+	sort.Strings(ids)
+	return ids
+}
+
+// ratio is n/d with a 0 guard (an empty denominator yields 0, not NaN).
+func ratio(n, d int) float64 {
+	if d == 0 {
+		return 0
+	}
+	return float64(n) / float64(d)
+}
+
+// f1 is the harmonic mean of precision and recall (0 when both are 0).
+func f1(precision, recall float64) float64 {
+	if precision+recall == 0 {
+		return 0
+	}
+	return 2 * precision * recall / (precision + recall)
+}
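
The per-category attribution in this file can be illustrated without the detect engine. The sketch below is a simplified stand-in, not the committed code: `entry`, `score`, `tally`, and `demoEntries` are hypothetical names, and the `flagged` field replaces a real engine scan. It reproduces the "1 TP + 1 FP -> precision 0.5" case from the commit message and shows that a flagged clean-benign entry (no `resembles`) leaves the category's numbers untouched.

```go
package main

import "fmt"

// entry is a hypothetical, pre-scanned corpus sample.
type entry struct {
	label     string // "malicious" or "benign"
	category  string // attack class, "benign", or "hard_negative"
	resembles string // attack class a hard-negative mimics; empty otherwise
	flagged   bool   // stands in for the engine's verdict
}

// score holds one category's raw counts.
type score struct{ malicious, detected, hardNeg, hardNegFP int }

// ratio is n/d, returning 0 for an empty denominator.
func ratio(n, d int) float64 {
	if d == 0 {
		return 0
	}
	return float64(n) / float64(d)
}

// f1 is the harmonic mean of precision and recall (0 when both are 0).
func f1(p, r float64) float64 {
	if p+r == 0 {
		return 0
	}
	return 2 * p * r / (p + r)
}

// tally attributes each hard-negative to the category it resembles.
func tally(entries []entry) map[string]*score {
	out := map[string]*score{}
	get := func(cat string) *score {
		if out[cat] == nil {
			out[cat] = &score{}
		}
		return out[cat]
	}
	for _, e := range entries {
		switch {
		case e.label == "malicious":
			s := get(e.category)
			s.malicious++
			if e.flagged {
				s.detected++
			}
		case e.category == "hard_negative" && e.resembles != "":
			s := get(e.resembles)
			s.hardNeg++
			if e.flagged {
				s.hardNegFP++
			}
		}
	}
	return out
}

// demoEntries: one true positive, one resembling false positive, and one
// flagged clean-benign entry that must not touch the category's metrics.
func demoEntries() []entry {
	return []entry{
		{label: "malicious", category: "unicode_smuggling", flagged: true},
		{label: "benign", category: "hard_negative", resembles: "unicode_smuggling", flagged: true},
		{label: "benign", category: "benign", flagged: true},
	}
}

func main() {
	s := tally(demoEntries())["unicode_smuggling"]
	recall := ratio(s.detected, s.malicious)
	precision := ratio(s.detected, s.detected+s.hardNegFP)
	// prints: recall=1.00 precision=0.50 fp_rate=1.00 f1=0.6667
	fmt.Printf("recall=%.2f precision=%.2f fp_rate=%.2f f1=%.4f\n",
		recall, precision, ratio(s.hardNegFP, s.hardNeg), f1(precision, recall))
}
```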
