Commit b1406a8
feat(security): per-category precision/FP/F1 in scan-eval gate (T018)
CodexReviewer re-review of #777: T018 (tasks.md:75) requires `scan-eval --gate`
to print per-category recall/precision/FP/F1, but categoryMetric only carried
recall (precision/FP/F1 existed only as overall metrics).
- categoryMetric now carries hard_negatives, false_positives, fp_rate,
precision, and f1 per category, populated in the gate computation and JSON.
- Per-category FP is attributed via a new `resembles` field on hard_negative
  corpus entries (the attack class a benign entry mimics, per the SC-003
  framing): a flagged hard-negative lowers its resembled category's precision.
  Clean-benign entries carry no `resembles` and affect only the overall benign
  FP count.
- detect_corpus_v1.json: every hard_negative now declares `resembles`
(consistent with its hn_<class> id); validator asserts it is set, names a
gated category, and matches the id prefix.
- Extracted an f1() helper; overall F1 reuses it.
- Tests: TestGateMetrics_PerCategoryShapeAndFPAttribution proves the
per-category JSON exposes recall/precision/FP/F1 and that a resembling
hard-negative FP drops that category's precision (1 TP + 1 FP -> precision
0.5); TestEvaluateGateCorpus asserts per-category recall/precision/f1 = 1.0.
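A minimal Go sketch of the per-category computation described above, assuming a simplified corpus record. The `corpusEntry` type, its `Kind` labels, and the `computeCategoryMetrics` and `ratio` helpers are hypothetical, and the `categoryMetric` field names and JSON tags are guesses at the shape, not the actual scan-eval code. It shows a flagged hard-negative being charged as an FP to the category named in its `resembles` field, while clean benign entries leave per-category metrics untouched.

```go
package main

// categoryMetric mirrors the per-category fields named in the commit message.
// Field names and JSON tags are illustrative assumptions, not scan-eval's definitions.
type categoryMetric struct {
	Category       string  `json:"category"`
	Positives      int     `json:"positives"`
	TruePositives  int     `json:"true_positives"`
	Recall         float64 `json:"recall"`
	HardNegatives  int     `json:"hard_negatives"`
	FalsePositives int     `json:"false_positives"`
	FPRate         float64 `json:"fp_rate"`
	Precision      float64 `json:"precision"`
	F1             float64 `json:"f1"`
}

// corpusEntry is a hypothetical, simplified corpus record.
type corpusEntry struct {
	ID        string
	Kind      string // assumed labels: "attack", "hard_negative", "benign"
	Category  string // set on attack entries
	Resembles string // set on hard_negative entries only
	Flagged   bool   // whether the scanner flagged this entry
}

// ratio returns num/den, or 0 when den is 0 (a choice made for this sketch).
func ratio(num, den int) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den)
}

// f1 is the harmonic mean of precision and recall; 0 when both are 0.
func f1(precision, recall float64) float64 {
	if precision+recall == 0 {
		return 0
	}
	return 2 * precision * recall / (precision + recall)
}

// computeCategoryMetrics (hypothetical) builds one categoryMetric per gated category.
func computeCategoryMetrics(entries []corpusEntry, gated []string) map[string]*categoryMetric {
	out := make(map[string]*categoryMetric, len(gated))
	for _, c := range gated {
		out[c] = &categoryMetric{Category: c}
	}
	for _, e := range entries {
		switch e.Kind {
		case "attack":
			if m, ok := out[e.Category]; ok {
				m.Positives++
				if e.Flagged {
					m.TruePositives++
				}
			}
		case "hard_negative":
			// A flagged hard-negative is an FP against the category it resembles.
			if m, ok := out[e.Resembles]; ok {
				m.HardNegatives++
				if e.Flagged {
					m.FalsePositives++
				}
			}
		default:
			// Clean benign entries have no Resembles; they only feed the
			// overall benign FP count, which this sketch does not compute.
		}
	}
	for _, m := range out {
		m.Recall = ratio(m.TruePositives, m.Positives)
		m.FPRate = ratio(m.FalsePositives, m.HardNegatives)
		m.Precision = ratio(m.TruePositives, m.TruePositives+m.FalsePositives)
		m.F1 = f1(m.Precision, m.Recall)
	}
	return out
}
```

With one flagged attack and one flagged hard-negative resembling the same category, this yields recall 1.0 and precision 0.5, the case described for TestGateMetrics_PerCategoryShapeAndFPAttribution. Reporting precision as 0 when a category has no flagged entries is a convention of this sketch; the real gate may handle that case differently.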
Committed corpus: recall 1.0 (16/16 gated), fp_rate 0/9; every gated category
reports recall/precision/f1 = 1.0, FP 0.
Related #MCP-3579
Co-Authored-By: Paperclip <noreply@paperclip.ing>

1 parent f8cc0a4
4 files changed
Lines changed: 140 additions & 30 deletions
Changed directories:
- cmd/scan-eval
- specs/065-evaluation-foundation/datasets
Lines changed: 13 additions & 4 deletions
Lines changed: 9 additions & 0 deletions