@@ -58,14 +58,23 @@ Same eval slice (`training/FROZEN_EVAL_SCORED.jsonl`, 791 attacks /
 at its published default threshold and at a cross-model neutral
 `0.5`.
 
-| Model | recall@default | FPR@default | recall@0.5 | FPR@0.5 |
+Header arrows show the direction of merit (recall: higher is better;
+FPR: lower is better). Cell arrows rate the value itself: ↑ = top-tier on
+that axis (so a low FPR earns ↑), ↓ = bottom-tier, blank = mid-tier.
+
+| Model | recall@default ↑ | FPR@default ↓ | recall@0.5 ↑ | FPR@0.5 ↓ |
 |---|---:|---:|---:|---:|
-| **promptpurify** | **83.94%** | **10.61%** | **87.10%** | **12.88%** |
-| ProtectAI v2 | 40.71% | 43.18% | 40.71% | 43.18% |
-| deepset | 97.22% | 59.85% | 97.22% | 59.85% |
-| fmops | 100.00% | 100.00% | 100.00% | 100.00% |
-| Meta Prompt-Guard | 67.00% | 88.64% | 67.00% | 88.64% |
-| Meta Prompt-Guard-2 | 12.77% | 1.52% | 12.77% | 1.52% |
+| **promptpurify** | **83.94% ↑** | **10.61% ↑** | **87.10% ↑** | **12.88% ↑** |
+| ProtectAI v2 | 40.71% ↓ | 43.18% ↓ | 40.71% ↓ | 43.18% ↓ |
+| deepset | 97.22% ↑ | 59.85% ↓ | 97.22% ↑ | 59.85% ↓ |
+| fmops | 100.00% ↑ | 100.00% ↓ | 100.00% ↑ | 100.00% ↓ |
+| Meta Prompt-Guard | 67.00% | 88.64% ↓ | 67.00% | 88.64% ↓ |
+| Meta Prompt-Guard-2 | 12.77% ↓ | 1.52% ↑ | 12.77% ↓ | 1.52% ↑ |
+
+`promptpurify` is the only row with ↑ in every column. `fmops` "wins"
+recall by predicting positive for every input; its 100% FPR (↓) shows it
+is miscalibrated, not skilled. `Meta Prompt-Guard-2` flips the trade:
+near-zero FPR (1.52%) at the cost of catching only ~1 in 8 attacks.
 
 How to read this:
 