@@ -67,14 +67,12 @@ bottom-tier, blank = mid.
 | **promptpurify** | **83.94% ↑** | **10.61% ↑** | **87.10% ↑** | **12.88% ↑** |
 | ProtectAI v2 | 40.71% ↓ | 43.18% ↓ | 40.71% ↓ | 43.18% ↓ |
 | deepset | 97.22% ↑ | 59.85% ↓ | 97.22% ↑ | 59.85% ↓ |
-| fmops | 100.00% ↑ | 100.00% ↓ | 100.00% ↑ | 100.00% ↓ |
 | Meta Prompt-Guard | 67.00% | 88.64% ↓ | 67.00% | 88.64% ↓ |
 | Meta Prompt-Guard-2 | 12.77% ↓ | 1.52% ↑ | 12.77% ↓ | 1.52% ↑ |
 
-`promptpurify` is the only row with ↑ on every column. `fmops` "wins"
-recall by predicting positive for every input — its FPR ↓ shows it's
-mis-calibrated, not skilled. `Meta Prompt-Guard-2` flips the trade:
-nearly-zero FPR at the cost of catching ~1 in 8 attacks.
+`promptpurify` is the only row with ↑ on every column.
+`Meta Prompt-Guard-2` flips the trade: nearly-zero FPR at the cost of
+catching ~1 in 8 attacks.
 
 How to read this:
 
@@ -85,9 +83,6 @@ How to read this:
   on this slice. `deepset` reaches higher recall but at ~6x the FPR
   (60% of benigns blocked); for most production traffic that's worse,
   not better.
-- `fmops` predicts the positive class for every input on this slice.
-  Treat the row as evidence the model is mis-calibrated for this
-  distribution, not as a real recall claim.
 - `Meta Prompt-Guard` is a 3-class model; we score it as
   `P(INJECTION) + P(JAILBREAK)` (see `scripts/bench_oss.py`).
 
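Note on the `Meta Prompt-Guard` scoring rule kept in the hunk above: the snippet below is a minimal sketch of how a 3-class output could be collapsed into one attack score and then turned into the recall and FPR columns. It is not taken from `scripts/bench_oss.py`. The label keys (`BENIGN`, `INJECTION`, `JAILBREAK`), the 0.5 threshold, the helper names, and the four sample rows are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def injection_score(probs: Dict[str, float]) -> float:
    """Collapse a 3-class output into one score: P(INJECTION) + P(JAILBREAK).

    The label keys are assumed; adjust them to whatever the classifier emits.
    """
    return probs.get("INJECTION", 0.0) + probs.get("JAILBREAK", 0.0)


def recall_and_fpr(
    scores: List[float], is_attack: List[bool], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (recall, false-positive rate) at the given threshold."""
    tp = fn = fp = tn = 0
    for score, attack in zip(scores, is_attack):
        flagged = score >= threshold
        if attack and flagged:
            tp += 1  # attack caught
        elif attack:
            fn += 1  # attack missed
        elif flagged:
            fp += 1  # benign input blocked
        else:
            tn += 1  # benign input passed
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Hypothetical model outputs: two attacks followed by two benign inputs.
outputs = [
    {"BENIGN": 0.10, "INJECTION": 0.30, "JAILBREAK": 0.60},  # attack, flagged
    {"BENIGN": 0.70, "INJECTION": 0.20, "JAILBREAK": 0.10},  # attack, missed
    {"BENIGN": 0.40, "INJECTION": 0.35, "JAILBREAK": 0.25},  # benign, flagged
    {"BENIGN": 0.95, "INJECTION": 0.03, "JAILBREAK": 0.02},  # benign, passed
]
labels = [True, True, False, False]

scores = [injection_score(p) for p in outputs]
recall, fpr = recall_and_fpr(scores, labels)
print(f"recall={recall:.2%} fpr={fpr:.2%}")
```

Summing the two attack classes treats both as "block this input", which keeps the 3-class model comparable with the binary detectors in the table. A different threshold would move both columns together, so the numbers are only comparable at a fixed operating point.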