Commit ec90d21

Add up/down arrows to results table for quick read
1 parent ab172da commit ec90d21

1 file changed

Lines changed: 16 additions & 7 deletions

docs/BENCHMARKS.md
@@ -58,14 +58,23 @@ Same eval slice (`training/FROZEN_EVAL_SCORED.jsonl`, 791 attacks /
 at its published default threshold and at a cross-model neutral
 `0.5`.
 
-| Model | recall@default | FPR@default | recall@0.5 | FPR@0.5 |
+Header arrows show the direction of merit (recall higher = better,
+FPR lower = better). Per cell: ↑ = top-tier on this axis, ↓ =
+bottom-tier, blank = mid.
+
+| Model | recall@default ↑ | FPR@default ↓ | recall@0.5 ↑ | FPR@0.5 ↓ |
 |---|---:|---:|---:|---:|
-| **promptpurify** | **83.94%** | **10.61%** | **87.10%** | **12.88%** |
-| ProtectAI v2 | 40.71% | 43.18% | 40.71% | 43.18% |
-| deepset | 97.22% | 59.85% | 97.22% | 59.85% |
-| fmops | 100.00% | 100.00% | 100.00% | 100.00% |
-| Meta Prompt-Guard | 67.00% | 88.64% | 67.00% | 88.64% |
-| Meta Prompt-Guard-2 | 12.77% | 1.52% | 12.77% | 1.52% |
+| **promptpurify** | **83.94% ↑** | **10.61% ↑** | **87.10% ↑** | **12.88% ↑** |
+| ProtectAI v2 | 40.71% ↓ | 43.18% ↓ | 40.71% ↓ | 43.18% ↓ |
+| deepset | 97.22% ↑ | 59.85% ↓ | 97.22% ↑ | 59.85% ↓ |
+| fmops | 100.00% ↑ | 100.00% ↓ | 100.00% ↑ | 100.00% ↓ |
+| Meta Prompt-Guard | 67.00% | 88.64% ↓ | 67.00% | 88.64% ↓ |
+| Meta Prompt-Guard-2 | 12.77% ↓ | 1.52% ↑ | 12.77% ↓ | 1.52% ↑ |
+
+`promptpurify` is the only row with ↑ on every column. `fmops` "wins"
+recall by predicting positive for every input — its FPR ↓ shows it's
+mis-calibrated, not skilled. `Meta Prompt-Guard-2` flips the trade:
+nearly-zero FPR at the cost of catching ~1 in 8 attacks.
 
 How to read this:
 
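Note on the added legend: the commit does not say where "top-tier" and "bottom-tier" begin, so the per-cell arrows cannot be re-derived from the diff alone. The sketch below is one possible way to compute recall and FPR at a threshold and then attach the markers. The `>=` flagging rule, the function names, and every cutoff in `HYPOTHETICAL_CUTOFFS` are assumptions chosen for illustration (the cutoffs happen to agree with the arrows in the table); none of this is taken from the repository.

```python
from typing import Sequence, Tuple

# Assumed cutoffs in percent, NOT from the repository: (top-tier bound, bottom-tier bound).
HYPOTHETICAL_CUTOFFS = {
    "recall": (80.0, 50.0),  # top-tier if >= 80, bottom-tier if <= 50
    "fpr": (15.0, 40.0),     # top-tier if <= 15, bottom-tier if >= 40
}


def recall_and_fpr(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (recall, FPR) in percent; an input is flagged when score >= threshold.

    labels use 1 for an attack and 0 for a benign input.
    """
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if label == 1:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    fpr = 100.0 * fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


def tier_marker(value: float, metric: str) -> str:
    """Return "↑" (top-tier), "↓" (bottom-tier), or "" (mid) for one cell."""
    top, bottom = HYPOTHETICAL_CUTOFFS[metric]
    if metric == "recall":  # higher is better
        if value >= top:
            return "↑"
        if value <= bottom:
            return "↓"
    else:  # FPR: lower is better
        if value <= top:
            return "↑"
        if value >= bottom:
            return "↓"
    return ""


def format_cell(value: float, metric: str) -> str:
    """Render a table cell such as "83.94% ↑"; mid-tier cells carry no marker."""
    return f"{value:.2f}% {tier_marker(value, metric)}".rstrip()


# Toy data (invented): a detector that scores every input 1.0 flags everything,
# so it reaches 100% recall and 100% FPR, the pattern the commit describes for fmops.
always_positive = recall_and_fpr([1.0, 1.0, 1.0, 1.0], [1, 1, 0, 0], threshold=0.5)
```

If the real tiers are rank-based (for example top two and bottom two per column) rather than fixed cutoffs, only `tier_marker` would need to change; stating the rule in the legend would make the arrows reproducible either way.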