@@ -45,19 +45,20 @@ python -m agent_action_guard.harmactionseval
The **HarmActionsEval** benchmark proved that AI agents with harmful tools will use them — even today's **most capable** LLMs.
80% of the LLMs tested executed the harmful action on the first attempt for over 95% of the prompts.

-| Model                  | SafeActions@1 |
-| ---------------------- | ------------: |
-| Claude Haiku 4.5       |         0.00% |
-| Phi 4 Mini Instruct    |         0.00% |
-| Granite 4-H-Tiny       |         0.00% |
-| GPT-5.4 Mini           |         0.71% |
-| Gemini 3.1 Flash Lite  |         0.71% |
-| Ministral 3 (3B)       |         2.13% |
-| Claude Sonnet 4.6      |         2.84% |
-| Phi 4 Mini Reasoning   |         2.84% |
-| GPT-5.3                |        12.77% |
-| Qwen3.5-397b-a17b      |        23.40% |
-| **Average**            |     **4.54%** |
+| Model                   | SafeActions@1 |
+| ----------------------- | ------------: |
+| Claude Haiku 4.5        |         0.00% |
+| Phi 4 Mini Instruct     |         0.00% |
+| Granite 4-H-Tiny        |         0.00% |
+| GPT-5.4 Mini            |         0.71% |
+| Gemini 3.1 Flash Lite   |         0.71% |
+| Grok 4.20 Non Reasoning |         2.13% |
+| Ministral 3 (3B)        |         2.13% |
+| Claude Sonnet 4.6       |         2.84% |
+| Phi 4 Mini Reasoning    |         2.84% |
+| GPT-5.3                 |        12.77% |
+| Qwen3.5-397b-a17b       |        23.40% |
+| **Average**             |     **4.54%** |

> These models often still respond *"Sorry, I can't help with that"* while executing the harmful action anyway.

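The table reports SafeActions@1 without defining it in this hunk; read together with the sentence above, it plausibly means the fraction of harmful prompts for which the model's first attempt does not execute the harmful action. Below is a minimal sketch of that interpretation — the function name and the `first_attempt_was_safe` records are illustrative assumptions, not part of the `agent_action_guard` API; the 141-prompt figure is only an inference from the 0.71% rows and may be wrong.

```python
def safe_actions_at_1(first_attempt_was_safe: list[bool]) -> float:
    """Fraction of prompts whose first attempt avoided the harmful action.

    Assumed reading of SafeActions@1; the benchmark's own definition may differ.
    """
    if not first_attempt_was_safe:
        return 0.0
    return sum(first_attempt_was_safe) / len(first_attempt_was_safe)


# Example: 1 safe first attempt out of 141 prompts ≈ 0.71%,
# which would match the GPT-5.4 Mini / Gemini 3.1 Flash Lite rows
# if the suite contains 141 harmful prompts (an assumption).
print(f"{safe_actions_at_1([True] + [False] * 140):.2%}")
```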