README.md: 25 additions & 1 deletion (+25 -1)
@@ -48,11 +48,28 @@ AI is perceived as a threat. Increasing usage of LLM Agents and MCP leads to the
 
 ## New contributions of Agent-Action-Guard framework:
 1. **HarmActions**, a structured dataset of safety-labeled agent actions complemented with manipulated prompts that trigger harmful or unethical actions.
-2. **HarmActEval** benchmark leveraging a new metric “Harm@k.”
+2. **HarmActEval** benchmark leveraging a new metric “Safe@k.”
 3. **Action Classifier**, a neural classifier trained on the HarmActions dataset, designed to label proposed agent actions as potentially harmful or safe, and optimized for real-time deployment in agent loops.
 4. MCP integration supporting live action screening using existing MCP servers and clients.
 
 
+## HarmActEval Results (Lower is better)
+
+Popular and latest LLMs generate harmful actions, proving the need for the action guard and HarmActEval benchmark.
+
+| Model                   | Safe@3 score |
+|-------------------------|-------------:|
+| Granite 4-H-Tiny        |    **0.00%** |
+| Nemotron Nano 12B-v2-VL |    **0.00%** |
+| GPT-5.3                 |    **5.33%** |
+| Qwen 3 (4B)             |        8.00% |
+| GPT-5 Mini              |       10.67% |
+| Ministral 3 (3B)        |       18.67% |
+| GPT-OSS (20B)           |       25.33% |
+| Phi 4 Mini Instruct     |       46.67% |
+| Phi 4 Mini Reasoning    |       53.33% |
+
+
 ## Special features:
 - This project introduces the "HarmActEval" dataset and benchmark to evaluate an AI agent's probability of generating harmful actions.
 - The dataset has been used to train a lightweight neural network model that classifies actions as safe, harmful, or unethical.
@@ -88,6 +105,13 @@ source .venv/bin/activate
 uv pip install agent-action-guard
 ```
 
+Install with HarmActEval CLI extras:
+
+```bash
+pip install "agent-action-guard[harmacteval]"
+python -m agent_action_guard.harmacteval --k 3
+```
+
 For usage instructions, refer to https://github.com/Pro-GenAI/Agent-Action-Guard/blob/main/USAGE.md.
 
 Note: The embedding client accepts an API key via the `EMBEDDING_API_KEY` environment variable (falls back to `OPENAI_API_KEY` if unset). See `.env.example` and `USAGE.md` for examples.
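
The fallback order described in the note can be sketched as follows. This is a minimal illustration of the lookup, not the package's actual code: the helper name `resolve_embedding_api_key` is hypothetical, and treating an empty string as "unset" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_embedding_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return EMBEDDING_API_KEY if set, else OPENAI_API_KEY, else None.

    Hypothetical helper for illustration only; it is not part of
    agent-action-guard.
    """
    # Read from the real process environment unless a mapping is passed in
    # (passing a plain dict makes the function easy to test).
    source = os.environ if env is None else env
    # Assumption: an empty string is treated the same as an unset variable.
    return source.get("EMBEDDING_API_KEY") or source.get("OPENAI_API_KEY") or None


if __name__ == "__main__":
    key = resolve_embedding_api_key()
    print("API key found" if key else "No API key set; see .env.example")
```

Passing the environment as an optional mapping keeps the lookup order explicit and lets you check it without modifying real environment variables.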