
Commit 9610653

Introduced Safe@k
1 parent 754dd56 commit 9610653

10 files changed: 3312 additions & 469 deletions

LICENSE.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-© 2025 Praneeth Vadlapati.
+© 2025-30 Praneeth Vadlapati.
 <!-- Copyright (c) 2025 Praneeth Vadlapati -->

 # License: CC-BY 4.0

MANIFEST.in

Lines changed: 1 addition & 0 deletions
@@ -1 +1,2 @@
 prune tests
+recursive-include agent_action_guard *.pt *.json

README.md

Lines changed: 25 additions & 1 deletion
@@ -48,11 +48,28 @@ AI is perceived as a threat. Increasing usage of LLM Agents and MCP leads to the

 ## New contributions of Agent-Action-Guard framework:
 1. **HarmActions**, a structured dataset of safety-labeled agent actions complemented with manipulated prompts that trigger harmful or unethical actions.
-2. **HarmActEval** benchmark leveraging a new metric “Harm@k.”
+2. **HarmActEval** benchmark leveraging a new metric “Safe@k.”
 3. **Action Classifier**, a neural classifier trained on the HarmActions dataset, designed to label proposed agent actions as potentially harmful or safe, and optimized for real-time deployment in agent loops.
 4. MCP integration supporting live action screening using existing MCP servers and clients.


+## HarmActEval Results (Lower is better)
+
+Popular and latest LLMs generate harmful actions, proving the need for the action guard and the HarmActEval benchmark.
+
+| Model                   | Safe@3 score |
+|-------------------------|-------------:|
+| Granite 4-H-Tiny        |    **0.00%** |
+| Nemotron Nano 12B-v2-VL |    **0.00%** |
+| GPT-5.3                 |    **5.33%** |
+| Qwen 3 (4B)             |        8.00% |
+| GPT-5 Mini              |       10.67% |
+| Ministral 3 (3B)        |       18.67% |
+| GPT-OSS (20B)           |       25.33% |
+| Phi 4 Mini Instruct     |       46.67% |
+| Phi 4 Mini Reasoning    |       53.33% |
+
+
 ## Special features:
 - This project introduces the "HarmActEval" dataset and benchmark to evaluate an AI agent's probability of generating harmful actions.
 - The dataset has been used to train a lightweight neural network model that classifies actions as safe, harmful, or unethical.
@@ -88,6 +105,13 @@ source .venv/bin/activate
 uv pip install agent-action-guard
 ```

+Install with the HarmActEval CLI extras:
+
+```bash
+pip install "agent-action-guard[harmacteval]"
+python -m agent_action_guard.harmacteval --k 3
+```
+
 For usage instructions, kindly refer to https://github.com/Pro-GenAI/Agent-Action-Guard/blob/main/USAGE.md.

 Note: The embedding client accepts an API key via the `EMBEDDING_API_KEY` environment variable (falls back to `OPENAI_API_KEY` if unset). See `.env.example` and `USAGE.md` for examples.
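For context on the metric this commit introduces: the diff does not show how Safe@k is implemented, so the sketch below is only one plausible reading, assuming Safe@k is the percentage of manipulated evaluation prompts for which every one of the k sampled actions is classified safe (the `sample_action` and `is_safe` callables are hypothetical stand-ins, not the package's real API).

```python
from typing import Callable, Sequence


def safe_at_k(
    prompts: Sequence[str],
    sample_action: Callable[[str], str],  # assumed: draws one action from the model
    is_safe: Callable[[str], bool],       # assumed: e.g. the Action Classifier
    k: int = 3,
) -> float:
    """Percentage of prompts whose k sampled actions are all judged safe."""
    safe_prompts = 0
    for prompt in prompts:
        # A single harmful action among the k attempts disqualifies the prompt.
        if all(is_safe(sample_action(prompt)) for _ in range(k)):
            safe_prompts += 1
    return 100.0 * safe_prompts / len(prompts)
```

Under this reading, a model that yields even one harmful action per prompt scores 0%, which matches how strict the reported Safe@3 numbers appear to be.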

USAGE.md

Lines changed: 26 additions & 0 deletions
@@ -17,6 +17,12 @@ source .venv/bin/activate
 pip install agent-action-guard
 ```

+Install with HarmActEval CLI support:
+
+```bash
+pip install "agent-action-guard[harmacteval]"
+```
+
 2. Start an embedding server (if not already running).

 Sample embedding server script is available at [examples/scripts/host_models.py](examples/scripts/host_models.py).
@@ -113,6 +119,26 @@ PyPI package scope:
 - `pip install agent-action-guard` installs only the runtime classifier modules and model file needed for action classification.
 - Training, evaluation, MCP demo servers, and UI scripts remain in this repository and require the `dev` extras.

+### HarmActEval CLI (standalone)
+
+After installing `agent-action-guard[harmacteval]`, run:
+
+```bash
+python -m agent_action_guard.harmacteval --k 3
+```
+
+Common arguments:
+- `--k`: Number of attempts per prompt (Safe@k).
+- `--offset`: Start index within harmful/unethical rows.
+- `--limit`: Maximum number of harmful/unethical rows to evaluate.
+- `--cache-path`: Path to the cache JSON file.
+- `--output`: Path to the output JSON file.
+- `--log-level`: `DEBUG|INFO|WARNING|ERROR|CRITICAL`.
+
+Environment variables:
+- Required: `OPENAI_MODEL` and provider credentials (`OPENAI_API_KEY` or Azure equivalents).
+- Optional (MCP mode): `MCP_SUPPORTED`, `MCP_EVAL_SERVER_URL`, `MCP_URL_GUARDED`.
+
 ### Docker Compose

 The Docker Compose and manual demo setup below also require a repository checkout.
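The README describes the Action Classifier as screening proposed actions inside an agent loop. The diff does not show that loop, so the following is only a hypothetical illustration of the pattern: `DummyClassifier` and `guarded_execute` are invented for this sketch (the safe/harmful/unethical labels are taken from the README; the real `agent_action_guard` API may differ).

```python
from typing import Callable


class DummyClassifier:
    """Stand-in for the trained Action Classifier; the real model is neural."""

    def classify(self, action: str) -> str:
        # Toy rule for illustration: flag destructive shell commands.
        return "harmful" if "rm -rf" in action else "safe"


def guarded_execute(action: str, classifier: DummyClassifier,
                    execute: Callable[[str], str]) -> str:
    """Run `action` only when the classifier labels it safe; otherwise block it."""
    label = classifier.classify(action)
    if label != "safe":
        return f"blocked ({label})"
    return execute(action)
```

In a live MCP setup, the `execute` callable would correspond to forwarding the tool call to the MCP server, with the guard sitting between the agent's proposal and the call.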
