Commit a707602

Added AGENTS.md

1 parent dcff712 commit a707602

2 files changed: 69 additions & 0 deletions

AGENTS.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Project: Agent Action Guard

Agent Action Guard classifies proposed AI agent actions as safe or harmful and blocks or flags the harmful ones. This repository provides the model, dataset, integration helpers, and example MCP-compatible tooling to enable runtime action screening in agent loops.

- Repository URL: https://github.com/Pro-GenAI/Agent-Action-Guard
## Why it matters

- Helps prevent autonomous agents from executing harmful, unethical, or risky operations.
- Provides a reproducible benchmark (AgentHarmBench) and dataset (HarmActions) for evaluating agent safety.
- Ships a lightweight model that is easy to integrate into MCP or custom agent frameworks.
## Quick Usage (for agents)

1. Install the package (recommended in a venv):

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   pip install agent-action-guard
   ```

2. Start or configure an embedding server if using vector features (see `USAGE.md`).
3. In your agent runtime, call the convenience API to check actions before execution:

   ```python
   from agent_action_guard import is_action_harmful

   is_harmful, confidence = is_action_harmful(action_dict)
   if is_harmful:
       # block, log, or escalate to a human reviewer
       raise RuntimeError(f"Harmful action blocked (confidence: {confidence:.2f})")
   ```
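The exact schema of `action_dict` is not documented in this file; per the architecture notes it describes the tool call, intent, and parameters. A minimal sketch of building one, where the field names (`tool`, `intent`, `parameters`) are illustrative assumptions rather than the package's documented schema:

```python
def make_action(tool: str, intent: str, **parameters) -> dict:
    """Build an action dict in the (assumed) shape the classifier expects.

    Field names are illustrative assumptions, not the package's schema.
    """
    return {"tool": tool, "intent": intent, "parameters": parameters}

# Example: a shell action an agent might propose before execution.
action_dict = make_action(
    "shell",
    "delete temporary build artifacts",
    command="rm -rf ./build",
)
```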
## Key Files & Structure

- `agent_action_guard/` — implementation package (classifier, runtime helpers, dataset loaders).
- `training/` — training scripts and dataset artifacts used to produce the classifier.
- `examples/` — sample integrations and MCP server examples.
- `tests/` — unit tests validating core behavior.
- `USAGE.md` — detailed usage examples and environment setup.
- `README.md` — project overview, demos, and citations.
## Architecture Overview

- Input: a proposed agent action (a structured dict describing the tool call, intent, and parameters).
- Preprocessing: optional embedding plus metadata normalization.
- Classifier: a lightweight neural network (PyTorch / ONNX) that outputs harmful/safe logits and a confidence score.
- Policy: a decision layer in the agent runtime that blocks, allows, or requests human approval.
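The policy step above can be sketched as a small thresholded decision function. This is an illustrative sketch, not the repository's implementation; the threshold values and the `allow`/`escalate`/`block` labels are assumptions:

```python
def decide(is_harmful: bool, confidence: float,
           block_threshold: float = 0.9,
           escalate_threshold: float = 0.5) -> str:
    """Map classifier output to a runtime decision.

    Thresholds and labels are illustrative assumptions, not values
    taken from this repository.
    """
    if not is_harmful:
        return "allow"
    if confidence >= block_threshold:
        return "block"      # confidently harmful: stop the action
    if confidence >= escalate_threshold:
        return "escalate"   # uncertain: request human approval
    return "allow"          # low-confidence flag: allow but log
```

A runtime would typically log every non-"allow" decision so that escalations and blocks can be audited later.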
## Development & CI

- Formatting and linting: `make format` and `make lint` (defined in the `Makefile`).
- Tests: run `pytest` (configured via `pytest.ini`) to execute the test cases in the [`tests/`](tests/) directory.
## Guidance for AI agents reading this repo

- Use `USAGE.md` and `examples/` for integration patterns rather than reproducing code.
- Prefer the runtime API `is_action_harmful()` for decision making.
- Respect model limitations: the classifier is trained on a limited dataset; combine it with rule-based checks for high-risk systems.
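As a sketch of that last point, a deny-list rule check can run before the model so that known-bad patterns are blocked even when the classifier misses them. Here `classify` is a stand-in for a callable such as `is_action_harmful`, and the patterns and dict shape are illustrative assumptions:

```python
import re
from typing import Callable, Tuple

# Illustrative deny-list; real deployments would maintain their own rules.
DENY_PATTERNS = [r"rm\s+-rf\s+/", r"DROP\s+TABLE", r"curl[^|]*\|\s*sh"]

def guarded_check(action: dict,
                  classify: Callable[[dict], Tuple[bool, float]]) -> Tuple[bool, float]:
    """Rule-based pre-check, then fall back to the learned classifier."""
    text = str(action.get("parameters", ""))
    for pattern in DENY_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True, 1.0  # rule hit: treat as harmful with full confidence
    return classify(action)   # e.g. is_action_harmful from agent_action_guard
```

Running the rules first keeps the hard constraints deterministic, while the model handles the long tail the rules cannot enumerate.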
## Where to look next (quick links)

- Full details and demo: [README.md](README.md)
- Integration and examples: [USAGE.md](USAGE.md) and `examples/`
- Implementation: `agent_action_guard/`
- Training scripts & dataset: `training/`

CLAUDE.md

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@

@AGENTS.md
<!-- Read AGENTS.md for project details -->
