# Project: Agent Action Guard

Agent Action Guard classifies proposed AI agent actions as safe or harmful and blocks or flags harmful actions. This repository provides the model, dataset, integration helpers, and example MCP-compatible tooling to enable runtime action screening in agent loops.

- Repository URL: https://github.com/Pro-GenAI/Agent-Action-Guard

## Why it matters

- Helps prevent autonomous agents from executing harmful, unethical, or risky operations.
- Provides a reproducible benchmark (AgentHarmBench) and dataset (HarmActions) for evaluating agent safety.
- Ships a lightweight model that is easy to integrate into MCP or custom agent frameworks.

## Quick Usage (for agents)

1. Install the package (recommended in a venv):

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install agent-action-guard
```

2. Start or configure an embedding server if you use the vector features (see `USAGE.md`).

3. In your agent runtime, call the convenience API to check each action before executing it:

```python
from agent_action_guard import is_action_harmful

is_harmful, confidence = is_action_harmful(action_dict)
if is_harmful:
    # block, log, or escalate to a human reviewer
    raise RuntimeError("Harmful action blocked")
```
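
The check above can be wrapped into a reusable guard. The sketch below assumes only the `(is_harmful, confidence)` return shape shown in the snippet; `guarded_execute`, its injected `classify`/`execute` callables, and the threshold are hypothetical illustrations, not part of the published API.

```python
# Minimal guard wrapper (sketch). `classify` and `execute` are injected
# callables so the pattern can be exercised without the real model;
# names and the threshold default are assumptions, not the library API.
def guarded_execute(action, classify, execute, threshold=0.5):
    """Run execute(action) only if the classifier clears the action."""
    is_harmful, confidence = classify(action)
    if is_harmful and confidence >= threshold:
        raise PermissionError(
            f"Blocked action {action.get('tool')!r} (confidence={confidence:.2f})"
        )
    return execute(action)
```

Injecting the classifier as a callable keeps the wrapper testable and lets you swap in `is_action_harmful` at integration time.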

## Key Files & Structure

- `agent_action_guard/` — implementation package (classifier, runtime helpers, dataset loaders).
- `training/` — training scripts and dataset artifacts used to produce the classifier.
- `examples/` — sample integrations and MCP server examples.
- `tests/` — unit tests validating core behavior.
- `USAGE.md` — detailed usage examples and environment setup.
- `README.md` — project overview, demos, and citations.

## Architecture Overview

- Input: a proposed agent action (a structured dict describing the tool call, intent, and parameters).
- Preprocessing: optional embedding plus metadata normalization.
- Classifier: a lightweight neural network (PyTorch / ONNX) that outputs harmful/safe logits and a confidence score.
- Policy: a decision layer in the agent runtime that blocks, allows, or requests human approval.

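The policy layer described above can be sketched as a small decision function. The `Decision` enum and the threshold values below are illustrative assumptions, not values taken from the repository.

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"  # request human approval before proceeding

def decide(is_harmful, confidence, block_at=0.8, escalate_at=0.5):
    """Map classifier output to a runtime decision (thresholds are examples)."""
    if not is_harmful:
        return Decision.ALLOW
    if confidence >= block_at:
        return Decision.BLOCK
    if confidence >= escalate_at:
        return Decision.ESCALATE
    return Decision.ALLOW  # low-confidence flag: allow, but worth logging
```

Tuning `block_at` and `escalate_at` trades autonomy against safety for a given deployment.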
## Development & CI

- Formatting and linting: `make format` and `make lint` (targets defined in the `Makefile`).
- Tests: run `pytest` (configured via `pytest.ini`) to execute the test cases in the [`tests/`](tests/) directory.

## Guidance for AI agents reading this repo

- Use `USAGE.md` and `examples/` for integration patterns rather than reproducing code.
- Prefer the runtime API `is_action_harmful()` for decision-making.
- Respect the model's limitations: the classifier is trained on a limited dataset, so combine it with rule-based checks in high-risk systems.

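Layering a hard rule-based check in front of the model, as the last point recommends, can be sketched as follows. The denylist patterns, field names, and helper functions are hypothetical examples, not code from this repository.

```python
import re

# Hypothetical denylist; patterns and the "parameters" field name are
# illustrative, not taken from the repository.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),          # destructive filesystem wipe
    re.compile(r"\bDROP\s+TABLE\b", re.I),  # destructive SQL
]

def rule_based_harmful(action):
    """True if any denylist pattern matches the action's parameter text."""
    text = str(action.get("parameters", ""))
    return any(p.search(text) for p in DENY_PATTERNS)

def check_action(action, model_check):
    """Harmful if either the hard rules or the model (injected callable) flag it."""
    if rule_based_harmful(action):
        return True
    is_harmful, _confidence = model_check(action)
    return is_harmful
```

Rules run first so that known-bad operations are blocked deterministically, regardless of model confidence.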
## Where to look next (quick links)

- Full details and demo: [README.md](README.md)
- Integration and examples: [USAGE.md](USAGE.md) and `examples/`
- Implementation: `agent_action_guard/`
- Training scripts & dataset: `training/`