@@ -43,8 +43,8 @@ $ ffmpeg -i unused/banner_video.mp4 -vframes 1 project_banner.jpg
 - Model being overconfident in its incorrect knowledge.
 - Lack of proper constraints or guidelines for the agent.
 - Inadequate training data for specific scenarios.
-- 🛠️ MCP server providing incorrect tool descriptions that mislead the agent.
-- Harmful MCP servers returning manipulative text to mislead the model.
+- 🛠️ Tools with incorrect descriptions that mislead the agent.
+- Harmful tools descriptions including manipulative text to mislead the model.
 - The experiments proved that the model performs a harmful action and still responds "Sorry, I can't help with that."
 
 ## New contributions of Agent-Action-Guard framework:
@@ -76,13 +76,13 @@ $ ffmpeg -i unused/banner_video.mp4 -vframes 1 project_banner.jpg
 ## ✨ Special features:
 - This project introduces "HarmActionsBench" dataset and benchmark to evaluate an AI agent's probability of generating harmful actions.
 - The dataset has been used to train a lightweight neural network model that classifies actions as safe, harmful, or unethical.
-- ⚡ The model is lightweight and can be easily integrated into existing AI agent frameworks like MCP.
-- Supports MCP (Model Context Protocol) to allow real-time action classification.
+- ⚡ The model is lightweight and can be easily integrated into existing AI agent frameworks.
+<!-- - Supports MCP (Model Context Protocol) to allow real-time action classification. -->
 <!-- - Unlike OpenAI's `"require_approval": "always"` flag, this blocks harmful actions without human intervention. -->
 <!-- - A2A-compatible version: https://github.com/Pro-GenAI/A2A-Agent-Action-Guard. -->
 
 🛡️ **Safety Features:**
-- Automatically classifies MCP tool calls before execution.
+- Automatically classifies tool calls before execution.
 - 🚫 Blocks harmful actions based on the outputs of the trained model.
 - Provides detailed classification results.
 - ✅ Allows safe actions to proceed normally.
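
The "classify before execution" flow listed under Safety Features can be sketched roughly as follows. This is a minimal illustration, not the project's actual API: `toy_classifier`, `Classification`, `BlockedActionError`, and `guarded_call` are hypothetical names, and the keyword check is only a stand-in for the trained neural network classifier described in the README.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Classification:
    label: str    # one of "safe", "harmful", "unethical"
    score: float  # placeholder confidence value


class BlockedActionError(Exception):
    """Raised when a tool call is classified as anything other than safe."""

    def __init__(self, tool_name: str, result: Classification):
        super().__init__(f"Blocked '{tool_name}': classified as {result.label}")
        self.result = result


def toy_classifier(tool_name: str, arguments: Dict[str, Any]) -> Classification:
    # ASSUMPTION: a keyword check standing in for the trained model.
    text = f"{tool_name} {arguments}".lower()
    if any(marker in text for marker in ("rm -rf", "delete_all", "steal")):
        return Classification("harmful", 0.9)
    return Classification("safe", 0.9)


def guarded_call(
    tool: Callable[..., Any],
    tool_name: str,
    arguments: Dict[str, Any],
    classify: Callable[[str, Dict[str, Any]], Classification] = toy_classifier,
) -> Any:
    # Classify the proposed action first; the tool runs only if it is safe.
    result = classify(tool_name, arguments)
    if result.label != "safe":
        raise BlockedActionError(tool_name, result)
    return tool(**arguments)
```

A safe call such as `guarded_call(lambda path: f"read {path}", "read_file", {"path": "notes.txt"})` passes through to the tool, while a call whose arguments trip the classifier raises `BlockedActionError` before the tool runs. The exception carries the classification result, so the caller can report why the action was blocked.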