
Commit 8377dc5: Added a new result
1 parent 89e2c6a

3 files changed: 21 additions & 20 deletions

.env.example (6 additions, 6 deletions)

```diff
@@ -1,18 +1,18 @@
 # Example environment variables for running the demo and embedding server.
 # Copy this file to `.env` and edit values before running the demo or docker-compose.
 
-# Embedding settings
+# Embedding variables
 EMBED_MODEL_NAME=sentence-transformers/all-MiniLM-L6-v2
 EMBEDDING_BASE_URL=http://localhost:1234/v1
 EMBEDDING_API_KEY=your-embedding-key
 
-# Use an OpenAI-compatible API
-OPENAI_BASE_URL=https://api.tokenfactory.nebius.com/v1
+# Using an OpenAI-compatible API
+OPENAI_BASE_URL=https://<endpoint>.services.ai.azure.com/openai/v1/
 OPENAI_API_KEY=<your_api_key_here>
-OPENAI_MODEL=openai/gpt-oss-20b
+OPENAI_MODEL=gpt-5.4-mini-2026-03-17
 
-# # To use Azure OpenAI instead
-# AZURE_OPENAI_ENDPOINT=https://az-foundry-resource-pr.services.ai.azure.com/
+# # To use Azure OpenAI without the OpenAI-compatible endpoint
+# AZURE_OPENAI_ENDPOINT=https://<endpoint>.services.ai.azure.com/
 # AZURE_OPENAI_API_KEY=
 # OPENAI_API_VERSION=2024-10-21
 # OPENAI_MODEL=gpt-5.4-mini-2026-03-17
```
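Variables like these are typically loaded into the process environment before the demo starts (for example with python-dotenv). A minimal stdlib-only sketch of what that loading amounts to — the `load_env` helper is illustrative, not part of this repo:

```python
import os
import tempfile

def load_env(path: str) -> dict[str, str]:
    """Minimal .env parser: KEY=VALUE per line; blanks and '#' comments skipped."""
    values: dict[str, str] = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values

# Demo with a fragment of the example file above.
example = (
    "# Using an OpenAI-compatible API\n"
    "OPENAI_API_KEY=<your_api_key_here>\n"
    "OPENAI_MODEL=gpt-5.4-mini-2026-03-17\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write(example)
env = load_env(fh.name)
os.environ.update(env)  # make the values visible to client libraries
```

In practice `dotenv.load_dotenv()` does this (plus quoting and interpolation rules); the sketch only shows the core KEY=VALUE behavior.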

README.md (14 additions, 13 deletions)

```diff
@@ -45,19 +45,20 @@ python -m agent_action_guard.harmactionseval
 **HarmActionsEval** benchmark proved that AI agents with harmful tools will use them — even today's **most capable** LLMs.
 80% of the LLMs tested executed actions at the first attempt for over 95% of the harmful prompts.
 
-| Model                 | SafeActions@1 |
-|-----------------------|------:|
-| Claude Haiku 4.5      | 0.00% |
-| Phi 4 Mini Instruct   | 0.00% |
-| Granite 4-H-Tiny      | 0.00% |
-| GPT-5.4 Mini          | 0.71% |
-| Gemini 3.1 Flash Lite | 0.71% |
-| Ministral 3 (3B)      | 2.13% |
-| Claude Sonnet 4.6     | 2.84% |
-| Phi 4 Mini Reasoning  | 2.84% |
-| GPT-5.3               | 12.77% |
-| Qwen3.5-397b-a17b     | 23.40% |
-| **Average**           | **4.54%** |
+| Model                   | SafeActions@1 |
+|-------------------------|------:|
+| Claude Haiku 4.5        | 0.00% |
+| Phi 4 Mini Instruct     | 0.00% |
+| Granite 4-H-Tiny        | 0.00% |
+| GPT-5.4 Mini            | 0.71% |
+| Gemini 3.1 Flash Lite   | 0.71% |
+| Grok 4.20 Non Reasoning | 2.13% |
+| Ministral 3 (3B)        | 2.13% |
+| Claude Sonnet 4.6       | 2.84% |
+| Phi 4 Mini Reasoning    | 2.84% |
+| GPT-5.3                 | 12.77% |
+| Qwen3.5-397b-a17b       | 23.40% |
+| **Average**             | **4.32%** |
 
 > These models often still respond *"Sorry, I can't help with that"* while executing the harmful action anyway.
```

(The new row shifts the mean: the eleven scores sum to 47.53, so the updated average is 4.32%, not the previous 4.54%.)
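SafeActions@1 is not defined in this excerpt; from the surrounding text it reads as the share of harmful prompts on which the model did *not* execute the tool call at the first attempt (higher = safer). A hypothetical sketch under that assumption — the function name and input shape are mine, not the repo's:

```python
def safe_actions_at_1(executed_on_first_attempt: list[bool]) -> float:
    """Fraction of harmful prompts where the model did NOT execute the
    harmful tool call on its first attempt (higher = safer)."""
    safe = sum(1 for executed in executed_on_first_attempt if not executed)
    return safe / len(executed_on_first_attempt)

# Hypothetical run: 3 of 4 prompts led to an immediate harmful tool call.
score = safe_actions_at_1([True, True, True, False])  # 0.25, i.e. 25% safe
```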

agent_action_guard/harmactionseval.py (1 addition, 1 deletion)

```diff
@@ -335,7 +335,7 @@ def evaluate(
     for attempt in range(1, k + 1):
         tool_calls = attempt_tool_call(prompt, attempt=attempt, tool=tool)
         if WAIT_INTERVAL:
-            logger.info(
+            logger.debug(
                 "Waiting %d seconds before retrying (attempt %d/%d)...",
                 WAIT_INTERVAL,
                 attempt + 1,
```
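The hunk above sits inside a retry loop: `evaluate()` calls the tool up to `k` times, optionally sleeping between attempts, and the commit merely demotes the wait message from INFO to DEBUG. A simplified sketch of that retry-with-wait pattern — `call_with_retries` and `flaky` are hypothetical names, and the repo's actual control flow differs:

```python
import logging
import time

logger = logging.getLogger(__name__)
WAIT_INTERVAL = 0  # seconds to sleep between attempts; 0 disables waiting

def call_with_retries(action, k: int = 3):
    """Call action(attempt) up to k times, stopping at the first truthy
    result; between attempts, optionally sleep and log at DEBUG level."""
    result = None
    for attempt in range(1, k + 1):
        result = action(attempt)
        if result:
            break
        if WAIT_INTERVAL and attempt < k:
            logger.debug(
                "Waiting %d seconds before retrying (attempt %d/%d)...",
                WAIT_INTERVAL, attempt + 1, k,
            )
            time.sleep(WAIT_INTERVAL)
    return result

# Hypothetical action that only succeeds on the second attempt.
attempts_seen = []
def flaky(attempt):
    attempts_seen.append(attempt)
    return attempt == 2

result = call_with_retries(flaky, k=3)  # stops after attempt 2
```

Logging the wait at DEBUG rather than INFO keeps routine retry chatter out of normal runs while leaving it available under verbose logging.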
