@@ -28,6 +28,8 @@ I focused on making the AI fully autonomous while keeping full transparency thro
 * 📦 **Dataset Exporter**: Direct export to Hugging Face fine-tuning format
 * 🎯 **4 Core Tasks**: easy → medium → hard → expert
 * 🔥 **5 Attack Types**: direct, encoded, roleplay, emotional, semantic
+* 🔍 **6-Engine Core**: Policy, Adversary, Memory, Grader, Environment, and **De-obfuscation Engine** [NEW]
+* 🛡️ **Safety Intent Decoding**: Real-time server-side translation of obfuscated queries (Hex/Base64)
 * 📈 **Shaped Rewards**: 6-metric reward function (clamped 0.01 – 0.99)
 * 🔌 **Standardized API**: Full OpenEnv spec (reset / step / state)
 * 📊 **Analytics Hub**: Beautiful interactive dashboard at `/ui`
@@ -75,7 +77,7 @@ research, and any team building production LLM safety systems.
 
 ---
 
-## 🏗️ Architecture — 5 Engines
+## 🏗️ Architecture — 6 Engines
 ```
 ┌─────────────────────────────────────────────────────┐
 │               SafetyForge Arena v3.0                │
@@ -86,13 +88,13 @@ research, and any team building production LLM safety systems.
 │ 8 rules      │ Adaptive     │ History tracking      │
 │ Conflict     │ Attack Gen   │ Risk trajectory       │
 │ detection    │ (LiteLLM)    │ Escalation patterns   │
-├──────────────┴──────────────┴───────────────────────┤
-│                    Grader Engine                    │
-│        6-metric weighted score (0.01 – 0.99)        │
-├─────────────────────────────────────────────────────┤
-│                 Environment Engine                  │
-│             reset() / step() / state()              │
-└─────────────────────────────────────────────────────┘
+├──────────────┼──────────────┼───────────────────────┤
+│ Grader       │ Decoding     │ Env                   │
+│ Engine       │ Engine       │ Engine                │
+│              │              │                       │
+│ 6-metric     │ Multi-format │ reset() / step()      │
+│ scoring      │ de-obfuscate │ state()               │
+└──────────────┴──────────────┴───────────────────────┘
 ```
 
 ---
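
The README change above describes a De-obfuscation Engine that translates Hex/Base64 queries on the server side, but the diff does not show how it works. As a rough illustration only, the sketch below shows one way such a decoding step could look in Python. The function name `deobfuscate`, its return shape, and the "decode only if the result is printable UTF-8" heuristic are all assumptions made for this example, not the project's actual implementation.

```python
import base64
import binascii
import re
from typing import Optional, Tuple

# Hypothetical patterns: an even-length hex string, and a padded Base64 string.
_HEX_RE = re.compile(r"(?:[0-9a-fA-F]{2})+")
_B64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def _printable_text(raw: bytes) -> Optional[str]:
    """Return raw as text only if it is valid, printable UTF-8."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return text if text.isprintable() else None


def deobfuscate(query: str) -> Tuple[str, str]:
    """Return (decoded_text, detected_format); hypothetical helper.

    detected_format is "hex", "base64", or "plain" (query returned unchanged).
    """
    candidate = "".join(query.split())  # ignore whitespace inside the payload

    if _HEX_RE.fullmatch(candidate):
        text = _printable_text(bytes.fromhex(candidate))
        if text is not None:
            return text, "hex"

    if len(candidate) % 4 == 0 and _B64_RE.fullmatch(candidate):
        try:
            raw = base64.b64decode(candidate, validate=True)
        except binascii.Error:
            raw = b""
        text = _printable_text(raw) if raw else None
        if text is not None:
            return text, "base64"

    return query, "plain"
```

Under these assumptions, the decoded text rather than the raw query would be what gets passed on for policy checks and grading. The printable-UTF-8 check is only a heuristic: a short plain-text word can still happen to be valid Hex or Base64, so a production system would likely need stricter detection.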