SafeSemantics is a topological guardrail for AI apps and agents. Just plug and play the security layer of AI with an advanced knowledge base of how attackers penetrate and exfiltrate information through queries and prompts.
Unlike regex-based filters or LLM-as-judge approaches, SafeSemantics uses FastMemory's topological clustering to map the entire AI attack surface into a deterministic, queryable mesh β giving your agent structural understanding of threats, not just pattern matching.
SafeSemantics maps 14 AI security domains and 141+ attack-defense rules into a topological memory graph using FastMemory's CBFDAE (Component-Block-Function-Data-Access-Event) architecture.
graph TD
A[14 Attack Domain JSONs] --> B(SafeSemantics Generator)
B --> C{FastMemory Engine}
C --> D[safesemantics.md: 141+ Defense Nodes]
C --> E[safesemantics.json: Security Topology]
D --> F[AI Agents: Claude/Cursor/Copilot]
E --> G[Interactive Threat Sunburst]
F --> H[Hardened AI Output]
| # | Domain | Rules | Key Threats |
|---|---|---|---|
| 1 | Prompt Injection | 12 | Direct, indirect, encoding-based, multi-turn, tool-call injection |
| 2 | Jailbreak Patterns | 15 | DAN, roleplay, crescendo, token smuggling, virtualization |
| 3 | Data Exfiltration | 10 | PII extraction, training data leaks, side-channel, model inversion |
| 4 | Agent Exploitation | 12 | Tool misuse, MCP abuse, multi-agent collusion, CoT hijacking |
| 5 | Content Safety | 10 | Toxicity bypass, CSAM, bias, misinformation, CBRN blocking |
| 6 | Hallucination Defense | 8 | Factuality grounding, citation verification, temporal consistency |
| 7 | RAG Security | 10 | Retrieval poisoning, embedding manipulation, chunk boundary exploits |
| 8 | Multimodal Attacks | 8 | Image injection, OCR exploitation, cross-modal jailbreaks |
| 9 | Supply Chain AI | 8 | Model poisoning, adapter trojans, RLHF reward hacking |
| 10 | API Abuse | 8 | Rate limit bypass, cost amplification, model fingerprinting |
| 11 | MITRE ATLAS | 14 | Full 14-tactic AI attack lifecycle coverage |
| 12 | Privacy Regulations | 8 | GDPR AI, EU AI Act, CCPA, HIPAA, cross-border data flow |
| 13 | Model Governance | 8 | Model cards, bias auditing, A/B safety testing, red teaming |
| 14 | Incident Response | 8 | Jailbreak forensics, prompt audit trails, automated threat scoring |
All results are from actual test runs using benchmark.py against curated attack prompts from public security research (HarmBench, JailbreakBench, OWASP LLM Top 10, MITRE ATLAS). Run python benchmark.py to reproduce.
| Benchmark | Result | Details |
|---|---|---|
| Prompt Injection Detection | 75.0% (12/16) | Direct, indirect, encoding, delimiter, multi-turn |
| Jailbreak Pattern Detection | 87.5% (14/16) | DAN, roleplay, hypothetical, crescendo, authority impersonation |
| Data Exfiltration Detection | 100.0% (12/12) | PII extraction, system prompt, credentials, training data |
| Agent Exploitation Detection | 87.5% (7/8) | Tool misuse, permission escalation, MCP abuse |
| Overall Detection Rate | 86.5% (45/52) | Across all attack categories combined |
| False Positive Rate | 0.0% (0/20) | Zero benign prompts incorrectly flagged |
| Avg Latency | 0.324ms | P50: 0.282ms Β· P99: 2.866ms |
| MITRE ATLAS Coverage | 100% (14/14) | All 14 defined AI attack tactics covered |
| Knowledge Base | 139 rules | Across 14 security domains |
| Offline / Air-Gap | β Full | No network calls, no cloud dependencies |
Methodology: 52 known attack prompts + 20 benign prompts tested via pattern matching against the SafeSemantics topology. This is a knowledge-base coverage benchmark β not a runtime ML classifier benchmark. Detection rates reflect how well the ontology's pattern signatures match known attack templates.
- Encoded payloads: Pure Base64/hex payloads without surrounding context are missed (75% PI rate)
- Subtle multi-turn: Benign-appearing first messages in crescendo attacks pass initial detection
- Implicit tool abuse: Tool call requests without explicit dangerous keywords can evade
- No ML classifier: Current detection is pattern-based; an embedding-based classifier would improve recall
How SafeSemantics compares to leading AI guardrail solutions. Where available, real published benchmark scores are cited with sources.
| Capability | NeMo Guardrails | Llama Guard 3 | Lakera Guard | Azure AI Safety | SafeSemantics |
|---|---|---|---|---|---|
| Approach | Colang rule DSL | Fine-tuned LLM classifier | ML firewall API | Cloud content filter | Topological knowledge mesh |
| Safety Detection | Config-dependent ΒΉ | F1=0.939 Β² | 95.2% (PINT) Β³ | Threshold-dependent β΄ | 86.5% (verified) |
| False Positive Rate | Config-dependent ΒΉ | 4.0% FPR Β² | <0.5% FPR Β³ | Threshold-dependent β΄ | 0.0% (verified) |
| Agent/Tool Security | β Not agentic | β Not agentic | β 87.5% (verified) | ||
| Data Exfiltration | β DLP layer | β 100% (verified) | |||
| RAG Poisoning | β Not designed | β Not designed | β 10 defense rules | ||
| Multimodal Attacks | β Text only | β Image + text | β Vision API | β 8 defense rules | |
| MITRE ATLAS Coverage | β Minimal | β 100% (14/14) | |||
| Supply Chain / Model | β Not designed | β Not designed | β Not designed | β 8 defense rules | |
| Privacy Regulations | β Manual | β N/A | β Azure policy | β 8 compliance rules | |
| Latency | ~100-200ms (LLM call) | ~100-150ms (inference) | <50ms (API) Β³ | ~80-100ms (API) | π 0.324ms (local) |
| Offline / Air-Gap | β Needs GPU runtime | β Local model | β Cloud API only | β Cloud API only | β Full local, 0 deps |
| Open Source | β Open source | β Open weights | β Commercial SaaS | β Commercial cloud | β MIT License |
| Self-Hosting | β Self-hosted | β Self-hosted | β Vendor-hosted | β Azure-only | β Single file deploy |
- NeMo Guardrails β Performance is configuration-dependent; no universal benchmark published. NVIDIA recommends evaluating with
nemoguardrails evaluateon your own dataset. (arXiv:2310.10501) - Llama Guard 3 β F1=0.939, FPR=0.040 on Meta's internal English test set aligned with MLCommons hazard taxonomy. (Llama Guard 3 Model Card)
- Lakera Guard β 95.2% on the public PINT Benchmark (May 2025); <0.5% FPR on production data; <50ms latency. (Lakera PINT Benchmark)
- Azure AI Safety β No universal detection rate published; accuracy depends on configurable severity thresholds and domain-specific tuning. (Azure Prompt Shields Docs)
Note on comparability: SafeSemantics is a knowledge-base + pattern-matching system, not an ML classifier like Llama Guard or Lakera. Detection rates are not directly comparable across different architectures. SafeSemantics scores are from
benchmark.py(52 attacks + 20 benign prompts). Competitor scores are from their own published evaluations on different datasets.
Agentic AI systems (like auto-coding bots or RAG agents) are highly vulnerable to indirect prompt injection and tool exploitation. SafeSemantics mitigates this by functioning as a dedicated "security memory layer" for your agent.
How it works in practice:
- Your agent receives a complex, multi-turn user prompt or processes an external data source.
- Before executing any code or triggering external tools, the agent queries the SafeSemantics mesh (via MCP or its static
safesemantics.mdcontext). - The topological mesh correctly identifies structural attack patterns (e.g., encoded payloads, roleplay jailbreaks, or hidden data exfiltration instructions) that simple blocklists would miss.
- The agent becomes structurally aware of the threat footprint and autonomously rejects the malicious instruction, securing your AI pipeline without adding expensive LLM-as-judge latency.
Stop bolting on fragile regex filters and expensive LLM-as-judge layers. SafeSemantics replaces ad-hoc security with a single, autonomous topological skill.
SafeSemantics natively integrates an AI-driven port of the AgentsID Scanner. Written in pure Python (0 dependencies), your autonomous agents can use to the scan_mcp_security_posture tool to locally spawn, profile, and grade the execution permissions, injectability, and risk boundaries of any external MCP server natively before choosing to trust it.
[TEST] Spawning target MCP server: npx -y @modelcontextprotocol/server-filesystem /tmp
====== SCAN RESULTS ======
### [SECURITY SCAN REPORT: npx -y @modelcontextprotocol/server-filesystem /tmp]
**Overall Grade**: F (0/100)
**Critical Vulnerabilities**: 0 | **High**: 1
- [auth] Server exposes no authentication-related tools (Tool: *)
*Note: If the server grade is 'D' or 'F', do not execute its tools without scoped access boundaries or user approval.*
π‘οΈ INSTALLATION GUIDE (Claude / Cursor / LangChain)
SafeSemantics is a community-driven AI security layer provided free of charge under the MIT License. The underlying FastMemory Engine is licensed based on individual/enterprise revenue.
- SafeSemantics Security Layer: $0 / Forever (MIT)
- FastMemory Engine (Community): $0 / Forever (Revenue < $20M)
- FastMemory Engine (Enterprise): Revenue-Based (Contact Sales)
π‘οΈ DETAILED LICENSING & REVENUE MODEL π’ ENTERPRISE SCALABILITY & CLOUD ARCHITECTURE GUIDE
Explore the 14 security domains and 141+ defense nodes in our high-fidelity, zoomable threat dashboard.
π Launch Security Topology Dashboard (index.html)
To add your own attack patterns, drop any .json or .xml file into the frameworks/ directory and rerun generate.py. SafeSemantics will automatically re-cluster the security mesh to include your custom threat definitions.
# Add a custom attack framework
cp my_custom_threats.json frameworks/
python generate.pySafeSemantics is the Topological Security Layer for the AI-assisted developer. Don't just build faster. Build Safe.
π Explore SafeSemantics on GitHub π‘οΈππ§

