Category: 12 - AI Safety, Alignment & Interpretability (Security & Safety for AI)
Date: 2026-05-04
Researcher: Max (OSAI Research Loop)
- Repository: https://github.com/protectai/rebuff
- Stars: 1,471
- License: Apache 2.0
- Category: Adversarial & Red-teaming Tools
- Description: LLM prompt injection detector with canary word detection. Detects and prevents prompt leakage attacks by embedding invisible canary tokens in prompts and monitoring for their exposure in model outputs.
- Last Updated: 2024-08-07 (outside the 3-month criteria window relative to the 2026-05-04 report date)
- Repository: https://github.com/samugit83/redamon
- Stars: 1,836
- License: MIT
- Category: Adversarial & Red-teaming Tools
- Description: AI-powered agentic red team framework that automates offensive security operations from reconnaissance to exploitation to post-exploitation with zero human intervention. Integrates multiple security tools for comprehensive penetration testing.
- Last Updated: 2026-05-04 (actively maintained)
- Repository: https://github.com/aliasrobotics/cai
- Stars: 8,384
- License: MIT
- Category: Adversarial & Red-teaming Tools
- Description: Cybersecurity AI framework for semi- and fully-automating offensive and defensive security tasks. Purpose-built for cybersecurity use cases with agent-based architecture for vulnerability assessment and security operations.
- Last Updated: 2026-04-20 (actively maintained)
Projects checked against the elite criteria:
- 1000+ GitHub stars (Rebuff: 1,471; RedAmon: 1,836; CAI: 8,384)
- Active development (commits within last 3 months): RedAmon and CAI meet this; Rebuff's listed last update (2024-08-07) does not
- OSI-approved open source license (Apache 2.0 or MIT)
- Production-ready with good documentation
- Not already in the repository
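The measurable criteria above (stars, recency, license) can be checked mechanically. The following is a minimal sketch, not the tooling used for this report: the `Project` class, the `check_criteria` function, and the 90-day approximation of "3 months" are all assumptions introduced for illustration. The figures are the ones listed in this report.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Only the two licenses named in this report; a real check would use the full OSI list.
OSI_LICENSES = {"Apache 2.0", "MIT"}


@dataclass
class Project:  # hypothetical container for the fields listed in this report
    name: str
    stars: int
    license: str
    last_updated: date


def check_criteria(project, as_of, min_stars=1000, window_days=90):
    """Return a pass/fail flag per criterion (90 days is an assumed stand-in for 3 months)."""
    return {
        "stars": project.stars >= min_stars,
        "active": (as_of - project.last_updated) <= timedelta(days=window_days),
        "license": project.license in OSI_LICENSES,
    }


REPORT_DATE = date(2026, 5, 4)
PROJECTS = [
    Project("Rebuff", 1471, "Apache 2.0", date(2024, 8, 7)),
    Project("RedAmon", 1836, "MIT", date(2026, 5, 4)),
    Project("CAI", 8384, "MIT", date(2026, 4, 20)),
]

for project in PROJECTS:
    print(project.name, check_criteria(project, REPORT_DATE))
```

With the dates listed in this report, RedAmon and CAI pass all three checks, while Rebuff passes on stars and license but not on the recency check.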
Added to Section 10 (AI Safety, Alignment & Interpretability) under Adversarial & Red-teaming Tools:
- **[Rebuff](https://github.com/protectai/rebuff)** - LLM prompt injection detector with canary word detection. Detects and prevents prompt leakage attacks by embedding invisible canary tokens in prompts and monitoring for their exposure in model outputs. Apache 2.0 licensed.
- **[RedAmon](https://github.com/samugit83/redamon)** - AI-powered agentic red team framework that automates offensive security operations from reconnaissance to exploitation to post-exploitation with zero human intervention. Integrates multiple security tools for comprehensive penetration testing. MIT licensed.
- **[CAI](https://github.com/aliasrobotics/cai)** - Cybersecurity AI framework for semi- and fully-automating offensive and defensive security tasks. Purpose-built for cybersecurity use cases with agent-based architecture for vulnerability assessment and security operations. MIT licensed.
Category 12 (Security & Safety for AI) already contained many excellent projects including:
- PyRIT (Microsoft), Garak (NVIDIA), Promptfoo, LLM Guard, DeepTeam, Agentic Security
- LlamaFirewall (Meta/PurpleLlama), Detoxify, and others
The 3 new additions fill gaps in:
- Prompt injection detection - Rebuff adds canary word detection capability
- Autonomous red teaming - RedAmon provides fully automated agentic penetration testing
- Cybersecurity automation - CAI offers comprehensive offensive/defensive security automation
RedAmon and CAI show recent maintainer activity (last updates 2026-05-04 and 2026-04-20); Rebuff's last recorded update is 2024-08-07. All three have more than 1,000 GitHub stars.
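The canary word technique mentioned for Rebuff can be illustrated in a few lines. This is a generic sketch of the idea only and does not use or reproduce Rebuff's API: the function names `embed_canary` and `canary_leaked`, the comment-style marker format, and the example prompt are all hypothetical.

```python
import secrets


def embed_canary(prompt_template: str, num_bytes: int = 4):
    """Prepend a random canary word to a prompt template (hypothetical helper)."""
    canary = secrets.token_hex(num_bytes)  # e.g. 8 hex characters for 4 bytes
    guarded = f"<!-- {canary} -->\n{prompt_template}"
    return guarded, canary


def canary_leaked(model_output: str, canary: str) -> bool:
    """Report whether the canary word appears in the model output."""
    return canary in model_output


guarded_prompt, canary = embed_canary("Summarize the following text:\n{user_input}")

# Simulated outputs: one that echoes the guarded prompt, one that does not.
leaky_output = f"My instructions were: {guarded_prompt}"
clean_output = "Here is a summary of the text."

print(canary_leaked(leaky_output, canary))
print(canary_leaked(clean_output, canary))
```

If the canary shows up in an output, the prompt has likely been leaked, and the application can block or log the response. A plain substring check like this one would miss outputs that encode or paraphrase the prompt, so it is best treated as one signal among several.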