Human-in-the-loop adversarial workflows for high-stakes research audit: from ChatGPT-Gemini duels to 4-model MAD.
Updated May 30, 2026
Three Claude Code skills for working with Codex CLI: codex-bridge (one-shot Codex calls), mad-build (Claude+Codex collaboration with cross-review), and mad-research (three-stream adversarial audit of papers, grants, and reports, with anonymized cross-critique and fresh-Codex synthesis).
RevealVLLMSafetyEval is a comprehensive pipeline for evaluating Vision-Language Models (VLMs) on their compliance with harm-related policies. It automates the creation of adversarial multi-turn datasets and the evaluation of model responses, supporting responsible AI development and red-teaming efforts.
Claude Code plugin implementing Anthropic's 3-agent harness (Planner, Generator, Evaluator) for long-running app development, with pluggable rubrics and adversarial evaluation.
Scientific QA robustness evaluation pipeline for evidence-missing RAG scenarios on PeerQA, with EM/F1 reliability analysis.
GuardMCP - Deterministic Runtime Semantic Enforcement for Agentic Tool Execution using Directional Intent–Action Alignment
Multi-agent deep research engine with SIA (Semantic Intelligence Architecture) — thermodynamic entropy control, adversarial critique, multi-reactor swarm orchestration
CIDeR: a reproducible benchmark framework for causal exposure control in multi-agent LLM deliberation, comparing exposure-aware aggregation against voting, self-consistency, debate, causal-credit, social-choice, diversity, and adversarial baselines.
An adversarial AI expert workshop that stress-tests a research paper (rival-tradition referees argue; every comment quote-grounded and independently re-verified) and then rebuilds it: tracked-changes redline, clean version, your code re-run under a provenance wall, and a replication package. A Claude Code skill.