---
title: NEXON-AI
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---
NEXUS is a next-generation, autonomous dual-agent environment designed to investigate and validate software incidents in real time. Using a combination of an Investigator and a Validator agent, NEXUS autonomously forms hypotheses, executes system tools, evaluates system behavior, and reaches strict consensus on root causes.
Traditional manual debugging demands extensive context-switching and induces tool fatigue. NEXUS addresses this through:
- Dual-Agent Autonomy: Two specialized models communicating word-by-word via WebSockets.
- Dynamic Tool Execution: Fully integrated system terminals allowing agents to run sandboxed validation scripts.
- Semantic Reward Engine: Evaluates conversational drift mathematically (using native GPU embeddings).
The result: an AI "Incident Response Team" that navigates servers, traces logs, and fixes bugs much like a human SRE would.
The dashboard is the core command center. It features live agent terminals, a dual-communication consensus log, and a performance reward graph plotting investigation confidence.
The system is architected for instant adaptability: seamlessly switch LLM providers and inject custom threat models entirely from the frontend UI.
```
┌─────────────────────────────────────────────────────────────────┐
│                          CLIENT BROWSER                         │
│              React SPA (Tailwind + Framer Motion)               │
│                          localhost:5173                         │
└───────────┬─────────────────────────────────┬───────────────────┘
            │ HTTP (REST)                     │ ws://
            ▼                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                FASTAPI BACKEND (localhost:7860)                 │
│ ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐  │
│ │ /config  │  │/scenarios│  │  /reset  │  │ ws:// Simulator  │  │
│ │ Env Sync │  │ DB Cache │  │ Injection│  │ Live Stream Sync │  │
│ └──────────┘  └──────────┘  └──────────┘  └──────────────────┘  │
└───────────┬───────────────────────────────────┬─────────────────┘
            │                                   │
            ▼                                   ▼
┌─────────────────────────────────────────────────────────────────┐
│                  OLLAMA ENGINE / LLM PIPELINE                   │
│      Agent A (Investigator) ◄──────► Agent B (Validator)        │
│      - Generates Hypotheses          - Challenges Assertions    │
│      - Runs System Tools             - Requires Proof           │
└─────────────────────────────────────────────────────────────────┘
```
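For illustration, a minimal Python subscriber for the live agent stream might look like the sketch below. The `/ws` endpoint path and the `{"type": ..., "data": ...}` envelope are assumptions inferred from this README, not a documented schema:

```python
# Minimal sketch: subscribe to the simulator's WebSocket stream.
# ASSUMPTIONS: the '/ws' path and event envelope are illustrative;
# check the backend source for the real message schema.
import asyncio
import json

import websockets  # pip install websockets

async def watch_stream() -> None:
    async with websockets.connect("ws://localhost:7860/ws") as ws:
        async for raw in ws:
            event = json.loads(raw)
            # 'episode_start' carries the scenario; other events stream tokens
            print(event.get("type"), event.get("data"))

asyncio.run(watch_stream())
```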
NEXUS-AI supports two distinct execution models for agent tools, toggleable via the Settings dashboard:
- **Default Mode (Simulation)**
  - Agents interact with a pre-defined `clue_map` within the scenario YAML.
  - No System Impact: commands like `read_logs` or `check_service` return mocked data.
  - Use Case: training, logic validation, and "what-if" analysis without infrastructure risk.
- **Live Mode (SSH)**
  - Live Connection: commands are executed in real time on a remote Linux server via SSH.
  - Autonomous Terminal: agents use the `run_terminal_command` tool to browse logs, check systemd status, and inspect real configs.
  - Security: a command blocklist prevents highly destructive operations (e.g., `rm -rf /`); see the sketch after this list.
  - Use Case: actual incident response on isolated Lab/Staging nodes.
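A minimal sketch of how such a blocklist guard could work. The pattern list and function name are illustrative, not the shipped code:

```python
# Illustrative command blocklist for Live mode; patterns are assumptions.
import re

BLOCKED_PATTERNS = [
    r"\brm\s+-rf\s+/",             # recursive delete anchored at /
    r"\bmkfs(\.\w+)?\b",           # filesystem formatting
    r"\b(shutdown|reboot|halt)\b", # power-state changes
]

def is_command_allowed(command: str) -> bool:
    """Reject any command matching a destructive pattern."""
    return not any(re.search(p, command) for p in BLOCKED_PATTERNS)

print(is_command_allowed("systemctl status nginx"))  # True
print(is_command_allowed("rm -rf /"))                # False
```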
NEXUS-AI strictly adheres to the OpenEnv 1.0 standard for agent-environment interaction.
The environment accepts a typed `NexusAction` (text-based, with structured tool calls):

- `agent_id`: `string` ("agent_a" or "agent_b")
- `message`: `string` (the natural-language reasoning/communication)
- `tool_calls`: `List[ToolCall]` (optional structured calls like `TOOL: read_logs(file='app.log')`)
- `confidence`: `float` (0.0 - 1.0)
The environment returns a structured `NexusObservation` summarizing the system state:

- `scenario_description`: `string` (high-level objective)
- `scenario_context`: `string` (background telemetry/environment info)
- `partner_message`: `string` (the last message from the other agent)
- `tool_results`: `List[ToolResult]` (output of any executed system tools)
- `clues_found`: `List[string]` (accumulated evidence identified by the Reward Engine)
- `investigation_stage`: `string` (`investigating`, `narrowing`, `found`, `verified`)
- `round`: `integer` (current episode round)
- `available_tools`: `List[string]` (permitted tools for the current mode)
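Transcribed as Python dataclasses, the interface might look roughly like this. Field names and types come from the lists above; the class layout and the `ToolCall`/`ToolResult` shapes are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToolCall:
    name: str    # e.g. "read_logs"
    args: dict   # e.g. {"file": "app.log"}

@dataclass
class ToolResult:
    tool: str
    output: str

@dataclass
class NexusAction:
    agent_id: str                       # "agent_a" or "agent_b"
    message: str                        # natural-language reasoning
    tool_calls: List[ToolCall] = field(default_factory=list)
    confidence: float = 0.0             # 0.0 - 1.0

@dataclass
class NexusObservation:
    scenario_description: str           # high-level objective
    scenario_context: str               # background telemetry
    partner_message: str                # last message from the other agent
    tool_results: List[ToolResult]
    clues_found: List[str]
    investigation_stage: str            # investigating | narrowing | found | verified
    round: int
    available_tools: List[str]
```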
| Task Name | Difficulty | Objective | Grader Method |
|---|---|---|---|
| `software-incident` | Easy | Fix Nginx 503 rate-limit misconfiguration | State Check: `nginx-proxy.rate_limit` |
| `business-process-failure` | Medium | Resolve inventory stockout logic error | State Check: `stock_threshold` + Red Herring Penalty |
| `cascade-system-failure` | Hard | Fix Postgres connection exhaustion | Multi-Step: Query Termination + Config Update |
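As an illustration of the "State Check" grader style, a minimal sketch; the key/expected pair is hypothetical, and the real graders live in the task definitions:

```python
# Hypothetical state-check grader; key and expected value are illustrative.
def grade_state_check(final_state: dict, key: str, expected) -> float:
    """Full credit only when the watched config key reached its target value."""
    return 1.0 if final_state.get(key) == expected else 0.0

# e.g. for the software-incident task:
state = {"nginx-proxy.rate_limit": "10r/s"}
print(grade_state_check(state, "nginx-proxy.rate_limit", expected="10r/s"))  # 1.0
```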
Validated using `inference.py` (Phi-3-mini & Qwen2.5-1.5B):
- Software Incident: 0.88 / 1.00
- Business Process Failure: 0.72 / 1.00
- Cascade System Failure: 0.48 / 1.00
```python
# The EpisodeManager receives the custom scenario JSON from the frontend
# and broadcasts 'episode_start' over the WebSocket to synchronize the UI.
await broadcast("episode_start", {
    "scenario": active_scenario,
    "agent_a_model": settings.AGENT_A_MODEL
})
```

```python
# Agents interact sequentially: the Investigator attempts a solution
# while the Validator challenges it. Both agents can execute system tools.
client, model_name = model_manager.get_client(agent_id)
stream = await client.chat.completions.create(
    model=model_name,
    messages=injected_history,
    tools=available_tools,  # e.g. fix_proposer, run_terminal_command
    stream=True
)
```

```python
# Semantic embedding computation is offloaded to the Ollama GPU pipeline,
# bypassing heavy CPU blocking entirely.
from functools import lru_cache
from typing import List

import httpx

@lru_cache(maxsize=256)
def get_embedding(text: str) -> List[float]:
    response = httpx.post("http://localhost:11434/api/embeddings", json={
        "model": "all-minilm",
        "prompt": text
    }, timeout=60.0)
    return response.json().get("embedding", [])
```

| Layer | Technology | Why |
|---|---|---|
| Frontend Framework | React 18 (Vite) | Lightning fast HMR, component isolation |
| Frontend Styling | Tailwind CSS | Utility-first tactical glassmorphism |
| Backend Framework | FastAPI | Async Python, explicit endpoint mapping |
| Transport Layer | WebSockets | Word-by-word streaming across UI boundaries |
| Local AI Engine | Ollama | Native device acceleration, absolute privacy |
| Remote Provider | HuggingFace Inference API | Drop-in SaaS alternatives |
| SSH Connectivity | Paramiko | Secure remote shell execution for Lab Nodes |
| Data Persistence | LocalStorage & .env Injection | Avoids over-architected SQL constraints |
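For intuition, the Semantic Reward Engine's drift measure can be sketched as cosine similarity over the `get_embedding` vectors shown above. This is an assumed metric; the shipped scoring may differ:

```python
# Drift sketch: 1 - cosine similarity between objective and latest message.
import math
from typing import List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

objective = get_embedding("Fix Nginx 503 rate-limit misconfiguration")
reply = get_embedding("Checking the nginx error log for upstream timeouts")
drift = 1.0 - cosine_similarity(objective, reply)  # higher = more off-topic
```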
- Python 3.10+
- Node.js 18+
- Ollama (installed locally for model hosting)
- Optional: A remote Linux VM (Ubuntu/Kali) with SSH enabled for Lab Node mode
```bash
cd backend

# Create and activate a virtual environment
python -m venv venv
# source venv/bin/activate   # Linux/macOS
venv\Scripts\activate        # Windows

# Install all dependencies
pip install -r requirements.txt
```

```bash
# This exposes the core REST API and the WebSocket simulation tunnel
python main.py
```

Open a new terminal tab:
```bash
cd frontend

# Install Node.js dependencies
npm install

# Start the Vite development server
npm run dev
```

The application is now fully accessible at http://localhost:5173.
To run the simulation locally without cloud API keys, pull suitable reasoning models through Ollama first:

```bash
ollama run qwen2.5:3b      # Excellent validator logic footprint
ollama run dolphin-llama3  # Uncensored investigative assertions
ollama pull all-minilm     # Mandatory for semantic similarity scoring
```

NEXUS-AI includes a comprehensive test suite to ensure environment stability and specification compliance.
```bash
# Run the OpenEnv specification validator
python openenv_validator.py

# Run unit tests for core logic
pip install pytest
pytest tests/
```

Developed by: Ashish Menon & Vector


