Your AI assistant lives in a safe box — Sandboxed · Memory-Equipped · Skill-Pluggable · Beginner-Friendly
╭━━━━━━━━━━━━╮
┃ OpenCapyBox┃
┃ ∩ ∩ ┃
┃ (◕ ᴥ ◕) ┃
┃ ~~~~~ ┃
╰━━━━━━━━━━━━╯
Features · Screenshots · Quick Start · Architecture · Skills · Memory · Deployment · Contributing
OpenCapyBox is an open-source full-stack AI agent platform. Like a capybara chilling in the water, your AI assistant lives safely inside a sandboxed container — executing code, processing documents, searching the web, managing files, while continuously building memory and learning new skills.
| | Capybara Trait | OpenCapyBox Capability |
|---|---|---|
| 🛁 | Soaks in safe waters | One OpenSandbox container per user, fully isolated |
| 🧠 | Great memory, knows all friends | Layered memory system (USER.md / MEMORY.md / SOUL.md), learns as you use it |
| 🤝 | Friends with everyone | Multi-model compatible (Qwen / GLM / Kimi / DeepSeek / MiniMax), hot-swap anytime |
| 🎒 | Can carry anything | 40+ pluggable skills, enable official skills in one click or upload custom ones |
| ⏰ | Scheduled routines | Cron task system, AI autonomously runs periodic jobs |
| 🌐 | Chill but reliable | Full sandbox-isolated execution, beginner-friendly, all operations visible in the UI |
Models are registered declaratively in models.yaml, which supports both the Anthropic and OpenAI protocols, so adding a model requires no code changes:
| Model | Protocol | Platform | Features |
|---|---|---|---|
| Qwen3.5-plus | OpenAI | Alibaba DashScope | Chain-of-thought, multimodal |
| GLM-4.7 / GLM-5 | OpenAI | Alibaba DashScope | Chain-of-thought |
| Kimi-2.5 | OpenAI | Alibaba DashScope | Chain-of-thought, multimodal |
| DeepSeek-V3.2 | OpenAI | Alibaba DashScope | Chain-of-thought, long context |
| MiniMax-M2 | Anthropic | MiniMax | Native thinking |
💡 Want to add a new model? Just add an entry in models.yaml:
my-new-model:
  display_name: "My New Model"
  provider: openai                      # openai or anthropic
  api_base: "https://api.example.com/v1"
  api_key: "${MY_API_KEY}"              # References env variable from .env
  model_name: "my-model-name"
  max_tokens: 32768
  reasoning_format: reasoning_content   # none / reasoning_content / anthropic_thinking
  reasoning_split: true
  enable_thinking: true
  supports_image: false
  enabled: true
  tags: [thinking]
See the header comments in models.yaml for full configuration reference.
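The `${MY_API_KEY}` placeholder in the entry above is resolved from the environment. The README does not show how the loader does this, so the following is only a minimal sketch of one way such references could be expanded. `expand_env_refs` is a hypothetical helper, not a function from the project, and the entry is written as the dict a YAML parser would produce.

```python
import os
import re

# Matches ${VAR_NAME} placeholders such as "${MY_API_KEY}".
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_refs(value, env=None):
    """Recursively replace ${VAR} placeholders in strings, lists, and dicts.

    Hypothetical helper for illustration; the real models.yaml loader may differ.
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Unknown variables are left untouched so a missing key is easy to spot.
        return _ENV_REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, list):
        return [expand_env_refs(item, env) for item in value]
    if isinstance(value, dict):
        return {key: expand_env_refs(item, env) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


# A subset of the example entry, as a parsed dict.
entry = {
    "display_name": "My New Model",
    "provider": "openai",
    "api_base": "https://api.example.com/v1",
    "api_key": "${MY_API_KEY}",
    "max_tokens": 32768,
    "tags": ["thinking"],
}

resolved = expand_env_refs(entry, env={"MY_API_KEY": "sk-demo"})
print(resolved["api_key"])  # sk-demo
```

Leaving unresolved placeholders in place, rather than substituting an empty string, is a design choice of this sketch: a request that fails with a literal `${MY_API_KEY}` key is easier to diagnose than one that fails with a blank key.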
- Each user gets an isolated OpenSandbox container
- All code execution, file operations, and shell commands run inside the container
- Persistent storage mount — no data loss
- File upload/download/search all proxied through the sandbox — users don't need to worry about the internals
| File | Purpose | Plain English |
|---|---|---|
| USER.md | User profile: your preferences and habits | "It remembers what you like" |
| MEMORY.md | Long-term memory: accumulated knowledge | "It remembers what you've talked about" |
| SOUL.md | Personality definition: tone and style | "Its personality is shaped by you" |
Retrieval is hybrid: BM25 keyword search and vector semantic search are fused with RRF, so the more you use it, the better it understands you.
- Claude warm-tone design — soft colors, content-first
- Streaming output — thinking process and tool calls visible in real-time
- Skill Manager — category tags, toggle-style enable/disable
- Agent Config Panel — directly edit SOUL.md / USER.md / MEMORY.md
- Cron Dashboard — task list + execution history at a glance
- File Panel — preview and download sandbox files
Define Cron jobs via the manage_cron tool, and your AI assistant runs them autonomously:
- Visual task dashboard in the frontend
- Manual trigger / pause support
- Full execution history
| Category | Tools | Description |
|---|---|---|
| 📁 File Ops | Read / Write / Edit | Read, write, and string-replace edit files in sandbox |
| 💻 Shell | Bash / BashOutput / BashKill | Execute commands in container, background process support |
| 🔍 Web Search | GLMSearch / BatchSearch | Bocha search engine, parallel batch search |
| 🧠 Memory | RecordDailyLog / SearchMemory | Layered persistent memory + hybrid retrieval |
| 📝 Session Notes | SessionNote / RecallNote | Cross-turn context preservation |
| ⏰ Cron | ManageCron | DB-backed cron worker |
| 🎒 Skills | GetSkill | 40+ dynamically loadable professional skills |
| 🔌 MCP | MCP Tools | Model Context Protocol tool integration |
OpenCapyBox supports external tool services via MCP (Model Context Protocol). Configuration file at src/agent/config/mcp.json:
{
  "mcpServers": {
    "my-mcp-server": {
      "description": "My MCP Server",
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@example/mcp-server"],
      "env": { "API_KEY": "your-key" },
      "disabled": false
    }
  }
}
- The type field supports stdio (local process) and streamable-http (remote HTTP)
- Set "disabled": true to temporarily disable an MCP service
- Use the MCP_CONFIG_PATH env variable to customize the config file path
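To make the three rules above concrete, here is a small sketch of how a loader might pick the config path and skip disabled servers. It is an illustration under assumptions, not the project's loader: `resolve_config_path` and `enabled_servers` are hypothetical names, and rejecting an unknown or missing `type` is a choice made for this example.

```python
import json
import os

DEFAULT_MCP_CONFIG = "src/agent/config/mcp.json"
SUPPORTED_TYPES = {"stdio", "streamable-http"}


def resolve_config_path(env=None):
    """Prefer MCP_CONFIG_PATH when set, else fall back to the default location."""
    env = os.environ if env is None else env
    return env.get("MCP_CONFIG_PATH", DEFAULT_MCP_CONFIG)


def enabled_servers(config):
    """Return {name: server} for entries that are not disabled.

    Raising on an unsupported type is an assumption of this sketch.
    """
    servers = {}
    for name, server in config.get("mcpServers", {}).items():
        if server.get("disabled", False):
            continue  # "disabled": true switches the service off
        if server.get("type") not in SUPPORTED_TYPES:
            raise ValueError(f"{name}: unsupported type {server.get('type')!r}")
        servers[name] = server
    return servers


sample = json.loads("""
{
  "mcpServers": {
    "my-mcp-server": {"type": "stdio", "command": "npx",
                      "args": ["-y", "@example/mcp-server"], "disabled": false},
    "paused-server": {"type": "streamable-http", "disabled": true}
  }
}
""")

print(sorted(enabled_servers(sample)))  # ['my-mcp-server']
```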
Streaming conversation + AI thinking process unfolding in real-time, tool calls fully visible.
Category tag filtering, toggle-style enable/disable, easily manage 40+ official skills.
Directly edit SOUL.md / USER.md / MEMORY.md to shape your AI assistant's personality and memory.
Task list + execution history, manual trigger and status tracking support.
Browse sandbox files, preview and download Agent-generated artifacts.
Markdown rendered preview with source view and one-click download.
- Python 3.10+
- Node.js 16+
- uv (Python package manager)
- OpenSandbox (optional, sandbox execution environment)
git clone https://github.com/RonaldJEN/OpenCapyBox.git
cd OpenCapyBox
# Install Python dependencies
uv sync
# Install frontend dependencies
cd frontend && npm install && cd ..
cp .env.example .env
Edit .env with at minimum:
# === Required ===
LLM_API_KEY=your-dashscope-key # Alibaba DashScope unified key
SIMPLE_AUTH_USERS=demo:demo123 # Login users (format: user:pass,user2:pass2)
# === Database ===
# Replace the placeholder from .env.example before starting the backend.
# The PostgreSQL database must exist and have pgvector installed:
# CREATE EXTENSION IF NOT EXISTS vector;
DATABASE_URL=postgresql://user:password@host:5432/opencapybox
# === OpenSandbox (optional) ===
SANDBOX_DOMAIN=localhost:8080
SANDBOX_API_KEY=your-sandbox-key
# === Others ===
# AGENT_MAX_STEPS=100
# AGENT_TOKEN_LIMIT=200000
The PostgreSQL URL in .env.example is only a template. Configure a real PostgreSQL database and enable the pgvector extension before running uv run uvicorn ...; otherwise startup or database initialization will fail.
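The SIMPLE_AUTH_USERS value uses the format user:pass,user2:pass2. As a rough illustration of that format, the sketch below parses such a string into a dict. `parse_simple_auth_users` is a hypothetical helper; how the backend actually parses or stores these credentials is not described here.

```python
def parse_simple_auth_users(raw):
    """Parse "user:pass,user2:pass2" into a {username: password} dict.

    Hypothetical helper for illustration only.
    """
    users = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas and blank entries
        # Split on the first colon only, so a password may itself contain ":".
        username, sep, password = pair.partition(":")
        if not sep or not username or not password:
            raise ValueError(f"expected user:pass, got {pair!r}")
        users[username] = password
    return users


print(parse_simple_auth_users("demo:demo123,alice:s3cret"))
# {'demo': 'demo123', 'alice': 's3cret'}
```

Note that a password containing a comma cannot be expressed in this format, which is worth keeping in mind when choosing credentials.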
# Start backend (port 8000)
uv run uvicorn src.api.main:app --reload --port 8000
# In a new terminal, start frontend (port 3000)
cd frontend && npm run dev
Open http://localhost:3000 and log in with demo / demo123.
cd deploy/docker
# Set up environment variables
cp ../../.env.example ../../.env
# Edit .env and fill in your API Key
# Start
docker-compose up -d
# View logs
docker-compose logs -f
OpenCapyBox uses AG-UI (Agent User Interaction Protocol) for frontend-backend communication. AG-UI is an event-driven protocol designed for AI agent scenarios. It defines 22 standardized event types (lifecycle, text messages, thinking process, tool calls, state management, etc.) that are streamed to the frontend via SSE. Compared with a traditional request-response pattern, this lets the agent's multi-step reasoning, tool calls, and chain-of-thought reach the user incrementally and in real time. Reconnecting with lastSequence ensures that no events are lost after an SSE disconnect.
📖 Detailed protocol docs at docs/Capy-project-md/ag-ui-md/
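The lastSequence reconnection idea can be sketched in a few lines: every event gets a monotonically increasing sequence number, and a reconnecting client asks for everything after the last one it saw. The code below is an in-memory illustration only. `RunEventLog`, `replay_after`, the `sequence` field name, and the frame layout are assumptions made for this example, and the event type names are used purely as sample values.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RunEventLog:
    """Append-only event log for one run; sequence numbers start at 1."""

    events: list = field(default_factory=list)

    def append(self, event_type, payload):
        event = {
            "sequence": len(self.events) + 1,
            "type": event_type,
            "payload": payload,
        }
        self.events.append(event)
        return event

    def replay_after(self, last_sequence):
        """Return every event the client has not seen yet."""
        return [e for e in self.events if e["sequence"] > last_sequence]


def to_sse_frame(event):
    """Encode one event as an SSE frame: id line, data line, blank line."""
    return f"id: {event['sequence']}\ndata: {json.dumps(event)}\n\n"


log = RunEventLog()
log.append("RUN_STARTED", {})
log.append("TEXT_MESSAGE_CONTENT", {"delta": "Hel"})
log.append("TEXT_MESSAGE_CONTENT", {"delta": "lo"})
log.append("RUN_FINISHED", {})

# The client saw events 1 and 2 before the connection dropped.
missed = log.replay_after(last_sequence=2)
print([e["sequence"] for e in missed])  # [3, 4]
```

Because the log, not the connection, is the source of events, a run can keep producing output while no client is attached, and the replay simply catches the client up.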
┌──────────────────────────────────────────────────────────────────┐
│ Frontend — React 18 + TypeScript + Vite + TailwindCSS │
│ Session Mgmt · Streaming Render · Model Switch · Skill Mgmt │
├──────────────┬───────────────────────────────────────────────────┤
│ │ REST API + SSE (AG-UI Event Protocol) │
├──────────────▼───────────────────────────────────────────────────┤
│ Backend — FastAPI + SQLAlchemy + PostgreSQL + pgvector │
│ JWT Auth · Turn Orchestration · Agent Pool · Memory · Cron │
├──────────────┬───────────────────────────────────────────────────┤
│ │ Agent ↔ LLM Provider / OpenSandbox │
├──────────────▼───────────────────────────────────────────────────┤
│ Agent Core — Python Async Execution Engine │
│ Multi-step Reasoning · Tool Calls · Token Cache · Summarize │
├──────────────┬──────────────────┬────────────────────────────────┤
│ ▼ ▼ │
│ LLM Providers OpenSandbox │
│ Qwen / GLM / Kimi / Containerized Code Execution │
│ DeepSeek / MiniMax One Sandbox Per User │
└──────────────────────────────────────────────────────────────────┘
The chat backend splits each conversational turn into three responsibilities: the gateway only translates protocol, the orchestrator owns the full lifecycle, and the runtime guarantees consistency. Each layer does exactly one thing and never crosses boundaries, with PostgreSQL as the single source of truth.
flowchart TB
Client["Frontend / future external channels"]
subgraph Gateway["① Gateway + Adapter (thin translation)"]
G["HTTP/SSE encode · auth · heartbeat keepalive"]
A["Web Adapter: normalize requests from any channel into one Turn contract"]
end
subgraph Orchestrator["② TurnOrchestrator (lifecycle brain)"]
O["create/resume Round · run lock · cancel token<br/>background producer: execution decoupled from connection"]
end
subgraph Runtime["③ In-process runtime (consistency)"]
BUS["EventBus<br/>event persistence + replay/subscribe"]
RCS["CompletionService<br/>single terminal entry"]
CANCEL["CancelService<br/>precise cancellation"]
end
DB[("PostgreSQL · single source of truth")]
Client --> Gateway
Gateway --> Orchestrator
Orchestrator --> Runtime
Runtime --> DB
Orchestrator -.->|execution independent of connection| Client
What each layer buys you
| Layer | Problem it solves | Benefit |
|---|---|---|
| ① Gateway + Adapter | Protocol details (HTTP/SSE, auth, field formats) tangled with business logic | Business layer is transport-agnostic; adding a new channel (WeChat, Slack, etc.) means writing one adapter — zero changes to the orchestrator |
| ② TurnOrchestrator | Round creation, locks, cancellation, background tasks scattered across routes | Lifecycle is centralized; execution is decoupled from the SSE connection — the run keeps going when the browser disconnects, and reconnects resume losslessly |
| ③ Runtime trio | Event interleaving, duplicate terminal commits, mis-targeted cancels under concurrency | EventBus guarantees replay consistency; CompletionService makes terminals single-entry; CancelService hits exactly the current run and never kills a new one |
📐 Detailed contracts and failure modes at docs/specs/chat-spec.md
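Two of the runtime guarantees in the table, single-entry terminals and precisely targeted cancellation, can be illustrated with a toy in-memory version. This is a sketch of the ideas only: the class names are borrowed from the table, but every method, field, and identifier below is hypothetical, and the real services persist to PostgreSQL rather than to dicts.

```python
class CompletionService:
    """Accepts exactly one terminal state per run; later commits are ignored."""

    def __init__(self):
        self._terminal = {}  # run_id -> terminal status

    def complete(self, run_id, status):
        if run_id in self._terminal:
            return False  # a terminal state was already committed
        self._terminal[run_id] = status
        return True

    def status(self, run_id):
        return self._terminal.get(run_id)


class CancelService:
    """Tracks the active run per session and cancels only an exact match."""

    def __init__(self):
        self._active = {}  # session_id -> current run_id
        self._cancelled = set()

    def start(self, session_id, run_id):
        self._active[session_id] = run_id

    def cancel(self, session_id, run_id):
        if self._active.get(session_id) != run_id:
            return False  # stale request: that run is no longer current
        self._cancelled.add(run_id)
        return True

    def is_cancelled(self, run_id):
        return run_id in self._cancelled


completion = CompletionService()
print(completion.complete("run-1", "finished"))   # True
print(completion.complete("run-1", "cancelled"))  # False, first commit wins

cancel = CancelService()
cancel.start("session-a", "run-1")
cancel.start("session-a", "run-2")          # a new run replaces run-1
print(cancel.cancel("session-a", "run-1"))  # False, the new run is untouched
print(cancel.cancel("session-a", "run-2"))  # True
```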
OpenCapyBox/
├── src/
│ ├── agent/ # Agent core engine
│ │ ├── agent.py # Main loop (token cache, context summary, event gen)
│ │ ├── event_emitter.py # AG-UI event emitter
│ │ ├── llm/ # LLM clients (Anthropic / OpenAI protocols)
│ │ ├── tools/ # Tool implementations (sandbox file/shell/memory/search/cron/MCP)
│ │ ├── skills/ # 40+ loadable skills (git submodule)
│ │ └── schema/ # Data models & AG-UI event definitions
│ │
│ └── api/ # FastAPI backend
│ ├── main.py # App entry point
│ ├── config.py # pydantic-settings configuration
│ ├── routes/ # API routes (auth/chat/sessions/models/cron/config)
│ ├── services/ # Business logic (agent/sandbox/history/memory/cron)
│ ├── models/ # SQLAlchemy ORM models
│ └── schemas/ # Pydantic request/response models
│
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/ # UI components (ChatV2/SessionList/ArtifactsPanel/...)
│ │ ├── services/ # API clients
│ │ ├── utils/ # Message parsing/content chunking/file handling
│ │ └── types/ # TypeScript types
│ └── (design system moved to docs/specs/frontend-spec.md)
│
├── tests/ # Python tests (30+ test files)
├── docs/ # Project documentation
├── deploy/ # Docker + deployment scripts
├── models.yaml # LLM model registry
├── pyproject.toml # Python project configuration
└── .env.example # Environment variable template
Skills follow the Agent Skills Spec — each Skill is a standalone folder containing a SKILL.md. Users can enable/disable skills via the frontend Skill Manager:
| Category | Example Skills | Description |
|---|---|---|
| 📄 Documents | docx, pdf, xlsx, pptx, nano-pdf | Document parsing and generation |
| 💻 Coding | coding-agent, git, github, playwright | Coding assistant and version control |
| 🎨 Design | canvas, frontend-design, tailwind-design-system | UI/UX design assistance |
| 🧠 Meta | skill-creator, self-improving, reflection, memory | Self-evolution and reflection |
| 🔍 Other | oracle, brainstorming, proactive-agent, session-logs | Toolbox |
🚧 Coming soon: Install custom Skills by uploading ZIP packages via the frontend
Currently, you can register new skills by placing Skill folders in the src/agent/skills/ directory.
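Since each Skill is a standalone folder containing a SKILL.md, discovery can be as simple as scanning for that file. The sketch below shows the idea against a temporary directory. `discover_skills` is a hypothetical helper, and the project's actual registration logic may do more (for example, parsing metadata from SKILL.md).

```python
import tempfile
from pathlib import Path


def discover_skills(skills_root):
    """Return sorted names of subfolders that contain a SKILL.md file.

    Hypothetical helper; it only checks for the file's presence.
    """
    root = Path(skills_root)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "SKILL.md").is_file()
    )


# Demo against a throwaway directory rather than the real src/agent/skills/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pdf").mkdir()
    (root / "pdf" / "SKILL.md").write_text("# pdf skill\n", encoding="utf-8")
    (root / "drafts").mkdir()  # no SKILL.md, so it is skipped
    found = discover_skills(root)

print(found)  # ['pdf']
```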
OpenCapyBox's layered memory makes your AI assistant understand you better over time:
┌─────────────────────────────────────────────┐
│ SOUL.md — Who am I? (Personality) │
├─────────────────────────────────────────────┤
│ USER.md — Who are you? (User Profile) │
├─────────────────────────────────────────────┤
│ MEMORY.md — What have we discussed? │
│ (Long-term Memory) │
├─────────────────────────────────────────────┤
│ AGENTS.md — Team collaboration rules │
└─────────────────────────────────────────────┘
Retrieval: BM25 keyword + Embedding vector + RRF fusion + time decay. Automatically falls back to keyword-only search when Embedding is not configured.
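The retrieval pipeline described above can be sketched as follows. Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over the input rankings. The constant k = 60, the exponential half-life form of the time decay, and all function names are assumptions made for this illustration, since the README does not specify them.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def apply_time_decay(scores, age_days, half_life_days=30.0):
    """Halve a score for every half_life_days of age (assumed decay form)."""
    return {
        doc_id: score * 0.5 ** (age_days.get(doc_id, 0.0) / half_life_days)
        for doc_id, score in scores.items()
    }


def hybrid_search(bm25_ranking, vector_ranking, age_days):
    """Fuse both rankings, decay by age, and return doc ids best-first."""
    # With no embedding results, this degrades to the keyword ranking alone.
    rankings = [r for r in (bm25_ranking, vector_ranking) if r]
    decayed = apply_time_decay(rrf_fuse(rankings), age_days)
    return sorted(decayed, key=lambda doc_id: (-decayed[doc_id], doc_id))


bm25 = ["note-a", "note-b", "note-c"]
vector = ["note-b", "note-c", "note-a"]
fresh = {"note-a": 0, "note-b": 0, "note-c": 0}

print(hybrid_search(bm25, vector, fresh))  # ['note-b', 'note-a', 'note-c']
print(hybrid_search(bm25, [], fresh))      # ['note-a', 'note-b', 'note-c']
```

In the first call note-b wins because it ranks near the top of both lists, even though it is first in only one. RRF works on ranks rather than raw scores, which is why BM25 and embedding similarities can be combined without normalizing them to a common scale.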
All config files can be edited directly in the frontend Agent Config Panel.
server {
listen 80;
server_name your-domain.com;
location / {
root /var/www/opencapybox/frontend/dist;
try_files $uri $uri/ /index.html;
}
location /api {
proxy_pass http://localhost:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_buffering off; # Required for SSE
}
}
Full environment variable list:
# === Required ===
LLM_API_KEY= # Alibaba DashScope unified key
SIMPLE_AUTH_USERS=demo:demo123 # Auth users
# === Optional: LLM ===
MINIMAX_API_KEY= # MiniMax dedicated key
EMBEDDING_API_KEY= # Embedding key (falls back to BM25 if empty)
# === Optional: Tools ===
BOCHA_SEARCH_APPCODE= # Bocha search AppCode
# === OpenSandbox ===
SANDBOX_DOMAIN=localhost:8080
SANDBOX_API_KEY=
SANDBOX_IMAGE=code-interpreter-agent:v1.1.0
SANDBOX_PROTOCOL=http
SANDBOX_TIMEOUT_MINUTES=60
SANDBOX_PERSISTENT_STORAGE_ENABLED=true
# === Application ===
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
DATABASE_URL=postgresql://user:password@host:5432/opencapybox
# PostgreSQL must have pgvector installed: CREATE EXTENSION IF NOT EXISTS vector;
# Optional pytest integration database, never point this at production:
TEST_DATABASE_URL=postgresql://user:password@host:5432/opencapybox_test
AUTH_SECRET_KEY= # Auto-derived if not set
AUTH_TOKEN_EXPIRE_MINUTES=720
# === Agent ===
UVICORN_WORKERS=1
AGENT_MAX_STEPS=100
AGENT_TOKEN_LIMIT=200000
AGENT_USER_CONCURRENCY_LIMIT=1
# === SSE ===
SSE_HEARTBEAT_INTERVAL=15
SSE_SUBSCRIBE_TIMEOUT=300
# === Embedding ===
EMBEDDING_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
EMBEDDING_MODEL=text-embedding-v4
Each module has a standalone spec covering data models, API contracts, behavior semantics, failure modes, and observability:
| Spec | Scope |
|---|---|
| auth-spec.md | Authentication |
| sessions-spec.md | Session management |
| chat-spec.md | Chat / Agent execution / SSE streaming |
| cron-spec.md | Cron scheduled tasks |
| memory-spec.md | Layered memory system |
| sandbox-spec.md | Sandbox interaction |
| models-spec.md | Model registry & switching |
| config-spec.md | Agent config & skills |
| subagent-run-graph-spec.md | Subagent run graph |
| frontend-spec.md | Frontend overview & design system |
| frontend-chat-spec.md | Frontend chat / SSE / reasoning panel |
| frontend-session-spec.md | Frontend session list & switching |
| frontend-panel-spec.md | Frontend drawer panels |
AG-UI protocol details at docs/Capy-project-md/ag-ui-md/.
# Python backend tests
uv run pytest tests/ -v
# With coverage
uv run pytest tests/ -v --cov=src
# Frontend tests
cd frontend && npm run test
- Create a tool class in src/agent/tools/, inheriting from the Tool base class
- Register it in create_agent_tools() in src/api/services/tool_factory.py
- Write tests in tests/
- Update the relevant spec in docs/specs/
from src.agent.tools.base import Tool, ToolResult


class MyTool(Tool):
    """Minimal custom tool: a name, a description, and a parameter schema."""

    @property
    def name(self) -> str:
        # Name the tool is exposed under
        return "my_tool"

    @property
    def description(self) -> str:
        return "Tool description"

    @property
    def parameters(self) -> dict:
        # JSON Schema for the arguments passed to execute()
        return {
            "type": "object",
            "properties": {"param": {"type": "string"}},
            "required": ["param"],
        }

    async def execute(self, param: str) -> ToolResult:
        return ToolResult(success=True, content="Result")
<type>(<scope>): <description>
feat(agent): add new search tool
fix(frontend): fix message scroll jitter
docs(api): update Cron API documentation
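The commit format above is easy to check mechanically. The following sketch parses a subject line into its parts. It treats the scope as optional, because the contribution steps use a scope-less message ('feat: add amazing feature'). The regular expression and `parse_commit_subject` are illustrative assumptions, not a hook shipped with the project.

```python
import re

# <type>(<scope>): <description>, with the "(scope)" part optional.
COMMIT_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?: (?P<description>\S.*)$"
)


def parse_commit_subject(subject):
    """Return (type, scope, description), or None if the subject does not match."""
    match = COMMIT_RE.match(subject)
    if match is None:
        return None
    return match.group("type"), match.group("scope"), match.group("description")


print(parse_commit_subject("feat(agent): add new search tool"))
# ('feat', 'agent', 'add new search tool')
```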
All forms of contribution are welcome!
- Fork this repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit your changes: git commit -m 'feat: add amazing feature'
- Push the branch: git push origin feature/amazing-feature
- Open a Pull Request
Contribution areas: Bug fixes · New tools/skills · New model adapters · UI improvements · Documentation · Performance optimization · i18n
This project is licensed under the Apache License 2.0.
- FastAPI — High-performance async web framework
- React — Modern frontend framework
- OpenSandbox — Alibaba's open-source secure sandbox execution environment
- Anthropic / OpenAI — LLM API protocols
- DashScope — Alibaba Cloud model service platform
- TailwindCSS — Utility-first CSS framework
- Vite — Next-generation frontend build tool
- Parallel subagent execution — subagents run serially today; plan to schedule multiple child tasks concurrently
- Multi-worker deployment + Redis sync — introduce Redis for cross-worker event fanout / cancel registry / run locks, lifting the single-worker assumption
- Pluggable EventBus: upgrade the in-process bus to a backend choice (in-process / Redis Pub-Sub)
- Caching layer for session & memory retrieval
- External channel adapters (WeChat / DingTalk / Slack, etc.) — reusing the TurnOrchestrator layer
- WebSocket bidirectional communication
- More model providers (Gemini, Claude direct, etc.)
- Skill ZIP package upload & install
- Skill marketplace
- Agent workflow orchestration
- Multi-tenant permission system
- Session sharing & collaboration
- Multi-language UI
If OpenCapyBox helps you, please give it a ⭐
Like a capybara — calm, friendly, and surprisingly capable. 🛁






