
RonaldJEN/OpenCapyBox


🛁 OpenCapyBox

Your AI assistant lives in a safe box — Sandboxed · Memory-Equipped · Skill-Pluggable · Beginner-Friendly

   ╭━━━━━━━━━━━━╮
   ┃ OpenCapyBox┃
   ┃    ∩  ∩    ┃
   ┃   (◕ ᴥ ◕)  ┃
   ┃   ~~~~~    ┃
   ╰━━━━━━━━━━━━╯

Python 3.10+ · TypeScript 5.2+ · FastAPI · React 18 · License

Features · Screenshots · Quick Start · Architecture · Skills · Memory · Deployment · Contributing

Chinese documentation


About

OpenCapyBox is an open-source full-stack AI agent platform. Like a capybara chilling in the water, your AI assistant lives safely inside a sandboxed container — executing code, processing documents, searching the web, managing files, while continuously building memory and learning new skills.

Why OpenCapyBox?

| Capybara Trait | OpenCapyBox Capability |
| --- | --- |
| 🛁 Soaks in safe waters | One OpenSandbox container per user, fully isolated |
| 🧠 Great memory, knows all friends | Layered memory system (USER.md / MEMORY.md / SOUL.md), learns as you use it |
| 🤝 Friends with everyone | Multi-model compatible (Qwen / GLM / Kimi / DeepSeek / MiniMax), hot-swap anytime |
| 🎒 Can carry anything | 40+ pluggable skills, enable official skills in one click or upload custom ones |
| ⏰ Scheduled routines | Cron task system; the AI runs periodic jobs autonomously |
| 🌐 Chill but reliable | Full sandbox-isolated execution, beginner-friendly, all operations visible in the UI |

✨ Features

🔀 Hot-Swap Multi-Model Support

Declarative registration via models.yaml, supporting both Anthropic and OpenAI protocols — no code changes needed:

| Model | Protocol | Platform | Features |
| --- | --- | --- | --- |
| Qwen3.5-plus | OpenAI | Alibaba DashScope | Chain-of-thought, multimodal |
| GLM-4.7 / GLM-5 | OpenAI | Alibaba DashScope | Chain-of-thought |
| Kimi-2.5 | OpenAI | Alibaba DashScope | Chain-of-thought, multimodal |
| DeepSeek-V3.2 | OpenAI | Alibaba DashScope | Chain-of-thought, long context |
| MiniMax-M2 | Anthropic | MiniMax | Native thinking |

💡 Want to add a new model? Just add an entry in models.yaml:

  my-new-model:
    display_name: "My New Model"
    provider: openai              # openai or anthropic
    api_base: "https://api.example.com/v1"
    api_key: "${MY_API_KEY}"      # References env variable from .env
    model_name: "my-model-name"
    max_tokens: 32768
    reasoning_format: reasoning_content  # none / reasoning_content / anthropic_thinking
    reasoning_split: true
    enable_thinking: true
    supports_image: false
    enabled: true
    tags: [thinking]

See the header comments in models.yaml for full configuration reference.
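
The entry above references its key as `${MY_API_KEY}`. How OpenCapyBox resolves that reference internally is not shown here, so the following is only a minimal sketch of one way such placeholders could be expanded after the YAML has been parsed into a dict. The function name `expand_env_refs` and the sample key value are assumptions made for illustration.

```python
import re

# Matches ${NAME} placeholders such as ${MY_API_KEY}.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_refs(value, env):
    """Recursively replace ${NAME} inside strings using the `env` mapping."""
    if isinstance(value, str):
        def _substitute(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name!r} is not set")
            return env[name]
        return _ENV_REF.sub(_substitute, value)
    if isinstance(value, dict):
        return {key: expand_env_refs(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env_refs(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged


# A parsed models.yaml entry, written out as a plain dict (hypothetical values).
entry = {
    "display_name": "My New Model",
    "provider": "openai",
    "api_key": "${MY_API_KEY}",
    "max_tokens": 32768,
    "tags": ["thinking"],
}

resolved = expand_env_refs(entry, {"MY_API_KEY": "sk-demo"})
print(resolved["api_key"])
```

In real use the second argument would typically be `os.environ`, after the values from .env have been loaded. Failing loudly on a missing variable is a design choice of this sketch, not documented behavior.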

🛡️ One Sandbox Per User

  • Each user gets an isolated OpenSandbox container
  • All code execution, file operations, and shell commands run inside the container
  • Persistent storage mount — no data loss
  • File upload/download/search all proxied through the sandbox — users don't need to worry about the internals

🧠 Memory That Grows

| File | Purpose | Plain English |
| --- | --- | --- |
| USER.md | User profile: your preferences and habits | "It remembers what you like" |
| MEMORY.md | Long-term memory: accumulated knowledge | "It remembers what you've talked about" |
| SOUL.md | Personality definition: tone and style | "Its personality is shaped by you" |

Retrieval is hybrid: BM25 keyword search and vector semantic search, with the two result lists merged by Reciprocal Rank Fusion (RRF). The more you use it, the better it understands you.
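
To make the RRF fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. It is not taken from the OpenCapyBox code: the function name, the document IDs, and the constant `k=60` (a commonly used default for RRF) are assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top hit of a list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the keyword and the vector retrievers.
bm25_ranking = ["note-a", "note-b", "note-c"]
vector_ranking = ["note-b", "note-d", "note-a"]

fused = rrf_fuse([bm25_ranking, vector_ranking])
print(fused)  # ['note-b', 'note-a', 'note-d', 'note-c']
```

RRF only looks at ranks, not raw scores, which is why it can combine BM25 scores and embedding similarities without normalizing them onto a common scale.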

🎨 Beginner-Friendly Interface

  • Claude warm-tone design — soft colors, content-first
  • Streaming output — thinking process and tool calls visible in real-time
  • Skill Manager — category tags, toggle-style enable/disable
  • Agent Config Panel — directly edit SOUL.md / USER.md / MEMORY.md
  • Cron Dashboard — task list + execution history at a glance
  • File Panel — preview and download sandbox files

⏰ Scheduled Tasks

Define Cron jobs via the manage_cron tool, and your AI assistant runs them autonomously:

  • Visual task dashboard in the frontend
  • Manual trigger / pause support
  • Full execution history

🔧 Rich Built-in Tools

| Category | Tools | Description |
| --- | --- | --- |
| 📁 File Ops | Read / Write / Edit | Read, write, and string-replace edit files in sandbox |
| 💻 Shell | Bash / BashOutput / BashKill | Execute commands in container, background process support |
| 🔍 Web Search | GLMSearch / BatchSearch | Bocha search engine, parallel batch search |
| 🧠 Memory | RecordDailyLog / SearchMemory | Layered persistent memory + hybrid retrieval |
| 📝 Session Notes | SessionNote / RecallNote | Cross-turn context preservation |
| ⏰ Cron | ManageCron | DB-backed cron worker |
| 🎒 Skills | GetSkill | 40+ dynamically loadable professional skills |
| 🔌 MCP | MCP Tools | Model Context Protocol tool integration |

🔌 MCP Tool Integration

OpenCapyBox connects to external tool services via MCP (Model Context Protocol). The configuration file is src/agent/config/mcp.json:

{
  "mcpServers": {
    "my-mcp-server": {
      "description": "My MCP Server",
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@example/mcp-server"],
      "env": { "API_KEY": "your-key" },
      "disabled": false
    }
  }
}
  • type supports stdio (local process) and streamable-http (remote HTTP)
  • Set "disabled": true to temporarily disable an MCP service
  • Use MCP_CONFIG_PATH env variable to customize the config file path
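
The bullets above can be sketched in a few lines of Python: pick the config path (honoring `MCP_CONFIG_PATH`) and keep only servers that are not disabled. This is an illustrative sketch, not the project's loader; the helper names `resolve_config_path` and `enabled_servers` are hypothetical.

```python
import json
import os
from pathlib import Path

# Default location named in the text above.
DEFAULT_MCP_CONFIG = "src/agent/config/mcp.json"


def resolve_config_path(env=None):
    """Return the config path, preferring the MCP_CONFIG_PATH variable."""
    env = os.environ if env is None else env
    return Path(env.get("MCP_CONFIG_PATH") or DEFAULT_MCP_CONFIG)


def enabled_servers(config):
    """Return only the servers whose "disabled" flag is absent or false."""
    servers = config.get("mcpServers", {})
    return {
        name: spec
        for name, spec in servers.items()
        if not spec.get("disabled", False)
    }


# Inline sample with the same shape as the mcp.json shown above.
sample = json.loads("""
{
  "mcpServers": {
    "my-mcp-server": {"type": "stdio", "command": "npx", "disabled": false},
    "paused-server": {"type": "streamable-http", "disabled": true}
  }
}
""")

print(sorted(enabled_servers(sample)))
print(resolve_config_path({"MCP_CONFIG_PATH": "/etc/capybox/mcp.json"}))
```

Treating a missing `disabled` key as enabled is an assumption of this sketch.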

📸 Screenshots

Main Chat Interface

Streaming conversation + AI thinking process unfolding in real-time, tool calls fully visible.


Skill Manager

Category tag filtering, toggle-style enable/disable, easily manage 40+ official skills.


Agent Config Panel

Directly edit SOUL.md / USER.md / MEMORY.md to shape your AI assistant's personality and memory.


Cron Dashboard

Task list + execution history, manual trigger and status tracking support.


File Panel

Browse sandbox files, preview and download Agent-generated artifacts.


File Preview

Markdown rendered preview with source view and one-click download.


🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Node.js 16+
  • uv (Python package manager)
  • OpenSandbox (optional, sandbox execution environment)

1. Clone and Install

git clone https://github.com/RonaldJEN/OpenCapyBox.git
cd OpenCapyBox

# Install Python dependencies
uv sync

# Install frontend dependencies
cd frontend && npm install && cd ..

2. Configure Environment

cp .env.example .env

Edit .env with at minimum:

# === Required ===
LLM_API_KEY=your-dashscope-key           # Alibaba DashScope unified key
SIMPLE_AUTH_USERS=demo:demo123           # Login users (format: user:pass,user2:pass2)

# === Database ===
# Replace the placeholder from .env.example before starting the backend.
# The PostgreSQL database must exist and have pgvector installed:
#   CREATE EXTENSION IF NOT EXISTS vector;
DATABASE_URL=postgresql://user:password@host:5432/opencapybox

# === OpenSandbox (optional) ===
SANDBOX_DOMAIN=localhost:8080
SANDBOX_API_KEY=your-sandbox-key

# === Others ===
# AGENT_MAX_STEPS=100
# AGENT_TOKEN_LIMIT=200000

The PostgreSQL URL in .env.example is only a template. Configure a real PostgreSQL database and enable the pgvector extension before running uv run uvicorn ...; otherwise startup or database initialization will fail.
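
The `SIMPLE_AUTH_USERS` value uses the format `user:pass,user2:pass2`. As a rough illustration of that format (not the backend's actual parser, whose details are not shown here), a value could be split into a username-to-password mapping like this. The function name is hypothetical.

```python
def parse_simple_auth_users(raw):
    """Parse 'user:pass,user2:pass2' into a {username: password} dict."""
    users = {}
    for index, pair in enumerate(raw.split(","), start=1):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas and empty segments
        # Split on the first colon only, so passwords may contain colons.
        username, separator, password = pair.partition(":")
        if not separator or not username or not password:
            # Report the position rather than the text, to avoid echoing secrets.
            raise ValueError(f"entry {index} is not in 'user:pass' form")
        users[username] = password
    return users


users = parse_simple_auth_users("demo:demo123,user2:pass2")
print(sorted(users))  # ['demo', 'user2']
```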

3. Start Services

# Start backend (port 8000)
uv run uvicorn src.api.main:app --reload --port 8000

# In a new terminal, start frontend (port 3000)
cd frontend && npm run dev

Open http://localhost:3000 and log in with demo / demo123.

Docker Deployment

cd deploy/docker

# Set up environment variables
cp ../../.env.example ../../.env
# Edit .env and fill in your API Key

# Start
docker-compose up -d

# View logs
docker-compose logs -f

🏗️ Architecture

AG-UI Protocol

OpenCapyBox uses AG-UI (Agent User Interaction Protocol) for frontend-backend communication. AG-UI is an event-driven protocol designed for AI agent scenarios. It defines 22 standardized event types (lifecycle, text messages, thinking process, tool calls, state management, and so on), which are streamed to the frontend over SSE. Compared with a traditional request-response pattern, this lets the agent's multi-step reasoning, tool calls, and chain-of-thought reach the user incrementally and in real time. Reconnection via lastSequence ensures that no events are lost after an SSE disconnect.

📖 Detailed protocol docs at docs/Capy-project-md/ag-ui-md/
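
The lastSequence reconnection idea can be sketched as an append-only event log: every event gets an increasing sequence number, and a reconnecting client asks for everything after the last number it saw. This is a simplified in-memory illustration, not the project's EventBus. The `sequence` field name, the class and function names, and the payload fields are assumptions; the event type strings are used here purely as labels.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log that numbers events starting from 1."""
    events: list = field(default_factory=list)

    def append(self, event_type, payload):
        event = {"sequence": len(self.events) + 1, "type": event_type, **payload}
        self.events.append(event)
        return event

    def replay_after(self, last_sequence):
        """Return the events a client missed after `last_sequence`."""
        return [e for e in self.events if e["sequence"] > last_sequence]


def to_sse_frame(event):
    """Encode one event as a Server-Sent Events frame."""
    return f"id: {event['sequence']}\ndata: {json.dumps(event)}\n\n"


log = EventLog()
log.append("RUN_STARTED", {"runId": "run-1"})
log.append("TEXT_MESSAGE_CONTENT", {"delta": "Hello"})
log.append("RUN_FINISHED", {"runId": "run-1"})

# The client saw sequence 1, then the connection dropped.
missed = log.replay_after(1)
for event in missed:
    print(to_sse_frame(event), end="")
```

Because the log is the source of the stream, a client that reconnects with its last seen sequence receives exactly the events it missed, in order.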

┌──────────────────────────────────────────────────────────────────┐
│  Frontend — React 18 + TypeScript + Vite + TailwindCSS          │
│  Session Mgmt · Streaming Render · Model Switch · Skill Mgmt    │
├──────────────┬───────────────────────────────────────────────────┤
│              │  REST API + SSE (AG-UI Event Protocol)            │
├──────────────▼───────────────────────────────────────────────────┤
│  Backend — FastAPI + SQLAlchemy + PostgreSQL + pgvector         │
│  JWT Auth · Turn Orchestration · Agent Pool · Memory · Cron     │
├──────────────┬───────────────────────────────────────────────────┤
│              │  Agent ↔ LLM Provider / OpenSandbox               │
├──────────────▼───────────────────────────────────────────────────┤
│  Agent Core — Python Async Execution Engine                      │
│  Multi-step Reasoning · Tool Calls · Token Cache · Summarize    │
├──────────────┬──────────────────┬────────────────────────────────┤
│              ▼                  ▼                                 │
│  LLM Providers               OpenSandbox                         │
│  Qwen / GLM / Kimi /         Containerized Code Execution       │
│  DeepSeek / MiniMax           One Sandbox Per User               │
└──────────────────────────────────────────────────────────────────┘

Turn Orchestration

The chat backend splits each conversational turn into three responsibilities: the gateway only translates protocol, the orchestrator owns the full lifecycle, and the runtime guarantees consistency. Each layer does exactly one thing and never crosses boundaries, with PostgreSQL as the single source of truth.

flowchart TB
    Client["Frontend / future external channels"]

    subgraph Gateway["① Gateway + Adapter (thin translation)"]
        G["HTTP/SSE encode · auth · heartbeat keepalive"]
        A["Web Adapter: normalize requests from any channel into one Turn contract"]
    end

    subgraph Orchestrator["② TurnOrchestrator (lifecycle brain)"]
        O["create/resume Round · run lock · cancel token<br/>background producer: execution decoupled from connection"]
    end

    subgraph Runtime["③ In-process runtime (consistency)"]
        BUS["EventBus<br/>event persistence + replay/subscribe"]
        RCS["CompletionService<br/>single terminal entry"]
        CANCEL["CancelService<br/>precise cancellation"]
    end

    DB[("PostgreSQL · single source of truth")]

    Client --> Gateway
    Gateway --> Orchestrator
    Orchestrator --> Runtime
    Runtime --> DB
    Orchestrator -.->|execution independent of connection| Client

What each layer buys you

| Layer | Problem it solves | Benefit |
| --- | --- | --- |
| ① Gateway + Adapter | Protocol details (HTTP/SSE, auth, field formats) tangled with business logic | Business layer is transport-agnostic; adding a new channel (WeChat, Slack, etc.) means writing one adapter, with zero changes to the orchestrator |
| ② TurnOrchestrator | Round creation, locks, cancellation, background tasks scattered across routes | Lifecycle is centralized; execution is decoupled from the SSE connection, so the run keeps going when the browser disconnects, and reconnects resume losslessly |
| ③ Runtime trio | Event interleaving, duplicate terminal commits, mis-targeted cancels under concurrency | EventBus guarantees replay consistency; CompletionService makes terminals single-entry; CancelService hits exactly the current run and never kills a new one |

📐 Detailed contracts and failure modes at docs/specs/chat-spec.md
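
Two of the runtime guarantees described above, precise cancellation and a single terminal entry, can be illustrated with a small in-memory registry. This is a sketch of the idea only: the real CancelService and CompletionService are backed by PostgreSQL, and the class name, method names, and status strings below are hypothetical.

```python
class RunRegistry:
    """Tracks the active run per session plus cancel and terminal state."""

    def __init__(self):
        self._active = {}        # session_id -> run_id of the current run
        self._cancelled = set()  # run_ids that were asked to stop
        self._terminal = {}      # run_id -> first terminal status recorded

    def start_run(self, session_id, run_id):
        self._active[session_id] = run_id

    def cancel(self, session_id, run_id):
        """Cancel only if run_id is still the session's current run."""
        if self._active.get(session_id) != run_id:
            return False  # stale request: a newer run has taken over
        self._cancelled.add(run_id)
        return True

    def is_cancelled(self, run_id):
        return run_id in self._cancelled

    def complete(self, run_id, status):
        """Record a terminal status once; later attempts are ignored."""
        if run_id in self._terminal:
            return False
        self._terminal[run_id] = status
        return True

    def terminal_status(self, run_id):
        return self._terminal.get(run_id)


registry = RunRegistry()
registry.start_run("session-1", "run-1")
registry.start_run("session-1", "run-2")        # a new run replaces run-1

stale = registry.cancel("session-1", "run-1")   # late cancel aimed at run-1
first = registry.complete("run-2", "finished")
second = registry.complete("run-2", "error")    # duplicate terminal commit

print(stale, registry.is_cancelled("run-2"), first, second)
```

Keying cancellation on the run ID rather than the session is what keeps a late cancel from killing the next run, and funneling every terminal write through one method is what prevents duplicate terminal commits.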

Project Structure

OpenCapyBox/
├── src/
│   ├── agent/                    # Agent core engine
│   │   ├── agent.py              # Main loop (token cache, context summary, event gen)
│   │   ├── event_emitter.py      # AG-UI event emitter
│   │   ├── llm/                  # LLM clients (Anthropic / OpenAI protocols)
│   │   ├── tools/                # Tool implementations (sandbox file/shell/memory/search/cron/MCP)
│   │   ├── skills/               # 40+ loadable skills (git submodule)
│   │   └── schema/               # Data models & AG-UI event definitions
│   │
│   └── api/                      # FastAPI backend
│       ├── main.py               # App entry point
│       ├── config.py             # pydantic-settings configuration
│       ├── routes/               # API routes (auth/chat/sessions/models/cron/config)
│       ├── services/             # Business logic (agent/sandbox/history/memory/cron)
│       ├── models/               # SQLAlchemy ORM models
│       └── schemas/              # Pydantic request/response models
│
├── frontend/                     # React frontend
│   ├── src/
│   │   ├── components/           # UI components (ChatV2/SessionList/ArtifactsPanel/...)
│   │   ├── services/             # API clients
│   │   ├── utils/                # Message parsing/content chunking/file handling
│   │   └── types/                # TypeScript types
│   └── (design system moved to docs/specs/frontend-spec.md)
│
├── tests/                        # Python tests (30+ test files)
├── docs/                         # Project documentation
├── deploy/                       # Docker + deployment scripts
├── models.yaml                   # LLM model registry
├── pyproject.toml                # Python project configuration
└── .env.example                  # Environment variable template

🎒 Skill System

Official Skill Library

Skills follow the Agent Skills Spec — each Skill is a standalone folder containing a SKILL.md. Users can enable/disable skills via the frontend Skill Manager:

| Category | Example Skills | Description |
| --- | --- | --- |
| 📄 Documents | docx, pdf, xlsx, pptx, nano-pdf | Document parsing and generation |
| 💻 Coding | coding-agent, git, github, playwright | Coding assistant and version control |
| 🎨 Design | canvas, frontend-design, tailwind-design-system | UI/UX design assistance |
| 🧠 Meta | skill-creator, self-improving, reflection, memory | Self-evolution and reflection |
| 🔍 Other | oracle, brainstorming, proactive-agent, session-logs | Toolbox |

Custom Skills

🚧 Coming soon: Install custom Skills by uploading ZIP packages via the frontend

Currently, you can register new skills by placing Skill folders in the src/agent/skills/ directory.
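
Since each Skill is a standalone folder containing a SKILL.md, discovery can be as simple as scanning the skills directory for such folders. The sketch below shows that idea on a temporary directory; it is not the project's loader, and the function name `discover_skills` is an assumption.

```python
import tempfile
from pathlib import Path


def discover_skills(skills_root):
    """Return the names of sub-folders that contain a SKILL.md file."""
    root = Path(skills_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "SKILL.md").is_file()
    )


# Demonstrate on a throwaway directory instead of src/agent/skills/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pdf").mkdir()
    (root / "pdf" / "SKILL.md").write_text("# pdf skill\n", encoding="utf-8")
    (root / "scratch").mkdir()  # no SKILL.md, so it is not a skill
    found = discover_skills(root)

print(found)  # ['pdf']
```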

🧠 Memory System

OpenCapyBox's layered memory makes your AI assistant understand you better over time:

┌─────────────────────────────────────────────┐
│  SOUL.md    — Who am I? (Personality)       │
├─────────────────────────────────────────────┤
│  USER.md    — Who are you? (User Profile)   │
├─────────────────────────────────────────────┤
│  MEMORY.md  — What have we discussed?       │
│               (Long-term Memory)            │
├─────────────────────────────────────────────┤
│  AGENTS.md  — Team collaboration rules      │
└─────────────────────────────────────────────┘

Retrieval: BM25 keyword + Embedding vector + RRF fusion + time decay. Automatically falls back to keyword-only search when Embedding is not configured.
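
Two parts of the retrieval line above lend themselves to a short sketch: time decay and the keyword-only fallback. The exact decay formula used by OpenCapyBox is not stated here, so the exponential half-life below (30 days) is an assumption chosen for illustration, as are both function names.

```python
import math


def decayed_score(score, age_days, half_life_days=30.0):
    """Scale a relevance score so that it halves every `half_life_days`."""
    return score * math.exp(-math.log(2) * age_days / half_life_days)


def choose_retrievers(embedding_api_key):
    """Use keyword-only search when no embedding key is configured."""
    return ["bm25", "vector"] if embedding_api_key else ["bm25"]


print(round(decayed_score(1.0, 30), 3))  # 0.5
print(choose_retrievers(""))             # ['bm25']
```

With this shape of decay, a fresh memory keeps its full score while older entries fade smoothly instead of being cut off at a hard age limit.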

All config files can be edited directly in the frontend Agent Config Panel.

🚢 Deployment Guide

Nginx Reverse Proxy

server {
    listen 80;
    server_name your-domain.com;

    location / {
        root /var/www/opencapybox/frontend/dist;
        try_files $uri $uri/ /index.html;
    }

    location /api {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_buffering off;              # Required for SSE
    }
}

Environment Variables Reference

Full environment variable list:

# === Required ===
LLM_API_KEY=                           # Alibaba DashScope unified key
SIMPLE_AUTH_USERS=demo:demo123         # Auth users

# === Optional: LLM ===
MINIMAX_API_KEY=                       # MiniMax dedicated key
EMBEDDING_API_KEY=                     # Embedding key (falls back to BM25 if empty)

# === Optional: Tools ===
BOCHA_SEARCH_APPCODE=                  # Bocha search AppCode

# === OpenSandbox ===
SANDBOX_DOMAIN=localhost:8080
SANDBOX_API_KEY=
SANDBOX_IMAGE=code-interpreter-agent:v1.1.0
SANDBOX_PROTOCOL=http
SANDBOX_TIMEOUT_MINUTES=60
SANDBOX_PERSISTENT_STORAGE_ENABLED=true

# === Application ===
DEBUG=false
CORS_ORIGINS=["http://localhost:3000"]
DATABASE_URL=postgresql://user:password@host:5432/opencapybox
# PostgreSQL must have pgvector installed: CREATE EXTENSION IF NOT EXISTS vector;
# Optional pytest integration database, never point this at production:
TEST_DATABASE_URL=postgresql://user:password@host:5432/opencapybox_test
AUTH_SECRET_KEY=                        # Auto-derived if not set
AUTH_TOKEN_EXPIRE_MINUTES=720

# === Agent ===
UVICORN_WORKERS=1
AGENT_MAX_STEPS=100
AGENT_TOKEN_LIMIT=200000
AGENT_USER_CONCURRENCY_LIMIT=1

# === SSE ===
SSE_HEARTBEAT_INTERVAL=15
SSE_SUBSCRIBE_TIMEOUT=300

# === Embedding ===
EMBEDDING_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
EMBEDDING_MODEL=text-embedding-v4

📖 Development Guide

Spec-Driven Documentation

Each module has a standalone spec covering data models, API contracts, behavior semantics, failure modes, and observability:

| Spec | Scope |
| --- | --- |
| auth-spec.md | Authentication |
| sessions-spec.md | Session management |
| chat-spec.md | Chat / Agent execution / SSE streaming |
| cron-spec.md | Cron scheduled tasks |
| memory-spec.md | Layered memory system |
| sandbox-spec.md | Sandbox interaction |
| models-spec.md | Model registry & switching |
| config-spec.md | Agent config & skills |
| subagent-run-graph-spec.md | Subagent run graph |
| frontend-spec.md | Frontend overview & design system |
| frontend-chat-spec.md | Frontend chat / SSE / reasoning panel |
| frontend-session-spec.md | Frontend session list & switching |
| frontend-panel-spec.md | Frontend drawer panels |

AG-UI protocol details at docs/Capy-project-md/ag-ui-md/.

Running Tests

# Python backend tests
uv run pytest tests/ -v

# With coverage
uv run pytest tests/ -v --cov=src

# Frontend tests
cd frontend && npm run test

Adding New Tools

  1. Create a tool class in src/agent/tools/, inheriting from the Tool base class
  2. Register it in create_agent_tools() in src/api/services/tool_factory.py
  3. Write tests in tests/
  4. Update the relevant spec in docs/specs/

A minimal tool skeleton:

from src.agent.tools.base import Tool, ToolResult

class MyTool(Tool):
    @property
    def name(self) -> str:
        return "my_tool"

    @property
    def description(self) -> str:
        return "Tool description"

    @property
    def parameters(self) -> dict:
        return {
            "type": "object",
            "properties": {"param": {"type": "string"}},
            "required": ["param"]
        }

    async def execute(self, param: str) -> ToolResult:
        return ToolResult(success=True, content="Result")
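
To show how the `parameters` schema and the async `execute` method fit together, here is a self-contained sketch of a caller that checks required parameters before invoking a tool. `ToolResult` and `EchoTool` below are stand-ins written for this example so that it runs without the repository; `call_tool` is hypothetical and is not the project's dispatcher.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Stand-in for src.agent.tools.base.ToolResult (same two fields)."""
    success: bool
    content: str


class EchoTool:
    """Stand-in tool with the same shape as MyTool above."""
    name = "echo"
    parameters = {
        "type": "object",
        "properties": {"param": {"type": "string"}},
        "required": ["param"],
    }

    async def execute(self, param: str) -> ToolResult:
        return ToolResult(success=True, content=f"echo: {param}")


async def call_tool(tool, arguments):
    """Reject calls that miss required parameters, then run the tool."""
    required = tool.parameters.get("required", [])
    missing = [key for key in required if key not in arguments]
    if missing:
        return ToolResult(False, f"missing required parameters: {missing}")
    return await tool.execute(**arguments)


ok = asyncio.run(call_tool(EchoTool(), {"param": "hi"}))
bad = asyncio.run(call_tool(EchoTool(), {}))
print(ok.content)   # echo: hi
print(bad.success)  # False
```

Returning a failed `ToolResult` instead of raising keeps the error visible to the model as ordinary tool output, which is one reasonable choice for an agent loop.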

Commit Convention

<type>(<scope>): <description>

feat(agent): add new search tool
fix(frontend): fix message scroll jitter
docs(api): update Cron API documentation
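
A commit subject in this form can be checked mechanically. The regular expression below is a sketch of such a check, not a hook shipped with the repository; it treats the scope as optional because the contributing steps use `feat: add amazing feature` without one. The names `COMMIT_RE` and `parse_commit_subject` are assumptions.

```python
import re

# <type>(<scope>): <description>, with the "(<scope>)" part optional.
COMMIT_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?: (?P<description>\S.*)$"
)


def parse_commit_subject(subject):
    """Return the parts of a commit subject, or None if it does not match."""
    match = COMMIT_RE.match(subject)
    return match.groupdict() if match else None


print(parse_commit_subject("feat(agent): add new search tool"))
print(parse_commit_subject("Fixed stuff"))  # None
```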

🤝 Contributing

All forms of contribution are welcome!

  1. Fork this repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'feat: add amazing feature'
  4. Push the branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Contribution areas: Bug fixes · New tools/skills · New model adapters · UI improvements · Documentation · Performance optimization · i18n

📄 License

This project is licensed under the Apache License 2.0.

🙏 Acknowledgments

  • FastAPI — High-performance async web framework
  • React — Modern frontend framework
  • OpenSandbox — Alibaba's open-source secure sandbox execution environment
  • Anthropic / OpenAI — LLM API protocols
  • DashScope — Alibaba Cloud model service platform
  • TailwindCSS — Utility-first CSS framework
  • Vite — Next-generation frontend build tool

🗺️ Roadmap

⚡ Performance & Scalability

  • Parallel subagent execution — subagents run serially today; plan to schedule multiple child tasks concurrently
  • Multi-worker deployment + Redis sync — introduce Redis for cross-worker event fanout / cancel registry / run locks, lifting the single-worker assumption
  • Pluggable EventBus: upgrade the in-process bus to a backend choice (in-process / Redis Pub-Sub)
  • Caching layer for session & memory retrieval

🔌 Channels & Integrations

  • External channel adapters (WeChat / DingTalk / Slack, etc.) — reusing the TurnOrchestrator layer
  • WebSocket bidirectional communication
  • More model providers (Gemini, Claude direct, etc.)

🎒 Skills & Ecosystem

  • Skill ZIP package upload & install
  • Skill marketplace
  • Agent workflow orchestration

👥 Collaboration & Multi-tenancy

  • Multi-tenant permission system
  • Session sharing & collaboration
  • Multi-language UI

If OpenCapyBox helps you, please give it a ⭐

Like a capybara — calm, friendly, and surprisingly capable. 🛁

Report Bug · Feature Request · Discussions
