Containerized Toolbox

Problem

The original architecture runs toolbox CLI tools as child processes inside the Ghost container. This forces the Ghost Dockerfile to install every runtime dependency any toolbox might need (Python 3.10, uv, Node, etc.). When a toolbox workspace pins Python 3.10 but Debian Bookworm ships 3.11, the tools break. Each new toolbox adds more dependencies to a single container that becomes increasingly fragile.

Additionally, all agents shared a single OS user in the toolbox container, so any agent could read or write any other agent's workspace. The sandbox provided only lexical path checks, not OS-level isolation.

Architecture

Each toolbox runs as a separate Docker container with its own runtime environment. The Ghost engine communicates with toolbox containers over HTTP via a lightweight exec server. Ghost is a pure control plane — it does not mount /mnt/workspace and has no direct filesystem access to agent workspaces.

┌──────────────────────────────────────────────────────────────────────┐
│                        Docker Compose Network                        │
│                                                                      │
│  ┌─────────────────────┐   HTTP    ┌─────────────────────────────┐  │
│  │   Ghost Container   │──────────▸│  Toolbox: devtools          │  │
│  │   (pure control     │           │  (Python, Node, Cursor, ...) │  │
│  │    plane, no         │           │                             │  │
│  │    workspace mount) │           │  toolbox-server :9090       │  │
│  │                     │           │  ├── POST /exec             │  │
│  │  ghost engine       │           │  ├── POST /exec-stream      │  │
│  │  RemoteExecutor ────┤           │  ├── POST /workspace/read   │  │
│  │  RemoteWorkspace ───┤           │  ├── POST /workspace/write  │  │
│  │                     │           │  │                           │  │
│  │  /mnt/data          │           │  │  Per-agent isolation:     │  │
│  └─────────────────────┘           │  │  ├── mount namespaces    │  │
│                                     │  │  ├── per-agent users     │  │
│                                     │  │  └── cgroups (optional)  │  │
│                                     │  │                           │  │
│                                     │  /mnt/workspace             │  │
│                                     │  /mnt/shared-readonly       │  │
│                                     └─────────────────────────────┘  │
│                                                                      │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │  /mnt/workspace (GCS: agent-workspace) — devtools only       │   │
│  │  Per-agent dirs: /mnt/workspace/{agent_id}/                  │   │
│  └───────────────────────────────────────────────────────────────┘   │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │  /mnt/shared-readonly — devtools only, read-only             │   │
│  │  site-templates, etc.                                         │   │
│  └───────────────────────────────────────────────────────────────┘   │
│  ┌───────────────────────────────────────────────────────────────┐   │
│  │  /mnt/data (GCS: ghost-data) — Ghost container only          │   │
│  └───────────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘

Key principles:

  • Ghost container is a pure control plane. No Python, no uv, no Node, no workspace mount.
  • The devtools container owns all workspace access and enforces per-agent isolation.
  • Communication is HTTP (simple, debuggable, no Docker socket needed).
  • Per-agent mount namespaces ensure each command sees only its own workspace.
  • Per-agent Linux users provide process-level privilege separation.

Components

Toolbox Server (cmd/toolbox-server/)

A Go binary that runs inside each toolbox container as the entrypoint. It accepts command execution and workspace I/O requests over HTTP, with optional per-agent mount namespace isolation and cgroup resource limits.

API:

| Endpoint | Method | Description |
|---|---|---|
| /exec | POST | Execute a shell command (sandboxed when --multi-agent) |
| /exec-stream | POST | Execute and stream stdout line-by-line as NDJSON |
| /workspace/read | POST | Read a file from an agent's workspace or shared dir |
| /workspace/write | POST | Write a file to an agent's workspace |
| /healthz | GET | Health check (returns 200) |
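
As an illustration of consuming the /exec-stream output, here is a minimal Go sketch of an NDJSON reader. The document does not specify the per-line schema, so the sketch decodes each line into a generic map; the `line` field in the demo and the helper names `readNDJSON` and `collectField` are assumptions for illustration only.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// readNDJSON decodes one JSON object per line and hands it to handle.
// The per-line schema is deliberately left open (map[string]any).
func readNDJSON(r io.Reader, handle func(map[string]any) error) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // accept lines up to 1 MiB
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // tolerate blank lines
		}
		var obj map[string]any
		if err := json.Unmarshal([]byte(line), &obj); err != nil {
			return fmt.Errorf("decode NDJSON line: %w", err)
		}
		if err := handle(obj); err != nil {
			return err
		}
	}
	return sc.Err()
}

// collectField gathers the string value of key from every line of body.
func collectField(body, key string) ([]string, error) {
	var out []string
	err := readNDJSON(strings.NewReader(body), func(obj map[string]any) error {
		if s, ok := obj[key].(string); ok {
			out = append(out, s)
		}
		return nil
	})
	return out, err
}

func main() {
	// "line" is a hypothetical field name used only for this demo.
	body := "{\"line\":\"installing\"}\n\n{\"line\":\"done\"}\n"
	lines, err := collectField(body, "line")
	fmt.Println(lines, err)
}
```

In a real client the reader would be the HTTP response body, so each object can be forwarded as soon as its line arrives.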

POST /exec request:

{
  "command": "multimodal-cli image create --input-text 'a cat' --output /workspace/cat.png",
  "timeout_sec": 120,
  "max_output_bytes": 131072,
  "env": {
    "AGENT_ID": "engineer-acme_12345",
    "AGENT_WORKSPACE": "/mnt/workspace/engineer-acme_12345"
  },
  "cgroup": {
    "memory_mb": 512,
    "cpu_percent": 100,
    "max_pids": 256
  }
}

POST /exec response:

{
  "stdout": "Image saved to /workspace/cat.png",
  "stderr": "",
  "exit_code": 0,
  "duration_ms": 4521,
  "artifacts": [
    {"path": "/mnt/workspace/engineer-acme_12345/cat.png", "size": 245760, "mime_type": "image/png"}
  ]
}
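
The request and response bodies above map naturally onto Go structs. The following is a sketch derived only from the JSON examples; the type names (`ExecRequest`, `ExecResponse`, `CgroupLimits`, `Artifact`) and the `omitempty` choices are assumptions, not the actual types in the codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CgroupLimits mirrors the "cgroup" object in the request example.
type CgroupLimits struct {
	MemoryMB   int `json:"memory_mb,omitempty"`
	CPUPercent int `json:"cpu_percent,omitempty"`
	MaxPIDs    int `json:"max_pids,omitempty"`
}

// ExecRequest mirrors the POST /exec request body.
type ExecRequest struct {
	Command        string            `json:"command"`
	TimeoutSec     int               `json:"timeout_sec,omitempty"`
	MaxOutputBytes int               `json:"max_output_bytes,omitempty"`
	Env            map[string]string `json:"env,omitempty"`
	Cgroup         *CgroupLimits     `json:"cgroup,omitempty"`
}

// Artifact mirrors one entry of the "artifacts" array.
type Artifact struct {
	Path     string `json:"path"`
	Size     int64  `json:"size"`
	MimeType string `json:"mime_type"`
}

// ExecResponse mirrors the POST /exec response body.
type ExecResponse struct {
	Stdout     string     `json:"stdout"`
	Stderr     string     `json:"stderr"`
	ExitCode   int        `json:"exit_code"`
	DurationMS int64      `json:"duration_ms"`
	Artifacts  []Artifact `json:"artifacts,omitempty"`
}

// decodeExecResponse parses a raw /exec response body.
func decodeExecResponse(body []byte) (ExecResponse, error) {
	var resp ExecResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	req := ExecRequest{
		Command:    "npm install",
		TimeoutSec: 120,
		Env:        map[string]string{"AGENT_ID": "engineer-acme_12345"},
		Cgroup:     &CgroupLimits{MemoryMB: 512},
	}
	out, err := json.Marshal(req)
	fmt.Println(string(out), err)
}
```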

POST /workspace/read request / response:

// Request
{"agent_id": "engineer-acme_12345", "path": "repos/site/index.html"}
// Response
{"content": "<!doctype html>...", "size": 1234}
// Request (shared dir prefix)
{"agent_id": "engineer-acme_12345", "path": "site-templates/style/styles.json"}
// Response
{"content": "{...}", "size": 567}

POST /workspace/write request / response:

// Request
{"agent_id": "engineer-acme_12345", "path": "repos/site/index.html", "content": "<!doctype html>..."}
// Response
{"bytes_written": 1234}

Flags:

| Flag | Default | Description |
|---|---|---|
| --port | 9090 | Listen port |
| --shell | /bin/bash | Shell for command execution |
| --workdir | . | Base working directory (workspace root) |
| --multi-agent | false | Enable per-agent mount namespace isolation |
| --shared-readonly | (deprecated, no-op) | Formerly used for bind-mount into sandbox |
| --shared-dirs | | Virtual prefix:path pairs for workspace.read |
| --toolchain-path | | Extra PATH prefix for sandboxed processes |

When --shared-dirs or --toolchain-path is not passed, the server falls back to the SHARED_DIRS and TOOLCHAIN_PATH environment variables respectively.
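
One way to implement flags with an environment fallback is to use the environment value as the flag's default. The sketch below uses Go's standard `flag` package. The `Config` type, the `parseConfig` and `parseSharedDirs` helpers, the comma separator between prefix:path pairs, and the example mapping of `site-templates` to `/mnt/shared-readonly/site-templates` are all assumptions; the document only states that the pairs have the form prefix:path.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// Config holds the parsed toolbox-server settings (hypothetical type).
type Config struct {
	Port          int
	Shell         string
	Workdir       string
	MultiAgent    bool
	SharedDirs    map[string]string // virtual prefix -> real path
	ToolchainPath string
}

// parseSharedDirs parses "prefix:path" pairs.
// Assumption: pairs are comma-separated.
func parseSharedDirs(spec string) (map[string]string, error) {
	dirs := map[string]string{}
	for _, pair := range strings.Split(spec, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		prefix, path, ok := strings.Cut(pair, ":")
		if !ok || prefix == "" || path == "" {
			return nil, fmt.Errorf("invalid shared dir %q, want prefix:path", pair)
		}
		dirs[prefix] = path
	}
	return dirs, nil
}

// parseConfig parses args; environment values act as flag defaults,
// so an explicit flag always wins over the environment.
func parseConfig(args []string, getenv func(string) string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("toolbox-server", flag.ContinueOnError)
	fs.IntVar(&cfg.Port, "port", 9090, "listen port")
	fs.StringVar(&cfg.Shell, "shell", "/bin/bash", "shell for command execution")
	fs.StringVar(&cfg.Workdir, "workdir", ".", "base working directory")
	fs.BoolVar(&cfg.MultiAgent, "multi-agent", false, "enable per-agent isolation")
	shared := fs.String("shared-dirs", getenv("SHARED_DIRS"), "prefix:path pairs")
	fs.StringVar(&cfg.ToolchainPath, "toolchain-path", getenv("TOOLCHAIN_PATH"), "extra PATH prefix")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	dirs, err := parseSharedDirs(*shared)
	cfg.SharedDirs = dirs
	return cfg, err
}

func main() {
	cfg, err := parseConfig(os.Args[1:], os.Getenv)
	fmt.Printf("%+v %v\n", cfg, err)
}
```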

Multi-Agent Isolation

When --multi-agent is enabled, the toolbox-server provides three layers of isolation:

1. Mount Namespace Isolation: Each /exec or /exec-stream request creates a Linux mount namespace (CLONE_NEWNS via unshare --mount). Inside the namespace:

  • /mnt/workspace/{agent_id} is bind-mounted to /workspace
  • /mnt/workspace is unmounted (lazy) so other agents' directories are invisible
  • /mnt/shared-readonly is inherited from the parent namespace (gcsfuse mounts propagate automatically)

2. Per-Agent Linux Users: The agentUserManager creates a deterministic Linux user for each agent:

  • Username: gis-{fnv32hex} (e.g. gis-a1b2c3d4)
  • UID: (FNV32(agent_id) % 60000) + 10000, incrementing on collision
  • Group: agents (created at container build time)
  • Commands run via setpriv --reuid --regid --init-groups
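
The username and UID derivation can be sketched as follows. The document says only "FNV32", so the choice of the FNV-1a variant here is an assumption, as are the helper names and the wrap-around behavior when incrementing past the top of the range.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
)

const (
	uidBase  = 10000 // lowest agent UID
	uidRange = 60000 // UIDs fall in [10000, 70000)
)

// fnv32 hashes the agent ID.
// Assumption: FNV-1a; the document does not name the variant.
func fnv32(agentID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(agentID))
	return h.Sum32()
}

// agentUsername returns gis-{fnv32hex} with an 8-digit hex hash.
func agentUsername(agentID string) string {
	return fmt.Sprintf("gis-%08x", fnv32(agentID))
}

// allocateUID returns (FNV32(agentID) % 60000) + 10000, stepping by one
// while taken reports a collision. Wrapping inside the range is an assumption.
func allocateUID(agentID string, taken func(uid uint32) bool) (uint32, error) {
	start := fnv32(agentID) % uidRange
	for i := uint32(0); i < uidRange; i++ {
		uid := uidBase + (start+i)%uidRange
		if !taken(uid) {
			return uid, nil
		}
	}
	return 0, errors.New("no free UID in agent range")
}

func main() {
	id := "engineer-acme_12345"
	uid, err := allocateUID(id, func(uint32) bool { return false })
	fmt.Println(agentUsername(id), uid, err)
}
```

Because both values depend only on the agent ID, the same agent gets the same user after a container restart as long as no collision changes the outcome.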

3. Per-Agent Cgroups (optional): Resource limits per agent at /sys/fs/cgroup/toolbox/{agent_id}/:

  • memory.max — memory limit in bytes
  • cpu.max — CPU quota (percentage × 1000 microseconds per 100ms period)
  • pids.max — maximum number of processes
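
To make the cgroup arithmetic concrete, this sketch converts the request's `cgroup` fields into the values written to the three control files. It assumes `memory_mb` means MiB and uses the cgroup v2 `cpu.max` format of quota and period in microseconds; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

const cpuPeriodUsec = 100000 // 100ms period, in microseconds

// cgroupFiles maps control file names to the values to write.
// Zero or negative limits are treated as "not set" and skipped.
func cgroupFiles(memoryMB, cpuPercent, maxPIDs int) map[string]string {
	files := map[string]string{}
	if memoryMB > 0 {
		// Assumption: memory_mb is in MiB (1024 * 1024 bytes).
		files["memory.max"] = strconv.FormatInt(int64(memoryMB)*1024*1024, 10)
	}
	if cpuPercent > 0 {
		// percentage x 1000 microseconds of quota per 100ms period
		files["cpu.max"] = fmt.Sprintf("%d %d", cpuPercent*1000, cpuPeriodUsec)
	}
	if maxPIDs > 0 {
		files["pids.max"] = strconv.Itoa(maxPIDs)
	}
	return files
}

// writeCgroup creates {root}/{agentID} and writes each control file.
func writeCgroup(root, agentID string, files map[string]string) error {
	dir := filepath.Join(root, agentID)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	for name, value := range files {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(value), 0o644); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Values from the POST /exec request example.
	files := cgroupFiles(512, 100, 256)
	fmt.Println(files)
	// Write to a temp dir here; the real root would be /sys/fs/cgroup/toolbox.
	root, err := os.MkdirTemp("", "cgroup-demo")
	if err == nil {
		defer os.RemoveAll(root)
		err = writeCgroup(root, "engineer-acme_12345", files)
	}
	fmt.Println(err)
}
```

With the request example's values, `cpu_percent: 100` becomes `100000 100000`, which is one full CPU.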

Toolbox Manifest

The toolbox.yaml manifest format adds a defaults section for per-invocation limits. The old workspace section is retained for backward-compatible local-mode fallback but is ignored when a remote container is available.

name: devtools
description: "Dev toolbox with CLI tools, coding agents, and AI utilities"

defaults:
  timeout_sec: 120
  max_output_bytes: 131072

workspace:            # used only for local-mode fallback
  root: ..
  shell: /bin/bash
  path_add: [scripts/cli-link, .venv/bin]
  env_file: .env

tools:
  - name: multimodal-cli
    binary: multimodal-cli
    description: ...
    usage: ...

Toolbox Dockerfile

The devtools Dockerfile (toolbox/devtools/Dockerfile) installs all toolchains globally so every agent user has access:

  • Base: debian:bookworm-slim with chromium, util-linux, build-essential
  • Toolchain: mise installed to /opt/toolchain/, managing Go, Python, Node, pnpm, claude-code; GitHub CLI (gh) is installed separately as the official .deb to /usr/bin/gh so it is always on the sandbox PATH
  • pip packages: Installed to /opt/toolchain/python/
  • Groups: agents group created for per-agent users
  • Fuse config: user_allow_other added to /etc/fuse.conf for gcsfuse
  • Mount target: /workspace created for bind mounts
  • toolbox-server runs as root (required for mount namespace operations and user creation)

Ghost Engine Changes

sandbox.Runner interface. Both local sandbox.Executor and the RemoteExecutor implement a shared interface. The cli and browser tools accept sandbox.Runner so they can use either local or remote execution.

type Runner interface {
    Run(ctx context.Context, command string) (*Result, error)
    Shutdown()
}
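
The benefit of the shared interface is that a tool depends only on `Runner` and never on a concrete executor. The sketch below shows this with a fake runner. The `Result` fields, `fakeRunner`, and this simplified `CLITool` are stand-ins for illustration; they are not the actual `sandbox.Result` or tool types.

```go
package main

import (
	"context"
	"fmt"
)

// Result is a hypothetical stand-in for sandbox.Result.
type Result struct {
	Stdout   string
	Stderr   string
	ExitCode int
}

// Runner is the interface from the document.
type Runner interface {
	Run(ctx context.Context, command string) (*Result, error)
	Shutdown()
}

// fakeRunner records commands; a local or remote executor would take its place.
type fakeRunner struct {
	commands []string
	closed   bool
}

func (f *fakeRunner) Run(ctx context.Context, command string) (*Result, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // honor cancellation before doing any work
	}
	f.commands = append(f.commands, command)
	return &Result{Stdout: "ran: " + command}, nil
}

func (f *fakeRunner) Shutdown() { f.closed = true }

// CLITool depends only on Runner, so either execution mode can be injected.
type CLITool struct{ runner Runner }

func (t *CLITool) Execute(ctx context.Context, command string) (string, error) {
	res, err := t.runner.Run(ctx, command)
	if err != nil {
		return "", err
	}
	if res.ExitCode != 0 {
		return "", fmt.Errorf("exit %d: %s", res.ExitCode, res.Stderr)
	}
	return res.Stdout, nil
}

// runDemo wires a CLITool to the fake runner and executes one command.
func runDemo(command string) (string, error) {
	fake := &fakeRunner{}
	defer fake.Shutdown()
	tool := &CLITool{runner: fake}
	return tool.Execute(context.Background(), command)
}

func main() {
	fmt.Println(runDemo("npm install"))
}
```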

RemoteExecutor (internal/toolbox/remote.go). HTTP client that sends POST /exec requests to the toolbox container. Implements sandbox.Runner. Also supports POST /exec-stream for streaming output.

RemoteWorkspaceClient (internal/toolbox/remote.go). HTTP client for POST /workspace/read and POST /workspace/write endpoints. Used by workspace.read and workspace.write tools when devtools is reachable.

Dual-mode loader (internal/toolbox/loader.go). On load, tries to reach http://toolbox-{name}:9090/healthz. If reachable, creates a RemoteExecutor. If not (local dev, no Docker), falls back to building a local sandbox.Executor from the manifest's workspace section.

Tool wiring (internal/control/control.go). At agent boot:

  1. Probes devtools at localhost:9090 and toolbox-devtools:9090
  2. If reachable, creates RemoteWorkspaceClient for workspace.read/workspace.write and RemoteExecutor for tool.cli/tool.browser
  3. Falls back to local filesystem access and local sandbox when devtools is unavailable
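
The probe-then-fall-back decision above can be sketched as a function that returns the first candidate whose /healthz answers 200, or an empty string to signal local mode. The function names and the use of `httptest` as a stand-in for the devtools container are assumptions made for this demo.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// healthy reports whether GET {baseURL}/healthz returns 200.
func healthy(ctx context.Context, client *http.Client, baseURL string) bool {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/healthz", nil)
	if err != nil {
		return false
	}
	resp, err := client.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// firstReachable returns the first healthy base URL, or "" when none responds,
// in which case the caller falls back to local mode.
func firstReachable(ctx context.Context, client *http.Client, candidates []string) string {
	for _, base := range candidates {
		if healthy(ctx, client, base) {
			return base
		}
	}
	return ""
}

// demo probes a dead address and, optionally, a live stand-in server.
func demo(includeLive bool) string {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/healthz" {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.NotFound(w, r)
	}))
	defer srv.Close()

	candidates := []string{"http://127.0.0.1:1"} // nothing is expected to listen here
	if includeLive {
		candidates = append(candidates, srv.URL)
	}
	client := &http.Client{Timeout: 2 * time.Second}
	if firstReachable(context.Background(), client, candidates) != "" {
		return "remote"
	}
	return "local"
}

func main() {
	fmt.Println(demo(true), demo(false))
}
```

In the real wiring the candidates would be `http://localhost:9090` and `http://toolbox-devtools:9090`.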

Data Flow

Command Execution (tool.cli, tool.browser, toolbox tools)

LLM tool call
    ↓
Agent → CLITool.Execute(ctx, {command: "npm install"})
    ↓
CLITool → executor.Run(ctx, "npm install")
    ↓  (executor is RemoteExecutor targeting devtools)
RemoteExecutor → POST http://toolbox-devtools:9090/exec
    { command, env: {AGENT_ID}, cgroup: {memory_mb: 512} }
    ↓
toolbox-server:
    1. getOrCreate agent user (gis-a1b2c3d4, uid=12345)
    2. unshare(CLONE_NEWNS)
    3. mount --bind /mnt/workspace/{agent_id} /workspace
    4. umount -l /mnt/workspace
    5. addPIDToCgroup(agent_id, pid)
    6. setpriv --reuid=12345 -- bash -c "npm install"
    ↓
stdout/stderr/exit_code/artifacts[] → JSON response → sandbox.Result → ToolResult → LLM

File I/O (workspace.read, workspace.write)

LLM tool call
    ↓
Agent → ReadFileTool.Execute(ctx, {path: "repos/site/index.html"})
    ↓
RemoteWorkspaceClient → POST http://toolbox-devtools:9090/workspace/read
    { agent_id: "...", path: "repos/site/index.html" }
    ↓
toolbox-server (root, no namespace):
    1. Resolve: /mnt/workspace/{agent_id}/repos/site/index.html
    2. Validate path doesn't escape workspace
    3. Read file, return content
    ↓
JSON response → ToolResult → LLM
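
Steps 1 and 2 of the read flow (resolve, then validate) can be sketched as below. This is a lexical check only and does not resolve symlinks, so it is not necessarily how toolbox-server validates paths; the function name and error value are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// ErrPathEscape is returned when a path resolves outside the agent workspace.
var ErrPathEscape = errors.New("path escapes workspace")

// resolveWorkspacePath joins rel onto {workspaceRoot}/{agentID} and rejects
// any result outside that directory. Lexical only: symlinks are not resolved.
func resolveWorkspacePath(workspaceRoot, agentID, rel string) (string, error) {
	if agentID == "" || agentID == "." || agentID == ".." || strings.ContainsAny(agentID, `/\`) {
		return "", fmt.Errorf("invalid agent id %q", agentID)
	}
	base := filepath.Join(workspaceRoot, agentID)
	full := filepath.Join(base, rel) // Join also cleans "." and ".." segments

	// If the cleaned path is outside base, its relative form starts with "..".
	relToBase, err := filepath.Rel(base, full)
	if err != nil || relToBase == ".." || strings.HasPrefix(relToBase, ".."+string(filepath.Separator)) {
		return "", ErrPathEscape
	}
	return full, nil
}

func main() {
	fmt.Println(resolveWorkspacePath("/mnt/workspace", "engineer-acme_12345", "repos/site/index.html"))
	fmt.Println(resolveWorkspacePath("/mnt/workspace", "engineer-acme_12345", "../other-agent/secret"))
}
```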

Per-Agent Workspace Isolation

The devtools container mounts a PVC at /mnt/workspace. Each agent gets a directory at /mnt/workspace/{agent_id}/.

For every /exec or /exec-stream request, toolbox-server:

  1. Creates (or reuses) a Linux user for the agent with a deterministic UID
  2. Creates a mount namespace via unshare --mount
  3. Bind-mounts the agent's workspace to /workspace inside the namespace
  4. Unmounts /mnt/workspace (lazy) so other agents are invisible
  5. Drops privileges to the agent user via setpriv
  6. Optionally places the process into a per-agent cgroup
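
Steps 2 through 5 can be expressed as a single `unshare` invocation. The sketch below only builds the argument vector and does not run it, since running requires root and util-linux. Values are passed as positional parameters so the command text is never spliced into the wrapper script. The script, the function name, and the GID value are assumptions; the actual server may build this differently. User creation (step 1) and cgroup placement (step 6) are omitted.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"strconv"
)

// sandboxScript runs as root inside the new mount namespace.
// $1 agent dir, $2 workspace root, $3 uid, $4 gid, $5 shell, $6 command.
const sandboxScript = `mount --bind "$1" /workspace && ` +
	`umount -l "$2" && ` +
	`exec setpriv --reuid="$3" --regid="$4" --init-groups -- "$5" -c "$6"`

// sandboxArgv builds the argv for one sandboxed command.
func sandboxArgv(workspaceRoot, agentID string, uid, gid int, shell, command string) []string {
	return []string{
		"unshare", "--mount", "--",
		"/bin/sh", "-c", sandboxScript, "sandbox", // "sandbox" becomes $0
		filepath.Join(workspaceRoot, agentID),
		workspaceRoot,
		strconv.Itoa(uid),
		strconv.Itoa(gid),
		shell,
		command,
	}
}

func main() {
	// gid 1000 is a placeholder for the "agents" group ID.
	argv := sandboxArgv("/mnt/workspace", "engineer-acme_12345", 12345, 1000, "/bin/bash", "npm install")
	cmd := exec.Command(argv[0], argv[1:]...) // built but intentionally not started
	fmt.Println(cmd.String())
}
```

Passing the command as `$6` means quotes and shell metacharacters in it are interpreted only by the agent's shell, after privileges have been dropped.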

The agent process sees only:

  • /workspace — their own files (read-write)
  • /mnt/shared-readonly — shared resources (read-only, inherited gcsfuse mount)
  • System directories — /usr, /bin, /lib, /opt/toolchain

Artifact Detection

The POST /exec request accepts an optional artifact_dir field. When set, toolbox-server:

  1. Snapshots the directory listing before execution (using host path)
  2. Runs the command in the agent's mount namespace
  3. Diffs the directory to find newly created files
  4. Returns new files as artifacts in the response — each with path, size, and mime_type

The RemoteExecutor sets artifact_dir to /mnt/workspace/{agent_id}/ automatically (the host path, not the namespace path).
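
The snapshot-and-diff approach can be sketched with the standard library as below. Whether the real server walks subdirectories is not stated, so the recursive walk is an assumption, as are the helper names; MIME types come from `mime.TypeByExtension`, which returns an empty string for unknown extensions.

```go
package main

import (
	"fmt"
	"io/fs"
	"mime"
	"os"
	"path/filepath"
)

// Artifact describes one newly created file.
type Artifact struct {
	Path     string
	Size     int64
	MimeType string
}

// snapshot returns the set of regular-file paths under dir (recursive).
func snapshot(dir string) (map[string]struct{}, error) {
	seen := map[string]struct{}{}
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			seen[path] = struct{}{}
		}
		return nil
	})
	return seen, err
}

// newArtifacts lists regular files under dir that are absent from before.
func newArtifacts(dir string, before map[string]struct{}) ([]Artifact, error) {
	var out []Artifact
	err := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if _, existed := before[path]; existed || !d.Type().IsRegular() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		out = append(out, Artifact{
			Path:     path,
			Size:     info.Size(),
			MimeType: mime.TypeByExtension(filepath.Ext(path)),
		})
		return nil
	})
	return out, err
}

// runDemo snapshots a temp dir, creates one file, and returns the diff.
func runDemo() ([]Artifact, error) {
	dir, err := os.MkdirTemp("", "artifacts")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "old.txt"), []byte("old"), 0o644); err != nil {
		return nil, err
	}
	before, err := snapshot(dir)
	if err != nil {
		return nil, err
	}
	// Stand-in for the command that would run in the agent's namespace.
	if err := os.WriteFile(filepath.Join(dir, "cat.png"), []byte("12345"), 0o644); err != nil {
		return nil, err
	}
	return newArtifacts(dir, before)
}

func main() {
	fmt.Println(runDemo())
}
```

This diff detects only newly created paths; files modified in place during the command are not reported.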

Sidecar Startup Ordering

In Kubernetes, ghost and toolbox-devtools run as sidecar containers in the same pod. Since containers start concurrently with no ordering guarantee, the shared docker/entrypoint.sh supports a WAIT_FOR_SIDECAR environment variable:

WAIT_FOR_SIDECAR=localhost:9090

When set, the entrypoint polls http://{host}:{port}/healthz until it returns 200 before starting the main process. This eliminates the startup race where ghost boots agents before toolbox-devtools is ready.

Configure WAIT_FOR_SIDECAR=localhost:{port} on the ghost container when toolboxDevtools.enabled is true so ghost waits for the sidecar before boot.

Timeout defaults to 60 seconds and can be overridden with WAIT_FOR_SIDECAR_TIMEOUT.

Require Remote Devtools

When REQUIRE_REMOTE_DEVTOOLS=true is set on the ghost container (typically alongside the devtools sidecar when toolboxDevtools.enabled is true), the control plane will never fall back to local mode for cli, browser, workspace.read, or workspace.write tools. If the devtools sidecar is unreachable after retries, these tools are registered as unavailable stubs that return a clear error message instead of silently producing broken local-path results.

This prevents a class of bugs where the ghost container (which has no /mnt/workspace or /mnt/shared-readonly mounts) attempts local file operations and produces confusing path errors.

The probe retries 3 times with 2-second intervals when REQUIRE_REMOTE_DEVTOOLS is set.
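
The retry and stub behavior can be sketched as follows. The probe is injected as a function so the retry loop is independent of HTTP. The simplified `Tool` type, the helper names, and the error text are assumptions for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// probeWithRetry calls probe up to attempts times, waiting interval between failures.
func probeWithRetry(ctx context.Context, probe func(context.Context) bool, attempts int, interval time.Duration) bool {
	for i := 0; i < attempts; i++ {
		if probe(ctx) {
			return true
		}
		if i == attempts-1 {
			break // no need to wait after the final attempt
		}
		select {
		case <-ctx.Done():
			return false
		case <-time.After(interval):
		}
	}
	return false
}

// ErrDevtoolsUnavailable is the error returned by stubbed tools.
var ErrDevtoolsUnavailable = errors.New("devtools sidecar unreachable and REQUIRE_REMOTE_DEVTOOLS is set")

// Tool is a simplified tool signature used only in this sketch.
type Tool func(input string) (string, error)

// unavailableStub returns a tool that always fails with a clear error.
func unavailableStub(name string) Tool {
	return func(string) (string, error) {
		return "", fmt.Errorf("%s: %w", name, ErrDevtoolsUnavailable)
	}
}

// wireTool picks the remote tool, a stub, or the local fallback.
func wireTool(name string, reachable, requireRemote bool, remote, local Tool) Tool {
	switch {
	case reachable:
		return remote
	case requireRemote:
		return unavailableStub(name) // never fall back to local mode
	default:
		return local
	}
}

// demoProbe simulates a sidecar that becomes healthy on attempt healthyOn.
func demoProbe(healthyOn int) (ok bool, calls int) {
	probe := func(context.Context) bool {
		calls++
		return calls >= healthyOn
	}
	// The document uses 3 attempts at 2-second intervals; 1ms keeps the demo fast.
	ok = probeWithRetry(context.Background(), probe, 3, time.Millisecond)
	return ok, calls
}

func main() {
	ok, calls := demoProbe(5)
	tool := wireTool("workspace.read", ok, true, nil, nil)
	_, err := tool("repos/site/index.html")
	fmt.Println(ok, calls, err)
}
```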

Local Development

Local dev normally uses docker-compose (ghost + devtools containers). In docker-compose, the workspace is a Docker named volume (ext4). Mount namespace isolation works identically in docker-compose and Kubernetes.

In docker-compose, WAIT_FOR_SIDECAR and REQUIRE_REMOTE_DEVTOOLS are not set by default. Ghost resolves the devtools container via Docker DNS (toolbox-devtools:9090). If devtools is unreachable (local dev without containers), tools fall back to local sandbox mode using the manifest's workspace section.