# Sandbox-First Redesign Specification

This document is a concrete, implementation-ready specification for rebuilding the execution engine around **sandbox-first primitives**. The goal is to replace the current “exec + ulimit” runtime with a Linux namespace + cgroup-based sandbox runner while preserving the existing controller/runner flow.

---

## Goals

### Primary
- **Hard isolation**: separate PID, mount, UTS, IPC, network, and user namespaces for each run.
- **Kernel-enforced resource limits**: cap CPU, memory, and process count with cgroups.
- **Minimal host exposure**: run on a scratch filesystem with no network by default.
- **API compatibility**: keep the existing CodeRunner + Controller flow intact.

### Non-goals (for the initial POC)
- Full container image support (e.g., OCI images).
- Multi-node scheduling.
- Network policy enforcement beyond “no network”.

---

## Proposed Architecture (Sandbox-First)

### 1) New Interfaces
Create a new sandbox runtime package that becomes the primary execution abstraction.

```go
// engine/sandbox/types.go
package sandbox

// SandboxPolicy describes the limits applied to a single run.
type SandboxPolicy struct {
	CPUCores     int   // CPU limit in whole cores; translated to a cgroup v2 cpu.max quota
	MemoryBytes  int64 // memory limit (memory.max)
	PidsMax      int   // max processes/threads (pids.max)
	TimeoutSec   int   // wall-clock timeout, enforced by the parent
	EnableNet    bool  // default: false (the run gets an empty network namespace)
	ReadonlyRoot bool  // intended default: true; Go's zero value is false, so set it explicitly
}

// SandboxInput is everything the sandbox needs to execute one command.
type SandboxInput struct {
	SourceFiles map[string][]byte // filename -> contents, written into the work directory
	WorkDir     string            // working directory inside the sandbox (e.g. /work)
	Command     []string          // argv; Command[0] is the program to execute
}

// SandboxOutput is the captured result of one run.
type SandboxOutput struct {
	Stdout   string
	Stderr   string
	ExitCode int
}

// SandboxRunner is the primary execution abstraction used by CodeRunner.
type SandboxRunner interface {
	Run(input SandboxInput, policy SandboxPolicy) (*SandboxOutput, error)
}
```

### 2) Linux Sandbox Implementation
Add a `linux` sandbox implementation using namespaces, cgroups, and (in Phase 4) seccomp.

```
engine/sandbox/
  linux/
    runner.go
    namespaces.go
    cgroups.go
    filesystem.go
    seccomp.go    (Phase 4)
```

### 3) Integration Points
- **CodeRunner** builds compile/run commands and passes them to the sandbox runner.
- **Controller** remains the concurrency and scheduling layer.
- **RuntimeAgent** becomes a thin wrapper around `SandboxRunner`.

---

## Implementation Checklist (Actionable)

### Phase 0 — Repo scaffolding
- [ ] Create the `engine/sandbox` package with types and interfaces.
- [ ] Add a `linux` subpackage for the namespace + cgroup implementation.
- [ ] Add basic logging utilities (reusing the existing `util/print`).

**Success criteria**
- `go test ./...` succeeds with the new packages added.

---

### Phase 1 — Linux namespace runner (no cgroups yet)
- [ ] Start the child in new PID, mount, UTS, IPC, and user namespaces (`clone` flags or `unshare`; in Go, via `SysProcAttr.Cloneflags` on `exec.Cmd`).
- [ ] Ensure the child process runs entirely in the isolated namespace context.
- [ ] Mount a tmpfs or scratch directory as the new root (`pivot_root`), mount a fresh `/proc`, and bind-mount language runtimes as needed.
- [ ] Disable network by default: give the child a new network namespace, which contains only a loopback interface that is down.
- [ ] Route stdout/stderr to the parent for capture.

**Success criteria**
- A sandboxed command cannot see host processes (`ps` lists only the sandbox's own processes).
- A sandboxed command cannot read `/etc/shadow` or any host path that was not explicitly bind-mounted.
- No outbound network access unless explicitly enabled.

**Testing criteria**
- ✅ `go test ./engine/sandbox/linux -run TestNamespaces`
- ✅ Integration test that runs `ls /` inside the sandbox and confirms a minimal filesystem.

---

### Phase 2 — Cgroup enforcement
- [ ] Implement `cgroups.go` with CPU, memory, and pids controller setup (cgroup v2 preferred).
- [ ] Place the child in its cgroup before it executes user code, so it never runs unlimited (e.g., `clone3` with `CLONE_INTO_CGROUP`, exposed in Go 1.20+ as `SysProcAttr.UseCgroupFD`/`CgroupFD`).
- [ ] Ensure the limits cover the whole process tree (descendants inherit the cgroup).
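
Here is one way `cgroups.go` might translate policy fields into cgroup v2 interface files. The sketch rests on two assumptions that the spec does not state. First, a zero field means "no limit". Second, the CPU period is the kernel's default of 100 ms. The helper name `cgroupLimits` is hypothetical. Setting `memory.swap.max` to `0` keeps a memory-hungry job from swapping instead of hitting the OOM killer. That file exists only when swap accounting is enabled, so a real implementation may need to skip it.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// cpuPeriodUsec is the kernel's default cpu.max period (100 ms).
const cpuPeriodUsec = 100_000

// cgroupLimits (hypothetical helper) maps policy fields to cgroup v2 interface
// files and their contents. Assumption of this sketch: a zero field means
// "leave unlimited".
func cgroupLimits(cpuCores int, memoryBytes int64, pidsMax int) map[string]string {
	files := map[string]string{}
	if cpuCores > 0 {
		// cpu.max is "<quota> <period>": N cores = N periods of CPU time per period.
		files["cpu.max"] = fmt.Sprintf("%d %d", cpuCores*cpuPeriodUsec, cpuPeriodUsec)
	}
	if memoryBytes > 0 {
		files["memory.max"] = strconv.FormatInt(memoryBytes, 10)
		// Disallow swap so hitting memory.max ends in an OOM kill (needs swap accounting).
		files["memory.swap.max"] = "0"
	}
	if pidsMax > 0 {
		files["pids.max"] = strconv.Itoa(pidsMax)
	}
	return files
}

func main() {
	limits := cgroupLimits(2, 256<<20, 64)
	names := make([]string, 0, len(limits))
	for name := range limits {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s = %s\n", name, limits[name])
	}
}
```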

**Success criteria**
- A CPU-bound infinite loop is throttled by `cpu.max` and killed by the wall-clock timeout (cgroup CPU limits throttle; they do not kill).
- Memory exhaustion triggers the OOM killer inside the job's cgroup only; no host process is killed.
- A fork bomb fails once `pids.max` is reached (`fork`/`clone` return `EAGAIN`).
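
The runner can report an OOM kill separately from other failures. After the child exits, it could read the job's `memory.events` file and check the `oom_kill` counter. The sketch below covers only the parsing step. The helper name `oomKills` is hypothetical, and the sample file contents are illustrative, not measured.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// oomKills (hypothetical helper) returns the oom_kill counter from the contents
// of a cgroup v2 memory.events file, which holds one "key value" pair per line.
func oomKills(memoryEvents string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(memoryEvents))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == "oom_kill" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("oom_kill not found in memory.events")
}

func main() {
	sample := "low 0\nhigh 0\nmax 12\noom 1\noom_kill 1\n" // illustrative contents
	n, err := oomKills(sample)
	fmt.Println(n, err) // 1 <nil>
}
```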

**Testing criteria**
- ✅ `go test ./engine/sandbox/linux -run TestCgroupLimits`
- ✅ Manual: run a fork bomb script and ensure it terminates without host impact.

---

### Phase 3 — Wire into CodeRunner
- [ ] Replace `RuntimeAgent.SafeRunCmd` usage with `SandboxRunner.Run`.
- [ ] Convert `RunnerProps` into `SandboxInput`.
- [ ] Maintain the compile → run flow (the compile step is also sandboxed).

**Success criteria**
- Existing API calls still return stdout/stderr/errors correctly.
- All supported compiled and interpreted languages still work (python, node, go, etc.).

**Testing criteria**
- ✅ `go test ./engine/coderunner/v2 -run TestRunner`
- ✅ End-to-end: a CLI invocation executes Python and C++ in the sandbox.

---

### Phase 4 — Optional (Security hardening)
- [ ] Add a seccomp allowlist for syscalls.
- [ ] Drop all Linux capabilities.
- [ ] Set `no_new_privs` (`prctl(PR_SET_NO_NEW_PRIVS)`). It must be set before installing the seccomp filter unless the process has `CAP_SYS_ADMIN`.

**Success criteria**
- Common languages still run under the restricted syscall profile.
- Privileged syscalls (e.g., `mount`, `reboot`, `init_module`) fail inside the sandbox.

**Testing criteria**
- ✅ `go test ./engine/sandbox/linux -run TestSeccomp`

---
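
## Detailed Implementation Notes

### Namespaces
Use `clone`/`unshare` with:
- `CLONE_NEWUSER` (maps root inside the sandbox to an unprivileged host UID)
- `CLONE_NEWPID`
- `CLONE_NEWNS`
- `CLONE_NEWUTS`
- `CLONE_NEWIPC`
- `CLONE_NEWNET` (omitted only when `EnableNet` is set)

Set up the mount namespace inside the child, in this order:
- `mount("", "/", "", MS_REC|MS_PRIVATE, "")`, so mount events do not propagate back to the host.
- `mount("tmpfs", newRoot, "tmpfs", 0, "")` on a fresh directory. Mounting tmpfs directly over `/` would hide the host paths that still need to be bind-mounted.
- Bind-mount the required runtime paths into `newRoot` read-only (`/usr/bin/python3`, `/lib`, `/lib64`, etc.).
- Mount a fresh `proc` at `newRoot/proc`, so `/proc` reflects the new PID namespace.
- `pivot_root(newRoot, putOld)`, then unmount `putOld` with `MNT_DETACH`.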

### Filesystem
- Create a per-job work directory (e.g., `/tmp/sandbox/<job-id>`) and write `SourceFiles` into it.
- Bind-mount that directory as `/work` inside the sandbox (writable).
- When `ReadonlyRoot` is set, remount the new root read-only once `/work` and the runtimes are mounted. Add overlayfs only if a writable layer over read-only runtimes is needed.
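
### Cgroups v2
- Enable the controllers with `+cpu +memory +pids` in `/sys/fs/cgroup/sandbox/cgroup.subtree_control` (the `sandbox` cgroup itself must contain no processes), then create a cgroup per job under `/sys/fs/cgroup/sandbox/<job-id>`.
- Set `memory.max`, `cpu.max` (format: `<quota> <period>`, in microseconds), and `pids.max`.
- Move the child into the cgroup (write its PID to `cgroup.procs`, or start it there with `CLONE_INTO_CGROUP`), and `rmdir` the cgroup once the job has exited.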
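
### Execution Model
- The parent sets up the sandbox environment (work directory, cgroup) and starts the child.
- The child finishes namespace setup and calls `execve` inside the namespaces and cgroup.
- The parent captures stdout/stderr and enforces the timeout by killing every process in the job: it writes `1` to `cgroup.kill` (Linux 5.14+), or kills the PID namespace's init, which takes the rest of the namespace down with it.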

---

## Suggested Package Layout
```
engine/
  sandbox/
    types.go
    linux/
      runner.go
      namespaces.go
      filesystem.go
      cgroups.go
      seccomp.go    (optional, Phase 4)
```

---

## Example POC User Flow
1. CodeRunner receives a request.
2. CodeRunner creates a `SandboxInput` and a `SandboxPolicy`.
3. `SandboxRunner` sets up namespaces and cgroups.
4. The sandbox executes the compile and run steps.
5. The output is returned to the API caller.

---

## Acceptance Checklist (Final)
- [ ] `SandboxRunner` interface defined and used.
- [ ] Linux namespace runner implemented.
- [ ] Cgroup limits enforced with v2.
- [ ] CodeRunner integrated and working.
- [ ] Tests cover namespace isolation and cgroup enforcement.
- [ ] No network by default.

---

## Why this design fits your current repo
- Keeps your existing controller + runner architecture intact.
- Isolates the new sandbox logic in a dedicated package.
- Lets you upgrade security iteratively without rewriting everything.

---

## Next Steps (Recommended)
1. Implement Phase 0 and Phase 1.
2. Run minimal compile/run tests with Python and C++.
3. Add cgroup limits and validate them with load tests.
4. Only then add seccomp and other hardening.

---

## Success Definition
The rewrite is successful if:
- User code runs in a hardened sandbox with **no visibility into the host filesystem (beyond explicit read-only bind mounts) or the host network**.
- Resource limits are enforced by the kernel (cgroups), and isolation is enforced by namespaces.
- The public API behavior remains unchanged.
