| 1 | +# Spike: non-Docker sandbox mechanism for snap-docker Ubuntu 24.04 (MCP-3232) |
| 2 | + |
| 3 | +**Status:** recommendation · gates [MCP-34](../../) (Non-Docker isolation mode) · resolves design decisions **D2** and **D3** in the [MCP-34 plan](../../). |
| 4 | +**Author:** BackendEngineer · **PoC:** `internal/sandbox/` (this branch). |
| 5 | + |
| 6 | +## TL;DR |
| 7 | + |
| 8 | +Use the **Linux Landlock LSM** (kernel 5.13+) for the writable-scope filesystem |
| 9 | +allowlist, plus **`setrlimit`** for resource caps, plus **best-effort |
| 10 | +`SysProcAttr.Credential{Uid,Gid}`** for uid/gid drop. **Deprioritize user |
| 11 | +namespaces / bubblewrap** — they are blocked by default on the exact hosts we |
| 12 | +target. This matches the plan's D2 assumption and the spike confirms it with a |
| 13 | +working PoC. |
| 14 | + |
| 15 | +The PoC (`internal/sandbox`) proves the load-bearing claim from a Go process: |
| 16 | +Landlock **denies a path outside the allowlist, permits the allowlisted |
| 17 | +read-write subtree, and preserves raw stdin/stdout JSON-RPC framing** — all |
| 18 | +without user namespaces, so it is unaffected by |
| 19 | +`kernel.apparmor_restrict_unprivileged_userns=1`. |
| 20 | + |
| 21 | +## Why Docker fails on these hosts (reproduction target) |
| 22 | + |
| 23 | +On Ubuntu where Docker is installed via **snap**, AppArmor's profile transition |
| 24 | +fights the security flags the scanner sandbox requires |
| 25 | +(`--security-opt no-new-privileges` + a pinned AppArmor profile), so in-container |
| 26 | +commands fail with *operation not permitted* |
| 27 | +(`MCPX_DOCKER_SNAP_APPARMOR`, GH #71; the related systemd/snap-confine variant is |
| 28 | +already detected by `cmd/mcpproxy/doctor_env_snapdocker.go`, repo issue #457). |
| 29 | +Today's workarounds (remove snap Docker, disable the scanner, or disable |
| 30 | +isolation) are all adoption blockers. We need an isolation path that depends |
| 31 | +neither on Docker nor on any primitive that AppArmor blocks. |
| 32 | + |
| 33 | +## Candidate mechanisms compared |
| 34 | + |
| 35 | +| Mechanism | Unprivileged? | Blocked by Ubuntu 24.04 AppArmor userns restriction? | FS write-allowlist | rlimits | uid/gid drop | Verdict | |
| 36 | +|---|---|---|---|---|---|---| |
| 37 | +| **Landlock LSM** (5.13+) | ✅ yes | ❌ **no** — needs no userns | ✅ path-beneath allowlist | n/a (pair with setrlimit) | ❌ no (orthogonal) | **Chosen** | |
| 38 | +| **user namespaces / bubblewrap** | ✅ yes (in principle) | ⚠️ **yes by default** — `apparmor_restrict_unprivileged_userns=1` blocks `unshare(CLONE_NEWUSER)` unless a per-binary AppArmor profile grants `userns` | ✅ via bind mounts | ✅ | ✅ (maps uid in the ns) | **Deprioritized** | |
| 39 | +| **`setpriv` + `setrlimit` only** | ✅ yes | ❌ no | ❌ none | ✅ | ❌ no (needs CAP_SETUID) | **Floor / fallback** | |
| 40 | + |
| 41 | +### Landlock — chosen (resolves D2) |
| 42 | + |
| 43 | +- **Unprivileged and userns-free.** Landlock confines the calling thread/process |
| 44 | + via three syscalls (`landlock_create_ruleset`, `landlock_add_rule`, |
| 45 | + `landlock_restrict_self`); it requires **no** user or mount namespace, so the |
| 46 | + Ubuntu 23.10+/24.04 `apparmor_restrict_unprivileged_userns=1` default — which |
| 47 | + is exactly what breaks bubblewrap on our target hosts — **does not apply**. |
| 48 | + Confirmed by Chromium's and Ubuntu's own guidance (sources below). |
| 49 | +- **Inherited across `exec`.** A Landlock domain is preserved across `execve` |
| 50 | + and applied to every descendant; a child can only *further* restrict itself, |
| 51 | + never escape. That makes the integration a tiny **re-exec wrapper**: lock the |
| 52 | + OS thread, `Apply()` the ruleset, then `exec` the untrusted `npx`/`uvx` |
| 53 | + command. The proxy keeps owning the child's raw stdin/stdout pipes (D1: native |
| 54 | + launcher, not `process-compose`). |
| 55 | +- **Best-effort across kernels.** Calling `landlock_create_ruleset` with |
| 56 | +  `attr = NULL`, `size = 0` and the `LANDLOCK_CREATE_RULESET_VERSION` flag |
| 57 | +  reports the supported ABI; we mask the handled access rights down to that ABI |
| 58 | +  so the same binary degrades cleanly from 6.10 (ABI 5) to 5.13 (ABI 1). |
| 59 | +  Ubuntu 24.04 ships kernel 6.8 → **ABI 4** (adds TCP bind/connect; FS rights fully covered). See `handledAccessFS` in `internal/sandbox/sandbox_linux.go`. |
| 60 | +- **No new dependency.** `golang.org/x/sys/unix v0.46` (already a direct |
| 61 | + dependency) ships the `SYS_LANDLOCK_*` numbers and `LandlockRulesetAttr` / |
| 62 | + `LandlockPathBeneathAttr` types. The PoC calls the raw syscalls — satisfies |
| 63 | + the repo's "avoid new dependencies" rule. (For the full build, the maintained |
| 64 | + `github.com/landlock-lsm/go-landlock` library — which also solves Go's |
| 65 | + multi-thread `restrict_self` caveat — is a reasonable alternative; the re-exec |
| 66 | + wrapper sidesteps that caveat by `exec`-ing a single-threaded image |
| 67 | + immediately after `Apply`.) |
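The ABI down-masking described in the bullets above can be sketched as a pure function. This is a minimal illustration, not the PoC's actual `handledAccessFS`: the constant names are local stand-ins, and the bit values are the `LANDLOCK_ACCESS_FS_*` values from the kernel's `<linux/landlock.h>`. The `abi` argument is assumed to come from the version probe.

```go
package main

import "fmt"

// Landlock filesystem access bits (values from the kernel UAPI header
// <linux/landlock.h>). Names are local stand-ins for this sketch.
const (
	accessFSExecute  uint64 = 1 << 0
	accessFSMakeSym  uint64 = 1 << 12 // last right defined by ABI 1
	accessFSRefer    uint64 = 1 << 13 // added in ABI 2 (kernel 5.19)
	accessFSTruncate uint64 = 1 << 14 // added in ABI 3 (kernel 6.2)
	accessFSIoctlDev uint64 = 1 << 15 // added in ABI 5 (kernel 6.10)

	// All ABI 1 rights: bits 0 through 12.
	accessFSABI1 uint64 = accessFSMakeSym<<1 - 1
)

// handledAccessFS returns the filesystem rights that may be put in a
// ruleset on a kernel reporting the given Landlock ABI. Asking the kernel
// to handle a right it does not know makes ruleset creation fail, so
// rights newer than the ABI are masked out. ABI 4 adds only network
// rights, so its filesystem mask equals ABI 3's.
func handledAccessFS(abi int) uint64 {
	if abi < 1 {
		return 0 // Landlock unavailable: nothing can be handled
	}
	mask := accessFSABI1
	if abi >= 2 {
		mask |= accessFSRefer
	}
	if abi >= 3 {
		mask |= accessFSTruncate
	}
	if abi >= 5 {
		mask |= accessFSIoctlDev
	}
	return mask
}

func main() {
	for abi := 0; abi <= 5; abi++ {
		fmt.Printf("ABI %d -> %#x\n", abi, handledAccessFS(abi))
	}
}
```

Keeping the mask a pure function of the ABI number is what lets a test such as `TestHandledAccessFSMasksByABI` run on any kernel, including hosts without Landlock.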
| 68 | + |
| 69 | +### user namespaces / bubblewrap — deprioritized (confirms D2) |
| 70 | + |
| 71 | +Bubblewrap builds its sandbox with `unshare(CLONE_NEWUSER)`. Ubuntu 23.10+ |
| 72 | +sets `kernel.apparmor_restrict_unprivileged_userns=1` by default, which **blocks |
| 73 | +unprivileged userns creation unless the program has an AppArmor profile granting |
| 74 | +the `userns` permission**. bubblewrap ships such a profile in recent Ubuntu, but |
| 75 | +a *custom Go binary* spawning userns would be denied on 24.04 out of the box — |
| 76 | +i.e. the userns-first design risks being blocked on the very hosts we target. |
| 77 | +This is the same failure class that breaks Docker-snap; choosing it would trade |
| 78 | +one AppArmor block for another. Deprioritized. |
| 79 | + |
| 80 | +### setpriv + setrlimit only — the floor |
| 81 | + |
| 82 | +No filesystem allowlist at all — only resource caps and (with privilege) |
| 83 | +capability/uid drop. Useful as a graceful fallback when Landlock is unavailable |
| 84 | +(kernel < 5.13 or LSM disabled), but it does **not** meet the "writable-scope |
| 85 | +allowlist" exit criterion on its own. The PoC applies `setrlimit` independently |
| 86 | +of Landlock so this floor is always available. |
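As a rough sketch of the `setrlimit` floor, the helper below lowers both the soft and hard limit of one resource using only the standard `syscall` package on Linux. The helper names (`capLimit`, `capOpenFiles`) and the ceiling of 256 are hypothetical; the PoC's own `Rlimits` handling may differ. Lowering a limit needs no privilege, and lowering the hard limit means the confined child cannot raise it again.

```go
package main

import (
	"fmt"
	"syscall"
)

// capLimit lowers the soft and hard limit of resource to at most ceiling
// and returns the soft limit in force afterwards. Limits that are already
// below ceiling are left untouched. Limits set here are inherited by any
// process exec'd afterwards.
func capLimit(resource int, ceiling uint64) (uint64, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(resource, &lim); err != nil {
		return 0, err
	}
	if lim.Max > ceiling {
		lim.Max = ceiling
	}
	if lim.Cur > lim.Max {
		lim.Cur = lim.Max
	}
	if err := syscall.Setrlimit(resource, &lim); err != nil {
		return 0, err
	}
	// Re-read so the caller sees what the kernel actually applied.
	if err := syscall.Getrlimit(resource, &lim); err != nil {
		return 0, err
	}
	return lim.Cur, nil
}

// capOpenFiles is a convenience wrapper for RLIMIT_NOFILE.
func capOpenFiles(ceiling uint64) (uint64, error) {
	return capLimit(syscall.RLIMIT_NOFILE, ceiling)
}

func main() {
	soft, err := capOpenFiles(256) // hypothetical cap, for illustration only
	if err != nil {
		fmt.Println("setrlimit failed:", err)
		return
	}
	fmt.Println("open-file soft limit now:", soft)
}
```

Because this path does not touch Landlock at all, it stays usable on kernels older than 5.13 or with the LSM disabled.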
| 87 | + |
| 88 | +## Honest limits (must be documented — D2 caveat) |
| 89 | + |
| 90 | +- **No uid/gid separation without privilege.** Landlock restricts *paths*, not |
| 91 | + *identity*. The confined process runs as the **same uid** as mcpproxy; it can |
| 92 | + still touch anything that uid owns *within the allowlist*. Real uid/gid drop |
| 93 | +  needs `SysProcAttr.Credential{Uid,Gid}`, which requires root or |
| 94 | +  `CAP_SETUID`/`CAP_SETGID` (the server edition under systemd, not the |
| 95 | +  unprivileged desktop case). **Do not overclaim Docker parity on uid/gid** for |
| 96 | +  the desktop case: set it best-effort and surface the limitation. |
| 97 | +- **Filesystem + (on ABI 4+) TCP only.** Landlock does not restrict arbitrary |
| 98 | +  syscalls (that is seccomp), nor PID/IPC/network namespaces. A confined process |
| 99 | +  can still see `/proc` and signal same-uid processes. Below kernel 6.7 it can |
| 100 | +  open arbitrary sockets; ABI 4 restricts only TCP `bind`/`connect`, not UDP. |
| 101 | +  Seccomp + `setrlimit(RLIMIT_NPROC)` are later defense-in-depth; out of scope here. |
| 102 | +- **Allowlist must include the loader + interpreter.** `exec`-ing `npx`/`uvx` |
| 103 | + needs read+execute on the binary, its `node`/`python` runtime, and the shared |
| 104 | + libraries (`/usr`, `/lib`, `/lib64`, …). The launcher must compute and grant |
| 105 | + these RO paths or the child fails to start. (The PoC test grants a generous |
| 106 | + system RO set to demonstrate this.) |
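The "best-effort" uid/gid drop from the first bullet could look roughly like the sketch below. The function name `credentialFor` and the target ids (65534) are hypothetical, and the privilege check is simplified to "effective uid is 0"; a fuller version would also accept a process that holds `CAP_SETUID`/`CAP_SETGID` without being root.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// credentialFor returns the credential to put in SysProcAttr, or nil when
// the current process cannot honor a uid/gid drop. A nil result means the
// child keeps the parent's uid, which the caller should surface as a
// limitation rather than hide.
func credentialFor(euid int, uid, gid uint32) *syscall.Credential {
	if euid != 0 {
		return nil // unprivileged: setting Credential would make the spawn fail
	}
	return &syscall.Credential{Uid: uid, Gid: gid}
}

func main() {
	// 65534 is a hypothetical unprivileged target, chosen only for the example.
	cred := credentialFor(os.Geteuid(), 65534, 65534)

	cmd := exec.Command("/bin/true") // placeholder for the resolved npx/uvx command
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Setpgid:    true, // process-group cleanup, as in the existing launcher
		Credential: cred, // nil leaves uid/gid unchanged
	}

	if cred == nil {
		fmt.Println("uid/gid drop skipped: not privileged")
	} else {
		fmt.Printf("child would run as uid=%d gid=%d\n", cred.Uid, cred.Gid)
	}
}
```

The command is only constructed here, not started; the point is that the drop is decided before spawn and its absence is reported.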
| 107 | + |
| 108 | +## What the PoC proves vs. what still needs the host |
| 109 | + |
| 110 | +**Proven by `internal/sandbox` (runs in CI on `ubuntu-latest` = Ubuntu 24.04 — |
| 111 | +see `.github/workflows/unit-tests.yml`, `go test -race ./...`):** |
| 112 | + |
| 113 | +- `TestLandlockEnforcesFilesystemAllowlist` — re-execs a confined child that |
| 114 | + (1) echoes stdin→stdout (JSON-RPC framing survives), (2) reads+writes inside |
| 115 | + the RW allowlist, (3) is **denied** a secret path outside it. Exit-code |
| 116 | + assertions; skips gracefully if the kernel lacks Landlock. |
| 117 | +- `TestHandledAccessFSMasksByABI` — ABI down-masking is correct. |
| 118 | +- Cross-platform stub (`sandbox_other.go`) keeps macOS/Windows building with a |
| 119 | + documented no-op / fail-closed `ErrUnsupported`. |
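To make the three assertions of the allowlist test concrete, here is a sketch of what a confined child's probe might do. It is not the PoC's test code: the exit codes, function names and file names are hypothetical, and the demo runs the probe unconfined, so the secret is expected to be readable.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// Hypothetical exit codes for the confined child.
const (
	exitOK             = 0
	exitEchoFailed     = 10
	exitRWFailed       = 11
	exitSecretReadable = 12 // sandbox not enforced
	exitUnexpected     = 13
)

// childProbe performs the three checks: (1) echo stdin to stdout
// unchanged, (2) write and read back inside the RW allowlist, (3) expect
// a permission error on a path outside it.
func childProbe(rwDir, secret string, in io.Reader, out io.Writer) int {
	if _, err := io.Copy(out, in); err != nil {
		return exitEchoFailed
	}
	p := filepath.Join(rwDir, "probe.txt")
	if err := os.WriteFile(p, []byte("ok"), 0o600); err != nil {
		return exitRWFailed
	}
	if b, err := os.ReadFile(p); err != nil || string(b) != "ok" {
		return exitRWFailed
	}
	_, err := os.ReadFile(secret)
	switch {
	case err == nil:
		return exitSecretReadable
	case !errors.Is(err, os.ErrPermission):
		return exitUnexpected
	}
	return exitOK
}

// runUnconfined runs the probe without any sandbox, using temp dirs.
func runUnconfined(frame string) (code int, echoed string, err error) {
	rw, err := os.MkdirTemp("", "probe-rw-")
	if err != nil {
		return 0, "", err
	}
	defer os.RemoveAll(rw)
	outside, err := os.MkdirTemp("", "probe-secret-")
	if err != nil {
		return 0, "", err
	}
	defer os.RemoveAll(outside)
	secret := filepath.Join(outside, "secret")
	if err := os.WriteFile(secret, []byte("s3cret"), 0o600); err != nil {
		return 0, "", err
	}
	var out strings.Builder
	code = childProbe(rw, secret, strings.NewReader(frame), &out)
	return code, out.String(), nil
}

func main() {
	code, echoed, err := runUnconfined("{\"jsonrpc\":\"2.0\",\"id\":1}\n")
	fmt.Println(code, strings.TrimSpace(echoed), err)
}
```

Under a real Landlock domain the same probe would be expected to return `exitOK`, because the read of the secret fails with a permission error.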
| 120 | + |
| 121 | +**Still requires a real snap-docker Ubuntu 24.04 host (deferred to MCP-34 |
| 122 | +child issues #3/#4, where the spawn branch lands):** |
| 123 | + |
| 124 | +- End-to-end launch of an actual `npx` and `uvx` MCP server under the wrapper |
| 125 | + (the PoC proves the *primitive* + passthrough; the server-specific RO |
| 126 | + allowlist tuning is launcher work). |
| 127 | +- Reproducing the `MCPX_DOCKER_SNAP_APPARMOR` Docker failure side-by-side to |
| 128 | + show the Landlock path succeeds where Docker-snap fails. (By construction |
| 129 | + Landlock is unaffected by the AppArmor userns restriction, so it is expected |
| 130 | + to work; this is the empirical confirmation step.) |
| 131 | + |
| 132 | +## Recommendation for the D3 scanner question |
| 133 | + |
| 134 | +The scanner *plugin* runtime is Docker-based (Spec 039) and is the broken path |
| 135 | +on snap-docker hosts. **Recommend D3 option (b): clean, surfaced degradation** — |
| 136 | +run isolated stdio servers under the Landlock `sandbox` launcher, and when |
| 137 | +`isolation.mode: sandbox` is active on a host where the Docker scanner cannot |
| 138 | +run, **skip the Docker scanner pre-flight and surface a health-degraded warning** |
| 139 | +(via the unified `health` field + a `doctor` check, mirroring |
| 140 | +`doctor_env_snapdocker.go`). A native non-Docker scanner path (option a) is a |
| 141 | +larger effort and can follow once the sandbox launcher exists; degradation |
| 142 | +unblocks adoption now and is testable on the snap-docker host. Final call sits |
| 143 | +with the scanner child issue (MCP-34 #4). |
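The option (b) decision can be pictured as a small pure function. This is a hypothetical sketch: `scannerPreflight`, its parameters and the warning text are invented for illustration, and every case other than "sandbox mode with an unusable Docker scanner" is assumed to keep today's behavior.

```go
package main

import "fmt"

// scannerPreflight decides whether the Docker-based scanner pre-flight
// should run. When sandbox mode is active and the Docker scanner cannot
// run on this host, the pre-flight is skipped and a non-empty warning is
// returned for the caller to surface as health-degraded.
func scannerPreflight(mode string, dockerScannerOK bool) (run bool, warning string) {
	if mode == "sandbox" && !dockerScannerOK {
		return false, "scanner skipped: Docker scanner cannot run on this host (isolation.mode=sandbox)"
	}
	return true, "" // assumption: all other cases behave as they do today
}

func main() {
	for _, ok := range []bool{true, false} {
		run, warn := scannerPreflight("sandbox", ok)
		fmt.Printf("dockerScannerOK=%v run=%v warning=%q\n", ok, run, warn)
	}
}
```

Returning the warning instead of logging it keeps the decision testable and leaves the choice of surface (the `health` field, a `doctor` check) to the caller.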
| 144 | + |
| 145 | +## Proposed integration shape (for MCP-34 #2/#3, not built here) |
| 146 | + |
| 147 | +- Config: `isolation.mode: "docker" | "sandbox" | "none"` (global + per-server), |
| 148 | + back-compat-mapped from today's `Enabled`/`DockerIsolation`. New |
| 149 | + `config.Config`/`ServerConfig` fields ⇒ register in |
| 150 | + `TestSaveServerSyncFieldCoverage` `expectedFields` and run `make swagger` |
| 151 | +  (a gotcha hit in prior work). |
| 152 | +- Spawn: a fourth branch in `connectStdio` / `buildLauncherCmd` alongside the |
| 153 | + existing docker-isolation / user-`docker run` / shell-wrap branches. On Linux |
| 154 | + with `mode: sandbox`, route through a `mcpproxy sandbox-exec`-style re-exec |
| 155 | + wrapper that calls `sandbox.Apply(spec)` then `exec`s the resolved command; |
| 156 | + reuse the existing `SysProcAttr{Setpgid:true}` process-group cleanup |
| 157 | + (`process_unix.go`). macOS/Windows = documented no-op → effective `none`. |
| 158 | +- `Spec` is already shaped for this: `ReadOnlyPaths` (loader/runtime/binary), |
| 159 | + `ReadWritePaths` (working dir, cache, `/tmp` scope), `Rlimits`, `BestEffort`. |
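The ordering inside the re-exec wrapper (lock the OS thread, apply the sandbox, then exec) can be sketched with the apply and exec steps injected, so the control flow is testable without Landlock. The `Spec` field names come from the text above; the field types, the `sandboxExec` signature, and the reading of `BestEffort` as "continue unconfined if apply fails" are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// Spec uses the field names from the design; the types are assumptions.
type Spec struct {
	ReadOnlyPaths  []string          // loader, runtime, binary
	ReadWritePaths []string          // working dir, cache, /tmp scope
	Rlimits        map[string]uint64 // hypothetical shape
	BestEffort     bool
}

// errNoLandlock stands in for whatever error a real Apply would return.
var errNoLandlock = errors.New("landlock unavailable")

// sandboxExec applies spec and then hands control to argv[0], which must
// already be a resolved path. In a real wrapper, apply would be
// sandbox.Apply and execFn would wrap syscall.Exec, which does not return
// on success.
func sandboxExec(spec Spec, argv []string,
	apply func(Spec) error,
	execFn func(path string, argv []string) error) error {
	if len(argv) == 0 {
		return errors.New("sandbox-exec: no command given")
	}
	// landlock_restrict_self confines the calling thread, so stay on it
	// until exec. The thread is deliberately never unlocked.
	runtime.LockOSThread()
	if err := apply(spec); err != nil {
		if !spec.BestEffort {
			return fmt.Errorf("sandbox-exec: apply: %w", err) // fail closed
		}
		// BestEffort: continue unconfined; the caller should surface this.
	}
	return execFn(argv[0], argv)
}

func main() {
	spec := Spec{
		ReadOnlyPaths:  []string{"/usr", "/lib", "/lib64"},
		ReadWritePaths: []string{"/tmp/mcp-work"}, // hypothetical path
	}
	fakeApply := func(Spec) error { return nil }
	fakeExec := func(path string, argv []string) error {
		fmt.Println("would exec:", path, argv[1:])
		return nil
	}
	argv := []string{"/usr/bin/npx", "-y", "some-mcp-server"} // placeholder command
	if err := sandboxExec(spec, argv, fakeApply, fakeExec); err != nil {
		fmt.Println("error:", err)
	}
}
```

Failing closed by default means a missing or broken sandbox never silently launches the untrusted command; only an explicit `BestEffort` opts into degradation.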
| 160 | + |
| 161 | +## Sources |
| 162 | + |
| 163 | +- Linux kernel — Landlock (no userns required; ABI versions): |
| 164 | + https://docs.kernel.org/userspace-api/landlock.html |
| 165 | +- Ubuntu — Restricted unprivileged user namespaces (default `=1` since 23.10): |
| 166 | + https://ubuntu.com/blog/ubuntu-23-10-restricted-unprivileged-user-namespaces |
| 167 | +- Chromium docs — AppArmor userns restrictions vs. Landlock fallback (Landlock |
| 168 | + works where bwrap/userns is blocked): |
| 169 | + https://chromium.googlesource.com/chromium/src/+/main/docs/security/apparmor-userns-restrictions.md |
| 170 | +- bubblewrap blocked on Ubuntu 24.04 by AppArmor userns restriction: |
| 171 | + https://github.com/microsoft/vscode/issues/316046 |
| 172 | +- go-landlock (library + multi-thread `restrict_self` caveat + `landlock-restrict` |
| 173 | + re-exec example): https://github.com/landlock-lsm/go-landlock |
| 174 | +- Repo: `golang.org/x/sys/unix v0.46` Landlock primitives (`go.mod`); |
| 175 | + snap-docker detection `cmd/mcpproxy/doctor_env_snapdocker.go` (issue #457); |
| 176 | + MCP-34 plan decisions D1–D3 (Paperclip plan doc). |