Commit e98c80a

feat(upstream): native sandbox launcher (Landlock + rlimits) for stdio servers (MCP-34.3) (#768)
* feat(upstream): native sandbox launcher (Landlock + rlimits) for stdio servers (MCP-34.3)

Implements the `sandbox` spawn branch alongside docker/plain in connectStdio and buildLauncherCmd, using the Landlock mechanism proven by the MCP-34.1 spike and wired into the mode-aware selector from MCP-34.2.

Mechanism: a tiny re-exec wrapper. mcpproxy launches itself as `mcpproxy __sandbox_exec -- <command> [args...]` with the confinement Spec encoded in the environment; the hidden child applies Landlock + setrlimit, does a best-effort uid/gid drop, then execs the real command, so confinement is inherited across execve and the server's stdin/stdout pass straight through with no mux. Reuses the existing SysProcAttr{Setpgid} process-group cleanup.

- internal/sandbox: WrapCommand/SpecFromEnv (cross-platform), RunChild (unix + non-unix guard), Available() honest-diagnostic helper.
- internal/upstream/core: wrapWithSandbox + buildSandboxSpec (filesystem WRITE allowlist: RO "/", RW working_dir + temp + package caches; RLIMIT_CORE=0 + NOFILE; BestEffort graceful degrade). Wired into both stdio + launcher plain branches behind ResolveMode == sandbox.
- cmd/mcpproxy: hidden __sandbox_exec subcommand.
- Graceful fallback: non-Linux / Landlock-less kernels degrade to unconfined (effective "none") with a logged diagnostic; fail-closed when BestEffort=false.
- docs/features/sandbox-isolation.md documents enforcement, honest limits (no read confinement, uid/gid no-op without root), and platform support.

Tests (red→green): WrapCommand argv/env round-trip, SpecFromEnv absent/malformed, buildSandboxSpec write-allowlist defaults; Linux end-to-end (cmd construction, RLIMIT_NOFILE application, write-outside-allowlist denied, stdin/stdout passthrough) + fail-closed refuses to run unconfined.

Landlock enforcement runs on ubuntu-latest CI (24.04), mirroring the MCP-34.1 spike's empirical proof.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* docs(sandbox): fix broken Docusaurus links on sandbox-isolation page

The new docs/features/sandbox-isolation.md linked to ../docker-isolation.md (resolving to the unrouted /docker-isolation.md) and a relative spike path, breaking the Docusaurus `build` (onBrokenLinks: throw) + Cloudflare Pages checks on PR #768. Point both "See also" links at the canonical published routes (/features/docker-isolation, /development/sandbox-spike-mcp-34), matching the pattern sibling pages use.

Verified locally: `npm run build` succeeds with zero broken links.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
1 parent 9702260 commit e98c80a

17 files changed

Lines changed: 732 additions & 0 deletions

cmd/mcpproxy/main.go

Lines changed: 1 addition & 0 deletions
@@ -190,6 +190,7 @@ func main() {
 
 	// Add commands to root
 	rootCmd.AddCommand(serverCmd)
+	rootCmd.AddCommand(newSandboxExecCommand())
 	rootCmd.AddCommand(searchCmd)
 	rootCmd.AddCommand(GetRegistryCommand())
 	rootCmd.AddCommand(toolsCmd)

cmd/mcpproxy/sandbox_exec_cmd.go

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
package main
2+
3+
import (
4+
"os"
5+
6+
"github.com/spf13/cobra"
7+
8+
"github.com/smart-mcp-proxy/mcpproxy-go/internal/sandbox"
9+
)
10+
11+
// newSandboxExecCommand returns the hidden `__sandbox_exec` subcommand (MCP-34.3).
12+
//
13+
// mcpproxy re-executes itself as `mcpproxy __sandbox_exec -- <command> [args...]`
14+
// to launch a stdio MCP server under native sandbox isolation: the child applies
15+
// Landlock + rlimits confinement (encoded in the environment by the upstream
16+
// launcher) and then execs the real command, replacing this process so stdin/
17+
// stdout pass straight through. It is not meant to be invoked by users directly.
18+
func newSandboxExecCommand() *cobra.Command {
19+
return &cobra.Command{
20+
Use: sandbox.Subcommand + " -- command [args...]",
21+
Short: "internal: re-exec wrapper applying native sandbox confinement (do not call directly)",
22+
Hidden: true,
23+
// Pass everything after the subcommand through untouched — the wrapped
24+
// command has its own flags we must not interpret.
25+
DisableFlagParsing: true,
26+
Run: func(_ *cobra.Command, args []string) {
27+
// cobra may retain the "--" separator; drop a single leading one.
28+
if len(args) > 0 && args[0] == "--" {
29+
args = args[1:]
30+
}
31+
os.Exit(sandbox.RunChild(args, os.Stderr))
32+
},
33+
}
34+
}
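
The separator handling inside the `Run` closure above can be isolated as a small pure function. This is only a sketch: `stripSeparator` is a hypothetical name that does not appear in the commit. It illustrates why exactly one leading `--` is dropped: any later `--` belongs to the wrapped command and has to reach it untouched.

```go
package main

import "fmt"

// stripSeparator drops a single leading "--" and leaves every other argument
// alone. Hypothetical helper, written for illustration only.
func stripSeparator(args []string) []string {
	if len(args) > 0 && args[0] == "--" {
		return args[1:]
	}
	return args
}

func main() {
	// The second "--" is an argument of the wrapped command and is preserved.
	fmt.Println(stripSeparator([]string{"--", "uvx", "--", "obsidian-mcp"}))
	// Without a leading separator the arguments pass through unchanged.
	fmt.Println(stripSeparator([]string{"uvx", "obsidian-mcp"}))
}
```

Keeping this step separate from the cobra wiring would make it testable without building a command tree, which may or may not be worth it for three lines.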

docs/features/sandbox-isolation.md

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
# Native Sandbox Isolation (Linux, no Docker)
2+
3+
MCPProxy can isolate a stdio MCP server **without Docker** using the Linux
4+
[Landlock LSM](https://docs.kernel.org/userspace-api/landlock.html) plus resource
5+
limits (`setrlimit`). This is the `sandbox` isolation mode (MCP-34), built for
6+
hosts where Docker is unavailable or broken — notably Ubuntu 24.04 with
7+
snap-installed Docker under AppArmor. Unlike bubblewrap / user-namespace
8+
sandboxes, Landlock does **not** require unprivileged user namespaces, so it is
9+
not blocked by `kernel.apparmor_restrict_unprivileged_userns=1` (default on
10+
Ubuntu 23.10+).
11+
12+
## Enabling
13+
14+
Set the isolation **mode** to `sandbox`, globally or per server:
15+
16+
```json
17+
{
18+
"docker_isolation": { "mode": "sandbox" },
19+
"mcpServers": [
20+
{ "name": "obsidian", "command": "uvx", "args": ["obsidian-mcp"],
21+
"isolation": { "mode": "sandbox" } }
22+
]
23+
}
24+
```
25+
26+
A per-server `isolation.mode` wins over the global mode. The legacy
27+
`docker_isolation.enabled` boolean still maps to `docker`/`none` for
28+
back-compat; `mode` supersedes it (see MCP-34.2).
29+
30+
## What it enforces
31+
32+
- **Filesystem write allowlist.** Reads stay broad (the runtime can load
33+
interpreters, `node_modules`, and `site-packages` from anywhere), but **writes**
34+
are denied outside a small allowlist: the server's `working_dir`, the OS temp
35+
dir, and the common package caches (`~/.npm`, `~/.cache`, `~/.local/share`).
36+
Tightening reads is deferred — a read allowlist breaks tool discovery.
37+
- **Resource limits.** Core dumps are disabled (`RLIMIT_CORE=0`, so in-memory
38+
secrets can't spill to disk) and the descriptor table is capped
39+
(`RLIMIT_NOFILE`).
40+
- **Process-group cleanup.** Reuses the existing `Setpgid` group teardown, so a
41+
sandboxed server and its children are killed together on disconnect.
42+
43+
Confinement is applied by a tiny re-exec wrapper (`mcpproxy __sandbox_exec`) that
44+
calls Landlock/`setrlimit` and then `exec`s the real command — so the server's
45+
stdin/stdout pass straight through with no intervening multiplexer.
46+
47+
## Honest limits
48+
49+
- **No uid/gid separation by default.** The wrapper performs a *best-effort*
50+
privilege drop only when it is running as root with a non-root real user. In
51+
the personal edition mcpproxy runs as your user (not root), so this is a
52+
documented no-op — real uid/gid separation needs root or `CAP_SETUID`.
53+
- **Reads are not confined** (write-allowlist only; see above).
54+
- **Graceful degrade.** If the kernel lacks Landlock (pre-5.13, or LSM disabled),
55+
the server still starts but runs **unconfined**, and the host log records a
56+
`DEGRADED/unconfined` diagnostic. This favors availability; use `docker` mode
57+
when you need a hard guarantee.
58+
59+
## Platform support
60+
61+
| Platform | `mode: sandbox` behavior |
62+
|----------|--------------------------|
63+
| **Linux** (kernel 5.13+ with Landlock) | Enforced: write-allowlist + rlimits |
64+
| **Linux** (no Landlock) | Degraded → runs unconfined, logged |
65+
| **macOS / Windows** | Documented **no-op** → effective `none` (Landlock is Linux-only) |
66+
67+
See also: [Docker Isolation](/features/docker-isolation) for the Docker mode, and
68+
the [non-Docker sandbox spike](/development/sandbox-spike-mcp-34) for the
69+
mechanism evaluation.
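
The write allowlist described in the "What it enforces" section of the doc above could be assembled roughly as follows. This is a sketch under stated assumptions, not the commit's `buildSandboxSpec` (which is not shown on this page): the function name `writableDefaults`, its parameters, and the de-duplication step are all hypothetical. Only the list of directories comes from the doc.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writableDefaults returns the read-write subtrees the doc names: the server's
// working_dir, the OS temp dir, and the package caches under the home
// directory. Hypothetical helper; empty inputs are skipped and duplicates
// are collapsed after path cleaning.
func writableDefaults(workingDir, home, tempDir string) []string {
	var paths []string
	add := func(p string) {
		if p == "" {
			return
		}
		p = filepath.Clean(p)
		for _, seen := range paths {
			if seen == p {
				return
			}
		}
		paths = append(paths, p)
	}

	add(workingDir)
	add(tempDir)
	if home != "" {
		add(filepath.Join(home, ".npm"))
		add(filepath.Join(home, ".cache"))
		add(filepath.Join(home, ".local", "share"))
	}
	return paths
}

func main() {
	// Errors are ignored here for brevity; empty values are simply skipped.
	home, _ := os.UserHomeDir()
	wd, _ := os.Getwd()
	for _, p := range writableDefaults(wd, home, os.TempDir()) {
		fmt.Println("rw:", p)
	}
}
```

Everything outside the returned list would stay read-only, matching the doc's statement that reads remain broad while writes are confined.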

internal/sandbox/runchild_other.go

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
//go:build !unix
2+
3+
package sandbox
4+
5+
import (
6+
"fmt"
7+
"io"
8+
)
9+
10+
// RunChild is unsupported on non-unix platforms: there is no execve to replace
11+
// the wrapper image. mcpproxy never builds the sandbox re-exec wrapper on these
12+
// platforms (the launcher resolves sandbox mode to a documented no-op), so this
13+
// only guards a misconfigured invocation.
14+
func RunChild(_ []string, diag io.Writer) int {
15+
if diag == nil {
16+
diag = io.Discard
17+
}
18+
fmt.Fprintln(diag, "sandbox: re-exec wrapper unsupported on this platform")
19+
return 2
20+
}
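
Both `RunChild` variants in this commit report failure through the exit code: the stub above returns 2, and the unix variant in `runchild_unix.go` additionally uses 3, 126, and 127. A caller that wants a readable hint could map them as in the sketch below. `explainWrapperExit` is a hypothetical helper and is not part of the commit. The mapping is only a hint, because after a successful exec the exit status belongs to the wrapped server, which may use the same numbers for its own purposes.

```go
package main

import "fmt"

// explainWrapperExit maps the exit codes RunChild returns on failure to a
// short description. Hypothetical helper, for illustration only.
func explainWrapperExit(code int) string {
	switch code {
	case 2:
		return "wrapper refused to start: no command, bad or missing spec, or unsupported platform"
	case 3:
		return "confinement unavailable and fail-closed"
	case 126:
		return "exec of the resolved target failed"
	case 127:
		return "command not found on PATH"
	default:
		return "not a wrapper code; exit status of the wrapped server"
	}
}

func main() {
	for _, code := range []int{2, 3, 126, 127, 1} {
		fmt.Printf("%3d: %s\n", code, explainWrapperExit(code))
	}
}
```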

internal/sandbox/runchild_unix.go

Lines changed: 112 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,112 @@
1+
//go:build unix
2+
3+
package sandbox
4+
5+
import (
6+
"fmt"
7+
"io"
8+
"os"
9+
"os/exec"
10+
"path/filepath"
11+
"syscall"
12+
)
13+
14+
// RunChild is the entrypoint for the hidden `__sandbox_exec` subcommand. argv is
15+
// the command line after the `--` separator: argv[0] is the program to run and
16+
// argv[1:] are its arguments. It decodes the Spec from EnvSpec, applies the
17+
// confinement, performs a best-effort uid/gid drop, then execs the target,
18+
// replacing this process image — so the untrusted server inherits the locked
19+
// Landlock domain and its stdin/stdout/stderr stay wired straight through to the
20+
// parent with no mux.
21+
//
22+
// On success it never returns (execve replaces the image). It returns a non-zero
23+
// exit code on any failure; diag receives human-readable confinement notes and
24+
// errors (os.Stderr in production, so they land in the per-server upstream log).
25+
func RunChild(argv []string, diag io.Writer) int {
26+
if diag == nil {
27+
diag = io.Discard
28+
}
29+
if len(argv) == 0 {
30+
fmt.Fprintln(diag, "sandbox: no command to exec")
31+
return 2
32+
}
33+
34+
spec, ok, err := SpecFromEnv()
35+
if err != nil {
36+
fmt.Fprintln(diag, err)
37+
return 2
38+
}
39+
if !ok {
40+
// The wrapper must never run a command unconfined just because the spec
41+
// went missing — that would silently defeat the isolation request.
42+
fmt.Fprintln(diag, "sandbox: missing spec env; refusing to run unconfined")
43+
return 2
44+
}
45+
46+
// Resolve the target before confinement so a bare command name can be looked
47+
// up on PATH while the filesystem is still fully visible.
48+
target := argv[0]
49+
if filepath.Base(target) == target {
50+
resolved, lerr := exec.LookPath(target)
51+
if lerr != nil {
52+
fmt.Fprintf(diag, "sandbox: lookup %q: %v\n", target, lerr)
53+
return 127
54+
}
55+
target = resolved
56+
}
57+
58+
rep, err := Apply(spec)
59+
if err != nil {
60+
// fail-closed: BestEffort was false and the primitive is unavailable.
61+
fmt.Fprintf(diag, "sandbox: confinement unavailable and fail-closed: %v\n", err)
62+
return 3
63+
}
64+
fmt.Fprintf(diag, "sandbox: %s\n", describeReport(rep))
65+
66+
dropPrivilegesBestEffort(diag)
67+
68+
if err := syscall.Exec(target, argv, os.Environ()); err != nil {
69+
fmt.Fprintf(diag, "sandbox: exec %q: %v\n", target, err)
70+
return 126
71+
}
72+
return 0 // unreachable: Exec replaced the image on success.
73+
}
74+
75+
// describeReport renders a one-line honest summary of what Apply enforced.
76+
func describeReport(rep Report) string {
77+
switch {
78+
case rep.LandlockABI >= 1:
79+
s := fmt.Sprintf("Landlock enforced (ABI %d), %d rlimit(s) set", rep.LandlockABI, rep.RlimitsSet)
80+
if rep.LandlockNote != "" {
81+
s += "; " + rep.LandlockNote
82+
}
83+
return s
84+
case rep.LandlockABI < 0:
85+
return fmt.Sprintf("running DEGRADED/unconfined — %s (%d rlimit(s) set)", rep.LandlockNote, rep.RlimitsSet)
86+
default:
87+
return fmt.Sprintf("%s (%d rlimit(s) set)", rep.LandlockNote, rep.RlimitsSet)
88+
}
89+
}
90+
91+
// dropPrivilegesBestEffort drops to the real uid/gid when the wrapper is running
92+
// as root with a non-root real user (e.g. a setuid/elevated launch). This is a
93+
// best-effort defense-in-depth step: in the personal edition mcpproxy runs as
94+
// the user, not root, so this is a documented no-op. A real, unconditional
95+
// privilege drop requires root/CAP_SETUID and an explicit target uid/gid, which
96+
// is out of scope here.
97+
func dropPrivilegesBestEffort(diag io.Writer) {
98+
euid, ruid := os.Geteuid(), os.Getuid()
99+
if euid != 0 || ruid == 0 {
100+
return // not privileged, or already the real user — nothing to drop.
101+
}
102+
rgid := os.Getgid()
103+
if err := syscall.Setgid(rgid); err != nil {
104+
fmt.Fprintf(diag, "sandbox: best-effort setgid(%d) failed: %v\n", rgid, err)
105+
return
106+
}
107+
if err := syscall.Setuid(ruid); err != nil {
108+
fmt.Fprintf(diag, "sandbox: best-effort setuid(%d) failed: %v\n", ruid, err)
109+
return
110+
}
111+
fmt.Fprintf(diag, "sandbox: dropped privileges to uid=%d gid=%d\n", ruid, rgid)
112+
}
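
`describeReport` above counts the rlimits that `Apply` set, but the rlimit code itself is not part of this diff. The sketch below shows one way the two limits named in the commit message (`RLIMIT_CORE=0` and a capped `RLIMIT_NOFILE`) might be applied with the standard `syscall` package. It assumes a Linux target, where `syscall.Rlimit` uses `uint64` fields. The `limit` type, `defaultLimits`, `readLimit`, and `applyRlimits` are hypothetical names, and the `NOFILE` ceiling of 1024 is an illustrative value, not a number taken from the commit.

```go
package main

import (
	"fmt"
	"syscall"
)

// limit pairs an rlimit resource with the ceiling to apply to both the soft
// and the hard limit. Hypothetical type; the commit's Spec layout is not shown.
type limit struct {
	resource int
	value    uint64
}

// defaultLimits mirrors the two limits the commit message names. The NOFILE
// value is an assumption made for this example.
var defaultLimits = []limit{
	{resource: syscall.RLIMIT_CORE, value: 0},
	{resource: syscall.RLIMIT_NOFILE, value: 1024},
}

// readLimit returns the current soft and hard limit for a resource.
func readLimit(resource int) (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(resource, &rl); err != nil {
		return 0, 0, err
	}
	return rl.Cur, rl.Max, nil
}

// applyRlimits lowers each limit and reports how many were set. A requested
// value is clamped to the current hard limit, because an unprivileged process
// cannot raise it.
func applyRlimits(limits []limit) (int, error) {
	set := 0
	for _, l := range limits {
		_, hard, err := readLimit(l.resource)
		if err != nil {
			return set, fmt.Errorf("getrlimit(%d): %w", l.resource, err)
		}
		v := l.value
		if v > hard {
			v = hard
		}
		rl := syscall.Rlimit{Cur: v, Max: v}
		if err := syscall.Setrlimit(l.resource, &rl); err != nil {
			return set, fmt.Errorf("setrlimit(%d): %w", l.resource, err)
		}
		set++
	}
	return set, nil
}

func main() {
	n, err := applyRlimits(defaultLimits)
	if err != nil {
		fmt.Println("rlimits:", err)
		return
	}
	soft, hard, _ := readLimit(syscall.RLIMIT_CORE)
	fmt.Printf("%d rlimit(s) set; RLIMIT_CORE soft=%d hard=%d\n", n, soft, hard)
}
```

Because resource limits survive `execve`, setting them in the wrapper before `syscall.Exec` means the wrapped server starts with them already in place, and lowering the hard limit prevents it from raising them again without privilege.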

internal/sandbox/sandbox_linux.go

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -54,6 +54,15 @@ func handledAccessFS(abi int) uint64 {
5454
return h
5555
}
5656

57+
// Available reports whether the native sandbox primitive (Landlock) can be
58+
// enforced on this kernel right now. It lets callers log an honest diagnostic
59+
// ("sandbox requested but kernel lacks Landlock; running degraded") and lets
60+
// tests skip enforcement assertions on kernels without Landlock.
61+
func Available() bool {
62+
abi, err := landlockABI()
63+
return err == nil && abi >= 1
64+
}
65+
5766
// Apply confines the current process per spec. On success the calling process
5867
// — and every process it subsequently execs — can only touch the filesystem
5968
// subtrees in the allowlist, under the supplied rlimits. The restriction is

internal/sandbox/sandbox_other.go

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,10 @@ package sandbox
1010
// Note: macOS/Windows already have their own first-class sandbox stories
1111
// (Seatbelt / Docker Desktop / Windows containers); this package targets the
1212
// Linux snap-docker gap specifically (see package doc).
13+
// Available reports whether the native sandbox primitive can be enforced. It is
14+
// always false off Linux: Landlock is a Linux-only LSM.
15+
func Available() bool { return false }
16+
1317
func Apply(spec Spec) (Report, error) {
1418
// No filesystem allowlist requested → nothing to enforce, same as Linux.
1519
if !spec.wantsLandlock() {
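
The two `Available` variants (Linux and non-Linux) feed the fallback policy described in the commit message: degrade to an effective `none` when best-effort is allowed, and refuse to run when fail-closed. A minimal sketch of that decision follows. `effectiveMode` and `errFailClosed` are hypothetical names, and the `available` flag stands in for the result of `sandbox.Available()`. The real selector from MCP-34.2 is not shown in this commit and may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// errFailClosed is returned when sandbox mode was requested, the primitive is
// missing, and degrading is not allowed. Hypothetical sentinel error.
var errFailClosed = errors.New("sandbox requested but unavailable (fail-closed)")

// effectiveMode resolves what a "sandbox" request turns into on this host.
// Other modes pass through unchanged. Hypothetical helper.
func effectiveMode(requested string, available, bestEffort bool) (string, error) {
	if requested != "sandbox" {
		return requested, nil
	}
	if available {
		return "sandbox", nil
	}
	if bestEffort {
		// Graceful degrade: run unconfined; the caller should log a diagnostic.
		return "none", nil
	}
	return "", errFailClosed
}

func main() {
	// available=false models macOS, Windows, or a Linux kernel without Landlock.
	mode, err := effectiveMode("sandbox", false, true)
	fmt.Println(mode, err)

	mode, err = effectiveMode("sandbox", false, false)
	fmt.Println(mode, err)
}
```

Returning an error for the fail-closed case, instead of silently downgrading, keeps the caller in charge of whether the server starts at all.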

internal/sandbox/wrap.go

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
1+
package sandbox
2+
3+
import (
4+
"encoding/json"
5+
"fmt"
6+
"os"
7+
)
8+
9+
// Re-exec wrapper protocol (MCP-34.3).
10+
//
11+
// Landlock confines the *current* process and every process it then execs, and
12+
// the confinement is irreversible — so it cannot be applied in-process before
13+
// mcp-go spawns an upstream stdio server. The integration is therefore a tiny
14+
// re-exec wrapper: mcpproxy launches itself as
15+
//
16+
// mcpproxy __sandbox_exec -- <command> [args...]
17+
//
18+
// with the desired confinement encoded in the environment. The child decodes the
19+
// Spec, calls Apply, then execs <command>, replacing its own image so the
20+
// untrusted server inherits the locked-down Landlock domain and the server's
21+
// stdin/stdout/stderr pass straight through with no intervening mux.
22+
const (
23+
// Subcommand is the hidden mcpproxy subcommand that runs the re-exec child.
24+
Subcommand = "__sandbox_exec"
25+
26+
// EnvSpec carries the JSON-encoded Spec from the parent to the re-exec child.
27+
EnvSpec = "MCPPROXY_SANDBOX_SPEC"
28+
)
29+
30+
// WrapCommand builds the argv and extra environment needed to launch
31+
// command/args confined by spec, by re-executing self (the absolute path to the
32+
// running mcpproxy binary) as the sandbox child. The returned extraEnv entries
33+
// must be appended to the child process's environment so SpecFromEnv can decode
34+
// the Spec on the other side.
35+
//
36+
// It performs no syscalls and is safe to call on every platform; whether the
37+
// confinement actually takes effect is decided later by Apply inside the child
38+
// (a no-op on non-Linux / Landlock-less kernels).
39+
func WrapCommand(self string, spec Spec, command string, args []string) (wrappedCommand string, wrappedArgs []string, extraEnv []string, err error) {
40+
enc, err := json.Marshal(spec)
41+
if err != nil {
42+
return "", nil, nil, fmt.Errorf("sandbox: encode spec: %w", err)
43+
}
44+
wrappedArgs = make([]string, 0, len(args)+3)
45+
wrappedArgs = append(wrappedArgs, Subcommand, "--", command)
46+
wrappedArgs = append(wrappedArgs, args...)
47+
extraEnv = []string{EnvSpec + "=" + string(enc)}
48+
return self, wrappedArgs, extraEnv, nil
49+
}
50+
51+
// SpecFromEnv decodes the Spec the parent encoded into EnvSpec. ok is false when
52+
// the variable is absent — i.e. the process was not launched as a sandbox child.
53+
func SpecFromEnv() (spec Spec, ok bool, err error) {
54+
raw, present := os.LookupEnv(EnvSpec)
55+
if !present {
56+
return Spec{}, false, nil
57+
}
58+
if err := json.Unmarshal([]byte(raw), &spec); err != nil {
59+
return Spec{}, true, fmt.Errorf("sandbox: decode %s: %w", EnvSpec, err)
60+
}
61+
return spec, true, nil
62+
}
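
The parent-to-child round trip that `WrapCommand` and `SpecFromEnv` implement can be exercised in isolation. The sketch below is a self-contained approximation, not the package's code: the `spec` struct and its JSON field names are hypothetical stand-ins for `sandbox.Spec` (whose fields this diff does not show), and `specFromEnviron` reads from an explicit slice instead of the process environment so the example has no side effects. Only the subcommand name, the environment variable name, and the argv layout are taken from the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const (
	subcommand = "__sandbox_exec"        // from the diff
	envSpec    = "MCPPROXY_SANDBOX_SPEC" // from the diff
)

// spec is a hypothetical stand-in for sandbox.Spec.
type spec struct {
	ReadWrite  []string `json:"read_write"`
	BestEffort bool     `json:"best_effort"`
}

// wrap mirrors the argv/env layout of WrapCommand: self is re-executed with
// the subcommand, a "--" separator, then the real command and its arguments.
func wrap(self string, s spec, command string, args []string) (string, []string, []string, error) {
	enc, err := json.Marshal(s)
	if err != nil {
		return "", nil, nil, fmt.Errorf("encode spec: %w", err)
	}
	argv := append([]string{subcommand, "--", command}, args...)
	return self, argv, []string{envSpec + "=" + string(enc)}, nil
}

// specFromEnviron decodes the spec from a KEY=VALUE slice. ok is false when
// the variable is absent; a malformed value yields ok=true and an error.
func specFromEnviron(environ []string) (s spec, ok bool, err error) {
	prefix := envSpec + "="
	for _, kv := range environ {
		if !strings.HasPrefix(kv, prefix) {
			continue
		}
		if err := json.Unmarshal([]byte(kv[len(prefix):]), &s); err != nil {
			return spec{}, true, fmt.Errorf("decode %s: %w", envSpec, err)
		}
		return s, true, nil
	}
	return spec{}, false, nil
}

func main() {
	in := spec{ReadWrite: []string{"/srv/app"}, BestEffort: true}
	cmd, argv, env, err := wrap("/usr/local/bin/mcpproxy", in, "uvx", []string{"obsidian-mcp"})
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd, argv)

	out, ok, err := specFromEnviron(env)
	fmt.Println(out.ReadWrite, out.BestEffort, ok, err)
}
```

On the parent side, the returned values would typically be assigned to an `exec.Cmd`, with the extra entries appended to `os.Environ()` so the child inherits the rest of the environment as well.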
