sandbox is a minimalist, auditable, and hackable C program that builds a chrooted Linux environment around a target ELF binary or a minimal shell environment, isolating execution in dedicated namespaces with tight controls on filesystem, user privileges, and process capabilities. The target is validated by is_binary() (sandbox.c:138-165), which only checks that the path is executable (access(X_OK)), is a regular file (S_ISREG), and that its first four bytes are the ELF magic \x7fELF — it is not a full ELF-format check. Shell scripts and other non-ELF executables are rejected at this initial check with "<path> is not a binary file", but a malformed file whose first four bytes happen to be \x7fELF passes is_binary() and fails later during rootfs setup (typically at ldd failed for <target> followed by Rootfs setup failed).
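The three checks attributed to `is_binary()` can be sketched as below. This is a hypothetical, independently written helper (`looks_like_elf_binary()` is not a function in sandbox.c). It assumes `stat()` rather than `lstat()` semantics, and it shows why a four-byte `\x7fELF` stub passes while an executable shell script does not.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for is_binary(): the same three checks the README
 * describes, written independently of sandbox.c. */
bool looks_like_elf_binary(const char *path)
{
    struct stat st;
    unsigned char magic[4];
    ssize_t n;
    int fd;

    /* 1. The caller may execute it (access(2) with X_OK). */
    if (access(path, X_OK) != 0)
        return false;

    /* 2. It is a regular file; stat() follows symlinks, so a symlink to a
     *    regular file passes and a symlink to a directory does not. */
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return false;

    /* 3. The first four bytes are the ELF magic. Nothing else in the ELF
     *    header is inspected, so a 4-byte stub with that magic passes too. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;
    n = read(fd, magic, sizeof magic);
    close(fd);
    return n == (ssize_t)sizeof magic &&
           memcmp(magic, "\x7f" "ELF", sizeof magic) == 0;
}
```

The magic is written as `"\x7f" "ELF"` because `"\x7fELF"` would make the compiler read `E` as part of the hex escape.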
- 📦 Builds minimal chroot environments for a binary or a shell session
- 🔒 Isolates with Linux namespaces: mount, PID, UTS (hostname)
- 🚫 Clears the capability bounding set before the optional UID/GID drop, then drops all process capabilities using `libcap` after wiping the environment
- 👤 Optionally drops to the unprivileged `nobody` user (`--user`)
- 🔍 Supports tracing with `strace` (`--trace`)
- 🧱 Can assemble a target rootfs without running it (`--prepare-only`)
- 🐚 Shell mode includes fixed essential binaries: `/bin/sh`, `/bin/ls`, `/bin/cat`, `/bin/echo`, `/bin/mkdir`, `/bin/rm`, `/usr/bin/grep`, `/usr/bin/head`, `/usr/bin/tail`, `/usr/bin/wc`, `/usr/bin/stat`, `/usr/bin/ldd`, `/usr/bin/strace`, `/usr/bin/du`
- 🏗️ Auto-copies required dynamic libraries with `ldd`
- 🧩 Extensible: `--extras <file>` supports absolute files, relative files resolved against the extras list directory, and directory entries marked with a trailing slash
- 🗄️ Auto-populates `/etc/passwd` and `/etc/group` when `--user` is set
- 🧹 Wipes environment variables for safety
- 🪶 Fewer than 1000 lines, easy to audit and extend
- A C compiler in `CC`: the Makefile defaults to `clang` (`CC ?= clang`), so plain `make` requires `clang`. To build with GCC, run `make CC=gcc`.
- libcap development headers and library, which provide `<sys/capability.h>` and `-lcap`.

These are the only build prerequisites that `make preflight` checks (Makefile:21-24): a working `CC`, `<sys/capability.h>`, and a successful link with `-lcap`.

```sh
# Debian / Ubuntu
sudo apt install clang libcap-dev
# Fedora / RHEL
sudo dnf install clang libcap-devel
# Arch
sudo pacman -S clang libcap
```
Distro meta-packages such as `build-essential` (Debian/Ubuntu) or `base-devel` (Arch) are not required; install them only as an optional convenience if you want the wider toolchain.
These are the host-side tools sandbox itself invokes at runtime. For the separate list of binaries that shell mode, target mode, and prepare-only mode copy from the host into the rootfs (a different kind of dependency), see Shell-mode rootfs inventory and Target-mode rootfs inventory below.
- Root privileges: required for all modes except `--userns` (namespaces, chroot, mounts).
- `ldd` on the host `PATH`: required in every mode. `copy_ldd_deps()` invokes it via `execlp("ldd", "ldd", bin, ...)` (sandbox.c:258-264), so any location on `PATH` works. It is used in both target mode and shell mode to discover and copy shared-library dependencies, and is typically provided by `libc-bin` (Debian/Ubuntu) or `glibc-common` (Fedora/RHEL).
- `/usr/bin/strace`: only invoked under `--trace`. The trace path builds `trace_argv[0] = "/usr/bin/strace"` and runs `sandbox_exec(trace_argv)`, which `execv()`s that exact path inside the rootfs (sandbox.c:857-886, sandbox.c:640-645). It is required on the host only when `--trace` is used; without `--trace` the program never invokes `strace`.

```sh
# Debian / Ubuntu
sudo apt install strace
# Fedora / RHEL
sudo dnf install strace
# Arch
sudo pacman -S strace
```
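For the `ldd` dependency above, the caller has to turn `ldd` output into paths before copying anything. The README does not describe how `copy_ldd_deps()` parses that output, so the helper below (`parse_ldd_line()`, a hypothetical name) is only a sketch, assuming the two line shapes that glibc's `ldd` commonly prints.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pull the absolute library path out of one line of
 * glibc ldd output. Two common shapes are handled:
 *   "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)"
 *   "\t/lib64/ld-linux-x86-64.so.2 (0x00007f...)"
 * Returns 1 and writes a NUL-terminated path to out on success; returns 0
 * for lines that carry no absolute path, such as the vDSO line or
 * "=> not found". */
int parse_ldd_line(const char *line, char *out, size_t outsz)
{
    const char *p = strstr(line, "=> ");
    size_t len;

    if (p != NULL)
        p += 3;                          /* path follows the arrow */
    else
        p = line + strspn(line, " \t");  /* path is the first token */

    if (*p != '/')
        return 0;

    len = strcspn(p, " \t\n(");          /* stop before the load address */
    if (len >= outsz)
        return 0;                        /* would not fit; treat as no path */
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

A caller would read `ldd`'s stdout line by line, for example from a pipe after `fork()` and `execlp("ldd", "ldd", bin, (char *)NULL)`, and copy each returned path into the rootfs.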
Shell mode assembles its rootfs by attempting to copy a hardcoded list of host binaries into the chroot so that the interactive shell has a small fixed toolset. These paths are copied from the host (not invoked by sandbox itself) and are therefore separate from the Runtime tool list above. The list is essential_bins[] (sandbox.c:52-68), and setup_essential_environment() (sandbox.c:727-735) attempts each entry from the exact host path shown below; any missing source aborts setup with Failed to copy essential bin: <path>, and a subsequent copy_ldd_deps() failure for the same entry likewise aborts setup:
- `/bin/sh`
- `/bin/ls`
- `/bin/cat`
- `/bin/echo`
- `/bin/mkdir`
- `/bin/rm`
- `/usr/bin/grep`
- `/usr/bin/head`
- `/usr/bin/tail`
- `/usr/bin/wc`
- `/usr/bin/stat`
- `/usr/bin/ldd`
- `/usr/bin/strace` (attempted as a shell-mode tool copy, not invoked by shell mode itself)
- `/usr/bin/du`
Target mode and prepare-only mode (invocations that pass a target binary) build the target rootfs via build_rootfs() (sandbox.c:650-713), which copies the target binary plus two fixed host paths into the chroot. These paths are copied from the host (not invoked by sandbox itself) and are therefore separate from the Runtime tool list above:
- Target binary: copied to `/usr/bin/<basename>` inside the rootfs, where `<basename>` is the final path component of the host target. Normal target mode later executes this `/usr/bin/<basename>` path. If the original target token is an absolute path and differs from `/usr/bin/<basename>`, `build_rootfs()` also mirrors the target at that absolute path inside the rootfs so trace mode can run the original absolute path.
- `/bin/sh`: unconditionally copied. Setup aborts with `Failed to copy /bin/sh` if the host source is missing (sandbox.c:689-694).
- `/usr/bin/strace`: partially best-effort without `--trace`, strictly required with `--trace`. `build_rootfs()` first attempts `copy_file("/usr/bin/strace", ...)` (sandbox.c:699-709). If that initial copy fails and `trace_mode` is set, setup aborts with `Failed to copy strace (required for --trace)`; otherwise non-`--trace` target mode continues without `strace` in the rootfs. If the copy succeeds, however, a failure in the later `copy_ldd_deps("/usr/bin/strace", rootfs)` call makes `build_rootfs()` return `-1` unconditionally, in both trace and non-trace runs.
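Each inventory entry above is materialised with `copy_file()`, whose body is not shown in this README. The following is a minimal sketch of such a copy (the name `copy_file_sketch()` is hypothetical), assuming that only the permission bits are carried over and that the destination's parent directory already exists.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch of a copy_file()-style helper: copy src to dst.
 * Returns 0 on success and -1 on any error. */
int copy_file_sketch(const char *src, const char *dst)
{
    struct stat st;
    char buf[8192];
    ssize_t n;
    int in, out, rc = 0;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;                       /* e.g. source absent on the host */
    if (fstat(in, &st) != 0) {
        close(in);
        return -1;
    }

    /* Permission bits only (no setuid/setgid/sticky), further masked by
     * the umask; an existing dst is truncated and keeps its old mode. */
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode & 0777);
    if (out < 0) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        const char *p = buf;
        while (n > 0) {                  /* write() may be short */
            ssize_t w = write(out, p, (size_t)n);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                rc = -1;
                goto done;
            }
            p += w;
            n -= w;
        }
    }
    if (n < 0)
        rc = -1;                         /* read error */
done:
    if (close(out) != 0)
        rc = -1;
    close(in);
    return rc;
}
```

Dropping the setuid/setgid bits is a design choice of this sketch, not documented behaviour of sandbox.c; it keeps a copied host binary from regaining elevated privileges through its file mode.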
Needed only for make lint:
- `cppcheck`: static analysis for `sandbox.c`
- `shellcheck`: static analysis for `tests/smoke.sh`

```sh
# Debian / Ubuntu
sudo apt install cppcheck shellcheck
# Fedora / RHEL
sudo dnf install cppcheck ShellCheck
# Arch
sudo pacman -S cppcheck shellcheck
```
From the repo root:
```sh
make
```

This produces `./sandbox`. To build with GCC instead of the default `clang`:

```sh
make CC=gcc
```

To remove the built binary:

```sh
make clean
```

After building, run the smoke test:

```sh
make test
```

The `test` target runs `tests/smoke.sh`, which performs these checks:

- Invokes `./sandbox` with no arguments and verifies that the output is exactly:
  ```
  Usage: ./sandbox <rootfs> [<target-binary>] [--user] [--userns] [--prepare-only] [--extras <file>] [--trace <args...>]
  ```
- Verifies that `./sandbox` exists and is executable (`-x`).
- Verifies targeted `--prepare-only` parser failures for missing target, `--trace`, `--user`, and `--userns`.
- When run as root, verifies successful `--prepare-only` rootfs assembly with `/bin/false`, confirms that a prepare-only `/bin/echo` target was copied rather than executed, and checks that the prepared root can execute the copied `/usr/bin/<basename>` path via `chroot`. When run without root, this assembly check is skipped because the current target rootfs path creates device nodes.
This is still a smoke test suite — there are no unit tests, no CI, and no coverage of runtime sandboxing behavior.
Run the static-analysis linters:
```sh
make lint
```

The `lint` target runs two checks:

- `cppcheck` against `sandbox.c`:
  ```sh
  cppcheck --quiet --enable=warning,performance,portability --check-level=reduced --suppress=normalCheckLevelMaxBranches --inline-suppr --error-exitcode=1 sandbox.c
  ```
- `shellcheck` against `tests/smoke.sh`:
  ```sh
  shellcheck tests/smoke.sh
  ```
Both cppcheck and shellcheck must be installed on the host (see Development (optional) under Prerequisites).
Run the combined test and lint checks:
```sh
make validate
```

The `validate` target is defined as `validate: test lint` in Makefile:19 and runs the existing `test` and `lint` targets in sequence: first `tests/smoke.sh` from `make test`, then the `cppcheck` and `shellcheck` invocations from `make lint`.
```
Usage: ./sandbox <rootfs> [<target-binary>] [--user] [--userns] [--prepare-only] [--extras <file>] [--trace <args...>]
```

Run as root (e.g., via `sudo`) unless `--userns` is used.
<target-binary> is treated as the literal filesystem path token you provide, not something resolved via the host or sandbox PATH. Absolute paths such as /usr/bin/ls work, explicit relative paths such as ./echo-local or subdir/tool work, and a bare name such as ls works only if an executable file by that exact name exists in the current working directory.
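Because the token is used literally, the in-rootfs paths derived from it are plain string operations. The sketch below uses hypothetical helper names and reproduces three rules stated under Run a specific binary and Trace a binary: the normal run executes `/usr/bin/<basename>`, `build_rootfs()` mirrors an absolute target only when it differs from that path, and `--trace` hands strace the original token when it is absolute. It assumes `<basename>` is the text after the last `/`, which matches `basename()` for regular-file paths without a trailing slash.

```c
#include <stdio.h>
#include <string.h>

/* Text after the last '/', or the whole token if it has no '/'. */
static const char *token_basename(const char *target)
{
    const char *slash = strrchr(target, '/');
    return slash != NULL ? slash + 1 : target;
}

/* Path executed by the normal (non --trace) run: /usr/bin/<basename>.
 * Returns 0 on success, -1 if buf is too small. */
int exec_path_in_rootfs(const char *target, char *buf, size_t sz)
{
    int n = snprintf(buf, sz, "/usr/bin/%s", token_basename(target));
    return (n < 0 || (size_t)n >= sz) ? -1 : 0;
}

/* Nonzero when an absolute target differs from /usr/bin/<basename>, i.e.
 * when an extra copy at <rootfs><absolute-target-path> is made. */
int needs_absolute_mirror(const char *target)
{
    char usr_bin[4096];

    if (target[0] != '/')
        return 0;
    if (exec_path_in_rootfs(target, usr_bin, sizeof usr_bin) != 0)
        return 0;
    return strcmp(target, usr_bin) != 0;
}

/* Program path handed to strace under --trace: the original token when it
 * is absolute, otherwise the /usr/bin/<basename> path. */
const char *trace_target_for(const char *target, const char *usr_bin_path)
{
    return target[0] == '/' ? target : usr_bin_path;
}
```

For `/usr/local/bin/foo` this yields an exec path of `/usr/bin/foo`, an extra mirrored copy, and a trace target of `/usr/local/bin/foo`; for `./echo-local` every path is `/usr/bin/echo-local`.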
After <rootfs>, sandbox scans argv left-to-right until --trace (sandbox.c:759-779). At each position, --user, --userns, --prepare-only, and --extras <file> are recognized as options and may appear either before or after the target binary; --extras also consumes the following token as its list-file path, so that token is not passed through to the target. Any remaining token is positional: the first positional token becomes <target-binary>, and every later positional token is collected into target_args[]. The parser matches exactly those five literal strings — --user, --userns, --prepare-only, --extras, --trace — via strcmp and has no "unknown option" rejection branch (sandbox.c:759-779); any other token, including unrecognized --xxx flags such as --help, --version, or --foo, and mistyped variants like -user or —user (en/em dash), falls through to the same positional arm, so the first such token is assigned to target and later ones are appended to target_args[]. For example, ./sandbox /tmp/x --help sets target = "--help": when run without root it fails the root check at sandbox.c:781-784 first, and under sudo it reaches the binary check and exits with --help is not a binary file (sandbox.c:844-847) — there is no built-in --help/--version output. Whether those later positionals actually reach the target depends on the execution path:
- Normal (non-`--trace`) path: `sandbox_main()` reads `target_args[]` and forwards each entry as an argument to the executed target (sandbox.c:621-629). For example, `./sandbox /tmp/x2 --userns /bin/echo hi` runs `/bin/echo hi`, and `./sandbox /tmp/sb-review /bin/echo one --userns two` runs `/bin/echo one two`.
- `--trace` path: `target_args[]` is never read. `trace_argv[]` is rebuilt only from the selected target binary plus the tokens after `--trace` (sandbox.c:866-885), so any positional collected between `<target-binary>` and `--trace` is silently dropped. For example, `sudo ./sandbox /tmp/sb-review /bin/echo one --trace two` traces `/bin/echo two` (the pre-`--trace` `one` is dropped). See the `--trace` paragraph below and the Trace a binary mode for the full rules.
--trace terminates sandbox option parsing: when the parser encounters it, it records the index and immediately breaks out of the option loop (sandbox.c:764-767), after which the traced command line is built from every token after --trace (sandbox.c:866-885). All sandbox flags (--user, --userns, --extras <file>) must therefore appear before --trace; every token after --trace is appended to the target binary's argv and is never interpreted as a flag for sandbox or strace. Intended target arguments must also appear after --trace: any positional tokens collected between <target-binary> and --trace are stored in target_args[], which is only read by the normal execution path in sandbox_main() (sandbox.c:621-629) — the trace path builds trace_argv[] exclusively from tokens after --trace (sandbox.c:866-885) and therefore silently discards those pre---trace positional args. For example, sudo ./sandbox /tmp/x /bin/echo one --trace two traces /bin/echo two (the one is dropped), while sudo ./sandbox /tmp/x /bin/echo --trace one two traces /bin/echo one two. The ordering effect shows up in which error fires: ./sandbox /tmp/x /bin/echo --userns --trace fails with --userns is not compatible with --trace. because --userns was parsed as a sandbox flag before the boundary and tripped the conflict check, whereas ./sandbox /tmp/x /bin/echo --trace --userns (run non-root) fails with This program must be run as root (or use --userns). — parsing stopped at --trace, so the second --userns was passed through to /bin/echo as an argument, the --userns/--trace conflict check never ran, and the root check fired first instead.
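The scan described in the last three paragraphs can be modelled as follows. The names (`struct sb_args`, `parse_sandbox_args()`, `build_trace_argv()`, `MAX_TARGET_ARGS`) are hypothetical, the leading `--prepare-only` compatibility form is not modelled, and returning `-1` for a trailing `--extras` with no operand is an assumption. The point is to show why pre-`--trace` positionals land in `target_args[]` and never reach the traced argv.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TARGET_ARGS 64   /* arbitrary limit for this sketch */

struct sb_args {
    const char *rootfs;
    const char *target;      /* first positional token, taken literally */
    const char *extras;      /* operand of --extras, if any */
    const char *target_args[MAX_TARGET_ARGS]; /* later positionals */
    int n_target_args;
    bool user, userns, prepare_only;
    int trace_index;         /* argv index of --trace, or -1 */
};

/* Left-to-right scan of argv[2..] that stops at --trace. Returns -1 if
 * <rootfs> is missing, --extras has no operand, or target_args overflows. */
int parse_sandbox_args(int argc, char *argv[], struct sb_args *a)
{
    memset(a, 0, sizeof *a);
    a->trace_index = -1;
    if (argc < 2)
        return -1;
    a->rootfs = argv[1];
    for (int i = 2; i < argc; i++) {
        const char *t = argv[i];
        if (strcmp(t, "--trace") == 0) {
            a->trace_index = i;      /* the rest belongs to the target */
            break;
        } else if (strcmp(t, "--user") == 0) {
            a->user = true;
        } else if (strcmp(t, "--userns") == 0) {
            a->userns = true;
        } else if (strcmp(t, "--prepare-only") == 0) {
            a->prepare_only = true;
        } else if (strcmp(t, "--extras") == 0) {
            if (i + 1 >= argc)
                return -1;
            a->extras = argv[++i];   /* list-file path is consumed here */
        } else if (a->target == NULL) {
            a->target = t;           /* no unknown-option branch: "--help" lands here */
        } else if (a->n_target_args < MAX_TARGET_ARGS) {
            a->target_args[a->n_target_args++] = t;
        } else {
            return -1;
        }
    }
    return 0;
}

/* Builds the strace command line from the trace target and only the tokens
 * after --trace; target_args[] is deliberately not consulted. Returns the
 * entry count (excluding the NULL terminator) or -1 if out is too small. */
int build_trace_argv(const char **out, size_t cap, const char *log_path,
                     const char *trace_target, int argc, char *argv[],
                     int trace_index)
{
    size_t j = 0;
    size_t need;

    if (trace_index < 0 || trace_index >= argc)
        return -1;
    /* 5 fixed strace tokens + log path + trace target + args + NULL */
    need = 7 + (size_t)(argc - trace_index - 1) + 1;
    if (need > cap)
        return -1;

    out[j++] = "/usr/bin/strace";
    out[j++] = "-f";
    out[j++] = "-e";
    out[j++] = "trace=file";
    out[j++] = "-o";
    out[j++] = log_path;
    out[j++] = trace_target;         /* becomes the traced program's argv[0] */
    for (int i = trace_index + 1; i < argc; i++)
        out[j++] = argv[i];          /* only tokens after --trace */
    out[j] = NULL;
    return (int)j;
}
```

With `./sandbox /tmp/x /bin/echo one --trace two`, the parser stores `one` in `target_args[]`, and the trace argv ends in `/bin/echo two`, matching the dropped-argument behaviour described above.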
--extras <file> is available in shell mode, normal target-binary mode, traced target-binary mode, and prepare-only mode. There is no --extras-specific incompatibility with --user, --userns, --prepare-only, or --trace; the only startup flag conflicts are the ones listed below.
Flag constraints (enforced at startup):
- `--trace` requires a target binary.
- `--prepare-only` requires a target binary.
- `--prepare-only` cannot be combined with `--trace`, `--user`, or `--userns`.
- `--user` cannot be combined with `--trace`.
- `--userns` cannot be combined with `--trace`.
- `--userns` cannot be combined with `--user`.
- `--extras` has no additional startup compatibility restriction.
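One way to encode these constraints, with `--prepare-only` checked before the root check as the next paragraph describes, is sketched below. `struct sb_flags` and `first_startup_error()` are hypothetical. The relative order of the post-root checks is inferred from the documented example outputs (for instance, `--userns --trace` without a target reports the conflict rather than the missing target), and the `--prepare-only` messages for `--trace` and `--user` are assumed by analogy with the documented `--userns` one.

```c
#include <stdbool.h>
#include <stddef.h>

struct sb_flags {
    bool has_target;
    bool trace, user, userns, prepare_only;
};

/* Returns the first startup error message, or NULL if the flags are
 * acceptable. is_root stands in for the real root check. */
const char *first_startup_error(const struct sb_flags *f, bool is_root)
{
    /* --prepare-only misuse is reported even without root. */
    if (f->prepare_only) {
        if (!f->has_target)
            return "--prepare-only requires a target binary.";
        if (f->trace)
            return "--prepare-only is not compatible with --trace.";
        if (f->user)
            return "--prepare-only is not compatible with --user.";
        if (f->userns)
            return "--prepare-only is not compatible with --userns.";
    }

    /* --userns is the only way to skip the root requirement. */
    if (!is_root && !f->userns)
        return "This program must be run as root (or use --userns).";

    /* Runtime conflicts; in this sketch they fire before the
     * missing-target check for --trace. */
    if (f->userns && f->trace)
        return "--userns is not compatible with --trace.";
    if (f->userns && f->user)
        return "--userns is not compatible with --user.";
    if (f->user && f->trace)
        return "--user is not compatible with --trace.";
    if (f->trace && !f->has_target)
        return "--trace requires a target binary.";
    return NULL;
}
```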
main() checks --prepare-only constraints before the root check so prepare-only misuse reports a direct error even without sudo. The older runtime constraints still run after the root check, so that ordering affects which stderr line you see first. For example, ./sandbox /tmp/sb --trace fails with This program must be run as root (or use --userns). before it can reach --trace requires a target binary., while sudo ./sandbox /tmp/sb --trace reaches the later check and prints --trace requires a target binary.. Concrete examples (each command shows the exact error produced):
```sh
# --userns bypasses the root check, so --userns combinations reach the
# incompatibility checks without sudo:
./sandbox /tmp/sb --userns --trace
# → --userns is not compatible with --trace.
./sandbox /tmp/sb --userns --user
# → --userns is not compatible with --user.

# --user --trace has no --userns, so the root check fires first when run
# without sudo; sudo is required to reach the --user/--trace conflict:
./sandbox /tmp/sb --user --trace
# → This program must be run as root (or use --userns).
sudo ./sandbox /tmp/sb --user --trace
# → --user is not compatible with --trace.

./sandbox /tmp/sb --prepare-only
# → --prepare-only requires a target binary.
./sandbox /tmp/sb /bin/true --prepare-only --userns
# → --prepare-only is not compatible with --userns.
```

Modes:
- Minimal shell sandbox:

  ```sh
  sudo ./sandbox /tmp/mychroot
  ```

  - Drops you into `/bin/sh`. `setup_essential_environment()` attempts exactly the binaries hardcoded in `essential_bins[]` (sandbox.c:52-68), and no shell tools outside that list: `/bin/sh`, `/bin/ls`, `/bin/cat`, `/bin/echo`, `/bin/mkdir`, `/bin/rm`, `/usr/bin/grep`, `/usr/bin/head`, `/usr/bin/tail`, `/usr/bin/wc`, `/usr/bin/stat`, `/usr/bin/ldd`, `/usr/bin/strace`, `/usr/bin/du`.
  - The copy loop (sandbox.c:727-735) is strict, not best-effort: any `copy_file()` failure for an entry (source absent on the host, destination open/write error, or any other I/O failure) is logged as `Failed to copy essential bin: <path>` and aborts setup with `return -1`, and a subsequent `copy_ldd_deps()` failure for the same binary likewise aborts setup. A successful shell-mode run therefore guarantees that the rootfs contains every entry in `essential_bins[]`. On a host where those binaries exist, verify with:

    ```sh
    sudo find <rootfs>/bin <rootfs>/usr/bin -maxdepth 1 -type f | sort
    ```
- Run a specific binary:

  ```sh
  sudo ./sandbox /tmp/mychroot /usr/bin/ls
  ```

  - `<target-binary>` is the literal filesystem path token `sandbox` checks and copies; it is not `PATH`-resolved. Absolute paths such as `/usr/bin/ls` work, explicit relative paths such as `./echo-local` or `subdir/tool` work, and a bare name such as `ls` works only when an executable file by that exact name exists in the current working directory.
  - `<target-binary>` must be a regular ELF executable, not a script or a symlink to a non-regular file. It is validated by `is_binary()` (sandbox.c:138-165), which only checks that the path is executable (`access(X_OK)`), is a regular file (`S_ISREG`), and that its first four bytes are the ELF magic `\x7fELF`. This is a magic-bytes check, not full ELF validation. Shell scripts and other non-ELF executables are rejected here with `"<path> is not a binary file"` (for example, `./sandbox /tmp/sb-review /tmp/sb-script --userns` on an executable shell script fails immediately with `/tmp/sb-script is not a binary file`). A malformed file whose first four bytes happen to be `\x7fELF`, including an executable regular file containing only those four bytes, passes `is_binary()` and then fails later during `build_rootfs()`, typically at `ldd failed for <target>` followed by `Rootfs setup failed`.
  - Target-mode rootfs assembly is handled by `build_rootfs()` (sandbox.c:650-713), not by the shell-mode `essential_bins[]` path. It always creates the standard directory tree from `dirs[]` (`/bin`, `/usr/bin`, `/etc`, `/proc`, `/dev`, `/tmp`), always copies the requested target to `<rootfs>/usr/bin/<basename>` (sandbox.c:666-670), and always copies `/bin/sh` to `<rootfs>/bin/sh` (sandbox.c:689-694).
  - `build_rootfs()` also always copies shared-library dependencies for the requested target and for `/bin/sh` by calling `copy_ldd_deps(bin, rootfs)` and `copy_ldd_deps("/bin/sh", rootfs)` (sandbox.c:695-698).
  - Normal target execution later runs through `sandbox_main()` (sandbox.c:621-630), which always `execv()`s `/usr/bin/<basename>` inside the sandbox. The executed path, and therefore the target process's `argv[0]`, is always `/usr/bin/<basename>` in the non-`--trace` path.
  - If `<target-binary>` is an absolute path that differs from `/usr/bin/<basename>` (for example `/usr/local/bin/foo` or `/sbin/foo`), `build_rootfs()` additionally copies the same host binary to `<rootfs><absolute-target-path>` after creating any missing parent directories (sandbox.c:672-688). That extra copy depends on the input path and exists for path compatibility only; the normal non-`--trace` `execv()` call still uses `/usr/bin/<basename>`, not `<absolute-target-path>`.
  - `build_rootfs()` also attempts to copy `/usr/bin/strace` to `<rootfs>/usr/bin/strace` on every target-mode run (sandbox.c:699-709). If that initial `copy_file()` fails, setup aborts only when `--trace` is active; otherwise non-`--trace` target mode continues without `strace` in the rootfs. If the copy succeeds, `build_rootfs()` then calls `copy_ldd_deps("/usr/bin/strace", rootfs)`, and any failure there aborts setup in both trace and non-trace runs.
  - After those copies, `build_rootfs()` calls `create_dev_nodes(rootfs)` and `create_etc_files(rootfs)` (sandbox.c:710), which always ensures that `<rootfs>/dev` and `<rootfs>/etc` exist. Outside `--userns`, `create_dev_nodes()` creates `/dev/null`, `/dev/zero`, and `/dev/tty` as character device nodes; in `--userns`, it creates placeholder files that `setup_sandbox_environment()` later bind-mounts over with the host devices. `create_etc_files()` always creates `<rootfs>/etc/`; it additionally writes `/etc/passwd` and `/etc/group` only when `--user` is set (the `if(drop_to_nobody)` gate at sandbox.c:390-410), and `--extras` may add other files under `/etc` regardless of mode.
- Prepare a target rootfs without running it:

  ```sh
  sudo ./sandbox /tmp/app-rootfs /bin/true --prepare-only
  ```

  - `--prepare-only` validates the target through the same binary validation path as target mode, builds the same target rootfs, applies `--extras <file>` if provided, and exits `0` on success.
  - It can be passed after the rootfs/target like other sandbox flags, or in a leading compatibility form: `sudo ./sandbox --prepare-only /tmp/app-rootfs /bin/true`.
  - It is not a sandbox enforcement mode. It does not run the target, start an interactive shell, call `clone()`, call `chroot()`, mount `/proc`, install seccomp, drop capabilities, drop UID/GID, or apply Landlock.
  - It is intended for pipelines where `sandbox` is responsible for rootfs assembly and another tool, such as `landlockd`, performs the actual runtime policy enforcement.
  - The current implementation rejects `--prepare-only` with `--trace`, `--user`, or `--userns`. Because it uses the same non-userns target rootfs assembly path as normal target mode, it should be run as root when device nodes need to be created.

  Minimal pipeline:

  ```sh
  sudo ./sandbox /tmp/app-rootfs /bin/true --prepare-only
  landlockd run --policy-file /tmp/app-policy.toml -- /usr/bin/true
  ```

  Example `/tmp/app-policy.toml`:

  ```toml
  version = 1

  [[fs_layer]]
  handled_access_fs = ["execute", "read_file", "read_dir"]

  [[fs_layer.rule]]
  path = "/tmp/app-rootfs"
  allowed_access = ["execute", "read_file", "read_dir"]

  [runtime]
  root = "/tmp/app-rootfs"
  cwd = "/"
  ```
- Trace a binary (replays a filtered subset of strace-reported paths into the rootfs):

  ```sh
  sudo ./sandbox /tmp/mychroot /usr/bin/curl --trace "https://example.com"
  ```

  - `--trace` requires a target binary and cannot be combined with `--user` or `--userns`.
  - `--trace` consumes all subsequent argv tokens as arguments to the traced binary, so `--user`, `--userns`, and `--extras <file>` must appear before `--trace`. Otherwise they are silently passed to the target binary (not to `sandbox` and not to `strace`), and the `--user`/`--userns` conflict checks above are bypassed. For example, `sudo ./sandbox /tmp/sbroot /bin/echo --trace --userns` runs `/bin/echo --userns` inside the sandbox and does not enable user-namespace mode.
  - Arguments for the traced binary must also appear after `--trace`. In trace mode, the first positional token before `--trace` selects the target binary (the parser sets `target` on its first positional at sandbox.c:771-772) and is the only pre-`--trace` positional that reaches the traced program: it becomes the traced program's `argv[0]` via `trace_argv[j++] = trace_target` (sandbox.c:883). Any additional positional tokens before `--trace` are discarded: they are collected into `target_args[]`, which only the non-trace execution path in `sandbox_main()` consumes (sandbox.c:621-629). The trace path appends only the tokens after `--trace` to `trace_argv[]` (sandbox.c:866-885), and those tokens become the traced program's `argv[1..]`. For example, `sudo ./sandbox /tmp/sbtrace-arg /bin/echo one --trace two` traces `/bin/echo two` (the pre-`--trace` `one` is dropped), while `sudo ./sandbox /tmp/sbtrace-arg2 /bin/echo --trace one two` traces `/bin/echo one two`.
  - In `--trace` mode, `main()` builds `trace_argv[]` before `clone()` (sandbox.c:857-886), and `trace_main()` only calls `sandbox_exec(trace_argv)` (sandbox.c:595-598); `trace_main()` does not populate `trace_argv[]` itself.
  - `sandbox_exec()` then `execv()`s `/usr/bin/strace` because `trace_argv[0]` is `/usr/bin/strace` (sandbox.c:640-645, sandbox.c:871-876). The traced program path is passed separately as strace's first non-option command argument: if the user supplied an absolute target path, that argument is the original absolute path; otherwise it is `/usr/bin/<basename>` (sandbox.c:867-885). That traced-program path is what the traced binary sees as its own `argv[0]`; it is not the path `sandbox_exec()` directly `execv()`s.
  - Exact launch form. `--trace` performs a single sandboxed `execv()` of `/usr/bin/strace` with the fixed token sequence `/usr/bin/strace -f -e trace=file -o /tmp/straceXXXXXX <trace_target> <args-after---trace>` (sandbox.c:876-886). Here `/tmp/straceXXXXXX` stands for the buffer passed to `mkstemp("/tmp/straceXXXXXX")` at sandbox.c:858-864, which runs on the host before the `clone()` call at sandbox.c:888. `mkstemp()` overwrites the six `X`s in place with a unique suffix, so every run gets a fresh `/tmp/strace<6chars>` pathname (and the call leaves an empty file at that path on the host). Because `strace` is exec'd only after `setup_sandbox_environment()` calls `chroot(rootfs)` (sandbox.c:453), the trace log that strace writes to `/tmp/strace<6chars>` actually lands inside the rootfs at `<rootfs>/tmp/strace<6chars>`, which is exactly where the parent reopens it after the child exits (sandbox.c:899-902). `<trace_target>` is selected at sandbox.c:871-875: the original absolute `target` string when `target[0] == '/'`, otherwise `/usr/bin/<basename>` (resolved from `target_name`, which `build_rootfs()` sets to `basename(bin)` at sandbox.c:664). `<args-after---trace>` is every argv token after `--trace`, copied verbatim in order (sandbox.c:884-885). The resulting argv is `execv()`ed directly, with no wrapper shell.
  - Exit-status propagation. After the cloned trace child exits, `main()` first `waitpid()`s it, then performs the post-trace replay scan and the best-effort `unlink()` described below, and only then returns the traced child's status to the shell: `WEXITSTATUS(status)` on normal exit, or `128 + WTERMSIG(status)` on signal termination (sandbox.c:893-898, sandbox.c:902-928). The `[sandbox --trace exited with N]` / `[sandbox --trace killed by signal N]` message is printed before the trace-output scan but reports the same status that is returned afterwards. Concretely, `sudo ./sandbox /tmp/sb /bin/true --trace; echo $?` prints `0`, and `sudo ./sandbox /tmp/sb /bin/false --trace; echo $?` prints `1`.
  - Post-trace file replay. After the traced child exits, `sandbox` reopens the trace log at `<rootfs>/tmp/strace<6chars>` and scans it line by line (sandbox.c:902-924). The replay is narrower than "every file accessed during the run": for each line it locates the first pair of double quotes and treats the enclosed string as a candidate path. The candidate is copied into the rootfs only if (1) it is quoted in the trace line, (2) it starts with `/` (absolute), and (3) it does not contain the substring `/..`. Paths that pass those three filters are copied via `copy_file(path, <rootfs><path>)`, whose return value is ignored, so any failure during this replay pass (missing source, permission error, destination write error) is silently skipped and does not abort the run. After the scan completes, `sandbox` attempts to `unlink()` the trace log at `<rootfs>/tmp/strace<6chars>` (sandbox.c:924). The `unlink()` return value is also ignored: on the common success path the file is removed, but a failed `unlink()` does not abort the run and can leave the trace file in place. A post-run check such as `test ! -e <rootfs>/tmp/strace*` therefore confirms the expected success path but is not guaranteed by the code.
- Sandbox as an unprivileged user (`nobody`):

  ```sh
  sudo ./sandbox /tmp/mychroot --user
  ```

  - Cannot be combined with `--trace` or `--userns`.
- Rootless mode (user namespace):

  ```sh
  ./sandbox /tmp/mychroot --userns
  ./sandbox /tmp/mychroot /usr/bin/ls --userns
  ```

  - Runs without root by creating a user namespace. Requires `sysctl kernel.unprivileged_userns_clone=1` (or equivalent) on the host kernel.
  - Device nodes (`/dev/null`, `/dev/zero`, `/dev/tty`) are set up in two phases, because `mknod(2)` requires `CAP_MKNOD`, which is not available inside an unprivileged user namespace:
    - Placeholder creation: during rootfs assembly, `create_dev_nodes()` takes the `userns_mode` branch at sandbox.c:363-369 and creates empty regular files at `<rootfs>/dev/null`, `<rootfs>/dev/zero`, and `<rootfs>/dev/tty` via `open(path, O_WRONLY | O_CREAT, 0666)` instead of calling `mknod()`. These are not device nodes yet; they are zero-byte regular files that exist only to serve as bind-mount targets.
    - Bind-mount over the placeholders: later, inside the child, `setup_sandbox_environment()` (sandbox.c:441-452) iterates `{"null", "zero", "tty"}` and performs `mount("/dev/<name>", "<rootfs>/dev/<name>", NULL, MS_BIND, NULL)` for each, attaching the host's real character devices onto the placeholder files before `chroot`.
  - Cannot be combined with `--trace` or `--user`.
- Add extra files:

  ```sh
  sudo ./sandbox /tmp/mychroot --extras extras.txt
  sudo ./sandbox /tmp/mychroot /usr/bin/ls --extras extras.txt
  ./sandbox /tmp/mychroot --userns --extras extras.txt
  sudo ./sandbox /tmp/mychroot /usr/bin/curl --extras extras.txt --trace "https://example.com"
  ```

  - Lines are read one at a time with `fgets()`, and the trailing newline is stripped. Entries that are empty after the strip are skipped silently, as are entries whose first non-whitespace character is `#` (comments).
  - Entries containing a parent-directory component (`..`) are rejected before any source or destination path is created, including both relative entries such as `../outside.txt` and absolute entries such as `/tmp/foo/../escape`.
  - Absolute entry (`line[0] == '/'`): the entry's absolute host path is preserved under the rootfs as `<rootfs><entry>` (e.g. `/etc/memfdbus/client.conf` -> `<rootfs>/etc/memfdbus/client.conf`).
  - Relative entry (`line[0] != '/'`, not blank or a comment): resolved on the host as `<dir-of-extras-listfile>/<entry>` and copied to `<rootfs>/<entry>`, preserving the same relative path under the rootfs. For example, with list file `/work/extras.txt` and line `var/run/iouringd/`, the source is `/work/var/run/iouringd/` and the destination is `<rootfs>/var/run/iouringd/`. The list-file directory is computed once, via `dirname()` of a writable copy of the list-file argument, before the read loop.
  - An entry ending in `/` denotes a directory: the destination directory `<rootfs>/<resolved>` is created with `mkdir -p` semantics (every missing parent created with mode 0755), and no file copy is performed. This is the supported way to materialise socket/IPC parent directories such as iouringd's `var/run/iouringd/` or memfdbus's `var/run/memfdbus/`.
  - For file entries (no trailing `/`), the destination's parent directory chain is created with `mkdir -p` semantics (mode 0755) before `copy_file()` is invoked.
  - Any failure on an entry (missing source, `copy_file()` failure, or a `mkdir` failure other than `EEXIST`) is reported on stderr as `extras: failed to copy <src> -> <dst>: <strerror>` and causes `copy_extras()` to return `-1`. The function still attempts every remaining line so the user sees the full set of failures, but the non-zero return propagates and `main()` exits non-zero (matching the existing `Failed to copy extras` branch).
  - On success of each entry outside prepare-only mode, `copy_extras()` prints `extras: copied <src> -> <dst>` (file) or `extras: created <dst>` (directory) to stdout. In prepare-only mode, copied file entries are reported by the common `copied <src> -> <dst>` rootfs copy output; directory entries still print `extras: created <dst>`.
  - Example 1, extras list at `/work/extras.txt`:

    ```
    /etc/memfdbus/client.conf
    bin/iouringd-client
    var/run/iouringd/
    ```

    With `<rootfs>` set to `/tmp/mychroot`, `/etc/memfdbus/client.conf` is copied to `/tmp/mychroot/etc/memfdbus/client.conf`, `/work/bin/iouringd-client` is copied to `/tmp/mychroot/bin/iouringd-client`, and `/tmp/mychroot/var/run/iouringd/` is created as a directory for IPC/socket use.
  - Example 2, extras list at `/srv/app/extras.txt`:

    ```
    /etc/memfdbus/client.conf
    bin/iouringd-client
    var/run/memfdbus/
    ```

    With `<rootfs>` set to `/tmp/mychroot`, `/etc/memfdbus/client.conf` is copied to `/tmp/mychroot/etc/memfdbus/client.conf`, `/srv/app/bin/iouringd-client` is copied to `/tmp/mychroot/bin/iouringd-client`, and `/tmp/mychroot/var/run/memfdbus/` is created as a directory without requiring that directory to exist on the host.
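The per-line rules listed under Add extra files can be sketched as a pure classification step that decides what to do with one `fgets()` line before any filesystem call. `classify_extras_line()` and `enum extras_kind` are hypothetical names. The sketch assumes `listdir` is the already-computed `dirname()` of the list file, treats a too-long path as a rejection (the README does not say what `copy_extras()` does there), and leaves directory creation and copying to the caller.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum extras_kind { EXTRAS_SKIP, EXTRAS_REJECT, EXTRAS_FILE, EXTRAS_DIR };

/* True if any '/'-separated component of p is exactly "..". */
static int has_dotdot_component(const char *p)
{
    while (*p != '\0') {
        size_t len = strcspn(p, "/");
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return 1;
        p += len;
        while (*p == '/')
            p++;
    }
    return 0;
}

/* Classifies one fgets() line and, for FILE/DIR entries, fills src (host
 * path) and dst (path under rootfs). Strips the newline from line in place. */
enum extras_kind classify_extras_line(char *line, const char *listdir,
                                      const char *rootfs, char *src,
                                      char *dst, size_t sz)
{
    const char *first = line;
    int n1, n2;

    line[strcspn(line, "\n")] = '\0';
    while (isspace((unsigned char)*first))
        first++;
    if (line[0] == '\0' || *first == '#')
        return EXTRAS_SKIP;               /* empty line or comment */
    if (has_dotdot_component(line))
        return EXTRAS_REJECT;             /* checked before any path is built */

    if (line[0] == '/') {                 /* absolute: same path under rootfs */
        n1 = snprintf(src, sz, "%s", line);
        n2 = snprintf(dst, sz, "%s%s", rootfs, line);
    } else {                              /* relative: resolved against listdir */
        n1 = snprintf(src, sz, "%s/%s", listdir, line);
        n2 = snprintf(dst, sz, "%s/%s", rootfs, line);
    }
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sz || (size_t)n2 >= sz)
        return EXTRAS_REJECT;             /* sketch-only: path too long */

    return line[strlen(line) - 1] == '/' ? EXTRAS_DIR : EXTRAS_FILE;
}
```

For `EXTRAS_DIR` a caller would create `dst` with `mkdir -p` semantics; for `EXTRAS_FILE` it would create the parent chain of `dst` and then copy `src` to it.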
- Creates a new mount, PID, and UTS namespace
- Builds a new root filesystem (`<rootfs>`) by creating the standard directory tree from `dirs[]`: `/bin`, `/usr/bin`, `/etc`, `/proc`, `/dev`, and `/tmp`. `/etc` itself is always created, but `/etc/passwd` and `/etc/group` are only written when `--user` is passed (the `if(drop_to_nobody)` gate at `sandbox.c:390`)
  - In shell mode, `setup_essential_environment()` copies every entry in `essential_bins[]`, then calls `create_dev_nodes()` and `create_etc_files()`
  - In target-binary and prepare-only modes, `build_rootfs()` does the following:
    - always copies the target to `/usr/bin/<basename>`, and also copies it to its original absolute path inside the rootfs when that path differs
    - always copies `/bin/sh` and the shared-library dependencies of both the target and `/bin/sh`
    - attempts to copy `/usr/bin/strace` on every run. A failure of the initial `copy_file("/usr/bin/strace", ...)` call is tolerated without `--trace` but aborts the build under `--trace`. A failure of the later `copy_ldd_deps("/usr/bin/strace", rootfs)` call aborts in both modes
    - then calls `create_dev_nodes()` and `create_etc_files()`
  - In prepare-only mode, `main()` returns immediately after `build_rootfs()` and optional `--extras` processing. Each successful file copy prints `copied <src> -> <dst>` to stdout, followed by a `TARGET /usr/bin/<basename>` line giving the in-rootfs target path. Errors and warnings remain on stderr. Prepare-only mode never enters the runtime setup path, so none of the following occurs: namespace clone, chroot, `/proc` mount, seccomp filter, UID/GID drop, capability drop, shell, target execution, or Landlock enforcement.
  - `create_dev_nodes()` prepares `/dev/null`, `/dev/zero`, and `/dev/tty`. In normal runs they are created as character device nodes. Under `--userns` they start as placeholder files, and the host devices are later bind-mounted over them
  - `create_etc_files()` always creates `<rootfs>/etc/`. It additionally writes `/etc/passwd` and `/etc/group` only when `--user` sets `drop_to_nobody` (the `if(drop_to_nobody)` gate at `sandbox.c:390-410`)
- Optionally adds files specified in `--extras`. These can be absolute entries copied to matching absolute paths under the rootfs, relative entries copied from the extras list directory to matching relative paths under the rootfs, or directory entries marked with a trailing slash, which are created. Blank and comment entries are skipped, and entries containing `..` path components are rejected. Per-entry failures are reported and every entry is still attempted, but any failure makes `copy_extras()` return non-zero, so `main()` exits with `Failed to copy extras`.
- Optionally traces the binary with `strace` to discover runtime file dependencies. After the traced child exits, `sandbox` parses the trace log under `<rootfs>/tmp/strace<6chars>` and copies into the rootfs only quoted absolute paths that do not contain `/..`. Any `copy_file()` failure during this replay pass is ignored. The trace log is then removed with `unlink()` on a best-effort basis, because that return value is ignored too (`sandbox.c:902-924`)
- Optionally switches to UID/GID 65534 (`nobody`)
- Optionally creates a user namespace with `--userns` for rootless operation. The sandboxed process appears as `root` inside the namespace while retaining the caller's identity on the host. `--userns` cannot be combined with `--user` or `--trace`.
  - In `--userns` mode, the parent process invokes `write_uid_gid_map()` (`sandbox.c:101-136`) against the child pid to populate the maps. It writes `deny` to `/proc/<pid>/setgroups`, then writes `0 <caller-uid> 1\n` to `/proc/<pid>/uid_map` and `0 <caller-gid> 1\n` to `/proc/<pid>/gid_map`. This maps the caller's real UID/GID (`getuid()`/`getgid()`) to UID 0 / GID 0 (root) inside the user namespace. This differs from the `--user` flag, which drops the sandboxed process to UID/GID 65534 (`nobody`) inside the sandbox. `--userns` instead gives it namespace-root while keeping the caller's identity on the host.
- Sets `PR_SET_NO_NEW_PRIVS` via `prctl()` as the first step of `setup_sandbox_environment()` (`sandbox.c:423`). This happens before `chroot` and before the capability bounding set is cleared or process capabilities are dropped. Once set, the bit is inherited across `execve()`. It prevents the kernel from granting new privileges through setuid/setgid binaries (or file capabilities) executed inside the sandbox.
- In `--userns` mode only, `setup_sandbox_environment()` bind-mounts the host's `/dev/null`, `/dev/zero`, and `/dev/tty` onto `<rootfs>/dev/null`, `<rootfs>/dev/zero`, and `<rootfs>/dev/tty`. It iterates `{"null", "zero", "tty"}` and calls `mount(src, mnt, NULL, MS_BIND, NULL)` for each (`sandbox.c:441-452`). This step runs before the `chroot()` call at `sandbox.c:453`. It attaches the host's real character devices onto the zero-byte placeholder files that `create_dev_nodes()` created earlier in the userns branch. Placeholders are needed because creating device nodes with `mknod(2)` is not permitted inside an unprivileged user namespace.
- Mounts a fresh `proc` filesystem on `/proc` inside the rootfs with the hardening flags `MS_NOSUID | MS_NOEXEC | MS_NODEV`, via `mount("proc", "/proc", "proc", MS_NOSUID | MS_NOEXEC | MS_NODEV, "")` at `sandbox.c:462`. This happens inside `setup_sandbox_environment()`, after the `chroot(rootfs)` call at `sandbox.c:453` and before `drop_bounding_caps()` clears the capability bounding set at `sandbox.c:468`. Each flag has a specific effect:
  - `MS_NOSUID` prevents setuid/setgid bits from taking effect on anything accessed through this mount.
  - `MS_NOEXEC` blocks execution of any file beneath `/proc`.
  - `MS_NODEV` disables access to device nodes under the mount.
- Clears the capability bounding set, drops to an unprivileged UID/GID if requested, and wipes environment variables with `clearenv()`. It then restores only `PATH=/bin:/usr/bin` and `HOME=/`, and finally drops all process capabilities
- Execution has two setup entrypoints but three final exec outcomes:
  - shell mode `execv()`s `/bin/sh`
  - normal target mode `execv()`s `/usr/bin/<basename>` inside the chroot
  - `--trace` `execv()`s `/usr/bin/strace` via `sandbox_exec()`, with `strace` tracing either the original absolute target path or `/usr/bin/<basename>` for non-absolute inputs

  Both setup entrypoints call `setup_sandbox_environment()` (`sandbox.c:414-499`) before `execv()`, so all of these steps happen in both:
  - `PR_SET_NO_NEW_PRIVS`
  - hostname/mount-private setup
  - `chroot` into `<rootfs>`
  - the `/proc` mount
  - the capability bounding-set clear
  - the optional UID/GID drop to `nobody`
  - `clearenv()` followed by restoring only `PATH=/bin:/usr/bin` and `HOME=/`
  - the final `drop_all_caps()`

  The only difference between the two setup entrypoints is whether seccomp is installed:
  - Normal execution (no `--trace`) runs through `sandbox_main()` (`sandbox.c:602-638`). It calls `setup_sandbox_environment()` first, then `install_seccomp_filter()` to install a fail-closed, x86_64-only seccomp allowlist before `execv()`. On non-x86_64 hosts, `install_seccomp_filter()` returns an error and the sandbox fails closed without executing the target.
  - Trace execution (`--trace`) runs through `trace_main()` (`sandbox.c:595-600`), which passes the global `trace_argv` to `sandbox_exec()` (`sandbox.c:640-648`). `sandbox_exec()` still calls `setup_sandbox_environment()` first, so the chroot, the capability bounding-set clear plus `drop_all_caps()`, and the environment reset above all apply. It intentionally does not call `install_seccomp_filter()`, because the allowlist would block `ptrace` and the other syscalls `strace` needs. Instead it does `execv(argv[0], argv)` with `argv[0] == "/usr/bin/strace"` (set at `sandbox.c:876`), so the process image becomes `strace` itself. `strace` then launches the traced program as its first non-option argument. This is the `trace_target` slot that `main()` fills in at `sandbox.c:871-875` and places at `trace_argv[6]` (`sandbox.c:882-883`), after `strace`'s `-f -e trace=file -o <tmpfile>` options.
- Final `execv()` splits across three distinct paths:
  - Shell mode `execv()`s `/bin/sh` in `sandbox_main()` (`sandbox.c:632-634`).
  - Normal target mode `execv()`s `/usr/bin/<basename>` in `sandbox_main()` (`sandbox.c:621-630`), regardless of the host path the user supplied.
  - `--trace` `execv()`s `/usr/bin/strace` via `sandbox_exec()` (`sandbox.c:640-645`). The traced-program path is the original absolute `target` when it begins with `/`, and `/usr/bin/<basename>` otherwise (`sandbox.c:871-875`).
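
The three exec outcomes and the `--trace` argument layout can be sketched as pure path and argv helpers. The names `exec_path()`, `trace_target_path()`, `build_trace_argv()`, and `enum run_mode` are hypothetical. Placing the target's own arguments after the `trace_target` slot is also an assumption. The `strace` option prefix, the `trace_argv[6]` position, and the absolute-vs-basename rule follow the list above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical labels for the three exec outcomes. */
enum run_mode { MODE_SHELL, MODE_TARGET, MODE_TRACE };

/* Copy s into out; -1 on truncation. */
static int copy_path(char *out, size_t size, const char *s)
{
    int n = snprintf(out, size, "%s", s);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* "/usr/bin/<basename>" for any target path. */
static int in_rootfs_path(const char *target, char *out, size_t size)
{
    const char *slash = strrchr(target, '/');
    const char *base = slash ? slash + 1 : target;
    int n = snprintf(out, size, "/usr/bin/%s", base);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Path handed to execv() inside the chroot for each mode. */
static int exec_path(enum run_mode mode, const char *target, char *out, size_t size)
{
    switch (mode) {
    case MODE_SHELL:
        return copy_path(out, size, "/bin/sh");
    case MODE_TRACE:
        return copy_path(out, size, "/usr/bin/strace");
    case MODE_TARGET:
    default:
        return in_rootfs_path(target, out, size);
    }
}

/* Program strace is asked to run: absolute targets are kept verbatim,
 * anything else becomes /usr/bin/<basename>. */
static int trace_target_path(const char *target, char *out, size_t size)
{
    return target[0] == '/' ? copy_path(out, size, target)
                            : in_rootfs_path(target, out, size);
}

/* argv for the --trace path:
 *   [0] /usr/bin/strace [1] -f [2] -e [3] trace=file [4] -o [5] <logfile>
 *   [6] <trace_target> [7..] target arguments, then a NULL terminator.
 * Returns argc, or -1 if av has fewer than argc + 1 slots. */
static int build_trace_argv(const char *av[], size_t slots, const char *logfile,
                            const char *trace_target,
                            const char *const args[], size_t nargs)
{
    static const char *const prefix[] = {
        "/usr/bin/strace", "-f", "-e", "trace=file", "-o"
    };
    size_t i, argc = 0;

    if (slots < 5 + 2 + nargs + 1)
        return -1;
    for (i = 0; i < 5; i++)
        av[argc++] = prefix[i];
    av[argc++] = logfile;
    av[argc++] = trace_target;
    for (i = 0; i < nargs; i++)
        av[argc++] = args[i];
    av[argc] = NULL;
    return (int)argc;
}
```

Passing the result to `execv()` would need a cast to `char *const *`, since `execv()` takes non-const element pointers.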
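
The trace-log replay pass mostly comes down to pulling quoted absolute paths out of each line and dropping any that contain `/..`. Here is a minimal sketch, assuming the log is processed one line at a time and ignoring `strace`'s escaped quotes. `next_trace_path()` is a hypothetical name, not the parser in `sandbox.c`.

```c
#include <stddef.h>
#include <string.h>

/* Scan forward from *cursor for the next double-quoted string that is an
 * absolute path and does not contain "/..". On success, copy it into out
 * and return 1; return 0 once the line has no more qualifying paths.
 * Escaped quotes inside strace strings are not handled (simplification). */
static int next_trace_path(const char **cursor, char *out, size_t size)
{
    const char *p = *cursor;

    for (;;) {
        const char *q1 = strchr(p, '"');
        const char *q2;
        size_t len;

        if (!q1)
            break;
        q2 = strchr(q1 + 1, '"');
        if (!q2)
            break;
        len = (size_t)(q2 - q1 - 1);
        p = q2 + 1;                       /* resume after the closing quote */

        if (len == 0 || q1[1] != '/' || len >= size)
            continue;                     /* empty, relative, or too long */
        memcpy(out, q1 + 1, len);
        out[len] = '\0';
        if (strstr(out, "/..") != NULL)
            continue;                     /* could climb out of the rootfs */
        *cursor = p;
        return 1;
    }
    *cursor = p + strlen(p);
    return 0;
}
```

In a replay loop built on this helper, each returned path would go to the copy routine with its result ignored. The log would then be removed with a best-effort `unlink()`, matching the behavior described for `--trace`.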
- Namespaces: all three `clone()` sites use the same base flag set `CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD`. The sites are shell mode at `sandbox.c:811`, `--trace` at `sandbox.c:887`, and the normal target path at `sandbox.c:933`. The sandbox therefore always enters exactly three new namespaces: UTS (hostname), PID (process IDs), and mount (filesystem). A fourth, user namespace is added only under `--userns`, which ORs in `CLONE_NEWUSER` at `sandbox.c:813` (shell mode) and `sandbox.c:935` (normal target path). The `--trace` site at `sandbox.c:887` never adds `CLONE_NEWUSER`, because `--trace` and `--userns` are mutually exclusive. Network, IPC, and cgroup namespaces are not created in any of the three paths; see Limitations & Roadmap
- Mount-namespace isolation: inside `setup_sandbox_environment()`, `mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL)` (`sandbox.c:434`) runs after `PR_SET_NO_NEW_PRIVS`/`sethostname()`. It runs before any bind mounts, the `chroot()`, and the `/proc` mount. The `MS_REC | MS_PRIVATE` flags recursively change the propagation type of every mount beneath `/` in the child's new mount namespace to private. As a result, later mount operations inside the sandbox do not propagate back to the host's mount namespace or to any peer namespace that would otherwise share propagation with `/`. These later mounts are the `--userns` bind mounts of `/dev/{null,zero,tty}` at `sandbox.c:441-452` and the `mount("proc", "/proc", "proc", ...)` call at `sandbox.c:462`. Without this step, a new `CLONE_NEWNS` mount namespace would inherit shared propagation from the host and leak the sandbox's mounts outward.
- Capabilities: the bounding set is cleared before the optional UID/GID drop; all process capability sets are dropped only after the environment reset
- No environment variables: `clearenv()` runs after the UID/GID drop, then `PATH=/bin:/usr/bin` and `HOME=/` are restored, and only after that are all process capabilities dropped
- User `nobody`: further restricts privilege for untrusted code (unless tracing)
- User namespace (`--userns`): optional rootless mode. Writes `deny` to `/proc/<pid>/setgroups` and maps namespace uid/gid `0` to the invoking caller's real uid/gid (`getuid()`/`getgid()`) via `/proc/<pid>/uid_map` and `/proc/<pid>/gid_map`. The process is `root` inside the namespace but keeps the caller's identity on the host; it is not a drop to `nobody`. Mutually exclusive with `--user` and `--trace`.
- Seccomp: only the normal execution path (`sandbox_main()` → `install_seccomp_filter()` → `execv()`) installs a filter. It is a small, fail-closed allowlist compiled only for x86_64: the entire filter sits inside `#if defined(__x86_64__)` in `install_seccomp_filter()`. The filter applies these checks at runtime:
  - It loads `seccomp_data.arch` and returns `SECCOMP_RET_KILL_PROCESS` whenever the running kernel reports anything other than `AUDIT_ARCH_X86_64` (`sandbox.c:504-507`).
  - It kills x32-ABI calls by testing the syscall number against `__X32_SYSCALL_BIT` (`sandbox.c:509-510`).
  - After the allowlisted syscalls, it ends with a trailing `SECCOMP_RET_KILL_PROCESS` (`sandbox.c:574`), so any syscall that does not match the allowlist is killed.

  On other architectures the function returns `-1` and `sandbox_main()` exits with status 1 before `execv()`. The normal execution path therefore will not start on non-x86_64 hosts, and it does not fall back to running unfiltered.

  The `--trace` path deliberately skips `install_seccomp_filter()`, because the allowlist would block `ptrace` and the related syscalls `strace` depends on. In that path, `trace_main()` (`sandbox.c:595-600`) hands `trace_argv` to `sandbox_exec()` (`sandbox.c:640-648`), which `execv()`s `/usr/bin/strace` without a seccomp filter on every architecture, including non-x86_64. "Without a filter" means only that seccomp is skipped. `sandbox_exec()` still calls `setup_sandbox_environment()` (`sandbox.c:414-499`) first, so these layers apply under `--trace` exactly as they do in `sandbox_main()` (`sandbox.c:602-638`): `PR_SET_NO_NEW_PRIVS`, the chroot and `/proc` setup, the capability bounding-set clear plus `drop_all_caps()`, and the `clearenv()` reset to `PATH=/bin:/usr/bin` and `HOME=/`.
- Not a container runtime, but a tight, auditable educational sandbox
- For maximum isolation, use on a dedicated VM or test system
- If running untrusted code, combine with system-level controls (AppArmor, SELinux, VM isolation)
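
The filter layout in the Seccomp bullet can be sketched as a builder that fills a `struct sock_filter` array from a list of syscall numbers. The layout has four parts: arch check, x32 rejection, allowlist, and default kill. This is a hypothetical helper, not `install_seccomp_filter()`. The function names, the caller-supplied allowlist, and the local `X32_SYSCALL_BIT` constant are assumptions. `X32_SYSCALL_BIT` mirrors `__X32_SYSCALL_BIT` from the x86 `<asm/unistd.h>`. Unlike the source, the sketch is not wrapped in `#if defined(__x86_64__)`, so it can be built and inspected on any host. To install the result, you would place it in a `struct sock_fprog` and call `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)` after `PR_SET_NO_NEW_PRIVS`.

```c
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumed copy of __X32_SYSCALL_BIT from the x86 <asm/unistd.h>. */
#define X32_SYSCALL_BIT 0x40000000u

/* Wrap the brace-initializer macros so they can be assigned at runtime. */
static struct sock_filter stmt(unsigned short code, unsigned int k)
{
    struct sock_filter f = BPF_STMT(code, k);
    return f;
}

static struct sock_filter jump(unsigned short code, unsigned int k,
                               unsigned char jt, unsigned char jf)
{
    struct sock_filter f = BPF_JUMP(code, k, jt, jf);
    return f;
}

/* Fill f[] with: arch check, x32 rejection, one JEQ/ALLOW pair per allowed
 * syscall number, and a trailing kill. Returns the instruction count, or 0
 * if cap is too small. */
static size_t build_allowlist_filter(struct sock_filter *f, size_t cap,
                                     const unsigned int *allowed, size_t n)
{
    size_t i = 0, k;

    if (cap < 7 + 2 * n)
        return 0;

    /* Kill unless the running kernel reports the x86_64 audit arch. */
    f[i++] = stmt(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch));
    f[i++] = jump(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0);
    f[i++] = stmt(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);

    /* Load the syscall number and kill any x32-ABI call. */
    f[i++] = stmt(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr));
    f[i++] = jump(BPF_JMP | BPF_JSET | BPF_K, X32_SYSCALL_BIT, 0, 1);
    f[i++] = stmt(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);

    /* Each allowed number: match -> ALLOW, otherwise fall to the next test. */
    for (k = 0; k < n; k++) {
        f[i++] = jump(BPF_JMP | BPF_JEQ | BPF_K, allowed[k], 0, 1);
        f[i++] = stmt(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    }

    /* Anything that fell through the allowlist is killed. */
    f[i++] = stmt(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);
    return i;
}
```

The JEQ/ALLOW pairs keep every jump offset at 0 or 1, so no jump distances depend on the allowlist length. The price is a linear scan of the allowlist for every syscall.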
Build and run a minimal shell sandbox:

```sh
sudo ./sandbox /tmp/sandbox-root
# You are now in a sandboxed /bin/sh
```

Run an ELF binary with a minimal rootfs. In normal target mode, `sandbox_main()` always executes `/usr/bin/<basename>` inside the sandbox, even if the host path you supplied was different:

```sh
sudo ./sandbox /tmp/sandbox-root /bin/echo hello
# inside the sandbox, the executed path and argv[0] are /usr/bin/echo
```

Trace mode is different. `sandbox_exec()` still executes `/usr/bin/strace`, but the traced program path passed to `strace` stays `/bin/echo` when the original target was absolute:

```sh
sudo ./sandbox /tmp/sandbox-root /bin/echo --trace hello
# sandbox_exec() execv()s /usr/bin/strace
# strace then runs /bin/echo, so the traced program sees argv[0] == /bin/echo
```

- Requires root unless `--userns` is used for rootless operation via user namespaces
- Seccomp hardening is compiled only for x86_64. On non-x86_64 hosts the normal execution path refuses to run: `install_seccomp_filter()` returns `-1` and `sandbox_main()` exits with status 1 before `execv()`, with no fallback to running without seccomp. On x86_64, the filter kills the process in three cases: a non-`AUDIT_ARCH_X86_64` arch (`sandbox.c:504-507`), x32-ABI syscall numbers (`sandbox.c:509-510`), and any syscall outside the allowlist (`sandbox.c:574`). The `--trace` path deliberately skips `install_seccomp_filter()`. `trace_main()` (`sandbox.c:595-600`) passes `trace_argv` to `sandbox_exec()` (`sandbox.c:640-648`), which `execv()`s `/usr/bin/strace` (its `argv[0]`, set at `sandbox.c:876`). `strace` then launches the traced program from the `trace_target` slot at `trace_argv[6]` (`sandbox.c:882-883`). So `--trace` runs without any seccomp filter on every architecture, including non-x86_64. Seccomp is the only layer `--trace` skips. `setup_sandbox_environment()` (`sandbox.c:414-499`) still runs before the `execv()` of `/usr/bin/strace`, exactly as in `sandbox_main()` (`sandbox.c:602-638`). It applies `PR_SET_NO_NEW_PRIVS`, the chroot and `/proc` setup, the capability bounding-set clear plus `drop_all_caps()`, and the `clearenv()` + `PATH=/bin:/usr/bin`/`HOME=/` reset.
- No cgroup or resource limiting
- Network, IPC, and cgroup namespaces are not created. The clone-flag sets at `sandbox.c:811` (shell mode), `sandbox.c:887` (`--trace`), and `sandbox.c:933` (normal target path) are all `CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS`, with no `CLONE_NEWNET`, `CLONE_NEWIPC`, or `CLONE_NEWCGROUP`. Under `--userns`, `CLONE_NEWUSER` is additionally ORed in at `sandbox.c:813`/`sandbox.c:935`, but never at the `--trace` site. The sandboxed process therefore shares the following with the host:
  - the network stack: interfaces, routing, listening sockets, and the abstract Unix socket namespace
  - System V IPC objects and POSIX message queues
  - the cgroup hierarchy
- `--userns` requires unprivileged user namespaces to be enabled on the host and cannot be combined with `--trace` or `--user`. It writes `setgroups=deny` and maps namespace uid/gid `0` to the invoking caller's real uid/gid (`getuid()`/`getgid()`). The sandboxed process is therefore `root` inside the namespace but keeps the caller's identity on the host
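
The namespace limitations and the `--userns` mapping can be illustrated with a few small helpers:
- `sandbox_clone_flags()` computes the clone flags listed above.
- `format_id_map()` and `write_id_maps()` produce and write the `setgroups`/`uid_map`/`gid_map` contents for a child pid.

All of these names are hypothetical, not the functions in `sandbox.c`. The constants come from `<linux/sched.h>` so the sketch does not depend on `_GNU_SOURCE`.

```c
#include <fcntl.h>
#include <linux/sched.h>   /* CLONE_NEW* flag values */
#include <signal.h>        /* SIGCHLD */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Base set used at every clone() site; CLONE_NEWUSER only under --userns,
 * which is mutually exclusive with --trace. No NET/IPC/CGROUP namespaces. */
static int sandbox_clone_flags(int userns_mode, int trace_mode)
{
    int flags = CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD;

    if (userns_mode && !trace_mode)
        flags |= CLONE_NEWUSER;
    return flags;
}

/* One-line map "0 <host_id> 1\n": namespace id 0 -> host_id, range 1.
 * Returns the line length, or -1 on truncation. */
static int format_id_map(char *buf, size_t size, unsigned long host_id)
{
    int n = snprintf(buf, size, "0 %lu 1\n", host_id);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Write data to /proc/<pid>/<name> in a single write(), as the kernel
 * requires for uid_map and gid_map. */
static int write_proc_file(pid_t pid, const char *name, const char *data)
{
    char path[64];
    size_t len = strlen(data);
    ssize_t written;
    int fd, n;

    n = snprintf(path, sizeof path, "/proc/%d/%s", (int)pid, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    written = write(fd, data, len);
    if (close(fd) != 0 || written != (ssize_t)len)
        return -1;
    return 0;
}

/* setgroups must be denied before an unprivileged process may write gid_map. */
static int write_id_maps(pid_t pid, unsigned long uid, unsigned long gid)
{
    char line[64];

    if (write_proc_file(pid, "setgroups", "deny") != 0)
        return -1;
    if (format_id_map(line, sizeof line, uid) < 0 ||
        write_proc_file(pid, "uid_map", line) != 0)
        return -1;
    if (format_id_map(line, sizeof line, gid) < 0 ||
        write_proc_file(pid, "gid_map", line) != 0)
        return -1;
    return 0;
}
```

In the parent, `write_id_maps(child_pid, getuid(), getgid())` would run after `clone()` returns. The synchronization that keeps the child waiting until the maps are written is outside this sketch.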
Pull requests and feature requests are welcome!
File issues or send PRs on GitHub.
This tool is for research purposes.
Do not rely on it for strong security isolation of malicious code in production environments.