Commit cf51779

JAORMX and claude committed
Add guest-side seccomp BPF filter to harden package
Add ApplySeccomp() to guest/harden, which installs a two-tier seccomp blocklist: kill-on-sight for exploit indicators (io_uring, ptrace, bpf, kexec, module loading) and EPERM for operational blocks (mount, namespaces, perf, keyring, landlock, dangerous socket families).

Integrate into boot.Run() via the WithSeccomp(true) option for consumers that want the simple path. Consumers needing custom post-boot privileged ops can call harden.ApplySeccomp() directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 1c4cf39 commit cf51779

8 files changed

+516
-26
lines changed

docs/SECURITY.md

Lines changed: 58 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -151,12 +151,19 @@ process, providing natural process-level separation.
151151
For security-critical deployments, layer additional isolation around
152152
the runner process:
153153

154-
### seccomp (recommended)
154+
### seccomp — guest-side (built-in)
155+
156+
The `guest/harden` package provides a built-in seccomp BPF blocklist
157+
filter for the guest VM. Enable it with `boot.WithSeccomp(true)` or
158+
call `harden.ApplySeccomp()` directly. See [Seccomp BPF filter](#seccomp-bpf-filter)
159+
for the full blocklist.
160+
161+
### seccomp — runner-side (recommended)
155162

156163
Restrict the runner's syscalls to only what libkrun and gvisor-tap-vsock
157-
need. This is the highest-impact hardening measure. After a guest escape,
158-
the attacker lands in a seccomp jail that limits what syscalls they can
159-
make.
164+
need. This is the highest-impact host-side hardening measure. After a
165+
guest escape, the attacker lands in a seccomp jail that limits what
166+
syscalls they can make.
160167

161168
```
162169
Runner process
@@ -205,8 +212,8 @@ lifetime with no extra coordination.
205212

206213
## Guest Hardening
207214

208-
The `guest/harden` package provides reusable kernel and capability
209-
hardening for microVM init processes. It is guest-side code
215+
The `guest/harden` package provides reusable kernel, capability, and
216+
syscall hardening for microVM init processes. It is guest-side code
210217
(`//go:build linux`) with no CGO or krun dependencies.
211218

212219
### Recommended usage
@@ -217,6 +224,10 @@ Call the hardening functions in your guest init boot sequence:
217224
2. Call `harden.KernelDefaults(logger)` to apply sysctls.
218225
3. Perform all privileged operations (mounts, network config, chown).
219226
4. Call `harden.DropBoundingCaps(keep...)` as the last privileged step.
227+
5. Call `harden.ApplySeccomp()` as the very last hardening step.
228+
229+
Alternatively, use `boot.WithSeccomp(true)` to have the boot sequence
230+
apply the seccomp filter automatically after SSH starts.
220231

221232
### Kernel sysctls
222233

@@ -267,6 +278,47 @@ Note: `no_new_privs` does not affect `setresuid`/`setresgid` syscalls
267278
used by Go's `SysProcAttr.Credential` — credential switching for SSH
268279
sessions continues to work after the bit is set.
269280

281+
### Seccomp BPF filter
282+
283+
`ApplySeccomp()` installs a seccomp BPF blocklist filter on all OS
284+
threads. The filter uses `SECCOMP_FILTER_FLAG_TSYNC` to synchronize
285+
across threads and sets `no_new_privs`. Once applied, it is inherited
286+
by all child processes and cannot be removed.
287+
288+
The blocklist is split into two tiers:
289+
290+
**Exploitation indicators** (`SECCOMP_RET_KILL_PROCESS`): syscalls that
291+
legitimate workloads never call — any attempt strongly indicates
292+
active exploitation.
293+
294+
| Syscall | Reason |
295+
|---------|--------|
296+
| `io_uring_setup/enter/register` | Prolific source of kernel CVEs |
297+
| `ptrace`, `process_vm_readv/writev` | Process debugging / cross-process memory access |
298+
| `kexec_load/file_load` | Kernel replacement |
299+
| `init_module/finit_module/delete_module` | Kernel module loading |
300+
| `bpf` | eBPF — kernel attack surface |
301+
| `seccomp` | Prevent installing additional filters |
302+
303+
**Operational blocks** (`SECCOMP_RET_ERRNO`/`EPERM`): syscalls that
304+
runtimes or build tools might innocuously probe — fail gracefully.
305+
306+
| Category | Syscalls |
307+
|----------|----------|
308+
| Filesystem manipulation | `mount`, `umount2`, `pivot_root`, `chroot`, `fsopen`, `fsconfig`, `fspick`, `move_mount`, `open_tree` |
309+
| Namespace creation | `unshare`, `setns`, `clone` (with `CLONE_NEW*` flags) |
310+
| Side-channel attacks | `perf_event_open`, `userfaultfd` |
311+
| Kernel keyring | `add_key`, `request_key`, `keyctl` |
312+
| Landlock | `landlock_create_ruleset`, `landlock_add_rule`, `landlock_restrict_self` |
313+
| Dangerous socket families | `AF_NETLINK`, `AF_PACKET`, `AF_KEY`, `AF_ALG`, `AF_VSOCK` |
314+
315+
`AF_INET`, `AF_INET6`, and `AF_UNIX` remain allowed for normal
316+
workload operation.
317+
318+
Note: `clone3` is NOT blocked because glibc 2.34+ uses it for
319+
thread creation (`pthread_create`). Its namespace flags live inside a struct pointer (`arg0`)
320+
which seccomp BPF cannot inspect.
321+
270322
### Filesystem hardening
271323

272324
Consumers should lock down `/root/` (mode `0700`) after completing
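
The two-tier blocklist documented in the SECURITY.md hunk above can be pictured as a pure decision function. The following is a minimal sketch and not the code from `guest/harden`: the `action` type, the `decide` function, and the `set` helper are hypothetical names, the syscall lists are copied from the documented tables, and it assumes (this is not stated in the diff) that socket families are matched on the first argument of `socket` and that `clone` flags arrive in `arg0`, as on x86-64.

```go
package main

import "fmt"

// action is a hypothetical stand-in for the seccomp return action.
type action int

const (
	allow       action = iota // syscall passes through
	denyEPERM                 // operational block: fail with EPERM
	killProcess               // exploitation indicator: kill the process
)

func (a action) String() string {
	return [...]string{"allow", "EPERM", "kill"}[a]
}

// set builds a lookup table from a list of syscall names.
func set(names ...string) map[string]bool {
	m := make(map[string]bool, len(names))
	for _, n := range names {
		m[n] = true
	}
	return m
}

// Lists taken from the tables in docs/SECURITY.md.
var killTier = set(
	"io_uring_setup", "io_uring_enter", "io_uring_register",
	"ptrace", "process_vm_readv", "process_vm_writev",
	"kexec_load", "kexec_file_load",
	"init_module", "finit_module", "delete_module",
	"bpf", "seccomp",
)

var epermTier = set(
	"mount", "umount2", "pivot_root", "chroot",
	"fsopen", "fsconfig", "fspick", "move_mount", "open_tree",
	"unshare", "setns",
	"perf_event_open", "userfaultfd",
	"add_key", "request_key", "keyctl",
	"landlock_create_ruleset", "landlock_add_rule", "landlock_restrict_self",
)

// Linux CLONE_NEW* flags that are valid for clone(2): NS, CGROUP, UTS,
// IPC, USER, PID, NET. CLONE_NEWTIME is left out because its bit
// overlaps the exit-signal byte of the clone flags argument.
const cloneNewMask uint64 = 0x00020000 | 0x02000000 | 0x04000000 |
	0x08000000 | 0x10000000 | 0x20000000 | 0x40000000

// Linux address-family numbers for the blocked families.
var blockedFamilies = map[uint64]string{
	15: "AF_KEY", 16: "AF_NETLINK", 17: "AF_PACKET", 38: "AF_ALG", 40: "AF_VSOCK",
}

// decide models the filter's verdict for a syscall name and its first
// argument. clone3 falls through to allow because its flags sit behind
// a pointer that a BPF filter cannot dereference.
func decide(name string, arg0 uint64) action {
	switch {
	case killTier[name]:
		return killProcess
	case epermTier[name]:
		return denyEPERM
	case name == "clone" && arg0&cloneNewMask != 0:
		return denyEPERM
	case name == "socket" && blockedFamilies[arg0] != "":
		return denyEPERM
	}
	return allow
}

func main() {
	fmt.Println(decide("bpf", 0))              // kill tier
	fmt.Println(decide("clone", 0x10000000))   // CLONE_NEWUSER
	fmt.Println(decide("socket", 2))           // AF_INET
}
```

A real filter expresses the same decisions as BPF instructions evaluated by the kernel; the sketch only illustrates why a fixed syscall number or a scalar argument can be matched while a struct behind a pointer cannot.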

go.mod

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ require (
99
github.com/cenkalti/backoff/v5 v5.0.3
1010
github.com/containers/gvisor-tap-vsock v0.8.8
1111
github.com/creack/pty v1.1.24
12+
github.com/elastic/go-seccomp-bpf v1.6.0
1213
github.com/gofrs/flock v0.13.0
1314
github.com/google/go-containerregistry v0.21.2
1415
github.com/klauspost/compress v1.18.4

go.sum

Lines changed: 6 additions & 12 deletions
@@ -39,6 +39,8 @@ github.com/docker/go-connections v0.5.0 h1:USnMq7hx7gwdVZq1L49hLXaFtUdTADjXGp+uj
3939
github.com/docker/go-connections v0.5.0/go.mod h1:ov60Kzw0kKElRwhNs9UlUHAE/F9Fe6GLaXnqyDdmEXc=
4040
github.com/docker/go-units v0.5.0 h1:69rxXcBk27SvSaaxTtLh/8llcHD8vYHT7WSdRZ/jvr4=
4141
github.com/docker/go-units v0.5.0/go.mod h1:fgPhTUdO+D/Jk86RDLlptpiXQzgHJF7gydDDbaIK4Dk=
42+
github.com/elastic/go-seccomp-bpf v1.6.0 h1:NYduiYxRJ0ZkIyQVwlSskcqPPSg6ynu5pK0/d7SQATs=
43+
github.com/elastic/go-seccomp-bpf v1.6.0/go.mod h1:5tFsTvH4NtWGfpjsOQD53H8HdVQ+zSZFRUDSGevC0Kc=
4244
github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg=
4345
github.com/felixge/httpsnoop v1.0.4/go.mod h1:m8KPJKqk1gH5J9DgRY2ASl2lWCfGKXixSwevea8zH2U=
4446
github.com/foxcpp/go-mockdns v1.2.0 h1:omK3OrHRD1IWJz1FuFBCFquhXslXoF17OvBS6JPzZF0=
@@ -148,8 +150,6 @@ go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc=
148150
go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg=
149151
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
150152
golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
151-
golang.org/x/crypto v0.48.0 h1:/VRzVqiRSggnhY7gNRxPauEQ5Drw9haKdM0jqfcCFts=
152-
golang.org/x/crypto v0.48.0/go.mod h1:r0kV5h3qnFPlQnBSrULhlsRfryS2pmewsg+XfMgkVos=
153153
golang.org/x/crypto v0.49.0 h1:+Ng2ULVvLHnJ/ZFEq4KdcDd/cfjrrjjNSXNzxg0Y4U4=
154154
golang.org/x/crypto v0.49.0/go.mod h1:ErX4dUh2UM+CFYiXZRTcMpEcN8b/1gxEuv3nODoYtCA=
155155
golang.org/x/lint v0.0.0-20200302205851-738671d3881b/go.mod h1:3xt1FjdF8hUf6vQPIChWIBhFzV8gjjsPE/fR3IyQdNY=
@@ -158,29 +158,23 @@ golang.org/x/mod v0.33.0 h1:tHFzIWbBifEmbwtGz65eaWyGiGZatSrT9prnU8DbVL8=
158158
golang.org/x/mod v0.33.0/go.mod h1:swjeQEj+6r7fODbD2cqrnje9PnziFuw4bmLbBZFrQ5w=
159159
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
160160
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
161-
golang.org/x/net v0.50.0 h1:ucWh9eiCGyDR3vtzso0WMQinm2Dnt8cFMuQa9K33J60=
162-
golang.org/x/net v0.50.0/go.mod h1:UgoSli3F/pBgdJBHCTc+tp3gmrU4XswgGRgtnwWTfyM=
163161
golang.org/x/net v0.51.0 h1:94R/GTO7mt3/4wIKpcR5gkGmRLOuE/2hNGeWq/GBIFo=
164162
golang.org/x/net v0.51.0/go.mod h1:aamm+2QF5ogm02fjy5Bb7CQ0WMt1/WVM7FtyaTLlA9Y=
165163
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
166-
golang.org/x/sync v0.19.0 h1:vV+1eWNmZ5geRlYjzm2adRgW2/mcpevXNg50YZtPCE4=
167-
golang.org/x/sync v0.19.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
168164
golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
169165
golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
170166
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
171167
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
172168
golang.org/x/sys v0.0.0-20210616094352-59db8d763f22/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
173169
golang.org/x/sys v0.2.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
174170
golang.org/x/sys v0.10.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
175-
golang.org/x/sys v0.41.0 h1:Ivj+2Cp/ylzLiEU89QhWblYnOE9zerudt9Ftecq2C6k=
176-
golang.org/x/sys v0.41.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
177171
golang.org/x/sys v0.42.0 h1:omrd2nAlyT5ESRdCLYdm3+fMfNFE/+Rf4bDIQImRJeo=
178172
golang.org/x/sys v0.42.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
179-
golang.org/x/term v0.40.0 h1:36e4zGLqU4yhjlmxEaagx2KuYbJq3EwY8K943ZsHcvg=
180-
golang.org/x/term v0.40.0/go.mod h1:w2P8uVp06p2iyKKuvXIm7N/y0UCRt3UfJTfZ7oOpglM=
173+
golang.org/x/term v0.41.0 h1:QCgPso/Q3RTJx2Th4bDLqML4W6iJiaXFq2/ftQF13YU=
174+
golang.org/x/term v0.41.0/go.mod h1:3pfBgksrReYfZ5lvYM0kSO0LIkAl4Yl2bXOkKP7Ec2A=
181175
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
182-
golang.org/x/text v0.34.0 h1:oL/Qq0Kdaqxa1KbNeMKwQq0reLCCaFtqu2eNuSeNHbk=
183-
golang.org/x/text v0.34.0/go.mod h1:homfLqTYRFyVYemLBFl5GgL/DWEiH5wcsQ5gSh1yziA=
176+
golang.org/x/text v0.35.0 h1:JOVx6vVDFokkpaq1AEptVzLTpDe9KGpj5tR4/X+ybL8=
177+
golang.org/x/text v0.35.0/go.mod h1:khi/HExzZJ2pGnjenulevKNX1W67CUy0AsXcNubPGCA=
184178
golang.org/x/time v0.5.0 h1:o7cqy6amK/52YcAKIPlM3a+Fpj35zvRj2TP+e1xFSfk=
185179
golang.org/x/time v0.5.0/go.mod h1:3BpzKBy/shNhVucY/MWOyx10tF3SFh9QdLuxbVysPQM=
186180
golang.org/x/tools v0.0.0-20200130002326-2f3ba24bd6e7/go.mod h1:TB2adYChydJhpapKDTa4BR/hXlZSLoq2Wpct/0txZ28=

guest/boot/boot.go

Lines changed: 10 additions & 0 deletions
@@ -36,6 +36,7 @@ import (
3636
// 8. Parse SSH authorized keys
3737
// 9. Drop bounding capabilities + set no_new_privs
3838
// 10. Start SSH server
39+
// 11. Apply seccomp BPF filter (if enabled via [WithSeccomp])
3940
func Run(logger *slog.Logger, opts ...Option) (shutdown func(), err error) {
4041
cfg := defaultConfig()
4142
for _, o := range opts {
@@ -156,6 +157,15 @@ func Run(logger *slog.Logger, opts ...Option) (shutdown func(), err error) {
156157
}
157158
}()
158159

160+
// 11. Apply seccomp BPF filter (if enabled). This must be the last
161+
// step — all mounts, networking, and privileged operations are done.
162+
if cfg.seccomp {
163+
logger.Info("applying seccomp BPF filter")
164+
if err := harden.ApplySeccomp(); err != nil {
165+
return nil, fmt.Errorf("applying seccomp filter: %w", err)
166+
}
167+
}
168+
159169
logger.Info("sandbox init ready", "ssh_port", cfg.sshPort)
160170

161171
return func() { srv.Close() }, nil
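
The `%w` wrapping in the hunk above means a caller of `boot.Run` can still test for the underlying cause with `errors.Is`. A minimal sketch of that behaviour, using hypothetical stand-ins (`errFilterRejected`, `applySeccomp`, `bootStep`, `isRejected`) rather than the real `harden` package:

```go
package main

import (
	"errors"
	"fmt"
)

// errFilterRejected is a hypothetical sentinel; the real ApplySeccomp
// may return a different error type.
var errFilterRejected = errors.New("seccomp: filter rejected")

// applySeccomp is a stub for harden.ApplySeccomp that can be made to fail.
func applySeccomp(fail bool) error {
	if fail {
		return errFilterRejected
	}
	return nil
}

// bootStep mirrors step 11 in the diff: apply the filter and wrap any
// failure with context using %w so the cause stays inspectable.
func bootStep(fail bool) error {
	if err := applySeccomp(fail); err != nil {
		return fmt.Errorf("applying seccomp filter: %w", err)
	}
	return nil
}

// isRejected reports whether err wraps the sentinel anywhere in its chain.
func isRejected(err error) bool {
	return errors.Is(err, errFilterRejected)
}

func main() {
	err := bootStep(true)
	fmt.Println(err)
	fmt.Println(isRejected(err))
}
```

Because the boot sequence returns the error instead of continuing, a guest that asked for the filter does not silently run without it.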

guest/boot/options.go

Lines changed: 17 additions & 0 deletions
@@ -32,6 +32,7 @@ type config struct {
3232
userGID uint32
3333
lockdownRoot bool
3434
sshAgentForwarding bool
35+
seccomp bool
3536
}
3637

3738
func defaultConfig() *config {
@@ -114,3 +115,19 @@ func WithSSHHostKeyPath(path string) Option {
114115
func WithSSHAgentForwarding(enabled bool) Option {
115116
return optionFunc(func(c *config) { c.sshAgentForwarding = enabled })
116117
}
118+
119+
// WithSeccomp controls whether a seccomp BPF blocklist filter is
120+
// applied as the last step of the boot sequence. When enabled, the
121+
// filter blocks dangerous syscalls (io_uring, ptrace, bpf, mount,
122+
// namespace creation, etc.) while allowing normal workload operation.
123+
//
124+
// The filter is applied after the SSH server starts so all privileged
125+
// operations are already complete. It is inherited by all child
126+
// processes via fork+exec.
127+
//
128+
// Consumers that need to perform additional privileged operations
129+
// after boot (e.g. custom mounts) should leave this disabled and
130+
// call [harden.ApplySeccomp] manually at the appropriate time.
131+
func WithSeccomp(enabled bool) Option {
132+
return optionFunc(func(c *config) { c.seccomp = enabled })
133+
}
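
`WithSeccomp` follows the functional-options pattern already used by the other `With...` helpers in this file. The sketch below is self-contained and makes assumptions the diff does not show: the `Option` interface with an `apply` method, the reduced `config` struct, and the `run` function with its step names are all hypothetical and stand in for the real `boot` package.

```go
package main

import "fmt"

// config is reduced to the one field added by this commit.
type config struct {
	seccomp bool
}

// Option and optionFunc are assumed shapes; the diff only shows that
// optionFunc wraps a func(*config).
type Option interface{ apply(*config) }

type optionFunc func(*config)

func (f optionFunc) apply(c *config) { f(c) }

// WithSeccomp matches the signature in the diff.
func WithSeccomp(enabled bool) Option {
	return optionFunc(func(c *config) { c.seccomp = enabled })
}

// run is a hypothetical stand-in for boot.Run. It records the order of
// the boot steps and appends the seccomp step last, only when enabled.
func run(opts ...Option) []string {
	cfg := &config{} // seccomp defaults to false in this sketch
	for _, o := range opts {
		o.apply(cfg)
	}
	steps := []string{"mounts", "network", "drop-caps", "ssh"}
	if cfg.seccomp {
		steps = append(steps, "seccomp")
	}
	return steps
}

func main() {
	fmt.Println(run())                  // filter not applied
	fmt.Println(run(WithSeccomp(true))) // filter applied as the final step
}
```

Consumers that need privileged work after boot would leave the option off and call the filter themselves once that work is done, as the doc comment above describes.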

guest/harden/doc.go

Lines changed: 9 additions & 8 deletions
@@ -3,13 +3,14 @@
33

44
//go:build linux
55

6-
// Package harden provides guest-side kernel and capability hardening for
7-
// microVM init processes. It restricts kernel information leaks, limits
8-
// unprivileged access to dangerous subsystems, and drops unneeded
9-
// capabilities from the bounding set.
6+
// Package harden provides guest-side kernel, capability, and syscall
7+
// hardening for microVM init processes. It restricts kernel information
8+
// leaks, limits unprivileged access to dangerous subsystems, drops
9+
// unneeded capabilities from the bounding set, and installs a seccomp
10+
// BPF filter that blocks dangerous syscalls.
1011
//
11-
// Consumers (e.g. apiary-init) should call [KernelDefaults] early in the
12-
// boot sequence (after /proc is mounted) and [DropBoundingCaps] last,
13-
// just before starting the workload, so that all privileged operations
14-
// (mounts, network config, chown) are already complete.
12+
// Consumers should call [KernelDefaults] early in the boot sequence
13+
// (after /proc is mounted), [DropBoundingCaps] after all privileged
14+
// operations are complete, and [ApplySeccomp] as the very last
15+
// hardening step — after mounts, networking, and SSH are ready.
1516
package harden
