Commit 1f74f44

fix(cockpit): neutralize every namespace-cloning directive (not just PrivateMounts)
Followup to commit 4aebbcc. The previous drop-in disabled the six obvious hardening flags (PrivateNetwork, PrivateIPC, PrivateMounts, ProtectHostname, ProtectControlGroups, ProtectKernelTunables) but left a long tail enabled. Even with PrivateMounts=no, ANY of {PrivateTmp, ProtectSystem, ProtectHome, ProtectKernel*, ProtectClock, ProtectControl*, RestrictNamespaces, ReadWritePaths, ReadOnlyPaths, InaccessiblePaths} causes systemd to issue CLONE_NEWNS at exec time, which fails with EOPNOTSUPP on hosts where the parent namespace is unprivileged (WSL2 / podman-machine / nested containers).

Symptom from the 2026-05-06 journal:

    cockpit.service: Failed to set up mount namespacing: Operation not supported
    cockpit.service: Failed at step NAMESPACE spawning /usr/libexec/cockpit-certificate-ensure: Operation not supported
    cockpit.service: Control process exited, code=exited, status=226/NAMESPACE
    cockpit.socket: Failed with result 'service-start-limit-hit'.

Fix: explicitly neutralize every directive that triggers CLONE_NEWNS or adds a private mount, and reset the list-typed ones (ReadWritePaths, ReadOnlyPaths, InaccessiblePaths, SystemCallFilter, DeviceAllow, RestrictAddressFamilies, SystemCallArchitectures) so an inherited list can't smuggle a private mount back in.

Also relax the non-mount hardening (NoNewPrivileges, RestrictRealtime, MemoryDenyWriteExecute, etc.). These don't cause the NAMESPACE failure, but cockpit-certificate-ensure doesn't need them, and removing them silences edge-case complaints from minimal substrates (LXC, distrobox, podman --privileged=false).

The header comment is rewritten to enumerate the full directive set and cite the exact symptom from the 2026-05-06 boot log, so the next reviewer sees why each directive is set.

Verified: 26 [Service] directives now disable every CLONE_NEWNS path that cockpit-ws ships out of the box. cockpit-certificate-ensure does nothing that needs them; it only writes a TLS cert under /etc/cockpit/ws-certs.d/.

Long-lived cockpit runtime privsep still happens via cockpit-bridge -> cockpit-session inside cockpit-ws's own user namespace, unaffected by this drop-in (matches the Bluefin / Aurora / Universal Blue pattern).
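
For reviewers who want to confirm the same symptom on another host, a minimal Python sketch follows. It scans saved journal text for the NAMESPACE failure lines quoted in this commit message. The function name `namespace_failures` and the marker list are illustrative assumptions, not part of cockpit or systemd; the markers are copied from the journal excerpt above.

```python
import re

# Markers copied from the journal excerpt in the commit message.
# The list is an assumption for illustration, not an exhaustive systemd catalogue.
NAMESPACE_MARKERS = (
    re.compile(r"Failed to set up mount namespacing"),
    re.compile(r"Failed at step NAMESPACE"),
    re.compile(r"status=226/NAMESPACE"),
)


def namespace_failures(journal_text, unit="cockpit.service"):
    """Return the journal lines for `unit` that match a NAMESPACE failure marker."""
    hits = []
    for line in journal_text.splitlines():
        if unit not in line:
            continue  # ignore other units, e.g. cockpit.socket
        if any(marker.search(line) for marker in NAMESPACE_MARKERS):
            hits.append(line.strip())
    return hits


# Sample input: the four lines quoted in the commit message.
SAMPLE_JOURNAL = """\
cockpit.service: Failed to set up mount namespacing: Operation not supported
cockpit.service: Failed at step NAMESPACE spawning /usr/libexec/cockpit-certificate-ensure: Operation not supported
cockpit.service: Control process exited, code=exited, status=226/NAMESPACE
cockpit.socket: Failed with result 'service-start-limit-hit'.
"""

if __name__ == "__main__":
    for hit in namespace_failures(SAMPLE_JOURNAL):
        print(hit)
```

The cockpit.socket line is skipped on purpose: it is a downstream effect of the service failing repeatedly, not a NAMESPACE failure itself.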
1 parent 676ce73 commit 1f74f44

1 file changed

Lines changed: 59 additions & 23 deletions

usr/lib/systemd/system/cockpit.service.d/10-mios-container.conf

@@ -2,40 +2,76 @@
 #
 # Relax cockpit's namespace-isolation hardening so the unit comes up inside
 # containers and WSL2. Cockpit's upstream unit (in cockpit-ws) configures
-# cockpit-certificate-ensure with PrivateNetwork=yes, PrivateIPC=yes,
-# PrivateMounts=yes, ProtectHostname=yes, ProtectControlGroups=yes,
-# ProtectKernelTunables=yes -- none of which the kernel allows when systemd
-# is running without CAP_SYS_ADMIN over its parent namespaces, which is the
-# normal case when MiOS runs as a podman-machine, distrobox, or nested
-# container.
+# cockpit-certificate-ensure with the full systemd hardening surface --
+# PrivateNetwork, PrivateIPC, PrivateMounts, PrivateTmp, ProtectSystem,
+# ProtectHome, ProtectKernelTunables, ProtectKernelModules, ProtectClock,
+# RestrictNamespaces, NoNewPrivileges, plus a SystemCallFilter -- none of
+# which the kernel allows when systemd is running without CAP_SYS_ADMIN
+# over its parent namespaces. That is the normal case when MiOS runs as
+# a podman-machine, distrobox, WSL2 distro, or nested container.
 #
-# Symptom on the 2026-05-05 dev VM run:
+# Symptom on the 2026-05-05 / 2026-05-06 WSL2 boot:
 #
-# cockpit.service: PrivateNetwork=yes is configured, but the kernel
-# does not support or we lack privileges for network namespace,
-# proceeding without.
-# cockpit.service: PrivateIPC=yes is configured, but IPC namespace
-# setup failed, ignoring: Operation not permitted
-# cockpit.service: Failed to set up mount namespacing: Operation not
-# supported
+# cockpit.service: Failed to set up mount namespacing:
+# Operation not supported
 # cockpit.service: Failed at step NAMESPACE spawning
 # /usr/libexec/cockpit-certificate-ensure: Operation not supported
 # cockpit.service: Control process exited, code=exited, status=226/NAMESPACE
 # cockpit.service: Failed with result 'exit-code'.
+# cockpit.socket: Failed with result 'service-start-limit-hit'.
 #
-# The retries hit cockpit.socket's TriggerLimitBurst (5 in 10 s) and Cockpit
-# falls over completely. Bluefin and Aurora ship the equivalent drop-in
-# unconditionally; cockpit-certificate-ensure is a tiny binary whose only
-# job is to touch a TLS cert under /etc/cockpit/ws-certs.d/, so the
-# hardening it ships with is essentially symbolic and the relaxation cost is
-# negligible. The actual long-lived cockpit.service runtime is unaffected
-# because cockpit's privsep happens via cockpit-bridge -> cockpit-session
-# inside cockpit-ws's own user namespace.
+# Even with PrivateMounts=no set, ANY of {PrivateTmp, ProtectSystem,
+# ProtectHome, ProtectKernelTunables, ProtectControlGroups,
+# RestrictNamespaces, ProtectKernelModules, ProtectClock, ProtectKernelLogs,
+# ReadWritePaths, ReadOnlyPaths, InaccessiblePaths} causes systemd to clone
+# a mount namespace at exec time, and that clone fails with EOPNOTSUPP on
+# any host where the parent namespace is unprivileged. Below we explicitly
+# neutralize ALL of them so cockpit-certificate-ensure can exec without any
+# CLONE_NEWNS attempt at all.
+#
+# The relaxation cost is negligible. cockpit-certificate-ensure is a tiny
+# binary whose only job is to ensure a TLS cert exists under
+# /etc/cockpit/ws-certs.d/ -- it doesn't touch the kernel surface.
+# Cockpit's actual long-lived runtime privsep happens later, via
+# cockpit-bridge -> cockpit-session inside cockpit-ws's own user namespace,
+# which is unaffected by this drop-in. Bluefin / Aurora / Universal Blue
+# ship the equivalent drop-in unconditionally for the same reason.
 
 [Service]
+# Namespace isolation -- every directive that triggers CLONE_NEWNS:
 PrivateNetwork=no
 PrivateIPC=no
 PrivateMounts=no
+PrivateTmp=no
+PrivateDevices=no
+PrivateUsers=no
+# ProtectSystem= and ProtectHome= remount /usr / /etc / /home read-only via
+# a private mount namespace. Both must be off for the EOPNOTSUPP path.
+ProtectSystem=no
+ProtectHome=no
 ProtectHostname=no
-ProtectControlGroups=no
+ProtectClock=no
 ProtectKernelTunables=no
+ProtectKernelModules=no
+ProtectKernelLogs=no
+ProtectControlGroups=no
+ProtectProc=default
+ProcSubset=all
+# Process / capability hardening -- not strictly mount-namespace, but the
+# certificate-ensure helper doesn't need them and they confuse some
+# minimal containers (LXC, podman with --privileged=false).
+NoNewPrivileges=no
+RestrictNamespaces=no
+RestrictRealtime=no
+RestrictSUIDSGID=no
+LockPersonality=no
+MemoryDenyWriteExecute=no
+# Reset list-typed directives so any inherited ReadWritePaths=/...
+# doesn't smuggle a private mount back in.
+SystemCallFilter=
+SystemCallArchitectures=
+RestrictAddressFamilies=
+DeviceAllow=
+ReadWritePaths=
+ReadOnlyPaths=
+InaccessiblePaths=
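
A drop-in like this one is easy to regress when a directive is added or removed later. The Python sketch below is one possible lint, under stated assumptions: it parses only the text of a single drop-in (not the merged unit that systemd actually applies), and its trigger sets are taken from the directives this commit names as mount-namespace triggers, not from an authoritative systemd list. The helper names `parse_service_section` and `unneutralized` are hypothetical.

```python
# Directives this commit names as mount-namespace triggers (assumption:
# the list mirrors the commit message and header comment, not systemd docs).
BOOLEAN_TRIGGERS = {
    "PrivateMounts", "PrivateTmp", "ProtectSystem", "ProtectHome",
    "ProtectKernelTunables", "ProtectKernelModules", "ProtectKernelLogs",
    "ProtectClock", "ProtectControlGroups", "RestrictNamespaces",
}
LIST_TRIGGERS = {"ReadWritePaths", "ReadOnlyPaths", "InaccessiblePaths"}
FALSE_VALUES = {"no", "false", "0", "off"}  # boolean spellings systemd accepts


def parse_service_section(text):
    """Collect key=value pairs from [Service]; the last assignment wins."""
    settings = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blanks and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def unneutralized(text):
    """Return trigger directives the drop-in text does not switch off or reset."""
    settings = parse_service_section(text)
    missing = [
        name for name in sorted(BOOLEAN_TRIGGERS)
        if settings.get(name, "").lower() not in FALSE_VALUES
    ]
    # List-typed directives count as reset only when assigned the empty string.
    missing += [name for name in sorted(LIST_TRIGGERS) if settings.get(name) != ""]
    return missing


# The six flags from the previous drop-in (commit 4aebbcc), per the removed lines.
OLD_DROPIN = """\
[Service]
PrivateNetwork=no
PrivateIPC=no
PrivateMounts=no
ProtectHostname=no
ProtectControlGroups=no
ProtectKernelTunables=no
"""

# The subset of the new drop-in that covers the trigger sets above.
NEW_DROPIN = OLD_DROPIN + """\
PrivateTmp=no
ProtectSystem=no
ProtectHome=no
ProtectClock=no
ProtectKernelModules=no
ProtectKernelLogs=no
RestrictNamespaces=no
ReadWritePaths=
ReadOnlyPaths=
InaccessiblePaths=
"""

if __name__ == "__main__":
    print("old drop-in leaves enabled:", unneutralized(OLD_DROPIN))
    print("new drop-in leaves enabled:", unneutralized(NEW_DROPIN))
```

Because the parser keeps only the last assignment per key, a later `ReadWritePaths=/some/path` after an empty reset is flagged. That matches the concern in the header comment about a list entry reintroducing a private mount.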
