Retry shim start without userns on clone failure by dmcgowan · Pull Request #150 · containerd/nerdbox

dmcgowan · 2026-04-07T23:01:58Z

cloneMntNs sets CLONE_NEWUSER|CLONE_NEWNS on the child, but clone can fail for reasons the proactive AppArmor sysctl check cannot detect — seccomp filters, other LSM policies, or EACCES when inherited socket fds cross the user namespace boundary after exec triggers capability recomputation.

Return whether namespace flags were set so the caller can distinguish a namespace-related Start failure from an unrelated one. On failure, rebuild the command without clone flags and retry, degrading gracefully to no mount isolation rather than failing the container start entirely.

Copilot

Pull request overview

This PR makes shim startup more resilient on Linux by retrying Start() without user/mount namespace clone flags when the initial start fails after enabling user namespaces, allowing the shim to degrade to no mount isolation instead of failing the container start.

Changes:

Change cloneMntNs to return a boolean indicating whether userns/mntns clone flags were applied.
On Unix, retry shim cmd.Start() without namespace isolation when the first start fails and cloneMntNs had enabled user namespaces.
Update non-Linux stub implementation to match the new cloneMntNs signature.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
internal/shim/manager/mount_other.go	Update non-Linux `cloneMntNs` stub to return `bool` (always `false`).
internal/shim/manager/mount_linux.go	Return whether userns/mntns clone flags were applied so callers can distinguish the path taken.
internal/shim/manager/manager_unix.go	Retry shim start without namespace isolation when a userns-enabled start fails; add logging for the fallback.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-07T23:06:55Z

internal/shim/manager/manager_unix.go

+		cmd.ExtraFiles = append(cmd.ExtraFiles, sockets[0].f)
+		if opts.Debug && len(sockets) > 1 {
+			cmd.ExtraFiles = append(cmd.ExtraFiles, sockets[1].f)


The retry path rebuilds cmd.ExtraFiles by indexing into sockets (always appending sockets[0] and conditionally sockets[1]). This is tightly coupled to the current socket creation logic and will silently break if additional sockets are ever added (or ordering changes). Consider rebuilding ExtraFiles by iterating over the existing sockets slice (or factoring the socket-attachment logic into a helper) so the retry stays correct as the surrounding code evolves.

Suggested change

cmd.ExtraFiles = append(cmd.ExtraFiles, sockets[0].f)

if opts.Debug && len(sockets) > 1 {

cmd.ExtraFiles = append(cmd.ExtraFiles, sockets[1].f)

for _, socket := range sockets {

cmd.ExtraFiles = append(cmd.ExtraFiles, socket.f)

Copilot · 2026-04-07T23:06:55Z

internal/shim/manager/manager_unix.go

+	userns := cloneMntNs(ctx, cmd)

 	if err := cmd.Start(); err != nil {
-		return params, err
+		if !userns {
+			return params, err


The new fallback behavior (retrying cmd.Start() without namespace isolation when a userns-enabled start fails) isn’t covered by current tests. Consider adding an integration test that forces the first cmd.Start() to fail after cloneMntNs sets userns flags and asserts the retry succeeds and stdout remains parseable JSON.

cloneMntNs sets CLONE_NEWUSER|CLONE_NEWNS on the child, but clone can fail for reasons the proactive AppArmor sysctl check cannot detect — seccomp filters, other LSM policies, or EACCES when inherited socket fds cross the user namespace boundary after exec triggers capability recomputation. Return whether namespace flags were set so the caller can distinguish a namespace-related Start failure from an unrelated one. On failure, rebuild the command without clone flags and retry, degrading gracefully to no mount isolation rather than failing the container start entirely. Signed-off-by: Derek McGowan <derek@mcg.dev>

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-11T06:31:12Z

internal/shim/manager/mount_linux.go

 // unprivileged user namespaces), the function logs a warning and the shim
 // will run without mount isolation.


The header comment says the function "logs a warning" when namespace creation is not possible, but the implementation explicitly avoids logging (and now returns false). Please update the comment to match actual behavior (e.g., "skips mount isolation" / "returns false") to avoid misleading future readers.

Suggested change

// unprivileged user namespaces), the function logs a warning and the shim

// will run without mount isolation.

// unprivileged user namespaces), the shim runs without mount isolation

// and this function returns false.

Copilot · 2026-04-11T06:31:13Z

internal/shim/manager/manager_unix.go

 	if err := cmd.Start(); err != nil {
-		return params, err
+		if !userns {
+			return params, err
+		}
+		// clone(CLONE_NEWUSER) can fail for reasons not covered by the
+		// proactive AppArmor check — e.g. seccomp filters, LSM policies,
+		// or EACCES from the child's capability recomputation when
+		// inherited socket fds cross the user namespace boundary after
+		// exec. Retry without namespace isolation rather than failing
+		// the container start.
+		//
+		// Note: we cannot log here — during "start" the logger output
+		// goes to stderr which containerd captures as part of the
+		// bootstrap response (CombinedOutput), corrupting the JSON.
+		cmd, err = newCommand(ctx, id, opts.Address, opts.TTRPCAddress, opts.Debug)
+		if err != nil {
+			return params, err
+		}
+		cmd.ExtraFiles = append(cmd.ExtraFiles, sockets[0].f)
+		if opts.Debug && len(sockets) > 1 {
+			cmd.ExtraFiles = append(cmd.ExtraFiles, sockets[1].f)
+		}
+		if err := cmd.Start(); err != nil {
+			return params, err
+		}


If the retry without userns also fails, the returned error currently only reflects the second failure, losing the original clone-related error which is often the key diagnostic. Preserve the first Start() error and wrap/aggregate it with the retry error so users can see both causes.

Copilot · 2026-04-11T06:31:13Z

internal/shim/manager/manager_unix.go

 	if err := cmd.Start(); err != nil {
-		return params, err
+		if !userns {
+			return params, err
+		}
+		// clone(CLONE_NEWUSER) can fail for reasons not covered by the
+		// proactive AppArmor check — e.g. seccomp filters, LSM policies,
+		// or EACCES from the child's capability recomputation when
+		// inherited socket fds cross the user namespace boundary after
+		// exec. Retry without namespace isolation rather than failing
+		// the container start.


New fallback behavior (retrying Start() without user+mount namespaces after a clone failure) is not covered by existing integration tests. Please add a test that forces clone(CLONE_NEWUSER) to fail (e.g., via a seccomp filter/LSM simulation in test harness if available) and asserts Start still succeeds and returns valid bootstrap params.

Copilot AI review requested due to automatic review settings April 7, 2026 23:01

Copilot started reviewing on behalf of dmcgowan April 7, 2026 23:02 View session

Copilot AI reviewed Apr 7, 2026

View reviewed changes

dmcgowan force-pushed the retry-on-userns-failure branch from ccfb76d to c010af3 Compare April 8, 2026 23:21

Copilot AI review requested due to automatic review settings April 11, 2026 06:28

dmcgowan force-pushed the retry-on-userns-failure branch from c010af3 to f68d962 Compare April 11, 2026 06:28

Copilot started reviewing on behalf of dmcgowan April 11, 2026 06:28 View session

Copilot AI reviewed Apr 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retry shim start without userns on clone failure#150

Retry shim start without userns on clone failure#150
dmcgowan wants to merge 1 commit intocontainerd:mainfrom
dmcgowan:retry-on-userns-failure

dmcgowan commented Apr 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 7, 2026

Uh oh!

Copilot AI Apr 7, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		// unprivileged user namespaces), the function logs a warning and the shim
		// will run without mount isolation.

Conversation

dmcgowan commented Apr 7, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants