-
Notifications
You must be signed in to change notification settings - Fork 349
feat(envd): run envd at SCHED_FIFO 1, reset user processes via wrapper #2684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
c4098f1
bad6807
d02cab5
70c1d4c
1acfcc4
fbce53f
27c8fa0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -96,13 +96,14 @@ func New( | |
| // User command string for logging (without the internal wrapper details). | ||
| userCmd := strings.Join(append([]string{req.GetProcess().GetCmd()}, req.GetProcess().GetArgs()...), " ") | ||
|
|
||
| // Wrap the command in a shell that sets the OOM score and nice value before exec-ing the actual command. | ||
| // This eliminates the race window where grandchildren could inherit the parent's protected OOM score (-1000) | ||
| // or high CPU priority (nice -20) before the post-start calls had a chance to correct them. | ||
| // nice(1) applies a relative adjustment, so we compute the delta from the current (inherited) nice to the target. | ||
| // Reset oom_score_adj, drop SCHED_FIFO via chrt, drop the SYS_NICE | ||
| // ambient cap, then apply nice. | ||
| niceDelta := defaultNice - currentNice() | ||
| oomWrapperScript := fmt.Sprintf(`echo %d > /proc/$$/oom_score_adj && exec /usr/bin/nice -n %d "${@}"`, defaultOomScore, niceDelta) | ||
| wrapperArgs := append([]string{"-c", oomWrapperScript, "--", req.GetProcess().GetCmd()}, req.GetProcess().GetArgs()...) | ||
| wrapperScript := fmt.Sprintf( | ||
| `echo %d > /proc/$$/oom_score_adj && exec /usr/bin/chrt --other 0 /usr/bin/setpriv --ambient-caps -all -- /usr/bin/nice -n %d "${@}"`, | ||
| defaultOomScore, niceDelta, | ||
| ) | ||
| wrapperArgs := append([]string{"-c", wrapperScript, "--", req.GetProcess().GetCmd()}, req.GetProcess().GetArgs()...) | ||
|
Comment on lines
+99
to
+106
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 envd version in Extended reasoning...The convention CLAUDE.md:133 states explicitly: "Version in Behavioral changes in this PR This PR contains three clearly non-docs changes to envd runtime behavior:
None of these are comment/doc-only. Yet Why this matters semantically (not just procedurally) The version constant is consumed by version-based feature detection in Proof / step-by-step
Fix Bump |
||
| cmd := exec.CommandContext(ctx, "/bin/sh", wrapperArgs...) | ||
|
|
||
| uid, gid, err := permissions.GetUserIdUints(user) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Nit: the wrapper's
setpriv --ambient-caps -allonly clears the ambient set, not the inheritable set. Go'sAmbientCapsmachinery raises CAP_SYS_NICE in both ambient and inheritable beforePR_CAP_AMBIENT_RAISE(see Go'ssyscall/exec_linux.goforkAndExecInChild1), and the inheritable bit survives execve. So the user command ends up withCAP_SYS_NICEstill in P(inh), contradicting the comment's stated intent ("drop the SYS_NICE ambient cap"). Trivial fix: append--inh-caps -allto thesetprivinvocation.Extended reasoning...
What's leaking
The wrapper script on
handler.go:102-105does:The stated intent (from the diff comment) is to "drop the SYS_NICE ambient cap, then apply nice". But after this chain runs, the final user process actually inherits
CAP_SYS_NICEin its inheritable capability set.Why
When
SysProcAttr.AmbientCapsis non-empty, Go does not just raise ambient. Looking at/usr/local/go/src/syscall/exec_linux.goforkAndExecInChild1 (lines 519–524):It then calls
capset()and only afterwardsPR_CAP_AMBIENT_RAISE. That's required by the kernel — you can't raise ambient unless the cap is already in both permitted and inheritable. So by the time/bin/shexec's, P(perm) and P(inh) both contain CAP_SYS_NICE.Across execve (capabilities(7)):
P'(inheritable) = P(inheritable)— the inheritable set is preserved, only permitted/effective get recomputed against file caps.setpriv --ambient-caps -allper its man page issuesPR_CAP_AMBIENT_LOWERfor each capability; it does not callcapset()and so does not touch the inheritable set. So after the entiresh → chrt → setpriv → nice → user_cmdchain, the user process still hasCAP_SYS_NICEin P(inh).Step-by-step proof
/bin/shwithAmbientCaps=[CAP_SYS_NICE]. Go raises permitted+inheritable, then ambient. After execve/bin/shhas: P(perm)={SYS_NICE}, P(inh)={SYS_NICE}, P(amb)={SYS_NICE}./bin/shexec'schrt— caps unchanged through execve except ambient bits get promoted into permitted+effective per ambient rules. Still P(inh)={SYS_NICE}.chrtexec'ssetpriv --ambient-caps -all. setpriv clears the ambient set → P(amb)={}. But P(inh) is untouched → P(inh)={SYS_NICE}.nice, then user_cmd. Across each execve, P(inh) is preserved. Final state for the user command: P(inh)={SYS_NICE}, P(amb)={}, P(perm)/P(eff) computed from F(inh)/F(perm) of the user binary.Verifiable from a shell inside the sandbox with
grep ^Cap /proc/self/status—CapInhwill be0000000000800000(bit 23 = CAP_SYS_NICE) instead of0000000000000000.Impact (why this is a nit, not a real bug)
For ordinary binaries with no file capabilities,
P'(permitted) = (F(perm) & bset) | (F(inh) & P(inh)) | P(amb)yields zero — F(inh) is empty, so the inheritable leak grants nothing. Exploitation requires a binary withCAP_SYS_NICE+iefile caps, and:CAP_SETFCAP).So in practice the leak is inert. I'm filing this as a nit rather than
normal: it's a defense-in-depth refinement that matches the comment's stated intent, not a user-visible escalation path. Acknowledging the refutation: yes, the refuter is correct that exploitability is essentially zero in the default sandbox, which is why I'm not arguing this blocks the merge.Fix
Optionally also clear
CAP_SYS_NICEfrom the bounding set (--bounding-set -allor scoped) for a tighter posture. With--inh-caps -all, P(inh) on the final user process is empty, matching the wrapper's stated intent.