
feat(envd): run envd at SCHED_FIFO 1, reset user processes via wrapper#2684

Closed
ValentaTomas wants to merge 7 commits into main from feat/envd-rt-priority

Conversation

@ValentaTomas

@ValentaTomas ValentaTomas commented May 17, 2026

Member

envd at SCHED_FIFO priority 1; user processes reset to SCHED_OTHER via wrapper + AmbientCaps[CAP_SYS_NICE] + setpriv. socat keeps RT intentionally.

@cla-bot cla-bot Bot added the cla-signed label May 17, 2026
@cursor

cursor Bot commented May 17, 2026


PR Summary

Medium Risk
Changes process launch wrapper to manipulate scheduling, capabilities, and OOM adjustment; failures or missing binaries could prevent user processes from starting or alter runtime priorities unexpectedly.

Overview
User process startup now wraps commands with /bin/sh to write oom_score_adj, force SCHED_OTHER via /usr/bin/chrt, drop ambient caps via /usr/bin/setpriv, then apply nice; this hard-depends on those absolute paths and the script doesn’t handle failures, so missing tools or permission errors writing /proc/$$/oom_score_adj can break process execution. Port-forward socat is explicitly left inheriting envd’s SCHED_FIFO/high priority, which increases the risk of CPU starvation if it misbehaves.

Reviewed by Cursor Bugbot for commit 27c8fa0.

@codecov

codecov Bot commented May 17, 2026


❌ 6 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2620 | 6 | 2614 | 7 |
View the full list of 8 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig

Flake rate in main: 76.90% (Passed 237 times, Failed 789 times)

Stack Traces | 30.1s run time
=== RUN   TestUpdateNetworkConfig
=== PAUSE TestUpdateNetworkConfig
=== CONT  TestUpdateNetworkConfig
--- FAIL: TestUpdateNetworkConfig (30.09s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false

Flake rate in main: 77.43% (Passed 228 times, Failed 782 times)

Stack Traces | 2.75s run time
=== RUN   TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
Executing command curl in sandbox ikel3c2dl5cuhpdicmond
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1363}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
Executing command curl in sandbox ikel3c2dl5cuhpdicmond
    sandbox_network_update_test.go:372: Command [curl] output: event:{start:{pid:1364}}
    sandbox_network_update_test.go:372: Command [curl] output: event:{end:{exit_code:35 exited:true status:"exit status 35" error:"exit status 35"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{start:{pid:1365}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{data:{stdout:"HTTP/2 302 \r\nx-content-type-options: nosniff\r\nlocation: https://dns.google/\r\ndate: Mon, 18 May 2026 08:42:33 GMT\r\ncontent-type: text/html; charset=UTF-8\r\nserver: HTTP server (unknown)\r\ncontent-length: 216\r\nx-xss-protection: 0\r\nx-frame-options: SAMEORIGIN\r\nalt-svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000\r\n\r\n"}}
    sandbox_network_update_test.go:391: Command [curl] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_network_update_test.go:391: Command [curl] completed successfully in sandbox ikel3c2dl5cuhpdicmond
    sandbox_network_update_test.go:391: 
        	Error Trace:	.../api/sandboxes/sandbox_network_out_test.go:74
        	            				.../api/sandboxes/sandbox_network_update_test.go:60
        	            				.../api/sandboxes/sandbox_network_update_test.go:391
        	Error:      	An error is expected but got nil.
        	Test:       	TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false
        	Messages:   	https://8.8.8.8 should be blocked
--- FAIL: TestUpdateNetworkConfig/pause_resume_preserves_allow_internet_access_false (2.75s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestTemplateBuildENV

Flake rate in main: 59.71% (Passed 226 times, Failed 335 times)

Stack Traces | 0s run time
=== RUN   TestTemplateBuildENV
=== PAUSE TestTemplateBuildENV
=== CONT  TestTemplateBuildENV
--- FAIL: TestTemplateBuildENV (0.00s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/templates::TestTemplateBuildENV/ENV_with_multiline_value

Flake rate in main: 60.25% (Passed 219 times, Failed 332 times)

Stack Traces | 8.1s run time
=== RUN   TestTemplateBuildENV/ENV_with_multiline_value
=== PAUSE TestTemplateBuildENV/ENV_with_multiline_value
=== CONT  TestTemplateBuildENV/ENV_with_multiline_value
    build_template_test.go:134: test-ubuntu-env-multiline: [info] Building template 1wbydksbakey0lttsl6e/8fa01c59-ed99-416f-b150-b73f2a76fb03
    build_template_test.go:134: test-ubuntu-env-multiline: [info] CACHED [base] FROM ubuntu:22.04 [ffd709f131f42dfab282de47a91dd2c139e900c1c11fc574b49b517a05ef0a32]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] CACHED [base] DEFAULT USER user [90bdd4afa342293c931373351bf578872dec9179214ba3e8bf9edba311466213]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] [builder 1/2] ENV MULTILINE line1
        line2
        line3 [e93da3f3765f20eb6407c336b9e4e0b9321d994ec5f6cb547743a2a4070eed23]
    build_template_test.go:134: test-ubuntu-env-multiline: [info] [builder 2/2] RUN [[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1 [477610d61cdf858776262d3331809539bcbcf16f706aac18515a57337bae1786]
    build_template_test.go:134: test-ubuntu-env-multiline: [error] Build failed: failed to run command '[[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1': exit status 1
    build_template_test.go:374: Build failed: {<nil> failed to run command '[[ $(echo "$MULTILINE" | wc -l) -eq 3 ]] || exit 1': exit status 1 0xc00078e4f0}
--- FAIL: TestTemplateBuildENV/ENV_with_multiline_value (8.10s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestListDir

Flake rate in main: 53.92% (Passed 282 times, Failed 330 times)

Stack Traces | 0.39s run time
=== RUN   TestListDir
=== PAUSE TestListDir
=== CONT  TestListDir
--- FAIL: TestListDir (0.39s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestListDir/depth_1_lists_root_directory

Flake rate in main: 57.87% (Passed 225 times, Failed 309 times)

Stack Traces | 0.02s run time
=== RUN   TestListDir/depth_1_lists_root_directory
=== PAUSE TestListDir/depth_1_lists_root_directory
=== CONT  TestListDir/depth_1_lists_root_directory
    filesystem_test.go:97: 
        	Error Trace:	.../tests/envd/filesystem_test.go:97
        	Error:      	Received unexpected error:
        	            	unavailable: 502 Bad Gateway
        	Test:       	TestListDir/depth_1_lists_root_directory
--- FAIL: TestListDir/depth_1_lists_root_directory (0.02s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 66.28% (Passed 235 times, Failed 462 times)

Stack Traces | 77s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:26: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (77.04s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 66.96% (Passed 225 times, Failed 456 times)

Stack Traces | 27.1s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1257}}
Executing command bash in sandbox ix4qd9cxp7m0zl9x17yu0 (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 187 MB\nFree memory before tmpfs mount: 797 MB\nMemory to use in integrity test (80% of free, min 64MB): 637 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"637+0 records in\n637+0 records out\n667942912 bytes (668 MB, 637 MiB) copied, 3.8264 s, 175 MB/s\n\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=637\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 3.79\n\tPercent of CPU this job got: 98%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:03.84\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2712\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 2\n\tMinor (reclaiming a frame) page faults: 344\n\tVoluntary context switches: 3\n\tInvoluntary context switches: 16\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 829 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox i27lxycc3q25q44q9ml41
Executing command bash in sandbox i27lxycc3q25q44q9ml41 (user: root)
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{start:{pid:1273}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{data:{stdout:"bd3c2c65643153dc8bd752ed351c0271e6974987ad2be5bf36cfdf1947444350\n"}}
    sandbox_memory_integrity_test.go:74: Command [bash] output: event:{end:{exited:true status:"exit status 0"}}
    sandbox_memory_integrity_test.go:74: Command [bash] completed successfully in sandbox i27lxycc3q25q44q9ml41
Executing command bash in sandbox i27lxycc3q25q44q9ml41 (user: root)
    sandbox_memory_integrity_test.go:99: Command [bash] output: event:{start:{pid:1276}}
    sandbox_memory_integrity_test.go:100: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:100
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox i27lxycc3q25q44q9ml41: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (27.06s)


@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

The AmbientCaps field in syscall.SysProcAttr requires a slice of uintptr, so unix.CAP_SYS_NICE must be cast to uintptr to prevent a compilation error. The systemd unit also requires AmbientCapabilities=CAP_SYS_NICE to ensure the capability is preserved across user ID changes, as the permitted set is otherwise cleared before ambient capabilities are set.

Comment thread packages/envd/internal/services/process/handler/handler_caps_linux.go Outdated
Comment thread packages/orchestrator/pkg/template/build/core/rootfs/files/envd.service.tpl Outdated
@ValentaTomas

Member Author

Superseded by #2700 (move envd into a dedicated network namespace), which addresses the root cause.

@ValentaTomas ValentaTomas reopened this May 17, 2026
@ValentaTomas ValentaTomas marked this pull request as ready for review May 18, 2026 07:00
ValentaTomas and others added 5 commits May 18, 2026 00:23
Mirrors envd's existing Nice=-20 CPU priority with a real-time scheduling
class so envd preempts customer SCHED_OTHER work during pause/resume
storms. Uses the lowest RT priority (1) so envd cannot starve kernel
threads or other RT services; the default kernel.sched_rt_runtime_us
throttle (95% per 1s) caps total RT bandwidth so a hypothetical envd
busy-loop cannot DoS the system.

User-spawned processes are reset to SCHED_OTHER (nice 0, no ambient
caps) via the existing /bin/sh wrapper: CAP_SYS_NICE is passed through
setuid via SysProcAttr.AmbientCaps so chrt(1) can drop the RT policy,
then setpriv(1) strips the ambient cap so the user command cannot
re-raise itself.

socat intentionally inherits SCHED_FIFO + Nice=-20: port forwarding is
infrastructure-critical and dropping connections under load is much
worse than the small RT budget cost.
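
The wrapper chain described in the commit message above can be sketched as a small Go helper. This is a minimal sketch rather than envd's actual handler code: the function name `buildWrapperArgs` and the sample values `100` and `20` are illustrative assumptions, and only the script template is taken from the PR diff.

```go
package main

import "fmt"

// buildWrapperArgs is a hypothetical helper that assembles the argument list
// passed to /bin/sh. The script template mirrors the one in the PR diff:
// write oom_score_adj, switch to SCHED_OTHER with chrt, clear the ambient
// capability set with setpriv, then apply the nice delta and exec the command.
func buildWrapperArgs(oomScore, niceDelta int, cmd string, args []string) []string {
	script := fmt.Sprintf(
		`echo %d > /proc/$$/oom_score_adj && exec /usr/bin/chrt --other 0 /usr/bin/setpriv --ambient-caps -all -- /usr/bin/nice -n %d "${@}"`,
		oomScore, niceDelta,
	)
	// The user command and its arguments follow the script and are expanded
	// by "${@}" inside the shell.
	out := []string{"-c", script, "--", cmd}
	return append(out, args...)
}

func main() {
	// Sample values only; the real defaults live in envd and are not shown here.
	argv := buildWrapperArgs(100, 20, "/usr/bin/python3", []string{"-V"})
	for i, a := range argv {
		fmt.Printf("argv[%d] = %q\n", i, a)
	}
}
```

Because every step is joined with `&&` and `exec`, a missing `/usr/bin/chrt` or `/usr/bin/setpriv` would stop the user command from starting, which is the failure mode the Cursor summary calls out.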
@ValentaTomas ValentaTomas force-pushed the feat/envd-rt-priority branch from c00ceb7 to 1acfcc4 Compare May 18, 2026 07:23

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1acfcc4e93

Comment thread packages/envd/internal/services/process/handler/handler_caps_linux.go Outdated
@ValentaTomas ValentaTomas marked this pull request as draft May 18, 2026 07:57
@ValentaTomas ValentaTomas marked this pull request as ready for review May 18, 2026 08:05

@cursor cursor Bot left a comment

Stale comment

Comment thread packages/envd/internal/services/process/handler/handler.go Outdated
Comment on lines +99 to +106
+// Reset oom_score_adj, drop SCHED_FIFO via chrt, drop the SYS_NICE
+// ambient cap, then apply nice.
 niceDelta := defaultNice - currentNice()
-oomWrapperScript := fmt.Sprintf(`echo %d > /proc/$$/oom_score_adj && exec /usr/bin/nice -n %d "${@}"`, defaultOomScore, niceDelta)
-wrapperArgs := append([]string{"-c", oomWrapperScript, "--", req.GetProcess().GetCmd()}, req.GetProcess().GetArgs()...)
+wrapperScript := fmt.Sprintf(
+	`echo %d > /proc/$$/oom_score_adj && exec /usr/bin/chrt --other 0 /usr/bin/setpriv --ambient-caps -all -- /usr/bin/nice -n %d "${@}"`,
+	defaultOomScore, niceDelta,
+)
+wrapperArgs := append([]string{"-c", wrapperScript, "--", req.GetProcess().GetCmd()}, req.GetProcess().GetArgs()...)

🟡 envd version in packages/envd/pkg/version.go is still 0.5.24 but this PR introduces three behavioral envd changes (chrt+setpriv wrapper for user processes, CAP_SYS_NICE ambient cap, SCHED_FIFO scheduling for envd itself). CLAUDE.md:133 and packages/envd/README.md both explicitly require a version bump on every behavioral change. Bump to 0.5.25.

Extended reasoning...

The convention

CLAUDE.md:133 states explicitly: "Version in pkg/version.go must be bumped on every behavioral change (not comments/docs-only changes)". packages/envd/README.md echoes the same rule. Recent git history shows this is followed strictly — every recent envd PR bumped the version: #2683 (0.5.23 → 0.5.24, NFS umount), #2680 (0.5.22 → 0.5.23), #2676 (0.5.21 → 0.5.22).

Behavioral changes in this PR

This PR contains three clearly non-docs changes to envd runtime behavior:

  1. handler.go:99-103 — user process wrapper now exec's /usr/bin/chrt --other 0 /usr/bin/setpriv --ambient-caps -all -- /usr/bin/nice instead of just /usr/bin/nice.
  2. handler_caps_linux.go (new) — adds CAP_SYS_NICE to user processes' AmbientCaps.
  3. envd.service.tpl:20-21 — adds CPUSchedulingPolicy=fifo and CPUSchedulingPriority=1 to the envd systemd unit.

None of these are comment/doc-only. Yet packages/envd/pkg/version.go still reads const Version = "0.5.24", unchanged from before this PR.

Why this matters semantically (not just procedurally)

The version constant is consumed by version-based feature detection in packages/shared/pkg/utils/version.go (IsGTEVersion, CheckEnvdVersionForSnapshot, MinEnvdVersionForSnapshot) and the orchestrator pins behavior to envd version via -envd-version. Skipping the bump means callers cannot distinguish the new wrapper/scheduling behavior from the previous build, so any feature-gated logic that ought to key on these changes will silently misclassify rootfs images carrying this code.
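
To illustrate why the bump matters for version-gated logic, here is a minimal sketch of a "greater than or equal" comparison on `MAJOR.MINOR.PATCH` strings. It is not the repository's `IsGTEVersion` implementation, whose code is not shown in this thread; `parseVersion`, `isGTE`, and the `wrapperMinVersion` gate are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "MAJOR.MINOR.PATCH" (with an optional "v" prefix)
// into three integers.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// isGTE reports whether version a is greater than or equal to version b.
func isGTE(a, b string) (bool, error) {
	av, err := parseVersion(a)
	if err != nil {
		return false, err
	}
	bv, err := parseVersion(b)
	if err != nil {
		return false, err
	}
	for i := range av {
		if av[i] != bv[i] {
			return av[i] > bv[i], nil
		}
	}
	return true, nil
}

func main() {
	// Hypothetical gate: the version that would first carry the new wrapper.
	const wrapperMinVersion = "0.5.25"
	for _, v := range []string{"0.5.24", "0.5.25"} {
		ok, err := isGTE(v, wrapperMinVersion)
		fmt.Printf("envd %s has new wrapper: %v (err: %v)\n", v, ok, err)
	}
}
```

With the version left at 0.5.24, a gate like this cannot tell a build that contains the new wrapper from one that does not.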

Proof / step-by-step

  1. cat packages/envd/pkg/version.go → const Version = "0.5.24".
  2. git log --oneline packages/envd/pkg/version.go | head shows monotonic bumps tied to every prior envd-touching PR (0.5.19 → 0.5.24 in five PRs).
  3. git diff main -- packages/envd packages/orchestrator/pkg/template/build/core/rootfs/files/envd.service.tpl shows the three behavioral hunks above.
  4. git diff main -- packages/envd/pkg/version.go is empty → convention violated.

Fix

Bump packages/envd/pkg/version.go to const Version = "0.5.25" (or higher) so the new wrapper/scheduling behavior is identifiable from the version constant. Trivial one-line change.

Comment on lines +102 to +105
wrapperScript := fmt.Sprintf(
`echo %d > /proc/$$/oom_score_adj && exec /usr/bin/chrt --other 0 /usr/bin/setpriv --ambient-caps -all -- /usr/bin/nice -n %d "${@}"`,
defaultOomScore, niceDelta,
)

🟡 Nit: the wrapper's setpriv --ambient-caps -all only clears the ambient set, not the inheritable set. Go's AmbientCaps machinery raises CAP_SYS_NICE in both ambient and inheritable before PR_CAP_AMBIENT_RAISE (see Go's syscall/exec_linux.go forkAndExecInChild1), and the inheritable bit survives execve. So the user command ends up with CAP_SYS_NICE still in P(inh), contradicting the comment's stated intent ("drop the SYS_NICE ambient cap"). Trivial fix: append --inh-caps -all to the setpriv invocation.

Extended reasoning...

What's leaking

The wrapper script on handler.go:102-105 does:

exec /usr/bin/chrt --other 0 /usr/bin/setpriv --ambient-caps -all -- /usr/bin/nice -n N USER_CMD

The stated intent (from the diff comment) is to "drop the SYS_NICE ambient cap, then apply nice". But after this chain runs, the final user process actually inherits CAP_SYS_NICE in its inheritable capability set.

Why

When SysProcAttr.AmbientCaps is non-empty, Go does not just raise ambient. Looking at /usr/local/go/src/syscall/exec_linux.go forkAndExecInChild1 (lines 519–524):

// Add the c capability to the permitted and inheritable capability mask,
// otherwise we will not be able to add it to the ambient capability mask.
caps.data[capToIndex(c)].permitted   |= capToMask(c)
caps.data[capToIndex(c)].inheritable |= capToMask(c)

It then calls capset() and only afterwards PR_CAP_AMBIENT_RAISE. That's required by the kernel — you can't raise ambient unless the cap is already in both permitted and inheritable. So by the time /bin/sh exec's, P(perm) and P(inh) both contain CAP_SYS_NICE.

Across execve (capabilities(7)): P'(inheritable) = P(inheritable) — the inheritable set is preserved, only permitted/effective get recomputed against file caps.

setpriv --ambient-caps -all per its man page issues PR_CAP_AMBIENT_LOWER for each capability; it does not call capset() and so does not touch the inheritable set. So after the entire sh → chrt → setpriv → nice → user_cmd chain, the user process still has CAP_SYS_NICE in P(inh).

Step-by-step proof

  1. envd has CAP_SYS_NICE in P(perm) (runs as root, full bounding set).
  2. envd forks /bin/sh with AmbientCaps=[CAP_SYS_NICE]. Go raises permitted+inheritable, then ambient. After execve /bin/sh has: P(perm)={SYS_NICE}, P(inh)={SYS_NICE}, P(amb)={SYS_NICE}.
  3. /bin/sh exec's chrt — caps unchanged through execve except ambient bits get promoted into permitted+effective per ambient rules. Still P(inh)={SYS_NICE}.
  4. chrt exec's setpriv --ambient-caps -all. setpriv clears the ambient set → P(amb)={}. But P(inh) is untouched → P(inh)={SYS_NICE}.
  5. setpriv exec's nice, then user_cmd. Across each execve, P(inh) is preserved. Final state for the user command: P(inh)={SYS_NICE}, P(amb)={}, P(perm)/P(eff) computed from F(inh)/F(perm) of the user binary.

Verifiable from a shell inside the sandbox with grep ^Cap /proc/self/status: CapInh will be 0000000000800000 (bit 23 = CAP_SYS_NICE) instead of 0000000000000000.
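
The same check can be done programmatically. The following is a minimal sketch, assuming a Linux `/proc/self/status` with a `CapInh:` line holding a hexadecimal mask; `capInh` and `hasCap` are hypothetical helper names, and 23 is the `CAP_SYS_NICE` number from `linux/capability.h`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capSysNice is the capability number of CAP_SYS_NICE.
const capSysNice = 23

// capInh extracts the inheritable capability mask from the text of a
// /proc/<pid>/status file.
func capInh(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "CapInh:") {
			continue
		}
		hex := strings.TrimSpace(strings.TrimPrefix(line, "CapInh:"))
		return strconv.ParseUint(hex, 16, 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("CapInh line not found")
}

// hasCap reports whether capability number c is set in mask.
func hasCap(mask uint64, c uint) bool {
	return mask&(1<<c) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	mask, err := capInh(string(data))
	if err != nil {
		fmt.Println("cannot parse CapInh:", err)
		return
	}
	fmt.Printf("CapInh=%016x CAP_SYS_NICE inheritable=%v\n", mask, hasCap(mask, capSysNice))
}
```

Run from a process started through the wrapper, this would be expected to report the inheritable bit as set unless `--inh-caps -all` is added.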

Impact (why this is a nit, not a real bug)

For ordinary binaries with no file capabilities, P'(permitted) = (F(perm) & bset) | (F(inh) & P(inh)) | P(amb) yields zero — F(inh) is empty, so the inheritable leak grants nothing. Exploitation requires a binary with CAP_SYS_NICE+ie file caps, and:

  • The sandbox user can't install one (they lack CAP_SETFCAP).
  • No standard Linux distribution ships one.

So in practice the leak is inert. I'm filing this as a nit rather than a normal-severity finding: it's a defense-in-depth refinement that matches the comment's stated intent, not a user-visible escalation path. The counterargument that exploitability is essentially zero in the default sandbox is correct, which is why I'm not arguing this blocks the merge.
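
The permitted-set formula quoted above can be evaluated directly with bitmasks. This is a minimal sketch under simplifying assumptions: the types `procCaps` and `fileCaps` and the function `permittedAfterExec` are hypothetical, the bounding set is modelled as "all bits set", and the kernel rule that clears the ambient set when a file carries capabilities is ignored because the ambient set is already empty in this scenario.

```go
package main

import "fmt"

// capSysNice is the bitmask for CAP_SYS_NICE (capability number 23).
const capSysNice = uint64(1) << 23

// procCaps models the capability sets of a process before execve.
type procCaps struct {
	inheritable, ambient, bounding uint64
}

// fileCaps models the file capabilities of the binary being executed.
type fileCaps struct {
	permitted, inheritable uint64
}

// permittedAfterExec applies the formula from the review comment:
// P'(permitted) = (F(perm) & bset) | (F(inh) & P(inh)) | P(amb).
func permittedAfterExec(p procCaps, f fileCaps) uint64 {
	return (f.permitted & p.bounding) | (f.inheritable & p.inheritable) | p.ambient
}

func main() {
	// State of the user process after the wrapper as submitted:
	// ambient cleared by setpriv, inheritable still holding CAP_SYS_NICE.
	leaked := procCaps{inheritable: capSysNice, ambient: 0, bounding: ^uint64(0)}
	// State after also passing --inh-caps -all to setpriv.
	cleared := procCaps{inheritable: 0, ambient: 0, bounding: ^uint64(0)}

	plainBinary := fileCaps{}                       // no file capabilities
	craftedBinary := fileCaps{inheritable: capSysNice} // file with CAP_SYS_NICE in F(inh)

	fmt.Println("leaked + plain binary:   ", permittedAfterExec(leaked, plainBinary)&capSysNice != 0)
	fmt.Println("leaked + crafted binary: ", permittedAfterExec(leaked, craftedBinary)&capSysNice != 0)
	fmt.Println("cleared + crafted binary:", permittedAfterExec(cleared, craftedBinary)&capSysNice != 0)
}
```

Only the combination of a leaked inheritable bit and a binary with matching file capabilities yields `CAP_SYS_NICE` in the permitted set, which is consistent with the comment's conclusion that the leak is inert for ordinary binaries.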

Fix

-/usr/bin/setpriv --ambient-caps -all -- /usr/bin/nice -n %d "${@}"
+/usr/bin/setpriv --ambient-caps -all --inh-caps -all -- /usr/bin/nice -n %d "${@}"

Optionally also clear CAP_SYS_NICE from the bounding set (--bounding-set -all or scoped) for a tighter posture. With --inh-caps -all, P(inh) on the final user process is empty, matching the wrapper's stated intent.

@ValentaTomas

Member Author

Closing — not following up. SCHED_FIFO/AmbientCapabilities approach turned out to be CI-flaky and the real leverage in this area is #2688 (cgroup freeze) + the customer-validated A11 (Async io_engine), not SCHED_FIFO. May revisit if there's evidence envd is CPU-starved even with #2688 frozen-customer.

@ValentaTomas ValentaTomas deleted the feat/envd-rt-priority branch May 19, 2026 18:46

2 participants