feat(envd): give envd realtime IO priority, reset for user processes #2681
Adds IO scheduling priority for envd so that disk I/O from user processes cannot starve envd's own I/O during pause/resume storms. This mirrors the existing CPU priority setup (Nice=-20 + CPUWeight on envd.service, lower cpu.weight on the user/PTY/socat sub-cgroups, and a reset to defaults for spawned user processes).
- envd.service: IOSchedulingClass=realtime, IOSchedulingPriority=4, IOWeight=10000
- user/ptys/socats sub-cgroups: lower io.weight (10/50/50, vs. envd's default of 100)
- the user-process wrapper resets ioprio to best-effort/4 via ionice(1), the same way it already resets nice, so user-spawned grandchildren cannot inherit envd's realtime IO class
- tolerate cgroup properties whose controller isn't enabled in cgroup.subtree_control (ENOENT), so io.weight on hosts without io delegation degrades gracefully instead of failing the cgroup manager
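For reference, both the systemd setting and the ionice reset reduce to one packed ioprio value per process. A minimal sketch, assuming the packing described in ioprio_set(2) (class in the bits from 13 up, level 0 to 7 below); `ioprioValue` is a hypothetical helper, not code from this PR:

```go
package main

import "fmt"

// Constants from the Linux ioprio ABI (see ioprio_set(2)).
const (
	ioprioClassShift = 13
	ioprioClassRT    = 1 // realtime
	ioprioClassBE    = 2 // best-effort
	ioprioClassIdle  = 3
)

// ioprioValue packs a class and a level (0 = highest, 7 = lowest) the same
// way the kernel's IOPRIO_PRIO_VALUE macro does: class<<13 | level.
func ioprioValue(class, level int) (int, error) {
	if class < ioprioClassRT || class > ioprioClassIdle {
		return 0, fmt.Errorf("unknown ioprio class %d", class)
	}
	if level < 0 || level > 7 {
		return 0, fmt.Errorf("ioprio level %d out of range [0,7]", level)
	}
	return class<<ioprioClassShift | level, nil
}

func main() {
	rt, _ := ioprioValue(ioprioClassRT, 4) // IOSchedulingClass=realtime, IOSchedulingPriority=4
	be, _ := ioprioValue(ioprioClassBE, 4) // ionice -c 2 -n 4
	fmt.Println(rt, be)                    // 8196 16388
}
```

Since a forked child starts with its parent's ioprio, a process spawned by envd would otherwise carry 8196 (realtime/4); the wrapper's reset replaces it with 16388 (best-effort/4).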
PR Summary: Medium Risk. Reviewed by Cursor Bugbot for commit 0eb4606.
❌ 8 tests failed (13 flaky tests flagged).
Code Review
The hardcoded absolute paths /usr/bin/ionice and /usr/bin/nice in the wrapper script tie it to a specific filesystem layout, and process spawning will fail in environments where ionice is missing. Invoking the commands by name (resolved via PATH) and providing a fallback would let user processes start even if the IO priority reset fails or the binary lives elsewhere.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: ENOENT check is dead code due to os.WriteFile O_CREATE
- Replaced os.WriteFile with os.OpenFile(O_WRONLY) to correctly receive ENOENT when cgroup property files don't exist, enabling graceful degradation when controllers aren't delegated.
Or push these changes by commenting:
@cursor push fbfba8084a
Preview (fbfba8084a)
diff --git a/packages/envd/internal/services/cgroups/cgroup2.go b/packages/envd/internal/services/cgroups/cgroup2.go
--- a/packages/envd/internal/services/cgroups/cgroup2.go
+++ b/packages/envd/internal/services/cgroups/cgroup2.go
@@ -111,7 +111,10 @@
var errs []error
for name, value := range properties {
- if err := os.WriteFile(filepath.Join(fullPath, name), []byte(value), 0o644); err != nil {
+ propPath := filepath.Join(fullPath, name)
+ // Open without O_CREATE to get ENOENT when controller isn't enabled
+ f, err := os.OpenFile(propPath, os.O_WRONLY, 0)
+ if err != nil {
// Tolerate properties whose controller isn't enabled in
// cgroup.subtree_control (file doesn't exist). Other errors are fatal.
if errors.Is(err, os.ErrNotExist) {
@@ -119,7 +122,15 @@
continue
}
errs = append(errs, fmt.Errorf("failed to write cgroup property %q: %w", name, err))
+ continue
}
+ _, writeErr := f.Write([]byte(value))
+ closeErr := f.Close()
+ if writeErr != nil {
+ errs = append(errs, fmt.Errorf("failed to write cgroup property %q: %w", name, writeErr))
+ } else if closeErr != nil {
+ errs = append(errs, fmt.Errorf("failed to close cgroup property %q: %w", name, closeErr))
+ }
}
if len(errs) > 0 {
return -1, errors.Join(errs...)
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: io.weight values use wrong format, need "default" prefix
- Added 'default' prefix to all io.weight values in main.go to comply with cgroup v2 format and prevent EINVAL errors with BFQ scheduler.
Or push these changes by commenting:
@cursor push 6a176846ad
Preview (6a176846ad)
diff --git a/packages/envd/main.go b/packages/envd/main.go
--- a/packages/envd/main.go
+++ b/packages/envd/main.go
@@ -243,13 +243,13 @@
opts := []cgroups.Cgroup2ManagerOption{
cgroups.WithCgroup2ProcessType(cgroups.ProcessTypePTY, "ptys", map[string]string{
"cpu.weight": "200",
- "io.weight": "50",
+ "io.weight": "default 50",
"memory.high": fmt.Sprintf("%d", memoryHigh),
"memory.max": fmt.Sprintf("%d", memoryMax),
}),
cgroups.WithCgroup2ProcessType(cgroups.ProcessTypeSocat, "socats", map[string]string{
"cpu.weight": "150",
- "io.weight": "50",
+ "io.weight": "default 50",
"memory.min": fmt.Sprintf("%d", 5*megabyte),
"memory.low": fmt.Sprintf("%d", 8*megabyte),
}),
@@ -257,7 +257,7 @@
"memory.high": fmt.Sprintf("%d", memoryHigh),
"memory.max": fmt.Sprintf("%d", memoryMax),
"cpu.weight": "50",
- "io.weight": "10",
+ "io.weight": "default 10",
}),
}
if cgroupRoot != "" {
Reviewed by Cursor Bugbot for commit 5301533.
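The cgroup v2 admin guide documents both `default $WEIGHT` and a bare `$WEIGHT` as ways to set the default io.weight, with weights in [1, 10000], so the prefixed spelling is safe either way. The sketch below uses hypothetical helpers: `ioWeight` formats and range-checks a value, and `shares` computes the idealized split of IO time among siblings that all saturate one device. It assumes envd's own leaf cgroup keeps the default weight of 100 and is a sibling of the three sub-cgroups, which the PR does not spell out:

```go
package main

import (
	"fmt"
	"sort"
)

// ioWeight formats a default io.weight entry after checking the documented
// range [1, 10000].
func ioWeight(w int) (string, error) {
	if w < 1 || w > 10000 {
		return "", fmt.Errorf("io.weight %d out of range [1,10000]", w)
	}
	return fmt.Sprintf("default %d", w), nil
}

// shares returns each sibling's idealized fraction of IO time when all of
// them contend for the same device: weight / sum of sibling weights.
func shares(weights map[string]int) map[string]float64 {
	total := 0
	for _, w := range weights {
		total += w
	}
	out := make(map[string]float64, len(weights))
	for name, w := range weights {
		out[name] = float64(w) / float64(total)
	}
	return out
}

func main() {
	s, _ := ioWeight(50)
	fmt.Println(s) // default 50

	// Assumed sibling layout: envd leaf at 100, ptys/socats at 50, user at 10.
	sh := shares(map[string]int{"envd": 100, "ptys": 50, "socats": 50, "user": 10})
	names := make([]string, 0, len(sh))
	for n := range sh {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%-6s %5.1f%%\n", n, 100*sh[n])
	}
}
```

Under that assumption the user cgroup gets 10/210 of contended IO time, roughly a tenth of what envd's leaf gets; when siblings are idle, the active ones share the device among themselves.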


Overview: envd runs at IOSchedulingClass=realtime + IOWeight=10000; the user/PTY/socat sub-cgroups get a lower io.weight; user-spawned processes have their ioprio reset via the existing wrapper.