fix(core): read pod cgroup limits instead of node limits in resource metrics by leosvelperez · Pull Request #35622 · nrwl/nx

leosvelperez · 2026-05-08T13:55:18Z

Current Behavior

When running inside a Linux container or Kubernetes pod, resource metrics report the host node's CPU and memory totals instead of the pod's limits. A pod with constrained CPU/memory shows the underlying node's resources.

The same gap exists for process-isolated Windows containers running inside a Windows Job Object (e.g. Docker --cpus / --memory): metrics report host values rather than the Job's limits.

Expected Behavior

Resource metrics report the effective CPU and memory limits enforced by the kernel for the calling process — derived from the cgroup it belongs to on Linux, or the Job Object on Windows. macOS native processes continue to report host values (no equivalent enforcement primitive exists).

Implementation Details

Linux (`cgroup` module)

Resolves the calling process's actual cgroup directory by parsing /proc/self/cgroup + /proc/self/mountinfo. Reads cpu.max / memory.max (cgroup v2) or cpu.cfs_{quota,period}_us / memory.limit_in_bytes (cgroup v1, including the systemd cpu,cpuacct co-mount).
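To illustrate the file formats involved, here is a minimal sketch of parsing the cgroup v2 inputs. The function names are hypothetical and not taken from this PR. It assumes the unified-hierarchy entry in /proc/self/cgroup has the form `0::<path>`, that cpu.max holds `<quota> <period>` or `max <period>`, and that memory.max holds a byte count or `max`.

```rust
/// Returns the cgroup v2 path from the contents of /proc/self/cgroup.
/// The unified hierarchy is the entry with hierarchy ID 0 and an empty
/// controller list, i.e. a line of the form `0::<path>`.
fn v2_cgroup_path(proc_self_cgroup: &str) -> Option<&str> {
    proc_self_cgroup
        .lines()
        .find_map(|line| line.strip_prefix("0::"))
}

/// Parses cpu.max into (quota_us, period_us).
/// Returns None when the quota is "max" (unlimited) or the content is malformed.
fn parse_cpu_max(contents: &str) -> Option<(u64, u64)> {
    let mut parts = contents.split_whitespace();
    let quota = parts.next()?;
    let period = parts.next()?.parse::<u64>().ok()?;
    if quota == "max" || period == 0 {
        return None;
    }
    let quota = quota.parse::<u64>().ok()?;
    Some((quota, period))
}

/// Parses memory.max into a byte count; "max" means no limit at this level.
fn parse_memory_max(contents: &str) -> Option<u64> {
    let trimmed = contents.trim();
    if trimmed == "max" {
        None
    } else {
        trimmed.parse().ok()
    }
}
```

The relative path returned by `v2_cgroup_path` still has to be joined with the cgroup mount point from /proc/self/mountinfo before any limit file can be read.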

Walks from the leaf up to the mount point and takes the minimum finite limit found at any level — the kernel enforces the tightest ancestor's limit (hierarchical enforcement), so leaf-only reads can overreport when a pod-level cgroup is tighter than the container's (common in K8s VPA in-place resize and Burstable QoS).
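The ancestor walk can be sketched roughly as follows. This is an illustrative outline, not the code in this PR: `tightest_limit` and `parse_limit` are hypothetical names, and the sketch assumes the limit file holds either a decimal number or the literal `max`.

```rust
use std::fs;
use std::path::Path;

/// Parses a limit file: "max" means unlimited at this level.
fn parse_limit(contents: &str) -> Option<u64> {
    let trimmed = contents.trim();
    if trimmed == "max" {
        None
    } else {
        trimmed.parse().ok()
    }
}

/// Walks from `leaf` up to `mount_point` (inclusive) and returns the
/// smallest finite limit found in `file_name` at any level.
fn tightest_limit(leaf: &Path, mount_point: &Path, file_name: &str) -> Option<u64> {
    let mut tightest: Option<u64> = None;
    for dir in leaf.ancestors() {
        // Stop once we have climbed above the cgroup mount point.
        if !dir.starts_with(mount_point) {
            break;
        }
        // A missing or unreadable file is treated as "no limit here".
        let limit = fs::read_to_string(dir.join(file_name))
            .ok()
            .and_then(|s| parse_limit(&s));
        if let Some(limit) = limit {
            tightest = Some(tightest.map_or(limit, |t| t.min(limit)));
        }
    }
    tightest
}
```

With a container-level memory.max of `max` and a pod-level memory.max of 1 GiB, a leaf-only read would report no limit, while this walk returns the pod's 1 GiB.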

Composition with sched_getaffinity covers cpuset / taskset restrictions. We deliberately bypass std::thread::available_parallelism() because rust-lang/rust's implementation applies the cgroup quota with floor division internally, which would silently underreport fractional quotas (e.g. 1.5 cores → 1). Instead we ceil quota / period, matching HotSpot JVM, Go 1.25, .NET, and the num_cpus crate.
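The rounding difference is easy to see in a small sketch. The helper names below are illustrative, and combining the quota-derived count with the affinity count by taking the minimum is an assumption about how the two compose, not a quote of the implementation.

```rust
/// Converts a CFS quota/period pair (both in microseconds) to a core count,
/// rounding up so that fractional quotas such as 1.5 cores report 2.
fn cores_from_quota(quota_us: u64, period_us: u64) -> Option<usize> {
    if quota_us == 0 || period_us == 0 {
        return None;
    }
    usize::try_from(quota_us.div_ceil(period_us)).ok()
}

/// Combines the quota-derived count with the affinity-derived count
/// (assumed here to be a simple minimum), never reporting fewer than 1 core.
fn effective_cores(quota_cores: Option<usize>, affinity_cores: usize) -> usize {
    quota_cores
        .map_or(affinity_cores, |q| q.min(affinity_cores))
        .max(1)
}
```

For a quota of 150000 µs over a 100000 µs period, floor division yields 1 core while `cores_from_quota` yields 2.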

mountinfo path fields containing spaces / tabs / newlines / backslashes are kernel-encoded as \040 / \011 / \012 / \134 per man 5 proc_pid_mountinfo; we unescape before joining with the cgroup path. /proc/self/cgroup itself emits paths raw (verified across kernels v4.18 → v6.13), so no unescape is needed there.
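A minimal sketch of the unescape step, assuming every escape is a backslash followed by exactly three octal digits; `unescape_mountinfo` is a hypothetical name and the real module may handle malformed input differently.

```rust
/// Decodes `\NNN` octal escapes in a mountinfo path field.
/// Sequences that are not a backslash plus three octal digits are kept as-is.
fn unescape_mountinfo(field: &str) -> String {
    let bytes = field.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' && i + 3 < bytes.len() {
            let digits = &bytes[i + 1..i + 4];
            if digits.iter().all(|d| (b'0'..=b'7').contains(d)) {
                let value = digits
                    .iter()
                    .fold(0u32, |acc, d| acc * 8 + u32::from(d - b'0'));
                // Only values that fit in one byte are valid escapes.
                if let Ok(byte) = u8::try_from(value) {
                    out.push(byte);
                    i += 4;
                    continue;
                }
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```

For example, the field `/sys/fs/cgroup/my\040dir` decodes to `/sys/fs/cgroup/my dir`.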

Replaces the prior leaf-only path lookups (which broke on cgroup v1 co-mount and any non-namespaced container setup). The in-tree module is preferred over sysinfo's cgroup_limits() (now parent-aware in 0.39 — see Additional Changes) because it also covers CPU quota and the v1 co-mount + bind-mount edge cases in a single place we control. 30 unit tests cover cgroup discovery, parsing, ancestor walk, and the v1 / v2 / co-mount / bind-mount / cgroupns=host cases.

Windows (`job_object` module)

Detects Job Object resource limits via the Win32 API:

CPU: the minimum of three values: ceil(host_cpu_count × CpuRate / 10000) from JobObjectCpuRateControlInformation when HARD_CAP is set (Docker --cpus translates to HARD_CAP); the popcount of the Job's affinity mask (when LIMIT_AFFINITY is set); and the popcount of the mask returned by GetProcessAffinityMask (covers Job + manual + system intersections).
Memory: minimum of ProcessMemoryLimit / JobMemoryLimit / MaximumWorkingSetSize from JobObjectExtendedLimitInformation, gated on the corresponding LIMIT_* flags. Mirrors HotSpot and .NET.
Skipped: WEIGHT_BASED rate control (relative priority, not a hard limit) and soft-cap rate control (kernel allows transient bursts) — neither maps to a defensible "available cores" number.
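The arithmetic behind the CPU and memory rules above can be sketched without any Win32 calls. The function names are hypothetical; the sketch assumes CpuRate is expressed in cycles per 10,000 (so 10000 means 100%) and that each memory candidate is passed as `Some` only when its LIMIT_* flag is set.

```rust
/// Converts a hard-cap CpuRate into a core count, rounding up.
/// Rates of 0 or above 10000 are treated as "no information" (an assumption).
fn cores_from_cpu_rate(host_cpus: u32, cpu_rate: u32) -> Option<u32> {
    if host_cpus == 0 || cpu_rate == 0 || cpu_rate > 10_000 {
        return None;
    }
    let scaled = u64::from(host_cpus) * u64::from(cpu_rate);
    u32::try_from(scaled.div_ceil(10_000)).ok()
}

/// Returns the smallest non-zero limit among the candidates that are set,
/// e.g. ProcessMemoryLimit, JobMemoryLimit, and MaximumWorkingSetSize.
fn min_memory_limit(candidates: &[Option<u64>]) -> Option<u64> {
    candidates
        .iter()
        .flatten()
        .copied()
        .filter(|&bytes| bytes > 0)
        .min()
}
```

On an 8-core host, a 1.5-core cap corresponds to a CpuRate of 1875, which `cores_from_cpu_rate` reports as 2 cores.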

Any Win32 failure is treated as "no information"; the caller falls back to host values. No new crate dependency — the existing winapi dep is extended with jobapi, jobapi2, processthreadsapi, winbase, and winnt features.

Known limitation: nested Job hierarchies

QueryInformationJobObject(NULL, ...) returns the immediate Job's settings (per MSDN: "If the job is nested, the immediate job of the calling process is used.") and Win32 exposes no documented API to enumerate parent Jobs.

Per JOBOBJECT_CPU_RATE_CONTROL_INFORMATION Remarks, "the rates set for the job represent its portion of the CPU rate that is allocated to its parent job" — nested rates compose multiplicatively. So with parent HARD_CAP 50% × child HARD_CAP 50%, the effective rate is 25% of host but we read the child's 50% and report ceil(host × 0.5). Per-process / per-job memory has the same shape (kernel enforces min across the chain; we see only the immediate Job).
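The size of the overreport can be worked through with a short sketch. The helper names and the 8-core host are assumptions chosen for illustration only.

```rust
/// Composes a chain of hard-cap rates (parent first), each in cycles per
/// 10,000, into the rate effectively enforced relative to the host.
fn compose_rates(chain: &[u32]) -> u32 {
    let composed = chain
        .iter()
        .fold(10_000u64, |acc, &rate| acc * u64::from(rate) / 10_000);
    // The result never exceeds 10,000 when every rate is at most 10,000.
    composed as u32
}

/// Core count reported for a given rate on a host with `host_cpus` cores.
fn cores_for_rate(host_cpus: u32, rate: u32) -> u32 {
    (u64::from(host_cpus) * u64::from(rate)).div_ceil(10_000) as u32
}
```

With a parent and child both capped at 50% (rate 5000), the composed rate is 2500. On an 8-core host the kernel therefore enforces the equivalent of 2 cores, while reading only the immediate Job's 5000 reports 4.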

Affinity is unaffected — GetProcessAffinityMask returns the kernel-effective mask. HotSpot and .NET CoreCLR exhibit the same memory limitation; HARD_CAP CPU-rate detection goes beyond them for Docker --cpus parity in the common single-silo case, accepting the nested-Job overreport as the documented cost.

Cross-platform

SystemInfo { cpuCores, totalMemory } shape is unchanged — consumers do not need to update. macOS native processes continue to report host values (no container-style enforcement primitive exists; container runtimes on macOS run Linux VMs where the Linux path applies).

The module doc-comments cross-reference Go 1.25 internal/runtime/cgroup, libuv src/unix/linux.c, OpenJDK cgroupSubsystem_linux.cpp + os_windows.cpp, .NET CoreCLR gc/unix/cgroup.cpp + gc/windows/gcenv.windows.cpp, and Rust stdlib library/std/src/sys/thread/unix.rs::cgroups.

Additional Changes

Two infrastructure chores are bundled into this PR as separate commits:

`chore(core): bump sysinfo to 0.39.1`

Bumps sysinfo from 0.37.2 → 0.39.1. No source changes were required — all signatures we use (System, Process, Pid, Signal, Disks, the *RefreshKind types, UpdateKind, MINIMUM_CPU_UPDATE_INTERVAL) are unchanged. Cargo.lock deltas:

New transitive dep objc2-open-directory from sysinfo 0.39's soundness fix for user retrieval on Apple targets.
windows family bumped to 0.62.x per sysinfo's new constraint, which lets the graph converge on single versions of windows-core, windows-link, windows-result, and windows-strings — four duplicate windows-* entries are deduplicated as a result.

sysinfo 0.39.0 also added Process::cgroup_limits() and parent-cgroup memory walking inside System::cgroup_limits() — overlapping conceptually with the in-tree cgroup module introduced by this PR. The in-tree module is kept because it also covers CPU quota (sysinfo's helpers are memory-only), the cgroup v1 co-mount, and bind-mount path translation in a single implementation under our control. Future consolidation onto the upstream APIs is possible but explicitly out of scope here.

`chore(repo): bump mise rust to 1.95.0 to match rust-toolchain.toml`

rust-toolchain.toml was bumped to 1.95.0 in #35665 to unblock the sysinfo upgrade, but mise.toml was left at 1.90.0. CI installs Rust via mise (which exports RUSTUP_TOOLCHAIN, overriding rust-toolchain.toml), so CI continued to run on 1.90.0 and failed the sysinfo 0.39 MSRV check until this commit. The two files now agree.

netlify · 2026-05-08T13:55:27Z

✅ Deploy Preview for nx-docs ready!

Name	Link
🔨 Latest commit	`a443689`
🔍 Latest deploy log	https://app.netlify.com/projects/nx-docs/deploys/6a0ee60ebfe84d0008ca5f0e
😎 Deploy Preview	https://deploy-preview-35622--nx-docs.netlify.app

netlify · 2026-05-08T13:55:27Z

✅ Deploy Preview for nx-dev ready!

Name	Link
🔨 Latest commit	`a443689`
🔍 Latest deploy log	https://app.netlify.com/projects/nx-dev/deploys/6a0ee60ee0dabb000812ea15
😎 Deploy Preview	https://deploy-preview-35622--nx-dev.netlify.app

nx-cloud · 2026-05-08T13:55:59Z

View your CI Pipeline Execution ↗ for commit a443689

Command	Status	Duration	Result
`nx affected --targets=lint,test,build,e2e,e2e-c...`	✅ Succeeded	29m 5s	View ↗
`nx run-many -t check-imports check-lock-files c...`	✅ Succeeded	3s	View ↗
`nx-cloud record -- pnpm nx-cloud conformance:check`	✅ Succeeded	9s	View ↗
`nx build workspace-plugin`	✅ Succeeded	4m 2s	View ↗
`nx-cloud record -- nx sync:check`	✅ Succeeded	18s	View ↗
`nx-cloud record -- nx format:check`	✅ Succeeded	<1s	View ↗

☁️ Nx Cloud last updated this comment at 2026-05-21 11:38:09 UTC

## Current Behavior

`rust-toolchain.toml` pins Rust to `1.94.0`. This blocks upgrading `sysinfo` to `0.39.x`, which requires Rust `1.95` and brings upstream support for several cgroup limit features that nx currently needs to implement locally.

## Expected Behavior

Toolchain bumped to `1.95.0`. Verified:

- `cargo build -p nx` produces zero new warnings vs. 1.94.
- `cargo build -p nx --all-targets` (compiles tests too) produces zero new warnings vs. 1.94: the warning set is identical.
- Clippy delta: +11 new clippy lints (style suggestions only; not gating, none correctness-related).

## Related Issue(s)

N/A (maintenance/hygiene change). Unblocks the sysinfo bump that, in turn, lets PR #35622 drop its memory-side cgroup parsing in favor of upstream `Process::cgroup_limits()` + parent cgroup memory walking (sysinfo PRs [#1643](GuillaumeGomez/sysinfo#1643) and [#1651](GuillaumeGomez/sysinfo#1651)).

FrozenPandaz

Heads up: sysinfo v0.39.0 (released 2026-05-11) now does the parent-cgroup memory walking this PR implements — see PR #1651 and the v0.39.0 CHANGELOG. nx is pinned to sysinfo 0.37.2; I've opened #35665 to bump the Rust toolchain to 1.95 (sysinfo 0.39's MSRV) so we can pull that in.

Suggestion: rebase onto sysinfo 0.39 and drop the memory-side cgroup parsing in cgroup.rs, but keep the genuinely-novel parts — the CPU cpu.max / cpu.cfs_* ancestor walk, cgroup v1 co-mount handling, mountinfo octal-escape parsing, and the Windows JobObject reads (no upstream equivalent for any of those; rust-lang/rust#143709 is the open tracking issue for the Windows side).

One concrete bug worth fixing regardless: read_v1_cpu_quota only treats q <= 0 as unlimited, but cgroup v1 emits the same large positive PAGE_COUNTER_MAX-style sentinel for CPU that read_v1_memory_limit already filters at line 346 — feeding that into cores_from_quota will overflow.

leosvelperez · 2026-05-15T13:14:15Z

@FrozenPandaz, the PR has been rebased and now includes the sysinfo update. As discussed offline, we'll keep the custom cgroup implementation due to sysinfo still lacking some functionality.

nx-cloud

Nx Cloud has identified a flaky task in your failed CI:

🔂 Since the failure was identified as flaky, we triggered a CI rerun by adding an empty commit to this branch.


Completes the toolchain bump from #35665, which updated rust-toolchain.toml to 1.95.0 but left mise.toml at 1.90.0. CI installs rust via mise (sets RUSTUP_TOOLCHAIN, overriding rust-toolchain.toml), so CI was still on 1.90.0 — failing the sysinfo 0.39.1 MSRV check.

- Read GetProcessAffinityMask unconditionally so manual SetProcessAffinityMask is honored whether or not the process is in a Job Object. Matches the Linux arm's unconditional sched_getaffinity call and the behavior of Go, .NET, libuv, and OpenJDK's no-Job branch.
- Drop the redundant JOB_OBJECT_LIMIT_AFFINITY read; the kernel intersects Job-imposed affinity into the process mask, so the unconditional GetProcessAffinityMask already covers it.
- Extract shared cgroup/Job Object math into a cfg-free metrics_math module with cross-OS unit tests; align cgroup v1 and Job Object memory filtering via a shared predicate.
- Emit tracing::debug! on Win32 and /proc fallback paths so silent failures are diagnosable.

leosvelperez self-assigned this May 8, 2026

leosvelperez force-pushed the nxc-4445 branch from 0045564 to 4e05f9d Compare May 8, 2026 15:12

leosvelperez marked this pull request as ready for review May 8, 2026 16:32

leosvelperez requested a review from a team as a code owner May 8, 2026 16:32

leosvelperez requested a review from AgentEnder May 8, 2026 16:32

FrozenPandaz mentioned this pull request May 12, 2026

chore(core): bump Rust toolchain to 1.95.0 #35665

Merged

FrozenPandaz requested changes May 13, 2026

leosvelperez force-pushed the nxc-4445 branch 2 times, most recently from 489d9d0 to 4198ea9 Compare May 15, 2026 11:07

leosvelperez requested a review from FrozenPandaz May 15, 2026 13:14

nx-cloud Bot reviewed May 15, 2026

leosvelperez added 4 commits May 21, 2026 08:44

fix(core): read pod cgroup limits instead of node limits in resource metrics

110b778

chore(core): bump sysinfo to 0.39.1

2c9e859

leosvelperez force-pushed the nxc-4445 branch from 469c965 to a443689 Compare May 21, 2026 11:01

FrozenPandaz approved these changes Jun 2, 2026

FrozenPandaz merged commit 6c9d5e0 into master Jun 2, 2026
25 checks passed

FrozenPandaz deleted the nxc-4445 branch June 2, 2026 15:26
