[core] Add opt-in swap accounting to memory monitor and scheduler #63793
preneond wants to merge 6 commits into
Code Review
This pull request introduces an opt-in configuration (RAY_count_swap_in_memory_monitor) to fold swap space into Ray's memory monitoring, node resource calculations, and dashboard metrics across both C++ and Python components. The review feedback highlights critical robustness improvements, specifically recommending exception handling when parsing memory.swap.max in C++ to prevent crashes on extremely large values, and wrapping psutil.swap_memory() calls in Python with try-except blocks to avoid failures in restricted or containerized environments.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit da2364f.
Adds RAY_count_swap_in_memory_monitor / count_swap_in_memory_monitor (default false) to fold swap space into the memory monitor's total/used accounting, the auto-computed node `memory` resource, and the dashboard Node Memory graph. All changes are behind the flag so existing deployments are unaffected.
C++ side: /proc/meminfo SwapTotal/SwapFree, cgroup v1 memory.memsw.*, and cgroup v2 memory.swap.* paths in GetLinuxMemoryBytes / GetCGroupMemoryBytes.
Python side: psutil.swap_memory() in resource_and_label_spec.py and reporter_agent.py.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
da2364f to 58ffe4c
…le-newline in mock
- memory_monitor_utils.cc: wrap std::stoll in try/catch for out_of_range (e.g. ULLONG_MAX sentinel) and invalid_argument; use lambda cast in std::all_of to avoid UB on signed chars.
- memory_monitor_test_fixture.cc: remove trailing \n from AppendLine calls in MockProcMeminfo and MockCgroupv2Swap. AppendLine already appends std::endl, so the extra \n created blank lines that broke meminfo parsing in tests (double value * 1024 multiplication).
- resource_and_label_spec.py, reporter_agent.py: wrap psutil.swap_memory() in try/except to avoid crashing in restricted containers or VMs.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
psutil.swap_memory() on Linux opens /proc/meminfo (raises OSError on permission/access errors) and falls back to a C-extension sysinfo() call (also OSError). Other platforms that do not implement swap raise NotImplementedError. Catching bare Exception was too broad; (OSError, NotImplementedError) matches what psutil itself catches internally.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>

Why
On bare-metal nodes with significant swap configured, Ray's OOM monitor kills tasks prematurely. The monitor computes its threshold purely from RAM, so when RAM approaches the threshold Ray starts killing workers — even though the kernel would have handled the overflow via swap without any issue. The Linux OOM killer only fires when both RAM and swap are exhausted; Ray's monitor was firing much earlier, on RAM alone.
Users who deliberately provision swap as overflow capacity get no benefit from it — Ray overrides the kernel's judgment before swap is ever meaningfully used.
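To make the gap concrete, here is a small arithmetic sketch. The 64 GiB RAM / 32 GiB swap node, the 0.95 threshold fraction, and the helper name `kill_threshold_bytes` are all illustrative assumptions, not values or code taken from this PR.

```python
GIB = 1024 ** 3


def kill_threshold_bytes(ram_total, swap_total, threshold_fraction, count_swap):
    """Usage level at which a fraction-based monitor would start killing workers.

    Hypothetical helper: capacity is RAM only, or RAM + swap when count_swap is set.
    """
    capacity = ram_total + (swap_total if count_swap else 0)
    return int(capacity * threshold_fraction)


# Assumed example node: 64 GiB RAM, 32 GiB swap, threshold at 95% of capacity.
ram, swap = 64 * GIB, 32 * GIB
ram_only = kill_threshold_bytes(ram, swap, 0.95, count_swap=False)
with_swap = kill_threshold_bytes(ram, swap, 0.95, count_swap=True)

print(f"RAM-only threshold: {ram_only / GIB:.1f} GiB")   # 60.8 GiB
print(f"RAM+swap threshold: {with_swap / GIB:.1f} GiB")  # 91.2 GiB
```

Under these assumptions a RAM-only monitor starts killing workers roughly 30 GiB before the combined RAM + swap capacity would approach its own threshold.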
The same mismatch affects the scheduler (the node `memory` resource excludes swap) and the dashboard Node Memory graph (which shows a RAM-only total).
What
Adds a single opt-in flag, `RAY_count_swap_in_memory_monitor` (env var) / `count_swap_in_memory_monitor` (RayConfig), default `false`. When enabled, swap is folded into:
- Memory monitor: `GetLinuxMemoryBytes` reads `SwapTotal`/`SwapFree` from `/proc/meminfo`; `GetCGroupMemoryBytes` uses cgroup v1 `memory.memsw.*` (RAM+swap combined) or cgroup v2 `memory.swap.{max,current}` (swap-only, added on top of `memory.{max,current}`).
- Node `memory` resource (`resource_and_label_spec.py`): swap total is included in the auto-computed memory capacity seen by the scheduler and `ray status`.
- Dashboard Node Memory graph (`reporter_agent.py`): `_get_cgroup_aware_swap()` adds cgroup-scoped swap so the graph matches what the OOM killer sees in both bare-metal and containerized deployments.
All three sites are guarded by the same flag. With the flag off, behavior is unchanged: no scheduling or OOM-killer change for existing deployments.
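The `/proc/meminfo` part of this accounting can be sketched in Python as follows. This is a rough illustration rather than a port of the C++ code: the function names are hypothetical, and deriving RAM usage from `MemTotal - MemAvailable` is an assumption made to keep the example short.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: bytes}. Values tagged kB are scaled by 1024."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip blank or malformed lines instead of failing on them.
        if not sep or not parts or not parts[0].isdigit():
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def linux_memory_bytes(meminfo_text, count_swap):
    """Return (used, total) in bytes, folding swap in only when count_swap is set."""
    info = parse_meminfo(meminfo_text)
    total = info["MemTotal"]
    used = total - info["MemAvailable"]  # assumption: RAM usage from MemAvailable
    # Hosts without swap may lack these lines; stay on the RAM-only path then.
    if count_swap and "SwapTotal" in info and "SwapFree" in info:
        total += info["SwapTotal"]
        used += info["SwapTotal"] - info["SwapFree"]
    return used, total


SAMPLE = """MemTotal:          16384 kB
MemAvailable:       4096 kB
SwapTotal:          8192 kB
SwapFree:           6144 kB
"""

print(linux_memory_bytes(SAMPLE, count_swap=False))  # (12582912, 16777216)
print(linux_memory_bytes(SAMPLE, count_swap=True))   # (14680064, 25165824)
```

With the flag off the swap lines are ignored entirely, which mirrors the "flag-off is the existing behavior" guarantee described above.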
Design notes
- `resource_and_label_spec.py` reads the env var once at import time (consistent with how RayConfig is consumed at node start); `reporter_agent.py` re-reads on every poll tick so a hot-restart can pick up the change without rebuilding.
- If `memory.memsw.limit_in_bytes` is absent (kernel built without `CONFIG_MEMCG_SWAP`), the code falls back silently to the existing RAM-only path.
- `memory.swap.max` can contain the string `"max"` (unlimited). The patch guards against `std::stoll` on that sentinel by checking `std::all_of(..., ::isdigit)` before parsing.
- `_get_cgroup_aware_swap()` reads cgroup v2 `memory.swap.{max,current}` or cgroup v1 `memory.memsw.*` directly, capping with host psutil values, so the dashboard total matches the C++ OOM killer's view inside containers.
Test plan
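For reference alongside the cgroup v2 cases in this test plan, the `max` sentinel and out-of-range handling from the design notes can be sketched in Python. This is a minimal illustration, not the patch's code: `parse_cgroup_limit` and `add_cgroup_v2_swap` are hypothetical names, and the explicit int64 bound only stands in for the `std::stoll` range that Python integers do not have.

```python
INT64_MAX = 2**63 - 1  # stand-in for the range std::stoll can represent


def parse_cgroup_limit(raw):
    """Return the limit as an int, or None for "max", garbage, or out-of-range values."""
    text = raw.strip()
    # Digits-only check first, so the "max" sentinel never reaches int().
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    return value if value <= INT64_MAX else None


def add_cgroup_v2_swap(used, total, swap_current_raw, swap_max_raw, count_swap):
    """Add swap-only cgroup v2 counters on top of memory.{current,max} values."""
    if not count_swap:
        return used, total
    swap_max = parse_cgroup_limit(swap_max_raw)
    swap_current = parse_cgroup_limit(swap_current_raw)
    # Unlimited or unreadable swap: keep the RAM-only numbers.
    if swap_max is None or swap_current is None:
        return used, total
    return used + swap_current, total + swap_max


print(add_cgroup_v2_swap(100, 1000, "50\n", "500\n", count_swap=True))   # (150, 1500)
print(add_cgroup_v2_swap(100, 1000, "50\n", "max\n", count_swap=True))   # (100, 1000)
print(add_cgroup_v2_swap(100, 1000, "50\n", "500\n", count_swap=False))  # (100, 1000)
```

Returning the unchanged RAM-only pair for every unparseable input keeps the fallback path identical to the flag-off path, which is the behavior the unlimited-swap test case expects.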
- `python/ray/tests/unit/test_resource_and_label_spec.py`: swap-enabled inflates memory resource, swap-disabled does not, explicit `--memory` bypasses swap regardless of flag.
- `src/ray/common/tests/memory_monitor_utils_test.cc`: Linux meminfo fold, swap ignored when flag off, no-swap host (missing SwapTotal lines), cgroup v2 swap added, cgroup v2 swap ignored when flag off, cgroup v2 unlimited swap not added, cgroup v1 memsw added to total/used, cgroup v1 memsw ignored when flag off.
- `RAY_count_swap_in_memory_monitor=1 ray start --head && ray status`: Node Memory should reflect RAM + swap.
Files changed
| File | Change |
| --- | --- |
| `src/ray/common/ray_config_def.h` | `count_swap_in_memory_monitor` bool flag |
| `src/ray/common/memory_monitor_utils.h` | `FRIEND_TEST` declarations |
| `src/ray/common/memory_monitor_utils.cc` | `GetCGroupMemoryBytes` and `GetLinuxMemoryBytes` |
| `src/ray/common/memory_monitor_test_fixture.h/cc` | `MockProcMeminfo`, `MockCgroupv2Swap`, `MockCgroupv1Memsw` test helpers |
| `src/ray/common/tests/memory_monitor_utils_test.cc` | |
| `python/ray/_private/resource_and_label_spec.py` | |
| `python/ray/dashboard/modules/reporter/reporter_agent.py` | `_get_cgroup_aware_swap()` |
| `python/ray/tests/unit/test_resource_and_label_spec.py` | |

🤖 Generated with Claude Code