[core] Add opt-in swap accounting to memory monitor and scheduler #63793

Open
preneond wants to merge 6 commits into ray-project:master from preneond:feat/count-swap-in-memory-monitor


@preneond preneond commented Jun 2, 2026

Why

On bare-metal nodes with significant swap configured, Ray's OOM monitor kills tasks prematurely. The monitor computes its threshold purely from RAM, so when RAM approaches the threshold Ray starts killing workers — even though the kernel would have handled the overflow via swap without any issue. The Linux OOM killer only fires when both RAM and swap are exhausted; Ray's monitor was firing much earlier, on RAM alone.

Users who deliberately provision swap as overflow capacity get no benefit from it — Ray overrides the kernel's judgment before swap is ever meaningfully used.

The same mismatch affects the scheduler (node memory resource excludes swap) and the dashboard Node Memory graph (shows RAM-only total).

What

Adds a single opt-in flag RAY_count_swap_in_memory_monitor (env var) / count_swap_in_memory_monitor (RayConfig), default false. When enabled, swap is folded into:

  1. C++ memory monitor: GetLinuxMemoryBytes reads SwapTotal/SwapFree from /proc/meminfo; GetCGroupMemoryBytes uses cgroup v1 memory.memsw.* (RAM+swap combined) or cgroup v2 memory.swap.{max,current} (swap-only, added on top of memory.{max,current}).
  2. Node memory resource (resource_and_label_spec.py): swap total is included in the auto-computed memory capacity seen by the scheduler and ray status.
  3. Dashboard Node Memory graph (reporter_agent.py): _get_cgroup_aware_swap() adds cgroup-scoped swap so the graph matches what the OOM killer sees in both bare-metal and containerized deployments.

All three sites are guarded by the same flag. Flag-off is the existing behavior — no scheduling or OOM-killer change for existing deployments.
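To make the /proc/meminfo fold concrete, here is a minimal Python sketch of the arithmetic. It is not the C++ code in this PR: the helper names `parse_meminfo` and `memory_totals` are hypothetical, and computing used memory as MemTotal minus MemAvailable is an assumption made for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of byte counts."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue  # skip blank or malformed lines
        try:
            amount = int(fields[0])
        except ValueError:
            continue  # skip lines whose value is not a plain integer
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024  # the file reports most values in kB
        values[key.strip()] = amount
    return values


def memory_totals(meminfo_text, count_swap=False):
    """Return (used_bytes, total_bytes); swap is folded in only when opted in.

    Assumption for this sketch: used = MemTotal - MemAvailable.
    """
    info = parse_meminfo(meminfo_text)
    total = info["MemTotal"]
    used = total - info["MemAvailable"]
    if count_swap:
        # A host without swap may lack these lines entirely; treat as zero.
        swap_total = info.get("SwapTotal", 0)
        swap_free = info.get("SwapFree", 0)
        total += swap_total
        used += max(swap_total - swap_free, 0)
    return used, total


SAMPLE = """MemTotal:        8000 kB
MemAvailable:    2000 kB
SwapTotal:       4000 kB
SwapFree:        3000 kB
"""

ram_used, ram_total = memory_totals(SAMPLE)
all_used, all_total = memory_totals(SAMPLE, count_swap=True)
print(ram_used / ram_total, all_used / all_total)
```

With the flag off the sample host sits at 75% usage; with swap counted the same host sits at roughly 58%, which is the gap that decides whether a usage threshold fires.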

Design notes

  • Module-level vs per-call read: resource_and_label_spec.py reads the env var once at import time (consistent with how RayConfig is consumed at node start); reporter_agent.py re-reads on every poll tick so a hot-restart can pick up the change without rebuilding.
  • cgroup v1 memsw fallback: If memory.memsw.limit_in_bytes is absent (kernel built without CONFIG_MEMCG_SWAP), code falls back silently to the existing RAM-only path.
  • cgroup v2 unlimited swap: memory.swap.max can contain the string "max" (unlimited). The patch guards against std::stoll on that sentinel by checking std::all_of(..., ::isdigit) before parsing.
  • Dashboard cgroup-scoped swap: _get_cgroup_aware_swap() reads cgroup v2 memory.swap.{max,current} or cgroup v1 memory.memsw.* directly, capping with host psutil values, so the dashboard total matches the C++ OOM killer's view inside containers.
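The cgroup v2 notes above can be pictured with a small Python sketch. It mirrors the digit check described for the C++ side rather than reproducing it; `parse_cgroup_value` and `fold_cgroup_v2_swap` are hypothetical names, and leaving both total and used untouched when swap is unlimited is an assumption based on the "unlimited swap not added" test case.

```python
def parse_cgroup_value(raw):
    """Return the integer in a cgroup file, or None for "max" or junk."""
    text = raw.strip()
    # Accept ASCII digits only, so the "max" sentinel, empty files and
    # negative numbers are all rejected before int() is called.
    if not (text.isascii() and text.isdigit()):
        return None
    return int(text)


def fold_cgroup_v2_swap(mem_max, mem_current, swap_max_raw, swap_current_raw):
    """Return (used, total) with swap added on top of the RAM figures.

    Assumption for this sketch: if either swap file is unlimited or
    unreadable, fall back to the RAM-only numbers.
    """
    swap_limit = parse_cgroup_value(swap_max_raw)
    swap_used = parse_cgroup_value(swap_current_raw)
    if swap_limit is None or swap_used is None:
        return mem_current, mem_max
    return mem_current + swap_used, mem_max + swap_limit


print(fold_cgroup_v2_swap(1000, 400, "500\n", "100\n"))
print(fold_cgroup_v2_swap(1000, 400, "max\n", "100\n"))
```

Python integers do not overflow, so the sketch has no counterpart to the out-of-range case that std::stoll can hit on very large values.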

Test plan

  • Python unit tests — 3 new parametrized cases in python/ray/tests/unit/test_resource_and_label_spec.py: swap-enabled inflates memory resource, swap-disabled does not, explicit --memory bypasses swap regardless of flag.
  • C++ unit tests — 8 new gtests in src/ray/common/tests/memory_monitor_utils_test.cc: Linux meminfo fold, swap ignored when flag off, no-swap host (missing SwapTotal lines), cgroup v2 swap added, cgroup v2 swap ignored when flag off, cgroup v2 unlimited swap not added, cgroup v1 memsw added to total/used, cgroup v1 memsw ignored when flag off.
  • Manual smoke test: RAY_count_swap_in_memory_monitor=1 ray start --head && ray status — Node Memory should reflect RAM + swap.
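The three Python cases in the test plan can be read as a small decision table. The sketch below is a stand-in for that logic, not the code in resource_and_label_spec.py: `swap_flag_enabled` and `resolve_node_memory` are hypothetical names, the accepted flag values ("1", "true") are an assumption, and the real computation involves more than a plain sum.

```python
import os

FLAG_ENV = "RAY_count_swap_in_memory_monitor"


def swap_flag_enabled(environ=None):
    """Read the opt-in flag; accepted spellings here are an assumption."""
    environ = os.environ if environ is None else environ
    return environ.get(FLAG_ENV, "0").strip().lower() in ("1", "true")


def resolve_node_memory(ram_bytes, swap_bytes, explicit_memory=None, environ=None):
    """Pick the node memory resource for the three tested cases."""
    if explicit_memory is not None:
        return explicit_memory  # an explicit --memory value always wins
    if swap_flag_enabled(environ):
        return ram_bytes + swap_bytes  # flag on: swap inflates the resource
    return ram_bytes  # flag off: existing RAM-only behavior


GIB = 1024 ** 3
print(resolve_node_memory(8 * GIB, 4 * GIB, environ={FLAG_ENV: "1"}) // GIB)
print(resolve_node_memory(8 * GIB, 4 * GIB, environ={}) // GIB)
```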

Files changed

| File | Change |
| --- | --- |
| src/ray/common/ray_config_def.h | New count_swap_in_memory_monitor bool flag |
| src/ray/common/memory_monitor_utils.h | cgroup memsw/swap path constants; FRIEND_TEST declarations |
| src/ray/common/memory_monitor_utils.cc | Swap accounting in GetCGroupMemoryBytes and GetLinuxMemoryBytes |
| src/ray/common/memory_monitor_test_fixture.h/cc | MockProcMeminfo, MockCgroupv2Swap, MockCgroupv1Memsw test helpers |
| src/ray/common/tests/memory_monitor_utils_test.cc | 8 new C++ tests (incl. cgroup v1 memsw coverage) |
| python/ray/_private/resource_and_label_spec.py | Fold swap into auto-computed node memory resource |
| python/ray/dashboard/modules/reporter/reporter_agent.py | Cgroup-scoped swap in dashboard memory reporting via _get_cgroup_aware_swap() |
| python/ray/tests/unit/test_resource_and_label_spec.py | 3 new Python tests |

🤖 Generated with Claude Code

@preneond preneond requested a review from a team as a code owner June 2, 2026 09:14

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an opt-in configuration (RAY_count_swap_in_memory_monitor) to fold swap space into Ray's memory monitoring, node resource calculations, and dashboard metrics across both C++ and Python components. The review feedback highlights critical robustness improvements, specifically recommending exception handling when parsing memory.swap.max in C++ to prevent crashes on extremely large values, and wrapping psutil.swap_memory() calls in Python with try-except blocks to avoid failures in restricted or containerized environments.


Comment thread src/ray/common/memory_monitor_utils.cc
Comment thread python/ray/_private/resource_and_label_spec.py Outdated
Comment thread python/ray/dashboard/modules/reporter/reporter_agent.py Outdated

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit da2364f.

Comment thread src/ray/common/memory_monitor_test_fixture.cc
@preneond preneond marked this pull request as draft June 2, 2026 09:52
Adds RAY_count_swap_in_memory_monitor / count_swap_in_memory_monitor
(default false) to fold swap space into the memory monitor's
total/used accounting, the auto-computed node `memory` resource, and
the dashboard Node Memory graph. All changes are behind the flag so
existing deployments are unaffected.

C++ side: /proc/meminfo SwapTotal/SwapFree, cgroup v1 memory.memsw.*,
and cgroup v2 memory.swap.* paths in GetLinuxMemoryBytes /
GetCGroupMemoryBytes.
Python side: psutil.swap_memory() in resource_and_label_spec.py and
reporter_agent.py.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
@preneond preneond force-pushed the feat/count-swap-in-memory-monitor branch from da2364f to 58ffe4c Compare June 2, 2026 09:55
preneond and others added 5 commits June 2, 2026 11:57
…le-newline in mock

- memory_monitor_utils.cc: wrap std::stoll in try/catch for out_of_range
  (e.g. ULLONG_MAX sentinel) and invalid_argument; use lambda cast in
  std::all_of to avoid UB on signed chars.
- memory_monitor_test_fixture.cc: remove trailing \n from AppendLine
  calls in MockProcMeminfo and MockCgroupv2Swap — AppendLine already
  appends std::endl, so the extra \n created blank lines that broke
  meminfo parsing in tests (double value * 1024 multiplication).
- resource_and_label_spec.py, reporter_agent.py: wrap psutil.swap_memory()
  in try/except to avoid crashing in restricted containers or VMs.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
psutil.swap_memory() on Linux opens /proc/meminfo (raises OSError on
permission/access errors) and falls back to a C-extension sysinfo()
call (also OSError). Other platforms that do not implement swap
raise NotImplementedError. Catching bare Exception was too broad;
(OSError, NotImplementedError) matches what psutil itself catches
internally.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ondrej Prenek <ondra.prenek@gmail.com>
@preneond preneond marked this pull request as ready for review June 2, 2026 13:49
@ray-gardener ray-gardener Bot added the core, observability, and community-contribution labels Jun 2, 2026