Commit ba9b455

Kunchd and davik authored and committed
[Core] (Resource Isolation 14/n) Introduce user slice threshold monitor for resource isolation mode (ray-project#63067)
## Description

This PR introduces the threshold memory monitor portion of Ray's new memory monitoring system for resource isolation. Specifically, it introduces the bottom half of the monitoring system depicted below:

<img width="2168" height="1696" alt="image" src="https://github.com/user-attachments/assets/b85fc87f-9731-485f-8eac-dfb56ddddf00" />

The core of this PR changes the following memory monitoring components:

* `threshold_memory_monitor`: when resource isolation is enabled, the monitor now tracks the _total user application memory_ usage (the user application memory shown in the diagram plus the total object store memory).
* `kill worker callback`: passes the _total user application memory_ usage to the killing policy instead of the system-wide usage, so that the policy determines how much memory to free based on user application usage.
* `memory_monitor_utils`: adds the ability to monitor the total memory usage of the user applications.

### Why monitor the _total user application usage_ with a threshold monitor

The overarching design of the new monitoring system can be found in the description for ray-project#62705. In short, resource isolation aims to provide stronger protection for the critical Ray system processes by isolating the system process resources from the resources available to user processes. We accomplish this with cgroups, by placing upper-bound resource constraints on the user application.

However, the cgroup memory upper bound alone is insufficient for handling cases with heavy object store usage. The object store is shared memory, shared between the raylet in the system slice (which does the allocations) and the user workers, so portions of object store memory usage can be accounted to either the system cgroup or the user cgroup.

In the best case, all of the object store is accounted to the user cgroup, and the user cgroup upper bound effectively restricts the user slice from using more memory than is allocated to it. In the worst case, some or all of the object store memory is accounted to the system cgroup. In this scenario, the amount of memory the user is actually using may not be reflected by the amount the user cgroup can see, as shown in the diagram below.

<img width="923" height="370" alt="image" src="https://github.com/user-attachments/assets/04d22466-f541-441c-92b7-876a662b8fc7" />

Thus, we bring the threshold monitor to the rescue. The threshold monitor is responsible for catching special scenarios like the one above by checking that the actual user usage (approximated by the sum of the user heap memory usage and the shared memory usage of both the system and user slices) is less than the upper bound we've allocated as available user memory.

### The road ahead

This PR only introduces the bottom half of the resource isolation monitoring system shown in the first diagram. The last step for resource isolation after this PR will be to introduce the top half to complete the resource isolation memory monitoring system.

---------

Signed-off-by: davik <davik@anyscale.com>
Co-authored-by: davik <davik@anyscale.com>
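The usage approximation described in the commit message can be sketched roughly as follows. This is an illustration only, not the code in this commit: `ApproximateUserUsageBytes` and `ExceedsUserLimit` are hypothetical names, the struct simply mirrors the `CgroupMemorySnapshot` added in `memory_monitor_interface.h`, and the `>=` comparison is an assumption about how "less than the upper bound" is enforced.

```cpp
#include <cstdint>

// Mirrors the CgroupMemorySnapshot struct added in this commit.
struct CgroupMemorySnapshot {
  int64_t anon_memory_bytes;   // approximation of heap usage in the cgroup
  int64_t shmem_memory_bytes;  // shared memory charged to the cgroup
};

// Hypothetical helper: user heap usage plus the shared memory charged to
// either slice, since object store pages may land in either cgroup.
inline int64_t ApproximateUserUsageBytes(const CgroupMemorySnapshot &user_slice,
                                         const CgroupMemorySnapshot &system_slice) {
  return user_slice.anon_memory_bytes + user_slice.shmem_memory_bytes +
         system_slice.shmem_memory_bytes;
}

// Hypothetical helper: true when the approximated usage is no longer below
// the memory budget allocated to user applications (assumed comparison).
inline bool ExceedsUserLimit(const CgroupMemorySnapshot &user_slice,
                             const CgroupMemorySnapshot &system_slice,
                             int64_t user_limit_bytes) {
  return ApproximateUserUsageBytes(user_slice, system_slice) >= user_limit_bytes;
}
```

The system slice's anonymous memory is deliberately left out of the sum, because only its shared memory can belong to the object store that user workers fill.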
1 parent 4932bfd commit ba9b455

28 files changed

Lines changed: 765 additions & 139 deletions

src/ray/common/BUILD.bazel

Lines changed: 1 addition & 0 deletions
@@ -100,6 +100,7 @@ ray_cc_library(
         "//src/ray/common/tests:__pkg__",
     ],
     deps = [
+        ":memory_monitor_utils",
         "//src/ray/common/cgroup2:cgroup_test_utils",
         "//src/ray/util:logging",
         "@com_google_googletest//:gtest",

src/ray/common/cgroup2/cgroup_manager.cc

Lines changed: 2 additions & 0 deletions
@@ -392,6 +392,8 @@ Status CgroupManager::AddProcessToSystemCgroup(const std::string &pid) {
   return AddProcessToCgroup(system_leaf_cgroup_, pid);
 }

+std::string CgroupManager::GetSystemCgroupPath() const { return system_cgroup_; }
+
 std::string CgroupManager::GetUserCgroupPath() const { return user_cgroup_; }

 StatusOr<std::string> CgroupManager::GetSystemCgroupConstraintValue(

src/ray/common/cgroup2/cgroup_manager.h

Lines changed: 5 additions & 0 deletions
@@ -113,6 +113,11 @@ class CgroupManager : public CgroupManagerInterface {
   */
   Status AddProcessToSystemCgroup(const std::string &pid) override;

+  /**
+    @return the path to the system cgroup.
+  */
+  std::string GetSystemCgroupPath() const override;
+
   /**
     @return the path to the user cgroup.
   */

src/ray/common/cgroup2/cgroup_manager_interface.h

Lines changed: 5 additions & 0 deletions
@@ -76,6 +76,11 @@ class CgroupManagerInterface {
   */
   virtual Status AddProcessToSystemCgroup(const std::string &pid) = 0;

+  /**
+    @return the path to the system cgroup.
+  */
+  virtual std::string GetSystemCgroupPath() const = 0;
+
   /**
     @return the path to the user cgroup.
   */

src/ray/common/cgroup2/linux_cgroup_manager_factory.cc

Lines changed: 2 additions & 2 deletions
@@ -61,8 +61,8 @@ std::unique_ptr<CgroupManagerInterface> CgroupManagerFactory::Create(
   int64_t system_memory_bytes_low = system_reserved_memory_bytes;

   // Compute user memory limits from proportions
-  SystemMemorySnapshot memory_snapshot =
-      MemoryMonitorUtils::TakeSystemMemorySnapshot(cgroup_path);
+  MemoryUsageSnapshot memory_snapshot =
+      MemoryMonitorUtils::TakeSystemMemoryUsageSnapshot(cgroup_path);
   int64_t total_memory_bytes = memory_snapshot.total_bytes;
   float user_memory_proportion_high = RayConfig::instance().user_memory_proportion_high();
   float user_memory_proportion_max = RayConfig::instance().user_memory_proportion_max();

src/ray/common/cgroup2/noop_cgroup_manager.h

Lines changed: 2 additions & 0 deletions
@@ -39,6 +39,8 @@ class NoopCgroupManager : public CgroupManagerInterface {
     return Status::OK();
   }

+  std::string GetSystemCgroupPath() const override { return ""; }
+
   std::string GetUserCgroupPath() const override { return ""; }

   StatusOr<std::string> GetSystemCgroupConstraintValue(

src/ray/common/memory_monitor_interface.h

Lines changed: 24 additions & 3 deletions
@@ -18,6 +18,7 @@

 #include <cstdint>
 #include <functional>
+#include <optional>
 #include <ostream>

 #include "absl/container/flat_hash_map.h"
@@ -28,20 +29,40 @@ namespace ray {
 /**
  * @brief A snapshot of aggregated memory usage across the system.
  */
-struct SystemMemorySnapshot {
+struct MemoryUsageSnapshot {
+  /// The current memory usage.
   int64_t used_bytes;

-  /// The total memory available on the system. >= used_bytes.
+  /// The total memory available on the system.
   int64_t total_bytes;

   friend std::ostream &operator<<(std::ostream &os,
-                                  const SystemMemorySnapshot &memory_snapshot) {
+                                  const MemoryUsageSnapshot &memory_snapshot) {
     os << "Used bytes: " << memory_snapshot.used_bytes
        << ", Total bytes: " << memory_snapshot.total_bytes;
     return os;
   }
 };

+/**
+ * @brief A snapshot of memory usage within a cgroup.
+ */
+struct CgroupMemorySnapshot {
+  /// size of non-file-backed region mappings within the cgroup in bytes.
+  /// This is an approximation of heap usage for the cgroup.
+  int64_t anon_memory_bytes;
+
+  /// size of shared memory mappings within the cgroup in bytes.
+  int64_t shmem_memory_bytes;
+
+  friend std::ostream &operator<<(std::ostream &os,
+                                  const CgroupMemorySnapshot &memory_snapshot) {
+    os << "Anon memory bytes: " << memory_snapshot.anon_memory_bytes
+       << ", Shmem memory bytes: " << memory_snapshot.shmem_memory_bytes;
+    return os;
+  }
+};
+
 /**
  * @brief A snapshot of per-process memory usage.
  */

src/ray/common/memory_monitor_test_fixture.cc

Lines changed: 66 additions & 14 deletions
@@ -14,6 +14,7 @@

 #include "ray/common/memory_monitor_test_fixture.h"

+#include "ray/common/memory_monitor_utils.h"
 #include "ray/util/logging.h"

 namespace ray {
@@ -41,31 +42,82 @@ std::string MemoryMonitorTestFixture::MockProcMemoryUsage(pid_t pid,
   return proc_dir;
 }

-std::string MemoryMonitorTestFixture::MockCgroupMemoryUsage(int64_t total_bytes,
-                                                            int64_t current_bytes,
-                                                            int64_t inactive_file_bytes,
-                                                            int64_t active_file_bytes) {
+std::string MemoryMonitorTestFixture::MockCgroupv2MemoryUsage(
+    int64_t total_bytes,
+    int64_t current_bytes,
+    std::optional<int64_t> anon_memory_bytes,
+    std::optional<int64_t> shmem_memory_bytes,
+    int64_t inactive_file_bytes,
+    int64_t active_file_bytes) {
   auto temp_dir_or = TempDirectory::Create();
   RAY_CHECK(temp_dir_or.ok()) << "Failed to create temp directory: "
                               << temp_dir_or.status().message();
   mock_cgroup_dirs_.push_back(std::move(temp_dir_or.value()));

   const std::string &cgroup_path = mock_cgroup_dirs_.back()->GetPath();

-  mock_cgroup_files_.push_back(std::make_unique<TempFile>(cgroup_path + "/memory.max"));
+  mock_cgroup_files_.push_back(std::make_unique<TempFile>(
+      cgroup_path + "/" + MemoryMonitorUtils::kCgroupsV2MemoryMaxPath));
   mock_cgroup_files_.back()->AppendLine(std::to_string(total_bytes) + "\n");

-  mock_cgroup_files_.push_back(
-      std::make_unique<TempFile>(cgroup_path + "/memory.current"));
+  mock_cgroup_files_.push_back(std::make_unique<TempFile>(
+      cgroup_path + "/" + MemoryMonitorUtils::kCgroupsV2MemoryUsagePath));
   mock_cgroup_files_.back()->AppendLine(std::to_string(current_bytes) + "\n");

-  mock_cgroup_files_.push_back(std::make_unique<TempFile>(cgroup_path + "/memory.stat"));
-  mock_cgroup_files_.back()->AppendLine("anon 123456\n");
-  mock_cgroup_files_.back()->AppendLine("inactive_file " +
-                                        std::to_string(inactive_file_bytes) + "\n");
-  mock_cgroup_files_.back()->AppendLine("active_file " +
-                                        std::to_string(active_file_bytes) + "\n");
-  mock_cgroup_files_.back()->AppendLine("some_other_key 789\n");
+  mock_cgroup_files_.push_back(std::make_unique<TempFile>(
+      cgroup_path + "/" + MemoryMonitorUtils::kCgroupsV2MemoryStatPath));
+  if (anon_memory_bytes.has_value()) {
+    mock_cgroup_files_.back()->AppendLine(
+        std::string(MemoryMonitorUtils::kCgroupsV2MemoryAnonKey) + " " +
+        std::to_string(*anon_memory_bytes) + "\n");
+  }
+  if (shmem_memory_bytes.has_value()) {
+    mock_cgroup_files_.back()->AppendLine(
+        std::string(MemoryMonitorUtils::kCgroupsV2MemoryShmemKey) + " " +
+        std::to_string(*shmem_memory_bytes) + "\n");
+  }
+  mock_cgroup_files_.back()->AppendLine(
+      std::string(MemoryMonitorUtils::kCgroupsV2MemoryStatInactiveFileKey) + " " +
+      std::to_string(inactive_file_bytes) + "\n");
+  mock_cgroup_files_.back()->AppendLine(
+      std::string(MemoryMonitorUtils::kCgroupsV2MemoryStatActiveFileKey) + " " +
+      std::to_string(active_file_bytes) + "\n");
+
+  return cgroup_path;
+}
+
+std::string MemoryMonitorTestFixture::MockCgroupv1MemoryUsage(int64_t total_bytes,
+                                                              int64_t current_bytes,
+                                                              int64_t inactive_file_bytes,
+                                                              int64_t active_file_bytes) {
+  auto temp_dir_or = TempDirectory::Create();
+  RAY_CHECK(temp_dir_or.ok()) << "Failed to create temp directory: "
+                              << temp_dir_or.status().message();
+  mock_cgroup_dirs_.push_back(std::move(temp_dir_or.value()));
+
+  const std::string &cgroup_path = mock_cgroup_dirs_.back()->GetPath();
+
+  auto memory_dir_or = TempDirectory::Create(cgroup_path + "/memory");
+  RAY_CHECK(memory_dir_or.ok())
+      << "Failed to create temp directory: " << memory_dir_or.status().message();
+  mock_cgroup_dirs_.push_back(std::move(memory_dir_or.value()));
+
+  mock_cgroup_files_.push_back(std::make_unique<TempFile>(
+      cgroup_path + "/" + MemoryMonitorUtils::kCgroupsV1MemoryMaxPath));
+  mock_cgroup_files_.back()->AppendLine(std::to_string(total_bytes) + "\n");
+
+  mock_cgroup_files_.push_back(std::make_unique<TempFile>(
+      cgroup_path + "/" + MemoryMonitorUtils::kCgroupsV1MemoryUsagePath));
+  mock_cgroup_files_.back()->AppendLine(std::to_string(current_bytes) + "\n");
+
+  mock_cgroup_files_.push_back(std::make_unique<TempFile>(
+      cgroup_path + "/" + MemoryMonitorUtils::kCgroupsV1MemoryStatPath));
+  mock_cgroup_files_.back()->AppendLine(
+      std::string(MemoryMonitorUtils::kCgroupsV1MemoryStatInactiveFileKey) + " " +
+      std::to_string(inactive_file_bytes) + "\n");
+  mock_cgroup_files_.back()->AppendLine(
+      std::string(MemoryMonitorUtils::kCgroupsV1MemoryStatActiveFileKey) + " " +
+      std::to_string(active_file_bytes) + "\n");

   return cgroup_path;
 }
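The optional `anon_memory_bytes` and `shmem_memory_bytes` parameters let a test omit a key from the mocked `memory.stat`. A stand-alone sketch of that idea, without the fixture's `TempFile` machinery, might look like the following. `BuildMockMemoryStat` is a hypothetical helper, and the literal key strings are assumptions based on the removed lines above and the cgroup v2 `memory.stat` format, not the values of the `MemoryMonitorUtils` constants.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: builds memory.stat content in "<key> <value>\n" form.
// The anon and shmem lines are written only when a value is provided, so a
// caller can simulate a memory.stat file that lacks either key.
inline std::string BuildMockMemoryStat(std::optional<int64_t> anon_bytes,
                                       std::optional<int64_t> shmem_bytes,
                                       int64_t inactive_file_bytes,
                                       int64_t active_file_bytes) {
  std::string content;
  if (anon_bytes.has_value()) {
    content += "anon " + std::to_string(*anon_bytes) + "\n";
  }
  if (shmem_bytes.has_value()) {
    content += "shmem " + std::to_string(*shmem_bytes) + "\n";
  }
  content += "inactive_file " + std::to_string(inactive_file_bytes) + "\n";
  content += "active_file " + std::to_string(active_file_bytes) + "\n";
  return content;
}
```

Passing `std::nullopt` for one of the optional arguments yields content without that key, which is how a test could exercise the missing-key path of a parser.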

src/ray/common/memory_monitor_test_fixture.h

Lines changed: 24 additions & 4 deletions
@@ -18,6 +18,7 @@

 #include <cstdint>
 #include <memory>
+#include <optional>
 #include <string>
 #include <vector>

@@ -74,14 +75,33 @@ class MemoryMonitorTestFixture : public ::testing::Test {
    *
    * @param total_bytes The value to write to memory.max (total memory limit).
    * @param current_bytes The value to write to memory.current (current usage).
+   * @param anon_memory_bytes The anon to write to memory.stat (anonymous memory usage).
+   * @param shmem_memory_bytes The shmem to write to memory.stat (shared memory usage).
    * @param inactive_file_bytes The inactive_file value in memory.stat.
    * @param active_file_bytes The active_file value in memory.stat.
    * @return The path to the created mock cgroup directory.
    */
-  std::string MockCgroupMemoryUsage(int64_t total_bytes,
-                                    int64_t current_bytes,
-                                    int64_t inactive_file_bytes,
-                                    int64_t active_file_bytes);
+  std::string MockCgroupv2MemoryUsage(int64_t total_bytes,
+                                      int64_t current_bytes,
+                                      std::optional<int64_t> anon_memory_bytes,
+                                      std::optional<int64_t> shmem_memory_bytes,
+                                      int64_t inactive_file_bytes,
+                                      int64_t active_file_bytes);
+
+  /**
+   * @brief Sets up a mock cgroup v1 directory for emulating memory usage and populates
+   * the files with the provided mock memory values.
+   *
+   * @param total_bytes The value to write to memory/memory.limit_in_bytes.
+   * @param current_bytes The value to write to memory/memory.usage_in_bytes.
+   * @param inactive_file_bytes The total_inactive_file value in memory/memory.stat.
+   * @param active_file_bytes The total_active_file value in memory/memory.stat.
+   * @return The path to the created mock cgroup root directory.
+   */
+  std::string MockCgroupv1MemoryUsage(int64_t total_bytes,
+                                      int64_t current_bytes,
+                                      int64_t inactive_file_bytes,
+                                      int64_t active_file_bytes);

  private:
   std::vector<std::unique_ptr<TempDirectory>> mock_proc_dirs_;
