
Commit ea0c0e3

RHAIENG-2063: Add new RayCluster object
1 parent b4b26b9 commit ea0c0e3

25 files changed

Lines changed: 5459 additions & 110 deletions
Lines changed: 110 additions & 87 deletions
@@ -1,121 +1,144 @@
 Ray Cluster Configuration
 =========================

-To create Ray Clusters using the CodeFlare SDK a cluster configuration
-needs to be created first. This is what a typical cluster configuration
-would look like; Note: The values for CPU and Memory are at the minimum
-requirements for creating the Ray Cluster.
+To create Ray Clusters using the CodeFlare SDK, you create a ``RayCluster`` object
+with your desired configuration. Here is a typical cluster configuration
+with minimum resource requirements:

 .. code:: python

-from codeflare_sdk import Cluster, ClusterConfiguration
-
-cluster = Cluster(ClusterConfiguration(
-name='ray-example', # Mandatory Field
-namespace='default', # Default None
-head_cpu_requests=1, # Default 2
-head_cpu_limits=1, # Default 2
-head_memory_requests=1, # Default 8
-head_memory_limits=1, # Default 8
-head_extended_resource_requests={'nvidia.com/gpu':0}, # Default 0
-worker_extended_resource_requests={'nvidia.com/gpu':0}, # Default 0
-num_workers=1, # Default 1
-worker_cpu_requests=1, # Default 1
-worker_cpu_limits=1, # Default 1
-worker_memory_requests=2, # Default 2
-worker_memory_limits=2, # Default 2
-# image="", # Optional Field
+from codeflare_sdk.ray.rayclusters import RayCluster
+
+cluster = RayCluster(
+name='ray-example', # Required for standalone clusters
+namespace='default', # Default None (uses current namespace)
+head_cpu_requests=1, # Default 2
+head_cpu_limits=1, # Default 2
+head_memory_requests=1, # Default 8 (in Gi)
+head_memory_limits=1, # Default 8 (in Gi)
+head_accelerators={'nvidia.com/gpu': 0}, # Default {}
+worker_accelerators={'nvidia.com/gpu': 0}, # Default {}
+num_workers=1, # Default 1
+worker_cpu_requests=1, # Default 1
+worker_cpu_limits=1, # Default 1
+worker_memory_requests=2, # Default 2 (in Gi)
+worker_memory_limits=2, # Default 2 (in Gi)
+# image="", # Optional: custom Ray image
 labels={"exampleLabel": "example", "secondLabel": "example"},
-annotations={"key1":"value1", "key2":"value2"},
-volumes=[], # See Custom Volumes/Volume Mounts
-volume_mounts=[], # See Custom Volumes/Volume Mounts
-))
+annotations={"key1": "value1", "key2": "value2"},
+volumes=[], # See Custom Volumes/Volume Mounts
+volume_mounts=[], # See Custom Volumes/Volume Mounts
+)

 .. note::
-The default images used by the CodeFlare SDK for creating
-a RayCluster resource depend on the installed Python version:
+The default images used by the CodeFlare SDK for creating
+a RayCluster resource depend on the installed Python version:

-- For Python 3.11: `quay.io/modh/ray:2.52.1-py311-cu121`
+- For Python 3.11: `quay.io/modh/ray:2.52.1-py311-cu121`

-If you prefer to use a custom Ray image that better suits your
-needs, you can specify it in the image field to override the default.
-If you are using ROCm compatible GPUs you
-can use `quay.io/modh/ray:2.52.1-py311-rocm62`. You can also find
-documentation on building a custom image
-`here <https://github.com/opendatahub-io/distributed-workloads/tree/main/images/runtime/examples>`__.
+If you prefer to use a custom Ray image that better suits your
+needs, you can specify it in the image field to override the default.
+If you are using ROCm compatible GPUs you
+can use `quay.io/modh/ray:2.52.1-py311-rocm62`. You can also find
+documentation on building a custom image
+`here <https://github.com/opendatahub-io/distributed-workloads/tree/main/images/runtime/examples>`__.

 Ray Usage Statistics
--------------------
+--------------------

-By default, Ray usage statistics collection is **disabled** in Ray Clusters created with the Codeflare SDK. This prevents statistics from being captured and sent externally. If you want to enable usage statistics collection, you can simply set the ``enable_usage_stats`` parameter to ``True`` in your cluster configuration:
+By default, Ray usage statistics collection is **disabled** in Ray Clusters created with the CodeFlare SDK. This prevents statistics from being captured and sent externally. If you want to enable usage statistics collection, you can simply set the ``enable_usage_stats`` parameter to ``True`` in your cluster configuration:

 .. code:: python

-from codeflare_sdk import Cluster, ClusterConfiguration
+from codeflare_sdk.ray.rayclusters import RayCluster

-cluster = Cluster(ClusterConfiguration(
+cluster = RayCluster(
 name='ray-example',
 namespace='default',
 enable_usage_stats=True
-))
+)

 This will automatically set the ``RAY_USAGE_STATS_ENABLED`` environment variable to ``1`` for all Ray pods in the cluster. If you do not set this parameter, usage statistics will remain disabled (``RAY_USAGE_STATS_ENABLED=0``).

 The ``labels={"exampleLabel": "example"}`` parameter can be used to
 apply additional labels to the RayCluster resource.

-After creating their ``cluster``, a user can call ``cluster.apply()`` and
-``cluster.down()`` to respectively create or remove the Ray Cluster.
+After creating your ``cluster``, call ``cluster.apply()`` to deploy it and
+``cluster.down()`` to remove it:
+
+.. code:: python
+
+# Deploy the cluster
+cluster.apply()
+
+# Wait for cluster to be ready
+cluster.wait_ready()
+
+# Check cluster status
+cluster.status()
+
+# Remove the cluster when done
+cluster.down()

 Custom Volumes/Volume Mounts
 ----------------------------
-| To add custom Volumes and Volume Mounts to your Ray Cluster you need to create two lists ``volumes`` and ``volume_mounts``. The lists consist of ``V1Volume`` and ``V1VolumeMount`` objects respectively.
-| Populating these parameters will create Volumes and Volume Mounts for the head and each worker pod.
+
+To add custom Volumes and Volume Mounts to your Ray Cluster you need to create two lists ``volumes`` and ``volume_mounts``. The lists consist of ``V1Volume`` and ``V1VolumeMount`` objects respectively.
+Populating these parameters will create Volumes and Volume Mounts for the head and each worker pod.

 .. code:: python

-from kubernetes.client import V1Volume, V1VolumeMount, V1EmptyDirVolumeSource, V1ConfigMapVolumeSource, V1KeyToPath, V1SecretVolumeSource
-# In this example we are using the Config Map, EmptyDir and Secret Volume types
-volume_mounts_list = [
-V1VolumeMount(
-mount_path="/home/ray/test1",
-name = "test"
-),
-V1VolumeMount(
-mount_path = "/home/ray/test2",
-name = "test2",
-),
-V1VolumeMount(
-mount_path = "/home/ray/test3",
-name = "test3",
-)
-]
-
-volumes_list = [
-V1Volume(
-name="test",
-empty_dir=V1EmptyDirVolumeSource(size_limit="2Gi"),
-),
-V1Volume(
-name="test2",
-config_map=V1ConfigMapVolumeSource(
-name="test-config-map",
-items=[V1KeyToPath(key="test", path="data.txt")]
-)
-),
-V1Volume(
-name="test3",
-secret=V1SecretVolumeSource(
-secret_name="test-secret"
-)
-)
-]
-
-| For more information on creating Volumes and Volume Mounts with Python check out the Python Kubernetes docs (`Volumes <https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1Volume.md>`__, `Volume Mounts <https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1VolumeMount.md>`__).
-| You can also find further information on Volumes and Volume Mounts by visiting the Kubernetes `documentation <https://kubernetes.io/docs/concepts/storage/volumes/>`__.
+from kubernetes.client import V1Volume, V1VolumeMount, V1EmptyDirVolumeSource, V1ConfigMapVolumeSource, V1KeyToPath, V1SecretVolumeSource
+
+# In this example we are using the Config Map, EmptyDir and Secret Volume types
+volume_mounts_list = [
+V1VolumeMount(
+mount_path="/home/ray/test1",
+name="test"
+),
+V1VolumeMount(
+mount_path="/home/ray/test2",
+name="test2",
+),
+V1VolumeMount(
+mount_path="/home/ray/test3",
+name="test3",
+)
+]
+
+volumes_list = [
+V1Volume(
+name="test",
+empty_dir=V1EmptyDirVolumeSource(size_limit="2Gi"),
+),
+V1Volume(
+name="test2",
+config_map=V1ConfigMapVolumeSource(
+name="test-config-map",
+items=[V1KeyToPath(key="test", path="data.txt")]
+)
+),
+V1Volume(
+name="test3",
+secret=V1SecretVolumeSource(
+secret_name="test-secret"
+)
+)
+]
+
+# Use in RayCluster
+cluster = RayCluster(
+name='ray-example',
+volumes=volumes_list,
+volume_mounts=volume_mounts_list,
+)
+
+For more information on creating Volumes and Volume Mounts with Python check out the Python Kubernetes docs (`Volumes <https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1Volume.md>`__, `Volume Mounts <https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1VolumeMount.md>`__).
+You can also find further information on Volumes and Volume Mounts by visiting the Kubernetes `documentation <https://kubernetes.io/docs/concepts/storage/volumes/>`__.

 GCS Fault Tolerance
-------------------
+-------------------
+
 By default, the state of the Ray cluster is transient to the head Pod. Whatever triggers a restart of the head Pod results in losing that state, including Ray Cluster history. To make Ray cluster state persistent you can enable Global Control Service (GCS) fault tolerance with an external Redis storage.

 To configure GCS fault tolerance you need to set the following parameters:
@@ -139,9 +162,9 @@ Example configuration:

 .. code:: python

-from codeflare_sdk import Cluster, ClusterConfiguration
+from codeflare_sdk.ray.rayclusters import RayCluster

-cluster = Cluster(ClusterConfiguration(
+cluster = RayCluster(
 name='ray-cluster-with-persistence',
 num_workers=2,
 enable_gcs_ft=True,
@@ -150,8 +173,8 @@ Example configuration:
 "name": "redis-password-secret",
 "key": "password"
 },
-# external_storage_namespace="my-custom-namespace" # Optional: Custom namespace for GCS data in Redis
-))
+# external_storage_namespace="my-custom-namespace" # Optional
+)

 .. note::
 You need to have a Redis instance deployed in your Kubernetes cluster before using this feature.
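
For anyone updating existing calls, the keyword changes visible in this diff can be sketched as a small renaming step. This is a hypothetical helper, not part of the CodeFlare SDK: it assumes that ``head_extended_resource_requests`` and ``worker_extended_resource_requests`` map directly onto ``head_accelerators`` and ``worker_accelerators`` (which is what the updated example suggests) and that every other argument passes through unchanged.

```python
# Hypothetical migration helper; it is NOT part of the CodeFlare SDK.
# It only renames the keyword arguments that this diff shows being replaced.
# Assumption: the renamed arguments keep the same dict value format.
RENAMED_KWARGS = {
    "head_extended_resource_requests": "head_accelerators",
    "worker_extended_resource_requests": "worker_accelerators",
}


def migrate_cluster_kwargs(old_kwargs):
    """Return a copy of old_kwargs with the renamed keys translated."""
    new_kwargs = {}
    for key, value in old_kwargs.items():
        new_key = RENAMED_KWARGS.get(key, key)
        if new_key in new_kwargs:
            # Both the old and the new spelling were supplied.
            raise ValueError(f"duplicate argument after renaming: {new_key}")
        new_kwargs[new_key] = value
    return new_kwargs


# Keyword arguments as they would have been passed to ClusterConfiguration.
old = {
    "name": "ray-example",
    "num_workers": 1,
    "head_extended_resource_requests": {"nvidia.com/gpu": 0},
    "worker_extended_resource_requests": {"nvidia.com/gpu": 0},
}
new = migrate_cluster_kwargs(old)
print(sorted(new))
```

Under those assumptions, ``RayCluster(**new)`` would take the place of ``Cluster(ClusterConfiguration(**old))``. Any argument whose meaning changed in ways this diff does not show would still need a manual check.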
