1 | 1 | { |
2 | | - "cells": [ |
3 | | - { |
4 | | - "cell_type": "markdown", |
5 | | - "id": "9259e514", |
6 | | - "metadata": {}, |
7 | | - "source": [ |
8 | | - "# Submitting a RayJob CR\n", |
9 | | - "\n", |
10 | | - "In this notebook, we will go through the basics of using the SDK to:\n", |
11 | | - " * Define a RayCluster configuration\n", |
12 | | - " * Use this configuration alongside a RayJob definition\n", |
13 | | - " * Submit the RayJob, and allow Kuberay Operator to lifecycle the RayCluster for the RayJob" |
14 | | - ] |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "9259e514", |
| 6 | + "metadata": {}, |
| 7 | + "source": [ |
| 8 | + "# Submitting a RayJob CR\n", |
| 9 | + "\n", |
| 10 | + "In this notebook, we will go through the basics of using the SDK to:\n", |
| 11 | + " * Define a RayCluster configuration\n", |
| 12 | + " * Use this configuration alongside a RayJob definition\n", |
| 13 | + " * Submit the RayJob, and let the KubeRay operator manage the lifecycle of the RayCluster for the RayJob" |
| 14 | + ] |
| 15 | + }, |
| 16 | + { |
| 17 | + "cell_type": "markdown", |
| 18 | + "id": "18136ea7", |
| 19 | + "metadata": {}, |
| 20 | + "source": [ |
| 21 | + "## Defining and Submitting the RayJob\n", |
| 22 | + "First, we'll need to import the relevant CodeFlare SDK packages. You can do this by executing the below cell." |
| 23 | + ] |
| 24 | + }, |
| 25 | + { |
| 26 | + "cell_type": "code", |
| 27 | + "execution_count": null, |
| 28 | + "id": "51e18292", |
| 29 | + "metadata": {}, |
| 30 | + "outputs": [], |
| 31 | + "source": [ |
| 32 | + "from codeflare_sdk import RayJob, ManagedClusterConfig" |
| 33 | + ] |
| 34 | + }, |
| 35 | + { |
| 36 | + "cell_type": "markdown", |
| 37 | + "id": "649c5911", |
| 38 | + "metadata": {}, |
| 39 | + "source": [ |
| 40 | + "Run the below `oc login` command using your Token and Server URL. Ensure the command is prepended by `!` and not `%`. This will work when running both locally and within RHOAI." |
| 41 | + ] |
| 42 | + }, |
| 43 | + { |
| 44 | + "cell_type": "code", |
| 45 | + "execution_count": null, |
| 46 | + "id": "dc364888", |
| 47 | + "metadata": {}, |
| 48 | + "outputs": [], |
| 49 | + "source": [ |
| 50 | + "!oc login --token=<your-token> --server=<your-server-url>" |
| 51 | + ] |
| 52 | + }, |
| 53 | + { |
| 54 | + "cell_type": "markdown", |
| 55 | + "id": "5581eca9", |
| 56 | + "metadata": {}, |
| 57 | + "source": [ |
| 58 | + "Next we'll need to define the ManagedClusterConfig. KubeRay will use this to spin up a short-lived RayCluster that exists only for as long as the job runs." |
| 59 | + ] |
| 60 | + }, |
| 61 | + { |
| 62 | + "cell_type": "code", |
| 63 | + "execution_count": null, |
| 64 | + "id": "3094c60a", |
| 65 | + "metadata": {}, |
| 66 | + "outputs": [], |
| 67 | + "source": [ |
| 68 | + "cluster_config = ManagedClusterConfig(\n", |
| 69 | + " head_memory_requests=6,\n", |
| 70 | + " head_memory_limits=8,\n", |
| 71 | + " num_workers=2,\n", |
| 72 | + " worker_cpu_requests=1,\n", |
| 73 | + " worker_cpu_limits=1,\n", |
| 74 | + " worker_memory_requests=4,\n", |
| 75 | + " worker_memory_limits=6,\n", |
| 76 | + " head_accelerators={'nvidia.com/gpu': 0},\n", |
| 77 | + " worker_accelerators={'nvidia.com/gpu': 0},\n", |
| 78 | + ")" |
| 79 | + ] |
| 80 | + }, |
| 81 | + { |
| 82 | + "cell_type": "markdown", |
| 83 | + "id": "02a2b32b", |
| 84 | + "metadata": {}, |
| 85 | + "source": [ |
| 86 | + "Lastly, we can pass the ManagedClusterConfig into the RayJob and submit it. You do not need to worry about tearing down the cluster when the job has completed; that is handled for you!" |
| 87 | + ] |
| 88 | + }, |
| 89 | + { |
| 90 | + "cell_type": "code", |
| 91 | + "execution_count": null, |
| 92 | + "id": "e905ccea", |
| 93 | + "metadata": {}, |
| 94 | + "outputs": [], |
| 95 | + "source": [ |
| 96 | + "job = RayJob(\n", |
| 97 | + " job_name=\"demo-rayjob\",\n", |
| 98 | + " entrypoint=\"python -c 'print(\\\"Hello from RayJob!\\\")'\",\n", |
| 99 | + " cluster_config=cluster_config,\n", |
| 100 | + " namespace=\"your-namespace\",\n", |
| 101 | + " # local_queue is optional. If omitted, the SDK will auto-detect a default\n", |
| 102 | + " # Kueue LocalQueue. If Kueue is not installed, the job runs without it.\n", |
| 103 | + " # local_queue=\"my-queue\",\n", |
| 104 | + ")\n", |
| 105 | + "\n", |
| 106 | + "job.submit()" |
| 107 | + ] |
| 108 | + }, |
| 109 | + { |
| 110 | + "cell_type": "markdown", |
| 111 | + "id": "f3612de2", |
| 112 | + "metadata": {}, |
| 113 | + "source": [ |
| 114 | + "We can check the status of our job by executing the below cell. The status may appear as `unknown` for a time while the RayCluster spins up." |
| 115 | + ] |
| 116 | + }, |
| 117 | + { |
| 118 | + "cell_type": "code", |
| 119 | + "execution_count": null, |
| 120 | + "id": "96d92f93", |
| 121 | + "metadata": {}, |
| 122 | + "outputs": [], |
| 123 | + "source": [ |
| 124 | + "job.status()" |
| 125 | + ] |
| 126 | + } |
| 127 | + ], |
| 128 | + "metadata": { |
| 129 | + "kernelspec": { |
| 130 | + "display_name": "base", |
| 131 | + "language": "python", |
| 132 | + "name": "python3" |
| 133 | + }, |
| 134 | + "language_info": { |
| 135 | + "codemirror_mode": { |
| 136 | + "name": "ipython", |
| 137 | + "version": 3 |
| 138 | + }, |
| 139 | + "file_extension": ".py", |
| 140 | + "mimetype": "text/x-python", |
| 141 | + "name": "python", |
| 142 | + "nbconvert_exporter": "python", |
| 143 | + "pygments_lexer": "ipython3", |
| 144 | + "version": "3.12.7" |
| 145 | + } |
15 | 146 | }, |
16 | | - { |
17 | | - "cell_type": "markdown", |
18 | | - "id": "18136ea7", |
19 | | - "metadata": {}, |
20 | | - "source": [ |
21 | | - "## Defining and Submitting the RayJob\n", |
22 | | - "First, we'll need to import the relevant CodeFlare SDK packages. You can do this by executing the below cell." |
23 | | - ] |
24 | | - }, |
25 | | - { |
26 | | - "cell_type": "code", |
27 | | - "execution_count": null, |
28 | | - "id": "51e18292", |
29 | | - "metadata": {}, |
30 | | - "outputs": [], |
31 | | - "source": [ |
32 | | - "from codeflare_sdk import RayJob, ManagedClusterConfig" |
33 | | - ] |
34 | | - }, |
35 | | - { |
36 | | - "cell_type": "markdown", |
37 | | - "id": "649c5911", |
38 | | - "metadata": {}, |
39 | | - "source": [ |
40 | | - "Run the below `oc login` command using your Token and Server URL. Ensure the command is prepended by `!` and not `%`. This will work when running both locally and within RHOAI." |
41 | | - ] |
42 | | - }, |
43 | | - { |
44 | | - "cell_type": "code", |
45 | | - "execution_count": null, |
46 | | - "id": "dc364888", |
47 | | - "metadata": {}, |
48 | | - "outputs": [], |
49 | | - "source": [ |
50 | | - "!oc login --token=<your-token> --server=<your-server-url>" |
51 | | - ] |
52 | | - }, |
53 | | - { |
54 | | - "cell_type": "markdown", |
55 | | - "id": "5581eca9", |
56 | | - "metadata": {}, |
57 | | - "source": [ |
58 | | - "Next we'll need to define the ManagedClusterConfig. Kuberay will use this to spin up a short-lived RayCluster that will only exist as long as the job" |
59 | | - ] |
60 | | - }, |
61 | | - { |
62 | | - "cell_type": "code", |
63 | | - "execution_count": null, |
64 | | - "id": "3094c60a", |
65 | | - "metadata": {}, |
66 | | - "outputs": [], |
67 | | - "source": [ |
68 | | - "cluster_config = ManagedClusterConfig(\n", |
69 | | - " head_memory_requests=6,\n", |
70 | | - " head_memory_limits=8,\n", |
71 | | - " num_workers=2,\n", |
72 | | - " worker_cpu_requests=1,\n", |
73 | | - " worker_cpu_limits=1,\n", |
74 | | - " worker_memory_requests=4,\n", |
75 | | - " worker_memory_limits=6,\n", |
76 | | - " head_accelerators={'nvidia.com/gpu': 0},\n", |
77 | | - " worker_accelerators={'nvidia.com/gpu': 0},\n", |
78 | | - ")" |
79 | | - ] |
80 | | - }, |
81 | | - { |
82 | | - "cell_type": "markdown", |
83 | | - "id": "02a2b32b", |
84 | | - "metadata": {}, |
85 | | - "source": [ |
86 | | - "Lastly we can pass the ManagedClusterConfig into the RayJob and submit it. You do not need to worry about tearing down the cluster when the job has completed, that is handled for you!" |
87 | | - ] |
88 | | - }, |
89 | | - { |
90 | | - "cell_type": "code", |
91 | | - "execution_count": null, |
92 | | - "id": "e905ccea", |
93 | | - "metadata": {}, |
94 | | - "outputs": [], |
95 | | - "source": [ |
96 | | - "job = RayJob(\n", |
97 | | - " job_name=\"demo-rayjob\",\n", |
98 | | - " entrypoint=\"python -c 'print(\\\"Hello from RayJob!\\\")'\",\n", |
99 | | - " cluster_config=cluster_config,\n", |
100 | | - " namespace=\"your-namespace\"\n", |
101 | | - ")\n", |
102 | | - "\n", |
103 | | - "job.submit()" |
104 | | - ] |
105 | | - }, |
106 | | - { |
107 | | - "cell_type": "markdown", |
108 | | - "id": "f3612de2", |
109 | | - "metadata": {}, |
110 | | - "source": [ |
111 | | - "We can check the status of our job by executing the below cell. The status may appear as `unknown` for a time while the RayCluster spins up." |
112 | | - ] |
113 | | - }, |
114 | | - { |
115 | | - "cell_type": "code", |
116 | | - "execution_count": null, |
117 | | - "id": "96d92f93", |
118 | | - "metadata": {}, |
119 | | - "outputs": [], |
120 | | - "source": [ |
121 | | - "job.status()" |
122 | | - ] |
123 | | - } |
124 | | - ], |
125 | | - "metadata": { |
126 | | - "kernelspec": { |
127 | | - "display_name": "Python 3", |
128 | | - "language": "python", |
129 | | - "name": "python3" |
130 | | - }, |
131 | | - "language_info": { |
132 | | - "codemirror_mode": { |
133 | | - "name": "ipython", |
134 | | - "version": 3 |
135 | | - }, |
136 | | - "file_extension": ".py", |
137 | | - "mimetype": "text/x-python", |
138 | | - "name": "python", |
139 | | - "nbconvert_exporter": "python", |
140 | | - "pygments_lexer": "ipython3", |
141 | | - "version": "3.11.11" |
142 | | - } |
143 | | - }, |
144 | | - "nbformat": 4, |
145 | | - "nbformat_minor": 5 |
| 147 | + "nbformat": 4, |
| 148 | + "nbformat_minor": 5 |
146 | 149 | } |
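
The notebook's last markdown cell notes that the status may read `unknown` while the RayCluster spins up, so a single `job.status()` call may not be enough. A minimal polling sketch follows. It assumes a callable that returns the status as a string and a hypothetical set of terminal state names; neither the return type of `job.status()` nor those names are confirmed by this diff, so adapt both before use.

```python
import time
from typing import Callable, FrozenSet

# Assumption: these terminal state names are illustrative only.
# The notebook itself mentions just `unknown`.
TERMINAL_STATES: FrozenSet[str] = frozenset({"complete", "failed", "stopped"})


def wait_for_status(
    get_status: Callable[[], object],
    terminal: FrozenSet[str] = TERMINAL_STATES,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        # Normalise whatever the callable returns to a lowercase string.
        state = str(get_status()).lower()
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"last observed status: {state!r}")
        sleep(interval_s)
```

In the notebook this could be called as `wait_for_status(lambda: job.status())`, provided `job.status()` returns a value whose string form matches the chosen state names. The `sleep` parameter is injectable so the loop can be exercised without real delays.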