Commit fbfd3cb

CathalOConnorRH authored and laurafitzgerald committed
Add migration scripts from RHOAI 2.x to RHOAI 3.x
1 parent 74aa137 commit fbfd3cb

4 files changed

Lines changed: 3136 additions & 0 deletions

Lines changed: 224 additions & 0 deletions
@@ -0,0 +1,224 @@
# RHOAI RayCluster Migration Guide

This guide walks you through migrating your RayClusters from RHOAI 2.x to RHOAI 3.x.

## Overview

The migration tool helps you:
1. **Back up** your RayCluster configurations before the upgrade
2. **Verify** your cluster is ready for the upgrade (automatic pre-flight checks)
3. **Migrate** your RayClusters after the upgrade is complete

The tool is designed to be safe and predictable:
- **Staged approach**: Test on a single cluster before migrating everything
- **Idempotent**: Safe to run multiple times
- **Non-destructive**: Backups are created, and nothing is deleted automatically. The one exception is `--from-backup`, which deletes an existing cluster of the same name before restoring it (see Restore from Backup)

---

## Prerequisites

### 1. Python Environment

Python 3.6 or later is required.

```bash
python3 --version
```
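
The same check can be made from inside Python, which helps when `python3` on your `PATH` is not the interpreter that will run the script. This is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of the migration tool.

```python
import sys

MINIMUM = (3, 6)  # minimum version stated in this guide


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if the interpreter version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if meets_minimum():
        print("Python version OK")
    else:
        print("Python 3.6 or later is required")
```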

### 2. Install Required Packages

```bash
pip install -r ray_cluster_migration_requirements.txt
```

Or install directly (the quotes keep the shell from treating `>` as output redirection):

```bash
pip install "kubernetes>=28.1.0" "PyYAML>=6.0"
```

### 3. Cluster Access

Verify that you can connect to your OpenShift cluster:

```bash
oc whoami
oc get rayclusters --all-namespaces
```

---

## Step 1: Pre-Upgrade (Before RHOAI Upgrade)

Run the pre-upgrade command to verify prerequisites and back up your RayClusters:

```bash
python ray_cluster_migration.py pre-upgrade
```

The script will:
1. Prompt for a backup directory (default: `./raycluster-backups`)
2. Run automatic pre-flight checks (permissions, cert-manager, codeflare-operator status)
3. Back up your RayCluster configurations

**If any required checks fail, the script will stop and tell you exactly what needs to be fixed before you can proceed.**

### Pre-Upgrade Options

```bash
# Back up a specific namespace only
python ray_cluster_migration.py pre-upgrade --namespace my-namespace

# Back up a single cluster (for testing)
python ray_cluster_migration.py pre-upgrade --cluster my-cluster --namespace my-namespace

# Specify backup directory directly
python ray_cluster_migration.py pre-upgrade ./my-backup-directory
```

---

## Step 2: Perform the RHOAI Upgrade

Follow your standard RHOAI upgrade procedure to upgrade from RHOAI 2.x to RHOAI 3.x.

---

## Step 3: Post-Upgrade (After RHOAI Upgrade)

After the RHOAI upgrade is complete, migrate your RayClusters.

### Recommended: Staged Migration

We recommend migrating in stages to verify everything works correctly:

**Stage 1: Test with a single cluster**
```bash
# Preview first
python ray_cluster_migration.py post-upgrade --cluster my-cluster --namespace my-namespace --dry-run

# Run the migration
python ray_cluster_migration.py post-upgrade --cluster my-cluster --namespace my-namespace
```

**Stage 2: Migrate a namespace**
```bash
python ray_cluster_migration.py post-upgrade --namespace my-namespace
```

**Stage 3: Migrate all remaining clusters**
```bash
python ray_cluster_migration.py post-upgrade
```
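
For automation, the three stages can be expressed as an ordered list of commands. The sketch below only builds argument lists from the flags documented in this guide; the function name `staged_commands` is hypothetical and not part of the tool, and nothing runs until you pass each list to something like `subprocess.run` yourself.

```python
SCRIPT = "ray_cluster_migration.py"  # script name used throughout this guide


def staged_commands(cluster, namespace, python="python"):
    """Return the post-upgrade commands in the recommended staged order."""
    base = [python, SCRIPT, "post-upgrade"]
    one_cluster = base + ["--cluster", cluster, "--namespace", namespace]
    return [
        one_cluster + ["--dry-run"],        # Stage 1: preview a single cluster
        one_cluster,                        # Stage 1: migrate that cluster
        base + ["--namespace", namespace],  # Stage 2: migrate the namespace
        base,                               # Stage 3: migrate everything else
    ]


if __name__ == "__main__":
    for cmd in staged_commands("my-cluster", "my-namespace"):
        print(" ".join(cmd))
```

Reviewing the output of each stage before starting the next keeps the benefit of the staged approach.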

### Post-Upgrade Options

```bash
# Skip confirmation prompt (for automation)
python ray_cluster_migration.py post-upgrade --yes

# Preview changes without making them
python ray_cluster_migration.py post-upgrade --dry-run
```

### Restore from Backup

If your clusters were deleted during the upgrade, or you prefer to restore from backup files, use `--from-backup`:

```bash
# Restore all clusters from backup directory
python ray_cluster_migration.py post-upgrade --from-backup ./raycluster-backups

# Restore a single cluster from backup
python ray_cluster_migration.py post-upgrade --from-backup ./raycluster-backups --cluster my-cluster --namespace my-namespace

# Restore from a single backup file
python ray_cluster_migration.py post-upgrade --from-backup ./raycluster-backups/raycluster-my-cluster-my-namespace.yaml
```
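
The single-file example suggests that backups are named `raycluster-<cluster>-<namespace>.yaml`. That pattern is inferred from this one example and is an assumption, so confirm it against the files in your backup directory. Under that assumption, a small helper (hypothetical, not part of the tool) can build the path to pass to `--from-backup`:

```python
from pathlib import Path


def backup_path(backup_dir, cluster, namespace):
    """Build the assumed backup file path for one RayCluster.

    Assumes the naming pattern raycluster-<cluster>-<namespace>.yaml,
    inferred from the single example in this guide.
    """
    return Path(backup_dir) / f"raycluster-{cluster}-{namespace}.yaml"


if __name__ == "__main__":
    path = backup_path("./raycluster-backups", "my-cluster", "my-namespace")
    # Report whether the expected file is actually present on disk.
    print(path, "exists" if path.is_file() else "not found")
```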

**Important:** `--from-backup` will **delete** any existing cluster with the same name before creating it from the backup. This ensures a clean restore.

---

## Check Migration Status

Check which clusters need migration at any time:

```bash
python ray_cluster_migration.py list
```

---

## Troubleshooting

### Pre-flight check failed: cert-manager not detected

Install cert-manager via OperatorHub in your OpenShift cluster:

1. Go to OperatorHub in the OpenShift console
2. Search for "cert-manager"
3. Install the cert-manager operator
4. Wait for it to be ready
5. Run pre-upgrade again

### Pre-flight check failed: codeflare-operator not Removed

Set the codeflare component's `managementState` to `Removed` in your DataScienceCluster:

```bash
oc patch datasciencecluster <dsc-name> --type merge -p '{"spec":{"components":{"codeflare":{"managementState":"Removed"}}}}'
```
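
Hand-typed JSON inside shell quotes is easy to get wrong. As an optional sketch, the same merge patch can be generated with Python's `json` module and pasted after `-p`; the helper name `codeflare_removed_patch` is illustrative only and not part of the migration tool.

```python
import json


def codeflare_removed_patch():
    """Return the merge patch from this guide as a compact JSON string."""
    patch = {
        "spec": {
            "components": {
                "codeflare": {"managementState": "Removed"}
            }
        }
    }
    # Compact separators give the single-line form used with `oc patch -p`.
    return json.dumps(patch, separators=(",", ":"))


if __name__ == "__main__":
    print(codeflare_removed_patch())
```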

### Migration failed for a cluster

1. Check the error message for details
2. Verify cluster health: `oc get raycluster <name> -n <namespace>`
3. Retry just that cluster: `python ray_cluster_migration.py post-upgrade --cluster <name> --namespace <namespace>`

### Route not available after migration

Routes may take 30-60 seconds to be created after migration. To check whether a route exists yet:

```bash
oc get httproute -n <namespace>
```
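
Rather than re-running the command by hand, you can poll until a route appears. The sketch below is an illustration under stated assumptions, not part of the migration tool: `wait_for` and `httproute_exists` are hypothetical helpers, the 90-second timeout and 5-second interval are arbitrary values that leave headroom over the 30-60 second estimate, and any output from `oc get httproute -n <namespace> -o name` is treated as success.

```python
import subprocess
import time


def wait_for(check, timeout=90.0, interval=5.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call `check()` until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def httproute_exists(namespace):
    """Return True if `oc` lists at least one HTTPRoute in the namespace."""
    result = subprocess.run(
        ["oc", "get", "httproute", "-n", namespace, "-o", "name"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode; works on Python 3.6
    )
    return result.returncode == 0 and bool(result.stdout.strip())


# Example usage ("my-namespace" is a placeholder):
#   found = wait_for(lambda: httproute_exists("my-namespace"))
#   print("HTTPRoute found" if found else "No HTTPRoute yet")
```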

---

## Quick Reference

```bash
# Before RHOAI upgrade
python ray_cluster_migration.py pre-upgrade

# After RHOAI upgrade (staged approach)
python ray_cluster_migration.py post-upgrade --cluster test-cluster --namespace dev --dry-run
python ray_cluster_migration.py post-upgrade --cluster test-cluster --namespace dev
python ray_cluster_migration.py post-upgrade --namespace dev
python ray_cluster_migration.py post-upgrade

# Check status anytime
python ray_cluster_migration.py list
```

## Command Reference

| Command | Description |
|---------|-------------|
| `pre-upgrade` | Run pre-flight checks and back up RayClusters |
| `post-upgrade` | Migrate RayClusters after RHOAI upgrade |
| `list` | Show all RayClusters and their migration status |
| `delete` | [Advanced] Delete RayClusters |
| `import` | [Advanced] Restore RayClusters from backup |

### Common Options

| Option | Description |
|--------|-------------|
| `--cluster NAME` | Target a specific cluster (requires `--namespace`) |
| `--namespace NS` | Target a specific namespace |
| `--dry-run` | Preview changes without applying them |
| `--yes` | Skip confirmation prompts |
| `--from-backup PATH` | (post-upgrade only) Restore from backup file or directory. Deletes existing clusters before recreating. |
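
The constraints in the table can be checked before a command is launched, for example in a wrapper script. This sketch encodes only the two rules stated in the table (`--cluster` requires `--namespace`, and `--from-backup` applies to `post-upgrade` only); `validate_options` is a hypothetical helper, and the real tool may enforce additional rules.

```python
def validate_options(command, cluster=None, namespace=None, from_backup=None):
    """Return a list of problems with the given option combination."""
    problems = []
    if cluster and not namespace:
        problems.append("--cluster requires --namespace")
    if from_backup and command != "post-upgrade":
        problems.append("--from-backup is only valid with post-upgrade")
    return problems


if __name__ == "__main__":
    # An empty list means the combination passes both documented rules.
    print(validate_options("post-upgrade", cluster="my-cluster"))
```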
