Commit 496a8d2: Add migration scripts from RHOAI 2.x to RHOAI 3.x

Authored and committed by CathalOConnorRH and laurafitzgerald. Parent: 838dea6. 4 files changed (2253 additions, 0 deletions).

File tree

Lines changed: 206 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,206 @@
# RHOAI RayCluster Migration Guide

This guide walks you through migrating your RayClusters from RHOAI 2.x to RHOAI 3.x.

## Overview

The migration tool helps you:
1. **Back up** your RayCluster configurations before the upgrade
2. **Verify** your cluster is ready for the upgrade (automatic pre-flight checks)
3. **Migrate** your RayClusters after the upgrade is complete

The tool is designed to be safe and predictable:
- **Staged approach**: Test on a single cluster before migrating everything
- **Idempotent**: Safe to run multiple times
- **Non-destructive**: Backups are created and nothing is deleted automatically

---

## Prerequisites

### 1. Python Environment

Python 3.6 or later is required.

```bash
python3 --version
```
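
If you automate these prerequisite checks (for example, in a CI job), a small guard like the following can fail fast on an old interpreter. This is a sketch, not part of the migration tool: `require_python_version` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if python3 is at least <major>.<minor>.
# Usage: require_python_version 3 6
require_python_version() {
    command -v python3 >/dev/null 2>&1 || return 1
    # sys.version_info compares element-wise, so (3, 10) >= (3, 6) is True.
    python3 -c "import sys; sys.exit(0 if sys.version_info >= ($1, $2) else 1)"
}

if require_python_version 3 6; then
    echo "Python version OK"
else
    echo "Python 3.6 or later is required" >&2
fi
```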

### 2. Install Required Packages

```bash
pip install -r ray_cluster_migration_requirements.txt
```

Or install the packages directly. Quote the version specifiers; unquoted, the shell treats `>` as an output redirection:

```bash
pip install "kubernetes>=28.1.0" "PyYAML>=6.0"
```
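
To confirm the packages are importable by the same `python3` you will run the tool with, a minimal sketch follows. `require_python_modules` is a hypothetical helper; note that the `PyYAML` package is imported as `yaml`.

```bash
# Hypothetical helper: report every named Python module that cannot be imported.
# Returns non-zero if at least one module is missing.
require_python_modules() {
    missing=0
    for module in "$@"; do
        if ! python3 -c "import $module" >/dev/null 2>&1; then
            echo "Missing Python module: $module" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Import names, not pip package names: PyYAML is imported as "yaml".
require_python_modules kubernetes yaml && echo "Required packages are installed"
```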

### 3. Cluster Access

Verify you can connect to your OpenShift cluster:

```bash
oc whoami
oc get rayclusters --all-namespaces
```
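
The two commands above can be turned into a scripted check. This sketch assumes the RayCluster resource is `rayclusters.ray.io` (the KubeRay API group) and uses `oc auth can-i` to test list permission across all namespaces; `check_cluster_access` is a hypothetical name, and the tool's own pre-flight checks still apply.

```bash
# Hypothetical helper: confirm you are logged in and can list RayClusters
# cluster-wide. Uses only standard oc subcommands.
check_cluster_access() {
    user="$(oc whoami 2>/dev/null)"
    if [ -z "$user" ]; then
        echo "Not logged in: run 'oc login' first" >&2
        return 1
    fi
    echo "Logged in as: $user"

    # 'oc auth can-i' prints "yes" or "no".
    can_list="$(oc auth can-i list rayclusters.ray.io --all-namespaces 2>/dev/null)"
    if [ "$can_list" != "yes" ]; then
        echo "User $user cannot list RayClusters in all namespaces" >&2
        return 1
    fi
    echo "RayCluster list permission: OK"
}
```

Call `check_cluster_access` before running `pre-upgrade`; it returns non-zero if either check fails.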

---

## Step 1: Pre-Upgrade (Before RHOAI Upgrade)

Run the pre-upgrade command to verify prerequisites and back up your RayClusters:

```bash
python ray_cluster_migration.py pre-upgrade
```

The script will:
1. Prompt for a backup directory (default: `./raycluster-backups`)
2. Run automatic pre-flight checks (permissions, cert-manager, and codeflare-operator status)
3. Back up your RayCluster configurations

**If any required check fails, the script stops and tells you exactly what needs to be fixed before you can proceed.**

### Pre-Upgrade Options

```bash
# Back up a specific namespace only
python ray_cluster_migration.py pre-upgrade --namespace my-namespace

# Back up a single cluster (for testing)
python ray_cluster_migration.py pre-upgrade --cluster my-cluster --namespace my-namespace

# Specify the backup directory on the command line
python ray_cluster_migration.py pre-upgrade ./my-backup-directory
```
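
To keep backups from separate runs apart, you might wrap `pre-upgrade` so that each run writes to its own timestamped directory, using the positional backup-directory argument shown above. This is a sketch: `run_timestamped_backup`, the `PYTHON` and `MIGRATION_SCRIPT` variables, and the `raycluster-backups-<timestamp>` naming are illustrative, and it assumes the tool accepts an existing empty directory and exits non-zero when a required pre-flight check fails.

```bash
# Hypothetical wrapper around the documented command:
#   python ray_cluster_migration.py pre-upgrade <backup-directory>
# PYTHON and MIGRATION_SCRIPT are illustrative overrides; defaults match this guide.
run_timestamped_backup() {
    backup_dir="./raycluster-backups-$(date +%Y%m%d-%H%M%S)"
    # Assumption: the tool accepts an existing, empty backup directory.
    mkdir -p "$backup_dir" || return 1

    # Extra arguments (for example --namespace my-namespace) are passed through.
    # Assumption: the tool exits non-zero when a required pre-flight check fails.
    if ! "${PYTHON:-python}" "${MIGRATION_SCRIPT:-ray_cluster_migration.py}" \
            pre-upgrade "$@" "$backup_dir"; then
        echo "pre-upgrade failed; fix the reported issues and rerun" >&2
        return 1
    fi

    # An empty directory may simply mean no RayClusters matched, so only warn.
    if [ -z "$(ls -A "$backup_dir")" ]; then
        echo "Warning: no files were written to $backup_dir" >&2
    fi
    echo "Backups written to $backup_dir"
}
```

For example, `run_timestamped_backup --namespace my-namespace` backs up one namespace into a fresh directory.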

---

## Step 2: Perform the RHOAI Upgrade

Follow your standard RHOAI upgrade procedure to upgrade from RHOAI 2.x to RHOAI 3.x.

---

## Step 3: Post-Upgrade (After RHOAI Upgrade)

After the RHOAI upgrade is complete, migrate your RayClusters.

### Recommended: Staged Migration

We recommend migrating in stages to verify everything works correctly:

**Stage 1: Test with a single cluster**
```bash
# Preview first
python ray_cluster_migration.py post-upgrade --cluster my-cluster --namespace my-namespace --dry-run

# Run the migration
python ray_cluster_migration.py post-upgrade --cluster my-cluster --namespace my-namespace
```

**Stage 2: Migrate a namespace**
```bash
python ray_cluster_migration.py post-upgrade --namespace my-namespace
```

**Stage 3: Migrate all remaining clusters**
```bash
python ray_cluster_migration.py post-upgrade
```
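
If you script the three stages, a sketch like the following runs each one only after the previous one succeeded. It assumes the tool exits non-zero when a migration fails; `run_post_upgrade`, `staged_migration`, and the `PYTHON`/`MIGRATION_SCRIPT` variables are hypothetical, and `--yes` skips the confirmation prompts. The Stage 1 dry run acts only as a gate here, so review its output when you run it interactively.

```bash
# Hypothetical helper that runs the documented post-upgrade command.
# PYTHON and MIGRATION_SCRIPT are illustrative overrides; defaults match this guide.
run_post_upgrade() {
    "${PYTHON:-python}" "${MIGRATION_SCRIPT:-ray_cluster_migration.py}" post-upgrade "$@"
}

# Usage: staged_migration <test-cluster> <test-namespace> [more-namespaces...]
# Each step runs only if the previous one succeeded.
staged_migration() {
    if [ "$#" -lt 2 ]; then
        echo "usage: staged_migration <cluster> <namespace> [namespace...]" >&2
        return 2
    fi
    cluster="$1"
    namespace="$2"
    shift 2

    echo "Stage 1: cluster $cluster in namespace $namespace"
    run_post_upgrade --cluster "$cluster" --namespace "$namespace" --dry-run || return 1
    run_post_upgrade --cluster "$cluster" --namespace "$namespace" --yes || return 1

    echo "Stage 2: namespace-level migration"
    for ns in "$namespace" "$@"; do
        run_post_upgrade --namespace "$ns" --yes || return 1
    done

    echo "Stage 3: all remaining clusters"
    run_post_upgrade --yes
}
```

For example, `staged_migration test-cluster dev staging`. In practice you may prefer to stop after each stage and check your workloads before continuing, which is the point of the staged approach.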

### Post-Upgrade Options

```bash
# Skip confirmation prompt (for automation)
python ray_cluster_migration.py post-upgrade --yes

# Preview changes without making them
python ray_cluster_migration.py post-upgrade --dry-run
```

---

## Check Migration Status

Check which clusters need migration at any time:

```bash
python ray_cluster_migration.py list
```

---

## Troubleshooting

### Pre-flight check failed: cert-manager not detected

Install cert-manager via OperatorHub in your OpenShift cluster:

1. Go to OperatorHub in the OpenShift console
2. Search for "cert-manager"
3. Install the cert-manager operator
4. Wait for the operator to be ready
5. Run `pre-upgrade` again
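
After installing the operator, you can check from the command line that cert-manager is present and running before rerunning `pre-upgrade`. This sketch assumes the `certificates.cert-manager.io` CRD and the `cert-manager` namespace used by the cert-manager Operator for Red Hat OpenShift; `check_cert_manager` and `CERT_MANAGER_NAMESPACE` are hypothetical, and this is not necessarily the same check the migration tool performs.

```bash
# Hypothetical helper: check that cert-manager is installed and running.
# Assumption: cert-manager's deployments run in the "cert-manager" namespace
# (override with CERT_MANAGER_NAMESPACE).
check_cert_manager() {
    if ! oc get crd certificates.cert-manager.io >/dev/null 2>&1; then
        echo "cert-manager CRDs not found; install the cert-manager operator" >&2
        return 1
    fi
    # Wait up to 5 minutes for every deployment in the namespace to be Available.
    oc wait --for=condition=Available deployment --all \
        -n "${CERT_MANAGER_NAMESPACE:-cert-manager}" --timeout=300s
}
```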

### Pre-flight check failed: codeflare-operator not Removed

Update your DataScienceCluster so that the CodeFlare component's `managementState` is `Removed` (run `oc get datasciencecluster` to find `<dsc-name>`):

```bash
oc patch datasciencecluster <dsc-name> --type merge -p '{"spec":{"components":{"codeflare":{"managementState":"Removed"}}}}'
```
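
The patch can be combined with a lookup of the DataScienceCluster name and a check of the result. A sketch, assuming there is exactly one DataScienceCluster in the cluster; `remove_codeflare` is a hypothetical helper.

```bash
# Hypothetical helper: set CodeFlare to Removed on the (single) DataScienceCluster
# and confirm that the change was applied.
remove_codeflare() {
    dsc="$(oc get datasciencecluster -o jsonpath='{.items[0].metadata.name}')" || return 1
    if [ -z "$dsc" ]; then
        echo "No DataScienceCluster found" >&2
        return 1
    fi

    oc patch datasciencecluster "$dsc" --type merge \
        -p '{"spec":{"components":{"codeflare":{"managementState":"Removed"}}}}' || return 1

    # Read the field back to confirm the patch took effect.
    state="$(oc get datasciencecluster "$dsc" \
        -o jsonpath='{.spec.components.codeflare.managementState}')"
    echo "DataScienceCluster $dsc: codeflare managementState=$state"
    [ "$state" = "Removed" ]
}
```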

### Migration failed for a cluster

1. Check the error message for details
2. Verify cluster health: `oc get raycluster <name> -n <namespace>`
3. Retry just that cluster: `python ray_cluster_migration.py post-upgrade --cluster <name> --namespace <namespace>`
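
Steps 2 and 3 can be wrapped with some extra diagnostics. This sketch assumes KubeRay's `ray.io/cluster=<name>` pod label; `diagnose_and_retry` and the `PYTHON`/`MIGRATION_SCRIPT` variables are hypothetical. The retry runs without `--yes`, so it still asks for confirmation.

```bash
# Hypothetical helper: show the RayCluster, its pods, and recent namespace events,
# then retry the migration for just that cluster.
# Usage: diagnose_and_retry <name> <namespace>
diagnose_and_retry() {
    name="$1"
    namespace="$2"
    # Stop early if the RayCluster itself cannot be found.
    oc get raycluster "$name" -n "$namespace" || return 1
    # Assumption: KubeRay labels the cluster's pods with ray.io/cluster=<name>.
    oc get pods -n "$namespace" -l "ray.io/cluster=$name"
    # Recent events often explain scheduling or image-pull failures.
    oc get events -n "$namespace" --sort-by=.lastTimestamp | tail -n 20
    "${PYTHON:-python}" "${MIGRATION_SCRIPT:-ray_cluster_migration.py}" \
        post-upgrade --cluster "$name" --namespace "$namespace"
}
```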

### Route not available after migration

Routes may take 30-60 seconds to be created. Check directly:

```bash
oc get httproute -n <namespace>
```
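
Rather than rerunning the command by hand, you can poll for the HTTPRoute with a timeout. A sketch: `wait_for_httproute` is hypothetical, the 90-second default is an arbitrary margin over the 30-60 seconds noted above, and it treats any HTTPRoute in the namespace as a match, so filter by name if several RayClusters share the namespace.

```bash
# Hypothetical helper: poll until at least one HTTPRoute exists in the namespace.
# Usage: wait_for_httproute <namespace> [timeout-seconds] [interval-seconds]
wait_for_httproute() {
    namespace="$1"
    timeout="${2:-90}"
    interval="${3:-5}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # "-o name" prints one line per resource, or nothing if none exist.
        if [ -n "$(oc get httproute -n "$namespace" -o name 2>/dev/null)" ]; then
            oc get httproute -n "$namespace"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "No HTTPRoute in $namespace after ${timeout}s" >&2
    return 1
}
```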

---

## Quick Reference

```bash
# Before RHOAI upgrade
python ray_cluster_migration.py pre-upgrade

# After RHOAI upgrade (staged approach)
python ray_cluster_migration.py post-upgrade --cluster test-cluster --namespace dev --dry-run
python ray_cluster_migration.py post-upgrade --cluster test-cluster --namespace dev
python ray_cluster_migration.py post-upgrade --namespace dev
python ray_cluster_migration.py post-upgrade

# Check status anytime
python ray_cluster_migration.py list
```

---

## Command Reference

| Command | Description |
|---------|-------------|
| `pre-upgrade` | Run pre-flight checks and back up RayClusters |
| `post-upgrade` | Migrate RayClusters after RHOAI upgrade |
| `list` | Show all RayClusters and their migration status |
| `delete` | [Advanced] Delete RayClusters |
| `import` | [Advanced] Restore RayClusters from backup |

### Common Options

| Option | Description |
|--------|-------------|
| `--cluster NAME` | Target a specific cluster (requires `--namespace`) |
| `--namespace NS` | Target a specific namespace |
| `--dry-run` | Preview changes without applying them |
| `--yes` | Skip confirmation prompts |
