Commit bd8a8b3

haimari and claude committed
docs: Update migration guide with GitOps approach and clarify node provisioning
- Replace all kubectl apply commands with GitOps/FluxCD workflow
- Add Prerequisites section emphasizing FluxCD and GitOps repository requirements
- Add detailed explanation of node provisioning trigger mechanism with real example
- Show step-by-step process: Terraform scale-down → pod eviction → unschedulable pods → Karpenter provisioning
- Include FluxCD reconciliation commands and HelmRelease example
- Clean up corrupted content from previous complex migration approach
- Emphasize GitOps best practices throughout the migration process

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 703aaf6 commit bd8a8b3

1 file changed

Lines changed: 115 additions & 62 deletions

docs/migration-zero-downtime.md

@@ -13,6 +13,19 @@ This guide provides a **simple, safe approach** for migrating from Terraform-man
 
 ---
 
+## 🔧 **Prerequisites**
+
+Before starting the migration, ensure you have:
+
+- **Oracle Kubernetes Engine (OKE)** cluster running
+- **OCI credentials** configured (CLI or Instance Principal)
+- **FluxCD v2.x** installed and managing your cluster
+- **GitOps repository** where Kubernetes manifests are stored
+- **`flux` CLI** installed locally for manual reconciliation
+- **Terraform** managing your current node pools
+
+---
+
 ## 🏗️ **Pre-Migration Assessment**
 
 ### **1. Inventory Current State**
@@ -111,20 +124,42 @@ The key to zero-downtime migration is installing Karpenter alongside existing in
 
 #### **Step 1.1: Install Karpenter (Non-Disruptive)**
 
+```yaml
+# Add to your GitOps repository: karpenter/karpenter-release.yaml
+apiVersion: helm.toolkit.fluxcd.io/v2beta1
+kind: HelmRelease
+metadata:
+  name: karpenter-oci
+  namespace: karpenter
+spec:
+  interval: 15m
+  chart:
+    spec:
+      chart: karpenter-oci
+      version: "0.1.57" # Use latest version
+      sourceRef:
+        kind: HelmRepository
+        name: karpenter-oci
+        namespace: karpenter
+  values:
+    oci:
+      region: us-ashburn-1
+      compartmentId: "ocid1.compartment.oc1..."
+      clusterId: "ocid1.cluster.oc1..."
+      existingSecret: "oci-config"
+```
+
 ```bash
-# 1. Create Karpenter namespace and install
-kubectl create namespace karpenter
-
-# 2. Install Karpenter with careful configuration
-helm install karpenter karpenter-oci/karpenter-oci \
-  --namespace karpenter \
-  --set oci.region=$OCI_REGION \
-  --set oci.compartmentId=$OCI_COMPARTMENT_ID \
-  --set oci.clusterId=$OCI_CLUSTER_ID \
-  --set oci.existingSecret="oci-config" \
-  --wait
-
-# 3. Verify Karpenter is running but not managing anything yet
+# Commit and deploy via GitOps
+git add karpenter/karpenter-release.yaml
+git commit -m "Install Karpenter OCI Provider for migration"
+git push origin main
+
+# Trigger FluxCD reconciliation
+flux reconcile source git flux-system
+flux reconcile helmrelease karpenter-oci -n karpenter
+
+# Verify Karpenter is running
 kubectl get deployment -n karpenter
 kubectl get nodepools -A # Should be empty initially
 ```
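
Reviewer note: the "Verify Karpenter is running" step in this hunk depends on reading the `kubectl get deployment` output by eye. A minimal sketch of a scripted check follows, assuming the default column layout of `kubectl get deployment --no-headers` (NAME, READY, UP-TO-DATE, AVAILABLE, AGE). The helper name `deployments_ready` is hypothetical and is not part of the guide or of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the guide): reads the output of
# `kubectl get deployment --no-headers` on stdin and succeeds only if at
# least one deployment is listed and every one reports READY as n/n with n > 0.
deployments_ready() {
  awk '
    {
      split($2, r, "/")                      # READY column looks like "1/1"
      seen = 1
      if (r[1] != r[2] || r[2] + 0 == 0) bad = 1
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Usage (requires cluster access):
#   kubectl get deployment -n karpenter --no-headers | deployments_ready \
#     && echo "Karpenter deployment is ready"
```

Keeping the parsing in a function that reads stdin means the check can be exercised against captured output without touching the cluster.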
@@ -191,12 +226,16 @@ spec:
 ```
 
 ```bash
-# Apply Karpenter NodePools (ready to take over)
-kubectl apply -f kafka-nodepool.yaml
-kubectl apply -f rabbitmq-nodepool.yaml
-kubectl apply -f redis-nodepool.yaml
+# Commit NodePool manifests to your GitOps repository
+git add kafka-nodepool.yaml rabbitmq-nodepool.yaml redis-nodepool.yaml
+git commit -m "Add Karpenter NodePools for migration handoff"
+git push origin main
 
-# Verify NodePools are ready
+# Wait for FluxCD to reconcile
+flux reconcile source git flux-system
+flux reconcile kustomization karpenter
+
+# Verify NodePools are deployed
 kubectl get nodepools -A
 # No immediate provisioning will happen unless pods become unschedulable
 ```
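
Reviewer note: the last comment in this hunk says provisioning only starts once pods become unschedulable. One way to confirm that nothing is Pending before moving on is sketched below, assuming the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). The helper name `count_pending_pods` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the guide): reads the output of
# `kubectl get pods -A --no-headers` on stdin and prints how many pods
# have STATUS "Pending" (fourth column when -A is used).
count_pending_pods() {
  awk '$4 == "Pending" { n++ } END { print n + 0 }'
}

# Usage (requires cluster access):
#   kubectl get pods -A --no-headers | count_pending_pods
```

A result of `0` right after the NodePools are reconciled is consistent with Karpenter staying idle; a non-zero count means Karpenter may already have something to provision for.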
@@ -248,9 +287,39 @@ resource "oci_containerengine_node_pool" "kafka_pool" {
 }
 ```
 
-### **Step 2.3: Monitor Karpenter Response**
+### **Step 2.3: Understanding the Node Provisioning Trigger**
+
+**Here's exactly how new Karpenter nodes get provisioned during migration:**
 
-**Karpenter should automatically provision nodes if pods become unschedulable:**
+#### **🎯 The Triggering Mechanism**
+1. **Terraform Scale-Down**: When Terraform reduces node pool size (e.g., 3→2 nodes)
+2. **Node Termination**: OCI terminates one of the existing nodes
+3. **Pod Eviction**: Pods on the terminated node are evicted by Kubernetes
+4. **Rescheduling**: Kubernetes scheduler tries to reschedule evicted pods
+5. **Unschedulable State**: If remaining nodes lack capacity, pods become "Pending"
+6. **Karpenter Trigger**: Karpenter detects unschedulable pods and provisions new nodes
+7. **New Node**: Karpenter creates a new OCI instance matching NodePool requirements
+8. **Pod Scheduling**: Pending pods are scheduled on the new Karpenter-managed node
+
+#### **📋 Real Example**
+```bash
+# Before: 3 Terraform nodes, 0 Karpenter nodes
+kubectl get nodes | grep -E "(kafka-pool|karpenter)"
+# kafka-pool-terraform-node-1 Ready <none> 1d v1.28.2
+# kafka-pool-terraform-node-2 Ready <none> 1d v1.28.2
+# kafka-pool-terraform-node-3 Ready <none> 1d v1.28.2
+
+# Apply Terraform scale-down (3→2)
+terraform apply -target=oci_containerengine_node_pool.kafka_pool
+
+# After: 2 Terraform nodes, 1 Karpenter node (automatically provisioned)
+kubectl get nodes | grep -E "(kafka-pool|karpenter)"
+# kafka-pool-terraform-node-1 Ready <none> 1d v1.28.2
+# kafka-pool-terraform-node-2 Ready <none> 1d v1.28.2
+# kafka-pool-karpenter-abcd123 Ready <none> 5m v1.28.2 # <- New Karpenter node
+```
+
+#### **🔍 Monitor the Process**
 
 ```bash
 # Monitor Karpenter's response to the Terraform scale-down
@@ -353,6 +422,7 @@ kubectl get pods -A | grep -E "(Pending|Failed)"
 Now that migration is complete, enable cost optimization features:
 
 ```yaml
+# kafka-nodepool-optimized.yaml
 # Enable flexible shape selection for cost optimization
 apiVersion: karpenter.sh/v1
 kind: NodePool
kind: NodePool
@@ -379,53 +449,36 @@ spec:
379449
consolidateAfter: "30s" # Enable aggressive consolidation
380450
```
381451
382-
```yaml
383-
# kafka-test-replica.yaml - CREATE ADDITIONAL REPLICA FOR TESTING
384-
apiVersion: v1
385-
kind: Pod
386-
metadata:
387-
name: kafka-test-replica
388-
namespace: kafka
389-
labels:
390-
app.kubernetes.io/name: kafka
391-
test-migration: "true"
392-
spec:
393-
# Force scheduling on Karpenter node
394-
nodeSelector:
395-
oci.oraclecloud.com/node-pool: kafka-pool-karpenter
396-
tolerations:
397-
- key: node_pool
398-
value: kafka
399-
effect: NoSchedule
400-
containers:
401-
- name: kafka
402-
image: confluentinc/cp-kafka:latest
403-
resources:
404-
# Match StatefulSet resource requirements exactly
405-
requests:
406-
cpu: "2000m"
407-
memory: "8Gi"
408-
limits:
409-
cpu: "2000m"
410-
memory: "8Gi"
411-
env:
412-
- name: KAFKA_ZOOKEEPER_CONNECT
413-
value: "zookeeper-service:2181"
414-
- name: KAFKA_ADVERTISED_LISTENERS
415-
value: "PLAINTEXT://kafka-test-replica:9092"
452+
```bash
453+
# Update NodePool configurations in GitOps repository
454+
git add kafka-nodepool-optimized.yaml
455+
git commit -m "Enable cost optimization features post-migration"
456+
git push origin main
457+
458+
# Trigger FluxCD reconciliation
459+
flux reconcile source git flux-system
460+
flux reconcile kustomization karpenter
416461
```
417462

463+
### **Step 4.2: Final Validation**
464+
418465
```bash
419-
# Deploy test replica on Karpenter node
420-
kubectl apply -f kafka-test-replica.yaml
466+
# Comprehensive final validation
467+
echo "=== Migration Validation Report ==="
421468

422-
# Monitor scheduling and startup
423-
kubectl get pod kafka-test-replica -n kafka -o wide
424-
kubectl logs kafka-test-replica -n kafka
469+
# 1. Node distribution
470+
echo "Current nodes:"
471+
kubectl get nodes -o custom-columns="NAME:.metadata.name,POOL:.metadata.labels.oci\.oraclecloud\.com/node-pool,STATUS:.status.conditions[?(@.type=='Ready')].status"
425472

426-
# Verify Kafka connectivity from test replica
427-
kubectl exec kafka-test-replica -n kafka -- kafka-topics --bootstrap-server localhost:9092 --list
428-
```
473+
# 2. StatefulSet health
474+
echo -e "\nStatefulSet status:"
475+
kubectl get statefulsets -A -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.readyReplicas,DESIRED:.spec.replicas"
476+
477+
# 3. Critical workload validation
478+
echo -e "\nCritical workload validation:"
479+
kubectl exec -n kafka kafka-cluster-0 -- kafka-topics --bootstrap-server localhost:9092 --list | head -5
480+
kubectl exec -n rabbitmq rabbitmq-cluster-0 -- rabbitmqctl node_health_check
481+
kubectl exec -n redis redis-cluster-0 -- redis-cli info replication
429482

430483
---
431484

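
Reviewer note: the StatefulSet health step in the final validation prints READY and DESIRED side by side but leaves the comparison to the reader. A minimal sketch of an automated comparison follows, assuming the four-column output (NAMESPACE, NAME, READY, DESIRED, with a header row) produced by the `custom-columns` expression added in this commit. The helper name `unready_statefulsets` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the guide): reads the custom-columns
# output (header row plus NAMESPACE NAME READY DESIRED rows) on stdin and
# prints "namespace/name" for every StatefulSet whose READY value differs
# from DESIRED. A READY value of "<none>" (no ready replicas reported)
# also counts as a mismatch.
unready_statefulsets() {
  awk 'NR > 1 && $3 != $4 { print $1 "/" $2 }'
}

# Usage (requires cluster access):
#   kubectl get statefulsets -A \
#     -o custom-columns="NAMESPACE:.metadata.namespace,NAME:.metadata.name,READY:.status.readyReplicas,DESIRED:.spec.replicas" \
#     | unready_statefulsets
```

Empty output means every listed StatefulSet has all desired replicas ready; any printed line names a workload to inspect before declaring the migration complete.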