@@ -320,6 +320,11 @@ func getBackupLabels(annotations map[string]string) map[string]string {
320320
321321### OADP Integration
322322
323+ OADP (OpenShift API for Data Protection) is Red Hat's distribution of Velero for
324+ OpenShift. It requires an object storage backend (typically S3-compatible) for backup storage.
325+ See [Certified backup storage providers](https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/backup_and_restore/oadp-application-backup-and-restore#oadp-certified-backup-storage-providers_about-installing-oadp)
326+ for supported backends (AWS S3, MCG, ODF, Ceph RGW, MinIO, etc.).
327+
323328#### Split Backup: CRs and PVCs
324329
325330The backup uses two OADP Backup CRs, one for all namespace resources (CRs, Secrets, ConfigMaps, etc.) and one for PVC snapshots:
@@ -1133,35 +1138,208 @@ Use cases:
11331138- Interactive pauses between steps (skippable with `auto_ack=true`)
11341139- See `docs/dev/backup-restore/restore/README.md` for usage
11351140
1136- ### Phase 4: Golang Restore Controller (Future)
1141+ ### Phase 4: Golang Backup/Restore Controllers (Future)
1142+
1143+ **Goal**: Replace the Ansible playbooks with Golang controllers that orchestrate
1144+ backup and restore, all driven by CRs. The controllers are **backup-tool-agnostic**:
1145+ they use raw templates from a Secret to create backup/restore CRs for whatever
1146+ tool is configured (OADP/Velero, Kasten, etc.), with no tool-specific imports.
1147+
1148+ #### Generic Template Approach
1149+
1150+ Each stage references a key in a Secret containing the raw YAML template for the
1151+ backup/restore CR to create. The controller renders the template with variables
1152+ (namespace, timestamp, etc.), creates the `unstructured.Unstructured` object, and
1153+ polls a configurable jsonpath condition for completion.
1154+
1155+ This keeps the controller decoupled from any backup tool — only the Secret
1156+ templates contain tool-specific API references.
11371157
1138- **Goal**: Replace the Ansible restore playbook with a Golang controller that
1139- creates OADP Restore CRs, handles database/RabbitMQ restore, and manages the
1140- staged deployment lifecycle — all driven by a single `OpenStackBackupRestore` CR.
1158+ #### OpenStackBackup Controller
1159+
1160+ Orchestrates the full backup sequence: trigger Galera DB dumps, then create
1161+ backup CRs from templates for each stage.
11411162
11421163```yaml
11431164apiVersion: backup.openstack.org/v1beta1
1144- kind: OpenStackBackupRestore
1165+ kind: OpenStackBackup
11451166metadata:
1146-   name: restore-20260303
1167+   name: backup-20260320
11471168  namespace: openstack
11481169spec:
1149-   backupName: openstack-backup-20260303-120000
1150-   automated: true
1170+   stages:
1171+   - name: galera-dumps
1172+     type: GaleraBackup # built-in: triggers jobs from GaleraBackup cronjobs
1173+     timeout: 10m
1174+   - name: pvc-backup
1175+     type: Template
1176+     templateRef:
1177+       name: openstack-backup-templates # Secret name
1178+       key: backup-pvcs # Key within the Secret
1179+     completionCondition:
1180+       jsonpath: '{.status.phase}'
1181+       value: Completed
1182+     timeout: 30m
1183+   - name: resources-backup
1184+     type: Template
1185+     templateRef:
1186+       name: openstack-backup-templates
1187+       key: backup-resources
1188+     completionCondition:
1189+       jsonpath: '{.status.phase}'
1190+       value: Completed
1191+     timeout: 30m
1192+ status:
1193+   phase: InProgress # Pending, InProgress, Completed, Failed
1194+   currentStage: pvc-backup
1195+   conditions:
1196+   - type: GaleraDumpsComplete
1197+     status: "True"
1198+   - type: PvcBackupInProgress
1199+     status: "True"
1200+ ```
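On the Go side, the spec above could map onto kubebuilder-style API types roughly as follows. This is a sketch, not the final API. The type names and the `Validate` helper are hypothetical, and so is storing timeouts as plain strings. A real CRD would likely use `metav1.Duration` and `metav1.Condition` from `k8s.io/apimachinery`; they are left out here so the block builds with the standard library alone.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// StageType selects how a stage is executed.
type StageType string

const (
	StageGaleraBackup StageType = "GaleraBackup" // built-in: trigger Galera dump jobs
	StageTemplate     StageType = "Template"     // render a CR template from a Secret
)

// TemplateRef points at one key of the Secret holding raw CR templates.
type TemplateRef struct {
	Name string `json:"name"`
	Key  string `json:"key,omitempty"`
}

// CompletionCondition is polled on the CR created by a Template stage.
type CompletionCondition struct {
	JSONPath string `json:"jsonpath"`
	Value    string `json:"value"`
}

// BackupStage is one entry of spec.stages.
type BackupStage struct {
	Name                string               `json:"name"`
	Type                StageType            `json:"type"`
	TemplateRef         *TemplateRef         `json:"templateRef,omitempty"`
	CompletionCondition *CompletionCondition `json:"completionCondition,omitempty"`
	Timeout             string               `json:"timeout,omitempty"` // e.g. "30m"
}

// OpenStackBackupSpec mirrors the YAML example.
type OpenStackBackupSpec struct {
	Schedule    string        `json:"schedule,omitempty"`
	Retention   string        `json:"retention,omitempty"`
	TemplateRef *TemplateRef  `json:"templateRef,omitempty"`
	Stages      []BackupStage `json:"stages"`
}

// Validate runs checks a webhook or the reconciler might apply before
// starting any stage.
func (s *OpenStackBackupSpec) Validate() error {
	seen := map[string]bool{}
	for _, st := range s.Stages {
		if seen[st.Name] {
			return fmt.Errorf("duplicate stage name %q", st.Name)
		}
		seen[st.Name] = true
		if st.Timeout != "" {
			if _, err := time.ParseDuration(st.Timeout); err != nil {
				return fmt.Errorf("stage %q: invalid timeout: %w", st.Name, err)
			}
		}
		switch st.Type {
		case StageGaleraBackup:
			// built-in stage: no template fields required
		case StageTemplate:
			if st.TemplateRef == nil || st.TemplateRef.Key == "" {
				return fmt.Errorf("stage %q: Template stages need templateRef.key", st.Name)
			}
			if st.CompletionCondition == nil {
				return fmt.Errorf("stage %q: Template stages need completionCondition", st.Name)
			}
		default:
			return fmt.Errorf("stage %q: unknown type %q", st.Name, st.Type)
		}
	}
	return nil
}

func main() {
	raw := `{"stages":[
	  {"name":"galera-dumps","type":"GaleraBackup","timeout":"10m"},
	  {"name":"pvc-backup","type":"Template",
	   "templateRef":{"name":"openstack-backup-templates","key":"backup-pvcs"},
	   "completionCondition":{"jsonpath":"{.status.phase}","value":"Completed"},
	   "timeout":"30m"}]}`
	var spec OpenStackBackupSpec
	if err := json.Unmarshal([]byte(raw), &spec); err != nil {
		panic(err)
	}
	fmt.Println(len(spec.Stages), spec.Validate()) // 2 <nil>
}
```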
1201+
1202+ The Secret contains the tool-specific templates:
1203+
1204+ ```yaml
1205+ apiVersion: v1
1206+ kind: Secret
1207+ metadata:
1208+   name: openstack-backup-templates
1209+   namespace: openstack
1210+ stringData:
1211+   backup-pvcs: |
1212+     apiVersion: velero.io/v1
1213+     kind: Backup
1214+     metadata:
1215+       name: openstack-backup-pvcs-{{ .Timestamp }}
1216+       namespace: {{ .OADPNamespace }}
1217+     spec:
1218+       includedNamespaces:
1219+       - {{ .Namespace }}
1220+       labelSelector:
1221+         matchLabels:
1222+           backup.openstack.org/backup: "true"
1223+       snapshotMoveData: true
1224+       storageLocation: velero-1
1225+   backup-resources: |
1226+     apiVersion: velero.io/v1
1227+     kind: Backup
1228+     metadata:
1229+       name: openstack-backup-resources-{{ .Timestamp }}
1230+       namespace: {{ .OADPNamespace }}
1231+     spec:
1232+       includedNamespaces:
1233+       - {{ .Namespace }}
1234+       labelSelector:
1235+         matchLabels:
1236+           backup.openstack.org/restore: "true"
1237+       snapshotVolumes: false
1238+       storageLocation: velero-1
1239+ ```
1240+
1241+ #### OpenStackRestore Controller
1242+
1243+ Orchestrates the full restore sequence using the same template-based approach,
1244+ plus built-in stages for database restore, RabbitMQ credential restore, and
1245+ staged deployment lifecycle.
1246+
1247+ ```yaml
1248+ apiVersion: backup.openstack.org/v1beta1
1249+ kind: OpenStackRestore
1250+ metadata:
1251+   name: restore-20260320
1252+   namespace: openstack
1253+ spec:
1254+   backupTimestamp: "20260320-110200"
1255+   templateRef:
1256+     name: openstack-restore-templates # Secret with all restore stage templates
11511257  automatedDatabaseRestore: true
11521258  automatedRabbitMQRestore: true
11531259status:
11541260  phase: InProgress # Pending, InProgress, Completed, Failed
1155-   currentRestoreOrder: 20
1261+   currentStage: order-20-infra
11561262  conditions:
1157-   - type: Order00Complete
1263+   - type: Order00PvcsComplete
11581264    status: "True"
1159-   - type: Order10Complete
1265+   - type: Order10FoundationComplete
11601266    status: "True"
1161-   - type: Order20InProgress
1267+   - type: Order20InfraInProgress
11621268    status: "True"
11631269```
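The status examples imply a simple state machine. Stages run in order, each finished stage gets a `...Complete` condition, the running stage gets `...InProgress`, and `currentStage` points at it. A minimal sketch of that bookkeeping follows. The `Condition`/`Status` types are hypothetical stand-ins for `metav1.Condition`. The rule that derives condition types from kebab-case stage names is inferred from the examples, not taken from the design: `order-20-infra` becomes `Order20Infra`, and `galera-dumps` becomes `GaleraDumps`. The stage names `order-00-pvcs` and `order-10-foundation` are likewise assumed. A `Failed` phase is not modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a stripped-down stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string
}

// Status mirrors the status block of OpenStackBackup/OpenStackRestore.
type Status struct {
	Phase        string // Pending, InProgress, Completed (Failed not modeled)
	CurrentStage string
	Conditions   []Condition
}

// conditionPrefix turns a kebab-case stage name into PascalCase.
// Kubernetes-style names are lowercase ASCII, so byte slicing is safe.
func conditionPrefix(stage string) string {
	var b strings.Builder
	for _, part := range strings.Split(stage, "-") {
		if part == "" {
			continue
		}
		b.WriteString(strings.ToUpper(part[:1]) + part[1:])
	}
	return b.String()
}

// computeStatus derives the status from the ordered stage list and the
// number of stages that have already finished.
func computeStatus(stages []string, done int) Status {
	st := Status{Phase: "Pending"}
	for i, name := range stages {
		switch {
		case i < done:
			st.Conditions = append(st.Conditions, Condition{conditionPrefix(name) + "Complete", "True"})
		case i == done:
			st.Phase = "InProgress"
			st.CurrentStage = name
			st.Conditions = append(st.Conditions, Condition{conditionPrefix(name) + "InProgress", "True"})
		}
	}
	if len(stages) > 0 && done >= len(stages) {
		st.Phase = "Completed"
	}
	return st
}

func main() {
	stages := []string{"order-00-pvcs", "order-10-foundation", "order-20-infra"}
	st := computeStatus(stages, 2)
	fmt.Println(st.Phase, st.CurrentStage) // InProgress order-20-infra
	for _, c := range st.Conditions {
		fmt.Println(c.Type, c.Status)
	}
	// Order00PvcsComplete True
	// Order10FoundationComplete True
	// Order20InfraInProgress True
}
```

The same naming rule reproduces the backup example's `GaleraDumpsComplete` and `PvcBackupInProgress`, so both controllers could share this helper.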
11641270
1271+ #### Scheduled Backups
1272+
1273+ The `OpenStackBackup` CR supports an optional `schedule` field for recurring
1274+ backups. When set, the controller creates a CronJob that produces new
1275+ `OpenStackBackup` instances on schedule.
1276+
1277+ ```yaml
1278+ apiVersion: backup.openstack.org/v1beta1
1279+ kind: OpenStackBackup
1280+ metadata:
1281+   name: daily-backup
1282+   namespace: openstack
1283+ spec:
1284+   schedule: "0 2 * * *" # daily at 2am
1285+   retention: 720h # auto-cleanup backups older than 30 days
1286+   templateRef:
1287+     name: openstack-backup-templates
1288+   stages:
1289+   - name: galera-dumps
1290+     type: GaleraBackup
1291+     timeout: 10m
1292+   - name: pvc-backup
1293+     type: Template
1294+     templateRef:
1295+       name: openstack-backup-templates
1296+       key: backup-pvcs
1297+     completionCondition:
1298+       jsonpath: '{.status.phase}'
1299+       value: Completed
1300+     timeout: 30m
1301+   - name: resources-backup
1302+     type: Template
1303+     templateRef:
1304+       name: openstack-backup-templates
1305+       key: backup-resources
1306+     completionCondition:
1307+       jsonpath: '{.status.phase}'
1308+       value: Completed
1309+     timeout: 30m
1310+ ```
1311+
1312+ The flow:
1313+
1314+ 1. **Controller sees `schedule` field** → creates a CronJob
1315+ 2. **CronJob fires on schedule** → creates a new `OpenStackBackup` CR
1316+ (e.g., `daily-backup-20260320-020000`) without a `schedule` field
1317+ 3. **Controller sees new `OpenStackBackup` CR** → orchestrates the stages:
1318+    - Triggers Galera dump jobs (built-in `GaleraBackup` type)
1319+    - Renders backup templates from Secret → creates backup tool CRs
1320+      (e.g., Velero `Backup`) as `unstructured.Unstructured` objects
1321+    - Polls `completionCondition` on each created CR until done
1322+ 4. **Controller updates status** on the `OpenStackBackup` CR
1323+ 5. **Retention**: Controller garbage-collects `OpenStackBackup` CRs older
1324+ than `retention` period
1325+
1326+ This follows the same pattern as `GaleraBackup` (which also creates CronJobs
1327+ from a CR spec). Without a `schedule` field, the CR triggers a one-shot backup.
1328+
1329+ #### Design Principles
1330+
1331+ - **No backup tool imports**: Controller uses `unstructured.Unstructured` to
1332+ create CRs from templates — no Velero/OADP Go dependencies
1333+ - **Single Secret per workflow**: All stage templates in one Secret, referenced
1334+ by key (`templateRef.name` + `templateRef.key`)
1335+ - **Built-in stages**: `GaleraBackup` (trigger DB dump jobs), `GaleraRestore`
1336+ (create restore CRs, exec restore), `RabbitMQRestore` (credential restore)
1337+ are built-in since they use our own CRDs
1338+ - **Template variables**: Controller provides `.Namespace`, `.Timestamp`,
1339+ `.OADPNamespace`, `.BackupName`, etc. for template rendering
1340+ - **Configurable completion**: Each template stage has a `completionCondition`
1341+ (jsonpath + expected value) so the controller can poll any CR type
1342+
11651343## Benefits
11661344
11671345### Compared to Current Ansible Approach
@@ -1222,6 +1400,7 @@ This means:
122214001. **Restore Order Conflicts**: What if two CRDs have the same restore order?
12231401   - Currently: restored in parallel within the same Velero Restore CR (works fine for independent resources)
12241402
1225- 2. **Phase 4 Restore Controller**: Should the controller exec into pods for database restore or delegate to Jobs?
1403+ 2. **Phase 4 Controllers**: Should database restore exec into pods or delegate to Jobs?
12261404   - Current Ansible approach: execs into GaleraRestore pods
12271405   - Controller approach: could create Jobs or use the same exec pattern
1406+   - Template Secret: should a default Secret be created by the operator, or provided by the user?