
Commit 7348494

stuggi and claude committed
Add automated database restore to playbook for testing
Automate the Galera database restore step in the Ansible playbook to enable fully automated testing without manual intervention.

Changes:
- Add automated_db_restore parameter (default: true)
- Automatically find latest backup in each GaleraRestore pod
- Execute restore_galera command with latest timestamp
- Support both automated and manual modes
- Create helper script: restore-galera-latest.sh

Automated mode (default):
- Finds latest backup file in /backup/data/
- Extracts timestamp from filename
- Executes restore automatically
- Continues with deployment when complete

Manual mode (with -e automated_db_restore=false):
- Prompts user to manually execute restore
- Provides instructions and helper script
- Waits for confirmation before continuing

This enables CI/CD pipelines to run end-to-end restore testing without manual intervention while preserving the option for manual control when needed.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
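
The "find latest backup, extract timestamp" steps described in the commit message can be sketched in isolation. This is a minimal illustration, not the contents of restore-galera-latest.sh (which is not shown on this page): the function name `latest_backup_timestamp` is hypothetical, and the sketch assumes file names follow the `<name>_backup_YYYY-MM-DD_HH-MM-SS.sql.gz` pattern seen in the documentation hunk, so that a plain byte-wise sort orders them chronologically.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads dump file names on stdin (one per line)
# and prints the timestamp of the newest non-grants dump.
latest_backup_timestamp() {
  local latest
  # Drop the *_backup-grants_* files, then pick the last name in sort order.
  # Assumption: all names share one prefix, so sort order equals time order.
  latest=$(grep -v grants | LC_ALL=C sort | tail -n 1)
  # Fail when there is nothing to restore from.
  [ -n "${latest}" ] || return 1
  # Keep only the part between "_backup_" and ".sql.gz".
  basename "${latest}" | sed -E 's/.*_backup_(.*)\.sql\.gz/\1/'
}
```

For example, piping the output of `ls -1 /backup/data/` into the function would print a value such as `2026-02-26_10-12-59`, which can then be substituted into the `/backup/data/*_<timestamp>.sql.gz` pattern passed to `restore_galera`.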
1 parent 07b6323 commit 7348494

3 files changed

Lines changed: 323 additions & 29 deletions


docs/dev/backup-restore-ctlplane.md

Lines changed: 24 additions & 8 deletions
@@ -452,6 +452,7 @@ The playbook follows the correct restore order and prompts for confirmation at c
452452
- `openstack_namespace`: Target namespace (default: `openstack`)
453453
- `backup_file`: Path to backup file (required)
454454
- `skip_rabbitmq_restore`: Skip RabbitMQ user restoration (default: `false`)
455+
- `automated_db_restore`: Automatically restore databases from latest backup (default: `true`)
455456

456457
---
457458

@@ -1568,26 +1569,41 @@ Alternatively, use the provided helper script:
15681569
../scripts/create-galerarestore-crs.sh .
15691570
```
15701571

1571-
**2. Find matching dump files:**
1572+
**2. Find matching dump files and execute restore:**
15721573

15731574
⚠️ **LIMITATION**: Dump file timestamps may not exactly match the control plane backup timestamp. The dump is created when the Galera backup job runs, which is slightly later than when the job was triggered.
15741575

1576+
**Automated Approach (Recommended for Testing/CI):**
1577+
1578+
Use the helper script to automatically find and restore from the latest backup:
1579+
1580+
```bash
1581+
# Restore main galera instance (uses latest backup automatically)
1582+
../scripts/restore-galera-latest.sh openstackrestore
1583+
1584+
# Restore cell1 (uses latest backup automatically)
1585+
../scripts/restore-galera-latest.sh openstackrestorecell1
1586+
1587+
# For additional cells
1588+
../scripts/restore-galera-latest.sh openstackrestorecell2
1589+
```
1590+
1591+
**Manual Approach:**
1592+
1593+
If you need to restore from a specific timestamp (not the latest):
1594+
15751595
```bash
1576-
# List dump files in the restore pod for main instance
1596+
# Step 1: List dump files in the restore pod
15771597
oc exec -n openstack openstack-restore-openstackrestore -- ls -la /backup/data/
15781598

15791599
# Expected output:
15801600
# -rw-rw-r--. 1 mysql mysql 256515 Feb 26 10:13 openstack_backup_2026-02-26_10-12-59.sql.gz
15811601
# -rw-rw-r--. 1 mysql mysql 837 Feb 26 10:13 openstack_backup-grants_2026-02-26_10-12-59.sql.gz
15821602

1583-
# Find the dump file with the closest timestamp to your backup
1603+
# Step 2: Find the dump file with the closest timestamp to your backup
15841604
# Example: If backup is from 2026-02-26_10-12-00, the dump might be 2026-02-26_10-12-59
1585-
```
1586-
1587-
**3. Execute restore:**
15881605

1589-
```bash
1590-
# Restore main galera instance (replace timestamp with your actual dump file timestamp)
1606+
# Step 3: Execute restore (replace timestamp with your actual dump file timestamp)
15911607
oc exec -n openstack openstack-restore-openstackrestore -- \
15921608
/var/lib/backup-scripts/restore_galera --yes /backup/data/*_2026-02-26_10-12-59.sql.gz
15931609
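
For the manual approach, the timestamp caveat above means the right dump has to be picked by comparison rather than by exact match. The sketch below shows one possible heuristic, assuming that the dump is always written at or shortly after the control plane backup timestamp, so the first dump at or after that timestamp is the one wanted. The function name `first_dump_at_or_after` is hypothetical, and the comparison relies on the fixed-width `YYYY-MM-DD_HH-MM-SS` format sorting chronologically as plain text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads dump file names on stdin and prints the first
# dump timestamp that is not earlier than the given backup timestamp.
# Usage: ls -1 /backup/data/ | first_dump_at_or_after 2026-02-26_10-12-00
first_dump_at_or_after() {
  local target="$1"
  grep -v grants \
    | sed -E 's/.*_backup_(.*)\.sql\.gz/\1/' \
    | LC_ALL=C sort \
    | awk -v t="${target}" '
        # String comparison is enough because the format is zero-padded.
        $0 >= t { print; found = 1; exit }
        END { exit !found }
      '
}
```

The function exits non-zero when no dump is new enough, which is a cue to inspect the listing by hand instead of restoring an older dump by accident.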

docs/dev/playbooks/restore-openstack-ctlplane.yaml

Lines changed: 128 additions & 21 deletions
@@ -40,6 +40,7 @@
4040
# ansible-playbook restore-openstack-ctlplane.yaml -e backup_file=openstack-ctlplane-backup-20260119-120000.tar.gz
4141
# ansible-playbook restore-openstack-ctlplane.yaml -e backup_file=backup.tar.gz -e openstack_namespace=my-openstack
4242
# ansible-playbook restore-openstack-ctlplane.yaml -e backup_file=backup.tar.gz -e skip_rabbitmq_restore=true
43+
# ansible-playbook restore-openstack-ctlplane.yaml -e backup_file=backup.tar.gz -e automated_db_restore=false # Manual DB restore
4344

4445
- name: Restore OpenStack Control Plane
4546
hosts: localhost
@@ -52,6 +53,7 @@
5253
ansible.builtin.set_fact:
5354
openstack_namespace: "{{ openstack_namespace | default('openstack') }}"
5455
skip_rabbitmq_restore: "{{ skip_rabbitmq_restore | default(false) | bool }}"
56+
automated_db_restore: "{{ automated_db_restore | default(true) | bool }}"
5557

5658
- name: Check if backup_file is provided
5759
ansible.builtin.fail:
@@ -82,6 +84,7 @@
8284
- "Target Namespace: {{ openstack_namespace }}"
8385
- "Backup File: {{ backup_file_abs }}"
8486
- "Skip RabbitMQ Restore: {{ skip_rabbitmq_restore }}"
87+
- "Automated Database Restore: {{ automated_db_restore }}"
8588
- ""
8689
- "NOTE: This playbook assumes cleanup was already done."
8790
- "If not, run cleanup-openstack-ctlplane.yaml first."
@@ -996,43 +999,142 @@
996999
when: galerabackup_backup_file.stat.exists and galerabackup_names.stdout_lines | length > 0 and galerarestore_list.rc == 0
9971000

9981001
# Step 12b: Restore Galera/MariaDB
999-
- name: Print Step 12b header
1002+
- name: Print Step 12b header (automated mode)
10001003
ansible.builtin.debug:
10011004
msg:
10021005
- "----------------------------------------"
1003-
- "Step 12b: Restore Galera/MariaDB Database Contents"
1006+
- "Step 12b: Restore Galera/MariaDB Database Contents (Automated)"
10041007
- "----------------------------------------"
10051008
- ""
10061009
- "CRITICAL: Restore database contents while services are NOT running."
10071010
- "This is only possible because of the staged deployment pause."
10081011
- ""
10091012
- "GaleraRestore CRs have been automatically created."
1010-
- "For each Galera instance, you must now:"
1011-
- " 1. Find the matching dump file in the restore pod (timestamp may not match exactly)"
1012-
- " 2. Execute restore command in the pod"
1013+
- "Now executing automated database restore (uses latest backup from each instance)."
10131014
- ""
1014-
- "Example for main galera instance:"
1015-
- " # List dump files to find closest timestamp to backup"
1016-
- " oc exec -n {{ openstack_namespace }} openstack-restore-openstackrestore -- ls -la /backup/data/"
1015+
- "To disable automation, run with: -e automated_db_restore=false"
1016+
when: automated_db_restore
1017+
1018+
- name: Print Step 12b header (manual mode)
1019+
ansible.builtin.debug:
1020+
msg:
1021+
- "----------------------------------------"
1022+
- "Step 12b: Restore Galera/MariaDB Database Contents (Manual)"
1023+
- "----------------------------------------"
1024+
- ""
1025+
- "CRITICAL: Restore database contents while services are NOT running."
1026+
- "This is only possible because of the staged deployment pause."
1027+
- ""
1028+
- "GaleraRestore CRs have been automatically created."
1029+
- "For each Galera instance, you must now execute the database restore."
1030+
- ""
1031+
- "AUTOMATED APPROACH:"
1032+
- " Use the helper script to automatically restore from latest backup:"
1033+
- " ../scripts/restore-galera-latest.sh openstackrestore"
1034+
- " ../scripts/restore-galera-latest.sh openstackrestorecell1"
1035+
- ""
1036+
- "MANUAL APPROACH:"
1037+
- " If you need to restore from a specific timestamp (not the latest):"
1038+
- ""
1039+
- " 1. List dump files to find the closest timestamp to backup:"
1040+
- " oc exec -n {{ openstack_namespace }} openstack-restore-openstackrestore -- ls -la /backup/data/"
10171041
- ""
1018-
- " # Restore using the matching timestamp file"
1019-
- " oc exec -n {{ openstack_namespace }} openstack-restore-openstackrestore -- \\"
1020-
- " /var/lib/backup-scripts/restore_galera --yes /backup/data/*_YYYY-MM-DD_HH-MM-SS.sql.gz"
1042+
- " 2. Execute restore using the matching timestamp file:"
1043+
- " oc exec -n {{ openstack_namespace }} openstack-restore-openstackrestore -- \\"
1044+
- " /var/lib/backup-scripts/restore_galera --yes /backup/data/*_YYYY-MM-DD_HH-MM-SS.sql.gz"
10211045
- ""
1022-
- "Repeat for cell1 and any additional galeras."
1046+
- " 3. Repeat for cell1 and any additional galeras."
10231047
- ""
10241048
- "LIMITATION: Dump file timestamps may not exactly match the control plane backup timestamp."
1025-
- "You must manually find the dump file with the closest timestamp."
1026-
- "Future enhancement: See docs/dev/README.md#galera-backup-timestamp-tracking"
1049+
- "The dump is created when the backup job runs, slightly later than when triggered."
1050+
when: not automated_db_restore
1051+
1052+
- name: Automated database restore
1053+
ansible.builtin.shell: |
1054+
set -e
1055+
RESTORE_NAME="{{ item }}"
1056+
POD_NAME="openstack-restore-${RESTORE_NAME}"
1057+
1058+
echo "Processing restore: ${RESTORE_NAME}"
1059+
echo "Pod name: ${POD_NAME}"
1060+
1061+
# Check if pod exists and is running
1062+
if ! oc get pod "${POD_NAME}" -n {{ openstack_namespace }} &>/dev/null; then
1063+
echo "ERROR: Restore pod not found: ${POD_NAME}"
1064+
exit 1
1065+
fi
1066+
1067+
POD_PHASE=$(oc get pod "${POD_NAME}" -n {{ openstack_namespace }} -o jsonpath='{.status.phase}')
1068+
if [ "$POD_PHASE" != "Running" ]; then
1069+
echo "ERROR: Restore pod is not running (phase: ${POD_PHASE})"
1070+
exit 1
1071+
fi
1072+
1073+
echo "Pod is running, finding latest backup..."
1074+
1075+
# List backup files (excluding grants)
1076+
BACKUP_FILES=$(oc exec -n {{ openstack_namespace }} "${POD_NAME}" -- sh -c 'ls -1 /backup/data/*_backup_*.sql.gz' 2>/dev/null | grep -v grants || true)
10271077
1028-
- name: Confirm database restore completion
1078+
if [ -z "${BACKUP_FILES}" ]; then
1079+
echo "ERROR: No backup files found in /backup/data/"
1080+
exit 1
1081+
fi
1082+
1083+
# Get latest backup
1084+
LATEST_BACKUP=$(echo "${BACKUP_FILES}" | sort | tail -1)
1085+
echo "Latest backup: ${LATEST_BACKUP}"
1086+
1087+
# Extract timestamp from filename
1088+
TIMESTAMP=$(basename "${LATEST_BACKUP}" | sed -E 's/.*_backup_(.*)\.sql\.gz/\1/')
1089+
1090+
if [ -z "${TIMESTAMP}" ]; then
1091+
echo "ERROR: Could not extract timestamp from filename: ${LATEST_BACKUP}"
1092+
exit 1
1093+
fi
1094+
1095+
echo "Extracted timestamp: ${TIMESTAMP}"
1096+
1097+
# Construct restore pattern
1098+
RESTORE_PATTERN="/backup/data/*_${TIMESTAMP}.sql.gz"
1099+
echo "Restore pattern: ${RESTORE_PATTERN}"
1100+
1101+
# Verify files exist
1102+
MATCHED_FILES=$(oc exec -n {{ openstack_namespace }} "${POD_NAME}" -- sh -c "ls -1 ${RESTORE_PATTERN}" 2>/dev/null || true)
1103+
1104+
if [ -z "${MATCHED_FILES}" ]; then
1105+
echo "ERROR: No files match pattern: ${RESTORE_PATTERN}"
1106+
exit 1
1107+
fi
1108+
1109+
FILE_COUNT=$(echo "${MATCHED_FILES}" | wc -l)
1110+
echo "Found ${FILE_COUNT} file(s) matching pattern"
1111+
1112+
# Execute restore
1113+
echo "Executing database restore..."
1114+
oc exec -n {{ openstack_namespace }} "${POD_NAME}" -- \
1115+
/var/lib/backup-scripts/restore_galera --yes "${RESTORE_PATTERN}"
1116+
1117+
echo "✓ Database restore completed for: ${RESTORE_NAME}"
1118+
args:
1119+
executable: /bin/bash
1120+
loop: "{{ galerabackup_names.stdout_lines }}"
1121+
changed_when: true
1122+
when: automated_db_restore and galerabackup_backup_file.stat.exists and galerabackup_names.stdout_lines | length > 0
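
Review note on glob handling in the task above: `oc exec` runs the given command directly in the container, not through a shell, so a pattern such as `/backup/data/*_<timestamp>.sql.gz` is expanded only if some shell (or the program itself) does the expansion. The following local demonstration uses a temporary directory as a stand-in for the pod's `/backup/data/`; the directory and variable names are illustrative only. Whether `restore_galera` expands the pattern internally is not visible in this diff, so that part is left as an open assumption.

```shell
#!/usr/bin/env bash
# Local stand-in for the pod's /backup/data directory (illustrative only).
demo_dir=$(mktemp -d)
touch "${demo_dir}/openstack_backup_2026-02-26_10-12-59.sql.gz"
pattern="${demo_dir}/*_2026-02-26_10-12-59.sql.gz"

# Quoted pattern: ls receives the literal string with the asterisk and
# finds no file by that name, so the result is empty.
literal_result=$(ls -1 "${pattern}" 2>/dev/null || true)

# Pattern handed to a shell: sh expands the glob before ls runs.
expanded_result=$(sh -c "ls -1 ${pattern}" 2>/dev/null || true)

echo "literal:  '${literal_result}'"
echo "expanded: '${expanded_result}'"
rm -rf "${demo_dir}"
```

The same distinction applies to any `ls` call made through `oc exec`: wrapping the command in `sh -c` inside the pod is one way to get the expansion, provided the container image ships a shell.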
1123+
1124+
- name: Print Step 12b completion (automated)
1125+
ansible.builtin.debug:
1126+
msg: "✓ All database restores completed ({{ galerabackup_names.stdout_lines | length }} instance(s))"
1127+
when: automated_db_restore and galerabackup_backup_file.stat.exists and galerabackup_names.stdout_lines | length > 0
1128+
1129+
- name: Confirm database restore completion (manual mode)
10291130
ansible.builtin.pause:
10301131
prompt: |
10311132
10321133
Have you completed database restore? (yes/no)
10331134
register: db_restore_confirm
1135+
when: not automated_db_restore and galerabackup_backup_file.stat.exists and galerabackup_names.stdout_lines | length > 0
10341136

1035-
- name: Warn about missing database restore
1137+
- name: Warn about missing database restore (manual mode)
10361138
ansible.builtin.pause:
10371139
prompt: |
10381140
@@ -1041,19 +1143,24 @@
10411143
10421144
Continue anyway without database restore? (yes/no)
10431145
register: skip_db_confirm
1044-
when: db_restore_confirm.user_input != "yes"
1146+
when: not automated_db_restore and galerabackup_backup_file.stat.exists and galerabackup_names.stdout_lines | length > 0 and db_restore_confirm.user_input != "yes"
10451147

1046-
- name: Fail if database restore not completed
1148+
- name: Fail if database restore not completed (manual mode)
10471149
ansible.builtin.fail:
10481150
msg: |
10491151
Aborting. Restore databases and then resume with:
10501152
oc annotate openstackcontrolplane {{ ctlplane_name }} -n {{ openstack_namespace }} core.openstack.org/deployment-stage-
1051-
when: db_restore_confirm.user_input != "yes" and skip_db_confirm.user_input != "yes"
1153+
when: not automated_db_restore and galerabackup_backup_file.stat.exists and galerabackup_names.stdout_lines | length > 0 and db_restore_confirm.user_input != "yes" and skip_db_confirm.user_input != "yes"
10521154

1053-
- name: Print Step 12b completion
1155+
- name: Print Step 12b completion (manual)
10541156
ansible.builtin.debug:
10551157
msg: "✓ Database restore completed"
1056-
when: db_restore_confirm.user_input == "yes"
1158+
when: not automated_db_restore and galerabackup_backup_file.stat.exists and galerabackup_names.stdout_lines | length > 0 and db_restore_confirm.user_input == "yes"
1159+
1160+
- name: Print Step 12b skip message
1161+
ansible.builtin.debug:
1162+
msg: "No GaleraBackup CRs found, skipping database restore..."
1163+
when: not galerabackup_backup_file.stat.exists or galerabackup_names.stdout_lines | length == 0
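
The `when:` conditions across the Step 12b tasks amount to a small decision table. As a reading aid, the sketch below restates that branching as a shell function; `step12b_outcome` and its outcome labels are hypothetical names, and `has_backups` stands for the combined check `galerabackup_backup_file.stat.exists and galerabackup_names.stdout_lines | length > 0`.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the Step 12b branching.
# Arguments: automated (true/false), has_backups (true/false),
#            confirm (answer to the first prompt), skip (answer to the second).
step12b_outcome() {
  local automated="$1" has_backups="$2" confirm="${3:-}" skip="${4:-}"
  if [ "${has_backups}" != "true" ]; then
    echo "skipped-no-backups"          # "Print Step 12b skip message"
  elif [ "${automated}" = "true" ]; then
    echo "restored-automatically"      # shell task plus automated completion
  elif [ "${confirm}" = "yes" ]; then
    echo "restored-manually"           # manual completion message
  elif [ "${skip}" = "yes" ]; then
    echo "continued-without-restore"   # user chose to proceed anyway
  else
    echo "aborted"                     # "Fail if database restore not completed"
    return 1
  fi
}
```

Calling `step12b_outcome false true no yes` prints `continued-without-restore`, which corresponds to answering "no" to the first prompt and "yes" to the second.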
10571164

10581165
# Step 13: Restore OVN Database Contents
10591166
- name: Print Step 13 header
