| 1 | +# RDS/Aurora Refresh for DBLab |
| 2 | + |
| 3 | +Performs a full DBLab refresh from RDS/Aurora snapshots (logical mode). |
| 4 | + |
| 5 | +## Why? |
| 6 | + |
| 7 | +DBLab logical mode runs `pg_dump` against your database. On large databases, this: |
| 8 | +- **Holds xmin horizon for hours** → bloat accumulation |
| 9 | +- **Creates load on production** |
| 10 | +- **Requires direct network access** to production |
| 11 | + |
| 12 | +This tool dumps from a **temporary RDS clone** instead. Production is never touched. |
| 13 | + |
| 14 | +``` |
| 15 | +Production ──RDS snapshot──► RDS Snapshot ──restore──► RDS Clone ──pg_dump──► DBLab |
| 16 | +                             (automated)               (temporary) |
| 17 | +``` |
| 18 | + |
| 19 | +## Quick Start |
| 20 | + |
| 21 | +```bash |
| 22 | +# 1. Configure |
| 23 | +cat > config.yaml << 'EOF' |
| 24 | +source: |
| 25 | + type: rds # or "aurora-cluster" |
| 26 | + identifier: my-prod-db |
| 27 | + dbName: postgres |
| 28 | + username: postgres |
| 29 | + password: ${DB_PASSWORD} |
| 30 | + |
| 31 | +clone: |
| 32 | + instanceClass: db.t3.medium |
| 33 | + securityGroups: [sg-xxx] # must allow DBLab inbound |
| 34 | + |
| 35 | +dblab: |
| 36 | + apiEndpoint: https://dblab:2345 |
| 37 | + token: ${DBLAB_TOKEN} |
| 38 | + |
| 39 | +aws: |
| 40 | + region: us-east-1 |
| 41 | +EOF |
| 42 | + |
| 43 | +# 2. Test |
| 44 | +docker run --rm \ |
| 45 | + -v "$PWD/config.yaml:/config.yaml" \ |
| 46 | + -e DB_PASSWORD -e DBLAB_TOKEN -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \ |
| 47 | + postgresai/rds-refresh -config /config.yaml -dry-run |
| 48 | + |
| 49 | +# 3. Run |
| 50 | +docker run --rm \ |
| 51 | + -v "$PWD/config.yaml:/config.yaml" \ |
| 52 | + -e DB_PASSWORD -e DBLAB_TOKEN -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \ |
| 53 | + postgresai/rds-refresh -config /config.yaml |
| 54 | +``` |
| 55 | + |
| 56 | +## Configuration |
| 57 | + |
| 58 | +| Field | Required | Description | |
| 59 | +|-------|----------|-------------| |
| 60 | +| `source.type` | ✓ | `rds` or `aurora-cluster` | |
| 61 | +| `source.identifier` | ✓ | RDS/Aurora identifier | |
| 62 | +| `source.dbName` | ✓ | Database name | |
| 63 | +| `source.username` | ✓ | Database user | |
| 64 | +| `source.password` | ✓ | Password (use `${ENV_VAR}`) | |
| 65 | +| `clone.instanceClass` | ✓ | RDS clone instance type | |
| 66 | +| `clone.securityGroups` | | SGs allowing DBLab access | |
| 67 | +| `clone.subnetGroup` | | DB subnet group | |
| 68 | +| `clone.maxAge` | | Max age before clone is stale (default: 48h) | |
| 69 | +| `dblab.apiEndpoint` | ✓ | DBLab API URL | |
| 70 | +| `dblab.token` | ✓ | DBLab verification token | |
| 71 | +| `dblab.timeout` | | Max refresh wait (default: 4h) | |
| 72 | +| `aws.region` | ✓ | AWS region | |
| 73 | + |
| 74 | +Full example: [config.example.rds_refresh.yaml](../../configs/config.example.rds_refresh.yaml) |
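
In the Quick Start, the heredoc delimiter is quoted, so the shell leaves `${DB_PASSWORD}` and `${DBLAB_TOKEN}` in the file as literal placeholders; the tool presumably substitutes them from the environment when it loads the config. The following is a minimal sketch of that kind of substitution, assuming plain `${NAME}` syntax and that an unset variable should be treated as an error. `expand_env` is a hypothetical helper for illustration, not part of rds-refresh.

```python
import os
import re

# Matches ${NAME} placeholders; NAME follows the usual env-var naming rules.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text, env=None):
    """Replace each ${NAME} in text with its value from env (default: os.environ).

    Raises KeyError for an unset variable so that a missing secret fails
    loudly instead of silently becoming an empty password.
    """
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    sample = "password: ${DB_PASSWORD}\ntoken: ${DBLAB_TOKEN}"
    print(expand_env(sample, {"DB_PASSWORD": "s3cret", "DBLAB_TOKEN": "abc"}))
```

Failing on a missing variable is a design choice in this sketch; the real tool may behave differently.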
| 75 | + |
| 76 | +## Scheduling |
| 77 | + |
| 78 | +```bash |
| 79 | +# Cron (weekly, Sunday 2 AM). Keep the entry on one line: crontab has no line continuation. |
| 80 | +0 2 * * 0 docker run --rm -v /etc/dblab/config.yaml:/config.yaml --env-file /etc/dblab/env postgresai/rds-refresh -config /config.yaml |
| 82 | +``` |
| 83 | + |
| 84 | +<details> |
| 85 | +<summary>Kubernetes CronJob</summary> |
| 86 | + |
| 87 | +```yaml |
| 88 | +apiVersion: batch/v1 |
| 89 | +kind: CronJob |
| 90 | +metadata: |
| 91 | +  name: dblab-refresh |
| 92 | +spec: |
| 93 | +  schedule: "0 2 * * 0" |
| 94 | +  concurrencyPolicy: Forbid |
| 95 | +  jobTemplate: |
| 96 | +    spec: |
| 97 | +      template: |
| 98 | +        spec: |
| 99 | +          serviceAccountName: dblab-refresh  # IRSA |
| 100 | +          containers: |
| 101 | +          - name: refresh |
| 102 | +            image: postgresai/rds-refresh |
| 103 | +            args: ["-config", "/config/config.yaml"] |
| 104 | +            envFrom: |
| 105 | +            - secretRef: |
| 106 | +                name: dblab-refresh-secrets |
| 107 | +            volumeMounts: |
| 108 | +            - name: config |
| 109 | +              mountPath: /config |
| 110 | +          volumes: |
| 111 | +          - name: config |
| 112 | +            configMap: |
| 113 | +              name: dblab-refresh-config |
| 114 | +          restartPolicy: Never |
| 115 | +``` |
| 116 | +</details> |
| 117 | + |
| 118 | +<details> |
| 119 | +<summary>ECS Scheduled Task</summary> |
| 120 | + |
| 121 | +```bash |
| 122 | +aws events put-rule --name dblab-refresh --schedule-expression "cron(0 2 ? * SUN *)" |
| 123 | +aws events put-targets --rule dblab-refresh --targets '[{ |
| 124 | + "Id": "1", |
| 125 | + "Arn": "arn:aws:ecs:REGION:ACCOUNT:cluster/CLUSTER", |
| 126 | + "RoleArn": "arn:aws:iam::ACCOUNT:role/ecsEventsRole", |
| 127 | + "EcsParameters": { |
| 128 | + "TaskDefinitionArn": "arn:aws:ecs:REGION:ACCOUNT:task-definition/dblab-refresh", |
| 129 | + "TaskCount": 1, "LaunchType": "FARGATE" |
| 130 | + } |
| 131 | +}]' |
| 132 | +``` |
| 133 | +</details> |
| 134 | + |
| 135 | +## IAM Policy |
| 136 | + |
| 137 | +```json |
| 138 | +{ |
| 139 | + "Version": "2012-10-17", |
| 140 | + "Statement": [ |
| 141 | + { |
| 142 | + "Effect": "Allow", |
| 143 | + "Action": ["rds:DescribeDBSnapshots", "rds:DescribeDBClusterSnapshots", |
| 144 | + "rds:DescribeDBInstances", "rds:DescribeDBClusters"], |
| 145 | + "Resource": "*" |
| 146 | + }, |
| 147 | + { |
| 148 | + "Effect": "Allow", |
| 149 | + "Action": ["rds:RestoreDBInstanceFromDBSnapshot", "rds:RestoreDBClusterFromSnapshot", |
| 150 | + "rds:CreateDBInstance", "rds:DeleteDBInstance", "rds:DeleteDBCluster", |
| 151 | + "rds:AddTagsToResource", "rds:ModifyDBInstance", "rds:ModifyDBCluster"], |
| 152 | + "Resource": ["arn:aws:rds:*:ACCOUNT:db:dblab-refresh-*", |
| 153 | + "arn:aws:rds:*:ACCOUNT:cluster:dblab-refresh-*", |
| 154 | + "arn:aws:rds:*:ACCOUNT:snapshot:*", |
| 155 | + "arn:aws:rds:*:ACCOUNT:cluster-snapshot:*", |
| 156 | + "arn:aws:rds:*:ACCOUNT:subgrp:*", "arn:aws:rds:*:ACCOUNT:pg:*"] |
| 157 | + } |
| 158 | + ] |
| 159 | +} |
| 160 | +``` |
| 161 | + |
| 162 | +## Network |
| 163 | + |
| 164 | +The RDS clone must be reachable from DBLab on port 5432, so DBLab and the clone should be in the same VPC or in peered VPCs. |
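
A quick way to verify reachability from the DBLab host is a plain TCP connection test. The sketch below uses only the Python standard library; `can_connect` is a hypothetical helper, and it confirms only that the port accepts connections, not that PostgreSQL authentication will succeed.

```python
import socket


def can_connect(host, port=5432, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Call it from the DBLab host with the clone's endpoint, for example `can_connect("<clone-endpoint>")`. A `False` result usually points at the security group or VPC routing.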
| 165 | + |
| 166 | +## DBLab Setup |
| 167 | + |
| 168 | +DBLab must run in **logical mode**. The tool updates the DBLab config via the API, so no SSH access is needed. |
| 169 | + |
| 170 | +```yaml |
| 171 | +retrieval: |
| 172 | +  refresh: |
| 173 | +    timetable: ""  # disable built-in scheduler |
| 174 | +  jobs: [logicalDump, logicalRestore, logicalSnapshot] |
| 175 | +  spec: |
| 176 | +    logicalDump: |
| 177 | +      options: |
| 178 | +        source: |
| 179 | +          connection: |
| 180 | +            host: placeholder  # updated by rds-refresh |
| 181 | +            port: 5432 |
| 183 | + |
| 184 | +## How It Works |
| 185 | + |
| 186 | +1. **Startup cleanup**: Check for orphaned clones from previous runs |
| 187 | +2. Check DBLab health |
| 188 | +3. Find latest RDS snapshot |
| 189 | +4. Create RDS clone from RDS snapshot (`dblab-refresh-YYYYMMDD-HHMMSS`) |
| 190 | +5. Wait for RDS clone (~15 min) |
| 191 | +6. Update DBLab config via API |
| 192 | +7. Trigger refresh, wait for completion |
| 193 | +8. Delete RDS clone (always, even on error) |
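
The steps above can be sketched as a single function whose `finally` block guarantees step 8. This is an illustration of the control flow only, not the tool's actual code: the `steps` object and all of its method names are hypothetical, and using UTC for the timestamp in the clone name is an assumption.

```python
from datetime import datetime, timezone


def clone_name(now):
    """Build a clone identifier in the dblab-refresh-YYYYMMDD-HHMMSS format."""
    return now.strftime("dblab-refresh-%Y%m%d-%H%M%S")


def run_refresh(steps, now=None):
    """Run the refresh steps in order; always delete the clone once created.

    `steps` is any object providing the (hypothetical) methods used below.
    """
    now = now or datetime.now(timezone.utc)  # UTC is an assumption
    steps.cleanup_orphans()                    # 1. startup cleanup
    steps.check_dblab_health()                 # 2. DBLab health check
    snapshot = steps.find_latest_snapshot()    # 3. latest RDS snapshot
    name = clone_name(now)
    steps.create_clone(snapshot, name)         # 4. restore snapshot into clone
    try:
        endpoint = steps.wait_for_clone(name)  # 5. wait until available
        steps.update_dblab_config(endpoint)    # 6. point DBLab at the clone
        steps.trigger_refresh_and_wait()       # 7. run the full refresh
    finally:
        steps.delete_clone(name)               # 8. runs on success and on error
    return name
```

Steps 1 to 4 sit outside the `try` because there is no clone to delete until the restore has been requested. Clones left behind by a crash are what the orphan protection is for.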
| 194 | + |
| 195 | +## Orphan Protection |
| 196 | + |
| 197 | +The tool has multiple layers of protection against orphaned clones: |
| 198 | + |
| 199 | +1. **Defer cleanup**: Clone is deleted when process exits normally |
| 200 | +2. **Signal handlers**: Catches SIGINT, SIGTERM, SIGHUP (SSH disconnect) |
| 201 | +3. **State file**: Tracks active clone in `./meta/rds-refresh.state` (same directory as DBLab meta files) |
| 202 | +4. **Tag scan**: Finds clones by `ManagedBy=dblab-rds-refresh` tag |
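
The tag scan combined with an age threshold can be sketched as a simple filter. The instance shape used here (`id`, `created`, `tags`) is an assumption made for the example and not the AWS API response format; only the `ManagedBy=dblab-rds-refresh` tag comes from the list above.

```python
from datetime import datetime, timedelta, timezone

MANAGED_BY_TAG = ("ManagedBy", "dblab-rds-refresh")


def find_stale_clones(instances, max_age, now=None):
    """Return identifiers of tagged clones older than max_age.

    Each instance is a dict with "id", "created" (timezone-aware datetime)
    and "tags" (dict); this shape is hypothetical.
    """
    now = now or datetime.now(timezone.utc)
    key, value = MANAGED_BY_TAG
    return [
        inst["id"]
        for inst in instances
        if inst["tags"].get(key) == value and now - inst["created"] > max_age
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    sample = [
        {"id": "dblab-refresh-20240107-020000",
         "created": now - timedelta(days=3),
         "tags": {"ManagedBy": "dblab-rds-refresh"}},
        {"id": "my-prod-db",
         "created": now - timedelta(days=300),
         "tags": {}},
    ]
    print(find_stale_clones(sample, timedelta(hours=48), now))
```

Filtering on the tag before the age check keeps untagged instances, such as production, out of the deletion list no matter how old they are.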
| 203 | + |
| 204 | +### Manual Cleanup |
| 205 | + |
| 206 | +```bash |
| 207 | +# Dry run - see what would be deleted |
| 208 | +rds-refresh cleanup -config config.yaml -dry-run |
| 209 | + |
| 210 | +# Delete stale clones older than 24 hours |
| 211 | +rds-refresh cleanup -config config.yaml -max-age 24h |
| 212 | + |
| 213 | +# Run in cron as safety net (weekly) |
| 214 | +0 3 * * 0 rds-refresh cleanup -config /etc/dblab/config.yaml -max-age 48h |
| 215 | +``` |
| 216 | + |
| 217 | +### Best Practice |
| 218 | + |
| 219 | +Run inside `screen` or `tmux` to prevent SSH disconnections from orphaning clones: |
| 220 | + |
| 221 | +```bash |
| 222 | +screen -S dblab-refresh |
| 223 | +rds-refresh -config config.yaml |
| 224 | +# Ctrl+A, D to detach |
| 225 | +``` |
| 226 | + |
| 227 | +## Troubleshooting |
| 228 | + |
| 229 | +| Error | Fix | |
| 230 | +|-------|-----| |
| 231 | +| No snapshots | Enable automated backups on RDS | |
| 232 | +| RDS clone not accessible | Check security group allows 5432 from DBLab | |
| 233 | +| Config update failed | Verify DBLab endpoint and token | |
| 234 | +| Timeout | Increase `dblab.timeout`, check DBLab logs | |
| 235 | + |
| 236 | +## Cost |
| 237 | + |
| 238 | +The RDS clone incurs cost only while it is running (~2-5 hours per refresh): |
| 239 | +- db.t3.medium: ~$0.35 |
| 240 | +- db.r5.large: ~$1.20 |
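
The arithmetic behind these figures is hourly rate times runtime. The sketch below uses rates back-calculated from the numbers above at a 5-hour runtime (about $0.07/h and $0.24/h); these are assumptions for illustration, not quoted AWS prices, and storage and data transfer are ignored.

```python
def clone_cost(hourly_rate_usd, hours):
    """On-demand instance cost for one refresh, ignoring storage and transfer."""
    return round(hourly_rate_usd * hours, 2)


# Assumed rates, back-calculated from the figures above at 5 hours of runtime.
ASSUMED_RATES = {"db.t3.medium": 0.07, "db.r5.large": 0.24}

if __name__ == "__main__":
    for instance_class, rate in ASSUMED_RATES.items():
        low, high = clone_cost(rate, 2), clone_cost(rate, 5)
        print(f"{instance_class}: ${low:.2f} to ${high:.2f} per refresh")
```

Check current pricing for your region and engine before relying on these numbers.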
| 241 | + |
| 242 | +## License |
| 243 | + |
| 244 | +Apache 2.0 — [Postgres.ai](https://postgres.ai) |