feat(ops): add automated PostgreSQL backup script with GCS upload #295

Open
snowfox1003 wants to merge 3 commits into cppalliance:develop from snowfox1003:feat/backup-database-script-gcs

Conversation

@snowfox1003 snowfox1003 commented Jun 18, 2026

Collaborator

Summary

Adds production VM automation for PostgreSQL backups: scripts/backup_database.sh runs pg_dump -Fc, uploads timestamped dumps (bdc-YYYYMMDD.dump) to a private GCS bucket, and optionally prunes objects older than BACKUP_RETENTION_DAYS (default 7; 0 disables pruning).
Credentials come from .env (DATABASE_URL / DB_*), with host.docker.internal rewritten to 127.0.0.1 so the script works when run from host cron. On success or failure, optional Discord/Slack notifications are sent through the existing DISCORD_WEBHOOK_URL / SLACK_WEBHOOK_URL; BACKUP_NOTIFICATIONS can be used to disable them.
Documentation covers bucket/IAM, env vars, cron/systemd setup, log directory creation, restore from GCS, and smoke-test steps. Cross-link added in GCP_Production_Checklist.md.
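
The host rewrite and the password redaction mentioned in this PR can be pictured with a small Python sketch. This is only an illustration under assumptions: the function names `rewrite_docker_host` and `redact_password` are hypothetical and are not taken from scripts/backup_database.sh, and the real script may handle these steps differently.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_docker_host(url: str) -> str:
    """Replace host.docker.internal with 127.0.0.1, keeping user, password, and port."""
    parts = urlsplit(url)
    if parts.hostname != "host.docker.internal":
        return url  # nothing to rewrite
    userinfo, _, hostport = parts.netloc.rpartition("@")
    _, colon, port = hostport.partition(":")
    prefix = userinfo + "@" if userinfo else ""
    return urlunsplit(parts._replace(netloc=prefix + "127.0.0.1" + colon + port))


def redact_password(url: str) -> str:
    """Return the URL with any password replaced by ***, for safe logging."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    userinfo, _, hostport = parts.netloc.rpartition("@")
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{user}:***@{hostport}"))


if __name__ == "__main__":
    # Hypothetical connection string, for illustration only.
    raw = "postgresql://bdc:s3cret@host.docker.internal:5432/bdc?sslmode=prefer"
    print(redact_password(rewrite_docker_host(raw)))
```

Keeping the rewrite and the redaction as separate steps means the value passed to pg_dump is never the redacted one; only the logged copy is masked.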

Apps touched

  • (none — ops script and docs only)

Test plan

  • python -m pytest (or scoped: python -m pytest <app>/tests)
  • uv run pyright (if typed code changed)
  • lint-imports (if imports or cross-app coupling changed)
  • App command smoke-tested (if collector/command changed):
bash scripts/backup_database.sh --help
bash scripts/backup_database.sh
echo $?   # expect 0
gcloud storage ls "gs://YOUR_BUCKET/boost-data-collector/staging/bdc-*.dump"
bash scripts/backup_database.sh --list-retention

Docs / coupling

  • cross-app-dependencies.md updated (if FKs or cross-app imports changed)
  • python scripts/generate_service_docs.py run (if services.py or core/protocols.py changed)
  • App README or docs/ updated (if behavior or ops changed)
    • docs/Deployment.md — automated backup runbook, cron, systemd, restore, smoke test
    • docs/GCP_Production_Checklist.md — cross-link
    • scripts/README.md, .env.example

Close #287

Summary by CodeRabbit

Release Notes

  • New Features

    • Added automated PostgreSQL backups to Google Cloud Storage, with timestamped dumps, configurable retention/pruning, and optional local dump cleanup.
    • Enabled optional success/failure notifications via configured webhooks.
  • Documentation

    • Added step-by-step deployment guidance for automated backups and restores, including cron/systemd timer examples and smoke tests.
    • Updated Google Cloud Storage copy instructions to prefer gcloud storage over gsutil.
    • Updated the production checklist and scripts guide to reference the new backup workflow.
  • Chores

    • Expanded .env.example with backup configuration templates.

@snowfox1003 snowfox1003 self-assigned this Jun 18, 2026

coderabbitai Bot commented Jun 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 09586bc8-67b3-43eb-bb68-470676d5c54f

📥 Commits

Reviewing files that changed from the base of the PR and between 9caedbc and 12103e3.

📒 Files selected for processing (1)
  • docs/Deployment.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/Deployment.md

📝 Walkthrough

Introduces scripts/backup_database.sh, a Bash script that loads env vars, runs pg_dump -Fc, uploads the timestamped dump to GCS, prunes objects older than the configured retention window, and optionally notifies Discord/Slack. Adds matching .env.example template entries and comprehensive deployment, restore, and scheduling documentation.
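
As a rough illustration of the UTC-dated naming scheme (bdc-YYYYMMDD.dump), the filename derivation could look like the sketch below. The helper name `dump_basename` and its default prefix are assumptions made for this example; the script itself derives the name in Bash.

```python
from datetime import datetime, timezone
from typing import Optional


def dump_basename(prefix: str = "bdc", now: Optional[datetime] = None) -> str:
    """Build a name such as bdc-20260618.dump from the current UTC date."""
    # Passing `now` explicitly keeps the function deterministic in tests.
    moment = now if now is not None else datetime.now(timezone.utc)
    # Convert to UTC so a run shortly after local midnight still uses the UTC date.
    utc_moment = moment.astimezone(timezone.utc)
    return f"{prefix}-{utc_moment:%Y%m%d}.dump"


if __name__ == "__main__":
    print(dump_basename())
```

Because the name carries only a date, two runs on the same UTC day would produce the same object name; whether the second upload overwrites the first depends on the bucket and upload settings, which this sketch does not model.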

Changes

Automated PostgreSQL Backup

| Layer | File(s) | Summary |
|---|---|---|
| Config template and script bootstrap/CLI | .env.example, scripts/backup_database.sh | Adds eight commented backup env var entries to .env.example. Implements the script's strict-mode header, timestamped log/error helpers, CLI flag parsing (--env-file, --dry-run, --list-retention, --help), env file loading, and UTC-dated dump filename derivation. |
| Prerequisite validation and database URI resolution | scripts/backup_database.sh | Validates required env vars (BACKUP_GCS_BUCKET, numeric BACKUP_RETENTION_DAYS), checks external command availability (python3, pg_dump, gcloud), rewrites Docker host references to 127.0.0.1, appends SSL-mode query parameters, creates secure staging directory with mode 0700, and logs the resolved URI with password redacted. |
| Notification infrastructure and error handling | scripts/backup_database.sh | Defines send_backup_notification using embedded Python to POST structured success/failure messages to Discord and/or Slack webhooks gated by BACKUP_NOTIFICATIONS; registers an EXIT trap that computes success/failure from exit code and BACKUP_LAST_ERROR/BACKUP_SUCCESS_DETAIL, triggers notifications, and rethrows the original exit code. |
| Retention pruning logic | scripts/backup_database.sh | Implements run_retention: lists GCS bucket objects by prefix via gcloud, parses YYYYMMDD date suffixes using embedded Python to compute cutoff, and either lists deletions, dry-run-reports them, or performs live deletion via gcloud storage rm with error handling. |
| Core backup execution flow | scripts/backup_database.sh | Exports retention-related variables, handles --list-retention early exit, resolves and logs DB target, ensures staging directory, runs pg_dump with custom format and owner/ACL stripping, uploads to GCS via gcloud storage cp, runs retention in live or dry-run mode, optionally deletes the local dump, and sets BACKUP_SUCCESS_DETAIL with dump size and retention note. |
| Deployment documentation, checklist, and scripts README | docs/Deployment.md, docs/GCP_Production_Checklist.md, scripts/README.md | Updates GCS copy commands to current bdc/...bdc-YYYYMMDD.dump naming convention. Adds comprehensive "Automated database backups" section covering host prerequisites, GCS/IAM setup, required/optional .env variables, cron and systemd timer scheduling, restore steps using pg_restore, smoke-test checklist, and script flag behaviors. Adds a checklist bullet documenting automated daily backups; adds backup_database.sh row to scripts table. |
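
The retention step summarized above (parse the YYYYMMDD suffix, compute a cutoff, select older objects) might be sketched as follows. This is a minimal sketch, not the script's embedded Python: the name `expired_objects` is hypothetical, and treating a dump dated exactly on the cutoff as still retained is an assumption about the boundary.

```python
import re
from datetime import date, datetime, timedelta, timezone
from typing import Iterable, List, Optional

# Matches names ending in -YYYYMMDD.dump, e.g. gs://bucket/bdc/bdc-20260618.dump
DUMP_NAME = re.compile(r"-(\d{8})\.dump$")


def expired_objects(object_names: Iterable[str], retention_days: int,
                    today: Optional[date] = None) -> List[str]:
    """Return the names whose date suffix falls before the retention cutoff."""
    if retention_days <= 0:
        return []  # 0 disables pruning, as stated in the PR summary
    current = today if today is not None else datetime.now(timezone.utc).date()
    cutoff = current - timedelta(days=retention_days)
    expired = []
    for name in object_names:
        match = DUMP_NAME.search(name)
        if match is None:
            continue  # skip objects that do not follow the naming scheme
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that are not a real calendar date
        if stamp < cutoff:  # assumption: the cutoff day itself is kept
            expired.append(name)
    return expired
```

Skipping names that do not parse, rather than deleting them, errs on the side of keeping unknown objects, which seems the safer default for a pruning step.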

Sequence Diagram(s)

sequenceDiagram
  participant Cron as cron / systemd timer
  participant Script as backup_database.sh
  participant PostgreSQL
  participant GCS as GCS bucket
  participant Webhook as Discord / Slack webhook

  Cron->>Script: invoke (--env-file .env)
  Script->>Script: load env, derive DUMP_BASENAME
  Script->>Script: validate BACKUP_GCS_BUCKET, BACKUP_RETENTION_DAYS
  Script->>PostgreSQL: pg_dump -Fc → local .dump file
  PostgreSQL-->>Script: compressed dump
  Script->>GCS: gcloud storage cp dump gs://BUCKET/bdc/
  GCS-->>Script: upload OK
  Script->>GCS: list objects (gcloud storage ls)
  GCS-->>Script: object list
  Script->>Script: compute cutoff via embedded Python
  Script->>GCS: delete objects older than BACKUP_RETENTION_DAYS
  Script->>Script: EXIT trap fires (success/failure)
  Script->>Webhook: POST notification (if BACKUP_NOTIFICATIONS=true)
  Webhook-->>Script: 200 OK
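
The final notification step in the diagram can be illustrated with a short sketch of how the two webhook bodies might be built from the script's exit code. The function name `build_payloads` and the message wording are hypothetical; the only facts relied on are that Discord webhooks accept a JSON body with a `content` field and Slack incoming webhooks accept one with a `text` field.

```python
import json
from typing import List, Optional, Tuple


def build_payloads(exit_code: int, detail: str,
                   discord_url: Optional[str] = None,
                   slack_url: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (url, JSON body) pairs for each configured webhook."""
    status = "succeeded" if exit_code == 0 else "FAILED"
    message = f"Database backup {status}: {detail}"  # wording is illustrative
    payloads = []
    if discord_url:
        # Discord webhooks read the message from the "content" field.
        payloads.append((discord_url, json.dumps({"content": message})))
    if slack_url:
        # Slack incoming webhooks read the message from the "text" field.
        payloads.append((slack_url, json.dumps({"text": message})))
    return payloads
```

Building the bodies separately from sending them keeps the part that depends on the exit code easy to check without any network access.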

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • cppalliance/boost-data-collector#203: Both PRs modify docs/Deployment.md's PostgreSQL dump and restore instructions, with #203 expanding pg_restore guidance that this PR extends with automated backup restore steps.

Suggested reviewers

  • wpak-ai
  • jonathanMLDev

Poem

🐇 Hop hop, the dumps are saved each day,
To GCS buckets far away.
Seven sunsets prune the old,
Discord gets the tale re-told.
With dry-run flags and cron in line,
This rabbit's backups work just fine! 🗄️✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding an automated PostgreSQL backup script with GCS upload functionality. |
| Description check | ✅ Passed | The PR description comprehensively covers the summary, apps touched, test plan with specific script commands, and docs updates with checkmarks indicating completion. |
| Linked Issues check | ✅ Passed | All acceptance criteria from issue #287 are met: backup script with pg_dump, timestamped files, GCS upload, 7-day retention, non-zero exit on failure, documentation, and smoke-test guidance. |
| Out of Scope Changes check | ✅ Passed | All changes directly support issue #287 objectives: backup script, documentation, environment variables, and script registry updates. No unrelated changes detected. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/Deployment.md`:
- Around line 274-290: The cron schedule in the bdc-db-backup example is
inconsistent with the recommended 03:00 UTC time mentioned in the preceding
text. The cron line showing the backup_database.sh script execution currently
uses 0 23 (23:00 UTC) but should be changed to 0 3 to match the 03:00 UTC
recommendation stated earlier in the document. Update the hour field in the cron
schedule from 23 to 3 to align the example with the documented best practice and
match the systemd timer configuration shown later.

In `@scripts/backup_database.sh`:
- Around line 109-113: The BACKUP_FILE_PREFIX is being used to derive both the
dump basename and construct paths, but when it contains nested segments (e.g.,
prod/bdc/), it creates subdirectories under BACKUP_STAGING_DIR that may not
exist, causing pg_dump to fail. Additionally, if BACKUP_GCS_PREFIX lacks a
trailing slash, concatenation at line 453 produces malformed keys. Fix this by
extracting only the basename (final segment without directory components) from
BACKUP_FILE_PREFIX when constructing DUMP_BASENAME, ensure BACKUP_STAGING_DIR
and any nested subdirectories are created before use, and normalize all path
concatenation operations (at lines 361-362 and 453) by adding trailing slashes
to prefix values before appending the dump filename to guarantee correct path
and key formation.
- Around line 365-368: The `|| true` operator inside the command substitution on
line 365 forces the assignment to always succeed, even when `gcloud storage ls`
fails, which prevents the error check on line 366 from ever triggering. Remove
the `|| true` from inside the command substitution (the one after the gcloud
storage ls call) so that real failures in listing GCS objects at the gcs_glob
path will cause the assignment to fail and properly enter the error handling
block, allowing retention outages to be detected instead of silently succeeding.
- Around line 347-351: The chmod command for setting permissions on the
BACKUP_STAGING_DIR is suppressing errors with 2>/dev/null and continuing on
failure with || true, which allows the script to proceed even if permissions
cannot be enforced on a directory containing sensitive backup data. Remove the
error suppression (2>/dev/null) and the failure handling (|| true) from the
chmod 700 "$BACKUP_STAGING_DIR" command so that permission hardening failures
become hard failures that stop script execution instead of being silently
ignored.
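
The path handling asked for in the first scripts/backup_database.sh finding (use only the final segment of the file prefix, and normalize the GCS prefix before appending the filename) could be sketched like this. Both helper names and the fallback value "bdc" are assumptions for illustration; the actual fix would be made in Bash.

```python
import posixpath


def basename_from_prefix(file_prefix: str, stamp: str) -> str:
    """Use only the last path segment of the prefix so no nested directory is implied."""
    leaf = posixpath.basename(file_prefix.rstrip("/")) or "bdc"  # fallback is an assumption
    return f"{leaf}-{stamp}.dump"


def gcs_key(gcs_prefix: str, basename: str) -> str:
    """Join prefix and filename with exactly one slash, whatever the prefix looks like."""
    prefix = gcs_prefix.strip("/")
    return f"{prefix}/{basename}" if prefix else basename


if __name__ == "__main__":
    name = basename_from_prefix("prod/bdc/", "20260618")
    print(gcs_key("boost-data-collector/staging", name))
```

Normalizing once, at the point where the prefix is read, avoids having to repeat the trailing-slash handling at every place a path or key is concatenated.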

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1ac07bd1-c296-46fb-8616-68532b51f997

📥 Commits

Reviewing files that changed from the base of the PR and between 74c981c and 41593c1.

📒 Files selected for processing (5)
  • .env.example
  • docs/Deployment.md
  • docs/GCP_Production_Checklist.md
  • scripts/README.md
  • scripts/backup_database.sh

Comment thread docs/Deployment.md Outdated
Comment thread scripts/backup_database.sh Outdated
Comment thread scripts/backup_database.sh
Comment thread scripts/backup_database.sh Outdated
Comment thread docs/Deployment.md

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Automated PostgreSQL Backup: Dump Script, GCS Upload, 7-Day Retention

2 participants