fix: validate pg_dumpall globals dump exit code#479
Open
alwynpan wants to merge 1 commit into
Open
Conversation
backup_pgsql_globals called check_exit_code without the "backup" arg, so a failed globals dump went undetected. Pass "backup" like other sites.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
backup_pgsql_globals()callscheck_exit_codewithout the mandatory first positional argument:check_exit_code "${backup_job_filename}"check_exit_codeswitches on$1and only acts on abackupormoveliteral (there is no default case):With
$1set to the filename, no branch matches and the body is skipped entirely — thepg_dumpall -gexit code captured on the previous line is never inspected.This is the only one of the 13 backup-phase
check_exit_codecall sites missing thebackupliteral; every other site (and themovecheck in this same function at the next line) passes it.Impact
A failed PostgreSQL globals dump (roles/grants/tablespaces) is completely silent:
master_exit_codestays0so the job reports success,cleanup_old_datastill prunes older good global backups (it only skips pruning whenmaster_exit_code == 1).Fix
Pass the
backupliteral so the dump's exit code is evaluated like every other backup call site:Test plan
bash -n install/assets/functions/10-db-backup— syntax OKpg_dumpall -gfailure (e.g. insufficient role privileges) and confirm the job now logs the error, fires a notification, setsmaster_exit_code=1, and skips cleanupValidation
Tested locally against a real PostgreSQL backend: a
postgres:16container with the actual (unmodified)backup_pgsql_globals()andcheck_exit_code()functions — extracted from both the pre-fix commit and this fix — driving a realpg_dumpall -g. Only peripheral orchestration helpers were stubbed;check_exit_codeand the failure path are the real code. A genuine dump failure was triggered via bad credentials (realpassword authentication failedfrompg_dumpall).master_exit_code01[FATAL] ... Backup failed completely)00In the pre-fix failure run the
backupbranch ofcheck_exit_codenever executes — control jumps straight from the dump to themovecheck — which is exactly the defect.