Skip to content

fix: scope S3/minio cleanup to the current job's backup files#478

Open
alwynpan wants to merge 1 commit into
nfrastack:mainfrom
alwynpan:fix/s3-cleanup-job-scoped-deletion
Open

fix: scope S3/minio cleanup to the current job's backup files#478
alwynpan wants to merge 1 commit into
nfrastack:mainfrom
alwynpan:fix/s3-cleanup-job-scoped-deletion

Conversation

@alwynpan
Copy link
Copy Markdown
Contributor

@alwynpan alwynpan commented May 30, 2026

Problem

cleanup_old_data() deletes every object older than the retention window under s3://${bucket}/${s3_path}/ with no filename filter:

run_as_user aws ... s3 ls s3://${backup_job_s3_bucket}/${backup_job_s3_path}/ ... | grep " DIR " -v | grep " PRE " -v | while read -r s3_file; do
    ...
    if [[ $s3_createdate -le $s3_olderthan ]] ; then
        s3_filename=$(echo $s3_file | awk {'print $4'})
        ...
        run_as_user aws ... s3 rm s3://${backup_job_s3_bucket}/${backup_job_s3_path}/${s3_filename} ...
    fi
done

When multiple backup jobs (different databases/hosts) share one bucket + path prefix, or unrelated objects live under it, one job's cleanup deletes the other jobs' backups and any unrelated files older than its own cleanup_time — cross-job data loss and incorrect retention.

The file/filesystem and blobxfer branches already scope deletion to the job's own files via -iname "${backup_job_filename_base}*" (and "${backup_job_global_base}*" for postgres globals). The S3/minio branch had no equivalent.

Fix

Apply the same job-scoping to the S3/minio branch: only remove an object when its name matches this job's filename base (case-insensitive prefix), guarded against an empty base so it can't fall back to matching everything. Non-matching objects are logged at debug level instead of deleted.

if { [ -n "${backup_job_filename_base}" ] && [[ "${s3_filename,,}" == "${backup_job_filename_base,,}"* ]] ; } || \
   { var_true "${_postgres_backup_globals}" && [ -n "${backup_job_global_base}" ] && [[ "${s3_filename,,}" == "${backup_job_global_base,,}"* ]] ; } ; then
    write_log debug "Deleting $s3_filename"
    run_as_user aws ... s3 rm ...
else
    write_log debug "Skipping $s3_filename (does not match this backup job's filename prefix)"
fi

This mirrors the matching behaviour of the existing filesystem/blobxfer branches, so retention semantics stay consistent across all three backup locations.

Test plan

  • bash -n install/assets/functions/10-db-backup — syntax OK
  • Point two backup jobs at the same MinIO/S3 bucket + path prefix and confirm each job's cleanup only removes its own aged backups, leaving the other job's files intact

Validation

Tested locally against a real S3 backend: a MinIO container with the actual (unmodified) cleanup_old_data() function exercised over the live aws s3 ls/rm API. The function was run inside a Linux container (amazon/aws-cli) so the GNU date/aws environment matches production, and the function bodies were extracted from both the pre-fix commit and this fix to confirm the real code paths were exercised. Bucket seeded with two jobs' backups (pgsql_db1_myhost_*, pgsql_db2_myhost_*), a postgres globals file (pgsql_globals_myhost_*), and an unrelated object; retention set so every object is age-eligible, isolating the new prefix filter.

Scenario Deleted Survived Result
Pre-fix, base pgsql_db1_myhost all 5 (incl. other job + unrelated) none bug reproduced
Fixed, base pgsql_db1_myhost, globals off pgsql_db1_myhost_* (2) db2, globals, unrelated
Fixed, base pgsql_db1_myhost, globals on pgsql_db1_myhost_* + globals db2, unrelated ✅ (matches filesystem-branch semantics)
Fixed, empty filename base none all ✅ (empty base does not fall back to match-all)

S3 cleanup deleted every object older than the retention window with no
filename filter, risking deletion of other jobs' backups sharing the
bucket/path. Scope deletion to this job's filename base, matching the
filesystem and blobxfer branches.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant