8 changes: 8 additions & 0 deletions percona/controller/pgcluster/backup.go
@@ -6,12 +6,14 @@ import (
	"github.com/pkg/errors"
	batchv1 "k8s.io/api/batch/v1"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	"github.com/percona/percona-postgresql-operator/v2/internal/controller/postgrescluster"
	"github.com/percona/percona-postgresql-operator/v2/internal/logging"
	"github.com/percona/percona-postgresql-operator/v2/internal/naming"
	"github.com/percona/percona-postgresql-operator/v2/percona/controller"
Expand Down Expand Up @@ -47,6 +49,12 @@ func (r *PGClusterReconciler) cleanupOutdatedBackups(ctx context.Context, cr *v2
		return nil
	}

	repoCondition := meta.FindStatusCondition(cr.Status.Conditions, postgrescluster.ConditionRepoHostReady)
	if repoCondition == nil || repoCondition.Status != metav1.ConditionTrue {
		log.Info("pgBackRest repo host not ready, skipping backup cleanup")
		return nil
	}
Contributor

Since this function does not use the repo host, I am confused about how this fixes the issue.

Contributor Author

Here is the original error in full:

ERROR   failed to cleanup outdated backups      {"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"cluster1","namespace":"pg2502"}, "namespace": "pg2502", "name": "cluster1", "reconcileID": "bf3f58e4-3d58-4112-b6ec-6af38241dcb7", "error": "get pgBackRest info: pgBackRest info command failed with code 99: other", "errorVerbose": "pgBackRest info command failed with code 99:

In this function we call

info, err = pgbackrest.GetInfo(ctx, readyPod, repo.Name)

which executes pgbackrest info --repo=repo1

If the repo host is not ready, this command fails with the error above.

Contributor

Does this assume that repo1 is a PVC stored on the repo host? What happens if repo1 is S3?

Contributor Author

Yes, we assume that repo1 is a PVC.
We can rewrite the check a bit so that it applies only when repo1 is a PVC and the repo host is not ready; let me know if that is OK for you.
If repo1 is S3, yes, you are right: with the current check we wait until the repo host is ready (which we don't need to do) and delete the backups on the next iteration.

Contributor Author

I've verified that RepoHost is only used for PVC (volume) repos. When using S3/Azure repos, pgBackRest connects directly to cloud storage.
Therefore, we should only wait for RepoHost to be ready when the repository type is volume.
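
A minimal sketch of what such a per-repo check could look like, not the actual change in this PR. To keep it self-contained, `Condition`, `Repo`, `findCondition`, `repoHostReady`, and `shouldSkipRepo` are hypothetical stand-ins for `metav1.Condition`, the operator's repo spec, and `meta.FindStatusCondition`; the condition type string is assumed for illustration.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// Repo is a simplified, hypothetical stand-in for a pgBackRest repo spec.
// Volume is non-nil only for PVC-backed repos; S3 and Azure mark cloud repos.
type Repo struct {
	Name   string
	Volume *struct{}
	S3     *struct{}
	Azure  *struct{}
}

// conditionRepoHostReady is an assumed condition type used for illustration.
const conditionRepoHostReady = "PGBackRestRepoHostReady"

// findCondition mimics the lookup that meta.FindStatusCondition performs:
// it returns the first condition with the given type, or nil if none exists.
func findCondition(conds []Condition, condType string) *Condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// repoHostReady reports whether the repo host condition exists and is "True".
func repoHostReady(conds []Condition) bool {
	c := findCondition(conds, conditionRepoHostReady)
	return c != nil && c.Status == "True"
}

// shouldSkipRepo defers cleanup only for volume (PVC) repos while the repo
// host is not ready. Cloud repos are never blocked by the repo host state.
func shouldSkipRepo(repo Repo, conds []Condition) bool {
	return repo.Volume != nil && !repoHostReady(conds)
}

func main() {
	notReady := []Condition{{Type: conditionRepoHostReady, Status: "False"}}
	repos := []Repo{
		{Name: "repo1", Volume: &struct{}{}},
		{Name: "repo2", S3: &struct{}{}},
	}
	for _, repo := range repos {
		fmt.Printf("%s skip=%v\n", repo.Name, shouldSkipRepo(repo, notReady))
	}
}
```

With the check moved inside the loop over repos, an S3 or Azure repo can still be cleaned up in the same reconcile pass while a PVC repo waits for the repo host.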


	for _, repo := range cr.Spec.Backups.PGBackRest.Repos {
		var info pgbackrest.InfoOutput
