K8SPG-1012 stanza timing fix #1575
Open
hors wants to merge 16 commits into crd-rename from
During a dataSource bootstrap restore, postgres promotes from TL1 to TL2 and immediately passes 00000002.history to archive_command. pgBackRest's async archiver silently drops the push (error 103) when archive.info does not yet exist — and postgres never retries. Without 00000002.history in the archive, pg_rewind on replicas fails with "could not find common ancestor" after any subsequent PITR restore.
Bumps [github.com/Azure/go-ntlmssp](https://github.com/Azure/go-ntlmssp) from 0.0.0-20221128193559-754e69321358 to 0.1.1. - [Release notes](https://github.com/Azure/go-ntlmssp/releases) - [Commits](https://github.com/Azure/go-ntlmssp/commits/v0.1.1) --- updated-dependencies: - dependency-name: github.com/Azure/go-ntlmssp dependency-version: 0.1.1 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) from 0.35.4 to 0.36.0. - [Commits](kubernetes/apimachinery@v0.35.4...v0.36.0) --- updated-dependencies: - dependency-name: k8s.io/apimachinery dependency-version: 0.36.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
egegunes reviewed Apr 30, 2026
Comment on lines +3030 to +3043
    // Re-push any timeline history files stranded by the async-archiver race:
    // postgres archives 00000002.history during bootstrap promotion before the
    // stanza exists; pgBackRest drops it silently (error 103) and postgres
    // never retries. Without it pg_rewind fails on replicas after PITR.
    log := logging.FromContext(ctx)
    historyOut, historyErr := pgbackrest.Executor(exec).ArchivePushHistoryFiles(ctx)
    if historyErr != nil {
        r.Recorder.Event(postgresCluster, corev1.EventTypeWarning,
            "ArchivePushHistoryFilesFailed", historyErr.Error())
        log.Error(historyErr, "timeline history file recovery failed",
            "pod", writableInstanceName, "output", historyOut)
    } else if historyOut != "" {
        log.Info("timeline history file recovery", "output", historyOut)
    }
Contributor
I understand we need to do this after the stanza is created, but I wonder if we can do it in the caller of this function, reconcilePGBackRest, after line 1642 and if configHashMismatch is false.
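One way to read this suggestion is as a guard in the caller: run the history-file re-push only once the stanza exists and the config hash matches. The sketch below is a standalone illustration, not the operator's code; `maybeRepushHistory`, the `stanzaCreated` flag, and the `historyPusher` type are hypothetical names, and only `configHashMismatch` comes from the comment above.

```go
package main

import (
	"errors"
	"fmt"
)

// historyPusher is a hypothetical stand-in for whatever performs the
// re-push (in the diff above, ArchivePushHistoryFiles).
type historyPusher func() (string, error)

// maybeRepushHistory is a hypothetical guard: it invokes push only when the
// stanza has been created and the config hash matches. It reports whether
// push ran, along with push's output and error.
func maybeRepushHistory(stanzaCreated, configHashMismatch bool, push historyPusher) (bool, string, error) {
	if !stanzaCreated || configHashMismatch {
		// Skip: the archive may not be ready to accept the history file.
		return false, "", nil
	}
	out, err := push()
	return true, out, err
}

func main() {
	push := func() (string, error) { return "pushed 00000002.history", nil }
	failing := func() (string, error) { return "", errors.New("archive-push failed") }

	ran, out, err := maybeRepushHistory(true, false, push)
	fmt.Println(ran, out, err)

	ran, out, err = maybeRepushHistory(true, true, push)
	fmt.Println(ran, out, err)

	ran, out, err = maybeRepushHistory(true, false, failing)
	fmt.Println(ran, out, err)
}
```

Keeping the guard separate from the push makes the skip conditions easy to test without a running cluster.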
Collaborator
commit: 783a445
CHANGE DESCRIPTION
Problem:
During a dataSource bootstrap restore, postgres promotes from TL1 to TL2
and immediately passes 00000002.history to archive_command. pgBackRest's
async archiver silently drops the push (error 103) when archive.info does
not yet exist and postgres never retries. Without 00000002.history in
the archive, pg_rewind on replicas fails with "could not find common
ancestor" after any subsequent PITR restore.
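To make the recovery idea concrete, the following sketch finds timeline history files that are present in a WAL directory listing but absent from the archive. It is an illustration under stated assumptions rather than the PR's implementation: `missingHistoryFiles` and its plain string-slice inputs are hypothetical, and the only fact relied on is that PostgreSQL names timeline history files as eight uppercase hex digits followed by `.history`.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// timelineHistoryName matches PostgreSQL timeline history file names,
// e.g. 00000002.history (eight uppercase hex digits plus ".history").
var timelineHistoryName = regexp.MustCompile(`^[0-9A-F]{8}\.history$`)

// missingHistoryFiles is a hypothetical helper. Given the file names found
// in the WAL directory and the names already in the archive, it returns the
// history files that still need to be pushed, in sorted order.
func missingHistoryFiles(walDir, archived []string) []string {
	inArchive := make(map[string]bool, len(archived))
	for _, name := range archived {
		inArchive[name] = true
	}
	missing := []string{}
	for _, name := range walDir {
		if timelineHistoryName.MatchString(name) && !inArchive[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	walDir := []string{"000000020000000000000003", "00000002.history", "archive_status"}
	archived := []string{"000000020000000000000003"}
	fmt.Println(missingHistoryFiles(walDir, archived)) // prints [00000002.history]
}
```

In the scenario described above, 00000002.history would show up as missing because the original push was dropped before archive.info existed.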
CHECKLIST
Jira
Needs Doc) and QA (Needs QA)?
Tests
Config/Logging/Testability