fix(backup): reject incremental backups whose read_ts has regressed#9707
Merged
matthewmcneely merged 2 commits intoMay 27, 2026
Merged
Conversation
When Zero state is wiped or rebuilt while an existing backup chain is still active, the cluster's timestamp counter restarts low. Subsequent incremental backups were silently accepted with read_ts values below earlier manifest entries. On restore, the reduce phase keeps only the highest-version KV per key, so newer postings written at the regressed (lower) timestamps were silently dropped — leaving index posting lists missing entries while data postings sometimes survived. Indexed queries like type(X) and eq(field, value) then returned incomplete results. Refuse the backup at request time when its ReadTs does not advance the chain past the latest manifest, and direct the operator to forceFull to start a new chain. The check is extracted into a pure helper for testability and skips the no-prior-manifest case so first-ever backups still succeed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
The two top-level tests in systest/backup/advanced-scenarios/acl-nonAcl
share one Docker volume at /data/backups/ but run against distinct
alpha/zero pairs (alpha1/zero1 vs alpha4/zero4). The first test leaves
a manifest; the second test's fresh Zero has a low ReadTs that does
not advance the chain, which the new backup-side guard now refuses —
the exact production scenario the guard exists to catch.
Add TakeFullBackup that sets forceFull: true so each test in this
package starts a clean chain. The 127-Namespace and deleted-namespace
tests stay on TakeBackup since their backups chain monotonically off
a single alpha.
Also fix TakeBackup's response decoding: it was doing nested
JsonGet(...).(string) calls that panic with "interface {} is nil"
whenever the mutation returned errors instead of data. Decode into a
typed struct, surface the GraphQL errors via t.Fatalf, and let the
remaining require.Equal report the missing code cleanly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
shiva-istari
approved these changes
May 26, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When Zero state is wiped or rebuilt while an existing backup chain is still active, the cluster's timestamp counter restarts low. Subsequent incremental backups were silently accepted with read_ts values below earlier manifest entries. On restore, the reduce phase keeps only the highest-version KV per key, so newer postings written at the regressed (lower) timestamps were silently dropped — leaving index posting lists missing entries while data postings sometimes survived. Indexed queries like type(X) and eq(field, value) then returned incomplete results.
Refuse the backup at request time when its ReadTs does not advance the chain past the latest manifest, and direct the operator to forceFull to start a new chain. The check is extracted into a pure helper for testability and skips the no-prior-manifest case so first-ever backups still succeed.
Fixes #9706
Checklist
Conventional Commits syntax, leading
with
fix:,feat:,chore:,ci:, etc.