fix(backup): reject incremental backups whose read_ts has regressed #9707

Merged
matthewmcneely merged 2 commits into main from matthewmcneely/fix-restore-ts-mismatch-silent-fail on May 27, 2026

Conversation

@matthewmcneely

Contributor

When Zero state is wiped or rebuilt while an existing backup chain is still active, the cluster's timestamp counter restarts low. Subsequent incremental backups were silently accepted with read_ts values below earlier manifest entries. On restore, the reduce phase keeps only the highest-version KV per key, so newer postings written at the regressed (lower) timestamps were silently dropped — leaving index posting lists missing entries while data postings sometimes survived. Indexed queries like type(X) and eq(field, value) then returned incomplete results.
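
To make the failure mode concrete, here is a minimal sketch of a "keep the highest version per key" reduce step. The `kv` type and `reduceHighestVersion` function are hypothetical simplifications for illustration, not the actual restore code. When a newer write carries a lower (regressed) version than an older one, the newer write loses.

```go
package main

// kv is a hypothetical, simplified stand-in for a versioned key-value entry
// as seen by the restore reduce phase.
type kv struct {
	Key     string
	Version uint64 // commit timestamp the entry was written at
	Value   string
}

// reduceHighestVersion keeps, for each key, only the entry with the highest
// version. It does not know in which order the entries were really written,
// so an entry written later but at a regressed timestamp is discarded.
func reduceHighestVersion(entries []kv) map[string]kv {
	out := make(map[string]kv)
	for _, e := range entries {
		if cur, ok := out[e.Key]; !ok || e.Version > cur.Version {
			out[e.Key] = e
		}
	}
	return out
}
```

For example, if an index key was backed up at version 500 before Zero was rebuilt and then rewritten at version 40 afterwards, the reduce step keeps the version-500 entry and the newer posting list is lost.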

Refuse the backup at request time when its ReadTs does not advance the chain past the latest manifest, and direct the operator to forceFull to start a new chain. The check is extracted into a pure helper for testability and skips the no-prior-manifest case so first-ever backups still succeed.
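
A pure helper of the kind described could look roughly like the sketch below. The names `Manifest` and `checkReadTsAdvances` are hypothetical, and two details are assumptions rather than facts from this PR: an equal `ReadTs` is treated as not advancing the chain, and `forceFull` bypasses the check.

```go
package main

import "fmt"

// Manifest is a hypothetical, trimmed-down view of the latest manifest entry.
type Manifest struct {
	ReadTs uint64
}

// checkReadTsAdvances returns an error when an incremental backup at readTs
// would not advance the chain past the latest manifest entry.
// A nil latest means there is no prior manifest, so first-ever backups pass.
// Assumption: forceFull starts a new chain and therefore skips the check.
func checkReadTsAdvances(latest *Manifest, readTs uint64, forceFull bool) error {
	if forceFull || latest == nil {
		return nil
	}
	// Assumption: equality is also rejected, since it does not advance the chain.
	if readTs <= latest.ReadTs {
		return fmt.Errorf(
			"backup read_ts %d does not advance past latest manifest read_ts %d; "+
				"retry with forceFull to start a new chain",
			readTs, latest.ReadTs)
	}
	return nil
}
```

Because the function takes plain values and touches no cluster state, it can be covered by table-driven unit tests without starting a cluster.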

Fixes #9706

Checklist

  • The PR title follows the Conventional Commits syntax, leading with fix:, feat:, chore:, ci:, etc.
  • Code compiles correctly and linting (via trunk) passes locally
  • Tests added for new functionality, or regression tests for bug fixes added as applicable

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@matthewmcneely matthewmcneely requested a review from a team as a code owner May 20, 2026 19:45
@github-actions (bot) added the area/testing, area/core, and go labels May 20, 2026

The two top-level tests in systest/backup/advanced-scenarios/acl-nonAcl
share one Docker volume at /data/backups/ but run against distinct
alpha/zero pairs (alpha1/zero1 vs alpha4/zero4). The first test leaves
a manifest; the second test's fresh Zero has a low ReadTs that does
not advance the chain, which the new backup-side guard now refuses —
the exact production scenario the guard exists to catch.

Add TakeFullBackup that sets forceFull: true so each test in this
package starts a clean chain. The 127-Namespace and deleted-namespace
tests stay on TakeBackup since their backups chain monotonically off
a single alpha.
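
As a rough illustration of the difference between the two helpers, the sketch below builds the admin backup mutation with and without `forceFull`. The function name `backupMutation` is hypothetical, and the selected response fields are an assumption about what the test helpers request.

```go
package main

import "fmt"

// backupMutation is a hypothetical helper that renders the GraphQL admin
// mutation for a backup to dest. With forceFull set, the backup is taken as
// a full backup, which starts a new chain instead of extending the old one.
// Note: %q uses Go string quoting, which is adequate for simple paths only.
func backupMutation(dest string, forceFull bool) string {
	return fmt.Sprintf(
		"mutation { backup(input: {destination: %q, forceFull: %t}) { response { code message } } }",
		dest, forceFull)
}
```

Under this sketch, a TakeFullBackup-style helper would call `backupMutation(dest, true)`, while a TakeBackup-style helper would pass `false`.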

Also fix TakeBackup's response decoding: it was doing nested
JsonGet(...).(string) calls that panic with "interface {} is nil"
whenever the mutation returned errors instead of data. Decode into a
typed struct, surface the GraphQL errors via t.Fatalf, and let the
remaining require.Equal report the missing code cleanly.
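
The decoding change can be sketched as follows. The struct layout and the name `decodeBackupResponse` are assumptions made for illustration. The point is that unmarshalling into a typed struct leaves missing fields at their zero values instead of panicking on a nil interface.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backupResponse is a hypothetical typed view of the GraphQL reply to the
// backup mutation. Absent or null fields simply stay at their zero values.
type backupResponse struct {
	Data struct {
		Backup struct {
			Response struct {
				Code    string `json:"code"`
				Message string `json:"message"`
			} `json:"response"`
		} `json:"backup"`
	} `json:"data"`
	Errors []struct {
		Message string `json:"message"`
	} `json:"errors"`
}

// decodeBackupResponse returns the response code, or an error that carries
// the first GraphQL error message when the mutation failed.
func decodeBackupResponse(body []byte) (string, error) {
	var resp backupResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decoding backup response: %w", err)
	}
	if len(resp.Errors) > 0 {
		return "", fmt.Errorf("backup mutation returned errors: %s", resp.Errors[0].Message)
	}
	return resp.Data.Backup.Response.Code, nil
}
```

In a test, the returned error would be passed to `t.Fatalf`, and the code would then be compared with `require.Equal`.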

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@matthewmcneely matthewmcneely merged commit 153d9ee into main May 27, 2026
26 checks passed
@matthewmcneely matthewmcneely deleted the matthewmcneely/fix-restore-ts-mismatch-silent-fail branch May 27, 2026 01:02

Development

Successfully merging this pull request may close these issues.

Restore silently drops index entries when backup read_ts regresses (Zero state loss)
