
backup: compaction silently corrupts BackupManifest_File EndKeys via slice aliasing #170895

@dt

Describe the problem

SSTSinkKeyWriter.maybeDoSizeFlush in pkg/backup/backupsink/sst_sink_key_writer.go assigns lastFile.Span.EndKey = newSpan.Key without cloning. newSpan.Key aliases the scratch buffer that the caller, compactSpanEntry (pkg/backup/compaction_processor.go:469-470), reuses across keys. Subsequent WriteKey calls overwrite scratch in place, silently mutating the EndKey already recorded on the just-shrunk manifest entry. By the time Flush serializes the entry, its EndKey holds whatever scratch contained when the SST was finalized, typically a key far past the intended split boundary.

The result is multiple BackupManifest_File entries in the same compacted layer pointing at the same physical SST with overlapping [start_key, end_key) spans. The two entries end at effectively the same key (their EndKeys often differ only by a /0 family suffix that arrives in scratch via the iteration sequence), and the later entry's StartKey falls strictly inside the earlier entry's span.

Sibling Reset paths in the same file clone correctly (lines 164, 181) with a comment at lines 178-180 calling out exactly this aliasing concern. maybeDoSizeFlush was missed.
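
The aliasing mechanism described above can be illustrated in isolation. This is a minimal sketch, not the CockroachDB code: the span type and recordAliased helper are hypothetical stand-ins, and the keys are pretty-printed strings rather than encoded keys.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a manifest entry's key span; it is not
// the real CockroachDB type.
type span struct {
	Key, EndKey []byte
}

// recordAliased stores boundary without copying it, so the span shares the
// caller's backing array.
func recordAliased(s *span, boundary []byte) {
	s.EndKey = boundary
}

func main() {
	scratch := make([]byte, 0, 64)

	// The caller builds the split key in its reusable scratch buffer.
	scratch = append(scratch[:0], "/Table/108/1/134"...)
	var entry span
	recordAliased(&entry, scratch)
	fmt.Printf("recorded:    %s\n", entry.EndKey)

	// The next key is written into the same backing array in place, so the
	// recorded EndKey changes even though entry was never touched again.
	scratch = append(scratch[:0], "/Table/108/1/268"...)
	fmt.Printf("after reuse: %s\n", entry.EndKey)
}
```

Because the scratch buffer has spare capacity, append never reallocates here, which is what keeps the recorded slice and the buffer pointing at the same memory.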

Counts observed in a production fixture

Number of physical files with overlapping BackupManifest_File entries (same path, overlapping [start_key, end_key)) per layer in the tpcc-5k fixture at gs://cockroach-fixtures-us-east1/roachtest/master/tpcc-5k/20260522-090958.790:

Layer kind                          Files with overlapping entries
Full backup (544 physical files)    0
1-min raw incremental layers        0
Compacted-inc layer (3-min span)    1
Compacted-inc layer (21-min span)   8
Compacted-inc layer (41-min span)   17

Example entry pair on one physical file:

1177601918797905921.sst  →  [/Table/108/1,                /Table/108/1/268/6/1362  ]
                         →  [/Table/108/1/134/6/1952/0,   /Table/108/1/268/6/1362/0]
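
One way such a count could be computed is sketched below. The fileEntry type and filesWithOverlaps function are hypothetical, not the tooling used for the table above. The example compares pretty-printed key strings bytewise, which happens to order this pair correctly; a real check would compare the encoded keys.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// fileEntry is a simplified, hypothetical stand-in for a BackupManifest_File
// entry.
type fileEntry struct {
	Path             string
	StartKey, EndKey []byte
}

// filesWithOverlaps returns the sorted paths that have at least two entries
// whose [StartKey, EndKey) spans overlap.
func filesWithOverlaps(entries []fileEntry) []string {
	byPath := make(map[string][]fileEntry)
	for _, e := range entries {
		byPath[e.Path] = append(byPath[e.Path], e)
	}
	var out []string
	for path, es := range byPath {
		sort.Slice(es, func(i, j int) bool {
			return bytes.Compare(es[i].StartKey, es[j].StartKey) < 0
		})
		// With entries ordered by StartKey, an entry overlaps an earlier one
		// exactly when it starts before the largest EndKey seen so far.
		maxEnd := es[0].EndKey
		for _, e := range es[1:] {
			if bytes.Compare(e.StartKey, maxEnd) < 0 {
				out = append(out, path)
				break
			}
			if bytes.Compare(e.EndKey, maxEnd) > 0 {
				maxEnd = e.EndKey
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	entries := []fileEntry{
		{"1177601918797905921.sst", []byte("/Table/108/1"), []byte("/Table/108/1/268/6/1362")},
		{"1177601918797905921.sst", []byte("/Table/108/1/134/6/1952/0"), []byte("/Table/108/1/268/6/1362/0")},
	}
	fmt.Println(filesWithOverlaps(entries))
}
```

Spans that merely touch (one entry's EndKey equals the next entry's StartKey) are not reported, since the end key is exclusive.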

Reproduction

A direct unit reproduction against SSTSinkKeyWriter:

  1. Open a key-writer sink, override fileSpanByteLimit to a small value (e.g. 8 KiB) so soft-flush fires on tiny data.
  2. Mirror compactSpanEntry's scratch-reuse pattern: a single []byte whose contents are rewritten in place for every WriteKey call.
  3. Write a key, then write enough data to push the accumulated entry size past the soft-flush threshold, then write the splitting key. After this, len(flushedFiles) == 2 and the shrunk entry's EndKey equals the split key.
  4. Write the next key (reusing scratch). The shrunk entry's EndKey mutates in place under it.

Assert that the shrunk entry's EndKey is byte-stable across step 4.
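
The steps above can be sketched against a toy model. This is an illustration only: toySink, toyFile, and reproduce are hypothetical and mimic just the size-triggered split, not the real SSTSinkKeyWriter API. An actual test would drive the real writer in the same order.

```go
package main

import "fmt"

// toyFile and toySink are hypothetical stand-ins for a manifest entry and the
// key-writer sink.
type toyFile struct {
	StartKey, EndKey []byte
	Size             int
}

type toySink struct {
	byteLimit    int // plays the role of fileSpanByteLimit
	flushedFiles []toyFile
}

// WriteKey accounts for key/value. Once the last entry has reached byteLimit,
// it shrinks that entry to end at key and opens a new one, deliberately
// WITHOUT cloning key, to mirror the bug under discussion.
func (s *toySink) WriteKey(key, value []byte) {
	n := len(s.flushedFiles)
	if n == 0 {
		s.flushedFiles = append(s.flushedFiles, toyFile{StartKey: append([]byte(nil), key...)})
		n = 1
	} else if s.flushedFiles[n-1].Size >= s.byteLimit {
		s.flushedFiles[n-1].EndKey = key // aliases the caller's scratch buffer
		s.flushedFiles = append(s.flushedFiles, toyFile{StartKey: append([]byte(nil), key...)})
		n++
	}
	s.flushedFiles[n-1].Size += len(key) + len(value)
}

// reproduce follows steps 1-4 and reports the shrunk entry's EndKey before and
// after step 4, plus the entry count after step 3.
func reproduce() (before, after string, files int) {
	sink := &toySink{byteLimit: 8 << 10} // step 1: small limit
	scratch := make([]byte, 0, 64)       // step 2: one reused buffer
	write := func(k string, valLen int) {
		scratch = append(scratch[:0], k...)
		sink.WriteKey(scratch, make([]byte, valLen))
	}
	write("/Table/108/1/100", 8<<10) // step 3: push the entry past the limit
	write("/Table/108/1/134", 1)     // step 3: the splitting key
	files = len(sink.flushedFiles)
	before = string(sink.flushedFiles[0].EndKey)
	write("/Table/108/1/268", 1) // step 4: reuse scratch
	after = string(sink.flushedFiles[0].EndKey)
	return before, after, files
}

func main() {
	before, after, files := reproduce()
	fmt.Println("entries after split:", files)
	fmt.Println("EndKey stable across step 4:", before == after)
}
```

In this model the stability check fails because the recorded EndKey follows the scratch buffer, which is the behavior the assertion is meant to catch.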

Code references

Fix

Clone the boundary key before assigning, e.g. lastFile.Span.EndKey = newSpan.Key.Clone(). This corrects the writer going forward but does not retroactively repair manifests already produced.
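
A minimal sketch of why cloning at the assignment is sufficient. Key, span, and shrinkTo here are local stand-ins written for this example (the real key type and its Clone method live in the CockroachDB tree); only the copy-before-store pattern is the point.

```go
package main

import "fmt"

// Key is a local stand-in for a byte-slice key type with a Clone method.
type Key []byte

// Clone returns a copy that shares no memory with k.
func (k Key) Clone() Key {
	if k == nil {
		return nil
	}
	c := make(Key, len(k))
	copy(c, k)
	return c
}

// span is a hypothetical stand-in for a manifest entry's key span.
type span struct {
	Key, EndKey Key
}

// shrinkTo records boundary as the new EndKey, copying it first so later
// writes to the caller's buffer cannot reach the recorded value.
func shrinkTo(s *span, boundary Key) {
	s.EndKey = boundary.Clone()
}

func main() {
	scratch := make(Key, 0, 64)

	scratch = append(scratch[:0], "/Table/108/1/134"...)
	var entry span
	shrinkTo(&entry, scratch)

	// Reusing the scratch buffer no longer affects the recorded boundary.
	scratch = append(scratch[:0], "/Table/108/1/268"...)
	fmt.Printf("scratch: %s\n", scratch)
	fmt.Printf("EndKey:  %s\n", entry.EndKey)
}
```

The copy costs one allocation per size-triggered split, which is infrequent compared with the per-key writes that reuse the buffer.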

Related: #170225.

Jira issue: CRDB-64224

Labels

A-disaster-recovery
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-agent: Filed by an AI agent; usually the result of a human/agent investigation session
P-0: Issues/test failures with a fix SLA of 2 weeks
T-disaster-recovery
branch-release-26.2: Used to mark GA and release blockers, technical advisories, and bugs for 26.2
target-release-26.3.0
