[CORE-6913] Cloud Storage Scrubber: Fix false positive for replaced segments#30062
[CORE-6913] Cloud Storage Scrubber: Fix false positive for replaced segments#30062
Conversation
Code ReviewSummary: Clean, well-motivated fix. The root cause analysis is sound — the old filter only checked offset-range existence, which is insufficient when a compacted reupload produces a replacement segment at the same range but with a different identity (encoded in the segment name via Logic correctness: Verified both
Test: The new One minor item: Left an inline comment — Overall this looks good to merge. |
There was a problem hiding this comment.
Pull request overview
Fixes a cloud storage scrubber false-positive where a missing-segment anomaly could be retained after a compacted reupload replaced the segment at the same offset range but with a different remote identity.
Changes:
- Update
partition_manifest::process_anomalies()to treat “same offset range but different remote segment name” as a false positive (filters the anomaly). - Apply the same replacement-aware filtering to both
missing_segmentsandsegment_metadata_anomalies. - Add a unit test reproducing the compacted reupload race and asserting the anomaly is filtered.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/v/cloud_storage/partition_manifest.cc |
Enhances anomaly filtering by comparing generated remote segment names (identity) in addition to offset range matching. |
src/v/cloud_storage/tests/anomalies_detector_test.cc |
Adds a test case covering the compacted reupload replacement scenario to prevent regressions. |
CI test resultstest results on build#82714
|
The scrub false-positive filter in process_anomalies() only checked whether a segment with the same offset range existed in the manifest. A compacted reupload produces a replacement segment at the same offset range but with a different name (different size). When GC deleted the old segment from cloud storage while the scrubber was still referencing a stale manifest, the filter kept the anomaly because the offset range still matched—even though the current segment at that range was a different (replacement) object that existed in cloud storage. Compare generate_remote_segment_name() for the manifest entry and the reported-missing segment so that replacements with the same offset range but different identity are correctly recognized as false positives. Fixes CORE-6913. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
5cfa54c to
00d8e4f
Compare
Code ReviewOverall: This is a clean, well-motivated fix. LGTM with one minor nit. SummaryThe fix correctly addresses the false-positive race between the scrubber and compacted segment reuploads. The old What I verified
TestsBoth new test cases are well-structured and clearly document the race scenario. Good use of Minor nitLeft one inline comment about the doc comment on |
Test for race between scrubber and compacted segment reupload: 1. Create manifest with 3 segments, remove the middle one from cloud storage so the detector reports it missing 2. Replace it in the manifest with a compacted version at the same offset range but different size_bytes 3. Assert generate_remote_segment_name() differs for the original vs compacted segment (v2/v3 names encode size) 4. Call process_anomalies() and assert the anomaly is filtered out as a false positive Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
00d8e4f to
34185af
Compare
|
/backport v26.1.x |
|
/backport v25.3.x |
|
/backport v25.2.x |
The scrub false-positive filter in process_anomalies() only checked whether a segment with the same offset range existed in the manifest. A compacted reupload produces a replacement segment at the same offset range but with a different name (different size). When GC deleted the old segment from cloud storage while the scrubber was still referencing a stale manifest, the filter kept the anomaly because the offset range still matched—even though the current segment at that range was a different (replacement) object that existed in cloud storage.
Compare generate_remote_segment_name() for the manifest entry and the reported-missing segment so that replacements with the same offset range but different identity are correctly recognized as false positives.
Fixes CORE-6913.
Backports Required
Release Notes
Improvements