[rocksdb] Sync recovery SST directory before reused MANIFEST append#14780
Open
xingbowang wants to merge 1 commit into
Open
[rocksdb] Sync recovery SST directory before reused MANIFEST append#14780xingbowang wants to merge 1 commit into
xingbowang wants to merge 1 commit into
Conversation
Summary: When `reuse_manifest_on_open` is active, `DB::Open` can recover WAL data into a new L0 SST and append the corresponding VersionEdit to the already-current MANIFEST. The MANIFEST sync makes that edit durable before `SetCurrentFile` has a chance to fsync the DB directory, since the reused-MANIFEST path does not create a new CURRENT file. If Open later fails and the process crashes, the SST directory entry can be lost while the MANIFEST still references it. The stress test reproduced this by deleting files created after the last directory sync, then reopening into a MANIFEST that referenced missing `001674.sst`. Fix by fsyncing the recovery SST data directory before adding the file to the recovery edit when the existing MANIFEST writer is being reused. Test Plan: make -j128 db_basic_test timeout 60 ./db_basic_test --gtest_filter=DBBasicTest.ReuseManifestOnOpenSyncsRecoverySstDirBeforeManifestAppend --gtest_repeat=5 timeout 60 ./db_basic_test --gtest_filter='DBBasicTest.ReuseManifestOnOpen*' Reverted the production fix and confirmed the new test fails with a missing SST referenced by MANIFEST, then reapplied the fix. make clean COERCE_CONTEXT_SWITCH=1 make -j128 db_basic_test timeout 60 ./db_basic_test --gtest_filter=DBBasicTest.ReuseManifestOnOpenSyncsRecoverySstDirBeforeManifestAppend --gtest_repeat=5 make check-sources git diff --check Reviewers: Subscribers: Tasks: 272584339 Tags:
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106201774. |
✅ clang-tidy: No findings on changed linesCompleted in 329.1s. |
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 1c0d2f5 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 1c0d2f5 SummaryCorrect and well-targeted fix for a real crash-safety gap in the High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🟡 MEDIUMM1. Missing null check on
|
| Context | Affected? | Safe? | Notes |
|---|---|---|---|
| WritePreparedTxnDB | Yes (same recovery) | Yes | Same code path |
| ReadOnly DB | No | N/A | No recovery flush in read-only mode |
| best_efforts_recovery | No | N/A | Mutually exclusive with reuse_manifest_on_open |
| User-defined timestamps | Yes | Yes | Orthogonal to dir sync |
| btrfs filesystem | Yes | Yes | btrfs skips dir fsync for kNewFileSynced intentionally -- btrfs guarantees new file visibility after file fsync without dir fsync. This is a documented RocksDB optimization (io_posix.cc:1959-1963), not a bug. |
| avoid_flush_during_recovery | No | N/A | Recovery SSTs not created |
| Multiple db_paths | Yes | Yes | GetPathId() selects correct dir |
Positive Observations
- Correct fix location: The fsync is placed after SST creation and before
edit->AddFile(), matching the established pattern in FlushJob and CompactionJob. - Minimal and focused scope: Only the necessary code path is modified (reused MANIFEST during recovery).
- Proper condition: Checking both
reuse_manifest_on_openanddescriptor_log_ != nullptrcorrectly identifies exactly the case needing the fix. - Good test design: The regression test uses FaultInjectionTestFS and SyncPoint injection (not sleep), making it deterministic and non-flaky. The test correctly simulates the crash scenario using
DeleteFilesCreatedAfterLastDirSync. - Cold path: This runs only during
DB::Openrecovery, so there is zero performance impact on steady-state operations. - Error propagation is correct: A failed fsync properly prevents
edit->AddFile()from executing (the subsequentif (s.ok() && has_output)block is gated ons).
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
reuse_manifest_on_openreuses the current MANIFEST,DB::Openrecovery can flush WAL data into a new L0 SST and append the correspondingVersionEditto that already-current MANIFEST.Task
Test Plan
CI