Add async WAL precreation#14738
Conversation
Add an immutable async_wal_precreate DB option that reserves and creates the next WAL on a background thread, then consumes it during WAL rotation when available. Fall back to synchronous WAL creation when precreation is unavailable, wait for pending precreation at rotation, and release unused precreated WAL writers on close without deleting the empty future WAL. Add option plumbing, db_bench/db_stress flags, crash-test coverage, unit tests, docs, and release note.
✅ clang-tidy: No findings on changed linesCompleted in 564.2s. |
a558318 to
63839c6
Compare
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 8c7ad3b ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 8c7ad3b SummaryWell-designed feature that follows established RocksDB async patterns (mirrors High-severity findings (1):
Full review (click to expand)Findings🔴 HIGHH1.
|
| Context | Safe? | Notes |
|---|---|---|
| WritePreparedTxnDB | YES | SwitchMemtable path unchanged; WAL creation independent of txn state |
| ReadOnly DB | See H1 | error_if_wal_file_exists change is the only interaction |
| WAL recycling | N/A | Correctly sanitized to false when recycle > 0 |
| 2PC | YES | WAL precreation orthogonal to 2PC lifecycle |
| User-defined timestamps | YES | UDT doesn't affect WAL file creation |
Disproven claims from initial agents: TOCTOU race (mutex prevents), double file number allocation (state check prevents), file leak on StartWALFile failure (cleanup code exists), ticker ABI breakage (documented as unstable), shutdown race (wait loop + cleanup handles correctly).
Positive Observations
- Clean
CreateWALdecomposition into async-safe I/O + sync metadata - Best-effort fallback ensures correctness is never compromised
- Good test coverage: 7 feature tests + deterministic sync points
- Consistent with existing
AsyncFileOpenStatepattern - Comprehensive option plumbing across C/Java/db_bench/db_stress/crash test
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D105020559. |
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 53186ec ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 53186ec SummaryWell-structured PR that adds async WAL precreation with clean separation of concerns (CreateWALWriter/StartWALFile), proper state machine, synchronous fallback, and comprehensive tests. The main concerns are: (1) a behavioral change to High-severity findings (2):
Full review (click to expand)Findings🔴 HIGHH1. Statistics enum insertion breaks ticker value stability —
|
| Context | Code executes? | Assumptions hold? | Action needed? |
|---|---|---|---|
| WritePreparedTxnDB | YES (inherits SwitchMemtable) | YES (precreated WALs start empty) | Safe |
| ReadOnly DB | NO (no writes) | N/A (recovery change is separate) | H2 above |
| Secondary Instance | PARTIAL (scans WAL dirs) | MOSTLY (empty WALs treated as EOF) | Minor: empty WAL in GetSortedWals |
| User-defined timestamps | YES | YES (UDT is above WAL creation layer) | Safe |
| MemPurge | YES (calls SwitchMemtable) | YES | Safe |
| FIFO/Universal compaction | YES (same flush path) | YES | Safe |
| allow_2pc=true | YES | YES (precreated WALs have no prepared txns) | Safe |
| WAL recycling | NO (sanitized to false) | YES | Safe |
File number safety after crash: MaybeUpdateNextFileNumber() in db_impl_files.cc:1187 scans ALL database paths during recovery and advances next_file_number_ past any existing file, including orphaned precreated WALs. This protects against file number reuse.
WAL preallocation vs GetFileSize: SetPreallocationBlockSize() uses fallocate() on Linux which does NOT change the file's st_size (only allocated extents). GetFileSize() returns st_size, so a preallocated-but-unwritten file correctly reports 0 bytes. This makes the error_if_wal_file_exists size check reliable on Linux. On Windows, SetEndOfFile behavior may differ; this should be verified.
FindObsoleteFiles safety: WAL deletion in FindObsoleteFiles only removes WALs from alive_wal_files_ with numbers BELOW min_log_number. A precreated WAL has a number HIGHER than any active WAL and is therefore safe from this path. The full_scan_candidate_files path also uses min_log_number to determine WAL obsolescence.
Positive Observations
- Clean two-phase split (CreateWALWriter/StartWALFile) that separates I/O from metadata, following single-responsibility principle.
- State machine pattern (kNotScheduled/kScheduled/kReady) is consistent with existing
AsyncFileOpenState. - Best-effort design with synchronous fallback is production-friendly.
- Comprehensive sync-point-based tests covering ready, wait, failure, and recovery paths.
- Statistics counters (hit/miss/waited/wait_micros/failed) provide excellent production observability.
- Proper sanitization against WAL recycling.
- Recovery robustness via
MaybeUpdateNextFileNumberfilesystem scan.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
Summary: - Add experimental immutable `DBOptions::async_wal_precreate` to reserve and open one future WAL on a background HIGH-priority task, with sanitization that disables the optimization when WAL recycling is configured. - Split WAL creation into open/preallocate and start phases so `SwitchMemtable()` can consume a prepared WAL after writing normal WAL metadata, wait for in-flight precreation, fall back to synchronous creation, and delete an unstarted prepared WAL on start failure. - Keep WAL numbering, close, recovery, and read-only open safe for empty future WAL files left by async precreation; `error_if_wal_file_exists=true` now rejects non-empty WALs while tolerating empty WALs. - Add public option plumbing for the C API, options parsing/stringification, random option testing, `db_bench`, `db_stress`, and crash-test configuration. - Add WAL precreate statistics counters plus Java `TickerType`/JNI mappings, and update C++, C, and Java read-only-open documentation for the empty-WAL behavior. - Add focused WAL/option/C/Java tests for async precreate ready/wait/failure/recovery paths, read-only WAL detection, option sanitization, and API plumbing, plus write-flow docs and unreleased history entries for the new feature and behavior change. PR #14738 Reviewed By: pdillinger Differential Revision: D105020559 fbshipit-source-id: 5059b424702e021abb8de65ceeb6d3b975280ffc
Summary
DBOptions::async_wal_precreateto reserve and open one future WAL on a background HIGH-priority task, with sanitization that disables the optimization when WAL recycling is configured.SwitchMemtable()can consume a prepared WAL after writing normal WAL metadata, wait for in-flight precreation, fall back to synchronous creation, and delete an unstarted prepared WAL on start failure.error_if_wal_file_exists=truenow rejects non-empty WALs while tolerating empty WALs.db_bench,db_stress, and crash-test configuration.TickerType/JNI mappings, and update C++, C, and Java read-only-open documentation for the empty-WAL behavior.Testing
git diff --checkJAVA_HOME.