Commit d268198

Add use_direct_reads_for_compaction option
Introduces a new DBOption `use_direct_reads_for_compaction` (default false) that lets users route compaction (and flush) background reads through O_DIRECT while keeping user reads on the buffered/page-cache path. Sequential compaction reads otherwise pollute the OS page cache with read-once data that evicts the hot user-read working set; bypassing the cache for those reads protects user-read tail latency on write-heavy workloads without forcing users onto the global `use_direct_reads` path (which slows user reads dramatically).

A naive implementation that only flipped the FileOptions returned by `OptimizeForCompactionTableRead` does not actually trigger the OS-level O_DIRECT open, because the TableCache (and FileMetaData::pinned_reader) already holds long-lived buffered handles opened at flush time or at DB::Open via LoadTableHandlers. Compaction would silently reuse those cached buffered handles and the kernel would never see the O_DIRECT flag.

The fix opens ephemeral O_DIRECT handles for the lifetime of the compaction scan, separate from the cache:

* TableCache::FindTable / NewIterator learn a `bypass_cache_for_scan` mode. When set, the pinned-reader fast path and the shared cache are skipped, GetTableReader is called directly with the caller's FileOptions, and ownership of the freshly opened TableReader is handed back to the caller. The iterator takes ownership via RegisterCleanup and frees the reader on destruction.
* VersionSet::MakeInputIterator and LevelIterator plumb the flag through both the L0 and L1+ compaction-input paths.
* CompactionJob::ProcessKeyValueCompaction enables the flag exactly when `use_direct_reads_for_compaction` is on, the global `use_direct_reads` is off, and `OptimizeForCompactionTableRead` actually produced `use_direct_reads=true` in the compaction-read FileOptions.
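
The ownership handoff described in the bullets above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `ToyTableCache`, `ScanIterator` and `Reader` are hypothetical stand-ins invented for illustration, not RocksDB classes, and the real TableCache is considerably more involved. It shows why reusing a cached handle keeps the scan buffered, and how a bypass mode can open a fresh reader whose lifetime is tied to the iterator through a registered cleanup.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table reader; records how the file was opened.
struct Reader {
  std::string file;
  bool direct;
};

// Hypothetical iterator that runs registered cleanup callbacks when destroyed.
class ScanIterator {
 public:
  explicit ScanIterator(const Reader* reader) : reader_(reader) {}
  ~ScanIterator() {
    for (auto& fn : cleanups_) fn();
  }
  ScanIterator(const ScanIterator&) = delete;
  ScanIterator& operator=(const ScanIterator&) = delete;

  void RegisterCleanup(std::function<void()> fn) {
    cleanups_.push_back(std::move(fn));
  }
  bool direct() const { return reader_->direct; }

 private:
  const Reader* reader_;
  std::vector<std::function<void()>> cleanups_;
};

// Toy cache of long-lived readers keyed by file name (an assumption made for
// illustration; not the real TableCache interface).
class ToyTableCache {
 public:
  int live_ephemeral_readers = 0;

  std::unique_ptr<ScanIterator> NewIterator(const std::string& file,
                                            bool want_direct,
                                            bool bypass_cache_for_scan) {
    if (!bypass_cache_for_scan) {
      // Shared path: an already cached handle wins, whatever the caller asked
      // for. This is the "silent downgrade" the commit message describes.
      auto it = cache_.find(file);
      if (it == cache_.end()) {
        std::unique_ptr<Reader> opened(new Reader{file, want_direct});
        it = cache_.emplace(file, std::move(opened)).first;
      }
      return std::unique_ptr<ScanIterator>(new ScanIterator(it->second.get()));
    }
    // Bypass path: open a fresh reader with the caller's options and let the
    // iterator free it on destruction.
    Reader* fresh = new Reader{file, want_direct};
    ++live_ephemeral_readers;
    std::unique_ptr<ScanIterator> iter(new ScanIterator(fresh));
    iter->RegisterCleanup([this, fresh] {
      delete fresh;
      --live_ephemeral_readers;
    });
    return iter;
  }

 private:
  std::map<std::string, std::unique_ptr<Reader>> cache_;
};
```

In this model, a user read that populates the cache with a buffered reader makes every later non-bypass request for the same file buffered as well, even if it asks for direct I/O. Only the bypass request gets a direct reader, and that reader is released as soon as its iterator goes away.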
An end-to-end test in db_compaction_test.cc uses the existing `NewRandomAccessFile:O_DIRECT` sync point in env/fs_posix.cc to assert that the kernel-level open really happens for compaction inputs when the flag is set, and never fires when the flag is off. The test is scoped to platforms that use the O_DIRECT path.

A small unrelated convenience also lands here: a new db_bench flag `--bgwriter_num` that lets the writer thread in readwhilewriting use a wider keyspace than the readers. This is what made it possible to benchmark the new option realistically -- the readers see a small hot subset (cache-resident), the writer spreads puts across the full DB which drives continuous compaction.

The new option follows the existing add_option.md checklist: it is registered in ImmutableDBOptions for serialization, surfaced through the C API, exposed in db_bench / db_stress / db_crashtest.py, randomized in RandomInitDBOptions, validated against allow_mmap_reads at Open time, and documented in unreleased_history. Java JNI is left for a follow-up.

Benchmark results
=================

Setup: Ubuntu 24.04 (kernel 7.0.5 OrbStack Linux VM on Apple Silicon), 14 vCPUs, virtio-blk disk. MGLRU disabled (echo 0 > /sys/kernel/mm/lru_gen/enabled). 14 GB DB (3.5M keys * 4 KB values), no compression. Each measurement run pinned to a 1 GB cgroup via `systemd-run --scope -p MemoryMax=1G -p MemorySwapMax=0`, so DB-to-cache ratio is ~14x. Page cache dropped between configs.

Workload: readwhilewriting for 180 s, 4 reader threads on a hot 2,000-key subset (~8 MB, ~3% of cache) + 1 writer thread spreading overwrites across the full 3.5M-key keyspace (via `--bgwriter_num=3500000`), throttled at 100 MB/s. Compaction ran at ~500 MB/s read/write during the buffered run, ~400 MB/s with direct compaction. "buffered" is the existing default.

| Config                                    | Throughput      | Read P50      | Read P99      | Read P99.9     | Read P99.99    |
|-------------------------------------------|-----------------|---------------|---------------|----------------|----------------|
| buffered (default)                        | 406 K ops/s     | 7.34 us       | 79.11 us      | 533.14 us      | 1647.79 us     |
| direct_compaction_read_write              | **464 K ops/s** | **6.37 us**   | **71.64 us**  | **468.28 us**  | **1363.91 us** |
|                                           | (+14%)          | (-13%)        | (-9%)         | (-12%)         | (-17%)         |
| direct_compaction_read_only               | 421 K ops/s     | 6.99 us       | 88.95 us      | 504.32 us      | 1456.75 us     |
|                                           | (+4%)           | (-5%)         | (+13%)        | (-5%)          | (-12%)         |
| use_direct_reads = true (existing global) | 442 K ops/s     | 7.37 us       | 50.82 us      | 472.23 us      | 1626.77 us     |
|                                           | (+9%)           | (0%)          | (-36%)        | (-11%)         | (-1%)          |

The recommended production configuration is `use_direct_reads_for_compaction = true` together with `use_direct_io_for_flush_and_compaction = true` ("direct reads + writes for compaction", the direct_compaction_read_write row). It beats the buffered default on every metric simultaneously: throughput up 14%, every read percentile from P50 to P99.99 down 9 to 17%. The existing global `use_direct_reads = true` flag helps P99 specifically (-36%) but delivers less throughput than the recommended configuration (442 K vs 464 K ops/s) and barely moves P99.99 (-1%); the new compaction-only path is the better overall trade-off for the write-heavy workloads it is designed for. Higher DB-to-cache ratios (the Cassandra blog at https://lightfoot.dev/direct-i-o-for-cassandra-compaction-cutting-p99-read-latency-by-5x/ reports ~5x P99 improvement at a 43x ratio) should widen the gap further; the 14x ratio used above is what fit in a single laptop's disk budget.

Repro recipe
============

Setup:
- Install OrbStack on macOS or use any Linux host
- On macOS: orb create -t ubuntu rocksdb-bench
- Inside the Linux machine:

    apt-get install -y build-essential clang cmake git pkg-config \
      libgflags-dev libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev \
      libzstd-dev rsync
    cmake -DCMAKE_BUILD_TYPE=Release -DPORTABLE=1 -DWITH_GFLAGS=1 \
      -DWITH_TESTS=0 .. && make -j db_bench

Build the source DB (once, unrestricted memory):

    ./db_bench --benchmarks=fillrandom,compact,waitforcompaction,stats \
      --db=/path/to/source_db --num=3500000 --key_size=16 \
      --value_size=4096 --write_buffer_size=16777216 \
      --target_file_size_base=16777216 --max_background_jobs=4 \
      --compression_type=none --cache_size=4194304 \
      --max_bytes_for_level_base=67108864 --disable_wal=1 --sync=0

Per-config measurement (copy source_db -> scratch_db first, then drop_caches, then run under cgroup):

    sudo systemd-run --scope -p MemoryMax=1G -p MemorySwapMax=0 \
      ./db_bench --use_existing_db=1 \
      --benchmarks=readwhilewriting,stats --db=/path/to/scratch_db \
      --threads=5 --duration=180 --statistics=true --histogram=1 \
      --num=2000 --bgwriter_num=3500000 \
      --key_size=16 --value_size=4096 \
      --write_buffer_size=16777216 --target_file_size_base=16777216 \
      --max_background_jobs=4 --compression_type=none \
      --cache_size=4194304 --open_files=200 \
      --skip_stats_update_on_db_open=true \
      --max_bytes_for_level_base=67108864 \
      --benchmark_write_rate_limit=104857600 \
      --rate_limiter_bytes_per_sec=0 \
      --use_direct_reads={true|false} \
      --use_direct_reads_for_compaction={true|false} \
      --use_direct_io_for_flush_and_compaction={true|false}

Disable MGLRU first so the kernel uses the classic active/inactive LRU:

    echo 0 | sudo tee /sys/kernel/mm/lru_gen/enabled
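
The percentages quoted in the benchmark table are changes relative to the buffered baseline, and the "~14x" figure is DB size divided by the cgroup memory limit. As a small worked check of that arithmetic, here is a sketch in which `PercentDelta` and `DbToCacheRatio` are hypothetical helper names chosen for illustration, and whole-percent rounding is an assumption about how the table was produced:

```cpp
#include <cassert>
#include <cmath>

// Percent change of `value` relative to `baseline`, rounded to the nearest
// whole percent (rounding mode is an assumption).
inline long PercentDelta(double value, double baseline) {
  return std::lround((value - baseline) / baseline * 100.0);
}

// Ratio of total value bytes in the DB to the memory available for caching.
inline double DbToCacheRatio(double num_keys, double value_bytes,
                             double cache_bytes) {
  return num_keys * value_bytes / cache_bytes;
}
```

For example, `PercentDelta(464, 406)` corresponds to the throughput gain of the direct_compaction_read_write row, and `DbToCacheRatio(3.5e6, 4096, 1024.0 * 1024 * 1024)` corresponds to the stated DB-to-cache ratio for a 1 GB cgroup.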
1 parent 87c554b commit d268198

26 files changed

Lines changed: 540 additions & 34 deletions

db/c.cc

Lines changed: 10 additions & 0 deletions
@@ -5038,6 +5038,16 @@ unsigned char rocksdb_options_get_use_direct_io_for_flush_and_compaction(
   return opt->rep.use_direct_io_for_flush_and_compaction;
 }
 
+void rocksdb_options_set_use_direct_reads_for_compaction(rocksdb_options_t* opt,
+                                                         unsigned char v) {
+  opt->rep.use_direct_reads_for_compaction = v;
+}
+
+unsigned char rocksdb_options_get_use_direct_reads_for_compaction(
+    rocksdb_options_t* opt) {
+  return opt->rep.use_direct_reads_for_compaction;
+}
+
 void rocksdb_options_set_allow_mmap_reads(rocksdb_options_t* opt,
                                           unsigned char v) {
   opt->rep.allow_mmap_reads = v;

db/compaction/compaction_job.cc

Lines changed: 19 additions & 1 deletion
@@ -203,6 +203,12 @@ CompactionJob::CompactionJob(
   assert(job_context);
   assert(job_context->snapshot_context_initialized);
 
+  // Expose the file options used for compaction reads so tests can confirm
+  // that `use_direct_reads_for_compaction` (and related flags) plumb all the
+  // way through to the read path.
+  TEST_SYNC_POINT_CALLBACK("CompactionJob::CompactionJob:FileOptionsForRead",
+                           &file_options_for_read_);
+
   const auto* cfd = compact_->compaction->column_family_data();
   ThreadStatusUtil::SetEnableTracking(db_options_.enable_thread_tracking);
   ThreadStatusUtil::SetColumnFamily(cfd);
@@ -1536,10 +1542,22 @@ InternalIterator* CompactionJob::CreateInputIterator(
 
   // Although the v2 aggregator is what the level iterator(s) know about,
   // the AddTombstones calls will be propagated down to the v1 aggregator.
+  //
+  // When `use_direct_reads_for_compaction` is set while the global
+  // `use_direct_reads` stays off, the shared TableCache is already holding
+  // buffered file handles for these SST files (opened that way for user
+  // reads). Reusing those handles would silently downgrade the compaction
+  // scan back to buffered I/O. Ask the iterator to open ephemeral
+  // O_DIRECT handles instead so the kernel actually bypasses the page
+  // cache for the compaction reads.
+  const bool bypass_cache_for_scan =
+      db_options_.use_direct_reads_for_compaction &&
+      !db_options_.use_direct_reads && file_options_for_read_.use_direct_reads;
   iterators.raw_input =
       std::unique_ptr<InternalIterator>(versions_->MakeInputIterator(
           read_options, sub_compact->compaction, sub_compact->RangeDelAgg(),
-          file_options_for_read_, boundaries.start, boundaries.end));
+          file_options_for_read_, boundaries.start, boundaries.end,
+          bypass_cache_for_scan));
   InternalIterator* input = iterators.raw_input.get();
 
   if (boundaries.start.has_value() || boundaries.end.has_value()) {
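
The three-way condition that computes `bypass_cache_for_scan` in the hunk above can be read as a small truth table. The sketch below restates it as a standalone predicate. `ToyDbOptions`, `ToyFileOptions` and `ShouldBypassCacheForScan` are hypothetical names used only for illustration and do not appear in the commit:

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-ins for the option structs.
struct ToyDbOptions {
  bool use_direct_reads = false;
  bool use_direct_reads_for_compaction = false;
};

struct ToyFileOptions {
  bool use_direct_reads = false;
};

// Mirrors the condition in the diff:
//  - the compaction-only flag must be on;
//  - the global flag must be off (if it is on, cached handles are presumably
//    already direct, so nothing needs bypassing);
//  - the compaction-read FileOptions must actually request direct reads.
inline bool ShouldBypassCacheForScan(const ToyDbOptions& db,
                                     const ToyFileOptions& compaction_read) {
  return db.use_direct_reads_for_compaction && !db.use_direct_reads &&
         compaction_read.use_direct_reads;
}
```

Only one of the eight input combinations yields `true`, which keeps the bypass limited to the case the new option targets.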

db/db_compaction_test.cc

Lines changed: 175 additions & 0 deletions
@@ -6651,6 +6651,181 @@ TEST_P(DBCompactionDirectIOTest, DirectIO) {
 INSTANTIATE_TEST_CASE_P(DBCompactionDirectIOTest, DBCompactionDirectIOTest,
                         testing::Bool());
 
+// End-to-end check that `use_direct_reads_for_compaction` actually causes
+// compaction-input SST files to be opened with O_DIRECT, even though
+// `use_direct_reads` (the global flag) is left off so user reads stay
+// buffered. The assertion exercises the kernel-level path, not just the
+// FileOptions plumbing: the existing `NewRandomAccessFile:O_DIRECT` sync
+// point in env/fs_posix.cc fires once per fresh open that includes the
+// O_DIRECT flag.
+//
+// This test only runs on platforms that go through the O_DIRECT path
+// (Linux / non-BSD POSIX), since that is the configuration RocksDB users
+// actually deploy with the direct-I/O knobs. On other platforms it is
+// silently bypassed.
+#if !defined(OS_MACOSX) && !defined(OS_OPENBSD) && !defined(OS_SOLARIS) && \
+    !defined(OS_WIN)
+TEST_F(DBCompactionTest, UseDirectReadsForCompactionEndToEnd) {
+  if (!IsDirectIOSupported()) {
+    ROCKSDB_GTEST_BYPASS("Direct IO not supported");
+    return;
+  }
+
+  Options options = CurrentOptions();
+  Destroy(options);
+  options.create_if_missing = true;
+  options.disable_auto_compactions = true;
+  // User reads stay buffered, compaction reads should switch to O_DIRECT.
+  options.use_direct_reads = false;
+  options.use_direct_reads_for_compaction = true;
+  // Isolate the read-side change; leave the compaction write path buffered.
+  options.use_direct_io_for_flush_and_compaction = false;
+
+  int observed_run_starts = 0;
+  int observed_odirect_opens = 0;
+  bool observed_direct_compaction_read = false;
+  int observed_callbacks = 0;
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency({});
+  // Plumbing-level probe: the compaction-read FileOptions should carry
+  // use_direct_reads = true when the new flag is enabled.
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
+      "CompactionJob::CompactionJob:FileOptionsForRead", [&](void* arg) {
+        const auto* fo = static_cast<const FileOptions*>(arg);
+        ++observed_callbacks;
+        if (fo != nullptr && fo->use_direct_reads) {
+          observed_direct_compaction_read = true;
+        }
+      });
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
+      "CompactionJob::Run():Start",
+      [&](void* /*arg*/) { ++observed_run_starts; });
+  // Kernel-level probe: this sync point fires only when the OS open() call
+  // is being issued with O_DIRECT in its flags. Hitting it proves we are
+  // actually changing the cache-mode for compaction reads, not just the
+  // in-memory FileOptions struct.
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
+      "NewRandomAccessFile:O_DIRECT",
+      [&](void* /*arg*/) { ++observed_odirect_opens; });
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
+
+  Status s = TryReopen(options);
+  if (s.IsNotSupported() || s.IsInvalidArgument()) {
+    ROCKSDB_GTEST_BYPASS(
+        "Direct IO reads not supported in this test environment");
+    ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
+    ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
+    return;
+  }
+  ASSERT_OK(s);
+
+  // Produce two L0 files with OVERLAPPING key ranges so that CompactRange has
+  // actual merge work to do (otherwise RocksDB performs a trivial file move
+  // and never constructs a CompactionJob).
+  const std::string value(4096, 'v');
+  for (int i = 0; i < 64; ++i) {
+    ASSERT_OK(Put(Key(i), value));
+  }
+  ASSERT_OK(Flush());
+  for (int i = 0; i < 64; ++i) {
+    ASSERT_OK(Put(Key(i), value));
+  }
+  ASSERT_OK(Flush());
+
+  // User reads should still go through the buffered path. Confirm that the
+  // option does not silently flip use_direct_reads for user reads.
+  for (int i = 0; i < 8; ++i) {
+    std::string actual;
+    ASSERT_OK(db_->Get(ReadOptions(), Key(i), &actual));
+    ASSERT_EQ(value, actual);
+  }
+
+  ASSERT_OK(dbfull()->CompactRange(CompactRangeOptions(), nullptr, nullptr));
+  // Wait for compaction to complete and CompactionJob to be constructed.
+  ASSERT_OK(dbfull()->TEST_WaitForCompact());
+
+  // Diagnostic: confirm that the compaction actually ran. If it didn't, the
+  // missing FileOptions sync-point hits would be a test-infrastructure issue,
+  // not a regression in the new option.
+  ASSERT_GT(observed_run_starts, 0)
+      << "CompactionJob::Run():Start never fired; CompactRange did not "
+         "schedule a compaction.";
+  ASSERT_GT(observed_callbacks, 0);
+  ASSERT_TRUE(observed_direct_compaction_read);
+  // The headline assertion: at least one compaction-input file open went
+  // through the O_DIRECT path. Without the TableCache bypass plumbing this
+  // would be zero because compaction would silently reuse the buffered
+  // handles already cached for user reads.
+  EXPECT_GT(observed_odirect_opens, 0)
+      << "no compaction-input opens went through O_DIRECT; "
+         "observed_odirect_opens="
+      << observed_odirect_opens;
+
+  // Quick sanity sweep after compaction to confirm data is intact.
+  for (int i = 0; i < 64; ++i) {
+    std::string actual;
+    ASSERT_OK(db_->Get(ReadOptions(), Key(i), &actual));
+    ASSERT_EQ(value, actual);
+  }
+
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
+  Destroy(options);
+}
+
+// Confirms that when use_direct_reads_for_compaction is OFF, compaction reads
+// stay on the buffered path: neither the compaction-read FileOptions nor the
+// kernel-level O_DIRECT open should ever be triggered. Pairs with the test
+// above to cover both halves of the on/off switch.
+TEST_F(DBCompactionTest, UseDirectReadsForCompactionOffStaysBuffered) {
+  Options options = CurrentOptions();
+  Destroy(options);
+  options.create_if_missing = true;
+  options.disable_auto_compactions = true;
+  options.use_direct_reads = false;
+  options.use_direct_reads_for_compaction = false;
+  options.use_direct_io_for_flush_and_compaction = false;
+
+  bool observed_direct_compaction_read = false;
+  int observed_callbacks = 0;
+  int observed_odirect_opens = 0;
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
+      "CompactionJob::CompactionJob:FileOptionsForRead", [&](void* arg) {
+        const auto* fo = static_cast<const FileOptions*>(arg);
+        ++observed_callbacks;
+        if (fo->use_direct_reads) {
+          observed_direct_compaction_read = true;
+        }
+      });
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
+      "NewRandomAccessFile:O_DIRECT",
+      [&](void* /*arg*/) { ++observed_odirect_opens; });
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
+
+  ASSERT_OK(TryReopen(options));
+
+  const std::string value(4096, 'v');
+  for (int i = 0; i < 64; ++i) {
+    ASSERT_OK(Put(Key(i), value));
+  }
+  ASSERT_OK(Flush());
+  for (int i = 0; i < 64; ++i) {
+    ASSERT_OK(Put(Key(i), value));
+  }
+  ASSERT_OK(Flush());
+
+  ASSERT_OK(dbfull()->CompactRange(CompactRangeOptions(), nullptr, nullptr));
+  ASSERT_OK(dbfull()->TEST_WaitForCompact());
+
+  ASSERT_GT(observed_callbacks, 0);
+  ASSERT_FALSE(observed_direct_compaction_read);
+  ASSERT_EQ(0, observed_odirect_opens);
+
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
+  Destroy(options);
+}
+#endif  // !defined(OS_MACOSX) && !defined(OS_OPENBSD) && ...
+
 class CompactionPriTest : public DBTestBase,
                           public testing::WithParamInterface<uint32_t> {
  public:
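
Both tests above rely on the same probe pattern: register a counting callback on a named sync point, run the operation, then assert on the counter. A minimal self-contained model of that pattern is sketched below. `ToySyncPoint` and `ToyOpenFile` are hypothetical stand-ins written for illustration. They borrow method names from the calls seen in the tests but are not the RocksDB SyncPoint implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical miniature of a named-callback registry.
class ToySyncPoint {
 public:
  void SetCallBack(const std::string& point, std::function<void(void*)> cb) {
    callbacks_[point] = std::move(cb);
  }
  void EnableProcessing() { enabled_ = true; }
  void DisableProcessing() { enabled_ = false; }
  void ClearAllCallBacks() { callbacks_.clear(); }

  // Called from instrumented code; a no-op unless processing is enabled and
  // a callback is registered for this point.
  void Process(const std::string& point, void* arg = nullptr) {
    if (!enabled_) return;
    auto it = callbacks_.find(point);
    if (it != callbacks_.end()) it->second(arg);
  }

 private:
  bool enabled_ = false;
  std::unordered_map<std::string, std::function<void(void*)>> callbacks_;
};

// Hypothetical instrumented open: the probe fires only on the direct path,
// so a non-zero count shows the direct path was really taken.
inline void ToyOpenFile(ToySyncPoint& sp, bool direct) {
  if (direct) sp.Process("NewRandomAccessFile:O_DIRECT");
}
```

Because the probe sits at the point where the open is issued, counting its hits checks the behavior itself and not only the configuration that was supposed to produce it, which is the distinction the test comments draw between the plumbing-level and kernel-level probes.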

db/db_impl/db_impl_open.cc

Lines changed: 12 additions & 0 deletions
@@ -244,6 +244,18 @@ Status DBImpl::ValidateOptions(const DBOptions& db_options) {
         "then direct I/O reads (use_direct_reads) must be disabled. ");
   }
 
+  if (db_options.allow_mmap_reads &&
+      db_options.use_direct_reads_for_compaction) {
+    // Memory-mapped reads and direct I/O share the same EnvOptions field, so
+    // enabling both would route compaction reads through a code path that
+    // tries to do mmap and O_DIRECT at the same time. Reject this combination
+    // explicitly rather than relying on lower-level asserts.
+    return Status::NotSupported(
+        "If memory mapped reads (allow_mmap_reads) are enabled "
+        "then compaction-only direct I/O reads "
+        "(use_direct_reads_for_compaction) must be disabled. ");
+  }
+
   if (db_options.allow_mmap_writes &&
       db_options.use_direct_io_for_flush_and_compaction) {
     return Status::NotSupported(
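
The validation added in this hunk follows the same shape as the existing mmap checks around it: reject an incompatible pair of flags up front with a descriptive status. The sketch below models that shape in isolation. `ToyStatus`, `ToyDbOptions` and `ValidateReadOptions` are hypothetical names for illustration only, and the messages are shortened.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal status type.
struct ToyStatus {
  bool ok;
  std::string message;
};

// Hypothetical subset of the options involved in the check.
struct ToyDbOptions {
  bool allow_mmap_reads = false;
  bool use_direct_reads = false;
  bool use_direct_reads_for_compaction = false;
};

// Rejects mmap reads combined with either direct-read flag.
inline ToyStatus ValidateReadOptions(const ToyDbOptions& o) {
  if (o.allow_mmap_reads && o.use_direct_reads) {
    return ToyStatus{false, "allow_mmap_reads conflicts with use_direct_reads"};
  }
  if (o.allow_mmap_reads && o.use_direct_reads_for_compaction) {
    return ToyStatus{
        false,
        "allow_mmap_reads conflicts with use_direct_reads_for_compaction"};
  }
  return ToyStatus{true, ""};
}
```

Failing at open time gives the user one clear error message and avoids depending on assertions deeper in the file-open path, which is the rationale given in the code comment.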

db/db_options_test.cc

Lines changed: 100 additions & 0 deletions
@@ -1729,6 +1729,106 @@ TEST_F(DBOptionsTest, SetOptionsMultipleColumnFamilies) {
   ASSERT_TRUE(dbfull()->GetOptions(handles_[2]).disable_auto_compactions);
 }
 
+// Validates the new option's serialization/parse round trip, default value,
+// and validation against incompatible options. Also exercises the
+// FileSystem::OptimizeForCompactionTableRead / OptimizeForBlobFileRead helpers
+// directly to confirm the new flag truly switches use_direct_reads on for
+// compaction reads.
+TEST_F(DBOptionsTest, UseDirectReadsForCompactionOptionMechanics) {
+  // Default value must remain false to preserve existing semantics.
+  ASSERT_FALSE(DBOptions().use_direct_reads_for_compaction);
+
+  // Round-trip through GetDBOptionsFromString.
+  DBOptions parsed;
+  ConfigOptions config_options;
+  ASSERT_OK(GetDBOptionsFromString(config_options, DBOptions(),
+                                   "use_direct_reads_for_compaction=true",
+                                   &parsed));
+  ASSERT_TRUE(parsed.use_direct_reads_for_compaction);
+  ASSERT_OK(GetDBOptionsFromString(config_options, DBOptions(),
+                                   "use_direct_reads_for_compaction=false",
+                                   &parsed));
+  ASSERT_FALSE(parsed.use_direct_reads_for_compaction);
+
+  // Confirm the option is reachable through the live DB's options round trip.
+  Options options = CurrentOptions();
+  options.create_if_missing = true;
+  options.use_direct_reads_for_compaction = true;
+  // Use a buffered user-read setup so the new flag is the one doing the work.
+  options.use_direct_reads = false;
+  options.use_direct_io_for_flush_and_compaction = false;
+  Status s = TryReopen(options);
+  // Direct I/O may not be supported on every test environment; skip silently
+  // in that case since the option metadata path is what this test cares about.
+  if (s.IsNotSupported() || s.IsInvalidArgument()) {
+    Options buffered = CurrentOptions();
+    buffered.create_if_missing = true;
+    buffered.use_direct_reads_for_compaction = true;
+    // Drop the flag if direct I/O is not supported so we can still verify the
+    // option round-trips through SetDBOptions / GetDBOptions.
+    buffered.use_direct_reads_for_compaction = false;
+    Reopen(buffered);
+  } else {
+    ASSERT_OK(s);
+    ASSERT_TRUE(dbfull()->GetDBOptions().use_direct_reads_for_compaction);
+  }
+  Close();
+
+  // mmap_reads + use_direct_reads_for_compaction is rejected at Open time, the
+  // same way mmap_reads + use_direct_reads has always been rejected.
+  Options bad_options = CurrentOptions();
+  bad_options.create_if_missing = true;
+  bad_options.allow_mmap_reads = true;
+  bad_options.use_direct_reads_for_compaction = true;
+  Status bad_status = TryReopen(bad_options);
+  ASSERT_TRUE(bad_status.IsNotSupported()) << bad_status.ToString();
+
+  // Direct test of OptimizeForCompactionTableRead: feeding only the new flag
+  // through ImmutableDBOptions should turn on use_direct_reads in the returned
+  // FileOptions while not touching use_direct_writes. OptimizeForBlobFileRead
+  // intentionally still tracks `use_direct_reads` only -- blob file reads in
+  // production go through BlobFileCache (not OptimizeForBlobFileRead), and
+  // BackupEngine's blob copy path should not be affected by a flag named "for
+  // compaction".
+  Options check_options;
+  check_options.use_direct_reads = false;
+  check_options.use_direct_reads_for_compaction = true;
+  check_options.use_direct_io_for_flush_and_compaction = false;
+  ImmutableDBOptions immutable(check_options);
+  FileOptions in_opts;
+  in_opts.use_direct_reads = false;
+  FileOptions sst_read =
+      env_->GetFileSystem()->OptimizeForCompactionTableRead(in_opts, immutable);
+  FileOptions blob_read =
+      env_->GetFileSystem()->OptimizeForBlobFileRead(in_opts, immutable);
+  ASSERT_TRUE(sst_read.use_direct_reads);
+  ASSERT_FALSE(blob_read.use_direct_reads);
+  ASSERT_FALSE(sst_read.use_direct_writes);
+
+  // When both flags are off, behavior stays exactly as before.
+  Options off_options;
+  off_options.use_direct_reads = false;
+  off_options.use_direct_reads_for_compaction = false;
+  off_options.use_direct_io_for_flush_and_compaction = false;
+  ImmutableDBOptions immutable_off(off_options);
+  FileOptions sst_read_off =
+      env_->GetFileSystem()->OptimizeForCompactionTableRead(in_opts,
+                                                            immutable_off);
+  ASSERT_FALSE(sst_read_off.use_direct_reads);
+
+  // When use_direct_reads is on, the new flag is irrelevant for the returned
+  // FileOptions but must not regress the existing behavior.
+  Options global_on_options;
+  global_on_options.use_direct_reads = true;
+  global_on_options.use_direct_reads_for_compaction = false;
+  global_on_options.use_direct_io_for_flush_and_compaction = false;
+  ImmutableDBOptions immutable_global(global_on_options);
+  FileOptions sst_read_global =
+      env_->GetFileSystem()->OptimizeForCompactionTableRead(in_opts,
+                                                            immutable_global);
+  ASSERT_TRUE(sst_read_global.use_direct_reads);
+}
+
 }  // namespace ROCKSDB_NAMESPACE
 
 int main(int argc, char** argv) {
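
The assertions in the test above pin down how the two FileSystem helpers are expected to treat the new flag, although the helpers themselves are not part of this excerpt. The sketch below is one rule consistent with those assertions. It is an inference and an assumption, not the actual implementation, and all `Toy...` names are hypothetical.

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-ins for the real structs.
struct ToyDbOptions {
  bool use_direct_reads = false;
  bool use_direct_reads_for_compaction = false;
};

struct ToyFileOptions {
  bool use_direct_reads = false;
  bool use_direct_writes = false;
};

// Assumed rule: compaction table reads go direct if either the global flag
// or the compaction-only flag is set; the write flag is left untouched.
inline ToyFileOptions ToyOptimizeForCompactionTableRead(
    ToyFileOptions in, const ToyDbOptions& db) {
  in.use_direct_reads =
      db.use_direct_reads || db.use_direct_reads_for_compaction;
  return in;
}

// Assumed rule: blob file reads keep tracking the global flag only.
inline ToyFileOptions ToyOptimizeForBlobFileRead(ToyFileOptions in,
                                                 const ToyDbOptions& db) {
  in.use_direct_reads = db.use_direct_reads;
  return in;
}
```

Under this rule the three scenarios exercised by the test (new flag only, both flags off, global flag only) produce the same `use_direct_reads` values the test asserts.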
