Commit 9f47518

joshkang97 authored and meta-codesync[bot] committed
Add interpolation search as an alternative to binary search (facebook#14247)
Summary:
Interpolation search is an alternative to binary search that performs better on uniformly distributed keys. Where binary search always picks the midpoint between the left and right boundaries, interpolation search "interpolates" the probe position from the distance between the target and the boundaries.

Fortunately, we can reuse the existing block format to support interpolation search. For a given block, we compute the length of the prefix shared by the first and last keys. Interpolation search is usually done with numerical target values, so for a variable-length binary key we calculate the "value" as the first 8 non-shared bytes. This also means interpolation search is only really effective with the bytewise comparator (guarded via options validation).

#### Fallback to binary search
- If val(left_key) == val(right_key), we fall back to classic binary search (to avoid dividing by 0).
- Interpolation search is significantly more computationally expensive than binary search, so when the search distance is small, we also fall back to binary search.
- If interpolation search does not make significant progress (i.e. an iteration fails to reduce the search space by more than half), we assume the data is non-uniform and fall back.

Interpolation search also performs best when there is minimal shortening, especially shortening of the last block, as it can heavily skew the distribution of the actual keys.

Note that each search algorithm is guaranteed to make progress because at each iteration the search space is guaranteed to be reduced by at least 1.

For now this change only applies to index block seeks, as data blocks and other blocks do not have as many entries and would not require a significant number of search rounds, but it could easily be extended to support them.
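As a rough illustration of the approach described above, here is a minimal, self-contained C++ sketch. It is not the code from this change: it searches a sorted `std::vector<std::string>` rather than a real index block, and the names `SharedPrefixLen`, `KeyValue`, `InterpolationLowerBound`, and the threshold `kMinInterpolationSpan = 16` are assumptions made for the example. The fallback rules are one possible reading of the three bullets above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Length of the prefix shared by two keys.
size_t SharedPrefixLen(const std::string& a, const std::string& b) {
  const size_t n = std::min(a.size(), b.size());
  size_t i = 0;
  while (i < n && a[i] == b[i]) ++i;
  return i;
}

// Numeric "value" of a key: up to 8 bytes after the shared prefix, read
// big-endian and zero-padded. Used only to guess a probe position.
uint64_t KeyValue(const std::string& key, size_t shared) {
  uint64_t v = 0;
  for (size_t i = 0; i < 8; ++i) {
    v <<= 8;
    if (shared + i < key.size()) {
      v |= static_cast<unsigned char>(key[shared + i]);
    }
  }
  return v;
}

// Returns the index of the first key >= target (keys.size() if none).
// Keys must be sorted in bytewise order, which std::string's operator< gives.
size_t InterpolationLowerBound(const std::vector<std::string>& keys,
                               const std::string& target) {
  assert(std::is_sorted(keys.begin(), keys.end()));
  if (keys.empty() || !(keys.front() < target)) return 0;
  if (keys.back() < target) return keys.size();
  // Here front < target <= back, so target shares their common prefix.
  constexpr size_t kMinInterpolationSpan = 16;  // assumed threshold
  const size_t shared = SharedPrefixLen(keys.front(), keys.back());
  const uint64_t target_val = KeyValue(target, shared);
  size_t left = 0;
  size_t right = keys.size();  // the answer is always in [left, right]
  bool interpolate = true;
  while (left < right) {
    const size_t span = right - left;
    size_t mid = left + span / 2;  // binary search probe by default
    const bool try_interp = interpolate && span >= kMinInterpolationSpan;
    if (try_interp) {
      const uint64_t lo = KeyValue(keys[left], shared);
      const uint64_t hi = KeyValue(keys[right - 1], shared);
      if (lo == hi) {
        interpolate = false;  // avoid dividing by zero
      } else if (target_val <= lo) {
        mid = left;
      } else if (target_val >= hi) {
        mid = right - 1;
      } else {
        const long double frac = static_cast<long double>(target_val - lo) /
                                 static_cast<long double>(hi - lo);
        mid = left + static_cast<size_t>(frac * (span - 1));
      }
    }
    // mid is in [left, right - 1], so either branch shrinks the range by >= 1.
    if (keys[mid] < target) {
      left = mid + 1;
    } else {
      right = mid;
    }
    // Poor progress suggests non-uniform keys: stop interpolating.
    if (try_interp && right - left > span / 2) interpolate = false;
  }
  return left;
}
```

Every comparison that decides the result uses the full key, so the 8-byte value only influences where the next probe lands, never the answer. That is why the sketch stays correct on skewed data and merely degrades to binary search there.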
Pull Request resolved: facebook#14247

Test Plan:
Updated unit tests and crash test with new search option

### Benchmark
The default benchmark sets up keys in a generally uniform distribution, so it was a good way to test performance improvements.

Setup: `./db_bench -benchmarks=fillseq,compact -index_shortening_mode=1`

#### Before this change
```
./db_bench -use_existing_db=true -benchmarks=readrandom -seed=1
readrandom : 2.899 micros/op 344973 ops/sec 2.899 seconds 1000000 operations; 38.2 MB/s (1000000 of 1000000 found)
```

#### After this change
Notice how key comparison counts are the same between the two.
```
./db_bench -use_existing_db=true -benchmarks=readrandom -seed=1 -index_search_type=binary_search
readrandom : 2.881 micros/op 347128 ops/sec 2.881 seconds 1000000 operations; 38.4 MB/s (1000000 of 1000000 found)
```
```
./db_bench -use_existing_db=true -benchmarks=readrandom -seed=1 -index_search_type=interpolation_search
readrandom : 2.609 micros/op 383209 ops/sec 2.610 seconds 1000000 operations; 42.4 MB/s (1000000 of 1000000 found)
```

With a non-uniform distribution, i.e. `index_shortening_mode=2`:
```
./db_bench -use_existing_db=true -benchmarks=readrandom -seed=1 -index_search_type=binary_search
readrandom : 2.958 micros/op 338075 ops/sec 2.958 seconds 1000000 operations; 37.4 MB/s (1000000 of 1000000 found)
```
```
./db_bench -use_existing_db=true -benchmarks=readrandom -seed=1 -index_search_type=interpolation_search
readrandom : 5.502 micros/op 181750 ops/sec 5.502 seconds 1000000 operations; 20.1 MB/s (1000000 of 1000000 found)
```

Reviewed By: pdillinger

Differential Revision: D91063163

Pulled By: joshkang97

fbshipit-source-id: 151d6aa76f8713740b714de6e406aff40d28ccbc
1 parent 871f79d commit 9f47518

23 files changed

Lines changed: 651 additions & 93 deletions

db/c.cc

Lines changed: 6 additions & 0 deletions
@@ -3757,6 +3757,12 @@ void rocksdb_block_based_options_set_data_block_index_type(
       static_cast<BlockBasedTableOptions::DataBlockIndexType>(v);
 }
 
+void rocksdb_block_based_options_set_index_block_search_type(
+    rocksdb_block_based_table_options_t* options, int v) {
+  options->rep.index_block_search_type =
+      static_cast<BlockBasedTableOptions::BlockSearchType>(v);
+}
+
 void rocksdb_block_based_options_set_data_block_hash_ratio(
     rocksdb_block_based_table_options_t* options, double v) {
   options->rep.data_block_hash_table_util_ratio = v;

db_stress_tool/db_stress_common.h

Lines changed: 1 addition & 0 deletions
@@ -175,6 +175,7 @@ DECLARE_uint32(sqfc_version);
 DECLARE_bool(use_sqfc_for_range_queries);
 DECLARE_int32(index_type);
 DECLARE_int32(data_block_index_type);
+DECLARE_int32(index_block_search_type);
 DECLARE_string(db);
 DECLARE_string(secondaries_base);
 DECLARE_bool(test_secondary);

db_stress_tool/db_stress_gflags.cc

Lines changed: 6 additions & 0 deletions
@@ -596,6 +596,12 @@ DEFINE_int32(
         ROCKSDB_NAMESPACE::BlockBasedTableOptions().data_block_index_type),
     "Index type for data blocks (see `enum DataBlockIndexType` in table.h)");
 
+DEFINE_int32(index_block_search_type,
+             static_cast<int32_t>(ROCKSDB_NAMESPACE::BlockBasedTableOptions()
+                                      .index_block_search_type),
+             "Search algorithm for index blocks (see `enum BlockSearchType` in "
+             "table.h)");
+
 DEFINE_string(db, "", "Use the db with the following name.");
 
 DEFINE_string(secondaries_base, "",

db_stress_tool/db_stress_test_base.cc

Lines changed: 3 additions & 0 deletions
@@ -4325,6 +4325,9 @@ void InitializeOptionsFromFlags(
   block_based_options.data_block_index_type =
       static_cast<BlockBasedTableOptions::DataBlockIndexType>(
           FLAGS_data_block_index_type);
+  block_based_options.index_block_search_type =
+      static_cast<BlockBasedTableOptions::BlockSearchType>(
+          FLAGS_index_block_search_type);
   block_based_options.prepopulate_block_cache =
       static_cast<BlockBasedTableOptions::PrepopulateBlockCache>(
           FLAGS_prepopulate_block_cache);

include/rocksdb/c.h

Lines changed: 7 additions & 0 deletions
@@ -1209,6 +1209,13 @@ enum {
 extern ROCKSDB_LIBRARY_API void
 rocksdb_block_based_options_set_data_block_index_type(
     rocksdb_block_based_table_options_t*, int);  // uses one of the above enums
+enum {
+  rocksdb_block_based_table_index_block_search_type_binary = 0,
+  rocksdb_block_based_table_index_block_search_type_interpolation = 1,
+};
+extern ROCKSDB_LIBRARY_API void
+rocksdb_block_based_options_set_index_block_search_type(
+    rocksdb_block_based_table_options_t*, int);  // uses one of the above enums
 extern ROCKSDB_LIBRARY_API void
 rocksdb_block_based_options_set_data_block_hash_ratio(
     rocksdb_block_based_table_options_t* options, double v);

include/rocksdb/table.h

Lines changed: 15 additions & 0 deletions
@@ -263,6 +263,21 @@ struct BlockBasedTableOptions {
 
   IndexType index_type = kBinarySearch;
 
+  // The search algorithm used when seeking to entries in the index block.
+  enum BlockSearchType : char {
+    // Standard binary search
+    kBinary = 0x00,
+    // Interpolation search, which may be better suited for uniformly
+    // distributed keys. This will only be applicable if the comparator is the
+    // byte-wise comparator. Avoid using
+    // IndexShorteningMode::kShortenSeparatorsAndSuccessor as shortening the
+    // successor can skew the end key and make interpolation search significantly
+    // less performant.
+    kInterpolation = 0x01,
+  };
+
+  BlockSearchType index_block_search_type = kBinary;
+
   // The index type that will be used for the data block.
   enum DataBlockIndexType : char {
     kDataBlockBinarySearch = 0,  // traditional block type
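The C API and the stress-test flag in this commit both pass the search type as a plain integer and `static_cast` it to `BlockSearchType`. The sketch below shows one way a caller could check such an integer before casting. It uses a local mirror of the enum rather than the RocksDB header, and `BlockSearchTypeFromInt` and `BlockSearchTypeName` (including the strings it returns) are hypothetical helpers that are not part of this change. It assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Local mirror of the enum added in this diff; not the real RocksDB header.
enum BlockSearchType : char {
  kBinary = 0x00,
  kInterpolation = 0x01,
};

// Hypothetical helper: maps a raw integer to the enum and rejects values
// that have no enumerator.
std::optional<BlockSearchType> BlockSearchTypeFromInt(int v) {
  switch (v) {
    case 0x00:
      return kBinary;
    case 0x01:
      return kInterpolation;
    default:
      return std::nullopt;
  }
}

// Hypothetical helper: a printable name for each enumerator.
std::string BlockSearchTypeName(BlockSearchType t) {
  switch (t) {
    case kBinary:
      return "binary";
    case kInterpolation:
      return "interpolation";
  }
  assert(false && "unhandled BlockSearchType");
  return "unknown";
}
```

Returning `std::nullopt` leaves the policy for unknown values to the caller, who could reject the option or default to `kBinary` as the JNI portal in this commit does.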

java/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -182,6 +182,7 @@ set(JAVA_MAIN_CLASSES
   src/main/java/org/rocksdb/HyperClockCache.java
   src/main/java/org/rocksdb/ImportColumnFamilyOptions.java
   src/main/java/org/rocksdb/IndexShorteningMode.java
+  src/main/java/org/rocksdb/IndexSearchType.java
   src/main/java/org/rocksdb/IndexType.java
   src/main/java/org/rocksdb/InfoLogLevel.java
   src/main/java/org/rocksdb/IngestExternalFileOptions.java

java/rocksjni/portal.h

Lines changed: 41 additions & 1 deletion
@@ -7016,6 +7016,44 @@ class DataBlockIndexTypeJni {
   }
 };
 
+// The portal class for org.rocksdb.IndexSearchType
+class IndexSearchTypeJni {
+ public:
+  // Returns the equivalent org.rocksdb.IndexSearchType for the provided
+  // C++ ROCKSDB_NAMESPACE::BlockSearchType enum
+  static jbyte toJavaIndexSearchType(
+      const ROCKSDB_NAMESPACE::BlockBasedTableOptions::BlockSearchType&
+          index_block_search_type) {
+    switch (index_block_search_type) {
+      case ROCKSDB_NAMESPACE::BlockBasedTableOptions::BlockSearchType::kBinary:
+        return 0x0;
+      case ROCKSDB_NAMESPACE::BlockBasedTableOptions::BlockSearchType::
+          kInterpolation:
+        return 0x1;
+      default:
+        return 0x7F;  // undefined
+    }
+  }
+
+  // Returns the equivalent C++ ROCKSDB_NAMESPACE::BlockSearchType enum for
+  // the provided Java org.rocksdb.IndexSearchType
+  static ROCKSDB_NAMESPACE::BlockBasedTableOptions::BlockSearchType
+  toCppIndexSearchType(jbyte jindex_search_type) {
+    switch (jindex_search_type) {
+      case 0x0:
+        return ROCKSDB_NAMESPACE::BlockBasedTableOptions::BlockSearchType::
+            kBinary;
+      case 0x1:
+        return ROCKSDB_NAMESPACE::BlockBasedTableOptions::BlockSearchType::
+            kInterpolation;
+      default:
+        // undefined/default
+        return ROCKSDB_NAMESPACE::BlockBasedTableOptions::BlockSearchType::
+            kBinary;
+    }
+  }
+};
+
 // The portal class for org.rocksdb.ChecksumType
 class ChecksumTypeJni {
  public:
@@ -9200,7 +9238,7 @@ class BlockBasedTableOptionsJni
     }
 
     jmethodID method_id_init =
-        env->GetMethodID(jclazz, "<init>", "(ZZZZBBDBZJIIIJZZZZZIIZZJJBBJD)V");
+        env->GetMethodID(jclazz, "<init>", "(ZZZZBBDBZJIIIJZZZZZIIZZJJBBBJD)V");
     if (method_id_init == nullptr) {
       // exception thrown: NoSuchMethodException or OutOfMemoryError
       return nullptr;
@@ -9250,6 +9288,8 @@ class BlockBasedTableOptionsJni
             table_factory_options->super_block_alignment_space_overhead_ratio),
         IndexShorteningModeJni::toJavaIndexShorteningMode(
             table_factory_options->index_shortening),
+        IndexSearchTypeJni::toJavaIndexSearchType(
+            table_factory_options->index_block_search_type),
         FilterPolicyJni::toJavaIndexType(filter_policy_type),
         filter_policy_handle, filter_policy_config_value);
     if (env->ExceptionCheck()) {
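The constructor descriptor passed to `GetMethodID` has to match the Java constructor parameter for parameter, so adding the `byte indexSearchType` argument adds one `B` to the string. In JNI descriptors `Z` is boolean, `B` is byte, `I` is int, `J` is long, and `D` is double. A small stand-alone checker like the one below can confirm the parameter count of such a descriptor; `CountJniParams` is a hypothetical helper written for this illustration and is not part of the commit or of the JNI API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts the parameters in a JNI method descriptor such as "(ZBJ)V".
// Handles primitive type codes, object types ("L...;"), and array
// prefixes ("["). Stops early on a malformed descriptor.
size_t CountJniParams(const std::string& d) {
  assert(!d.empty() && d[0] == '(');
  size_t count = 0;
  size_t i = 1;
  while (i < d.size() && d[i] != ')') {
    while (i < d.size() && d[i] == '[') ++i;  // skip array dimensions
    if (i >= d.size()) break;                 // malformed: ends inside an array
    if (d[i] == 'L') {
      const size_t end = d.find(';', i);      // skip the class name
      if (end == std::string::npos) break;    // malformed: unterminated class
      i = end;
    }
    ++i;
    ++count;
  }
  return count;
}
```

Applied to the two strings in the hunk above, the old descriptor has 29 parameters and the new one has 30, which matches the single added `jbyte`.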

java/rocksjni/table.cc

Lines changed: 6 additions & 2 deletions
@@ -45,7 +45,7 @@ jlong Java_org_rocksdb_PlainTableConfig_newTableFactoryHandle(
 /*
  * Class: org_rocksdb_BlockBasedTableConfig
  * Method: newTableFactoryHandle
- * Signature: (ZZZZBBDBZJJJIIIJZZZJZZIIZZJJBJI)J
+ * Signature: (ZZZZBBDBZJJJIIIJZZZJZZIIZZJJBBJI)J
  */
 jlong Java_org_rocksdb_BlockBasedTableConfig_newTableFactoryHandle(
     JNIEnv*, jclass, jboolean jcache_index_and_filter_blocks,
@@ -65,7 +65,8 @@ jlong Java_org_rocksdb_BlockBasedTableConfig_newTableFactoryHandle(
     jboolean jenable_index_compression, jboolean jblock_align,
     jlong jsuper_block_alignment_size,
     jlong jsuper_block_alignment_space_overhead_ratio, jbyte jindex_shortening,
-    jlong jblock_cache_size, jint jblock_cache_num_shard_bits) {
+    jbyte jindex_search_type, jlong jblock_cache_size,
+    jint jblock_cache_num_shard_bits) {
   ROCKSDB_NAMESPACE::BlockBasedTableOptions options;
   options.cache_index_and_filter_blocks =
       static_cast<bool>(jcache_index_and_filter_blocks);
@@ -144,6 +145,9 @@ jlong Java_org_rocksdb_BlockBasedTableConfig_newTableFactoryHandle(
   options.index_shortening =
       ROCKSDB_NAMESPACE::IndexShorteningModeJni::toCppIndexShorteningMode(
           jindex_shortening);
+  options.index_block_search_type =
+      ROCKSDB_NAMESPACE::IndexSearchTypeJni::toCppIndexSearchType(
+          jindex_search_type);
 
   return GET_CPLUSPLUS_POINTER(
       ROCKSDB_NAMESPACE::NewBlockBasedTableFactory(options));

java/src/main/java/org/rocksdb/BlockBasedTableConfig.java

Lines changed: 27 additions & 3 deletions
@@ -43,6 +43,7 @@ public BlockBasedTableConfig() {
     superBlockAlignmentSize = 0;
     superBlockAlignmentSpaceOverheadRatio = 128;
     indexShortening = IndexShorteningMode.kShortenSeparators;
+    indexSearchType = IndexSearchType.kBinary;
 
     // NOTE: ONLY used if blockCache == null
     blockCacheSize = 8 * 1024 * 1024;
@@ -64,8 +65,8 @@ private BlockBasedTableConfig(final boolean cacheIndexAndFilterBlocks,
       final boolean verifyCompression, final int readAmpBytesPerBit, final int formatVersion,
       final boolean enableIndexCompression, final boolean blockAlign,
       final long superBlockAlignmentSize, final long superBlockAlignmentSpaceOverheadRatio,
-      final byte indexShortening, final byte filterPolicyType, final long filterPolicyHandle,
-      final double filterPolicyConfigValue) {
+      final byte indexShortening, final byte indexSearchType, final byte filterPolicyType,
+      final long filterPolicyHandle, final double filterPolicyConfigValue) {
     this.cacheIndexAndFilterBlocks = cacheIndexAndFilterBlocks;
     this.cacheIndexAndFilterBlocksWithHighPriority = cacheIndexAndFilterBlocksWithHighPriority;
     this.pinL0FilterAndIndexBlocksInCache = pinL0FilterAndIndexBlocksInCache;
@@ -92,6 +93,7 @@ private BlockBasedTableConfig(final boolean cacheIndexAndFilterBlocks,
     this.superBlockAlignmentSize = superBlockAlignmentSize;
     this.superBlockAlignmentSpaceOverheadRatio = superBlockAlignmentSpaceOverheadRatio;
     this.indexShortening = IndexShorteningMode.values()[indexShortening];
+    this.indexSearchType = IndexSearchType.values()[indexSearchType];
     try (Filter filterPolicy = FilterPolicyType.values()[filterPolicyType].createFilter(
              filterPolicyHandle, filterPolicyConfigValue)) {
       if (filterPolicy != null) {
@@ -871,6 +873,26 @@ public BlockBasedTableConfig setIndexShortening(final IndexShorteningMode indexS
     return this;
   }
 
+  /**
+   * Get the index search type.
+   *
+   * @return the currently set index search type
+   */
+  public IndexSearchType indexSearchType() {
+    return indexSearchType;
+  }
+
+  /**
+   * Sets the index search type to be used with this table.
+   *
+   * @param indexSearchType {@link org.rocksdb.IndexSearchType} value
+   * @return the reference to the current option.
+   */
+  public BlockBasedTableConfig setIndexSearchType(final IndexSearchType indexSearchType) {
+    this.indexSearchType = indexSearchType;
+    return this;
+  }
+
   /**
    * Get the size of the cache in bytes that will be used by RocksDB.
    *
@@ -996,7 +1018,7 @@ public BlockBasedTableConfig setHashIndexAllowCollision(
         useDeltaEncoding, filterPolicyHandle, wholeKeyFiltering, verifyCompression,
         readAmpBytesPerBit, formatVersion, enableIndexCompression, blockAlign,
         superBlockAlignmentSize, superBlockAlignmentSpaceOverheadRatio, indexShortening.getValue(),
-        blockCacheSize, blockCacheNumShardBits);
+        indexSearchType.getValue(), blockCacheSize, blockCacheNumShardBits);
   }
 
   private static native long newTableFactoryHandle(final boolean cacheIndexAndFilterBlocks,
@@ -1013,6 +1035,7 @@ private static native long newTableFactoryHandle(final boolean cacheIndexAndFilt
       final int readAmpBytesPerBit, final int formatVersion, final boolean enableIndexCompression,
       final boolean blockAlign, final long superBlockAlignmentSize,
       final long superBlockAlignmentSpaceOverheadRatio, final byte indexShortening,
+      final byte indexSearchType,
 
       @Deprecated final long blockCacheSize, @Deprecated final int blockCacheNumShardBits);
 
@@ -1046,6 +1069,7 @@ private static native long newTableFactoryHandle(final boolean cacheIndexAndFilt
   private long superBlockAlignmentSize;
   private long superBlockAlignmentSpaceOverheadRatio;
   private IndexShorteningMode indexShortening;
+  private IndexSearchType indexSearchType;
 
   // NOTE: ONLY used if blockCache == null
   @Deprecated private long blockCacheSize;
