You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Add interpolation search as an alternative to binary search (facebook#14247)
Summary:
Interpolation search is an alternative algorithm to binary search, which performs better on uniformly distributed keys. Instead of binary search always computing the mid point of the left and right boundaries, interpolation search "interpolates" the mid point based on the distance to the target. Fortunately, we can re-use existing block format to support interpolation search.
For a given block, we compute the shared_prefix length of the first and last key. Interpolation search is usually done with numerical target values, so for a variable binary length key, we calculate the "value" as the first 8 non-shared bytes. This also means interpolation search would only really be effective for bytewise comparator (guarded via options validations).
#### Fallback to binary search
- if the the val(left_key) == val(right_key) then we fallback to classic binary search (to avoid divide by 0)
- interpolation search is significantly more computationally expensive than binary search, so when the search distance is small, we also fallback to binary search.
- if interpolation search does not make significant progress (i.e. reduces search space by more than half each iteration), we can assume data is non-uniform and fallback.
Interpolation search also performs best when there is minimal shortening, especially shortening of the last block, as it can heavily skew the distribution of the actual keys.
Note that each search algorithm is guaranteed to make progress because at each iteration the search space is guaranteed to be reduce by at least 1.
For now this change only applies to index block seeks, as data block seeks and other blocks do not have as many entries and would not require significant number of search rounds, but it could be easily extended to include that support.
Pull Request resolved: facebook#14247
Test Plan:
Updated unit tests and crash test with new search option
### Benchmark
The default benchmark sets up keys in generally uniform distribution, so it was a good way to test performance improvements.
Setup: `./db_bench -benchmarks=fillseq,compact -index_shortening_mode=1`
#### Before this change
```
./db_bench -use_existing_db=true -benchmarks=readrandom -seed=1
readrandom : 2.899 micros/op 344973 ops/sec 2.899 seconds 1000000 operations; 38.2 MB/s (1000000 of 1000000 found)
```
#### After this change
Notice how key comparison counts are the same between the two.
```
./db_bench -use_existing_db=true -benchmarks=readrandom -seed=1 -index_search_type=binary_search
readrandom : 2.881 micros/op 347128 ops/sec 2.881 seconds 1000000 operations; 38.4 MB/s (1000000 of 1000000 found)
```
```
./db_bench -use_existing_db=true -benchmarks=readrandom -seed=1 -index_search_type=interpolation_search
readrandom : 2.609 micros/op 383209 ops/sec 2.610 seconds 1000000 operations; 42.4 MB/s (1000000 of 1000000 found)
```
With a non-uniform distribution, `i.e. index_shortening_mode=2`
```
./db_bench -use_existing_db=true -benchmarks=readrandom -seed=1 -index_search_type=binary_search
readrandom : 2.958 micros/op 338075 ops/sec 2.958 seconds 1000000 operations; 37.4 MB/s (1000000 of 1000000 found)
```
```
./db_bench -use_existing_db=true -benchmarks=readrandom -seed=1 -index_search_type=interpolation_search
readrandom : 5.502 micros/op 181750 ops/sec 5.502 seconds 1000000 operations; 20.1 MB/s (1000000 of 1000000 found)
```
Reviewed By: pdillinger
Differential Revision: D91063163
Pulled By: joshkang97
fbshipit-source-id: 151d6aa76f8713740b714de6e406aff40d28ccbc
0 commit comments