This is how the search index for scdb works under the hood. Keep in mind that scdb is basically an inverted index implemented as a hashmap, backed by a list of doubly-linked cyclic lists of offsets.
In the most basic of terms, look at the inverted index as a mapping where the keys are the character sets, and the values are the main db offsets of the keys that have those character sets.
Note: In actual sense, the values themselves have some important meta information like expiry, is_deleted etc.
For example:
Consider the following keys: "foo", "fore", "bar", "band", "pig".
Assume that the offsets of these keys in the scdb file are:
"foo": 1, "fore": 2, "bar": 3, "band": 4, "pig": 5
The inverted index would be:
package example
_ = map[string][]int{
"f": [1, 2],
"b": [3, 4],
"p": [5],
"fo": [1, 2],
"ba": [3, 4],
"pi": [5],
"foo": [1],
"for": [2],
"bar": [3],
"ban": [4],
"pig": [5],
}There are five main operations
- This creates the search index file if it does not exist
- adds the 100-byte header basing on user configuration and default values
- adds the placeholders for the index blocks, each item pre-initialized with a zero.
- And loads the derived and non-derived properties like 'max_keys', 'max_index_key_len', 'block_size', '
redundant_blocks',
'number_of_index_blocks' (
(max_keys / number_of_items_per_index_block).ceil() + redundant_blocks), 'number_of_items_per_index_block' ((block_size / 8).floor()), 'values_start_point' (100 + (net_block_size * number_of_index_blocks)), 'net_block_size' (number_of_items_per_index_block * 8)
- Get the prefix of the key passed, upto
ncharacters. At the startn= 1. - The prefix supplied is run through a hash function with modulo
number_of_items_per_index_blockand answer multiplied by 8 to get the byte offset. Let the hashed value behash - Set
index_block_offsetto zero to start from the first block. - The
index_addressis set toindex_block_offset + 100 + hash. - The 8-byte offset at the
index_addressoffset is read. This is the first possible pointer to the value entry. Let's call itroot_value_offset. - If this
root_value_offsetis zero, this means that no value has been set for that inverted index key i.e. prefix yet.- set the value's
is_rootto true - set the value's
next_offsettolast_offset - set the value's
previous_offsettolast_offset(i.e. previous offset = next offset = offset of current value) - so the value entry (with all its data including
index_key_size,expiry(got from ttl from user),kv_address,deleted) is appended to the end of the file at offsetlast_offset - the
last_offsetis then inserted atindex_addressin place of the zero - the
last_offsetheader is then updated tolast_offset + get_size_of_v_entry(v)[get_size_of_v gets the total size of the entry in bytes]
- set the value's
- Else if this
root_value_offsetis non-zero, the value for that prefix has already been set.- set the
value_offsetof the current value toroot_value_offset - read the value at the
value_offset. This is the current value. - retrieve the
index_keyof this current value. - if the
index_keyis the same as the prefix passed:- i. If the
db_keyof the current value entry is equal to the key that is to be added:- set the new data from the db into the current value entry i.e. new
expiryand the newdb_offset. - increment
nby 1- if
nis greater thanmax_index_key_len, stop the iteration and exit. - else go back to step 1 ii. Else if the
db_keyis not equal to the key being added
- if
- if the
next_offsetis equal to theroot_value_offset, append the new value to the end of this list i.e.- Append the value entry (with all its data including
index_key_size,expiry(got from ttl from user)deleted,kv_address) is to the end of the file at offsetlast_offset - Set the
next_offsetof the current value entry (not the newly appended one) tolast_offset - Set the
previous_offsetof the newly appended entry to thevalue_offset - Set the
next_offsetof the newly appended entry to theroot_value_offset. - the
last_offsetheader is then updated tolast_offset + get_size_of_v(v)
- Append the value entry (with all its data including
- else
- set
value_offsettonext_offset. - read the value at
value_offsetand this becomes the current value. - Go back to step (i) - i.e. check
db_keyagainst the key passed.
- set
- set the new data from the db into the current value entry i.e. new
- i. If the
- else increment the
index_block_offsetbynet_block_size- if the new
index_block_offsetis equal to or greater than thevalues_start_point, raise theCollisionSaturatedErrorerror. We have run out of blocks without getting a free slot to add the value entry. - else go back to step 4.
- if the new
- set the
Note: this uses a form of separate chaining to handle hash collisions. Having multiple index blocks is a form of separate chaining
- Time complexity: This operation is O(km+n) where:
- k =
number_of_index_blocks - m =
max_index_key_len - n = length of the longest linked list of values accessed.
- k =
- Auxiliary Space: This operation is O(km+n) where:
- n = length of the longest
db_keyin the linked list of values accessed. - m =
max_index_key_len. - k =
number_of_index_blocks. If the hash for the given prefix already has offsets associated with it, each will be allocated in memory thuskm.
- n = length of the longest
- There is a possibility that when one sets a given
index_keyor prefix, we may find all index blocks for the given hash already filled. We thus have to throw aCollisionSaturatedErrorand abort inserting the key. This means that the occurrence of such errors will increase in frequency as the number of keys comes closer to themax_keysvalue. One possible remedy to this is to add a more redundant index block(s) i.e. increaseredundant_blocks. Keep in mind that this consumes extra disk and memory space.
- Get the prefix of the key passed, upto
ncharacters. At the startn= 1. - The prefix is run through a hash function with modulo
number_of_items_per_index_blockand answer multiplied by 8 to get the byte offset. Let the hashed value behash. - Set
index_block_offsetto zero to start from the first block. - The
index_addressis set toindex_block_offset + 100 + hash. - The 8-byte offset at the
index_addressoffset is read. This is the first possible pointer to the key-value entry. Let's call itroot_value_offset. - Set the
value_offsetof the current value toroot_value_offset - If this
root_value_offsetis non-zero, it is possible that the value for that key exists.-
read the value at the
value_offset. This is thecurrent_value. -
retrieve the
index_keyof thiscurrent_value. -
if the
index_keyis the same as the prefix:- i. retrieve the
db_keyof thiscurrent_value. - ii. if the
db_keyis the same as the key passed:- update the
is_deletedof thecurrent_valueto true - if
previous_offsetequalsvalue_offset:- set the
previous_valuetocurrent_value
- set the
- else:
- read the value at
previous_offsetof thecurrent_value. Set thatprevious_valueto that value.
- read the value at
- if
next_offsetequalsvalue_offset:- set
next_valuetocurrent_value
- set
- else if
next_offsetequalsprevious_offset- set
next_valuetoprevious_value
- set
- else:
- read the value at
next_offsetof thecurrent_value. Set thatnext_valueto that value
- update the
- update the
next_offsetof theprevious_valuetonext_offsetof thecurrent_value.- update the
previous_offsetof thenext_valuetoprevious_offsetof thecurrent_value. - if
is_rootis true forcurrent_value:- set the
is_rootof thenext_valueto true. - set the value at
index_addresstonext_offset
- set the
- if
value_offsetequalsroot_value_offset- set the value at
index_addressto 0 i.e. reset it
- set the value at
- increment
nby 1 - if
nis greater thanmax_index_key_len, exit - else go back to step 1
- update the
-iii. else if
db_keyis not equal to the key passed:- if
next_offsetequalsroot_value_offset: - increment
nby 1: - if
nis greater thanmax_index_key_len, exit - else go back to step 1 - else: - set the
value_offsettonext_offset - read the value at the
value_offset. This is thecurrent_value. - go back to step (i)
- i. retrieve the
-
else increment the
index_block_offsetbynet_block_size- if the new
index_block_offsetis equal to or greater than thevalues_start_point, exit. - else go back to step 4.
- if the new
-
- Time complexity: This operation is O(km+n) where:
- k =
number_of_index_blocks - m =
max_index_key_len - n = length of the longest linked list of values accessed.
- k =
- Auxiliary space: This operation is O(km+n) where:
- k =
number_of_index_blocks - m =
max_index_key_len - n = length of the longest
db_keyin the linked list of values accessed.
- k =
- Get the lower of the two values: length of the
search_termand themax_index_key_len. Let it ben. - Get the prefix i.e. the first
ncharacters/runes of thesearch_term - The prefix is run through a hash function with modulo
number_of_items_per_index_blockand answer multiplied by 8 to get the byte offset. Let the hashed value behash. - Set
index_block_offsetto zero to start from the first block. - The
index_addressis set toindex_block_offset + 100 + hash. - The 8-byte offset at the
index_addressoffset is read. This is the first possible pointer to the key-value entry. Let's call itroot_value_offset. - If this
root_value_offsetis non-zero, it is possible that the value for that prefix exists.- retrieve the value at the
root_value_offset. Let it becurrent_value. - retrieve the
index_keyof thecurrent_value - if
index_keyequals prefix:- i. let
value_offsetequal toroot_value_offset - ii. retrieve the
db_keyof thecurrent_value - iii. if
db_keycontains thesearch_term:- add its
kv_addressto the list ofmatched_addresses
- add its
- iv. set the
value_offsettonext_offsetof thecurrent_value - v. if the
current_offsetequalsroot_value_offset- read the key, values of the
matched_addressesfrom the main database file and return them - exit
- read the key, values of the
- vi. else go back to step (ii).
- i. let
- else increment the
index_block_offsetbynet_block_size- if the new
index_block_offsetis equal to or greater than thekey_values_start_point, stop and return an empty list. No matches found. - else go back to step 5.
- if the new
- retrieve the value at the
- Else return an empty list. No matches found.
- Time complexity: This operation is O(kn) where:
- k =
number_of_index_blocks. - n = length of the longest linked list of values accessed.
- k =
- Auxiliary space: This operation is O(kn) where:
- k =
number_of_index_blocks - n = length of the longest
db_keyin the linked list of values accessed.
- k =
Clear the entire database.
- Initialize a new header basing on the old settings
- Write the new header into the file
- Shrink the file to the size of the header
- Expand the file to the expected size of file if it had headers and index blocks. This fills any gaps with 0s.
- Time complexity: This operation is O(1)
- Auxiliary space: This operation is O(1).
- The inverted index works separately from the BufferPool only until the point of compaction.
- When the DB file is being compacted, addresses are being moved around and so that requires the index to be rebuilt.
- This makes compaction an even more expensive operation than it was before.
skipandlimitparameters are provided where:skipis the number of matching records to skip before starting to return. Default is zero.limitis the maximum number of records to return at any one time. Default is zero. When limit is zero, all matched records are returned since it would make no sense for someone to search for zero items.