Performance 11900115518: use atom key packed in version map cache#3116
Performance 11900115518: use atom key packed in version map cache#3116alexowens90 wants to merge 1 commit into
Conversation
| entry->validate(); | ||
|
|
||
| version_store::TombstoneVersionResult res{true, entry->head_->id()}; | ||
| version_store::TombstoneVersionResult res{true, entry->stream_id_}; |
There was a problem hiding this comment.
This was a pre-existing bug that dereferenced std::nullopt when trying to delete a symbol that had never existed (i.e. had no vref key), it started segfaulting with the other changes.
ArcticDB Code Review SummaryCorrectness
Performance
Testing
Latest delta (7baa1f3..1873c55, after rebase onto current master)
PR Title and Description
|
bc83aa8 to
e07974f
Compare
|
Excellent point Claude |
e07974f to
56cc7a5
Compare
| } | ||
|
|
||
| friend bool operator>(const AtomKeyImpl& l, const AtomKeyImpl& r) { return !(l < r) && (l != r); } | ||
| friend bool operator>(const AtomKeyImpl& l, const AtomKeyImpl& r) { return r < l; } |
There was a problem hiding this comment.
Noticed that this now breaks std::weak_ordering which requires exactly one of a < b, a == b and a > b to be true. And we now have only std::partial_ordering
E.g. if two atom keys are the same but have a different type all of the above would be false.
To be honest the previous version was also broken because a > b && b > a could have been true.
In any case I think if not worried about performance we should include all fields in the std::tie comparison. I think it would make reasoning about this class easier because it would give us std::strong_ordering
There was a problem hiding this comment.
Yeh, comparing keys of different type is pretty meaningless, but so is having index_start_ and index_end_ before creation_ts_ in the std::tie, so I'm fine with making things simple.
| #include <arcticdb/version/version_utils.hpp> | ||
| #include <arcticdb/util/lock_table.hpp> | ||
|
|
||
| #include "storage/storage_utils.hpp" |
| if (head_) | ||
| id_to_version_id[head_->id()].push_back(head_->version_id()); | ||
| for (const auto& k : keys_) | ||
| id_to_version_id[k.id()].push_back(k.version_id()); |
There was a problem hiding this comment.
Here we used to check that all keys have the same stream_id.
We should move these assertions to methods like unsift_key or try_set_tombstone_all which get AtomKey arguments.
6567da1 to
de3849c
Compare
| object_version_store.write(sym, df) | ||
| vitem = object_version_store.read(sym) | ||
| assert_equal(vitem.data, df) | ||
| object_version_store.version_store.clear() |
There was a problem hiding this comment.
Azurite silently maps \ characters to / in this test. It was passing previously as we did not validate that keys on version/version ref segments had the same stream id as the version map entry. In practice, because / is before \ in the character list, this meant that the write call for the \ symbol actually loaded the version map entry for the / symbol, but so did the read, so everything appeared to work correctly.
The change is safe for real users because one of the following is true:
- This is a bug in Azurite, real Azure blob storage fully supports
\characters in blob paths (the docs don't mention a limitation like this, although a 7 year old stack overflow answer does), and so this is only a problem in the testing environment. - Real Azure behaves the same way as Azurite. In this case, if a user has two symbols
a/bandb\a, then it is possible they were previously accidentally reading the wrong one, and so raising an exception is better behaviour. If this is the case, we should disallow\characters in symbol names on Azure.
| auto key = atom_key_builder().build(symbol, KeyType::TABLE_INDEX); | ||
| version_map_->write_version(store_, key, previous_key); | ||
| previous_key = key; | ||
| version_map_->write_version(store_, key, std::nullopt); |
There was a problem hiding this comment.
This was wrong previously, it was setting the previous key to one with a different stream id to key.
Before this change stream ids were not checked to be identical in the version map entry, so it worked anyway.
With this change, it only failed on the sanitizer runs because they are slower than regular test runs, so the version map cache expired before the call to get_symbol_set below, which then raised in read_segment_with_keys.
7ec9cb7 to
d5724da
Compare
7baa1f3 to
1873c55
Compare
Reference Issues/PRs
11900115518
What does this implement or fix?
Uses
AtomKeyPackedinstead ofAtomKeyin the two collections insideVersionMapEntryin order to reduce the memory footprint of the version map cache. Also usesankerl::unordered_dense::mapoverstd::unordered_mapfor the same reason. FullAtomKeys are only materialised when necessary.