Improve evmasm block deduplicator performance#16684
Conversation
2be13ff to
462f621
Compare
| /// Hash function compatible with `operator==`. Found via ADL by `boost::hash`. | ||
| friend std::size_t hash_value(AssemblyItem const& _item) | ||
| { | ||
| std::size_t hash = 0; | ||
| boost::hash_combine(hash, static_cast<int>(_item.m_type)); | ||
| if (_item.m_type == Operation) | ||
| boost::hash_combine(hash, static_cast<int>(_item.instruction())); | ||
| else if (_item.m_type == VerbatimBytecode) | ||
| boost::hash_combine(hash, *_item.m_verbatimBytecode); | ||
| else | ||
| boost::hash_combine(hash, _item.data()); | ||
| return hash; | ||
| } |
There was a problem hiding this comment.
This looks OK, I would just move it somewhere else.
Placing it between operator!= and operator< does not seem like the best choice.
There was a problem hiding this comment.
I placed it here so it's evident that the implementation follows the one of the equality operator. Where do you think it should live?
blishko
left a comment
There was a problem hiding this comment.
While the whole deduplication algorithm seems somewhat fragile to me, I believe the proposed changes preserve the existing behaviour and do not introduce any new issues.
rodiazet
left a comment
There was a problem hiding this comment.
LGTM. Some nits cleaning up includes.
|
|
||
| #include <cstddef> | ||
| #include <vector> | ||
| #include <functional> |
There was a problem hiding this comment.
good catch(es) but seems quite unrelated unless i am missing something? i'd rather file a follow-up.
There was a problem hiding this comment.
Absolutely. Mentioned this on second comment, but if we already change this file it can be cleaned. Just noticed this reviewing which headers you removed from cpp. Actually I doubt that we will have time to make cleanup PRs so I always try to clean if possible.
|
|
||
| #include <optional> | ||
| #include <iostream> | ||
| #include <sstream> |
There was a problem hiding this comment.
It’s unrelated to the PR but if we change the includes I would clean this one too.
|
I guess this is ready for a @cameel review? |
This is mainly a refactor of the
BlockDeduplicatorand no behavior change is intended.Right now the deduplicator uses a
std::setwith a lexicographical compare over each block's body. Looking up a block is thereforeO(block_size * log N)per probe and each comparison walks both blocks assembly item by assembly item (worst case the entire block).The refactor exchanges the
setwith anunordered_setprovided a specific hash function and a specific equality function. Each block's body is hashed once viahash_rangeand equality checks useranges::equalover the same range. Lookup is thenO(block_size)amortized.To achieve this,
AssemblyItemgets ahash_valuefriend so that boost hash can load a hash via ADLBlockIteratorgets default ctor and post-increment to satisfy the range concept (needed forhash_rangeandranges::equal).Some benchmarks on my machine: