Optimizations for stack checks when no over/under flow is present#2709
Optimizations for stack checks when no over/under flow is present#2709ehildenb wants to merge 6 commits into
Conversation
afd1734 to
c0852fc
Compare
|
I've tested this change on the KEVM testsuite, and th eresults are: Old KEVM performance (~ October 1, 2024): Recent commit performance (#2708): Change to how stack size is calculated (#2676): This PR: So the concrete execution time went up a bit, but symbolic execution came back down. I think it's because in #2676, an optimization is made to try and avoid computing workstack sizes using lazy evaluation (something I believe the LLVM backend does, because it's safe for concrete execution, but the Haskell backend doesn't): https://github.com/runtimeverification/evm-semantics/pull/2676/files#diff-f2f12e7ffac586c70477ba2c731d8d36e08ee0b8c294a5ef86d31337212b0264L384. So this PR instead introduces it as pattern-matching directly on the rules for checking stack-over/under flow which should be fast on both backends. It does seem to make the LLVM backend run a bit longer, I will measure again and see if that's an artifact or a real slowdown. Ping @Stevengre |
c0852fc to
51deb06
Compare
|
A second run of the tests on this PR demonstrates that performance is just flaky it appears: |
|
I've checked through the commit history, and it seems that the performance regression I measured was flakiness on my machine. Re-measuring shows that we are at similar performanec as we were in October of last year. |
|
I re-measured, and it seems this does not appreciably change any performance metrics: 4c6d363 (this commit): db7d25e (current master): Looking at specific concrete execution specs, it appears that there are some tests that benefit, mostly ones that hammer the wordstack size, but most do not. |
This is pulled out of: #2676
Specifically, we are trying to optimize for the cases where the stack checks on under/over-flow do not fail, by directly going to
falseon those cases when possible rather than computing the size of the workstack (using pattern-matching and faster checks). In the cases where it's potentially failing that check, we fall back to the slow cases which use the size of the wordstack to compute whether there is a stack under/over-flow.The number of functional tests run in parallel is also increased, as that is causing timeouts (and has on other PRs too).