Skip to content

perf: optimize BAL lookups and eliminate redundant GetCodeHash calls#11446

Open
kamilchodola wants to merge 6 commits into
masterfrom
bal-devnet-3-optimized
Open

perf: optimize BAL lookups and eliminate redundant GetCodeHash calls#11446
kamilchodola wants to merge 6 commits into
masterfrom
bal-devnet-3-optimized

Conversation

@kamilchodola
Copy link
Copy Markdown
Contributor

Summary

  • Replace SortedDictionary with Dictionary in BlockAccessList for O(1) account lookups on the EVM hot path (was O(log n) with 20-byte span comparisons). Sorted enumeration preserved for encoding/validation via AccountChangesSorted.
  • Merge TryGetDelegation + GetCachedCodeInfo into a single call in InstructionCall, eliminating a redundant GetCodeHash per CALL opcode.
  • Inline IsDeadAccount in EXTCODEHASH to avoid a second GetCodeHash call for the same address.

Benchmark results (compute/perf-devnet-3, test_gas_op filter, dotTrace profiling enabled)

Gas Level Baseline (ms) This PR (ms) Change
30M 111.6 94.5 -15.3%
60M 115.3 94.8 -17.8%
150M 146.6 113.6 -22.5%
180M 152.3 115.8 -24.0%
300M 226.5 166.9 -26.3%

Types of changes

  • Optimization

Testing

  • Existing tests pass (16/16 BlockAccessList tests green)
  • Benchmark profiled with dotTrace on perf-devnet-3

- Replace SortedDictionary with Dictionary in BlockAccessList for O(1)
  account lookups on the EVM hot path (was O(log n) with 20-byte span
  comparisons). Sorted enumeration preserved for encoding/validation.
- Merge TryGetDelegation + GetCachedCodeInfo into a single call in
  InstructionCall, eliminating a redundant GetCodeHash per CALL opcode.
- Inline IsDeadAccount in EXTCODEHASH to avoid a second GetCodeHash
  call for the same address.
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 30, 2026

Claude finished @kamilchodola's task in 6m 17s —— View job


Code Review

  • Gather context and read changed files
  • Read relevant rule files (coding-style, robustness, performance)
  • Review BlockAccessList.cs changes (Dictionary vs SortedDictionary)
  • Review BlockAccessListDecoder.cs and decoder tests
  • Review EvmInstructions.Call.cs (merged TryGetDelegation + GetCachedCodeInfo)
  • Review EvmInstructions.Environment.cs (inlined IsDeadAccount)
  • Review EngineModuleTests.V6.cs change
  • Synthesize findings and post review

Summary

The core optimizations are valid and the benchmark numbers are strong. Three targeted changes — Dictionary for O(1) BAL lookups, collapsing TryGetDelegation+GetCachedCodeInfo into one call, and inlining IsDeadAccount in EXTCODEHASH — are all mechanically correct. One medium and one low finding are flagged below.


Findings

Medium — AccountChangesSorted uses LINQ OrderBy, violating the "No LINQ" rule

src/Nethermind/Nethermind.Core/BlockAccessLists/BlockAccessList.cs line 33–35

The new AccountChangesSorted property calls OrderBy on every access. The repo's coding-style rules explicitly state "No LINQ when a simple for/foreach works" — and List<T>.Sort() is the direct alternative for sorting. Beyond style, there is a performance regression for the validation path: GetChangesAtIndex is called once per transaction index in BlockAccessListManager (line 291–292), and each call now triggers an O(n log n) sort, whereas the old SortedDictionary kept accounts pre-sorted with O(log n) per-insert cost. For a block with T transactions and A accounts, validation degrades from O(T·A) → O(T·A log A).

The EVM hot-path gains clearly outweigh this, but the allocation pattern should be fixed. Suggested alternative:

public EnumerableWithCount<AccountChanges> AccountChangesSorted
{
    get
    {
        List<AccountChanges> sorted = new(_accountChanges.Values);
        sorted.Sort(static (a, b) => GenericComparer.GetOptimized<Address>().Compare(a.Address, b.Address));
        return new EnumerableWithCount<AccountChanges>(sorted, sorted.Count);
    }
}

Fix this →


Low — GetContentLength inconsistently uses AccountChanges while Encode uses AccountChangesSorted

src/Nethermind/Nethermind.Serialization.Rlp/Eip7928/BlockAccessListDecoder.cs line 48

GetLengthGetContentLength uses item.AccountChanges (insertion order) while Encode uses item.AccountChangesSorted. This is functionally correct — RLP encoding length is the same regardless of ordering. But it reads misleadingly next to the sorted encode. Either document the intent or use the same property in both.


Correctness Verification

InstructionCall refactor ✅ — vm.CodeInfoRepository is assigned from TxExecutionContext.CodeInfoRepository (line 158 of VirtualMachine.cs), so replacing vm.TxExecutionContext.CodeInfoRepository.TryGetDelegation() + vm.CodeInfoRepository.GetCachedCodeInfo() with a single vm.CodeInfoRepository.GetCachedCodeInfo(…, out Address? delegated) call is equivalent and eliminates the double delegation-resolution work. Gas accounting order is preserved; moving the cache read earlier (before OOG checks) is safe because GetCachedCodeInfo has no observable state side-effects.

EXTCODEHASH inline ✅ — The inlined dead-account check is semantically equivalent to IsDeadAccount across all IWorldState implementations. BlockAccessListBasedWorldState.IsDeadAccount was already implemented as !AccountExists || (balance==0 && nonce==0 && codeHash==emptyString) — which is exactly what the new code does. StateProvider.IsDeadAccount resolves the same condition via Account.IsEmpty. The single GetCodeHash call replaces two (one inside the former IsDeadAccount for StateProvider implementations that pre-load the full account, and one explicit call after).

RLP round-trip ✅ — Decode validates sorted-address ordering and populates a Dictionary preserving insertion order. Encode then sorts via AccountChangesSorted before writing. The Can_decode_then_encode test correctly covers this round-trip.

GetChangesAtIndex ordering ✅ — Switched from AccountChanges to AccountChangesSorted restores the order guarantee that SortedDictionary previously gave for free.


Severity Count Items
Critical 0
High 0
Medium 1 LINQ OrderBy in AccountChangesSorted (style + validation perf)
Low 1 GetContentLength inconsistency


public EnumerableWithCount<AccountChanges> AccountChanges => new(_accountChanges.Values, _accountChanges.Values.Count);
public EnumerableWithCount<AccountChanges> AccountChangesSorted => new(
_accountChanges.Values.OrderBy(static a => a.Address, GenericComparer.GetOptimized<Address>()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium — LINQ rule violation + O(n log n) sort on every access

AccountChangesSorted calls OrderBy on every access. The coding-style rules say "No LINQ when a simple for/foreach works" — here List<T>.Sort() is the idiomatic alternative.

More importantly, GetChangesAtIndex (called once per transaction index during block validation in BlockAccessListManager) now triggers an O(n log n) sort per call, whereas the previous SortedDictionary kept items pre-sorted. For a block with T transactions and A accounts this degrades validation from O(T·A) to O(T·A log A).

Suggest replacing with a sort-on-demand pattern:

public EnumerableWithCount<AccountChanges> AccountChangesSorted
{
    get
    {
        List<AccountChanges> sorted = new(_accountChanges.Values);
        sorted.Sort(static (a, b) => GenericComparer.GetOptimized<Address>().Compare(a.Address, b.Address));
        return new EnumerableWithCount<AccountChanges>(sorted, sorted.Count);
    }
}

This avoids the LINQ overhead and hidden deferred-execution allocations while still producing the same result. The per-call O(n log n) cost is unchanged, but at least there's no additional LINQ machinery on top.

=> stream.EncodeArray([.. item.AccountChangesSorted], rlpBehaviors);

private static int GetContentLength(BlockAccessList item, RlpBehaviors rlpBehaviors)
=> AccountChangesDecoder.Instance.GetContentLength([.. item.AccountChanges], rlpBehaviors);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low — inconsistency: GetContentLength uses unsorted AccountChanges while Encode uses AccountChangesSorted

RLP encoding length is order-independent, so this is functionally correct. But it reads oddly side-by-side with the Encode method that explicitly sorts. For clarity, consider using AccountChanges (the simpler property) in both, or documenting why they differ.

Suggested change
=> AccountChangesDecoder.Instance.GetContentLength([.. item.AccountChanges], rlpBehaviors);
=> AccountChangesDecoder.Instance.GetContentLength([.. item.AccountChangesSorted], rlpBehaviors);

@kamilchodola kamilchodola marked this pull request as ready for review May 1, 2026 08:39
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 1, 2026

Claude finished @kamilchodola's task in 3m 48s —— View job


Code Review

  • Gather context (previous review found Medium LINQ + Low inconsistency; commit d5cabdf claims both are fixed)
  • Read rule files (coding-style, robustness, performance)
  • Verify LINQ fix in BlockAccessList.cs
  • Verify GetContentLength consistency fix in BlockAccessListDecoder.cs
  • Re-review all changed files for any new or remaining issues
  • Synthesize and post final review

Summary

All findings from the first review have been correctly resolved. The code is correct, performant, and clean. No new issues found.


Resolved Findings (verified)

Medium (now fixed) — LINQ OrderBy in AccountChangesSorted

BlockAccessList.cs lines 32–40 now uses List<AccountChanges>.Sort() with a static lambda:

List<AccountChanges> sorted = new(_accountChanges.Values);
sorted.Sort(static (a, b) => a.Address.CompareTo(b.Address));

Address.CompareTo is Bytes.AsSpan().SequenceCompareTo(other?.Bytes) — consistent with the ordering check in the decoder (address.CompareTo(lastAddress) <= 0). ✅

Low (now fixed) — GetContentLength / Encode inconsistency

Both GetContentLength and Encode in BlockAccessListDecoder.cs now use item.AccountChangesSorted. ✅


Correctness Re-verification

EXTCODEHASH inline — The optimization saves one GetCodeHash call on the hot path for live accounts. BlockAccessListBasedWorldState.IsDeadAccount was calling AccountExists + GetBalance + GetNonce + GetCodeHash internally, and then the opcode was calling GetCodeHash again — two total. New code: AccountExists + GetCodeHash + GetBalance + GetNonce — one. Semantically equivalent: non-existent accounts still return 0 immediately (before any code-hash lookup), dead accounts are correctly identified by the same three-field check, and live accounts push the hash. ✅

InstructionCall delegation merge — Single GetCachedCodeInfo(codeSource, spec, out Address? delegated) replaces TryGetDelegation + GetCachedCodeInfo. Gas accounting order is unchanged; moving the cache read before the OOG checks is safe since GetCachedCodeInfo has no observable side-effects. ✅

Dictionary orderingGetChangesAtIndex iterates AccountChangesSorted, so the order guarantee previously provided by SortedDictionary is preserved on the validation path. Encode also uses AccountChangesSorted, so RLP output is deterministic and sorted. ✅

Test change (EngineModuleTests.V6.cs)SortedDictionaryDictionary for modifiedAccounts in the test. Correct: new BlockAccessList(modifiedAccounts) accepts a Dictionary, and the encoded output is sorted via AccountChangesSorted regardless of insertion order. ✅


Findings

Severity Count Items
Critical 0
High 0
Medium 0 All resolved
Low 0 All resolved

Ready to merge. The benchmark numbers (up to −26% at 300 M gas) are strong, both previous findings are properly fixed, and no new issues are present.

Copy link
Copy Markdown
Member

@LukaszRozmej LukaszRozmej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor optimizations still possible:

Comment on lines +36 to +38
List<AccountChanges> sorted = new(_accountChanges.Values);
sorted.Sort(static (a, b) => a.Address.CompareTo(b.Address));
return new EnumerableWithCount<AccountChanges>(sorted, sorted.Count);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be cached (and dropped on any _accountChanges changes)?

}
// Otherwise, push the account's code hash.
ValueHash256 hash = state.GetCodeHash(address);
if (state.GetBalance(address) == 0 && state.GetNonce(address) == 0 && hash == Keccak.OfAnEmptyString)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (state.GetBalance(address) == 0 && state.GetNonce(address) == 0 && hash == Keccak.OfAnEmptyString)
if (state.GetBalance(address).IsZero && state.GetNonce(address).IsZero && hash == Keccak.OfAnEmptyString)

Comment on lines +620 to +621
ValueHash256 hash = state.GetCodeHash(address);
if (state.GetBalance(address) == 0 && state.GetNonce(address) == 0 && hash == Keccak.OfAnEmptyString)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe IsDeadAccount or IsNonZeroAccount would be better?
Or TryGetAccount and then IsEmpty or IsNull?


// todo: optimize to use hashmaps where appropriate, separate data structures for tracing and state reading
private readonly SortedDictionary<Address, AccountChanges> _accountChanges = new(GenericComparer.GetOptimized<Address>());
private readonly Dictionary<Address, AccountChanges> _accountChanges = new();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we pre-initialize with some sensible count? Like 4*transaction count in block?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

EXPB Benchmark Comparison

Run: View workflow run

superblocks

Scenario: nethermind-flat-superblocks-bal-devnet-3-optimized-delay0s

Client Processing (SSE)

Metric PR Master (cached) Delta
AVG (ms) 1290.12 1232.72 +4.66%
MEDIAN (ms) 1155.8 1150.9 +0.43%
P90 (ms) 1734.8 1641.0 +5.72%
P95 (ms) 2249.8 2090.3 +7.63%
P99 (ms) 4357.4 3406.2 +27.93%
MIN (ms) 812.0 813.6 -0.20%
MAX (ms) 4357.4 3406.2 +27.93%
K6 TTFB
Metric PR Master (cached) Delta
AVG (ms) 1505.02 1455.31 +3.42%
MEDIAN (ms) 1349.73 1308.74 +3.13%
P90 (ms) 1902.60 1904.91 -0.12%
P95 (ms) 2670.63 2372.29 +12.58%
P99 (ms) 3759.87 3397.21 +10.68%
MIN (ms) 971.76 945.24 +2.81%
MAX (ms) 4507.62 3565.86 +26.41%

realblocks

Scenario: nethermind-flat-realblocks-bal-devnet-3-optimized-delay0s

Client Processing (SSE)

Metric PR Master (cached) Delta
AVG (ms) 35.07 35.46 -1.10%
MEDIAN (ms) 29.0 29.0 +0.00%
P90 (ms) 65.6 64.0 +2.50%
P95 (ms) 77.2 76.4 +1.05%
P99 (ms) 130.1 122.2 +6.46%
MIN (ms) 0.5 0.5 +0.00%
MAX (ms) 249.6 236.2 +5.67%
K6 TTFB
Metric PR Master (cached) Delta
AVG (ms) 41.04 41.20 -0.39%
MEDIAN (ms) 34.75 34.90 -0.43%
P90 (ms) 72.43 71.72 +0.99%
P95 (ms) 83.09 83.56 -0.56%
P99 (ms) 131.72 131.11 +0.47%
MIN (ms) 1.21 1.42 -14.79%
MAX (ms) 254.55 241.57 +5.37%

…lining

- Add [JsonIgnore] to AccountChangesSorted property to prevent it from
  being serialized in JSON-RPC responses (fixes Eth_get_block_access_list tests)
- Revert EXTCODEHASH to use IsDeadAccount (single state lookup vs 3 separate
  calls, logically equivalent but more efficient)
Revert the optimization that merged TryGetDelegation + GetCachedCodeInfo
into a single call. Moving GetCachedCodeInfo before the new-account gas
check causes EIP-7702 delegation + OOG Pyspec tests to fail (callcode
and staticcall variants). The exact mechanism needs further investigation
but the original two-call sequence is required for correct gas accounting.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants