Skip to content

Bal devnet 6 benchmarks#11511

Closed
Marchhill wants to merge 162 commits into
masterfrom
bal-devnet-6-benchmarks
Closed

Bal devnet 6 benchmarks#11511
Marchhill wants to merge 162 commits into
masterfrom
bal-devnet-6-benchmarks

Conversation

@Marchhill
Copy link
Copy Markdown
Contributor

@Marchhill Marchhill commented May 6, 2026

Changes

  • Split BlockAccessList/AccountChanges/SlotChanges into role-specific ReadOnly* (wire), Generated* (locally produced), and *AtIndex (per-tx) types so the hot paths only carry the fields they need.
  • Parallelise transaction execution against the suggested BAL via a per-account prestate gate; the loader and pre-execution iteration also run as parallel-loop slots so workers no longer wait on a block-wide barrier.
  • Defer per-account sorted-array rebuilds to a single Array.Sort on first read, replacing the O(n²) per-prestate-load InsertSorted.
  • Cut allocations on the validation and generation hot paths (zero-alloc dict-lookup BAL validation, pooled WorldStates / TxProcessors / BALs, tighter generated-BAL structures).
  • Fix a BAL delegation-OOG bug: defer GetCachedCodeInfo past the CALL cold-access OOG check so a reverted EIP-7702 call doesn't leave the delegation target in the BAL.

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

flcl42 and others added 30 commits April 30, 2026 13:49
)

* fix(bal): align EIP-7928 BlockAccessIndex with uint32 spec, add validation

Widens BlockAccessIndex from uint16 to uint32 per EIP-7928 (commit 645099785a)
and geth bal-devnet-4. Hardens the BAL decoder so truncated/malformed wire
bytes produce a clean RlpException at engine_newPayloadV5 instead of crashing.
Adds the missing validation rules geth bal-devnet-4 enforces: empty
storage_changes per slot is rejected, and BlockAccessIndex is bounded by
[0, txCount + 1].

Decoder primitives:
- New Rlp.ValueDecoderContext.DecodeUInt() / RlpStream.Encode(uint) /
  Rlp.LengthOf(uint) helpers.
- Rlp.Decode<T> and Rlp.DecodeArrayPool<T> wrap IndexOutOfRangeException /
  ArgumentOutOfRangeException as RlpException so any truncated-RLP primitive
  read surfaces consistently.
- BlockAccessListDecoder rejects an empty AccountChanges entry (0xc0 inside
  the outer list) and SlotChangesDecoder rejects an empty StorageChange list
  for a slot — geth bal-devnet-4 "empty storage writes" parity.

Type widening:
- IIndexedChange.Index, BalanceChange/NonceChange/CodeChange/StorageChange.Index,
  BlockAccessList.Index and BlockAccessIndex on the Change journal record all
  become uint. SortedList<int, T> keys flip to SortedList<uint, T>; ushort
  query parameters on AccountChanges become uint.

Prestate sentinel preservation:
- Replaces the legacy -1 int sentinel with Eip7928Constants.PrestateIndex
  (uint.MaxValue). Adds PrestateAwareIndexComparer that orders the sentinel
  before all real indices, restoring the iteration semantics that
  AccountChanges.GetBalance/GetNonce/GetCode/AccountExists and
  ApplyStateChanges' [^1].Index check rely on.
- Decoder-built SortedLists also use the prestate-aware comparer so that
  LoadPreStateToSuggestedBlockAccessList grafting prestate onto the suggested
  BAL after decode keeps it sorted first.
- Loop predicates that compare change.Key directly against blockAccessIndex
  explicitly skip PrestateIndex so the raw uint comparison doesn't trigger
  on the huge sentinel value.

Validation:
- BlockValidator gains ValidateBlockLevelAccessListIndexBounds enforcing
  index <= txCount + 1 (mirrors geth's index < txCount + 2 check) with a new
  BlockLevelAccessListIndexOutOfRange error message.

Tests:
- New PrestateAwareIndexComparerTests, AccountChangesPrestateTests covering
  the comparer and prestate-fallback iteration semantics.
- BlockAccessListDecoderTests adds: empty-bytes / truncated-outer-list /
  inner-empty-list throw RlpException; empty storage_changes per slot throw;
  decoded SlotChanges accepts a later prestate graft as first; BalanceChange
  round-trips for indices 0x10_0000 and uint.MaxValue.
- BlockValidatorTests adds tx-index bound cases (0, 1 valid; 2,
  uint.MaxValue rejected for a 0-tx block).
- ExecutionPayloadV4Tests covers the engine-API decoding-error path for
  malformed BAL bytes.

* style: drop unused usings flagged by IDE0005

CI surfaced four IDE0005 warnings (treated as errors):
- BlockAccessListManager: drop `using Nethermind.Crypto;` —
  Keccak.OfAnEmptySequenceRlp comes from Nethermind.Core.Crypto, already imported.
- PrestateAwareIndexComparerTests / AccountChangesPrestateTests: drop
  `using Nethermind.Core;` — Eip7928Constants resolves via the test's parent
  namespace Nethermind.Core.Test.BlockAccessLists.
- Eip8037Tests: drop `using System;` — no System.* type referenced directly.

* fix(bal): record system pre/post-block SSTOREs as reads per EIP-7928 v5.7.0

EIP-7928 v5.7.0 specifies that SSTOREs performed during system pre-block calls
(EIP-2935 BlockHashHistory, EIP-4788 BeaconRootContract) and post-block calls
(EIP-7002 withdrawal requests, EIP-7251 consolidation requests) are recorded
in the BAL as storage reads on the touched slot — not as storage changes with
post-values. Same-value writes are skipped entirely. Without this, nethermind
generated a BAL whose Keccak256 differs from what eels-built fixtures expect,
so the BlockAccessListHash check fails for every Amsterdam block that touches
a system contract slot (most pyspec tests).

IWorldState gains an opt-in scope:

  IDisposable? BeginSystemPreBlockScope()

TracedAccessWorldState implements it via an int depth counter. While depth > 0,
Set(storageCell, value) reclassifies the recording: AddStorageRead when the
slot value would change, no-op when unchanged. The state mutation still
applies via the inner world state.

BlockAccessListManager wraps the three system contract entry points with the
scope: StoreBeaconRoot (EIP-4788 system tx), ApplyBlockhashStateChanges
(EIP-2935 fast-path Set), and ProcessExecutionRequests (EIP-7002 / EIP-7251
post-block system txs).

Parallel-mode state application:

In parallel processing, non-system slots wrap stateProvider with
BlockAccessListBasedWorldState whose Set is a no-op — actual state mutation
relies on ApplyStateChanges replaying the suggested BAL's storage_changes.
With the spec-correct BAL containing reads instead of changes for the system
slots, ApplyStateChanges has nothing to replay for them, so the canonical
state would diverge.

TxProcessorWithWorldState gains an `isSystemSlot` parameter. The
ParallelTxProcessorWithWorldStateManager passes `isSystemSlot: i == 0 || i ==
_len - 1` (pre-execution and post-execution slots). For those slots, the
TracedAccessWorldState wraps stateProvider directly, so system pre/post-block
SSTOREs mutate the canonical state regardless of BAL recording. Tx slots
(1..n) keep the BAL-backed wrapping unchanged.

Sequential pyspec tests (which is what the Ethereum.Blockchain.Pyspec.Test
suite runs) are unaffected by the parallel slot change but benefit from the
BAL recording fix; the BlockAccessListHash check now passes for blocks that
previously failed solely on system pre-block storage encoding.

Note: a residual InvalidStateRoot mismatch remains on a subset of Amsterdam
pyspec tests (~70/360 ecrecover_weird_v + a similar slice of stMemoryStress).
These were previously masked by the BAL hash error firing first. The
remaining state divergence appears unrelated to BAL recording and is left
for follow-up.

* test(jsonrpc): update Eth_get_block_access_list_by_* fixtures for EIP-7928 v5.7.0

The Eth_get_block_access_list_by_hash and _by_number tests had hardcoded the
pre-v5.7.0 shape, recording the EIP-2935 BlockHashHistory system pre-block
SSTORE as a storageChanges entry with the post-value. v5.7.0 records system
pre-block SSTOREs as storageReads (slot key only).

Updated both expected JSON strings to match the new spec-compliant output.

* Revert "fix(bal): record system pre/post-block SSTOREs as reads per EIP-7928 v5.7.0"

This reverts commit 364f294.

* Revert "test(jsonrpc): update Eth_get_block_access_list_by_* fixtures for EIP-7928 v5.7.0"

This reverts commit 937ca5d.

* review: address PR #11362 feedback

- Rlp.DecodeArrayPool<T>: dispose partially-allocated ArrayPoolList<T>
  before wrapping IndexOutOfRangeException/ArgumentOutOfRangeException
  as RlpException so the rented buffer is returned to the pool.
- Rlp.ValueDecoderContext.DecodeUInt(): collapse case 0 to a single
  `return RlpHelpers.ThrowNonCanonicalInteger(Position)` (DoesNotReturn,
  uint) to match DecodeInt and drop the dead `return default`.
- BlockAccessListManager.GetPostExecution(): use uint.MaxValue literal
  to match the uint? balIndex parameter.
- AccountChanges.SlotChangesAtIndex: build the returned SortedList with
  PrestateAwareIndexComparer.Instance so a later prestate graft sorts
  first, matching the rest of the codebase.
- PrestateAwareIndexComparer xmldoc: clarify that decoded BALs also use
  this comparer (so later prestate grafting preserves order).

* test(pyspec): skip EELS bal@v5.7.0 ported_static fixtures with legacy state-test conversion bug

EELS's `from_state_test` conversion for ported_static tests omits the
EIP-2935 / EIP-4788 system pre-block SSTORE entries from the suggested
BAL, while a real client (and Nethermind) executes them — so every such
fixture's BlockAccessListHash diverges from what we compute. 91 such
tests were the entire residual pyspec failure set on the bal-devnet-4a
branch.

Detect the conversion via the legacy difficulty value 0x20000 baked
into the post-merge mixHash field — real prevRandao would never be
exactly 0x...020000 — and Assert.Ignore those tests.

Track upstream EELS fix; remove the guard once bal@>v5.7.0 ships with
the system pre-block SSTOREs included in the BAL.

* fix(pyspec test): drop ?-annotation in non-nullable file

CS8632: PyspecTestFixture.cs is not under `#nullable enable`, so the
`string?` introduced in 6fb6527 broke the build of every pyspec
job. `string` works the same here — the value already gets a null
check on the next line.

* test(pyspec): also skip blockchain_test_engine_from_state_test variant

The first guard only walked test.Blocks, which is null for engine
fixtures. Engine fixtures keep the same legacy 0x...020000 sentinel,
just on the JSON `prevRandao` field of the engine_newPayload params.
Walk EngineNewPayloads too.

* Update tests

* fix(eip-8037): pin cost_per_state_byte at static 1174 for bal-devnet-4

bal-devnet-4 keeps cost_per_state_byte static at 1174 (carried over from
bal-devnet-3), removing the per-block-gas-limit scaling formula that an
earlier draft of EIP-8037 specified. snøbal-devnet-4 fixtures encode the
same gas usage (63574) at 1M / 5M / 10M / 30M block gas limits, confirming
the value is now invariant across blockGasLimit.

Reduce CalculateCostPerStateByte to a direct return of CostPerStateByte
and drop the dynamic-quantization helpers and BitOperations dependency.
The blockGasLimit parameter is retained on call sites in case a future
devnet revisits per-block scaling. Update the EIP-8037 unit test to pin
the static behaviour rather than asserting quantized values.
- Replace SortedDictionary with Dictionary in BlockAccessList for O(1)
  account lookups on the EVM hot path (was O(log n) with 20-byte span
  comparisons). Sorted enumeration preserved for encoding/validation.
- Merge TryGetDelegation + GetCachedCodeInfo into a single call in
  InstructionCall, eliminating a redundant GetCodeHash per CALL opcode.
- Inline IsDeadAccount in EXTCODEHASH to avoid a second GetCodeHash
  call for the same address.
commit 3a3078f
Author: Marc Harvey-Hill <10379486+Marchhill@users.noreply.github.com>
Date:   Tue Apr 28 18:23:03 2026 +0200

    delete old tests

commit 95de888
Merge: 244c2c6 31a673a
Author: MarekM25 <marekm2504@gmail.com>
Date:   Tue Apr 28 18:08:12 2026 +0200

    Merge branch 'master' into engine-api-glamsterdam-cleanup

commit 244c2c6
Author: MarekM25 <marekm2504@gmail.com>
Date:   Tue Apr 28 18:05:32 2026 +0200

    fix lint

commit 9194b8f
Author: MarekM25 <marekm2504@gmail.com>
Date:   Tue Apr 28 18:04:19 2026 +0200

    remove tests

commit eedfbb7
Author: MarekM25 <marekm2504@gmail.com>
Date:   Tue Apr 28 17:57:46 2026 +0200

    revert formatting changes and zero hash stuff

commit dc71aa5
Author: MarekM25 <marekm2504@gmail.com>
Date:   Tue Apr 28 17:38:42 2026 +0200

    cleanups

commit ec2fce3
Author: MarekM25 <marekm2504@gmail.com>
Date:   Tue Apr 28 17:33:11 2026 +0200

    fixing

commit 90a1da8
Author: MarekM25 <marekm2504@gmail.com>
Date:   Tue Apr 28 17:15:58 2026 +0200

    remove test

commit 4451656
Author: MarekM25 <marekm2504@gmail.com>
Date:   Tue Apr 28 17:12:32 2026 +0200

    Cleanup mess

commit 2425748
Merge: 4a90460 a36154c
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Mon Apr 27 14:33:36 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit 4a90460
Merge: 3549768 18d60a4
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Mon Apr 27 14:08:25 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit 3549768
Merge: 30fcc36 b71c352
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Mon Apr 27 14:07:39 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit 30fcc36
Merge: 540e4a2 ed6a354
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Mon Apr 27 13:09:48 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit 540e4a2
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sun Apr 26 04:52:49 2026 +0300

    minor fixes

commit 010547b
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sun Apr 26 03:22:39 2026 +0300

    test fixes

commit 6fcc136
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sun Apr 26 02:11:12 2026 +0300

    fixes

commit 0bccb06
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sun Apr 26 01:52:02 2026 +0300

    test fix

commit 830c578
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sun Apr 26 01:38:55 2026 +0300

    fix tests

commit 1f5fd2c
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sun Apr 26 01:35:17 2026 +0300

    claude review again

commit d87491f
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sun Apr 26 00:57:57 2026 +0300

    fixes

commit dd431e2
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sun Apr 26 00:51:52 2026 +0300

    fix tests

commit 71dc99f
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sat Apr 25 22:40:50 2026 +0300

    fixes

commit 9c03d12
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sat Apr 25 22:32:58 2026 +0300

    update per 786

commit a4c4f3c
Merge: fc49e7e 9cba44c
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Sat Apr 25 21:38:25 2026 +0300

    Merge remote-tracking branch 'origin/master' into engine-api-glamsterdam

commit fc49e7e
Merge: bca0c63 42c551f
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Mon Apr 20 16:14:33 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit bca0c63
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Mon Apr 20 16:14:10 2026 +0300

    remove unused import

commit 3d2c973
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Mon Apr 20 16:09:17 2026 +0300

    fix taiko tests

commit b30ef5c
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Mon Apr 20 15:34:35 2026 +0300

    fix tests

commit 6f131cd
Merge: a8c884a 0fe41f4
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Mon Apr 20 14:55:46 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit a8c884a
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Mon Apr 20 14:54:40 2026 +0300

    conflicts fix

commit 96eb912
Merge: 82a1f70 425a2a3
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Mon Apr 20 14:52:33 2026 +0300

    Merge remote-tracking branch 'origin/master' into engine-api-glamsterdam

    # Conflicts:
    #	src/Nethermind/Nethermind.Merge.Plugin/BlockTreeExtensions.cs
    #	src/Nethermind/Nethermind.Merge.Plugin/Handlers/ForkchoiceUpdatedHandler.cs
    #	src/Nethermind/Nethermind.Taiko/Rpc/TaikoForkchoiceUpdatedHandler.cs

commit 82a1f70
Merge: 7b6133f 436c65b
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Mon Apr 20 13:16:55 2026 +0300

    Merge remote-tracking branch 'origin/master' into engine-api-glamsterdam

commit 7b6133f
Merge: 2f068ec 02c2026
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Thu Apr 16 14:49:46 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit 2f068ec
Merge: bdd7ee5 5464c8b
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Thu Apr 16 14:10:02 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit bdd7ee5
Merge: f09cb41 bfaaeb0
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Wed Apr 15 16:45:19 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit f09cb41
Merge: 370acdf cf03d18
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Wed Apr 15 16:24:17 2026 +0300

    Merge remote-tracking branch 'origin/master' into engine-api-glamsterdam

commit 370acdf
Merge: d1d0370 6cd0ea4
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Mon Apr 13 14:28:21 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit d1d0370
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Fri Apr 10 17:30:54 2026 +0300

    more fixes

commit c99d6cc
Merge: 4d0f215 a52cb90
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Fri Apr 10 17:24:06 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit 4d0f215
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Fri Apr 10 17:23:42 2026 +0300

    claude review

commit cb6defe
Merge: af63142 0057bd8
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Fri Apr 10 12:34:37 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit af63142
Merge: 1b338f3 2f24891
Author: Stavros Vlachakis <89769224+svlachakis@users.noreply.github.com>
Date:   Fri Apr 10 12:08:36 2026 +0300

    Merge branch 'master' into engine-api-glamsterdam

commit 1b338f3
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Fri Apr 10 12:08:13 2026 +0300

    claude review

commit 619259b
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Thu Apr 9 18:32:59 2026 +0300

    taiko fix

commit d067793
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Thu Apr 9 18:19:14 2026 +0300

    cleaner design

commit 1e27be1
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Thu Apr 9 18:12:04 2026 +0300

    improve tests matching the specs

commit a77e182
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Thu Apr 9 18:04:58 2026 +0300

    improve test

commit 4ba069c
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Thu Apr 9 18:01:59 2026 +0300

    fix comment

commit fe6f9b0
Author: stavrosvl7 <stavrosvl7@gmail.com>
Date:   Thu Apr 9 17:53:57 2026 +0300

    engine api changes as per ethereum/execution-apis#770
                              ethereum/execution-apis#760
…halt

Top-level REVERT preserves gas_left, so the spilled portion of state_gas_used
(originally drawn from gas_left) is still in the user's pocket and must be
refunded to the reservoir alongside the reservoir-funded portion. Top-level
halt burns gas_left, so the spilled portion was paid out of gas the user can
no longer reclaim — only the reservoir-funded portion is restorable.

Splits RefundRevertedTopLevelStateGas into a revert path (full refund) and a
new RefundHaltedTopLevelStateGas (reservoir-only, spill discarded) and wires
the halt error sites to the latter.
The pyspec fixture archive ships three uncovered directories: a stray state-
tests transition fork, plus two test types we never wired in. Adds:

- CancunToPragueAtTime15kStateTests — fills a one-class gap in Tests.cs
- PyspecSyncBlockchainTestFixture + AmsterdamSyncBlockchainTests,
  OsakaSyncBlockchainTests — runs blockchain_tests_sync fixtures through the
  engine harness; the additional syncPayload field is left for a follow-up
- TestType.Transaction + TransactionTest/Json/Base + ConvertTransactionTests
  + PyspecTransactionTestFixture — decodes raw txbytes via Rlp.Decode and
  matches expected EEST exception tokens (TYPE_4_*) against the validator's
  ValidationResult.Error or RLP decode message; covers AmsterdamTxTests,
  OsakaTxTests, PragueTxTests fork directories

Bumps FlatDB pyspec chunking from 4 to 16 to match the regular workflow —
~860 tests/shard instead of ~3,437/shard, well under the 256/matrix cap and
the 20-minute job timeout.
@github-actions github-actions Bot added network optimism sync taiko related to the taiko alethia rollup xdc labels May 8, 2026
…hmarks

bal-devnet-6 (#11436) is now in master, so this brings everything from master
including the BAL devnet-6 PR plus subsequent master commits. Conflict
resolution:

  - Took ours for every BAL-touching file (decoders, manager, validator,
    BlockAccessListBased/TracedAccess world states, parallel-loop executor,
    cache pre-warmer, BAL tests). Our branch's ReadOnly*/Generated*/*AtIndex
    split diverges substantially from upstream's unified BlockAccessList, and
    upstream merging bal-devnet-6 into master pulled in the unified version.
  - Removed (again) BlockAccessListValidationIndex + tests, BlockAccessList,
    AccountChanges, SlotChanges, BlockAccessListItemCountTests, and
    BlockAccessListJournalTests: all reference the removed unified type.
  - Removed the now-broken Rlp.Encode(BlockAccessList) overload in Rlp.cs;
    callers use the generic Rlp.Encode<T> path for ReadOnly/Generated BALs.
  - Took theirs for TxReceiptConverter (upstream's boxing-free
    ForcedNumberConversion.Value refactor, already includes our EIP-7778/8037
    diagnostic fields) and ChainSpecBasedSpecProvider (comment-only addition).
  - Took ours for OpcodeBlockTracer (kept t_ prefix on the [ThreadStatic]
    field, our existing convention).
  - For .github/workflows/nethermind-tests-flat.yml took upstream's
    unconditional pre-build + 'dotnet build-server shutdown || true' approach
    (matches their CI peak-memory PR) while keeping our extra Cache NuGet
    packages step.
  - Updated EngineModuleTests.V6 ResultWrapper generic args from IEnumerable to
    IReadOnlyList to match upstream's API change.
@github-actions github-actions Bot removed configuration network sync optimism taiko related to the taiko alethia rollup labels May 11, 2026
@Marchhill
Copy link
Copy Markdown
Contributor Author

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 11, 2026

Claude finished @Marchhill's task in 4m 46s —— View job


PR Review: Bal devnet 6 benchmarks

Tasks:

  • Post initial comment
  • Load repo rules
  • Read key changed files
  • Review new BAL type hierarchy
  • Review parallel execution path
  • Review serialization/decoding changes
  • Review benchmark results
  • Post final review

Summary

Third review pass. All High/Medium/Low findings from the previous two reviews (May 7, May 8) were addressed in 31f74f2. This pass looks fresh at the code on the current branch.

The type hierarchy split, binary-search hot paths, double-checked sort, prestate gate mechanics, and parallel-loop scheduling are all sound. The gas validation two-pass algorithm, EIP-8037 inclusion checks, preExecutionDoneTcs fault propagation, and _inUse/_perTxBal publication-barrier model are correct.


Benchmark Results (from auto-run)

Metric Superblocks (PR vs master) Realblocks
AVG -1.14% ✅ -0.17% ✅
MEDIAN +2.71% ⚠️ flat
P90 +0.95% ⚠️ flat
P99 -20.81% ✅ -3.42% ✅

The large P99 win on superblocks (tail latency cut by ~21%) is the intended optimization target and a real improvement. The MEDIAN/P90 regressions are small (2-3%) and may be within run-to-run noise, but @kamilchodola flagged this as worse than PR #11436's baseline. This deserves investigation before merge — at a minimum, an explicit statement that the median/P90 regression is accepted in trade for the P99 win.


Findings

Medium

1. Per-SLOAD heap allocation in ReadOnlySlotChanges.Get() (inline comment)
return [.. tmp.WithoutLeadingZeros()] allocates a new byte[] on every call. In the parallel path every SLOAD goes through this. For a superblock with thousands of storage reads this is continuous allocation pressure on the GC during parallel execution. The caller chain only needs a ReadOnlySpan<byte>, so the backing store could be a caller-supplied stack buffer or a field-backed buffer per-worker.

2. Per-SLOAD heap allocation in TracedAccessWorldState.GetInternal (parallel mode) (inline comment)
MemoryMarshal.CreateReadOnlySpan(...).ToArray() allocates 32 bytes per intra-tx storage read on the parallel path (line 370). The allocation is forced because value is a local EvmWord. A field-backed 32-byte buffer on the worker's TracedAccessWorldState (safe because each worker owns its instance) would eliminate these allocations entirely.

Low

3. SortedDictionary for per-tx account tracking (inline comment)
BlockAccessListAtIndex._accountChanges is SortedDictionary<Address, AccountChangesAtIndex>. Every AddBalanceChange, AddNonceChange, AddStorageChange, etc. call pays O(log n) for the sort. The sorted order is only consumed at merge time by GeneratedAccountChanges which imposes its own sorted structure; BlockAccessListAtIndex itself never needs to iterate in address order during tx execution. A Dictionary + sort-on-demand at merge time would halve the per-change overhead.


Correctness verification (no issues)

  • EnsureSorted double-checked lock: correct. Plain _sortedDirty = trueSignalPrestateLoaded() (TrySetResult release fence) → worker GetAwaiter().GetResult() (acquire fence) → Volatile.Read(ref _sortedDirty) guarantees visibility.
  • _perTxBal publication ordering: plain array write by worker → gasResults[i].SetResult() (release) → validator GetAwaiter().GetResult() (acquire) → plain array read by validator. No Volatile needed; model is sound.
  • preExecutionDoneTcs fault paths: TrySetException + TrySetCanceled safety nets on all error paths; validator never hangs.
  • _lastLoadedBal TOCTOU guard: written only from slot 0 before parallel tx workers start; no races.
  • ValidateBlockAccessList two-pass correctness: generates→suggested and suggested→generates cover missing and surplus cases; prestate-only entries are invisible to per-index queries.
  • EIP-8037 inclusion check (2D gas): CheckPerTxInclusion called before gas accounting in both paths; cumulative worst-case is checked correctly.

Comment thread src/Nethermind/Nethermind.Core/BlockAccessLists/ReadOnlySlotChanges.cs Outdated
Comment thread src/Nethermind/Nethermind.State/TracedAccessWorldState.cs Outdated
Comment thread src/Nethermind/Nethermind.Core/BlockAccessLists/BlockAccessListAtIndex.cs Outdated
Three review comments on the parallel BAL hot path:

  - ReadOnlySlotChanges.Get(uint blockAccessIndex) used to return a fresh
    leading-zero-stripped byte[] every call (one alloc per SLOAD through
    BlockAccessListBasedWorldState). Now takes a caller-owned Span<byte>
    buffer and returns a slice. The single caller
    (BlockAccessListBasedWorldState.GetAtCurrentIndex) owns a 32-byte
    instance scratch buffer — one per parallel worker, single-threaded use.

  - TracedAccessWorldState.GetInternal did the same thing for intra-tx
    SLOADs: MemoryMarshal.CreateReadOnlySpan(...).ToArray() allocated a
    new byte[32] per call. Replaced with a 32-byte instance scratch buffer
    on TracedAccessWorldState (also rented per-worker from the pool).

  - BlockAccessListAtIndex._accountChanges was a SortedDictionary keyed on
    Address. The sorted property is only consumed once at merge time, by
    GeneratedBlockAccessList which has its own SortedDictionary doing the
    re-sort anyway. Swapped for a plain Dictionary so AddBalanceChange /
    AddNonceChange / AddStorageChange / AddAccountRead / etc. are O(1)
    instead of O(log n) on the per-tx hot path.

Test update: AccountChangesPrestateTests.Slot_get_returns_prestate_value
adopts the new ReadOnlySlotChanges.Get(uint, Span<byte>) signature.
@Marchhill
Copy link
Copy Markdown
Contributor Author

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 11, 2026

Claude finished @Marchhill's task in 7m 12s —— View job


PR Review: Bal devnet 6 benchmarks (Pass 4)

Tasks:

  • Post initial comment
  • Load repo rules
  • Read key changed files since last review
  • Assess May 11 findings resolution status
  • Look for new issues
  • Post final review

Summary

Fourth review pass. The two new commits since the May 11 review (0852865 perf(bal): cut per-SLOAD allocations and SortedDictionary insert cost, 81c6f3a fix lint) address all three Medium findings from that review. No new Critical, High, or Medium issues found.


May 11 Findings — Resolution Status

# Severity Finding Status
1 Medium Per-SLOAD heap allocation in ReadOnlySlotChanges.Get() ✅ Resolved — Get(uint, Span<byte>) now takes a caller-supplied buffer; per-instance _scratchStorage byte[32] field on both BlockAccessListBasedWorldState and TracedAccessWorldState
2 Medium 32-byte heap allocation per intra-tx SLOAD in TracedAccessWorldState.GetInternal ✅ Resolved — CopyTo(_scratchStorage) replaces .ToArray(); buffer verified safe (each TxProcessorWithWorldState owns its own BlockAccessListBasedWorldState — not shared across workers)
3 Low SortedDictionary for per-tx account tracking ✅ Resolved — BlockAccessListAtIndex._accountChanges is now Dictionary<Address, AccountChangesAtIndex>; sorted merge delegated to GeneratedAccountChanges._accountChanges at merge time

Correctness verification (no issues)

  • _scratchStorage thread-safety: Confirmed. ParallelTxProcessorWithWorldStateManager constructs each pooled TxProcessorWithWorldState with its own new BlockAccessListBasedWorldState(...) (line 806 of BlockAccessListManager.cs). Workers rent exclusive instances; no concurrent write to _scratchStorage is possible.
  • StorageChange byte-order round-trip: new StorageChange(index, UInt256) calls ByteSwap() to convert host-endian UInt256 to big-endian EvmWord wire form. ReadOnlySlotChanges.Get and TracedAccessWorldState.GetInternal reinterpret EvmWord as bytes and return big-endian. Round-trip is correct.
  • Prestate publication ordering: Plain writes to _balanceChanges/_nonceChanges/_codeChanges/Changes on the loader → SignalPrestateLoaded() (TrySetResult() release fence) → workers' WaitForPrestate() (GetAwaiter().GetResult() acquire fence). Updated arrays are visible to workers post-gate. Correct.
  • _inUse/_perTxBal publication: Each balIndex slot has exactly one writer; gasResults[i].SetResult (release) → validator GetAwaiter().GetResult() (acquire) → plain array read in MergeAndReturnBal. No Volatile needed. Correct.
  • AddStorageChange in sequential mode: _generatingBlockAccessList.AddStorageChange(storageCell, new(oldValue, true), new(newValue, true)) constructs UInt256 from big-endian spans, then StorageChange applies ByteSwap(). Read-back reinterprets as big-endian bytes. Correct.

Remaining pre-existing Low findings (out of scope per author)

  • ReadOnlyAccountChanges is a misleading name for a class with several mutation methods (LoadPreStateBalance, EnablePrestateGate, etc.). The class-level <summary> now explains the constrained-mutation contract, which partially mitigates this.
  • // todo: maybe split into smaller classes in IBlockAccessListManager.cs (line 16).

Benchmark note

The May 8 benchmark run (before the allocation-optimization commits) showed MEDIAN +2.71% and P90 +0.95% regressions on superblocks alongside the large P99 −20.81% win. Commit 0852865 targets exactly the per-SLOAD allocation and per-insert SortedDictionary costs that would pressure median/P90 under concurrency. A fresh benchmark run is recommended to confirm those metrics recover before merge.

Comment thread src/Nethermind/Nethermind.State/TracedAccessWorldState.cs
Marchhill added 3 commits May 12, 2026 10:35
The GasValidationResultSlot type from master already lives in the tree but
wasn't being used — every IncrementalValidation signature carried a verbose
TaskCompletionSource<(long BlockGasUsed, long BlockStateGasUsed,
IntrinsicGas<EthereumGasPolicy> IntrinsicGas, InvalidBlockException? Exception)>[]
instead.

Swap to GasValidationResultSlot[] across IBlockAccessListManager and its two
implementations, the parallel executor, and the tests. Worker sites become
TrySetResult(new GasValidationResult(...)) and the validator destructures
gasResults[j].GetResult() into the typed record.

Functionally equivalent — GasValidationResultSlot.GetResult blocks via
Monitor.Wait the same way TaskCompletionSource.Task.GetAwaiter().GetResult
did, and TrySetCanceled likewise dispatches a TaskCanceledException via
ExceptionDispatchInfo. The Task preExecutionTask parameter is kept since
our parallel-loop has a separate pre-execution iteration.
Renamed Spec_pr2703_* tests to describe the scenario they pin, and rephrased
docstrings / inline comments / failure messages that anchored to the
upstream PR number. Reader no longer needs to open execution-specs PR 2703
to know what each test covers.

  Eip8037BlockGasIntegrationTests:
    Spec_pr2703_boundary_state_exact_fit_accepts
      -> State_dimension_exact_fit_at_block_gas_limit_accepts
    Spec_pr2703_boundary_state_exceeded_by_one_rejects
      -> State_dimension_one_over_block_gas_limit_rejects
    Spec_pr2703_creation_tx_regular_check_actual_usage_modest_accepts
      -> Creation_tx_intrinsic_state_excluded_from_regular_worst_case_accepts
    Spec_pr2703_single_tx_state_check_exceeds_block_limit_rejects
      -> Single_tx_state_worst_case_over_block_gas_limit_rejects_at_inclusion
    Spec_pr2703_creation_tx_state_check_exceeded_rejects
      -> Creation_tx_state_worst_case_over_remaining_state_budget_rejects_at_inclusion
    Spec_pr2703_eip7825_cap_with_modest_actual_gas_accepts
      -> Regular_worst_case_capped_by_eip7825_with_modest_post_exec_gas_accepts

  Eip8037BlockGasInclusionCheckTests + BlockProcessorTests: rephrased "PR 2703
  test_..." section comments into prose describing what the case pins.

Production-code spec anchors (BlockAccessListManager, Eip8037BlockGasInclusionCheck,
EthereumGasPolicy) keep their "execution-specs PR 2703" references — they were
deliberately added as spec citations and are useful for reviewers tracking the
rule back to upstream.
…emental validation

Ports upstream's column-oriented validation index (lost to merge-conflict
"take ours" resolution when bal-devnet-6 merged into master, since master's
implementation was bound to the unified BlockAccessList / AccountChanges
types we'd already split into ReadOnly*/Generated*/*AtIndex on this branch).

The index flattens the suggested BAL into 4 column-oriented lanes
(balance/nonce/code/storage) keyed by (accountOrdinal, key) at the row of
each tx index. ChangesEqual(other, index) then compares two indexes
row-by-row via ReadOnlySpan<T>.SequenceEqual — no per-account dict lookups,
no merge-walks — which is what ValidateBlockAccessList runs once per tx.

Wiring (BlockAccessListManager):
  - PrepareForProcessing builds the suggested index once and pairs a
    mutable generated index laid out identically. Also computes the
    suggested chargeable-storage-reads tally once for the fast-path
    surplus-reads gas check.
  - MergeAndReturnBal grows an optional Action<BlockAccessListAtIndex>
    callback; the parallel impl invokes it with the live slice between
    target.Merge() and pool-return, the sequential impl with the slice
    held on the worldstate. The manager-side hook (RegisterGeneratedSlice)
    pushes the slice's rows into the mutable index and rolls the
    generated-side read counter forward.
  - ValidateBlockAccessList tries the index first: a single ChangesEqual
    call plus the precomputed surplus-reads check. On mismatch (or before
    the index is populated) it falls through to the existing streaming
    walk that produces precise error diagnostics.

Adaptations from master's version:
  - Build / Count / Fill bind to ReadOnlyBlockAccessList +
    ReadOnlyAccountChanges, and read change arrays directly (no
    ChangeSet.BlockAccessChanges indirection).
  - Add(BlockAccessList) replaced with Add(BlockAccessListAtIndex slice):
    our generated rows arrive as per-tx slices, one push per tx, each
    contributing balance/nonce/code/storage at its own .Index.
  - StorageLane values are EvmWord (matches StorageChange.Value's wire
    type on this branch).

Tests: 12 new BlockAccessListValidationIndexTests covering exact match,
order-insensitive match (by address, by slot), large-row sort path,
overflow isolation, and mismatch detection on each lane. Suite passes
along with all existing BAL tests (244 total).
@Marchhill
Copy link
Copy Markdown
Contributor Author

superseded by #11573

@Marchhill Marchhill closed this May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants