Feat/evm enginetest #11399
…ests Parse all fixture files upfront and flatten to individual test cases before distributing to workers, instead of parallelizing by file. Nethermind full suite: 2m48s with 8 workers (was 2m58s file-level).
- Pre-warm thread pool capped at CPU count (fixes starvation at w>=4)
- Parallel file parsing via Parallel.For
- UTF-8 byte span deserialization (File.ReadAllBytes + ReadOnlySpan<byte>)
- Pre-allocated results array instead of ConcurrentBag + sort
- Lightweight BlockchainTestsRunner constructor for the parallel path
- Compiled Regex for filter matching
- Pre-initialize KZG once upfront

Prague engine tests: 24.1s → 18.5s at w=8 (23% faster). Full suite: 2m48s → 1m45s at w=4 (37% faster).
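The case-level distribution can be sketched in shell (a hedged illustration with a hypothetical case list; the actual runner does this in C# with a worker pool, not xargs):

```shell
#!/bin/sh
# Hypothetical sketch: flatten fixtures to one line per test case, then let
# workers pull individual cases, so a file with many cases cannot serialize
# behind a single worker the way file-level parallelism does.
tmp=$(mktemp -d)
printf '%s\n' \
  'a.json::case1' 'a.json::case2' \
  'b.json::case1' \
  'c.json::case1' 'c.json::case2' 'c.json::case3' > "$tmp/cases.txt"
# Case-level parallelism: 2 workers, each grabbing one case at a time.
ran=$(xargs -P 2 -I{} sh -c 'echo "ran {}"' < "$tmp/cases.txt" | sort)
rm -rf "$tmp"
echo "$ran"
```

With file-level parallelism the three cases of c.json would all run on one worker; here they are free to land on either.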
…r field, hide timeInMs
…irect
- Add ForkName to BlockchainTest, use raw network string (e.g. CancunToPragueAtTime15k)
- Add lastBlockHash, lastPayloadStatus to JSON output for block/engine tests
- Remove stateRoot from block/engine results (only state tests)
- Report the validation error in `error` even on pass for negative tests: engine tests via PayloadStatus.ValidationError, block tests via the BlockRemoved event and ValidateSuggestedBlock, state tests via TransactionResult.ErrorDescription
- Suggest invalid blocks (blockHeader=null) to actually test rejection
- Enables EELS consume direct exception mapping for all test types
… hardcoded string
Resolve 3 conflicts:
- BlockchainTestBase.cs: take master's refactored helper methods and comments
- JsonToEthereumTest.cs: keep the PR's full fixture key as the name
- StateTestRunner.cs: merge the _suppressOutput check with master's explicit types

Also fix all var usages to explicit types and target-typed new (M5).
- H2: use pattern match `is not BlockchainTest test` to prevent NRE
- L1: use `is FailedToLoadTest` instead of `as ... is not null`
- M2: return exit code 1 on usage error
- M3: remove the dead ParseBlockchainTestFile method
- M4: replace int.Parse with int.TryParse + a descriptive FormatException
- M5: replace all var with explicit types (already done in the merge)
- L2: use JsonSerializer.Serialize for the error string in EIP-3155 JSON
- L3: compile the filter regex once in the constructor
- L4: add a guard for null _testsSource in RunTestsAsync
- Fix lastStatus/lastValidationError never being updated in RunNewPayloads
- Fix the traceStack/excludeStack semantic inversion for state tests
- Replace the hardcoded developer path in the nethtest script with $SCRIPT_DIR
- Use Regex.IsMatch instead of Regex.Match(...).Success in ParseStateTestFile
Adds a workflow_dispatch workflow that:
- Downloads EEST fixtures from ethereum/execution-spec-tests releases
- Builds and runs nethtest with configurable test type, workers, and filter
- Parses JSON results and posts a pass/fail summary to GITHUB_STEP_SUMMARY
- Includes a copy-pastable local repro command in the summary
- Uploads the results JSON as an artifact
- Add a push trigger on feat/evm-enginetest so the workflow runs on push
- Use env vars with fallback defaults so the push trigger works without inputs
- Include a copy-pastable local repro command in GITHUB_STEP_SUMMARY
The EEST archive uses 'blockchain_tests_engine' (with underscore), not 'blockchain_test_engine'. This caused the fixture path resolution to fail on CI.
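A minimal shell repro of the mismatch (the fixture directory here is created locally for illustration, not downloaded):

```shell
#!/bin/sh
# Hypothetical repro of the naming bug: the extracted EEST archive contains
# 'blockchain_tests_engine' (plural "tests"), so resolving the singular
# 'blockchain_test_engine' finds nothing.
fixtures=$(mktemp -d)
mkdir -p "$fixtures/blockchain_tests_engine"
found=none
for name in blockchain_test_engine blockchain_tests_engine; do
  [ -d "$fixtures/$name" ] && found="$name"
done
rm -rf "$fixtures"
echo "resolved: $found"
```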
launchSettings.json overrides CLI args with hardcoded test paths, causing the runner to ignore the actual --engineTest/--input args.
The dotnet run process was being killed silently on CI (likely OOM with 4 workers × 2844 tests). The | tee pipeline with pipefail caused the step to report "operation cancelled" with no diagnostic output. Now: redirect stdout to file, stderr to log, check exit code explicitly, and dump stderr if something goes wrong.
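A minimal sketch of the new pattern, assuming the shapes described above; the runner function is a stand-in for the real dotnet invocation:

```shell
#!/bin/sh
# Hedged sketch: stdout (JSON) goes to a file, stderr to a log, and the exit
# code is checked explicitly instead of trusting a `| tee` pipeline under
# pipefail, which hid the failure with "operation cancelled".
tmp=$(mktemp -d)
runner() { echo '{"pass":true}'; echo 'diagnostic noise' >&2; }
runner > "$tmp/results.json" 2> "$tmp/stderr.log"
status=$?
if [ "$status" -ne 0 ]; then
  echo "runner failed with exit code $status; dumping stderr:" >&2
  cat "$tmp/stderr.log" >&2
fi
json=$(cat "$tmp/results.json")
log=$(cat "$tmp/stderr.log")
rm -rf "$tmp"
echo "exit=$status json=$json"
```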
- Use bal@v5.6.0 with fixtures_bal.tar.gz (has BAL/Amsterdam tests)
- Let stderr flow to the console for live progress visibility
- Only redirect stdout (JSON) to a file
- Wrap RunTest in try/catch so individual test failures don't crash the whole runner; report them as failed results with the exception message
- Improve GITHUB_STEP_SUMMARY: group failures by error type with counts, show failed test names in a collapsible details section
When running with suppressOutput (CI/parallel mode), each test result is printed to stderr as "PASS test_name" or "FAIL test_name" so CI logs show live progress instead of silence until the JSON blob appears.
ParallelExecution causes unhandled exceptions from background threads in BlockAccessListManager that crash the process. Disable by default for the CLI test runner; NUnit tests that explicitly override to true are unaffected.
- PASS/FAIL/EXCEPTION prefixes now use ANSI colors (green/red) on stderr
- The CI workflow captures stderr to a file alongside console streaming
- If the process crashes before writing JSON, result parsing falls back to the PASS/FAIL lines in the stderr log for the summary
- The summary header shows "(process crashed)" when using the stderr fallback
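The coloring scheme can be sketched as follows (a hypothetical report() helper, not the actual C# code): color codes wrap only the PASS/FAIL prefix, and everything goes to stderr so stdout stays pure JSON.

```shell
#!/bin/sh
# Hedged sketch: green PASS / red FAIL prefixes written to stderr.
report() {  # $1 = PASS|FAIL, $2 = test name
  case "$1" in
    PASS) printf '\033[32mPASS\033[0m %s\n' "$2" >&2 ;;
    *)    printf '\033[31mFAIL\033[0m %s\n' "$2" >&2 ;;
  esac
}
# Capture stderr just to show the line; normally it streams to the console.
line=$(report FAIL test_invalid_block 2>&1)
echo "$line"
```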
- Process substitution (2> >(tee ...)) causes bash to die on SIGTERM, skipping all subsequent steps, including if: always() ones
- Use continue-on-error: true on the Run nethtest step so crashes don't block
- Add a separate "Show stderr output" step to dump the log
- Strip ANSI codes in the stderr fallback parsing
Separate steps can't survive runner SIGTERM (exit 143 from OOM). Now write_summary runs as a trap EXIT handler in the same shell, so the summary is written even if the dotnet process is killed. Also reduce workers to 4 (8 causes OOM on ubuntu-latest).
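The trap EXIT pattern can be sketched like this (a hedged illustration; the failing sh -c stands in for the dotnet step, and a real SIGKILL would still bypass any handler):

```shell
#!/bin/sh
# Hedged sketch: the summary writer is registered in the same shell as the
# run step, so it still fires when a later command fails (and, in the real
# job, on graceful termination of the shell).
summary=$(
  (
    rc=0
    trap 'echo "summary written after exit code $rc"' EXIT
    sh -c 'exit 143' || rc=$?   # stand-in for the dotnet step dying
  )
)
echo "$summary"
```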
- Use mkfifo + tee to stream stderr to both the console and a log file (process substitution caused bash-level crashes on SIGTERM)
- Reduce default workers from 4 to 2 to avoid OOM on ubuntu-latest (4 workers hit SIGTERM/exit 143 after ~55s)
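The mkfifo + tee plumbing can be sketched as follows (a hedged illustration; the brace group stands in for the test runner writing live progress to stderr):

```shell
#!/bin/sh
# Hedged sketch: stderr streams through a named pipe into tee, which copies
# it to both the console and a log file, avoiding the process-substitution
# syntax that misbehaved under SIGTERM.
tmp=$(mktemp -d)
mkfifo "$tmp/err.pipe"
tee "$tmp/stderr.log" < "$tmp/err.pipe" >&2 &
teepid=$!
# Stand-in for the test runner's live progress output:
{ echo 'PASS test_a'; echo 'FAIL test_b'; } > "$tmp/err.pipe"
wait "$teepid"
logged=$(cat "$tmp/stderr.log")
rm -rf "$tmp"
echo "$logged"
```

Closing the write end gives tee EOF, so wait returns once the log is complete.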
The crash (exit 143/SIGTERM) was caused by unhandled exceptions from BlockAccessListManager background threads when ParallelExecution=true, not OOM. Override ParallelExecutionOverride to false in BlockchainTestsRunner only — NUnit tests are unaffected. Restore workers to 4 since memory was not the issue.
Block validator rejections and exception stack traces from internal components were interleaved with PASS/FAIL lines on stderr, making output unreadable — especially with multiple workers. Add ComponentLogManagerOverride to BlockchainTestBase. The CLI runner sets it to LogLevel.Error when in suppressOutput mode, so only real errors appear. PASS/FAIL/EXCEPTION lines with their error messages are still printed. NUnit tests are unaffected (override is null).
- Strip ANSI codes and extract PASS/FAIL/EXCEPTION lines more robustly
- Group errors by the text after the " — " separator
- Show ALL failed tests in the details section (no head -100 limit)
- Include the error message in the JSON-based failure list too
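The stripping and grouping can be sketched as follows (a hedged illustration; the sample log lines are hypothetical, and the real workflow does this in its summary step):

```shell
#!/bin/sh
# Hedged sketch: strip ANSI escapes with sed, keep FAIL lines, and group by
# the error text after the " — " separator.
esc=$(printf '\033')
log=$(printf '%s[31mFAIL%s[0m test_a — InvalidBlock\n%s[32mPASS%s[0m test_b\n%s[31mFAIL%s[0m test_c — InvalidBlock\n' \
  "$esc" "$esc" "$esc" "$esc" "$esc" "$esc")
clean=$(printf '%s\n' "$log" | sed "s/$esc\[[0-9;]*m//g")
groups=$(printf '%s\n' "$clean" | grep '^FAIL' | sed 's/.* — //' | sort | uniq -c | sed 's/^ *//')
echo "$groups"
```

Grepping for a literal '^FAIL' only works after the escape codes are stripped, which is why the sed pass comes first.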
Claude finished @kamilchodola's task in 4m 57s — View job

PR Review — feat/evm-enginetest

This PR adds EEST …

Findings

High
1. Workflow push trigger is feature-branch-specific (line 8)
2. Verify GitHub Actions version tags (line 67)

Medium
3. …
4. Two distinct EEST exceptions mapped to the same regex (BlockchainTestBase.cs ~line 493)
5. …
6. Block tests without --jsonout silently produce no stdout output

Low
7. LINQ in non-hot path (coding style preference)
8. …
9. …
10. No timeout on …

What's working well
Reviewed diff context:

    cancel-in-progress: true

    on:
      push:
High — Workflow trigger is branch-specific and will stop running after merge
The push trigger is hard-coded to feat/evm-enginetest. After this PR merges to master, the workflow will never run automatically. Decide on the post-merge trigger strategy (e.g. push: branches: [master], a pull_request trigger, or workflow_dispatch-only) before merging.
Suggested change:

    push:
      branches: [master]
or remove the push trigger entirely if the workflow is intentionally manual-only.
Reviewed diff context:

    timeout-minutes: 60
    steps:
      - name: Check out repository
        uses: actions/checkout@v6
High — Verify GitHub Actions version numbers exist
actions/checkout@v6 and actions/upload-artifact@v7 (line 264) are pinned to versions that may not exist. As of my knowledge cutoff the latest stable tags were @v4 for both. The same question applies to actions/setup-dotnet@v5 (line 72) and actions/cache@v5 (line 75). If these tags don't exist, the workflow will fail immediately on every run with a 404.
Run gh release list -R actions/checkout | head -5 etc. to verify, or pin to known-stable versions.
Reviewed diff context:

    private static readonly Dictionary<int, int> NewPayloadParamCounts = Enumerable
        .Range(1, EngineApiVersions.NewPayload.Latest)
        .ToDictionary(v => v, v => (typeof(IEngineRpcModule).GetMethod($"engine_newPayloadV{v}")
            ?? throw new NotSupportedException($"engine_newPayloadV{v} not found on IEngineRpcModule")).GetParameters().Length);
Medium — Reflection-based static initializer can produce opaque TypeInitializationException
If engine_newPayloadV{v} is not found on IEngineRpcModule (e.g. after an API refactor), the throw new NotSupportedException(...) fires during the static initializer, which wraps it in a TypeInitializationException. The real cause is buried in the inner exception and does not surface as a clear test failure — it crashes any test that touches this class with a confusing error.
Consider making this lazy (built on first use) or at least wrapping the static init in a try-catch that rethrows with context:
    private static readonly Dictionary<int, int> NewPayloadParamCounts = BuildNewPayloadParamCounts();

    private static Dictionary<int, int> BuildNewPayloadParamCounts()
    {
        var result = new Dictionary<int, int>();
        for (int v = 1; v <= EngineApiVersions.NewPayload.Latest; v++)
        {
            MethodInfo? method = typeof(IEngineRpcModule).GetMethod($"engine_newPayloadV{v}");
            if (method is null)
                throw new InvalidOperationException($"engine_newPayloadV{v} not found on IEngineRpcModule. Update EngineApiVersions.NewPayload.Latest.");
            result[v] = method.GetParameters().Length;
        }
        return result;
    }
Reviewed diff context:

    #!/bin/bash
    SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
    exec dotnet run --no-build -c Release --project "$SCRIPT_DIR/src/Nethermind/Nethermind.Test.Runner/Nethermind.Test.Runner.csproj" -- "$@"
Medium — Script is missing --no-launch-profile that the workflow uses
The workflow runs dotnet run --no-launch-profile -c Release ... but this script uses dotnet run --no-build -c Release ... without --no-launch-profile. Without that flag, dotnet run picks up the project's launchSettings.json if present, which can set environment variables (e.g. ASPNETCORE_ENVIRONMENT, ports) that differ from CI. Add --no-launch-profile for consistent behavior:
Suggested change:

    exec dotnet run --no-build --no-launch-profile -c Release --project "$SCRIPT_DIR/src/Nethermind/Nethermind.Test.Runner/Nethermind.Test.Runner.csproj" -- "$@"
Reviewed diff context:

        !parseResult.GetValue(Options.ExcludeMemory),
        parseResult.GetValue(Options.ExcludeStack)));
    bool forceJson = isEngineTest || jsonOutput;
    List<EthereumTestResult> results = await RunBlockTestFiles(files, filter, chainId, trace, traceMemory, excludeStack, forceJson, workers);
Medium — Block tests without --jsonout silently produce no stdout output
forceJson = isEngineTest || jsonOutput. When isBlockTest && !jsonOutput, forceJson = false, so the results computed by RunBlockTestFiles are discarded — nothing is written to stdout (the JSON serializer at line 145 is not called, and suppressOutput: true is always passed so per-test console output is suppressed too). A user running nethtest --blockTest --input ./fixtures with no --jsonout flag gets zero output beyond the per-test PASS/FAIL on stderr, with no summary.
Consider either always serializing block test results or explicitly documenting that --jsonout is required for block tests. The workflow always passes --jsonout, but local users will be confused.
Reviewed diff context:

    if (Directory.Exists(path))
        return Directory.GetFiles(path, "*.json", SearchOption.AllDirectories)
            .Where(f => !f.Contains("/.meta/") && !f.Contains("\\.meta\\"))
Low — LINQ used in non-hot path (coding style preference)
Both Where and OrderBy are LINQ. Per the repo coding style, for/foreach is preferred over LINQ. Since this is called once at startup (not a hot path), the impact is negligible, but worth aligning with the project convention.
Suggested change:

    if (Directory.Exists(path))
    {
        List<string> result = [];
        foreach (string f in Directory.GetFiles(path, "*.json", SearchOption.AllDirectories))
        {
            if (!f.Contains("/.meta/") && !f.Contains("\\.meta\\"))
                result.Add(f);
        }
        result.Sort(StringComparer.Ordinal);
        return result;
    }