feat(tests): stateful test filling for benchmark tests by fselmo · Pull Request #2637 · ethereum/execution-specs

fselmo · 2026-04-08T22:41:34Z

🗒️ Description

Adds a new fill-stateful CLI command that generates BlockchainEngineStatefulFixture files by running benchmark tests against a live EL client via testing_buildBlockV1.

Key points:

Connects to a running client loaded with a network snapshot (e.g. perfnet / bloatnet)
Records engine API payloads per test, separated into setup and execution phases
Resets chain state between tests via debug_setHead
Funds test accounts via CL withdrawals — no pre-funded seed key required
Outputs fixture JSON + a pre_run/global_setup.json for session-level setup (factory deploy, etc.)

The generated fixtures are consumed by benchmarkoor to replay benchmarks against any EL client from the same snapshot. Replaces the gas-benchmarks MITMProxy-based approach.

Run against a running client that has testing, eth, debug available and has testing_buildBlockV1 support...

uv run fill-stateful \
    --clean \
    --engine-jwt-secret-file /path/to/jwt \
    --fork Osaka \
    tests/benchmark/stateful/ \
    -k ...

Optional:

--rpc-endpoint — defaults to http://localhost:8545
--engine-endpoint — derived from RPC endpoint with port 8551
--chain-id — auto-detected from the client
--rpc-seed-key — random key generated and funded via CL withdrawal
--output — defaults to ./fixtures

🔗 Related Issues or PRs

#2560

✅ Checklist

All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
just static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Cute animal picture 🙀

Cute Animal Picture

codecov · 2026-04-08T22:51:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.24%. Comparing base (8ae9dcc) to head (1aa5f99).
⚠️ Report is 11 commits behind head on forks/amsterdam.

Additional details and impacted files

@@               Coverage Diff                @@
##           forks/amsterdam    #2637   +/-   ##
================================================
  Coverage            86.24%   86.24%           
================================================
  Files                  599      599           
  Lines                36984    36984           
  Branches              3795     3795           
================================================
  Hits                 31895    31895           
  Misses                4525     4525           
  Partials               564      564

Flag	Coverage Δ
unittests	`86.24% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

jochem-brouwer

Super exciting this is already close to what we would need! This is starting to come close to indeed what the MITM is doing, except now EELS is driving it 😄 👍

Lemme know if there are any questions after this review 😄 👍

jochem-brouwer · 2026-04-16T01:03:18Z

+            next_block_number = int(HexNumber(head_block["number"]) + 1)
+            next_timestamp = int(HexNumber(head_block["timestamp"]) + 1)
+            payload_attributes = self._payload_attributes(
+                next_block_number=next_block_number,


Block number is not part of payload attributes? See also the amsterdam version of PayloadAttributesV4

jochem-brouwer · 2026-04-16T01:04:56Z

+                next_timestamp=next_timestamp,
+                withdrawals=withdrawals,
+            )
+            new_payload = self.testing_rpc.build_block(


API ref: ethereum/execution-apis#747

jochem-brouwer · 2026-04-16T01:06:23Z

+            new_payload = self.testing_rpc.build_block(
+                parent_block_hash=Hash(head_block["hash"]),
+                payload_attributes=payload_attributes,
+                transactions=None,


If the transactions parameter is JSON null, the client MAY build a block from its local transaction pool (mempool). (from https://github.com/ethereum/execution-apis/blob/main/src/testing/testing_buildBlockV1.yaml)
I don't think we want this (?), I think

Suggested change

transactions=None,

transactions=[],

jochem-brouwer · 2026-04-16T01:15:28Z

+@pytest.fixture(scope="session")
+def sender_fund_refund_gas_limit() -> int:
+    """Gas limit for fund/refund transactions (replaces sender)."""
+    return 21_000


(should be read from fork config as this changes in eip2780)

jochem-brouwer · 2026-04-16T01:17:21Z

+    Resolved before ``execute_required_contracts`` via fixture
+    dependency, so this records the pre-setup chain head.
+    """
+    block = eth_rpc.get_block_by_number("latest")


I think this should not be queried by number, this should be a command line argument where you provide the blockhash. A number is ambiguous, if you have a client and you stop the client at chain head, if it then reorgs we are now building on a block which no one has 😅 \

If no argument for snapshot hash provided, it is ok (with warn/log) to read chain head and use that.

I agree, using the latest block can lead to unexpected behavior, like we saw in test_ec_pairing, where the block did not reset correctly to the original snapshot start point.

jochem-brouwer · 2026-04-16T01:25:22Z

+    start_hex = start_block["number"]
+    logger.info(f"Resetting chain to start block {start_hex}")
+    try:
+        debug_rpc.set_head(start_hex)


I'm not sure what this does, would it also set forkchoice in the client correctly? I think this is what should be done instead using standardized APIs:

Let's say we have done the setup and we have reached the start block. Build two testing_buildBlock on top of this start block. Everything should be empty, no withdrawals, no transactions) and all other parameters should be equal except extraData. One of the blocks will be the block hash you use as parentHash in testing_buildBlockV1 and the other one is the reorg block. We never build on top of this blockhash, we only use this to reorg to.

Why? If you try to fcu back to a block which is already in the current canonical chain, then most/all clients will throw. If we use this approach by creating two different blocks which are basically equal except their extraData, we can use it as fork block because everything is the same (RPC calls would also return the correct data on state, except of course last block hash).

This would use existing API and not debug_setHead as I think this differs for all clients and there might be unforeseen side effects.

BTW, this double-block approach is not my idea, it's from @kamilchodola 😄 👍

If something unclear here let us know 😄 👍

jochem-brouwer · 2026-04-16T01:31:21Z

    )


+class StatefulPreRunFixture(CamelModel):


Is this a special class or a class with a special name? I think these prerun fixtures are essentially the same layout as normal fixtures right?

For benchmark/repricing effort we would need to control this presetup also. For repricing and also stateful benchmarks we cannot do test setup right before the benchmark block (this loads all state into memory cache and makes the ops super fast). We would need to create this state already in the snapshot / start block (the block where all tests start from).
To give an example, we would take a recent snapshot of mainnet and now in the prerun fill it will the desired data for repricings (basically setting up the state which we also have on perf-devnet-3, except that this entire state is much bigger and is thus worse than mainnet).

Once this is done, we don't run any tests (yet) and now use the chain state right after this prerun as the snapshot state. This is what we distribute to benchmarkoor. We also fill the benchmark fixtures on top of this.

We do this because now the state which we target in the benchs is already in the snapshot (we do not create it when actually running the benchs). This forces the client to actually read the DB and not use some memory cache.

(Not super urgent because can also write one big test and once test finishes just use that to get the desired snapshot)

This is a special setup phase that already exists in the gas-benchmarks mechanism (which is globals setting, or something else) It's not for individual benchmark setup; rather, it ensures common contracts (like the factory contract using Nick's method) are pre-deployed on the network. Without this initial setup, all subsequent create2 address calculations and contract deployments would fail.

jochem-brouwer · 2026-04-16T01:33:38Z

+    def set_head(self, block_number: str) -> None:
+        """`debug_setHead`: Reset chain head to the given block."""
+        self.post_request(
+            request=RPCCall(method="setHead", params=[block_number])


The fact that this is a number and not a hash indicates that we cannot for instance perform a reorg via this RPC call (we can only roll back into the canonical chain). This would be limiting. Also don´t know if this updates the forkchoice status of client.

jochem-brouwer · 2026-04-16T01:42:47Z

Okay I realized I might be too skeptical about debug_setHead 😅 Ignore that if it's not right.

LouisTsai-Csie

fill_stateful.py is mixing three things, types, pytest hooks, and fixtures, now without clear separation. It's a bit hard to know where to start.

The entry point (pytest_configure) is at L234 behind classes and helpers, and pytest_sessionfinish is way down at L587. Plus, fixtures are declared out of order (e.g., execute_required_contracts at L291 depends on stuff at L406 and L428), so we need to jump around to understand the flow.

I would suggest adding section headers and reorganizing (maybe types → helpers → pytest hooks → fixtures by lifecycle → per-test fixtures) might make it way easier to follow. wdyt?

LouisTsai-Csie · 2026-04-20T10:20:24Z

+@click.command(
+    name="fill-stateful",
+    context_settings={
+        "help_option_names": ["-h", "--help"],


I run uv run fill-stateful --help command and this shows up, is it expected behavior?

(ethereum-execution) caijiacheng@caijiachengs-MacBook-Air wt-forks-op-gas-cost % uv run fill-stateful --help Retrying execution_testing.rpc.rpc.BaseRPC._make_request in 0.5 seconds as it raised ConnectionError: HTTPConnectionPool(host='localhost', port=8545): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10be56e40>: Failed to establish a new connection: [Errno 61] Connection refused')). Retrying execution_testing.rpc.rpc.BaseRPC._make_request in 1.0 seconds as it raised ConnectionError: HTTPConnectionPool(host='localhost', port=8545): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10bedc910>: Failed to establish a new connection: [Errno 61] Connection refused')). Retrying execution_testing.rpc.rpc.BaseRPC._make_request in 2.0 seconds as it raised ConnectionError: HTTPConnectionPool(host='localhost', port=8545): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10bedccd0>: Failed to establish a new connection: [Errno 61] Connection refused')).

Or i should connect to the node first?

Not sure if this the root cause: the kwargs below (see fill_stateful function) is deleted and thus not being passed to the is_help_or_collectonly_mode function

LouisTsai-Csie · 2026-04-20T11:43:51Z

+    Resolved before ``execute_required_contracts`` via fixture
+    dependency, so this records the pre-setup chain head.
+    """
+    block = eth_rpc.get_block_by_number("latest")


I agree, using the latest block can lead to unexpected behavior, like we saw in test_ec_pairing, where the block did not reset correctly to the original snapshot start point.

LouisTsai-Csie · 2026-04-20T13:20:02Z

+    if is_help_or_collectonly_mode(config):
+        return


Not sure - should this check done before the rpc_endpoint and engine_endpoint check

LouisTsai-Csie · 2026-04-20T13:49:12Z

+    if version >= 4 and response.execution_requests is not None:
+        params.append(response.execution_requests)


Suggested change

if version >= 4 and response.execution_requests is not None:

params.append(response.execution_requests)

if version >= 4:

assert response.execution_requests is not None

params.append(response.execution_requests)

I checked Engine API v4+ (link), and found the request field is required. Should we assert its presence instead of ignoring it when it’s None?

LouisTsai-Csie · 2026-04-20T14:06:56Z

+        # Mixed phases: SETUP takes precedence (a block containing
+        # any setup transaction is considered part of the setup).
+        if TestPhase.SETUP in phases:
+            return TestPhase.SETUP


As I recall, some benchmarks include a single setup transaction for 7702 authorizations, followed by the benchmark transactions within the same block. In these cases, we will be labeling the block as a setup block.

This isn’t an issue for gas-benchmarks, since they keep collecting transactions within the same phase until either the buffer reaches the block gas limit or the phase changes. At that point, it flushes the transactions into a block and generate the payload.

LouisTsai-Csie · 2026-04-20T16:27:35Z

Quick question: who's responsible for mounting clients to the snapshot? In gas-benchmarks, Nethermind clients are mounted via overlay, a layer that captures state changes without modifying the snapshot itself. Does this PR have a similar mechanism? If not, where would that be handled?

fselmo · 2026-04-20T22:21:58Z

@LouisTsai-Csie I appreciate the review. This is a working PoC, not something I plan to merge as is. I should've clarified it's still a work in progress and I'm actually making this a lot more fill based, rather than execute based. I will try to update this soon. Your feedback is valuable for future additions, thanks 🙏🏼.

fselmo added 7 commits April 3, 2026 10:40

feat: fill-stateful pytest pluging implementation

bcb9980

fix: test phase extraction; remove AI guesswork

9b0ec20

WIP: debug_setHead resets

98456aa

WIP: separate global setup step

d757897

use withdrawal to fund; default gen seed key

e0b2621

chore: cleanups related to stateful filling

5bd15ef

add --clean support; remove xdist logging

28429a2

fselmo mentioned this pull request Apr 8, 2026

feat: add support for stateful fixture format ethpandaops/benchmarkoor#182

Draft

fselmo force-pushed the feat/stateful-engine-test-filling branch from 2aa1826 to f01b02b Compare April 8, 2026 23:02

fselmo added C-feat Category: an improvement or new feature A-test-benchmark Area: execution_testing.benchmark and tests/benchmark A-test-cli Area: execution_testing.cli A-test-fixtures Area: execution_testing.fixtures labels Apr 8, 2026

chore: fix unit test for stateful fixture

1aa5f99

fselmo force-pushed the feat/stateful-engine-test-filling branch from f01b02b to 1aa5f99 Compare April 8, 2026 23:13

fselmo changed the title ~~feat(tests): stateful engine test filling for benchmark tests~~ feat(tests): stateful test filling for benchmark tests Apr 9, 2026

fselmo marked this pull request as ready for review April 14, 2026 19:28

jochem-brouwer reviewed Apr 16, 2026

View reviewed changes

LouisTsai-Csie requested changes Apr 20, 2026

View reviewed changes

fselmo marked this pull request as draft April 20, 2026 22:20

		if version >= 4 and response.execution_requests is not None:
		params.append(response.execution_requests)

Conversation

fselmo commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🗒️ Description

🔗 Related Issues or PRs

✅ Checklist

Cute Animal Picture

Uh oh!

codecov Bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jochem-brouwer left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jochem-brouwer commented Apr 16, 2026

Uh oh!

LouisTsai-Csie left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LouisTsai-Csie commented Apr 20, 2026

Uh oh!

fselmo commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fselmo commented Apr 8, 2026 •

edited

Loading

codecov Bot commented Apr 8, 2026 •

edited

Loading