feat: state-actor + eest payload generation #248
Merged
Complete the stateful EEST fixture pipeline end-to-end: a builder that generates `blockchain_test_stateful_engine` fixtures and runner support that replays them.

Generation (`benchmarkoor build`):
- `builder.eest_payloads` boots a geth filler on a writable copy of a snapshot datadir and runs fill-stateful (execution-specs #2637) against it, writing stateful fixtures. Only geth supports testing_buildBlockV1.
- Shared builder helpers extracted to pkg/builder/util.go; Dockerfile.eest-filler builds the fill image.

Replay (`benchmarkoor run`):
- Parse the stateful-engine format (snapshot datadir, no genesis) plus the shared pre_run/<startBlockHash>.json files; convert pre_run + per-test setup payloads into setup steps and the benchmark payloads into the measured test step (pkg/eest, pkg/executor/eest_source.go).
- Make eest_fixtures.local_genesis_dir optional: the runner already boots datadir-only with no genesis, so stateful sources need no genesis dir.
…l knobs

Run the state-actor and fill-stateful build containers as the invoking host user so their on-disk output (snapshot datadirs, EEST fixtures) is owned by that user instead of root. This avoids permission-denied failures when a later non-root step reads or cleans that output (e.g. the datadir copy, or wiping output_dir on --force).
- docker.ContainerSpec gains a User field; container creation honours it and defaults to root when empty (existing containers unchanged).
- state-actor and the eest_payloads filler/fill containers set User to the current uid:gid. The uv/pytest-based fill image is made non-root-friendly by redirecting its writable paths (uv cache, pytest cache, $HOME) to /tmp, skipping the runtime venv re-sync, and marking the root-owned /eest git checkout as a git safe.directory.

Also rework the fill-stateful benchmark parametrisation:
- gas_benchmark_values is now a typed int list ([10, 30]) instead of a comma-separated string (joined into --gas-benchmark-values).
- add fixed_opcode_count (float list, thousands of opcodes), mutually exclusive with gas_benchmark_values; an empty list passes the bare --fixed-opcode-count flag (uses the image's .fixed_opcode_counts.json).
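A minimal sketch of how the two mutually exclusive lists could be turned into fill-stateful arguments. The function name `benchArgs`, the nil-versus-empty convention for "unset" versus "bare flag", and the comma-joined encoding of the fixed_opcode_count values are assumptions for illustration; only the two flag names come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// benchArgs is a hypothetical helper. A nil fixedOpcodeCount means "not set";
// an empty, non-nil slice means "pass the bare flag".
func benchArgs(gasValues []int, fixedOpcodeCount []float64) ([]string, error) {
	if len(gasValues) > 0 && fixedOpcodeCount != nil {
		return nil, errors.New("gas_benchmark_values and fixed_opcode_count are mutually exclusive")
	}
	if len(gasValues) > 0 {
		parts := make([]string, len(gasValues))
		for i, v := range gasValues {
			parts[i] = strconv.Itoa(v)
		}
		// The typed int list is joined back into one comma-separated value.
		return []string{"--gas-benchmark-values", strings.Join(parts, ",")}, nil
	}
	if fixedOpcodeCount == nil {
		return nil, nil
	}
	if len(fixedOpcodeCount) == 0 {
		// Bare flag: the fill image falls back to its own counts file.
		return []string{"--fixed-opcode-count"}, nil
	}
	parts := make([]string, len(fixedOpcodeCount))
	for i, v := range fixedOpcodeCount {
		parts[i] = strconv.FormatFloat(v, 'f', -1, 64)
	}
	// Assumption: values are passed comma-joined, like the gas values.
	return []string{"--fixed-opcode-count", strings.Join(parts, ",")}, nil
}

func main() {
	args, err := benchArgs([]int{10, 30}, nil)
	fmt.Println(args, err)
}
```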
Local-directory / tarball EEST sources (e.g. stateful fixtures built locally) have no github_repo, so getGitHubUrl(undefined) threw on the suite-detail page. Guard the helper, make the GitHub repo/link rendering conditional, surface the local_* fields, and badge these sources as "EEST Local".
Make `benchmarkoor build` output consistent with `benchmarkoor run` and easier to follow during the long fixture-generation phase:
- Apply the run command's `🔵` log formatter, and stream each container's stdout/stderr directly in run's `🟣 TS LABEL | name | line` client-log style (CLIE for the filler EL client, BULD for the state-actor and fill-stateful build containers) instead of wrapping them through logrus.
- Reset ANSI state at the end of every streamed line so an unreset bold/color sequence (e.g. pytest's session header) can't bleed into later lines.
- Pass -v to fill-stateful so each test node id and outcome prints as it is built, rather than bare pytest progress dots.
- Print a separate build summary per builder (state-actor vs eest-payloads).
- Trim the per-line logger to the target name; source_dir/output_dir/fork/image are logged once instead of suffixed onto every line.
state-actor direct-writes synthetic state (and the genesis state root) into the client DB and emits a chainspec with an empty alloc, so a client booting that snapshot recomputes a different genesis from the chainspec and aborts. geth boots datadir-only (no genesis) and is unaffected; besu and reth need a flag to trust the DB-resident genesis instead:
- besu: --genesis-state-hash-cache-enabled=true (else "Supplied genesis block does not match chain data stored").
- reth: --debug.skip-genesis-validation (else "genesis hash in the storage does not match the specified chainspec"); the flag's own help names state-actor as the intended use case.

With these, besu replays the stateful fixtures cleanly (with a bootstrap FCU and a stable besu image). reth boots but currently diverges on execution state root (an upstream state-actor-reth snapshot inconsistency).
The fill image previously baked the execution-specs checkout (and its uv venv) at docker build time, so changing the EEST version meant rebuilding the image. Instead, clone the repo at benchmarkoor build time and mount it into the fill container, with the version in config.
- pkg/gitrepo: extract the clone-into-cache logic from executor.GitSource into a shared package (CloneOrUpdate + HeadSHA); GitSource now delegates to it, so the test-source and EEST clones share one cache implementation.
- builder.eest_payloads: clone eest_repo@eest_ref (defaults: execution-specs / forks/amsterdam) into an on-disk cache and bind-mount the checkout at /eest. uv builds the venv into that user-owned checkout on first use (cached across runs); no UV_NO_SYNC needed.
- Dockerfile.eest-filler: slimmed to just the uv/python/git toolchain, with no repo or venv baked in.
- config: add eest_repo / eest_ref with resolvers + docs.
Replace runner.directories.tmp_cachedir with global.directories.cachedir so a single on-disk cache is shared by both commands. It was misnamed (it defaults to ~/.cache/benchmarkoor, not a temp dir) and lived under `runner`, so the build command couldn't use it; the eest builder hard-coded /tmp instead.
- config: add global.directories.cachedir + Config.ResolveCacheDir (default ~/.cache/benchmarkoor); remove runner.directories.tmp_cachedir. tmp_datadir (genuinely ephemeral scratch) stays under runner.
- run: resolve the cache dir once and reuse it for executor sources, cpufreq, and the pre-run log buffer; drop the local getExecutorCacheDir helper.
- runner: rename internal Config.TmpCacheDir -> CacheDir.
- build: resolve the shared cache dir and pass it to the eest builder, which now caches the EEST repo under <cachedir>/eest-repos instead of /tmp.

BREAKING: configs using runner.directories.tmp_cachedir must move to global.directories.cachedir (env: BENCHMARKOOR_GLOBAL_DIRECTORIES_CACHEDIR).
A worked example covering all three stages: build datadirs with state-actor,
generate stateful EEST fixtures (config-driven eest_repo/eest_ref), and replay
them per client. Includes the per-client boot specifics found while bringing
this up (besu hyperledger image, reth pinned digest + skip-genesis-validation,
nethermind sync flags, the genesis chainspec map), parametrised via
${STATE_DIR_PREFIX} / ${EEST_FIXTURES_DIR}.
Builder containers bind-mount host files written under os.TempDir(). On macOS $TMPDIR is /var/folders/… where /var is a symlink to /private/var; Docker Desktop shares /private but not the /var alias, so a single-file mount at the unresolved path silently fails to appear in the container — e.g. the geth filler aborts with "open /tmp/config.toml: no such file or directory". Add mountTempDir() (EvalSymlinks of os.TempDir()) and use it for every mount-source temp file: the eest filler's JWT, geth config.toml, and datadir copy, plus the state-actor inline-spec file. No-op on Linux; on macOS it yields /private/var/folders/… which Docker Desktop shares. (OrbStack/colima share the whole FS and aren't affected either way.)
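The symlink resolution described above amounts to one standard-library call. The sketch below assumes a `(string, error)` signature for `mountTempDir`; the real helper's signature and error handling are not shown in the commit message.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// mountTempDir returns os.TempDir() with symlinks resolved, so a bind-mount
// source on macOS points at /private/var/... rather than the /var alias.
// On Linux the temp dir is normally not a symlink and the path is unchanged.
func mountTempDir() (string, error) {
	return filepath.EvalSymlinks(os.TempDir())
}

func main() {
	dir, err := mountTempDir()
	if err != nil {
		fmt.Println("resolve temp dir:", err)
		return
	}
	fmt.Println("mount-source temp dir:", dir)
}
```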
Let builder.eest_payloads.fill_dockerfile point at a Dockerfile that benchmarkoor builds with the container runtime at build time, instead of requiring a pre-built fill_image. The built image is tagged fill_image when set, else benchmarkoor-eest-fill:local; one of fill_image / fill_dockerfile is required.
- config: add fill_dockerfile + BuildsFillImage/ResolveFillImageTag; require one of the two and reject a missing Dockerfile at config time.
- builder: ensureFillImage builds (once per run, via `<runtime> build`) or pulls, then uses the resulting tag for the fill container.
- example: config.state-actor-eest.yaml now builds Dockerfile.eest-filler, so `benchmarkoor build` is self-contained (no manual docker build).
- docs + config.example updated.
uv run fill-stateful syncs ethereum-execution[optimized], which pulls in rust-pyspec-glue. On arm64 there is no prebuilt wheel, so its native mdbx-sys crate is compiled from source; the build script runs bindgen, which dlopens libclang. The fill image only carried build-essential, so the build failed with "Unable to find libclang ... set the LIBCLANG_PATH environment variable". Add clang, libclang-dev and pkg-config to the image and pin LIBCLANG_PATH=/usr/lib/llvm-14/lib (bookworm's default LLVM major) so bindgen/clang-sys locates libclang deterministically. Cargo is already bootstrapped by uv at build time, so only the clang toolchain was missing.
…n time

On macOS there is no cgroup, so the executor falls back to the Docker Stats API. Two problems compounded there:
1. Correctness: executeRPC read the post-request stats snapshot between http.Do() returning (headers only) and the response body read, i.e. INSIDE the measured timing window. The Docker one-shot ContainerStats(stream=false) call blocks ~1-2s on the daemon's collection cycle, so that latency landed in `duration = bodyReadComplete - wroteRequest`, the value MGas/s is derived from. Every engine_newPayload measured ~2.0s and MGas/s came out ~200x low (e.g. 4.98 MGas/s for a 10M-gas block).
2. Performance: issuing a blocking ~2s one-shot stats request twice per RPC (before + after) made whole runs crawl (~2.5h).

Fixes:
- Move the after-request stats read out of the timing window via a new collectResourceDelta helper; the body read, bodyReadComplete and duration are captured first, so stats-backend latency can never be attributed to the RPC.
- Rewrite dockerReader to stream: one long-lived ContainerStats(stream=true) connection feeds a cached latest sample, so ReadStats() is non-blocking. Includes lifecycle control (context cancel + Close wait), reconnect-on-error and copy-on-read. Add docker_reader_test.go (passes under -race).

Caveat: the Docker stats stream refreshes ~1/s, so per-RPC resource deltas for sub-second calls on macOS are coarse (often 0). Linux still uses the fine-grained cgroup reader. The key properties (accurate MGas/s and fast runs) are restored.
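The two fixes can be sketched together: a reader that only returns a cached sample, and a timing helper that stops the clock before touching the stats backend. Everything here is illustrative: `sample`, `cachedReader`, `timedCall` and `mgasPerSec` are hypothetical names, and the real dockerReader also handles streaming, reconnects and shutdown, which this sketch omits.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sample is a simplified stats snapshot (a monotonic CPU counter only).
type sample struct {
	CPUNanos uint64
}

// cachedReader holds the latest sample pushed by a background stream.
type cachedReader struct {
	mu     sync.RWMutex
	latest sample
}

// update would be called by the streaming goroutine for each new sample.
func (r *cachedReader) update(s sample) {
	r.mu.Lock()
	r.latest = s
	r.mu.Unlock()
}

// ReadStats returns a copy of the cached sample without any network call.
func (r *cachedReader) ReadStats() sample {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.latest
}

// timedCall captures the duration first, then reads the "after" stats, so
// stats latency can never be counted as part of the call.
func timedCall(r *cachedReader, call func()) (time.Duration, sample) {
	before := r.ReadStats()
	start := time.Now()
	call()
	duration := time.Since(start)
	after := r.ReadStats()
	return duration, sample{CPUNanos: after.CPUNanos - before.CPUNanos}
}

// mgasPerSec shows why a ~2s stats stall matters: 10M gas over 2s is 5 MGas/s.
func mgasPerSec(gas uint64, d time.Duration) float64 {
	return float64(gas) / 1e6 / d.Seconds()
}

func main() {
	r := &cachedReader{}
	r.update(sample{CPUNanos: 100})
	d, delta := timedCall(r, func() { r.update(sample{CPUNanos: 150}) })
	fmt.Println(d, delta.CPUNanos)
	fmt.Println(mgasPerSec(10_000_000, 2*time.Second))
}
```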
besu booted from a state-actor snapshot stays in PoS initial-sync and answers SYNCING to every payload until it receives a bootstrap forkchoiceUpdated.
- add runner.client.config.bootstrap_fcu.enabled (required for besu/reth/nethermind; geth is unaffected).
- pin besu to hyperledger/besu:26.6.0: :latest resolves to 26.6.1, which regressed and won't accept the bootstrap FCU on a snapshot. 26.6.0 works.

geth and besu now both pass 36/36 (osaka bn128) on the example.
…aseDbPath
benchmarkoor mounts the datadir at /data and passes nethermind --datadir=/data,
but nethermind's Init.BaseDbPath then defaults to /data/nethermind_db/<network>/.
So nethermind opened a fresh empty state DB there and ignored the state-actor
snapshot written directly under /data/{state,blocks,headers,blockInfos}. It then
built genesis from the empty-alloc chainspec (whose baked-in stateRoot makes the
genesis hash match, masking the problem), leaving the state DB empty — so
StateReader.HasStateForBlock(genesis) is false, NewPayloadHandler.ShouldProcessBlock
returns false, and every replayed engine_newPayload answers SYNCING (the client
beacon-syncs instead of executing).
Passing --Init.BaseDbPath=/data makes nethermind read the snapshot DBs directly
("Detected HalfPath key scheme"). nethermind now passes 36/36 on the example,
matching geth and besu. No state-actor change is required (state-actor's own
oracle already sets BaseDbPath to the datadir). Scoped to the example's
nethermind extra_args so the global client DefaultCommand still uses --datadir.
reth remains unresolved (separate execution-state divergence, not a path issue).
benchmarkoor's client registry and state-actor both already support ethrex, but the state_actor builder's config validation rejected it as a target client, so it couldn't be used end-to-end.
- allow "ethrex" in stateActorSupportedClients (pkg/config) + update the validation error message.
- add ethrex to the state-actor EEST example: builder image ghcr.io/ethereum/state-actor-ethrex:main, a target + datadir, the ethrex-genesis.json sidecar under runner.client.config.genesis, and an instance (ghcr.io/lambdaclass/ethrex:latest) with --skip-genesis-validation (ethrex >= v16.0.0 is required for that flag, needed because the snapshot commits a synthetic state root the empty-alloc genesis can't reproduce). Like reth, ethrex has no RPC rollback, so the replay reorgs via the engine API.
- document ethrex in docs/configuration.md.

Verified: ethrex passes 36/36 on the example, matching geth, besu, nethermind and reth.
## Problem
With besu in `config.state-actor-eest.yaml`, the bootstrap
`engine_forkchoiceUpdatedV3` is
rejected with `SYNCING` — benchmarkoor retries then aborts before any
payload runs. geth, reth,
and nethermind are unaffected.
## Root cause
besu returns `SYNCING` from `forkchoiceUpdated` while
`mergeContext.isSyncing()` is true. On an
isolated snapshot node that hinges on
`SyncState.reachedTerminalDifficulty`, which is set only
when besu's **synchronizer** runs (`DefaultSynchronizer`). This config
passes
`--p2p-enabled=false`, which suppresses the synchronizer, so the flag is
never set and besu
treats the (post-merge) snapshot head as pre-merge → `SYNCING`.
(besu ≤ 26.6.0 masked this: its `isSyncing()` also required
`!isInSync()`, and `isInSync()` is
true here, so it short-circuited to `VALID`. 26.6.1 returns `SYNCING` on
the unset
terminal-difficulty flag alone.)
## Fix
Enable p2p on the besu instance via `extra_args`:
```yaml
- id: besu
  image: hyperledger/besu:26.6.1
  extra_args:
    - --p2p-enabled=true
```
`--max-peers=0` + `--discovery-enabled=false` (besu client defaults)
keep the node isolated — p2p
initializes but no real peers connect. Also bumps besu to the latest
release (26.6.1).
## Verification
- besu **26.6.1** + `--p2p-enabled=true`: bootstrap FCU `VALID`,
**36/36** (`benchmark/compute`, bn128).
- besu **26.6.0** + `--p2p-enabled=true`: **36/36** (no regression).
Also documents the requirement in `docs/configuration.md` (Bootstrap FCU
section).
EL-client and build-tool container output both streamed with the same 🟣 prefix, making them hard to tell apart when the filler client and fill-stateful/state-actor stream side by side. Keep 🟣 for CLIE (matches benchmarkoor run client logs) and give BULD its own 🟠.
Add erigon to stateActorSupportedClients so the builder accepts it, update the validation error message and supported-clients comment, and wire it through the config.state-actor-eest.yaml example (state_actor image + target, plus runner genesis/datadir/instance). dbPath already handles erigon correctly: only geth needs the /geth/chaindata suffix; erigon takes the datadir root like the other non-geth clients. NOTE: state-actor does not publish a state-actor-erigon image yet, so the build/run path is wired but untested end-to-end against a real datadir.
…actor snapshots

state-actor pins erigon at the bal-devnet-7 commit and bakes that binary into its image; its `erigon init` encodes the MDBX chain config in that build's format (chainId as a JSON string). Booting the snapshot with stable erigon (erigontech/erigon:latest, 3.4.x) panics with "cannot unmarshal \"1337\" into a *big.Int". Point the erigon instance at ethpandaops/erigon:bal-devnet-7, which reads the config correctly and supports the amsterdam/BAL (EIP-7928) fork these benchmarks target. Verified end-to-end: build (state-actor-erigon:main) + run replay 180 engine_newPayloadV4 / forkchoiceUpdatedV3 calls with no SYNCING/panic.
… geth still the only working one)

fill-stateful forces use_testing_build_block=True, so a filler must implement the testing_buildBlockV1 RPC plus debug_setHead (the per-test chain rewind) and return EEST-compatible receipts. Add per-client filler boot commands (fillerCommand dispatcher + besu/nethermind argv) carrying the same snapshot-boot workarounds the runner uses (besu --p2p-enabled, nethermind --Init.BaseDbPath), the testing namespace each needs (besu TESTING, nethermind Testing module), and a non-zero session-tip pin for besu (its eth_maxPriorityFeePerGas is 0 on a fresh snapshot). Move filler_image onto each target so targets can fill with different clients/images.

besu and nethermind are accepted by validation and fully plumbed, but both are blocked upstream today, so their example targets are commented out:
- besu: testing_buildBlockV1 (TESTING, besu-eth/besu#9838) + debug_setHead + session fees all work, but its self-built block fails its own engine_newPayloadV4 with a World-State-Root mismatch.
- nethermind: testing_buildBlockV1 works via the Testing module + special image, but debug_setHead is unimplemented and its receipt trips EEST's strict TransactionReceipt model.

geth remains the only filler that produces fixtures end-to-end.
…n_genesis Add a fork_activation_genesis field to eest_payloads targets so a snapshot built at one fork (e.g. state-actor's osaka) can fill fixtures for the next fork (e.g. amsterdam). The builder reads the snapshot genesis block's timestamp and boots the geth filler with --override.<fork>=<timestamp + 1>, so the fork activates on the first block the filler builds (block N+1) while block 0 stays on the prior fork. --override.genesis can't be used here: state-actor bakes the snapshot state into the DB under an empty-alloc genesis, so geth recomputes an empty-state genesis block and rejects the hash mismatch. The per-fork --override.<fork> flag amends only the in-memory chain config, leaving the genesis block untouched. This needs a geth build that registers the flag (e.g. ethpandaops/geth:bal-devnet-7-amsterdam-override for amsterdam). Adds the example config.state-actor-eest.amsterdam.yaml wiring an osaka snapshot to an amsterdam fill.
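The flag derivation is simple arithmetic on the snapshot genesis timestamp. A minimal sketch, where `forkOverrideFlag` is a hypothetical helper name and reading the timestamp from the datadir is left out:

```go
package main

import "fmt"

// forkOverrideFlag builds the per-fork override so the fork activates one
// second after the snapshot genesis block, i.e. on the first block the
// filler builds, while block 0 stays on the prior fork.
func forkOverrideFlag(fork string, genesisTimestamp uint64) string {
	return fmt.Sprintf("--override.%s=%d", fork, genesisTimestamp+1)
}

func main() {
	// With a genesis timestamp of 0 the fork activates at timestamp 1.
	fmt.Println(forkOverrideFlag("amsterdam", 0))
}
```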
EEST's fill bootstrap deploys the Arachnid deterministic-deployment proxy via a keyless tx with a fixed 100k gas limit. Amsterdam's state-expansion gas (EIP-7928/8037) makes that deploy cost ~324k, so the keyless method can't be used. Predeploy the proxy at the canonical 0x4e59…956c via state-actor's create2_factory template (explicit address, ordered before the 2 GB bloat entity so it survives target_size truncation); EEST detects the existing bytecode and skips the keyless deploy.
Fill amsterdam compute fixtures against a geth filler while the state-actor snapshot stays osaka:
- Point at skylenet/execution-specs devnets/bal/7-bench-cap-aware-deploy (cap-aware deploy gas sizing + seed-funding skip).
- Use the published skylenet/geth:bal-devnet-7-amsterdam-override image, which registers --override.amsterdam to fork at snapshot block ts + 1.
- Pre-fund fill-stateful's seed (0x7e5f…bdf) in the snapshot and pin rpc_seed_key so the withdrawal funding block is skipped and start_block == the snapshot block (stable per-test debug_setHead rewind).
- Pre-deploy the Arachnid CREATE2 factory in the snapshot.
- Filter out the currently-failing compute tests (blockhash pathdb rewind limit, blobhash blob-tx panic, amsterdam gas-repricing mismatches) so the fill is green.
Wire the runner's geth instance to execute the filled amsterdam stateful-engine fixtures:
- Use skylenet/geth:bal-devnet-7-amsterdam-override (same image used to fill) so geth registers --override.amsterdam.
- Add --override.amsterdam=1 so amsterdam activates at snapshot block 0's timestamp + 1; geth boots from the osaka snapshot datadir and would otherwise reject the amsterdam newPayloadV5 blocks.

Validated: geth replays the full green set (830 passed, 0 failed).
…am clients

Add per-instance genesis patching so clients that read their fork schedule from the genesis (rather than a CLI flag like geth/erigon's --override.amsterdam) can activate amsterdam on an osaka state-actor snapshot at boot, with no manual chainspec editing:
- genesis_fork_override (geth-format): sets config.<fork>Time and inherits the blobSchedule entry of the latest preceding fork. Used by besu, reth, ethrex.
- genesis_eip_override (parity/nethermind-format): sets the listed EIPs' params.eip<N>TransitionTimestamp. The devnet-specific EIP list lives in config. Used by nethermind.

Both rewrite only the config/params object and preserve every other genesis field verbatim (json.Number round-trip), so the genesis block hash is unchanged and stays compatible with the snapshot datadir.

Wire all six clients in config.state-actor-eest.amsterdam.yaml from the benchmarkoor-tests *-bal-full reference. Amsterdam is activated three ways: geth/erigon via --override.amsterdam, besu/reth/ethrex via genesis_fork_override, nethermind via genesis_eip_override (devnet-7 EIP set 7708,7778,7843,7928,7954,7976,7981,8024,8037).

Verified on the full green compute set (830 tests): geth, besu, ethrex, nethermind, erigon all 830/0; reth has 8 amsterdam warm-storage/tload repricing failures (client/spec discrepancy, not a config issue).
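A sketch of the geth-format patch using a json.Number round-trip, so numeric fields outside `config` keep their exact literal. `patchForkTime` is a hypothetical name, and the blobSchedule inheritance step is deliberately omitted here; note that re-marshalling a map may reorder keys, which changes the file bytes but not any field value.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// patchForkTime sets config.<fork>Time in a geth-format genesis and leaves
// every other field's value untouched.
func patchForkTime(genesis []byte, fork string, ts uint64) ([]byte, error) {
	dec := json.NewDecoder(bytes.NewReader(genesis))
	dec.UseNumber() // keep numbers as literals instead of float64
	var doc map[string]any
	if err := dec.Decode(&doc); err != nil {
		return nil, err
	}
	cfg, ok := doc["config"].(map[string]any)
	if !ok {
		// A parity/nethermind-format chainspec has "params" instead.
		return nil, errors.New("genesis has no config object (wrong format?)")
	}
	cfg[fork+"Time"] = ts
	return json.Marshal(doc)
}

func main() {
	in := []byte(`{"config":{"chainId":1337},"gasLimit":"0x1c9c380","timestamp":12345678901234567890}`)
	out, err := patchForkTime(in, "amsterdam", 1)
	if err != nil {
		fmt.Println("patch failed:", err)
		return
	}
	fmt.Println(string(out))
}
```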
The 2 GB padding entity (and its target_size truncation-ordering constraints) is not needed for the compute benchmarks. Keep only the functionally-required entities: the CREATE2 factory and the pre-funded fill-stateful seed.
Like geth, erigon reads its fork schedule from the datadir (MDBX) rather than the genesis file, so a patched genesis is ignored and amsterdam stays inactive; the --override.amsterdam flag is required. Document this next to the flag so it isn't swapped for genesis_fork_override.
…verride Add the two per-instance genesis override options to the Client Instances table plus a "Genesis Fork & EIP Overrides" subsection explaining the geth-format vs parity/nethermind formats, blobSchedule inheritance, the geth/erigon caveat (they read forks from the datadir, not the genesis), and the wrong-format error.
…thermind:master Drop the stale note about the special testing_build_block_with_opcode_tracing image and its debug_setHead limitation; the standard master image ships the Testing module. Target stays commented out. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
… config its own snapshot dir
global.env declares key:value pairs usable as ${VAR} / ${VAR:-default}
substitutions throughout the file — a per-config default for an env var of the
same name. Resolution order is shell env, then global.env, then the inline
default, so configs stay overridable (e.g. STATE_DIR_PREFIX=/mnt/big in CI).
Read from the raw YAML before expansion so the keys keep their case (Viper
lowercases map keys); a value may itself reference the shell env.
Use it to give each state-actor EEST example its own snapshot dir under
/tmp/benchmarkoor/state-actor/<name> via global.env.STATE_DIR_PREFIX, so configs
that build different snapshots no longer collide. All ${STATE_DIR_PREFIX:-...}
refs collapse to a bare ${STATE_DIR_PREFIX}.
Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
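The resolution order for `${VAR}` / `${VAR:-default}` (shell env, then global.env, then the inline default) can be sketched as a three-step lookup. `resolveVar` is a hypothetical name, and treating a set-but-empty shell variable as "set" is an assumption; the actual expansion code may handle that case differently.

```go
package main

import (
	"fmt"
	"os"
)

// resolveVar applies the precedence: shell env > global.env > inline default.
// lookup is injected so the order can be exercised without touching os.Environ.
func resolveVar(name, inlineDefault string, globalEnv map[string]string,
	lookup func(string) (string, bool)) string {
	if v, ok := lookup(name); ok {
		return v // shell env wins, e.g. STATE_DIR_PREFIX=/mnt/big in CI
	}
	if v, ok := globalEnv[name]; ok {
		return v // per-config default declared under global.env
	}
	return inlineDefault // the part after ":-", possibly empty
}

func main() {
	globalEnv := map[string]string{
		"STATE_DIR_PREFIX": "/tmp/benchmarkoor/state-actor/example",
	}
	fmt.Println(resolveVar("STATE_DIR_PREFIX", "", globalEnv, os.LookupEnv))
}
```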
…ng, inline address_stubs, besu filler

- global.env: new section under Environment Variables (resolution order, per-config-default semantics) + a Global Settings table row + ToC entries.
- eest_payloads.config: add the now-hoistable tests, filter, marker, address_stubs and address_stubs_file; note the address-stubs pair hoists as a unit, with an inline example. Move them out of the per-target rows into the hoistable footnote.
- besu is now a supported filler (merged TestingBuildBlockV1 coinbase fix, ethpandaops/besu:bal-devnet-7); update the filler-client notes.

Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…rget flags
Per-builder target filters that scope a single builder's targets by name,
complementing the existing cross-builder --target. A target is built when it
passes the global --target filter AND the per-builder limit for the builder
that owns it; an unset filter is unrestricted. Filter values that name no
existing target error out (the per-builder limits are checked against only
that builder's target names, so a typo or wrong-builder name surfaces).
Makes "build just one client end-to-end" ergonomic:
build --limit-state-actor-target nethermind \
--limit-eest-payload-target payload-generator-nethermind
Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
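The filter semantics for a single builder can be sketched as an intersection with typo detection. `selectTargets` and `toSet` are hypothetical names; validation of the global --target values across all builders is not modelled here.

```go
package main

import "fmt"

// toSet turns a name list into a membership set.
func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

// selectTargets returns the builder's targets that pass both the global
// filter and this builder's limit. An empty filter is unrestricted. A limit
// value that names none of this builder's targets is an error.
func selectTargets(targets, global, limit []string) ([]string, error) {
	known := toSet(targets)
	for _, name := range limit {
		if !known[name] {
			return nil, fmt.Errorf("limit names unknown target %q", name)
		}
	}
	g, l := toSet(global), toSet(limit)
	var out []string
	for _, t := range targets {
		if (len(g) == 0 || g[t]) && (len(l) == 0 || l[t]) {
			out = append(out, t)
		}
	}
	return out, nil
}

func main() {
	out, err := selectTargets([]string{"geth", "nethermind"}, nil, []string{"nethermind"})
	fmt.Println(out, err)
}
```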
…dirs
Mirror the STATE_DIR_PREFIX treatment for fixtures: each config declares a
unique EEST_FIXTURES_DIR base (/tmp/benchmarkoor/eest-fixtures/<config-name>)
in global.env, and every eest target writes to ${EEST_FIXTURES_DIR}/<client>.
Replaces the per-client EEST_FIXTURES_DIR_BESU / EEST_FIXTURES_DIR_NM vars and
the divergent inline defaults, so configs that build different fixtures no
longer collide under /tmp. The runner replays ${EEST_FIXTURES_DIR}/geth.
Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
… with EEST_FIXTURES_DIR Both are per-config base dirs with <base>/<client> subdirs, so drop the redundant _PREFIX suffix. Updates the four example configs, the global.env docs example, and the config test. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…RES_RUNNER_SOURCE
EEST fixtures are client-agnostic, so the runner only needs one filler's set.
Add EEST_FIXTURES_RUNNER_SOURCE (default geth) to global.env and point the
runner at ${EEST_FIXTURES_DIR}/${EEST_FIXTURES_RUNNER_SOURCE}, so replaying a
cross-client filler's fixtures (e.g. nethermind's) is a one-var override
instead of a config edit.
Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
… flags Skip a whole builder for a build invocation: --skip-state-actor-build runs only eest_payloads, --skip-eest-payload-build runs only state_actor. Useful for re-filling fixtures against an existing snapshot without rebuilding it, or vice-versa. A skipped builder is not constructed; its --limit-*-target is then ignored, and skipping every configured builder is an error. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
The filler images are moving tags (nethermind:master, besu:bal-devnet-7, …), so if-not-present served a stale cached image once pulled. Use always so each build refreshes the filler image to its latest digest. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…d of the snapshot The "No pre_run file for stateful fixture's start block" warning fired for every stateful fixture whose start block equals the snapshot block — the common case when a pre-funded seed lets fill-stateful skip the funding block, where there are no pre_run blocks to replay. Suppress it there (extracted as statefulPreRunMissing) and only warn when start != snapshot, the case where skipping the snapshot→start advance would replay against the wrong state. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…msterdam stateful Recreate the container and roll back the snapshot datadir between tests. Add the same line to full.amsterdam.stateful.yaml but commented out, since the full config's snapshots are large and container-recreate rollback is opt-in there. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…terdam stateful config ethrex builds a snapshot only — it is not a fill-stateful filler (geth/besu/ nethermind are), so it gets a state_actor target but no eest_payloads target. Built and validated in the OrbStack VM (~436 GB datadir). Uncomments the ethrex state-actor image, adds the target with its on-disk size, and refreshes the header now that geth + nethermind + ethrex are enabled by default. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…tateful config Validated in the OrbStack VM: ethrex (v15.0.0-bal-devnet-7) boots on its state-actor snapshot via overlayfs and replays the stateful fixtures (4/4 tstore tests VALID). Uncomments the ethrex genesis, datadir and runner instance. Note: ethrex doesn't support the rpc-debug-setHead rollback, so benchmarkoor skips the between-test rollback — benign here because each EEST fixture re-anchors its setup to the snapshot start block. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
Validated in the OrbStack VM: reth (2.3.0-dev / bal-devnet-7) replays the full 225 stateful bloatnet tests against its state-actor snapshot (~353 GB), 0 fail. Uncomments the reth state-actor image+target and the reth genesis/datadir/runner instance. Note: reth keeps state in one monolithic MDBX file, so its overlayfs upper must sit on a real disk (point TMPDIR at one); the default tmpfs /tmp is too small and reth dies with "No space left on device" copying the DB up. RocksDB/Pebble clients (geth/nethermind/besu/ethrex) are unaffected. Like ethrex, reth has no rpc-debug-setHead so the per-test rollback is skipped (benign — fixtures re-anchor to the snapshot start block). Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…tmp_datadir
Use the dedicated config field (runner.directories.tmp_datadir, exposed as
${TMP_DATADIR}) for the overlay/copy scratch dir instead of the blunt TMPDIR
env. reth's monolithic MDBX gets copied into this dir, so on a tmpfs /tmp
(e.g. the OrbStack VM) point TMP_DATADIR at a real disk. Verified in the VM:
the overlay base lands under TMP_DATADIR and reth runs clean.
Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…eady Previously waitForRPC polled the filler endpoint for the full 15-minute fillerReadyTimeout even when the container had already died on boot (panic, bad flags, OOM, corrupt datadir), hanging at 'Waiting for filler client RPC to become ready'. Race RPC-readiness against container exit via WaitForContainerExit and abort immediately with the exit code.
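Racing readiness against container exit maps naturally onto a `select`. The sketch below uses plain channels in place of the real RPC poller and WaitForContainerExit; `waitForReady` and its signature are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForReady returns nil once the RPC is ready, or an error as soon as the
// container exits or the timeout fires, whichever happens first.
func waitForReady(ready <-chan struct{}, exited <-chan int, timeout <-chan time.Time) error {
	select {
	case <-ready:
		return nil
	case code := <-exited:
		return fmt.Errorf("filler container exited with code %d before RPC became ready", code)
	case <-timeout:
		return errors.New("timed out waiting for filler client RPC")
	}
}

func main() {
	ready := make(chan struct{})
	exited := make(chan int, 1)
	exited <- 137 // simulate a container that died on boot
	err := waitForReady(ready, exited, time.After(time.Second))
	fmt.Println(err)
}
```

With this shape a boot failure surfaces immediately instead of after the full readiness timeout.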
The OrbStack machine had no `make`, so the repo's Makefile targets (`make build-core`, `make test-core`, …) couldn't run inside it. Add make and git (git also stamps version/commit) to the apt install. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
Switch the provisioner from an ad-hoc `go build -o /usr/local/bin/benchmarkoor` to `make build-core` so the VM build matches the canonical Makefile output (bin/benchmarkoor, with version/commit ldflags) and a later `make build-core` rebuild isn't shadowed by a stale /usr/local/bin copy. Symlink bin/benchmarkoor onto PATH so a bare `benchmarkoor` still works and always resolves to the latest build. Drops the now-redundant GO_BUILD_TAGS plumbing (the Makefile owns the tags) and the GOFLAGS=-mod=mod that was rewriting go.mod. Usage text updated to note the binary lives in bin/. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…rges restoreAddressStubsKeyCasing captured only the last file's eest_payloads.config block, but Viper deep-merges that map across --config files — so a config.address_stubs defined in an earlier file kept its Viper-lowercased keys when a later file touched a different config field, silently breaking EEST's exact-match stub resolution. Accumulate the config-block stubs from every file (later wins per key), mirroring the merge. Adds a two-file regression test. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
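The accumulation can be sketched as a per-key merge over the files in --config order. `mergeStubs` is a hypothetical name and the `map[string]string` shape for address stubs is an assumption made for illustration.

```go
package main

import "fmt"

// mergeStubs accumulates case-preserved stub keys from every config file.
// Files are given in load order, so a later file wins for the same key,
// mirroring the deep merge described above.
func mergeStubs(perFile []map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, stubs := range perFile {
		for key, addr := range stubs {
			merged[key] = addr
		}
	}
	return merged
}

func main() {
	first := map[string]string{"MyToken": "0x01"}
	second := map[string]string{} // later file touches another field only
	fmt.Println(mergeStubs([]map[string]string{first, second}))
}
```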
The container log-streaming goroutine (spawned by RunInitContainer, which returns on ContainerWait without joining it) can still Write() into the fill's error-tail buffer while runFill reads String() on the non-zero-exit path — a data race on the unsynchronized bytes.Buffer. Guard Write/String with a mutex. Adds a concurrent write/read test (run with -race). Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
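A minimal sketch of a mutex-guarded buffer of the kind described; `syncBuffer` is a hypothetical name and the real type may expose more methods.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// syncBuffer serialises Write and String so a log-streaming goroutine and a
// reader on the error path cannot race on the underlying bytes.Buffer.
type syncBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

// Write implements io.Writer under the lock.
func (b *syncBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.Write(p)
}

// String returns a snapshot of the buffered content under the lock.
func (b *syncBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.String()
}

func main() {
	var b syncBuffer
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b.Write([]byte("x"))
		}()
	}
	wg.Wait()
	fmt.Println(len(b.String()))
}
```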
…d in cleanup The eest_payloads builder creates a shared "benchmarkoor-build" docker/podman network but nothing removed it, leaving a dangling network after every build. Remove it best-effort in the build teardown (fresh context so it runs even on SIGINT), and add it to `benchmarkoor cleanup` (detected, previewed, removed with the other resources). Adds a NetworkExists method to the container-manager interface (docker + podman impls) so cleanup can detect a lone network without its preview/empty-check skipping it; exports the network name as builder.EESTBuildNetwork. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…mary

ConvertStatefulFixture's test only checked SetupLines[0]'s method name (engine_newPayloadV4 for every payload), so a regression that put the fixture's setup block before the shared pre_run blocks would still pass. Assert the pre_run→setup→benchmark order by execution-payload blockHash. Add TestSummarise (exit-code aggregation / failed-target naming) plus a TestMain to init the cmd package logger so log-calling functions don't nil-panic under test. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
The upstream state-actor-besu image fixes the commitment-phase OOM that previously killed the besu snapshot build, so besu is now buildable (~329 GB) and validated end-to-end in the OrbStack VM: 225/225 stateful bloatnet tests, 0 failed, 0 BAL mismatches (besu also supports the rpc-debug-setHead rollback, unlike reth/ethrex). Enable the besu state-actor image+target and the besu genesis/datadir/runner instance; refresh the header now that geth/nethermind/ ethrex/reth/besu are all enabled (~1.7 TB) and only erigon stays commented (OOM). Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…just a string)

The spec was a "|" block scalar (a string), so editors gave it no YAML syntax highlighting. Allow it to be authored as a structured YAML mapping instead. The field is excluded from Viper decoding (Viper can't decode a mapping into a string, and a Viper round-trip would coerce numbers to float and drop comments); normalizeStateActorSpec re-parses the raw YAML with yaml.Node and serializes the spec body: a mapping is re-marshaled, a "|" block scalar is taken verbatim, so both forms keep working. Re-parsing preserves number formatting (e.g. 2_000_000_000, 335545082), value casing (mixed-case addresses) and comments. Converts the four state-actor example specs to structured YAML and updates the docs. Validated end-to-end: state-actor consumes the re-serialized spec and builds a snapshot successfully. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
…he same runner instance

EEST payload targets already validate that the two genesis overrides are mutually exclusive (they patch different genesis formats: geth-format <fork>Time vs parity eip<N>TransitionTimestamp), but the runner ClientInstance carried the same fields with no such check. Setting both only failed at boot when the override apply rejected the genesis format; now it's caught at config-validation time with a clear message, mirroring the target-side check. Claude-Session: https://claude.ai/code/session_0143C1YkFtZ7gYbYkxbadHVw
skylenet added a commit that referenced this pull request on Jun 30, 2026
Follow-up to #248. Two related changes that, together, let CI exercise the full `state-actor` → `eest_payloads` → `run` pipeline on a GitHub-hosted runner.

## 1. Build the fill image from an embedded Dockerfile by default

`builder.eest_payloads.fill_image` and `fill_dockerfile` are now **both optional**. When neither is set, benchmarkoor builds the fill image from a `Dockerfile.eest-filler` **embedded in the binary** (`go:embed`). The Dockerfile copies nothing from the build context (execution-specs is bind-mounted at run time), so it builds against an empty temp dir written by `resolveFillDockerfile` and cleaned up afterwards.

- Moved `Dockerfile.eest-filler` into `pkg/builder/` so it can be embedded (`.dockerignore` doesn't exclude it, so the in-image Go build still embeds it).
- `BuildsFillImage()` now also builds when no `fill_image` is configured; dropped the "one of fill_image/fill_dockerfile is required" validation.
- Dropped the now-unnecessary `fill_dockerfile` line from the example configs; updated docs + `config.example.yaml`.
- Tests: `TestEmbeddedFillDockerfile`, `TestResolveFillDockerfile` (configured + embedded paths, with cleanup), `TestBuildsFillImage` truth table.

## 2. state-actor + eest build/run e2e via the composite action

- **`action.yaml`**: optional build phase (`build-config` / `build-config-urls` / `build-args`) that runs `benchmarkoor build` before the existing run step. Existing jobs are unaffected (the phase is skipped when no build config is passed).
- **`ci.action.yaml`**: new `state-actor` job that builds OSAKA nethermind + besu snapshots, fills the compute `bn128` subset with the besu filler, then replays the fixtures on nethermind and besu. `datadir_method: copy` + a 256 MB snapshot keep it within an `ubuntu-latest` runner. Runs on PRs (consistent with the other action jobs).
## Validation

- Full `go test ./...` green (except the pre-existing macOS-only `/proc/mounts` schelk tests); `golangci-lint --new-from-rev=origin/master` 0 issues; actionlint clean.
- **VM end-to-end with the embedded Dockerfile and no `fill_dockerfile`**: fill image built from the embed → 36/36 filled (osaka bn128), nethermind + besu snapshots OK, replay 72/72 (36 each), 0 failed, including verifying `cleanup_on_start` does not wipe the freshly-built snapshots.
Summary
Adds a `benchmarkoor build` stage that materialises the artifacts EEST stateful benchmarks need, plus the runner-side support to replay them. Two builders run decoupled from `benchmarkoor run`, in declaration order:

- `builder.state_actor`: uses state-actor to write per-client genesis snapshot datadirs directly in each EL's native on-disk format (geth Pebble, reth/erigon MDBX, besu/nethermind RocksDB, ethrex), bypassing the normal genesis-replay path.
- `builder.eest_payloads`: runs EEST `fill-stateful` against a filler client booted on a snapshot, recording engine-API payloads as benchmark fixtures.

`benchmarkoor run` then replays the generated `blockchain_test_stateful_engine` fixtures against the pristine snapshot.

What's included

`benchmarkoor build` command

- `--target` (by name, across builders), `--limit-state-actor-target` / `--limit-eest-payload-target` (scope a single builder), `--skip-state-actor-build` / `--skip-eest-payload-build`, `--force`.

Stateful fixture replay (runner)

- `blockchain_test_stateful_engine` support

Config surface

- `global.env`: config-local `${VAR}` defaults (shell env still wins).
- `eest_payloads.config` hoisting of `tests`, `filter`, `marker`, `address_stubs`, `address_stubs_file` (per-target override; the address-stubs pair hoists as a unit).
- `address_stubs` alongside `address_stubs_file`.
- `marker` (pytest `-m`).
- `genesis_fork_override` / `genesis_eip_override` to activate a fork the snapshot doesn't schedule (besu/reth/ethrex/nethermind).

UI

Examples & docs

- `config.state-actor-eest.*` example configs (osaka/amsterdam × compute/stateful, plus the full bloatnet/repricing stateful).
- `.hack/orbstack.sh` to provision an OrbStack Linux VM (overlayfs + dedicated dockerd) for the large snapshots.
- `docs/configuration.md` builder reference (state_actor, eest_payloads, build flags, global.env).

Validation
End-to-end in an OrbStack Linux VM against the full ~200 GB bloatnet/repricing prestate:
`TestingBuildBlockV1` coinbase fix is merged (`ethpandaops/besu:bal-devnet-7`); the besu snapshot build is in progress.