feat(test-benchmark): add worst-case depth attack benchmarks for Ethereum state tries using deterministic deploy (ethereum#1976)

marioevz · CPerezz · CPerezz · commit 250bd04c70f7 · 2026-02-27T17:11:08.000+01:00
* feat: add worst-case depth attack benchmarks for Ethereum state tries

  This PR introduces comprehensive benchmarks to test Ethereum clients under worst-case scenarios involving extremely deep state and account tries.

  The attack scenario:
  - Pre-deployed contracts with deep storage tries (depth=9) maximizing traversal costs
  - CREATE2-based deterministic addressing for reproducible benchmarks
  - AttackOrchestrator contract that batches up to 2,510 attacks per transaction
  - Tests measure state root recomputation impact when modifying deep slots

  Key components:
  - depth_9.sol, depth_10.sol: Contracts with deep storage tries
  - s9_acc3.json: Pre-computed CREATE2 addresses and auxiliary accounts (15k contracts)
  - AttackOrchestrator.sol: Optimized attack coordinator (3,650 gas per attack)
  - deep_branch_testing.py: EEST test harness for pre-deployed contracts
  - README.md: Complete documentation and setup instructions

  Performance optimizations:
  - Reduced gas forwarding from 50k to 3,650 per attack (8.3x throughput increase)
  - MAX_ATTACKS_PER_TX increased from 303 to 2,510
  - Precise EVM opcode cost analysis with safety margins
  - Read init_code_hash directly from JSON instead of recompiling

  Deployment setup and instructions available at: https://gist.github.com/CPerezz/44d521c0f9e6adf7d84187a4f2c11978

  This benchmark helps identify performance bottlenecks in state trie handling and validates client implementations under extreme depth conditions.

* fix(AttackOrchestrator): increase gas forwarded to 5300 for SSTORE

  The attack() call was forwarding only 3650 gas, which is insufficient for SSTORE operations on cold storage slots. SSTORE requires:
  - 2100 gas for cold slot access
  - 2900 gas for zero-to-nonzero write
  - Plus dispatch overhead (~200 gas)

  Updated to forward 5300 gas to ensure SSTORE succeeds.

* feat(Verifier): add contract for post-attack storage verification

  Adds a minimal Verifier contract that checks if a target contract's deepest storage slot was updated to the expected attack value. This enables the test to verify attack success without expensive post-state checks on all attacked contracts.

  The verify() function calls getDeepest() on the target and compares the returned value against the expected attack value.

* refactor(deep_branch_testing): use CREATE2 address derivation and fix gas

  Major refactor of the depth benchmark test for execute mode:
  - Remove stubs dependency; derive contract addresses directly from init_code_hash + Nick's deployer using CREATE2 formula
  - Deploy AttackOrchestrator and Verifier as part of test execution
  - Dynamically compute NUM_CONTRACTS based on gas_benchmark_value
  - Add verification transaction at end of block to confirm attack success
  - Fix gas constants based on empirical measurements:
    - GAS_PER_ATTACK: 8014 -> 8050 (measured ~8042)
    - MAX_ATTACKS_PER_TX: 1990 -> 1980 (safety margin)
    - TX_OVERHEAD: 22900 -> 22600 (more accurate)

  The previous gas constants caused all attack transactions to run out of gas, as the 28 gas/attack shortfall compounded over 1990 attacks to ~55k gas deficit.

* refactor(depth-benchmarks): download assets from GitHub, embed bytecode

  - Embed AttackOrchestrator and Verifier bytecode directly in Python
  - Add download_mined_asset() to fetch JSON/SOL files from GitHub
  - Cache downloaded files locally in .cache/ directory
  - Remove local .sol and .json asset files (now downloaded on demand)
  - Update test parameters to use (10, 6) available from GitHub
  - Add gist reference for contract sources

  Contract sources: https://gist.github.com/CPerezz/8686da933fa5c045fbdf7c31e20e6c71
  Mined assets: https://github.com/CPerezz/worst_case_miner/tree/master/mined_assets

* style: run ruff format on deep_branch_testing.py

* fix: add mypy type annotations for deep_branch_testing.py

* refactor(depth-benchmarks): code review improvements

  - Remove unused ATTACK_SELECTOR constant
  - Extract magic numbers to named constants (gas limits, fees, etc.)
  - Add zero contracts validation to prevent edge case bugs
  - Fix unused fork parameter (rename to _fork)
  - Replace print warning with warnings.warn
  - Fix docstring math discrepancy (~2,742 not 2,750)
  - Fix line length issues and add proper type annotations

* feat(git): Add `CPerezz/worst_case_miner` submodule

* feat(tests/benchmarking): Update deep branch tests

* fix: Update test file description

* cleanup

* refactor: Simplify using new tools

* fix: Review comments

* fix(tests): Update submodule

* fix: review comments

---------

Co-authored-by: CPerezz <cperezz19@pm.me>
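
The gas arithmetic in the commit message can be checked with a short sketch. The three constants are the ones quoted above; the helper name `plan_attack_batches` and the greedy splitting strategy are illustrative assumptions, not the logic of `test_deep_branch.py`, and the sketch ignores the deployment and verification transactions that the real test also sends.

```python
"""Sketch: splitting a gas budget into batched attack transactions."""

# Constants quoted in the commit message.
GAS_PER_ATTACK = 8050      # measured ~8042 per attack, rounded up
MAX_ATTACKS_PER_TX = 1980  # cap on attacks batched into one transaction
TX_OVERHEAD = 22600        # fixed per-transaction cost


def plan_attack_batches(gas_budget: int) -> list[int]:
    """Return the attack count of each transaction (hypothetical helper).

    Greedy assumption: fill each transaction up to MAX_ATTACKS_PER_TX
    until the remaining budget cannot pay for one more attack.
    """
    batches: list[int] = []
    remaining = gas_budget
    while remaining >= TX_OVERHEAD + GAS_PER_ATTACK:
        affordable = (remaining - TX_OVERHEAD) // GAS_PER_ATTACK
        attacks = min(affordable, MAX_ATTACKS_PER_TX)
        batches.append(attacks)
        remaining -= TX_OVERHEAD + attacks * GAS_PER_ATTACK
    return batches


def old_constants_deficit() -> int:
    """Gas shortfall per transaction under the previous constants."""
    measured, old_estimate, old_batch = 8042, 8014, 1990
    return (measured - old_estimate) * old_batch


if __name__ == "__main__":
    # Assumes a benchmark value of 60 means a budget of 60 million gas.
    batches = plan_attack_batches(60_000_000)
    print(batches, sum(batches))
    print(old_constants_deficit())  # 55720, the "~55k gas deficit"
```

Under these assumptions a 60 million gas budget yields three full transactions of 1,980 attacks and one partial transaction, and the old constants leave each transaction about 55.7k gas short, which matches the out-of-gas failure described in the refactor commit.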
diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,4 @@
+[submodule "tests/benchmark/stateful/bloatnet/depth_benchmarks/.worst_case_miner"]
+    path = tests/benchmark/stateful/bloatnet/depth_benchmarks/.worst_case_miner
+    url = https://github.com/CPerezz/worst_case_miner
+    branch = master
diff --git a/tests/benchmark/stateful/bloatnet/depth_benchmarks/.worst_case_miner b/tests/benchmark/stateful/bloatnet/depth_benchmarks/.worst_case_miner
@@ -0,0 +1 @@
+Subproject commit c75646fe1c09db3759b093fd044afd2c5008e8be
diff --git a/tests/benchmark/stateful/bloatnet/depth_benchmarks/README.md b/tests/benchmark/stateful/bloatnet/depth_benchmarks/README.md
@@ -0,0 +1,72 @@
+# Depth Benchmark Tests
+
+This directory contains tests for worst-case depth attacks on Ethereum state and account tries.
+
+## Scenario Description
+
+These benchmarks test the worst-case scenario for Ethereum clients when dealing with extremely deep state and account tries. The attack involves:
+
+1. **Pre-deployed contracts** with deep storage tries that maximize trie traversal costs
+2. **CREATE2-based addressing** for deterministic contract addresses across test runs
+3. **Optimized batched attacks** using an AttackOrchestrator contract that can execute up to 1,980 attacks per transaction
+4. **Account trie depth** increased by funding auxiliary accounts that make the path deeper
+
+The test measures the performance impact of state root recomputation and IO when modifying deep storage slots across thousands of contracts, simulating the maximum theoretical load on the state trie.
+
+## Contract Sources
+
+- **Pre-mined assets** (`depth_*.sol`, `s*_acc*.json`): https://github.com/CPerezz/worst_case_miner/tree/master/mined_assets
+
+For complete deployment setup and instructions, see the gist: https://gist.github.com/CPerezz/44d521c0f9e6adf7d84187a4f2c11978
+
+To update the submodule in this repository to the latest master in `CPerezz/worst_case_miner` run the following command: `git submodule update --remote --merge tests/benchmark/stateful/bloatnet/depth_benchmarks/.worst_case_miner`.
+
+## Prerequisites
+
+- Python with `uv` package manager
+- Anvil (Ethereum node implementation) or another EVM client
+- Nick's factory deployed at `0x4e59b44847b379578588920ca78fbf26c0b4956c` (automatically deployed by `execute` otherwise)
+
+## Workflow
+
+### Step 1: Start the Node (Anvil in this example)
+
+```bash
+# Start Anvil with a high gas limit and a 6-second block time
+anvil --hardfork prague --block-time 6 --steps-tracing --gas-limit 500000000 --balance 99999999999999 --port 8545
+```
+
+### Step 2: Obtain the mined assets
+
+```bash
+git submodule update --init --recursive
+```
+
+### Step 3: Run Attack Test
+
+Execute the worst-case depth attack test:
+
+```bash
+# Run the attack test
+export RPC_ENDPOINT=<RPC endpoint>
+export RPC_SEED_KEY=<Account with funds>
+export RPC_CHAIN_ID=<RPC chain ID>
+uv run execute remote \
+  --gas-benchmark-values 60 \
+  --fork Prague \
+  -m stateful \
+  tests/benchmark/stateful/bloatnet/depth_benchmarks/test_deep_branch.py
+```
+
+## Available Configurations
+
+Currently available pre-mined assets from [worst_case_miner](https://github.com/CPerezz/worst_case_miner/tree/master/mined_assets):
+
+| Storage Depth | Account Depth | File          |
+| ------------- | ------------- | ------------- |
+| 10            | 6             | s10_acc6.json |
+| 10            | 7             | s10_acc7.json |
+| 11            | 6             | s11_acc6.json |
+| 11            | 7             | s11_acc7.json |
+
+To generate new configurations, use [worst_case_miner](https://github.com/CPerezz/worst_case_miner).
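
The file names in the table above follow an `s<storage depth>_acc<account depth>.json` pattern. A minimal sketch of mapping between depth pairs and asset files follows; the helper names are hypothetical, and the `mined_assets` directory inside the submodule is an assumption based on the upstream repository URL, not something this diff confirms.

```python
import re
from pathlib import Path

# Submodule path taken from .gitmodules in this diff.
SUBMODULE_DIR = Path(
    "tests/benchmark/stateful/bloatnet/depth_benchmarks/.worst_case_miner"
)
# Assumption: assets live under mined_assets/, as in the upstream repo URL.
ASSET_DIR = SUBMODULE_DIR / "mined_assets"

ASSET_NAME = re.compile(r"^s(\d+)_acc(\d+)\.json$")


def asset_path(storage_depth: int, account_depth: int) -> Path:
    """Build the expected asset path for a depth pair (hypothetical helper)."""
    return ASSET_DIR / f"s{storage_depth}_acc{account_depth}.json"


def parse_asset_name(name: str) -> tuple[int, int]:
    """Recover (storage_depth, account_depth) from an asset file name."""
    match = ASSET_NAME.match(name)
    if match is None:
        raise ValueError(f"not a mined asset file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(asset_path(10, 6))
    print(parse_asset_name("s11_acc7.json"))
```

A helper like `parse_asset_name` could be used to enumerate which depth pairs are present after `git submodule update --init --recursive`, rather than hard-coding the table.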
diff --git a/tests/benchmark/stateful/bloatnet/depth_benchmarks/__init__.py b/tests/benchmark/stateful/bloatnet/depth_benchmarks/__init__.py
@@ -0,0 +1,3 @@
+"""
+abstract: BloatNet worst-case attack benchmark for maximum SSTORE stress.
+"""
diff --git a/tests/benchmark/stateful/bloatnet/depth_benchmarks/test_deep_branch.py b/tests/benchmark/stateful/bloatnet/depth_benchmarks/test_deep_branch.py
