feat(scripts): use dynamic addresses in ported static tests by leolara · Pull Request #2695 · ethereum/execution-specs

leolara · 2026-04-16T07:49:39Z

🗒️ Description

Modify filler_to_python to generate ported static tests that use dynamically-assigned addresses (pre.fund_eoa(), pre.deploy_contract() without address=) instead of hardcoding every address from the original fillers. This makes the ported tests behave like hand-written EEST tests, where the framework assigns addresses at fill time.

What changed

scripts/filler_to_python/analyzer.py — core logic:

Build a dependency graph of contracts by scanning bytecode for address references, then topologically sort so each contract is declared before it's used.
Resolve known addresses in bytecode via _bytes_to_op_expr(code_bytes, addr_to_var), so emitted Op expressions use variable names instead of hex literals.
Resolve known addresses in pre-state and post-state storage values, replacing them with variable references or compute_create_address(...) expressions.
Detect patterns that are incompatible with dynamic addresses and fall back to hardcoded:
- Oversized contracts (>24 KiB)
- Circular bytecode-reference dependencies (self-referencing contracts)
- Address arithmetic (Op.ADD(contract_var, ...)) — sequential-address dispatch tables
- Computed call targets (address=Op.ADD(Op.CALLDATALOAD(...), ...)) — runtime dispatch
- Short-PUSH address refs (PUSH<20 for addresses with leading-zero bytes)
- Unresolvable post-state addresses
FORCE_HARDCODED_TESTS allowlist (51 tests) for cases that can't converge under exact-no-stack trace verification with dynamic addresses: EIP-2929 gas differences, CREATE2 collisions, keccak-derived storage keys, structural tx-rejection differences, etc.

scripts/filler_to_python/ir.py — use_dynamic flag on AccountIR, SenderIR, AccessListEntryIR; nonce field on SenderIR.

scripts/filler_to_python/templates/state_test.py.j2 — conditional rendering: pre.fund_eoa(amount=...) for dynamic senders/EOAs, pre.deploy_contract(...) without address= for dynamic contracts.

scripts/filler_to_python/render.py — format_storage handles string variable references alongside int values.

packages/testing/src/execution_testing/client_clis/trace_comparators.py — exact-no-stack now also excludes return_data (addresses produced by runtime opcodes like CALLER land on the stack and in return_data after RETURN/REVERT; excluding only stack just shifts the divergence forward).

tests/ported_static/ — all 2151 test files regenerated.

Trace verification

Verified against baseline traces (generated once from forks/amsterdam tip) using --verify-traces-comparator exact-no-stack:

37,740 / 37,740 trace comparisons equivalent (100%)
0 divergent test files
1,686 / 1,686 test files fully equivalent

🔗 Related Issues or PRs

N/A.

✅ Checklist

All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
just static
```
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

…resses

codecov · 2026-04-16T09:37:52Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.26%. Comparing base (363df20) to head (bff76bc).
⚠️ Report is 4 commits behind head on forks/amsterdam.

Additional details and impacted files

@@               Coverage Diff                @@
##           forks/amsterdam    #2695   +/-   ##
================================================
  Coverage            86.26%   86.26%           
================================================
  Files                  599      599           
  Lines                37038    37038           
  Branches              3795     3795           
================================================
  Hits                 31949    31949           
  Misses                4525     4525           
  Partials               564      564

Flag	Coverage Δ
unittests	`86.26% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

leolara · 2026-04-17T11:10:30Z

Update since the initial description

Analyzer enhancements (round 2)

Coinbase pinning — when an account's address matches env.current_coinbase, force use_dynamic=False. Prevents the generator bug where coinbase was rebound to a fresh fund_eoa EOA, leaving the env's coinbase address unfunded.
Sender pinning on unresolvable post-state — when has_unresolved = True, disable dynamic for every account including sender (was: only contracts). Required because CREATE-derived post-state addresses depend on the sender address.
Nested CREATE resolution (depth 2, nonce 0–15) in _resolve_address — covers post-state addresses produced by a contract that was itself CREATE'd from a known address.
Address-like storage-value detection — has_unresolved also triggers on any post-state storage value that is a large int (≥ 2³²) not resolved to a variable. Catches unresolvable addresses inside storage that would otherwise fail at fill time.

`FORCE_HARDCODED_TESTS` expanded: 46 → 130

The initial list targeted the 63 exact-no-stack trace divergences. CI then surfaced additional tests that fail at state-assertion time (not trace time) under dynamic addresses — the same 253-test tail documented in plan-deviations.md. After the analyzer round-2 fixes recovered ~97 of those automatically, the remaining ~79 files went to the allowlist with categorised comments (gas measurements, keccak storage, collision semantics, precompile-as-EOA, CREATE2-derived addresses).

Investigation: pre-existing `fill --traces` framework bug

Running fill --traces on pre-execution-rejected transactions (e.g. EIP-3607 CREATE-into-non-empty, CREATE collisions) crashes with FileNotFoundError in transition_tool.py::collect_traces. Reproduced identically on origin/forks/amsterdam in a clean worktree — not introduced by this PR. CI is unaffected because just fill omits --traces.

Root cause: when a tx raises EthereumException during process_transaction, the tracer's TransactionEnd event never fires (so no trace-N-<hash>.jsonl is written), but the tx still appears in result.receipts. collect_traces iterates receipts and assumes a trace file exists for each.

Suggested fix (separate PR against forks/amsterdam): ~5-line defensive change in collect_traces to skip missing trace files and append an empty TransactionTraces.

Verification

Regeneration: 2137 ports + 14 @manually-enhanced skipped = 2151/2151 ok.
Full fill+verify with exact-no-stack (-n 10, not slow): 37,740 / 37,740 trace comparisons equivalent, 0 divergences.
Fill without --traces (same config as CI): all tests pass on this branch and on forks/amsterdam.
For the 117 --traces-only failures (3 files, all in the hardcoded allowlist): their fixtures without --traces are byte-identical to forks/amsterdam (27/27 shared, 27/27 content-identical after stripping _info).

leolara · 2026-04-17T11:15:11Z

Final verification results

Completed the full fill+verify run (-n 10, not slow):

Fill: 60,359 passed, 117 failed
Trace verification (exact-no-stack): 39,002 / 39,002 equivalent — 0 divergences ✅

The 117 fill failures are all from the 3 STRUCTURAL tests in FORCE_HARDCODED_TESTS, caused by the pre-existing fill --traces framework bug. CI (which uses just fill without --traces) is unaffected.

Every trace that was successfully generated matches baseline under exact-no-stack.

kclowes

This looks good to me @leolara, nice work!! I left a couple comments that claude flagged, so feel free to disregard if they're incorrect. @marioevz suggested to look at the pre_alloc mutable flag, so that one is probably worth addressing here or at least in a new PR shortly after this. I'll approve so I'm not blocking, and I'm happy to implement any of these comments if that's helpful too, just lmk.

kclowes · 2026-04-22T21:14:18Z

+    dynamic_eoa_addrs: set[Address | EOA] = set()
+    for acct in accounts:
+        if acct.is_eoa and acct.use_dynamic and acct.address is not None:
+            dynamic_eoa_addrs.add(acct.address)


Functionally harmless, so feel free to take or leave or address in a new PR, but:

dynamic_eoa_addrs only collects addresses of dynamic EOAs, so the skip check on this line only filters EOAs. Non-tagged dynamic contracts fall through to line 1696 and
get emitted as contract_N = Address(0x…) — which is then shadowed by pre.deploy_contract(...) in the accounts block. Tagged contracts are correctly handled by the ContractTag skip at
line 1672; this is the gap for the non-tagged path.

Example: tests/ported_static/stRandom/test_random_statetest307.py — line 36 declares contract_0 = Address(0x095E7BAE…), then line 66 rebinds contract_0 = pre.deploy_contract(...) with no
address= kwarg.

A scan of tests/ported_static/ shows this potentially affects 204 files / 360 variable bindings.
Suggested fix:

skip: set[Address | EOA] = set() for acct in accounts: if acct.use_dynamic and acct.address is not None: skip.add(acct.address)

Then in lines 1676, and 1686, something like:

if resolved in skip: continue

kclowes · 2026-04-22T22:09:05Z

 {% if has_exceptions and not is_multi_case %}
 @pytest.mark.exception_test
 {% endif %}
 @pytest.mark.pre_alloc_mutable


It looks like the majority of tests ported don't need this @pytest.mark.pre_alloc_mutable line (Claude says 1358/2137 don't fwiw).

Consider adding a boolean so tests can be run with the execute plugin:

ir.py — add field:

@dataclass class IntermediateTestModel: ... needs_mutable_pre: bool = False

analyzer.py — compute it before returning IR (around line 452):

needs_mutable_pre = ( not sender_ir.use_dynamic or any(a.is_eoa and not a.use_dynamic for a in accounts) or any(a.oversized_code for a in accounts) )

state_test.py.j2:162 — wrap the mark:

{% if needs_mutable_pre %} @pytest.mark.pre_alloc_mutable {% endif %}

kclowes · 2026-04-22T22:44:22Z

+    short_push_refs: set[Address] = set()
+    # True when a short-PUSH ref targets an address that is not a
+    # deployed contract in the pre-state (e.g. an external tag like
+    # <contract:0x...dead> only referenced from bytecode). In that case
+    # no contract can be pinned, so fall back to globally disabling
+    # dynamic addresses for the whole test.
+    short_push_unpinnable = False
+    for addr, cb in code_bytes_map.items():
+        refs = _find_address_refs_in_bytecode(cb, all_known_addrs)
+        # Track deps on other contracts. Keep self-references — they
+        # create self-loops detected as cycles, forcing hardcoded addr.
+        deps[addr] = set(refs) & known_contract_addrs
+        for ref_addr, push_size in refs.items():
+            if push_size < 20:
+                if ref_addr in known_contract_addrs:
+                    short_push_refs.add(ref_addr)
+                else:
+                    short_push_unpinnable = True


Claude generated, feel free to take or leave if incorrect:

At analyzer.py:1094-1111, a short-PUSH ref to a non-contract (e.g. a precompile at 0x01 funded as an EOA) sets short_push_unpinnable=True. The fallback at line 1225-1246 then disables dynamic for contracts only — EOAs escape and land at pre.fund_eoa() random addresses, losing the baseline precompile-funding semantic.

Example: tests/ported_static/stRevertTest/test_revert_precompiled_touch_exact_oog_paris.py emits addr_5 = pre.fund_eoa(amount=1) instead of pinning 0x01. The bytecode's CALL 0x01
hits the empty precompile, so touch semantics diverge from the filler.

Scope: 278 fillers have pre-state accounts at 0x01-0x10; only 4 are in FORCE_HARDCODED_TESTS, leaving ~274 unprotected.

Why trace verification didn't catch it: baseline traces were generated from the same converter output, so both sides share the bug and compare equivalent. An independent baseline (e.g.
retesteth against the original filler) would flag the divergence.

Fix: pin EOA short-PUSH targets individually instead of falling back to the global short_push_unpinnable:

# around analyzer.py:1094 known_eoa_addrs = {a.address for a in raw_accounts if a.is_eoa and a.address is not None} short_push_eoa_refs: set[Address] = set() for addr, cb in code_bytes_map.items(): refs = _find_address_refs_in_bytecode(cb, all_known_addrs) deps[addr] = set(refs) & known_contract_addrs for ref_addr, push_size in refs.items(): if push_size < 20: if ref_addr in known_contract_addrs: short_push_refs.add(ref_addr) elif ref_addr in known_eoa_addrs: short_push_eoa_refs.add(ref_addr) else: short_push_unpinnable = True # around analyzer.py:1146, after the propagation loop for acct in raw_accounts: if acct.address in short_push_eoa_refs: acct.use_dynamic = False

The 4 existing allowlist entries then become candidates for removal pending an independent re-verification.

leolara added 7 commits April 16, 2026 15:15

Ported static without hardcoded addresses

01dcdb6

Resolve CREATE-derived addresses in post-state storage values

4cd4b5f

Fall back to hardcoded contracts when post-state has unresolvable add…

f9fb2f9

…resses

Detect short-PUSH address refs and computed call targets

85e8d43

Harcode hardcoded addresses static tests

34d6999

Add some more hardcoded

4c8c279

lint

f74f2a3

leolara force-pushed the leolara/static-test-non-harcoded-addresses branch from 8c0f890 to f74f2a3 Compare April 16, 2026 08:15

format-check

4f56f3b

leolara added 4 commits April 16, 2026 17:16

Fix coinbase rebind

83757d3

Fix post-state expecting account with CREATE(2)

bb85fde

Disable dynamic addresses for post-state that is not resolvable

356c92b

Add more tests to FORCE_HARDCODED_TESTS

69b066b

lint

bff76bc

leolara marked this pull request as ready for review April 17, 2026 12:32

marioevz self-assigned this Apr 22, 2026

marioevz self-requested a review April 22, 2026 12:49

kclowes approved these changes Apr 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(scripts): use dynamic addresses in ported static tests#2695

feat(scripts): use dynamic addresses in ported static tests#2695
leolara wants to merge 13 commits intoethereum:forks/amsterdamfrom
leolara:leolara/static-test-non-harcoded-addresses

leolara commented Apr 16, 2026

Uh oh!

codecov Bot commented Apr 16, 2026 •

edited

Loading

Uh oh!

leolara commented Apr 17, 2026

Uh oh!

leolara commented Apr 17, 2026

Uh oh!

kclowes left a comment

Uh oh!

kclowes Apr 22, 2026

Uh oh!

kclowes Apr 22, 2026

Uh oh!

kclowes Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

leolara commented Apr 16, 2026

🗒️ Description

What changed

Trace verification

🔗 Related Issues or PRs

✅ Checklist

Uh oh!

codecov Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

leolara commented Apr 17, 2026

Update since the initial description

Analyzer enhancements (round 2)

FORCE_HARDCODED_TESTS expanded: 46 → 130

Investigation: pre-existing fill --traces framework bug

Verification

Uh oh!

leolara commented Apr 17, 2026

Final verification results

Uh oh!

kclowes left a comment

Choose a reason for hiding this comment

Uh oh!

kclowes Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

kclowes Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

kclowes Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented Apr 16, 2026 •

edited

Loading

`FORCE_HARDCODED_TESTS` expanded: 46 → 130

Investigation: pre-existing `fill --traces` framework bug