Skip to content

feat(scripts): use dynamic addresses in ported static tests#2695

Open
leolara wants to merge 13 commits intoethereum:forks/amsterdamfrom
leolara:leolara/static-test-non-harcoded-addresses
Open

feat(scripts): use dynamic addresses in ported static tests#2695
leolara wants to merge 13 commits intoethereum:forks/amsterdamfrom
leolara:leolara/static-test-non-harcoded-addresses

Conversation

@leolara
Copy link
Copy Markdown
Member

@leolara leolara commented Apr 16, 2026

🗒️ Description

Modify filler_to_python to generate ported static tests that use dynamically-assigned addresses (pre.fund_eoa(), pre.deploy_contract() without address=) instead of hardcoding every address from the original fillers. This makes the ported tests behave like hand-written EEST tests, where the framework assigns addresses at fill time.

What changed

scripts/filler_to_python/analyzer.py — core logic:

  • Build a dependency graph of contracts by scanning bytecode for address references, then topologically sort so each contract is declared before it's used.
  • Resolve known addresses in bytecode via _bytes_to_op_expr(code_bytes, addr_to_var), so emitted Op expressions use variable names instead of hex literals.
  • Resolve known addresses in pre-state and post-state storage values, replacing them with variable references or compute_create_address(...) expressions.
  • Detect patterns that are incompatible with dynamic addresses and fall back to hardcoded:
    • Oversized contracts (>24 KiB)
    • Circular bytecode-reference dependencies (self-referencing contracts)
    • Address arithmetic (Op.ADD(contract_var, ...)) — sequential-address dispatch tables
    • Computed call targets (address=Op.ADD(Op.CALLDATALOAD(...), ...)) — runtime dispatch
    • Short-PUSH address refs (PUSH<20 for addresses with leading-zero bytes)
    • Unresolvable post-state addresses
  • FORCE_HARDCODED_TESTS allowlist (51 tests) for cases that can't converge under exact-no-stack trace verification with dynamic addresses: EIP-2929 gas differences, CREATE2 collisions, keccak-derived storage keys, structural tx-rejection differences, etc.

scripts/filler_to_python/ir.pyuse_dynamic flag on AccountIR, SenderIR, AccessListEntryIR; nonce field on SenderIR.

scripts/filler_to_python/templates/state_test.py.j2 — conditional rendering: pre.fund_eoa(amount=...) for dynamic senders/EOAs, pre.deploy_contract(...) without address= for dynamic contracts.

scripts/filler_to_python/render.pyformat_storage handles string variable references alongside int values.

packages/testing/src/execution_testing/client_clis/trace_comparators.pyexact-no-stack now also excludes return_data (addresses produced by runtime opcodes like CALLER land on the stack and in return_data after RETURN/REVERT; excluding only stack just shifts the divergence forward).

tests/ported_static/ — all 2151 test files regenerated.

Trace verification

Verified against baseline traces (generated once from forks/amsterdam tip) using --verify-traces-comparator exact-no-stack:

  • 37,740 / 37,740 trace comparisons equivalent (100%)
  • 0 divergent test files
  • 1,686 / 1,686 test files fully equivalent

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    just static
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@leolara leolara force-pushed the leolara/static-test-non-harcoded-addresses branch from 8c0f890 to f74f2a3 Compare April 16, 2026 08:15
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.26%. Comparing base (363df20) to head (bff76bc).
⚠️ Report is 4 commits behind head on forks/amsterdam.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/amsterdam    #2695   +/-   ##
================================================
  Coverage            86.26%   86.26%           
================================================
  Files                  599      599           
  Lines                37038    37038           
  Branches              3795     3795           
================================================
  Hits                 31949    31949           
  Misses                4525     4525           
  Partials               564      564           
Flag Coverage Δ
unittests 86.26% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@leolara
Copy link
Copy Markdown
Member Author

leolara commented Apr 17, 2026

Update since the initial description

Analyzer enhancements (round 2)

  • Coinbase pinning — when an account's address matches env.current_coinbase, force use_dynamic=False. Prevents the generator bug where coinbase was rebound to a fresh fund_eoa EOA, leaving the env's coinbase address unfunded.
  • Sender pinning on unresolvable post-state — when has_unresolved = True, disable dynamic for every account including sender (was: only contracts). Required because CREATE-derived post-state addresses depend on the sender address.
  • Nested CREATE resolution (depth 2, nonce 0–15) in _resolve_address — covers post-state addresses produced by a contract that was itself CREATE'd from a known address.
  • Address-like storage-value detectionhas_unresolved also triggers on any post-state storage value that is a large int (≥ 2³²) not resolved to a variable. Catches unresolvable addresses inside storage that would otherwise fail at fill time.

FORCE_HARDCODED_TESTS expanded: 46 → 130

The initial list targeted the 63 exact-no-stack trace divergences. CI then surfaced additional tests that fail at state-assertion time (not trace time) under dynamic addresses — the same 253-test tail documented in plan-deviations.md. After the analyzer round-2 fixes recovered ~97 of those automatically, the remaining ~79 files went to the allowlist with categorised comments (gas measurements, keccak storage, collision semantics, precompile-as-EOA, CREATE2-derived addresses).

Investigation: pre-existing fill --traces framework bug

Running fill --traces on pre-execution-rejected transactions (e.g. EIP-3607 CREATE-into-non-empty, CREATE collisions) crashes with FileNotFoundError in transition_tool.py::collect_traces. Reproduced identically on origin/forks/amsterdam in a clean worktree — not introduced by this PR. CI is unaffected because just fill omits --traces.

Root cause: when a tx raises EthereumException during process_transaction, the tracer's TransactionEnd event never fires (so no trace-N-<hash>.jsonl is written), but the tx still appears in result.receipts. collect_traces iterates receipts and assumes a trace file exists for each.

Suggested fix (separate PR against forks/amsterdam): ~5-line defensive change in collect_traces to skip missing trace files and append an empty TransactionTraces.

Verification

  • Regeneration: 2137 ports + 14 @manually-enhanced skipped = 2151/2151 ok.
  • Full fill+verify with exact-no-stack (-n 10, not slow): 37,740 / 37,740 trace comparisons equivalent, 0 divergences.
  • Fill without --traces (same config as CI): all tests pass on this branch and on forks/amsterdam.
  • For the 117 --traces-only failures (3 files, all in the hardcoded allowlist): their fixtures without --traces are byte-identical to forks/amsterdam (27/27 shared, 27/27 content-identical after stripping _info).

@leolara
Copy link
Copy Markdown
Member Author

leolara commented Apr 17, 2026

Final verification results

Completed the full fill+verify run (-n 10, not slow):

  • Fill: 60,359 passed, 117 failed
  • Trace verification (exact-no-stack): 39,002 / 39,002 equivalent — 0 divergences

The 117 fill failures are all from the 3 STRUCTURAL tests in FORCE_HARDCODED_TESTS, caused by the pre-existing fill --traces framework bug. CI (which uses just fill without --traces) is unaffected.

Every trace that was successfully generated matches baseline under exact-no-stack.

@leolara leolara marked this pull request as ready for review April 17, 2026 12:32
@marioevz marioevz self-assigned this Apr 22, 2026
@marioevz marioevz self-requested a review April 22, 2026 12:49
Copy link
Copy Markdown
Contributor

@kclowes kclowes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me @leolara, nice work!! I left a couple comments that claude flagged, so feel free to disregard if they're incorrect. @marioevz suggested to look at the pre_alloc mutable flag, so that one is probably worth addressing here or at least in a new PR shortly after this. I'll approve so I'm not blocking, and I'm happy to implement any of these comments if that's helpful too, just lmk.

Comment on lines +1642 to +1645
dynamic_eoa_addrs: set[Address | EOA] = set()
for acct in accounts:
if acct.is_eoa and acct.use_dynamic and acct.address is not None:
dynamic_eoa_addrs.add(acct.address)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally harmless, so feel free to take or leave or address in a new PR, but:

dynamic_eoa_addrs only collects addresses of dynamic EOAs, so the skip check on this line only filters EOAs. Non-tagged dynamic contracts fall through to line 1696 and
get emitted as contract_N = Address(0x…) — which is then shadowed by pre.deploy_contract(...) in the accounts block. Tagged contracts are correctly handled by the ContractTag skip at
line 1672; this is the gap for the non-tagged path.

Example: tests/ported_static/stRandom/test_random_statetest307.py — line 36 declares contract_0 = Address(0x095E7BAE…), then line 66 rebinds contract_0 = pre.deploy_contract(...) with no
address= kwarg.

A scan of tests/ported_static/ shows this potentially affects 204 files / 360 variable bindings.
Suggested fix:

skip: set[Address | EOA] = set()
for acct in accounts:
    if acct.use_dynamic and acct.address is not None:
        skip.add(acct.address)

Then in lines 1676, and 1686, something like:

if resolved in skip:
    continue

{% if has_exceptions and not is_multi_case %}
@pytest.mark.exception_test
{% endif %}
@pytest.mark.pre_alloc_mutable
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the majority of tests ported don't need this @pytest.mark.pre_alloc_mutable line (Claude says 1358/2137 don't fwiw).

Consider adding a boolean so tests can be run with the execute plugin:

ir.py — add field:

  @dataclass
  class IntermediateTestModel:
      ...
      needs_mutable_pre: bool = False

analyzer.py — compute it before returning IR (around line 452):

  needs_mutable_pre = (
      not sender_ir.use_dynamic
      or any(a.is_eoa and not a.use_dynamic for a in accounts)
      or any(a.oversized_code for a in accounts)
  )

state_test.py.j2:162 — wrap the mark:

  {% if needs_mutable_pre %}
  @pytest.mark.pre_alloc_mutable
  {% endif %}

Comment on lines +1094 to +1111
short_push_refs: set[Address] = set()
# True when a short-PUSH ref targets an address that is not a
# deployed contract in the pre-state (e.g. an external tag like
# <contract:0x...dead> only referenced from bytecode). In that case
# no contract can be pinned, so fall back to globally disabling
# dynamic addresses for the whole test.
short_push_unpinnable = False
for addr, cb in code_bytes_map.items():
refs = _find_address_refs_in_bytecode(cb, all_known_addrs)
# Track deps on other contracts. Keep self-references — they
# create self-loops detected as cycles, forcing hardcoded addr.
deps[addr] = set(refs) & known_contract_addrs
for ref_addr, push_size in refs.items():
if push_size < 20:
if ref_addr in known_contract_addrs:
short_push_refs.add(ref_addr)
else:
short_push_unpinnable = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude generated, feel free to take or leave if incorrect:

At analyzer.py:1094-1111, a short-PUSH ref to a non-contract (e.g. a precompile at 0x01 funded as an EOA) sets short_push_unpinnable=True. The fallback at line 1225-1246 then disables dynamic for contracts only — EOAs escape and land at pre.fund_eoa() random addresses, losing the baseline precompile-funding semantic.

Example: tests/ported_static/stRevertTest/test_revert_precompiled_touch_exact_oog_paris.py emits addr_5 = pre.fund_eoa(amount=1) instead of pinning 0x01. The bytecode's CALL 0x01
hits the empty precompile, so touch semantics diverge from the filler.

Scope: 278 fillers have pre-state accounts at 0x01-0x10; only 4 are in FORCE_HARDCODED_TESTS, leaving ~274 unprotected.

Why trace verification didn't catch it: baseline traces were generated from the same converter output, so both sides share the bug and compare equivalent. An independent baseline (e.g.
retesteth against the original filler) would flag the divergence.

Fix: pin EOA short-PUSH targets individually instead of falling back to the global short_push_unpinnable:

# around analyzer.py:1094
known_eoa_addrs = {a.address for a in raw_accounts
                   if a.is_eoa and a.address is not None}
short_push_eoa_refs: set[Address] = set()
for addr, cb in code_bytes_map.items():
    refs = _find_address_refs_in_bytecode(cb, all_known_addrs)
    deps[addr] = set(refs) & known_contract_addrs
    for ref_addr, push_size in refs.items():
        if push_size < 20:
            if ref_addr in known_contract_addrs:
                short_push_refs.add(ref_addr)
            elif ref_addr in known_eoa_addrs:
                short_push_eoa_refs.add(ref_addr)
            else:
                short_push_unpinnable = True

# around analyzer.py:1146, after the propagation loop
for acct in raw_accounts:
    if acct.address in short_push_eoa_refs:
        acct.use_dynamic = False

The 4 existing allowlist entries then become candidates for removal pending an independent re-verification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants