Skip to content

feat(scripts): use dynamic addresses in ported static tests#2695

Merged
leolara merged 18 commits intoethereum:forks/amsterdamfrom
leolara:leolara/static-test-non-harcoded-addresses
Apr 26, 2026
Merged

feat(scripts): use dynamic addresses in ported static tests#2695
leolara merged 18 commits intoethereum:forks/amsterdamfrom
leolara:leolara/static-test-non-harcoded-addresses

Conversation

@leolara
Copy link
Copy Markdown
Member

@leolara leolara commented Apr 16, 2026

🗒️ Description

Modify filler_to_python to generate ported static tests that use dynamically-assigned addresses (pre.fund_eoa(), pre.deploy_contract() without address=) instead of hardcoding every address from the original fillers. This makes the ported tests behave like hand-written EEST tests, where the framework assigns addresses at fill time.

What changed

scripts/filler_to_python/analyzer.py — core logic:

  • Build a dependency graph of contracts by scanning bytecode for address references, then topologically sort so each contract is declared before it's used.
  • Resolve known addresses in bytecode via _bytes_to_op_expr(code_bytes, addr_to_var), so emitted Op expressions use variable names instead of hex literals.
  • Resolve known addresses in pre-state and post-state storage values, replacing them with variable references or compute_create_address(...) expressions.
  • Detect patterns that are incompatible with dynamic addresses and fall back to hardcoded:
    • Oversized contracts (>24 KiB)
    • Circular bytecode-reference dependencies (self-referencing contracts)
    • Address arithmetic (Op.ADD(contract_var, ...)) — sequential-address dispatch tables
    • Computed call targets (address=Op.ADD(Op.CALLDATALOAD(...), ...)) — runtime dispatch
    • Short-PUSH address refs (PUSH<20 for addresses with leading-zero bytes)
    • Unresolvable post-state addresses
  • FORCE_HARDCODED_TESTS allowlist (51 tests) for cases that can't converge under exact-no-stack trace verification with dynamic addresses: EIP-2929 gas differences, CREATE2 collisions, keccak-derived storage keys, structural tx-rejection differences, etc.

scripts/filler_to_python/ir.pyuse_dynamic flag on AccountIR, SenderIR, AccessListEntryIR; nonce field on SenderIR.

scripts/filler_to_python/templates/state_test.py.j2 — conditional rendering: pre.fund_eoa(amount=...) for dynamic senders/EOAs, pre.deploy_contract(...) without address= for dynamic contracts.

scripts/filler_to_python/render.pyformat_storage handles string variable references alongside int values.

packages/testing/src/execution_testing/client_clis/trace_comparators.pyexact-no-stack now also excludes return_data (addresses produced by runtime opcodes like CALLER land on the stack and in return_data after RETURN/REVERT; excluding only stack just shifts the divergence forward).

tests/ported_static/ — all 2151 test files regenerated.

Trace verification

Verified against baseline traces (generated once from forks/amsterdam tip) using --verify-traces-comparator exact-no-stack:

  • 37,740 / 37,740 trace comparisons equivalent (100%)
  • 0 divergent test files
  • 1,686 / 1,686 test files fully equivalent

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    just static
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@leolara leolara force-pushed the leolara/static-test-non-harcoded-addresses branch from 8c0f890 to f74f2a3 Compare April 16, 2026 08:15
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.17%. Comparing base (8db70f9) to head (a2f8705).

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/amsterdam    #2695   +/-   ##
================================================
  Coverage            88.17%   88.17%           
================================================
  Files                  577      577           
  Lines                35659    35659           
  Branches              3490     3490           
================================================
  Hits                 31442    31442           
  Misses                3654     3654           
  Partials               563      563           
Flag Coverage Δ
unittests 88.17% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@leolara
Copy link
Copy Markdown
Member Author

leolara commented Apr 17, 2026

Update since the initial description

Analyzer enhancements (round 2)

  • Coinbase pinning — when an account's address matches env.current_coinbase, force use_dynamic=False. Prevents the generator bug where coinbase was rebound to a fresh fund_eoa EOA, leaving the env's coinbase address unfunded.
  • Sender pinning on unresolvable post-state — when has_unresolved = True, disable dynamic for every account including sender (was: only contracts). Required because CREATE-derived post-state addresses depend on the sender address.
  • Nested CREATE resolution (depth 2, nonce 0–15) in _resolve_address — covers post-state addresses produced by a contract that was itself CREATE'd from a known address.
  • Address-like storage-value detectionhas_unresolved also triggers on any post-state storage value that is a large int (≥ 2³²) not resolved to a variable. Catches unresolvable addresses inside storage that would otherwise fail at fill time.

FORCE_HARDCODED_TESTS expanded: 46 → 130

The initial list targeted the 63 exact-no-stack trace divergences. CI then surfaced additional tests that fail at state-assertion time (not trace time) under dynamic addresses — the same 253-test tail documented in plan-deviations.md. After the analyzer round-2 fixes recovered ~97 of those automatically, the remaining ~79 files went to the allowlist with categorised comments (gas measurements, keccak storage, collision semantics, precompile-as-EOA, CREATE2-derived addresses).

Investigation: pre-existing fill --traces framework bug

Running fill --traces on pre-execution-rejected transactions (e.g. EIP-3607 CREATE-into-non-empty, CREATE collisions) crashes with FileNotFoundError in transition_tool.py::collect_traces. Reproduced identically on origin/forks/amsterdam in a clean worktree — not introduced by this PR. CI is unaffected because just fill omits --traces.

Root cause: when a tx raises EthereumException during process_transaction, the tracer's TransactionEnd event never fires (so no trace-N-<hash>.jsonl is written), but the tx still appears in result.receipts. collect_traces iterates receipts and assumes a trace file exists for each.

Suggested fix (separate PR against forks/amsterdam): ~5-line defensive change in collect_traces to skip missing trace files and append an empty TransactionTraces.

Verification

  • Regeneration: 2137 ports + 14 @manually-enhanced skipped = 2151/2151 ok.
  • Full fill+verify with exact-no-stack (-n 10, not slow): 37,740 / 37,740 trace comparisons equivalent, 0 divergences.
  • Fill without --traces (same config as CI): all tests pass on this branch and on forks/amsterdam.
  • For the 117 --traces-only failures (3 files, all in the hardcoded allowlist): their fixtures without --traces are byte-identical to forks/amsterdam (27/27 shared, 27/27 content-identical after stripping _info).

@leolara
Copy link
Copy Markdown
Member Author

leolara commented Apr 17, 2026

Final verification results

Completed the full fill+verify run (-n 10, not slow):

  • Fill: 60,359 passed, 117 failed
  • Trace verification (exact-no-stack): 39,002 / 39,002 equivalent — 0 divergences

The 117 fill failures are all from the 3 STRUCTURAL tests in FORCE_HARDCODED_TESTS, caused by the pre-existing fill --traces framework bug. CI (which uses just fill without --traces) is unaffected.

Every trace that was successfully generated matches baseline under exact-no-stack.

@leolara leolara marked this pull request as ready for review April 17, 2026 12:32
@marioevz marioevz self-assigned this Apr 22, 2026
@marioevz marioevz self-requested a review April 22, 2026 12:49
Copy link
Copy Markdown
Contributor

@kclowes kclowes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me @leolara, nice work!! I left a couple comments that claude flagged, so feel free to disregard if they're incorrect. @marioevz suggested to look at the pre_alloc mutable flag, so that one is probably worth addressing here or at least in a new PR shortly after this. I'll approve so I'm not blocking, and I'm happy to implement any of these comments if that's helpful too, just lmk.

Comment on lines +1642 to +1645
dynamic_eoa_addrs: set[Address | EOA] = set()
for acct in accounts:
if acct.is_eoa and acct.use_dynamic and acct.address is not None:
dynamic_eoa_addrs.add(acct.address)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally harmless, so feel free to take or leave or address in a new PR, but:

dynamic_eoa_addrs only collects addresses of dynamic EOAs, so the skip check on this line only filters EOAs. Non-tagged dynamic contracts fall through to line 1696 and
get emitted as contract_N = Address(0x…) — which is then shadowed by pre.deploy_contract(...) in the accounts block. Tagged contracts are correctly handled by the ContractTag skip at
line 1672; this is the gap for the non-tagged path.

Example: tests/ported_static/stRandom/test_random_statetest307.py — line 36 declares contract_0 = Address(0x095E7BAE…), then line 66 rebinds contract_0 = pre.deploy_contract(...) with no
address= kwarg.

A scan of tests/ported_static/ shows this potentially affects 204 files / 360 variable bindings.
Suggested fix:

skip: set[Address | EOA] = set()
for acct in accounts:
    if acct.use_dynamic and acct.address is not None:
        skip.add(acct.address)

Then in lines 1676, and 1686, something like:

if resolved in skip:
    continue

{% if has_exceptions and not is_multi_case %}
@pytest.mark.exception_test
{% endif %}
@pytest.mark.pre_alloc_mutable
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the majority of tests ported don't need this @pytest.mark.pre_alloc_mutable line (Claude says 1358/2137 don't fwiw).

Consider adding a boolean so tests can be run with the execute plugin:

ir.py — add field:

  @dataclass
  class IntermediateTestModel:
      ...
      needs_mutable_pre: bool = False

analyzer.py — compute it before returning IR (around line 452):

  needs_mutable_pre = (
      not sender_ir.use_dynamic
      or any(a.is_eoa and not a.use_dynamic for a in accounts)
      or any(a.oversized_code for a in accounts)
  )

state_test.py.j2:162 — wrap the mark:

  {% if needs_mutable_pre %}
  @pytest.mark.pre_alloc_mutable
  {% endif %}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this but changed a bit the predicate.

oversized implies not use_dynamic and also there could be cotract addressses that don't use dynamic.

so

needs_mutable_pre = not sender_ir.use_dynamic or any(
        not a.use_dynamic for a in accounts
    )

Comment on lines +1094 to +1111
short_push_refs: set[Address] = set()
# True when a short-PUSH ref targets an address that is not a
# deployed contract in the pre-state (e.g. an external tag like
# <contract:0x...dead> only referenced from bytecode). In that case
# no contract can be pinned, so fall back to globally disabling
# dynamic addresses for the whole test.
short_push_unpinnable = False
for addr, cb in code_bytes_map.items():
refs = _find_address_refs_in_bytecode(cb, all_known_addrs)
# Track deps on other contracts. Keep self-references — they
# create self-loops detected as cycles, forcing hardcoded addr.
deps[addr] = set(refs) & known_contract_addrs
for ref_addr, push_size in refs.items():
if push_size < 20:
if ref_addr in known_contract_addrs:
short_push_refs.add(ref_addr)
else:
short_push_unpinnable = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude generated, feel free to take or leave if incorrect:

At analyzer.py:1094-1111, a short-PUSH ref to a non-contract (e.g. a precompile at 0x01 funded as an EOA) sets short_push_unpinnable=True. The fallback at line 1225-1246 then disables dynamic for contracts only — EOAs escape and land at pre.fund_eoa() random addresses, losing the baseline precompile-funding semantic.

Example: tests/ported_static/stRevertTest/test_revert_precompiled_touch_exact_oog_paris.py emits addr_5 = pre.fund_eoa(amount=1) instead of pinning 0x01. The bytecode's CALL 0x01
hits the empty precompile, so touch semantics diverge from the filler.

Scope: 278 fillers have pre-state accounts at 0x01-0x10; only 4 are in FORCE_HARDCODED_TESTS, leaving ~274 unprotected.

Why trace verification didn't catch it: baseline traces were generated from the same converter output, so both sides share the bug and compare equivalent. An independent baseline (e.g.
retesteth against the original filler) would flag the divergence.

Fix: pin EOA short-PUSH targets individually instead of falling back to the global short_push_unpinnable:

# around analyzer.py:1094
known_eoa_addrs = {a.address for a in raw_accounts
                   if a.is_eoa and a.address is not None}
short_push_eoa_refs: set[Address] = set()
for addr, cb in code_bytes_map.items():
    refs = _find_address_refs_in_bytecode(cb, all_known_addrs)
    deps[addr] = set(refs) & known_contract_addrs
    for ref_addr, push_size in refs.items():
        if push_size < 20:
            if ref_addr in known_contract_addrs:
                short_push_refs.add(ref_addr)
            elif ref_addr in known_eoa_addrs:
                short_push_eoa_refs.add(ref_addr)
            else:
                short_push_unpinnable = True

# around analyzer.py:1146, after the propagation loop
for acct in raw_accounts:
    if acct.address in short_push_eoa_refs:
        acct.use_dynamic = False

The 4 existing allowlist entries then become candidates for removal pending an independent re-verification.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I fixed this here: 9a3fee6

@marioevz
Copy link
Copy Markdown
Member

There are some tests failing when running uv run filll --clean --fork Osaka ./tests/ported_static, and I think it's because the address of the sender is used directly inside of the contract's bytecode. I think this needs to be fixed before merging. cc @kclowes @leolara

@leolara
Copy link
Copy Markdown
Member Author

leolara commented Apr 25, 2026

There are some tests failing when running uv run filll --clean --fork Osaka ./tests/ported_static, and I think it's because the address of the sender is used directly inside of the contract's bytecode. I think this needs to be fixed before merging. cc @kclowes @leolara

I think it is fixed. It was not detected because I was not running the slow ones. I think that is also why the CI didn't get it?

leolara added 18 commits April 26, 2026 18:19
Tests that fully use dynamic addressing (pre.fund_eoa for senders/EOAs,
pre.deploy_contract without address=) don't mutate the pre-allocation
and shouldn't carry @pytest.mark.pre_alloc_mutable — having it blocks
them from running under the execute plugin against live networks.

Add IntermediateTestModel.needs_mutable_pre and only emit the marker
when the sender or any account is hardcoded (i.e. use_dynamic=False
on sender or any account). 771/2151 tests now drop the marker.
The previous predicate missed two assert_mutable() triggers:

- pre.fund_eoa(nonce=...): trips when sender has explicit nonce kwarg.
  Caught by sender_emits_nonce_kwarg = bool(sender_ir.nonce) which
  matches the template's `{% if sender.nonce %}` guard.

- pre.deploy_contract(..., nonce=0): trips for any contract whose
  nonce in the filler is 0 or unset (the template emits nonce=0 by
  default). Caught by contract_nonce_zero scanning non-EOA accounts.

Result: 2083/2151 tests carry @pytest.mark.pre_alloc_mutable, 68 are
execute-plugin-eligible. Previous (incorrect) predicate produced 771
eligible — those would all have failed at fill time with
"Cannot set items in immutable mode".
Two related changes that close the precompile-as-EOA gap.

1. In analyze(), after model.pre.setup(...) resolves tags, override
   tags[name] = Address(hint) when a SenderTag's name is a hex literal
   in 0x01..0x10. The static-state setup() resolves <eoa:0x01> through
   eoa_from_hash(), giving the EOA a random placeholder address even
   though the LLL source CALLs the literal precompile. Restoring the
   hint makes addr_to_var register 0x01 so downstream detection sees
   the bytecode's PUSH1 0x01 as a known EOA reference.

2. In _build_accounts, track short_push_eoa_refs alongside the existing
   short_push_refs (the reviewer's suggestion). EOAs referenced via
   PUSH<20 get pinned with use_dynamic=False so the funded account
   lands at the literal precompile address instead of a random
   pre.fund_eoa() address.

Together these let three precompile-touch tests come off the
FORCE_HARDCODED_TESTS allowlist:
- stRevertTest/test_revert_precompiled_touch_nonce.py
- stRevertTest/test_revert_precompiled_touch_noncestorage.py
- stRevertTest/test_revert_precompiled_touch_storage_paris.py

The remaining ecrecover/precompile-touch entries stay on the list for
unrelated reasons (cryptographic signatures hardcoded in tx data).
Full Osaka fill: 20,253 / 20,253 — no regressions.
The previous commit's hint override (tags[name] = Address(0x01)) makes
addr_to_var register 0x01, so tx-data and post-state resolutions
substitute hex literals with addr_5/addr_6/... variable references.
Without pinning those EOAs hardcoded, addr_X resolves through
pre.fund_eoa() at runtime to a random address, and tx-data carries
that random address instead of the precompile literal — a contract
that CALLs 0x01 in its bytecode would still hit the precompile while
tx-data carries the wrong address, breaking trace equivalence.

Track the hint-overridden addresses in pinned_eoa_addrs, thread that
through _build_accounts, and pin the matching EOAs after the
short-PUSH EOA pinning. Without this, full fill+verify-traces showed
576 divergences on stRevertTest/test_revert_precompiled_touch_exact_oog_paris;
with this fix, all 39,002 / 39,002 comparisons are equivalent under
exact-no-stack.
@leolara
Copy link
Copy Markdown
Member Author

leolara commented Apr 26, 2026

PR summary (current state)

What this PR does

Convert all ported static tests under tests/ported_static/ from hard-coded addresses (literal Address(0x…) and EOA(key=…)) to dynamic allocation (pre.fund_eoa(), pre.deploy_contract() without address=). Tests now behave like hand-written EEST tests, where the framework picks addresses at fill time. 68 of 2151 tests become runnable under the execute plugin against live networks (no pre_alloc_mutable marker).

Achieved without changing test semantics — verified by 39,002 / 39,002 trace comparisons equivalent under exact-no-stack against pre-conversion baseline traces.

What changed

scripts/filler_to_python/analyzer.py — bulk of the work. New analyzer logic:

  • Topological sort of contracts by bytecode address references; substitute known addresses with variable names (addr_to_var) in emitted Op expressions.
  • Resolve CREATE-derived addresses in post-state storage and account references via compute_create_address(...) (depth 2, nonce 0–15).
  • Coinbase pinning: any account whose address equals env.current_coinbase is kept hardcoded so the env's fee_recipient and the pre-state entry agree.
  • Sender pinning when post-state has unresolvable addresses or address-like storage values (>2³² ints) — keeps CREATE addresses derivable from sender consistent with baseline.
  • Short-PUSH detection: contracts referenced via PUSH<20 (i.e. addresses with leading-zero bytes) are pinned to baseline addresses.
  • Same for EOAs (precompile-touch case): EOAs referenced via short PUSH or hint-overridden into the precompile range (<eoa:0x...01>) are pinned literally at 0x01–0x10.
  • Computed call targets (address=Op.ADD/MLOAD/CALLDATALOAD/...) and address arithmetic trigger a global hardcoded fallback.
  • FORCE_HARDCODED_TESTS allowlist (144 entries) for tests that fundamentally don't survive dynamic addresses (gas-precise tests, keccak-derived storage keys, cryptographic signatures in tx-data, CREATE2 collisions, structural pre-execution rejections).
  • needs_mutable_pre flag — emitted only when assert_mutable() would actually fire (sender or any account hardcoded, sender funded with non-default nonce, or any contract emitted with nonce=0).

scripts/filler_to_python/ir.pyuse_dynamic on AccountIR/SenderIR/AccessListEntryIR, nonce on SenderIR, needs_mutable_pre on IntermediateTestModel.

scripts/filler_to_python/templates/state_test.py.j2 — conditional rendering: pre.fund_eoa(amount=...) / pre.deploy_contract(...) for dynamic tests, EOA(key=...) / pre[var] = Account(...) / pre.deploy_contract(..., address=Address(...)) for hardcoded ones. @pytest.mark.pre_alloc_mutable is only emitted when needs_mutable_pre.

scripts/filler_to_python/render.pyformat_storage handles string variable references; new context fields piped to the template.

tests/ported_static/ — all 2151 test files regenerated. 14 @manually-enhanced files preserved as-is via the existing skip mechanism.

Verification

Check Result
Regeneration 2137 / 2151 OK + 14 skipped (manually-enhanced)
just static clean
Full fill (CI mode) 60,476 / 60,476 passed
Full fill --traces 60,476 / 60,476 passed (depends on PR #2709 in forks/amsterdam)
--verify-traces exact-no-stack 39,002 / 39,002 equivalent

Note: The trace report shows 78 additional "divergences" on 3 STRUCTURAL tests (CREATE-collision / EIP-3607). These are spurious and tracked in #2758 — an asymmetry between in-memory and disk-loaded traces in the verify-traces machinery (extends #2709). The tests themselves pass, fixtures are byte-identical to baseline, and the comparison artifact is independent of this PR.

@leolara leolara merged commit e7043cc into ethereum:forks/amsterdam Apr 26, 2026
18 checks passed
leolara added a commit to leolara/execution-specs that referenced this pull request Apr 30, 2026
…ing entries

The list previously contained 5469 entries — many for tests that pass on
snøbal/4 today (test source updated, addresses now dynamic, etc.) or for
tests that no longer exist after PR ethereum#2695's regeneration. Empirically:

- With the bloated list: 17199 passed, 1731 skipped, 0 failed.
- With no list:          17565 passed, 0 skipped, 1365 failed (gas).
- After this prune:      17565 passed, 1365 skipped, 0 failed.

The remaining 455 entries are exactly those that match a currently-
failing fork_Amsterdam parametrization (still legitimate
InsufficientTransactionGasError under EIP-8037's two-dimensional gas
model). Section-header counts updated to reflect the actual contents.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants