feat(scripts): use dynamic addresses in ported static tests#2695
Conversation
8c0f890 to
f74f2a3
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## forks/amsterdam #2695 +/- ##
================================================
Coverage 88.17% 88.17%
================================================
Files 577 577
Lines 35659 35659
Branches 3490 3490
================================================
Hits 31442 31442
Misses 3654 3654
Partials 563 563
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Update since the initial descriptionAnalyzer enhancements (round 2)
|
Final verification resultsCompleted the full fill+verify run (
The 117 fill failures are all from the 3 STRUCTURAL tests in Every trace that was successfully generated matches baseline under |
kclowes
left a comment
There was a problem hiding this comment.
This looks good to me @leolara, nice work!! I left a couple comments that claude flagged, so feel free to disregard if they're incorrect. @marioevz suggested to look at the pre_alloc mutable flag, so that one is probably worth addressing here or at least in a new PR shortly after this. I'll approve so I'm not blocking, and I'm happy to implement any of these comments if that's helpful too, just lmk.
| dynamic_eoa_addrs: set[Address | EOA] = set() | ||
| for acct in accounts: | ||
| if acct.is_eoa and acct.use_dynamic and acct.address is not None: | ||
| dynamic_eoa_addrs.add(acct.address) |
There was a problem hiding this comment.
Functionally harmless, so feel free to take or leave or address in a new PR, but:
dynamic_eoa_addrs only collects addresses of dynamic EOAs, so the skip check on this line only filters EOAs. Non-tagged dynamic contracts fall through to line 1696 and
get emitted as contract_N = Address(0x…) — which is then shadowed by pre.deploy_contract(...) in the accounts block. Tagged contracts are correctly handled by the ContractTag skip at
line 1672; this is the gap for the non-tagged path.
Example: tests/ported_static/stRandom/test_random_statetest307.py — line 36 declares contract_0 = Address(0x095E7BAE…), then line 66 rebinds contract_0 = pre.deploy_contract(...) with no
address= kwarg.
A scan of tests/ported_static/ shows this potentially affects 204 files / 360 variable bindings.
Suggested fix:
skip: set[Address | EOA] = set()
for acct in accounts:
if acct.use_dynamic and acct.address is not None:
skip.add(acct.address)Then in lines 1676, and 1686, something like:
if resolved in skip:
continue| {% if has_exceptions and not is_multi_case %} | ||
| @pytest.mark.exception_test | ||
| {% endif %} | ||
| @pytest.mark.pre_alloc_mutable |
There was a problem hiding this comment.
It looks like the majority of tests ported don't need this @pytest.mark.pre_alloc_mutable line (Claude says 1358/2137 don't fwiw).
Consider adding a boolean so tests can be run with the execute plugin:
ir.py — add field:
@dataclass
class IntermediateTestModel:
...
needs_mutable_pre: bool = Falseanalyzer.py — compute it before returning IR (around line 452):
needs_mutable_pre = (
not sender_ir.use_dynamic
or any(a.is_eoa and not a.use_dynamic for a in accounts)
or any(a.oversized_code for a in accounts)
)state_test.py.j2:162 — wrap the mark:
{% if needs_mutable_pre %}
@pytest.mark.pre_alloc_mutable
{% endif %}
There was a problem hiding this comment.
I added this but changed a bit the predicate.
oversized implies not use_dynamic and also there could be cotract addressses that don't use dynamic.
so
needs_mutable_pre = not sender_ir.use_dynamic or any(
not a.use_dynamic for a in accounts
)
| short_push_refs: set[Address] = set() | ||
| # True when a short-PUSH ref targets an address that is not a | ||
| # deployed contract in the pre-state (e.g. an external tag like | ||
| # <contract:0x...dead> only referenced from bytecode). In that case | ||
| # no contract can be pinned, so fall back to globally disabling | ||
| # dynamic addresses for the whole test. | ||
| short_push_unpinnable = False | ||
| for addr, cb in code_bytes_map.items(): | ||
| refs = _find_address_refs_in_bytecode(cb, all_known_addrs) | ||
| # Track deps on other contracts. Keep self-references — they | ||
| # create self-loops detected as cycles, forcing hardcoded addr. | ||
| deps[addr] = set(refs) & known_contract_addrs | ||
| for ref_addr, push_size in refs.items(): | ||
| if push_size < 20: | ||
| if ref_addr in known_contract_addrs: | ||
| short_push_refs.add(ref_addr) | ||
| else: | ||
| short_push_unpinnable = True |
There was a problem hiding this comment.
Claude generated, feel free to take or leave if incorrect:
At analyzer.py:1094-1111, a short-PUSH ref to a non-contract (e.g. a precompile at 0x01 funded as an EOA) sets short_push_unpinnable=True. The fallback at line 1225-1246 then disables dynamic for contracts only — EOAs escape and land at pre.fund_eoa() random addresses, losing the baseline precompile-funding semantic.
Example: tests/ported_static/stRevertTest/test_revert_precompiled_touch_exact_oog_paris.py emits addr_5 = pre.fund_eoa(amount=1) instead of pinning 0x01. The bytecode's CALL 0x01
hits the empty precompile, so touch semantics diverge from the filler.
Scope: 278 fillers have pre-state accounts at 0x01-0x10; only 4 are in FORCE_HARDCODED_TESTS, leaving ~274 unprotected.
Why trace verification didn't catch it: baseline traces were generated from the same converter output, so both sides share the bug and compare equivalent. An independent baseline (e.g.
retesteth against the original filler) would flag the divergence.
Fix: pin EOA short-PUSH targets individually instead of falling back to the global short_push_unpinnable:
# around analyzer.py:1094
known_eoa_addrs = {a.address for a in raw_accounts
if a.is_eoa and a.address is not None}
short_push_eoa_refs: set[Address] = set()
for addr, cb in code_bytes_map.items():
refs = _find_address_refs_in_bytecode(cb, all_known_addrs)
deps[addr] = set(refs) & known_contract_addrs
for ref_addr, push_size in refs.items():
if push_size < 20:
if ref_addr in known_contract_addrs:
short_push_refs.add(ref_addr)
elif ref_addr in known_eoa_addrs:
short_push_eoa_refs.add(ref_addr)
else:
short_push_unpinnable = True
# around analyzer.py:1146, after the propagation loop
for acct in raw_accounts:
if acct.address in short_push_eoa_refs:
acct.use_dynamic = False
The 4 existing allowlist entries then become candidates for removal pending an independent re-verification.
I think it is fixed. It was not detected because I was not running the slow ones. I think that is also why the CI didn't get it? |
Tests that fully use dynamic addressing (pre.fund_eoa for senders/EOAs, pre.deploy_contract without address=) don't mutate the pre-allocation and shouldn't carry @pytest.mark.pre_alloc_mutable — having it blocks them from running under the execute plugin against live networks. Add IntermediateTestModel.needs_mutable_pre and only emit the marker when the sender or any account is hardcoded (i.e. use_dynamic=False on sender or any account). 771/2151 tests now drop the marker.
The previous predicate missed two assert_mutable() triggers:
- pre.fund_eoa(nonce=...): trips when sender has explicit nonce kwarg.
Caught by sender_emits_nonce_kwarg = bool(sender_ir.nonce) which
matches the template's `{% if sender.nonce %}` guard.
- pre.deploy_contract(..., nonce=0): trips for any contract whose
nonce in the filler is 0 or unset (the template emits nonce=0 by
default). Caught by contract_nonce_zero scanning non-EOA accounts.
Result: 2083/2151 tests carry @pytest.mark.pre_alloc_mutable, 68 are
execute-plugin-eligible. Previous (incorrect) predicate produced 771
eligible — those would all have failed at fill time with
"Cannot set items in immutable mode".
Two related changes that close the precompile-as-EOA gap. 1. In analyze(), after model.pre.setup(...) resolves tags, override tags[name] = Address(hint) when a SenderTag's name is a hex literal in 0x01..0x10. The static-state setup() resolves <eoa:0x01> through eoa_from_hash(), giving the EOA a random placeholder address even though the LLL source CALLs the literal precompile. Restoring the hint makes addr_to_var register 0x01 so downstream detection sees the bytecode's PUSH1 0x01 as a known EOA reference. 2. In _build_accounts, track short_push_eoa_refs alongside the existing short_push_refs (the reviewer's suggestion). EOAs referenced via PUSH<20 get pinned with use_dynamic=False so the funded account lands at the literal precompile address instead of a random pre.fund_eoa() address. Together these let three precompile-touch tests come off the FORCE_HARDCODED_TESTS allowlist: - stRevertTest/test_revert_precompiled_touch_nonce.py - stRevertTest/test_revert_precompiled_touch_noncestorage.py - stRevertTest/test_revert_precompiled_touch_storage_paris.py The remaining ecrecover/precompile-touch entries stay on the list for unrelated reasons (cryptographic signatures hardcoded in tx data). Full Osaka fill: 20,253 / 20,253 — no regressions.
The previous commit's hint override (tags[name] = Address(0x01)) makes addr_to_var register 0x01, so tx-data and post-state resolutions substitute hex literals with addr_5/addr_6/... variable references. Without pinning those EOAs hardcoded, addr_X resolves through pre.fund_eoa() at runtime to a random address, and tx-data carries that random address instead of the precompile literal — a contract that CALLs 0x01 in its bytecode would still hit the precompile while tx-data carries the wrong address, breaking trace equivalence. Track the hint-overridden addresses in pinned_eoa_addrs, thread that through _build_accounts, and pin the matching EOAs after the short-PUSH EOA pinning. Without this, full fill+verify-traces showed 576 divergences on stRevertTest/test_revert_precompiled_touch_exact_oog_paris; with this fix, all 39,002 / 39,002 comparisons are equivalent under exact-no-stack.
bfd6ce8 to
a2f8705
Compare
PR summary (current state)What this PR doesConvert all ported static tests under Achieved without changing test semantics — verified by 39,002 / 39,002 trace comparisons equivalent under What changed
Verification
|
…ing entries The list previously contained 5469 entries — many for tests that pass on snøbal/4 today (test source updated, addresses now dynamic, etc.) or for tests that no longer exist after PR ethereum#2695's regeneration. Empirically: - With the bloated list: 17199 passed, 1731 skipped, 0 failed. - With no list: 17565 passed, 0 skipped, 1365 failed (gas). - After this prune: 17565 passed, 1365 skipped, 0 failed. The remaining 455 entries are exactly those that match a currently- failing fork_Amsterdam parametrization (still legitimate InsufficientTransactionGasError under EIP-8037's two-dimensional gas model). Section-header counts updated to reflect the actual contents.
🗒️ Description
Modify
filler_to_pythonto generate ported static tests that use dynamically-assigned addresses (pre.fund_eoa(),pre.deploy_contract()withoutaddress=) instead of hardcoding every address from the original fillers. This makes the ported tests behave like hand-written EEST tests, where the framework assigns addresses at fill time.What changed
scripts/filler_to_python/analyzer.py— core logic:_bytes_to_op_expr(code_bytes, addr_to_var), so emitted Op expressions use variable names instead of hex literals.compute_create_address(...)expressions.Op.ADD(contract_var, ...)) — sequential-address dispatch tablesaddress=Op.ADD(Op.CALLDATALOAD(...), ...)) — runtime dispatchFORCE_HARDCODED_TESTSallowlist (51 tests) for cases that can't converge underexact-no-stacktrace verification with dynamic addresses: EIP-2929 gas differences, CREATE2 collisions, keccak-derived storage keys, structural tx-rejection differences, etc.scripts/filler_to_python/ir.py—use_dynamicflag onAccountIR,SenderIR,AccessListEntryIR;noncefield onSenderIR.scripts/filler_to_python/templates/state_test.py.j2— conditional rendering:pre.fund_eoa(amount=...)for dynamic senders/EOAs,pre.deploy_contract(...)withoutaddress=for dynamic contracts.scripts/filler_to_python/render.py—format_storagehandles string variable references alongside int values.packages/testing/src/execution_testing/client_clis/trace_comparators.py—exact-no-stacknow also excludesreturn_data(addresses produced by runtime opcodes likeCALLERland on the stack and inreturn_dataafterRETURN/REVERT; excluding only stack just shifts the divergence forward).tests/ported_static/— all 2151 test files regenerated.Trace verification
Verified against baseline traces (generated once from
forks/amsterdamtip) using--verify-traces-comparator exact-no-stack:🔗 Related Issues or PRs
N/A.
✅ Checklist
type(scope):.just static@ported_frommarker.