feat(ci): overhaul fixture releases#2888
Conversation
abb005f to
8bd6b77
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## forks/amsterdam #2888 +/- ##
===================================================
+ Coverage 87.16% 90.50% +3.34%
===================================================
Files 586 535 -51
Lines 35791 32407 -3384
Branches 3364 3011 -353
===================================================
- Hits 31198 29331 -1867
+ Misses 3943 2559 -1384
+ Partials 650 517 -133
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
7eac7b3 to
d558bc7
Compare
|
This PR needs to be merged to fully test out the new workflow, but I did some smaller smoke tests on my fork:
To keep the workflow runnable in minutes on a fork (no Smoke test releases can be found here: https://github.com/spencer-tb/execution-specs/releases |
d558bc7 to
b2ac847
Compare
|
|
|
This is great, thanks. Just one question about benchmark releases. Does it make sense for those to also have benchmark-consensus and benchmark-devnet variants? For example, recently ive been hitting a slight complication which ive had to work around - the benchmark fixtures dont have BALs in them so I couldnt use them locally for optimisation work on top of the devnet changes. I had to synthesise my own fixtures essentially. If we had benchmark-devnet variant for those I wouldve not had to do this workaround |
|
I support @taratorio 's idea if this does not complicate the workflow too much |
danceratopz
left a comment
There was a problem hiding this comment.
Thanks so much for this effort! Looking forward to getting all releases in execution-specs!
I feel a little bit weird about hijacking the MAJOR version for fork/devnet version (fork version is MINOR in EELS releases, so another slight inconsistency). We do have to hope PandaOps never launches feat-devnet-alphaone or similar to avoid invalid versions 😆
On face value, it seems to fit well. And i think one of your major motivations is to avoid hard-coding releases in the hive-tests runner config repos...
https://github.com/ethpandaops/hive-tests/blob/52cf2fcfa6bb698c11344994d8e223e69dd3a969/.github/workflows/hive-devnet-7.yaml#L161-L162
But it might be worth thinking through a little more. If we have two devnet versions (bal-devnet-3,bal-devnet-7) running at the same time, does it still simplify artifact download? I guess we never have hive runner configs for two devnets simultaneously?
As-is we could already do:
consume cache --input=bal-devnet-7@latest
and if we aim to deprecate consume cache then both versions are equally convenient with gh release download, I think?
Versioning scheme in this PR:
#!/usr/bin/env bash
TAG=$(gh release list --repo ethereum/execution-specs --limit 100 --json tagName \
--jq 'map(select(.tagName | startswith("tests-bal-devnet@v7."))) | .[0].tagName')
gh release download "$TAG" --repo ethereum/execution-specs --pattern '*.tar.gz'Versioning scheme currently:
#!/usr/bin/env bash
TAG=$(gh release list --repo ethereum/execution-specs --limit 100 --json tagName \
--jq 'map(select(.tagName | startswith("tests-bal-devnet-7@v"))) | .[0].tagName')
gh release download "$TAG" --repo ethereum/execution-specs --pattern '*.tar.gz'If this does not greatly help downstream convenience I would opt for explicit hard-coded release names here bal-devnet-7) and not add our own custom convention to versioning.
If we keep it, I wonder, at the risk of complicating the workflows, we should automate/hard-code the major version to avoid user error (see the docs corrections below) as suggested here. This would avoid duplicating this in the case of the devnet branch and put it close to the fill --until=<fork> config in the case of a fork branch. The comments on the invalid tag/version highlight that this is error prone.
Could you restructure the docs to keep all (or most of) the "Formats and Release Layout" section, but move it below the "Release Tracks" (or "Test Release Types" if we rename)?
I think removing blockchain_test_engine from the benchmark spec deserves its own PR as it gets a bit lost here.
Hey @taratorio, thanks for asking and the feedback! Yes, this def makes sense in general and we can add these as required.
Were you filling these fixtures yourself? The latest release https://github.com/ethereum/execution-specs/releases/tag/tests-benchmark%40v0.0.9 is only filled for Osaka. There might have been a bit of flux here due to incompatible/incomplete t8n tools for EELS and geth (benchmark releases have been using geth), but as-is today 😆 on |
yes, I filled a few that I was interested in with erigon but not via t8n but my own quick and hacky way |
|
Just discussed with @spencer-tb and @LouisTsai-Csie, this is our suggestion going forward:
|
I mostly agree with this comment except for points 3 and 4, since I think the MAJOR should refer to the devnet/fork: My suggestion is as follows: For
For
I.e. if a client is targeting to join fork/devnet X, it should target MAJOR equal to X, must take the latest MINOR, and should ideally take the latest PATCH. With fork/devnet as MAJOR, the version number alone tells a client whether a release is relevant to them. Under the alternative (MAJOR tracks any spec change), you'd have to combine the version and the -N suffix in the release name to figure out whether a spec change targets your devnet or a different one. Devnet compatibility shouldn't require parsing two fields IMO. Concretely, the mindset just from looking at the version (ignoring the name) should be: I'm a client dev targeting devnet 7, and I was passing tests contained in On the topic of parallel maintenance of two different devnets, this scheme handles this naturally IMO. E.g. if we are maintaining devnets 3 & 7 for the same feature, releasing v3.0.1 after or alongside v7.0.0 is not a problem (think Python 2.x vs 3.x). It is a well-known Semver pattern, and Semver is good at this. We should simply make this rule clear and follow it so we are predictable. On On benchmarking, major and minor should mirror the feature they are targeting, while the patch moves freely at its own pace. |
|
🚢 |
ac49382 to
93a501c
Compare
9ffaf6a to
bfafa90
Compare
bfafa90 to
4b74a4a
Compare
|
PR is now consolidated. Description updated. Benchmark release changes are moved to this PR: #2954. |
danceratopz
left a comment
There was a problem hiding this comment.
Thanks for handling all the comments, respectively pushing the benchmark changes to #2954!
I would like to go over the docs again before merge, but posting this beforehand as the larger blocker right now: Both the validation scripts and docs assume our branches have the fork bal-devnet-7 (as the rest of the ecosystem uses), whereas we currently use devnets/bal/7 in EELS.
| ## Fixture Output Directory Structure | ||
|
|
||
| Inside each format directory, fixtures are grouped by **target fork**. | ||
|
|
||
| The top-level subdirectory identifies the fork **under test**. Below it, | ||
| fixtures mirror the `./tests/` source layout: each directory corresponds | ||
| to the fork where the functionality was originally introduced. Because | ||
| tests declare `valid_from`, a single target fork directory contains | ||
| fixtures from every prior fork whose tests are still valid at that fork. | ||
|
|
||
| ### Consensus fixture layout | ||
|
|
||
| ```text | ||
| fixtures/ | ||
| └── blockchain_tests/ | ||
| ├── for_prague/ # filled targeting Prague | ||
| │ ├── istanbul/ # tests introduced in Istanbul | ||
| │ │ └── eip1344_chainid/... | ||
| │ ├── cancun/ # tests introduced in Cancun | ||
| │ │ └── eip4844_blobs/... | ||
| │ └── prague/ # tests introduced in Prague | ||
| │ └── eip7702_set_code_tx/... | ||
| └── for_osaka/ # filled targeting Osaka | ||
| ├── istanbul/ | ||
| │ └── eip1344_chainid/... | ||
| ├── cancun/ | ||
| │ └── eip4844_blobs/... | ||
| ├── prague/ | ||
| │ └── eip7702_set_code_tx/... | ||
| └── osaka/ # tests introduced in Osaka | ||
| └── eip7692_eof_v1/... | ||
| ``` | ||
| | Blob Transaction Tests | - using the [eels/execute-blobs Simulator](./execute/hive.md#the-eelsexecute-blobs-simulator) | None; executed directly from Python source,</br>using a release tag | | ||
|
|
||
| Other format directories (`state_tests/`, `blockchain_tests_engine/`) | ||
| follow the same layout. | ||
| ## Release Tracks | ||
|
|
||
| ### Benchmark fixture layout | ||
| Fixtures are released on independent tracks. Each track has its own tag namespace, artifact, and cadence. | ||
|
|
||
| When filling with `--gas-benchmark-values`, benchmark tests additionally | ||
| include the gas limit in the subdirectory name (`for_{fork}_at_{gas}M`, | ||
| where `{gas}` is in millions, zero-padded to four digits), with one | ||
| subdirectory per gas value: | ||
| | Track | Tag | Artifact | Scope | Built from | | ||
| | --------- | -------------------- | ------------------------------- | ------------------------------------------------------------------ | ----------------- | | ||
| | Consensus | `consensus@vX.Y.Z` | `fixtures_consensus.tar.gz` | All forks, all tests (including legacy tests) | latest `forks/*` branch | | ||
| | Devnet | `<feat>-devnet@vX.Y.Z` | `fixtures_<feat>-devnet.tar.gz` | All forks, all tests, for an upcoming-fork feature under active devnet testing | the devnet branch | | ||
| | Benchmark | `benchmark@vX.Y.Z` | `fixtures_benchmark.tar.gz` | EVM benchmarking tests | latest `forks/*` branch | | ||
|
|
||
| ```text | ||
| fixtures/ | ||
| └── blockchain_tests/ | ||
| ├── for_osaka_at_0001M/ # 1M gas benchmark | ||
| │ └── benchmark/compute/... | ||
| └── for_osaka_at_0002M/ # 2M gas benchmark | ||
| └── benchmark/compute/... | ||
| ``` |
| branch: | ||
|
|
||
| ```bash | ||
| gh workflow run release_fixtures.yaml -f feature=bal-devnet -f version=v7.0.0 -f branch=bal-devnet-7 |
There was a problem hiding this comment.
This (and the changes in .github/scripts/) is forward looking to an EELS branch scheme rename?
devnets/bal/7 -> bal-devnet-7.
I just asked in the chat ;-)
| def validate_inputs(feature: str, version: str, branch: str) -> None: | ||
| """ | ||
| Validate the release dispatch inputs before building a matrix. | ||
|
|
||
| Centralize the feature/version checks here so they are unit-testable | ||
| rather than living as inline bash in the release workflow. | ||
|
|
||
| For `<feat>-devnet` releases the major version (`X` of `vX.Y.Z`) | ||
| must equal the devnet number encoded in the release branch, so a | ||
| `bal-devnet` release from `bal-devnet-7` must be tagged `v7.*.*`. | ||
| """ | ||
| if not feature: | ||
| fail("feature name is empty") | ||
| if not VERSION_RE.match(version): | ||
| fail(f"version '{version}' must match vX.Y.Z (e.g. v20.0.0)") | ||
|
|
||
| # A bare `devnet` has no friendly `<feat>-` prefix to tag with. | ||
| if feature in ("devnet", "-devnet"): | ||
| fail("devnet releases require a <feat>- prefix, e.g. bal-devnet") | ||
|
|
||
| # `<feat>-devnet-<n>`: the devnet index belongs in the version (X of | ||
| # vX.Y.Z), not in the feature name. | ||
| if "-devnet-" in feature: | ||
| suggested_feature, _, suggested_index = feature.rpartition("-") | ||
| fail( | ||
| "devnet index must go in 'version', not the feature name; " | ||
| f"did you mean feature={suggested_feature} " | ||
| f"version=v{suggested_index}.0.0?" | ||
| ) | ||
|
|
||
| if feature.endswith("-devnet"): | ||
| if not branch: | ||
| fail( | ||
| "devnet releases require a 'branch' input, " | ||
| "e.g. branch=bal-devnet-7" | ||
| ) | ||
| match = re.search(r"(\d+)$", branch) | ||
| if not match: | ||
| fail( | ||
| f"could not parse a devnet number from branch '{branch}' " | ||
| "(expected a trailing number, e.g. bal-devnet-7)" | ||
| ) | ||
| devnet_number = int(match.group(1)) | ||
| major = int(version.lstrip("v").split(".")[0]) | ||
| if major != devnet_number: | ||
| minor_patch = version.split(".", 1)[1] | ||
| fail( | ||
| f"version major (v{major}) must equal the devnet number " | ||
| f"({devnet_number}) from branch '{branch}'; " | ||
| f"did you mean version=v{devnet_number}.{minor_patch}?" | ||
| ) |
There was a problem hiding this comment.
This validation assumes execution-specs uses <feat>-devnet-<X> scheme, currently we use devnets/bal/7.
We need to sort this out first, there's some debate in the aforementioned chat.
🗒️ Description
This PR outlines a detailed description and updates our test fixture release process for non-benchmark tests.
Key Changes
evm-impl.yamlis merged intoevm.yamland each entry now carries itsimpl,repo,ref,evm-binandxdist.benchmarkclient alias is kept sofeature.yamlworkflows can reference a client by name, andbuild-evm-baseresolves which builder to run via theimplfield.tests,devnetandbenchmarkfeatures (inforks/amsterdam).devnetis specifically chosen so we don't ever need to updatefeature.yamlfor future devnet features. Thedevnetkeyword is used as a substring match on release tagging, and<feat>-devnetkeeps its friendly name.release_fixture_feature.yamlis replaced withrelease_fixtures.yaml:workflow_dispatchwith explicitfeature+versioninputs. The git tag istests-<feature>@<version>(thetestsfeature tags astests@<version>) and the release title is<feature>@<version>.versionmust matchvX.Y.Z,featuremust be non-empty,*-devnetrequires abranch, and anevmoverride must be a key inevm.yaml.evm/evm_repo/evm_refinputs override the client impl and the t8n tool repo/ref for a one off release.osakarange (no standalonebposplit).Tagging A Release
To tag releases we must now use the Github CLI. This has the benefit of only creating tags in EELS if the fixture building process is successful. No more tag deletion and recreation, only workflow triggers.
The following can be ran locally or optionally triggered with the github actions website UI.
Downloading A Release
Fixtures can be downloaded by 2 seperate methods.
gh release downloadfor the raw tarball, orconsume cacheif you want tag resolution (@latest), local caching, and--inputintegration with consume subcommands.Fixed Release Types
tests@vX.Y.ZIn the past (pre-Weld) we had the following release features:
stable&develop, wherestablewas a subset fordevelop. To converge on these 2 features we now define thetestsfeature. This is the invariant to thebenchmarkfeature.The
testsfeature acts as our mainnet set of tests. That is for nowfill --until BPO4, all tests for all forks until last mainnet fork. This will be released weekly on any change or addition to the tests always from the latest development branch:forks/amsterdamcurrently. Clients will use this release on their main/master branches in CI, and eventually this release will contain all tests fromethereum/testsðereum/legacy-tests(allowing us to archive both of these repos). TLDR; one tag type to verify you will not break mainnet. Additionally this will be ran in our Hive CI (under what is currently labelled generic).Here we define a new semantic versioning type for our
testsreleases:The first release of this type in EELS will be
tests@v20.0.0, to catch up from the last fork. For the fork under development (Amsterdam) the firsttestsrelease will be tagged once all CFI'd EIPs are deemed successful in a devnet (purposely ambiguous here, things change in ethereum). Typically this will resolve to the last devnet before the first testnet; this first release can be viewed as the testnet release. For Amsterdam we will tagtests@v24.0.0likely afterglamsterdam-devnet-6is deemed successful. Spec changes can still occur and that is why we have Y in the new fixture versioning scheme.tests-<feat>-devnet@vX.Y.ZThis release type will follow on from the current devnet release process but more explicitly. Today
<feat>isbaland soon it will beglamsterdam.<feat>can essentially be any keyword but typically is the fork name or the headliner feature.The
devnetfeature is entirely for test releases during the fork development process for the upcoming fork. Here we still fill for all forks so clients can make sure they do not introduce any regressions, currentlyfill --until Amsterdam, all tests for all forks until the development fork. Clients will use this release on theirdevnetbranches in CI; they must pass all of these tests before being included in the devnet that the release is tagged for. This release will be ran in Hive CI under the same naming scheme as we do currently.For the
baldevnets today we use the tagtests-bal@vX.Y.Z. This PR will change the process totests-bal-devnet@vX.Y.Z. Asbal-devnet-7is the last of thebal's we will start the new tagging scheme forglamsterdam-devnet-5. The devnet releases will additionally follow a new versioning scheme:glamsterdam-devnet-5this would be 5,testsreleases.Following the latter, the first
glamsterdam-devnet-5release will be tagged astests-glamsterdam-devnet@v5.0.0. All devnet releases must be tagged from a devnet branch in EELS, in this caseglamsterdam-devnet-5. Here we are specifically choosing to diverge from the EELS/branch naming scheme to align every repo under the same devnet name.🔗 Related Issues or PRs
✅ Checklist
just statictype(scope):.Cute Animal Picture