Skip to content

feat: support nightly spec tests#9221

Merged
nflaig merged 6 commits intoChainSafe:unstablefrom
brech1:feat/nightly-workflow
Apr 24, 2026
Merged

feat: support nightly spec tests#9221
nflaig merged 6 commits intoChainSafe:unstablefrom
brech1:feat/nightly-workflow

Conversation

@brech1
Copy link
Copy Markdown
Contributor

@brech1 brech1 commented Apr 15, 2026

Motivation

The ethereum/consensus-specs CI generates spec test artifacts daily via tests.yml, without requiring a release. This also works for forks and feature branches.

Description

Extends pnpm download-spec-tests with optional arguments to download from workflow artifacts:

pnpm download-spec-tests                                          # old behavior
pnpm download-spec-tests latest                                   # latest available run
pnpm download-spec-tests 2026-04-14                               # latest run from that date
pnpm download-spec-tests latest brech1/consensus-specs            # fork
pnpm download-spec-tests latest brech1/consensus-specs feat/x     # fork + branch

Since these are not releases, the artifacts must be downloaded through the GitHub API, this requires a GITHUB_TOKEN, it can be set in the local .env file.

Checks

I checked locally with an experimental branch:

pnpm download-spec-tests latest brech1/consensus-specs test/payload-att-message

@brech1 brech1 requested a review from a team as a code owner April 15, 2026 23:55
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ability to download nightly spec tests from GitHub Actions artifacts, adding a new utility downloadNightlyTests and updating the beacon node test suite to support this via CLI arguments. The review feedback identifies a critical bug where the GitHub Artifacts API returns ZIP files while the extraction logic expects TAR format. Additionally, it is recommended to await asynchronous download operations to ensure predictable execution and to use URLSearchParams for more robust API query construction.

for (const test of opts.testsToDownload) {
const artifact = artifacts.find((a) => a.name === `${test}.tar.gz` && !a.expired);
if (artifact) {
urlByTest[test] = artifact.archive_download_url;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The archive_download_url from the GitHub Artifacts API returns a ZIP archive containing the artifact, not the raw file itself. However, downloadGenericSpecTests (specifically at line 98 of downloadTests.ts) uses tar -xzf to extract the downloaded file. This will fail because tar (especially GNU tar used in Linux environments) cannot handle the ZIP format. You will need to unzip the archive first to retrieve the .tar.gz file or update the extraction logic to support ZIP archives.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. The archive: false line on the upload-artifacts job on the tests workflow submits the raw archive (already .tar.gz) to prevent double compression.

Comment on lines +23 to +24
downloadTests(blsSpecTests, console.log).catch(onError);
downloadNightlyTests(opts, console.log, nightlyArg).catch(onError);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These asynchronous calls are not awaited, which means the script will continue execution and potentially finish the main block while the downloads are still in progress. While the Node.js process will stay alive until the promises settle, it is better practice to explicitly await them to ensure predictable execution and proper error handling.

  await Promise.all([
    downloadTests(blsSpecTests, console.log),
    downloadNightlyTests(opts, console.log, nightlyArg),
  ]).catch(onError);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented

Comment on lines +23 to +27
const query = "status=success&per_page=1" + (branch ? `&branch=${branch}` : "") + (date ? `&created=${date}` : "");
const {workflow_runs} = await ghApiFetch<WorkflowRunsResponse>(
`/repos/${repo}/actions/workflows/tests.yml/runs?${query}`,
token
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using string concatenation to build query parameters can lead to incorrect URLs if variables like branch contain special characters (e.g., a forward slash in a branch name like feat/x). Using URLSearchParams ensures that all parameters are correctly encoded.

Suggested change
const query = "status=success&per_page=1" + (branch ? `&branch=${branch}` : "") + (date ? `&created=${date}` : "");
const {workflow_runs} = await ghApiFetch<WorkflowRunsResponse>(
`/repos/${repo}/actions/workflows/tests.yml/runs?${query}`,
token
);
const params = new URLSearchParams({status: "success", per_page: "1"});
if (branch) params.append("branch", branch);
if (date) params.append("created", date);
const {workflow_runs} = await ghApiFetch<WorkflowRunsResponse>(
`/repos/${repo}/actions/workflows/tests.yml/runs?${params.toString()}`,
token
);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5780392587

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +66 to +70
log(`Skipping ${test} (not found in run ${runId})`);
}
}

if (available.length === 0) throw new Error(`No matching artifacts found in run ${runId}`);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail when expected nightly artifacts are missing

This code only errors when zero artifacts are found, but it silently proceeds when a subset is missing (for example, a workflow run that did not upload mainnet.tar.gz). In that case downloadGenericSpecTests still writes version.txt for nightly-<runId>, so subsequent runs treat the incomplete dataset as fully cached and do not re-download the missing presets. That leaves spec test directories partially populated and can cause downstream spec runs to fail in non-obvious ways.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is expected behavior

@brech1 brech1 force-pushed the feat/nightly-workflow branch from 78c5fac to 75f7a64 Compare April 16, 2026 13:14
@brech1 brech1 force-pushed the feat/nightly-workflow branch from 75f7a64 to c2f06a1 Compare April 17, 2026 04:31
@nflaig
Copy link
Copy Markdown
Member

nflaig commented Apr 23, 2026

@lodekeeper can you please review this PR and see why CI is failing, also test this workflow on your lodekeeper/lodestar fork and see if it works as expected and if we are passing the nightly spec tests

@lodekeeper
Copy link
Copy Markdown
Contributor

Ran through this on my lodekeeper/lodestar fork — CI failure is fully explained by an outdated lockfile, and the nightly-download path works end-to-end. There is one downstream surprise from the spec master that's worth capturing as a follow-up, but it's not a PR-9221 regression.

1. CI failure on #9221 — outdated pnpm-lock.yaml

Every failing job dies in the same spot:

ERR_PNPM_OUTDATED_LOCKFILE  Cannot install with "frozen-lockfile" because pnpm-lock.yaml is not up to date with <ROOT>/packages/beacon-node/package.json
Failure reason: specifiers in the lockfile don't match specifiers in package.json:
* 1 dependencies were added: dotenv@^16.4.5

dotenv@^16.4.5 was added to packages/beacon-node/package.json (used by test/spec/downloadTests.ts) but pnpm-lock.yaml wasn't regenerated. Running pnpm install --lockfile-only adds exactly 3 lines. After that, pnpm build + local type checks succeed.

Staged fix branch on my fork: lodekeeper#20 (single commit b4c68811). Happy to cherry-pick onto this PR, or just pnpm install && git add pnpm-lock.yaml && git commit locally — either works.

2. Nightly-download path works end-to-end

Ran pnpm download-spec-tests latest locally against ethereum/consensus-specs on this PR branch. It:

  • hit /repos/ethereum/consensus-specs/actions/workflows/tests.yml/runs?status=success&per_page=1, resolved to run 24809971298,
  • fetched general.tar.gz (169 MB), mainnet.tar.gz (877 MB), minimal.tar.gz (515 MB) from archive_download_url,
  • extracted all three cleanly into spec-tests/tests/{general,mainnet,minimal}/.

On Gemini's concern about archive_download_url returning a ZIP: brech1's reply is right in practice — with archive: false on the upload-artifact step on ethereum/consensus-specs side, the raw .tar.gz comes through and tar -xzf extracts fine. Verified empirically.

3. Nightly spec tests pass — except one new pre-existing gap

Ran against the nightly-downloaded tests on this branch:

  • pnpm test:spec:general2437 / 2437 passing.
  • pnpm test:spec:minimal55355 passing, 95 skipped, 1 failed.

The single failure is Error: Unknown sanity test epoch_boundary in test/spec/presets/sanity.test.ts:28. Nightly ships minimal/gloas/sanity/epoch_boundary (and heze/sanity/epoch_boundary), which come from consensus-specs #5127 "Add epoch boundary tests for gloas" on master. Lodestar's sanity runner switches on testName and only knows slots / blocks, so the new category throws.

This is exactly the kind of drift the nightly-tests feature is designed to surface — not a regression from this PR. A tiny case "epoch_boundary": addition (presumably a thin wrapper around the existing sanityBlocks flow, since epoch-boundary cases look like a constrained variant of blocks) would close it out.

Summary

  • Unblock feat: support nightly spec tests #9221 with the lockfile update (or pull my fork-PR-20 commit).
  • Nightly download + spec-test execution works as designed on the fork.
  • Filing/fixing the epoch_boundary runner case is an orthogonal follow-up.

@nflaig
Copy link
Copy Markdown
Member

nflaig commented Apr 23, 2026

@lodekeeper can you please open a PR against this one and fix everything you found, so we don't have to bother @brech1 with that stuff?

lodekeeper added a commit to lodekeeper/lodestar that referenced this pull request Apr 23, 2026
Upstream consensus-specs ChainSafe#5127 adds epoch_boundary sanity tests for
Gloas (and post-Gloas `heze`). The case format is identical to blocks
(pre / post / blocks_N / meta.blocks_count), so route it through the
existing sanityBlocks runner.

Surfaced by pnpm test:spec:minimal against nightly artifacts downloaded
via the new nightly workflow in ChainSafe#9221.

🤖 Generated with AI assistance
@lodekeeper
Copy link
Copy Markdown
Contributor

@nflaig — opened brech1#1 against this branch with the two fixes:

  1. pnpm-lock.yaml regenerated for dotenv@^16.4.5 (fixes the Build ERR_PNPM_OUTDATED_LOCKFILE).
  2. test/spec/presets/sanity.test.ts — dispatch epoch_boundary through the existing sanityBlocks runner (fixture format matches blocks; new suite from consensus-specs#5127).

After both fixes applied locally: pnpm test:spec:minimal → 56128 passed, 5 failed, 95 skipped. The 5 failures are gloas/sanity/epoch_boundary/pyspec_tests/epoch_boundary_full_parent_* (+ switch_to_compounding_across_epoch_boundary) — all hitting Block parent root ... does not match state latest block at processBlockHeader.ts:34. These are pre-existing Lodestar gaps surfaced by the new gloas test fixtures, not regressions from this PR. I'll investigate those separately once the nightly workflow is in.

@nflaig
Copy link
Copy Markdown
Member

nflaig commented Apr 23, 2026

After both fixes applied locally: pnpm test:spec:minimal → 56128 passed, 5 failed, 95 skipped. The 5 failures are gloas/sanity/epoch_boundary/pyspec_tests/epoch_boundary_full_parent_* (+ switch_to_compounding_across_epoch_boundary) — all hitting Block parent root ... does not match state latest block at processBlockHeader.ts:34. These are pre-existing Lodestar gaps surfaced by the new gloas test fixtures, not regressions from this PR. I'll investigate those separately once the nightly workflow is in.

@lodekeeper what do you mean pre-existing gaps in lodestar? are you sure, maybe the tests are faulty? please investigate further

@lodekeeper
Copy link
Copy Markdown
Contributor

@nflaig — you were right to push back. I was wrong to call this a Lodestar gap. After tracing the failures, it's a test-generator bug in the new test_epoch_boundary.py in consensus-specs#5127.

Mechanism, concretely:

  • All 5 failing tests call set_parent_block_full(spec, state) after yield "pre", state — i.e. between signed_block_1 = state_transition_and_sign_block(...) and block_2 = _build_....
  • set_parent_block_full mutates state.latest_block_hash (and state.latest_execution_payload_bid.block_hash), both of which are part of hash_tree_root(state).
  • The very next process_slot snapshots hash_tree_root(state) into state.latest_block_header.state_root, so the mutation flows into latest_block_header.
  • Python computes block_2.parent_root = hash_tree_root(state.latest_block_header) using the mutated state, and serializes block_2 that way.
  • The fixture only serializes pre / post / blocks_0 / blocks_1 / meta.blocks_count. There is no fixture entry to carry the between-blocks mutation, so every consumer (not just Lodestar) replaying the test will see a different state.latest_block_header.state_root after block_1 and fail processBlockHeader's parent-root check on block_2.
  • All 3 passing test_epoch_boundary_empty_parent_* call set_parent_block_full before yield "pre", so the mutation is captured in the pre-state — that's why they pass.

Fix belongs in consensus-specs, one of:

a) Extend the fixture format with execution_payload_envelopes_N.ssz_snappy and teach the sanity runner to deliver the envelope between blocks (bigger, but closer to on-wire semantics).
b) Restructure the affected tests so the FULL-parent path is exercised by actually running spec.process_execution_payload in the generator, i.e. remove the out-of-band set_parent_block_full helper. That's the smaller fix and stays within the existing fixture format.

I can open the consensus-specs PR — leaning (b) unless you'd rather keep the helper. brech1#1 stays scoped to the two review findings (lockfile + dispatcher); those 5 failures are not the nightly workflow's problem, and they'll flip green once the upstream test PR lands.

case "slots":
return sanitySlots(fork, testName, testSuite);
case "blocks":
case "epoch_boundary":
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to see, maybe these need to be moved under blocks on the spec side but this works for now

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — if consensus-specs ends up categorizing these under blocks/pyspec_tests/ (either by renaming the file or by routing epoch_boundary through the existing blocks handler in the generator), the Lodestar side stays a no-op: the case "epoch_boundary": fallthrough is harmless if the directory disappears, and can be dropped later. Either way this doesn't block #9221.

Copy link
Copy Markdown
Member

@nflaig nflaig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for adding this @brech1

@nflaig nflaig merged commit d166e3b into ChainSafe:unstable Apr 24, 2026
19 checks passed
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.52%. Comparing base (78c66ba) to head (a69ecdf).
⚠️ Report is 2 commits behind head on unstable.

Additional details and impacted files
@@            Coverage Diff            @@
##           unstable    #9221   +/-   ##
=========================================
  Coverage     52.52%   52.52%           
=========================================
  Files           848      848           
  Lines         61304    61304           
  Branches       4510     4510           
=========================================
  Hits          32201    32201           
  Misses        29038    29038           
  Partials         65       65           
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

nflaig added a commit that referenced this pull request Apr 24, 2026
Follow up to #9221 which adds
a worfklow to run nightly spec tests
@brech1 brech1 deleted the feat/nightly-workflow branch April 24, 2026 19:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants