Skip to content

.github: dry-run release pipeline nightly#2360

Open
charludo wants to merge 4 commits intomainfrom
ch/nightly-release-pipeline
Open

.github: dry-run release pipeline nightly#2360
charludo wants to merge 4 commits intomainfrom
ch/nightly-release-pipeline

Conversation

@charludo
Copy link
Copy Markdown
Collaborator

e2e_nightly workflow no longer triggers automatically. Instead, release workflow dry-runs every night. "dry run" here means no backport PRs, no draft releases. The artifacts are still published to S3, so we/pm can use them.

@charludo charludo added the no changelog PRs not listed in the release notes label Apr 27, 2026
@charludo charludo requested review from burgerdev and sespiros April 27, 2026 14:04
Copy link
Copy Markdown
Member

@burgerdev burgerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, lgtm, but please hold until tomorrow, 2026-04-28 for the actual release.

Comment thread .github/workflows/release.yml Outdated
Comment thread .github/workflows/release.yml Outdated
Comment thread .github/workflows/release.yml Outdated
@sespiros
Copy link
Copy Markdown
Collaborator

Another thing to consider, with the refactor, e2e nightly run will now depend on release-x86_64-linux to succeed. If that fails for any reason we lose the entire night's e2e run.

@charludo charludo force-pushed the ch/nightly-release-pipeline branch from 3bedf3a to 95e03b5 Compare April 29, 2026 11:42
@charludo
Copy link
Copy Markdown
Collaborator Author

Sorry @burgerdev @sespiros, with your suggestions + the promote-to-release idea + some refactoring to remove repetition, this might as well be v2 of this PR.

The good thing about this version though, is we can just start the nightly run manually later, before merging any of this, to see if it works out.

@charludo charludo requested review from burgerdev and sespiros April 29, 2026 11:44
Copy link
Copy Markdown
Collaborator

@sespiros sespiros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your effort, I find this easier to follow than the current workflow. Did a first pass (still reading) and left a few comments.

Also wdyt about having a folder under actions to group all the release relevant actions?

Comment thread .github/actions/release_build_linux/action.yml Outdated
Comment thread .github/actions/prepare_github_release/action.yml
Comment thread .github/actions/prepare_post_release_pr/action.yml
@charludo
Copy link
Copy Markdown
Collaborator Author

charludo commented Apr 30, 2026

Thanks for the review! And sorry, I know it's a lot o yaml 😄

Also wdyt about having a folder under actions to group all the release relevant actions?

I think more organization is never a bad idea. Unfortunately, after moving every action that is only required by the release workflows into an .github/actions/release/ dir, most actions are now in that dir, and some of those that aren't are also used by the release (just not exclusively).

I applied your renaming suggestions, and I think with those clearer names the actions dir is more readable already. We can move the actions into the mentioned new dir, but I'm not completely sure it would help with organization as much as I'd like.

@charludo charludo force-pushed the ch/nightly-release-pipeline branch from 95e03b5 to 69af4b3 Compare April 30, 2026 06:43
@charludo
Copy link
Copy Markdown
Collaborator Author

charludo commented Apr 30, 2026

(force-push because of rebase on main after the upload/download-artifacts PR merge)

@charludo charludo force-pushed the ch/nightly-release-pipeline branch from 69af4b3 to 3a8818c Compare April 30, 2026 07:53
@charludo
Copy link
Copy Markdown
Collaborator Author

(and again for the flake update)

@charludo charludo requested a review from sespiros April 30, 2026 07:53
@charludo charludo force-pushed the ch/nightly-release-pipeline branch from 4363aa6 to df2273a Compare May 7, 2026 05:35
@charludo charludo force-pushed the ch/nightly-release-pipeline branch from df2273a to 3eb1a58 Compare May 7, 2026 08:40
@charludo
Copy link
Copy Markdown
Collaborator Author

charludo commented May 7, 2026

@sespiros could you take a final look at the 3 new commits? They're relatively small. Also already includes the docs additions.

Copy link
Copy Markdown
Collaborator

@sespiros sespiros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for changes. Left some inline comments. And also some general comments:

  • I think the first and the second commit could be squashed since the first one references steps.version-info.outputs.previous_minor_tag which doesn't exist yet.
  • In general I prefer playbook-style pages to not contain steps that leave the reader wondering whether a step is enforced by the code or not. Steps that can be enforced in code should be enforced in code. If there is a possibility for the code to be wrong, either fix the code or try to make this known as part of the step. As an example, going through the playbook and reading "Check that the latest nighty has passed tests" makes me think "if I skip this I could promote a release with failing tests?", "why isn't the code checking that already?" etc.

echo "next_minor_pre_without_v=${PART_MAJOR}.$((PART_MINOR + 1)).0-pre"
echo "next_patch_pre_without_v=${PART_MAJOR}.${PART_MINOR}.$((PART_PATCH + 1))-pre"
if (( PART_MINOR > 0 )); then
echo "previous_minor_tag=v${PART_MAJOR}.$((PART_MINOR - 1)).0"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work if there were patch releases? So let's say we have published 1.20.1, shouldn't the nightly's 1.21.0-XXXX-YY-ZZ compare vs that patch release instead of 1.20.0? I think we should solve this in code, maybe grab list of releases and figure out the latest one.

run: |
gh release delete "${NIGHTLY_TAG}" --repo "${GH_REPO}" --yes --cleanup-tag

create-github-stuff:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we also rename this to prepare-github-release?

next_minor: ${{ needs.prepare.outputs.NEXT_MINOR }}
github_token: ${{ github.token }}

update-main:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this to prepare-post-release-pr?


publish:
name: Publish release
needs: [prepare, promote, create-github-stuff, update-main]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the workflow would be easier to read like:

Suggested change
needs: [prepare, promote, create-github-stuff, update-main]
needs: [prepare, promote, prepare-github-release, prepare-post-release-pr]

prepare still feels a bit generic but ok.

VERSION: ${{ needs.prepare.outputs.EFFECTIVE_VERSION }}
run: |
gh release edit "${VERSION}" --draft=false --repo "${{ github.repository }}"
- name: Regenerate release notes against the now-existing tag
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we add an equivalent step in release.yml as well?

GH_REPO: ${{ github.repository }}
run: |
failed=$(gh run view "${NIGHTLY_RUN_ID}" --repo "${GH_REPO}" --json jobs \
--jq '[.jobs[] | select(.name | startswith("e2e release")) | select(.conclusion != "success")] | map("\(.name): \(.conclusion)") | .[]')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the e2e release tests the only ones we care about here? Shouldn't we check that all tests have passed? That would also make the manual check step 1 of the playbook optional. If any test has failed, running the promotion workflow would fail here too.

Comment thread dev-docs/release.md
The CI should create a nightly minor draft release, called `v1.x.0-yyyy-mm-dd`, and runs the full test suite against it.
This draft release can be promoted to actual release.

1. Check that in the latest nightly, all checks and tests passed. A draft release is created even if that's not the case, but the promotion won't work if a job failed.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

1. (Optional) Check that in the latest nightly, all checks and tests passed. Promotion won't be able to run if job has failed during the nightly draft release. 

To signify that this is not a load-bearing check, but see also my other comment in release_promote.yml about having these checks be checked automatically.

Comment thread dev-docs/release.md
export NIGHTLY_RUN_ID=<run-id>
```

3. Sanity-check the corresponding draft release on GitHub. The tag should match `vX.Y.Z-yyyy-mm-dd` where
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this step also enforced by code? Why have a separate step for it? Personally I find it confusing but if you think the is some possibility for the code to misbehave then maybe either fix the code or clarify this here or mark it also as (Optional)?

Comment thread dev-docs/release.md
Comment on lines +26 to +27
5. Review the release notes. If label/title/description changes are necessary, change them on the PR itself. Test the binary artifact.
The release notes will be re-generated by the CI once the new version tag has been created.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So after step 4, the promotion step is blocked at human review (awaiting for someone to click the button). Then if I follow step 5, review the notes and I make edits, then click the button, iiuc the regeneration step that happens after publish will eat my edits no? Do we want this?

A couple of ways to resolve this:

  • push the tag -> generates notes -> then publish. Any edits the user will do before clicking submit are the final ones.
  • Make it clear that user's shouldn't be doing manual edits to the release text at this point because changes are overwritten.

What adds to the confusion is "change them on the PR itself" (and I saw we have this in the other section too). It doesn't mean the auto-generated PR, it means the contributor's PR in the list of PRs. So changing an individual contribution's PR would allow the regeneration to pick the changes but changing anything in the release's text at this step would cause it to be overwritten.

Comment thread dev-docs/release.md
Comment on lines +10 to +15
2. Find and export the workflow run ID of the release you want to promote:

```sh
gh run list --workflow=release_nightly.yml --limit 10
export NIGHTLY_RUN_ID=<run-id>
```
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Process-wise, should we allow selecting arbitrary nightly release ids here? There might be subtle bugs that can happen since now nightly releases override the same tag (I am looking at

version: v${{ inputs.without_v }}
), say if someone selects yesterday's nightly for promotion while registries have today's nightly images under the latest tag.

I think in this scenario, the cli would continue working because it picks up images by hash but if someone pulls an image by the release's tag and compare against the one the cli is using will get a hash mismatch because yesterday's nightly cli (which was promoted) had an old hash while the tag points to today's nightly.

So I think this should be solved:

  • either by re-tagging the images after promotion if we want to allow promoting older nightlies or actually better,
  • make this impossible in code. Only allow the latest nightly to be promoted (and have that be reflected somehow in the playbook here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no changelog PRs not listed in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants