Conversation
burgerdev
left a comment
There was a problem hiding this comment.
Thanks, lgtm, but please hold until tomorrow, 2026-04-28 for the actual release.
|
Another thing to consider, with the refactor, |
3bedf3a to
95e03b5
Compare
|
Sorry @burgerdev @sespiros, with your suggestions + the promote-to-release idea + some refactoring to remove repetition, this might as well be v2 of this PR. The good thing about this version though, is we can just start the nightly run manually later, before merging any of this, to see if it works out. |
sespiros
left a comment
There was a problem hiding this comment.
Thanks for your effort, I find this easier to follow than the current workflow. Did a first pass (still reading) and left a few comments.
Also wdyt about having a folder under actions to group all the release relevant actions?
|
Thanks for the review! And sorry, I know it's a lot o yaml 😄
I think more organization is never a bad idea. Unfortunately, after moving every action that is only required by the release workflows into an I applied your renaming suggestions, and I think with those clearer names the actions dir is more readable already. We can move the actions into the mentioned new dir, but I'm not completely sure it would help with organization as much as I'd like. |
95e03b5 to
69af4b3
Compare
|
(force-push because of rebase on main after the upload/download-artifacts PR merge) |
69af4b3 to
3a8818c
Compare
|
(and again for the flake update) |
4363aa6 to
df2273a
Compare
df2273a to
3eb1a58
Compare
|
@sespiros could you take a final look at the 3 new commits? They're relatively small. Also already includes the docs additions. |
sespiros
left a comment
There was a problem hiding this comment.
Thanks for changes. Left some inline comments. And also some general comments:
- I think the first and the second commit could be squashed since the first one references
steps.version-info.outputs.previous_minor_tagwhich doesn't exist yet. - In general I prefer playbook-style pages to not contain steps that leave the reader wondering whether a step is enforced by the code or not. Steps that can be enforced in code should be enforced in code. If there is a possibility for the code to be wrong, either fix the code or try to make this known as part of the step. As an example, going through the playbook and reading "Check that the latest nighty has passed tests" makes me think "if I skip this I could promote a release with failing tests?", "why isn't the code checking that already?" etc.
| echo "next_minor_pre_without_v=${PART_MAJOR}.$((PART_MINOR + 1)).0-pre" | ||
| echo "next_patch_pre_without_v=${PART_MAJOR}.${PART_MINOR}.$((PART_PATCH + 1))-pre" | ||
| if (( PART_MINOR > 0 )); then | ||
| echo "previous_minor_tag=v${PART_MAJOR}.$((PART_MINOR - 1)).0" |
There was a problem hiding this comment.
Will this work if there were patch releases? So let's say we have published 1.20.1, shouldn't the nightly's 1.21.0-XXXX-YY-ZZ compare vs that patch release instead of 1.20.0? I think we should solve this in code, maybe grab list of releases and figure out the latest one.
| run: | | ||
| gh release delete "${NIGHTLY_TAG}" --repo "${GH_REPO}" --yes --cleanup-tag | ||
|
|
||
| create-github-stuff: |
There was a problem hiding this comment.
Shouldn't we also rename this to prepare-github-release?
| next_minor: ${{ needs.prepare.outputs.NEXT_MINOR }} | ||
| github_token: ${{ github.token }} | ||
|
|
||
| update-main: |
There was a problem hiding this comment.
and this to prepare-post-release-pr?
|
|
||
| publish: | ||
| name: Publish release | ||
| needs: [prepare, promote, create-github-stuff, update-main] |
There was a problem hiding this comment.
I think the workflow would be easier to read like:
| needs: [prepare, promote, create-github-stuff, update-main] | |
| needs: [prepare, promote, prepare-github-release, prepare-post-release-pr] |
prepare still feels a bit generic but ok.
| VERSION: ${{ needs.prepare.outputs.EFFECTIVE_VERSION }} | ||
| run: | | ||
| gh release edit "${VERSION}" --draft=false --repo "${{ github.repository }}" | ||
| - name: Regenerate release notes against the now-existing tag |
There was a problem hiding this comment.
Shouldn't we add an equivalent step in release.yml as well?
| GH_REPO: ${{ github.repository }} | ||
| run: | | ||
| failed=$(gh run view "${NIGHTLY_RUN_ID}" --repo "${GH_REPO}" --json jobs \ | ||
| --jq '[.jobs[] | select(.name | startswith("e2e release")) | select(.conclusion != "success")] | map("\(.name): \(.conclusion)") | .[]') |
There was a problem hiding this comment.
Are the e2e release tests the only ones we care about here? Shouldn't we check that all tests have passed? That would also make the manual check step 1 of the playbook optional. If any test has failed, running the promotion workflow would fail here too.
| The CI should create a nightly minor draft release, called `v1.x.0-yyyy-mm-dd`, and runs the full test suite against it. | ||
| This draft release can be promoted to actual release. | ||
|
|
||
| 1. Check that in the latest nightly, all checks and tests passed. A draft release is created even if that's not the case, but the promotion won't work if a job failed. |
There was a problem hiding this comment.
How about:
1. (Optional) Check that in the latest nightly, all checks and tests passed. Promotion won't be able to run if job has failed during the nightly draft release.
To signify that this is not a load-bearing check, but see also my other comment in release_promote.yml about having these checks be checked automatically.
| export NIGHTLY_RUN_ID=<run-id> | ||
| ``` | ||
|
|
||
| 3. Sanity-check the corresponding draft release on GitHub. The tag should match `vX.Y.Z-yyyy-mm-dd` where |
There was a problem hiding this comment.
Isn't this step also enforced by code? Why have a separate step for it? Personally I find it confusing but if you think the is some possibility for the code to misbehave then maybe either fix the code or clarify this here or mark it also as (Optional)?
| 5. Review the release notes. If label/title/description changes are necessary, change them on the PR itself. Test the binary artifact. | ||
| The release notes will be re-generated by the CI once the new version tag has been created. |
There was a problem hiding this comment.
So after step 4, the promotion step is blocked at human review (awaiting for someone to click the button). Then if I follow step 5, review the notes and I make edits, then click the button, iiuc the regeneration step that happens after publish will eat my edits no? Do we want this?
A couple of ways to resolve this:
- push the tag -> generates notes -> then publish. Any edits the user will do before clicking submit are the final ones.
- Make it clear that user's shouldn't be doing manual edits to the release text at this point because changes are overwritten.
What adds to the confusion is "change them on the PR itself" (and I saw we have this in the other section too). It doesn't mean the auto-generated PR, it means the contributor's PR in the list of PRs. So changing an individual contribution's PR would allow the regeneration to pick the changes but changing anything in the release's text at this step would cause it to be overwritten.
| 2. Find and export the workflow run ID of the release you want to promote: | ||
|
|
||
| ```sh | ||
| gh run list --workflow=release_nightly.yml --limit 10 | ||
| export NIGHTLY_RUN_ID=<run-id> | ||
| ``` |
There was a problem hiding this comment.
Process-wise, should we allow selecting arbitrary nightly release ids here? There might be subtle bugs that can happen since now nightly releases override the same tag (I am looking at
), say if someone selects yesterday's nightly for promotion while registries have today's nightly images under the latest tag.I think in this scenario, the cli would continue working because it picks up images by hash but if someone pulls an image by the release's tag and compare against the one the cli is using will get a hash mismatch because yesterday's nightly cli (which was promoted) had an old hash while the tag points to today's nightly.
So I think this should be solved:
- either by re-tagging the images after promotion if we want to allow promoting older nightlies or actually better,
- make this impossible in code. Only allow the latest nightly to be promoted (and have that be reflected somehow in the playbook here).
e2e_nightlyworkflow no longer triggers automatically. Instead,releaseworkflow dry-runs every night. "dry run" here means no backport PRs, no draft releases. The artifacts are still published to S3, so we/pm can use them.