CI: Check versioned release notes exist before releasing #1907

Open

cpcloud wants to merge 4 commits into NVIDIA:main from cpcloud:issue-1326

@cpcloud
Contributor

@cpcloud cpcloud commented Apr 14, 2026

Summary

  • Adds a check-release-notes job to the release workflow that verifies the versioned release-notes file (e.g. 13.1.0-notes.rst) exists and is non-empty for each package being released
  • Blocks doc, upload-archive, and publish-testpypi jobs via needs: gates so releases cannot proceed with missing notes
  • .postN tags are silently skipped (no notes file expected)
  • Helper script at toolshed/check_release_notes.py with 20 pytest tests
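A minimal sketch of the check the summary describes, for a single component. Assumptions: tags look like v13.1.0, cuda-core-v0.8.0, or cuda-pathfinder-v1.6.0 (optionally with .postN), and notes live at <package>/docs/source/release/<version>-notes.rst, the layout cited later in the review. COMPONENT_TO_PACKAGE, parse_tag, and the other names are illustrative, not the actual contents of check_release_notes.py, and component=all is left out because the review below drops it.

```python
from __future__ import annotations

import argparse
import re
import sys
from pathlib import Path

# Hypothetical component -> package-directory mapping (no "all" entry).
COMPONENT_TO_PACKAGE = {
    "cuda-bindings": "cuda_bindings",
    "cuda-core": "cuda_core",
    "cuda-pathfinder": "cuda_pathfinder",
    "cuda-python": "cuda_python",
}

# Assumed tag shapes: v13.1.0, cuda-core-v0.8.0, cuda-pathfinder-v1.6.0,
# each optionally followed by .postN. Anything else is rejected.
TAG_RE = re.compile(
    r"^(?:cuda-core-|cuda-pathfinder-)?v(?P<version>\d+\.\d+\.\d+)(?P<post>\.post\d+)?$"
)


def parse_tag(tag: str) -> tuple[str, bool]:
    """Return (version, is_post_release) or raise ValueError."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    return m.group("version"), m.group("post") is not None


def notes_path(repo_root: Path, package_dir: str, version: str) -> Path:
    # Assumed layout, e.g. cuda_core/docs/source/release/0.7.0-notes.rst
    return repo_root / package_dir / "docs" / "source" / "release" / f"{version}-notes.rst"


def check_release_notes(repo_root: Path, tag: str, component: str) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    version, is_post = parse_tag(tag)
    if is_post:
        return []  # .postN releases are skipped: no notes file is expected
    path = notes_path(repo_root, COMPONENT_TO_PACKAGE[component], version)
    if not path.is_file():
        return [f"missing: {path}"]
    # Whitespace-only files count as empty here (an assumption of this sketch).
    if not path.read_text(encoding="utf-8").strip():
        return [f"empty: {path}"]
    return []


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Check that versioned release notes exist.")
    parser.add_argument("--git-tag", required=True)
    parser.add_argument("--component", required=True, choices=sorted(COMPONENT_TO_PACKAGE))
    parser.add_argument("--repo-root", type=Path, default=Path("."))
    args = parser.parse_args(argv)
    problems = check_release_notes(args.repo_root, args.git_tag, args.component)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

Keeping main() free of sys.exit makes the exit code easy to assert in pytest; a script entry point would call sys.exit(main()).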

Test plan

  • 20/20 pytest tests pass locally (tag parsing, component mapping, missing/empty/post detection, CLI exit codes)
  • Verify check-release-notes job runs in CI on a test release dispatch
  • Confirm .postN tags skip without failure

Closes #1326

🤖 Generated with Claude Code

@cpcloud cpcloud added this to the cuda.core v1.0.0 milestone Apr 14, 2026
@cpcloud cpcloud added P0 High priority - Must do! CI/CD CI/CD infrastructure labels Apr 14, 2026
@cpcloud cpcloud self-assigned this Apr 14, 2026
@github-actions
Contributor

@rwgk rwgk left a comment

I started my review by asking Cursor to look only at the purely technical aspects. For completeness, I'm pasting the response below (no need to read it yet). The main thing it surfaced is what looks like a pre-existing workflow design issue: in the current release.yml on main, component=all paired with a single git-tag is not really usable. I'll post a separate comment with a more detailed analysis of that point.

I think it would be most useful to decide first, at a high level, what we want to do about the all option, before revisiting the lower-level details.

I played around with some possible UI/input shapes in PR #1947 (see screenshots in the PR description). I'm not sure whether you want to make that kind of change in this PR. A simpler alternative that seems fine to me would be to remove all for now, in this PR, along with the corresponding all handling in the new check_release_notes.py script. If we go that route, I think only the much simpler tag-parsing finding would be left to worry about.

Separate high-level question: is toolshed/ the intended home for this script, or would ci/tools/ be a better fit? I could imagine either, but I expected ci/tools/ first.

  Findings
  • Medium: toolshed/check_release_notes.py:27 makes component=all mean “all four packages must have release notes for the same tag
    version”, and toolshed/tests/test_check_release_notes.py:77 explicitly enshrines that with v13.1.0. That does not match this repo’s
    actual versioning split: the issue being fixed is about the shared v13.x line for cuda-bindings/cuda-python, while cuda-core and
    cuda-pathfinder release under their own tag families and note files such as cuda_core/docs/source/release/0.7.0-notes.rst and
    cuda_pathfinder/docs/source/release/1.5.3-notes.rst. I verified locally that toolshed/check_release_notes.py --git-tag v13.1.0
    --component all now fails by demanding nonexistent cuda_core/docs/source/release/13.1.0-notes.rst and
    cuda_pathfinder/docs/source/release/13.1.0-notes.rst, so if all is still meant to support repo-level releases, this gate is
    incorrect.
  • Low: toolshed/check_release_notes.py:31 introduces tag parsing that disagrees with the parser already used later in the same release
     flow at ci/tools/validate-release-wheels:25. For example, a historical tag like cuda-core-v0.1.1rc1 is interpreted here as version
    0.1.1rc1, while wheel validation interprets the same tag as 0.1.1; .post1 tags are skipped here but still normalized later. That
    gives one workflow two different definitions of the release version, which is a maintenance trap and a likely source of false
    failures if rc/post tags are ever reused.
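To make the Low finding concrete, here are two hypothetical parsers that reproduce the disagreement described for cuda-core-v0.1.1rc1. Neither is the code in check_release_notes.py or validate-release-wheels; they only model the behavior the finding attributes to each.

```python
import re


# Hypothetical parser A: everything after the "v" is the version, so
# pre-release suffixes such as rc1 are kept.
def parse_keep_suffix(tag: str) -> str:
    m = re.match(r"^(?:cuda-core-|cuda-pathfinder-)?v(?P<version>.+)$", tag)
    if m is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    return m.group("version")


# Hypothetical parser B: only the leading numeric release segment is kept,
# so rc/post suffixes are normalized away.
def parse_release_only(tag: str) -> str:
    m = re.match(r"^(?:cuda-core-|cuda-pathfinder-)?v(?P<version>\d+(?:\.\d+)*)", tag)
    if m is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    return m.group("version")


for tag in ["v13.1.0", "cuda-core-v0.1.1rc1", "v13.1.0.post1"]:
    print(f"{tag}: {parse_keep_suffix(tag)} vs {parse_release_only(tag)}")
```

The two agree on plain X.Y.Z tags and diverge as soon as a suffix appears, which is why the same workflow can end up with two definitions of "the release version".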

@rwgk
Contributor

rwgk commented Apr 17, 2026

Generated with Cursor GPT-5.4 Extra High Fast


Analysis: Can component=all in the current release.yml workflow on main actually work?

Assume a single commit is tagged with all three release-family tags:

  • v13.3.0
  • cuda-core-v0.8.0
  • cuda-pathfinder-v1.6.0

Now assume we manually run the current release.yml workflow on main, choose component=all, and supply any one of those tags as git-tag.

The concrete question is:

Would that produce a complete and valid set of releases for all four package families?

  • cuda-bindings == 13.3.0
  • cuda-python == 13.3.0
  • cuda-core == 0.8.0
  • cuda-pathfinder == 1.6.0

Why This Is Subtle

At first glance, all plus a single tag looks nonsensical because the repo no longer has a single version namespace. But there is a real reason to hesitate before concluding that: a single tag-triggered CI run on a multi-tagged commit can still build all four wheel families.

The reason is that CI and release are using tags differently.

  • /.github/workflows/ci.yml triggers on all three tag families: v*, cuda-core-v*, and cuda-pathfinder-v*.
  • On non-PR events, ci.yml does not do path-based narrowing; it runs the full pipeline.
  • /.github/workflows/build-wheel.yml builds all four package families in that run:
    • cuda.pathfinder
    • cuda.bindings
    • cuda.core
    • cuda-python

So a CI run triggered by a tag push is not scoped to just one package family.

Why The Wheels Can Still Be Correctly Versioned

The packages do not all derive their version from the same tag family.

  • cuda_bindings/pyproject.toml uses setuptools-scm with tag_regex = ^(?P<version>v...) and git describe --match v*.
  • cuda_python/setup.py does the same shared-line lookup for v*.
  • cuda_core/pyproject.toml uses only cuda-core-v*.
  • cuda_pathfinder/pyproject.toml uses only cuda-pathfinder-v*.

That means version resolution is family-specific, not "whatever tag triggered the workflow."

So if one commit really has all three tags, then the packages can resolve like this:

  • cuda-bindings -> 13.3.0
  • cuda-python -> 13.3.0
  • cuda-core -> 0.8.0
  • cuda-pathfinder -> 1.6.0
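The family-specific resolution above can be modeled in a few lines. Here fnmatchcase stands in for the glob matching of git describe --match, and the function only inspects the tags on one commit; the real setuptools-scm configuration (whose tag_regex is elided above) is not reproduced, and resolve_versions is a hypothetical name.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Per-family tag patterns, following the --match patterns described above.
FAMILY_PATTERNS = {
    "cuda-bindings": "v*",
    "cuda-python": "v*",
    "cuda-core": "cuda-core-v*",
    "cuda-pathfinder": "cuda-pathfinder-v*",
}


def resolve_versions(tags_on_commit: list[str]) -> dict[str, str | None]:
    """Pick each family's version from the tags on a single commit.

    Simplification: real git describe also walks history and would fall back
    to an older tag (yielding a dev version) when the commit has no match.
    """
    resolved: dict[str, str | None] = {}
    for family, pattern in FAMILY_PATTERNS.items():
        matches = [t for t in tags_on_commit if fnmatchcase(t, pattern)]
        prefix = pattern.rstrip("*")
        # Exactly one matching tag: strip the family prefix to get the version.
        resolved[family] = matches[0][len(prefix):] if len(matches) == 1 else None
    return resolved


print(resolve_versions(["v13.3.0", "cuda-core-v0.8.0", "cuda-pathfinder-v1.6.0"]))
```

Because "v*" cannot match a tag that starts with "cuda-", the shared line and the two family-specific lines never compete for the same tag.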

This is not just theoretical. The repo already has historical commits that carry both a shared v... tag and a family-specific tag on the same commit, for example:

  • v13.0.2 together with cuda-core-v0.4.0
  • v13.2.0 together with cuda-pathfinder-v1.4.2

And git describe --match ... resolves the expected tag family separately on those commits.

So the answer to the narrow build question is:

Yes, a single CI run on a multi-tagged commit can plausibly build all four wheel families with their own correct versions.

Where It Breaks

The current release workflow is not built around "one commit with several independently meaningful tags." It is built around "one input tag defines one release version."

That assumption shows up in several places.

1. The release workflow selects a CI run by the exact tag

ci/tools/lookup-run-id resolves the input tag to a commit SHA, then filters GitHub Actions runs for the successful push run whose headBranch equals that exact tag.

So the release workflow is not saying "give me a successful CI run for this commit." It is saying "give me the successful CI run for this exact tag ref."
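The difference between the two queries can be illustrated with toy data. The run records and both functions are hypothetical (field names are modeled on gh run list --json output), and this is not ci/tools/lookup-run-id's implementation. In the example, the run triggered by cuda-core-v0.8.0 failed, so an exact-tag lookup for that tag finds nothing even though a successful run exists for the same commit.

```python
# Hypothetical run records; field names modeled on `gh run list --json` output.
RUNS = [
    {"databaseId": 101, "event": "push", "conclusion": "success",
     "headBranch": "v13.3.0", "headSha": "abc123"},
    {"databaseId": 102, "event": "push", "conclusion": "failure",
     "headBranch": "cuda-core-v0.8.0", "headSha": "abc123"},
]


def _successful_push(run: dict) -> bool:
    return run["event"] == "push" and run["conclusion"] == "success"


def select_by_tag(runs: list[dict], tag: str) -> list[int]:
    """Exact-tag lookup, as lookup-run-id is described as doing."""
    return [r["databaseId"] for r in runs if _successful_push(r) and r["headBranch"] == tag]


def select_by_commit(runs: list[dict], sha: str) -> list[int]:
    """Alternative: any successful push run for the commit, whichever tag triggered it."""
    return [r["databaseId"] for r in runs if _successful_push(r) and r["headSha"] == sha]
```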

2. The release workflow still creates exactly one GitHub Release

/.github/workflows/release.yml and /.github/workflows/release-upload.yml use inputs.git-tag as the release identifier and upload target.

So even before thinking about wheel validation, the workflow still has only one GitHub Release object in mind:

  • either the release for v13.3.0
  • or the release for cuda-core-v0.8.0
  • or the release for cuda-pathfinder-v1.6.0

It has no notion of "release all four families under their own tags."

3. component=all enforces one version across all downloaded wheels

This is the decisive point.

ci/tools/download-wheels with component=all downloads all wheel artifacts from the selected CI run.

Then ci/tools/validate-release-wheels parses exactly one expected version from inputs.git-tag and applies that expected version to all distributions in the all set:

  • cuda_core
  • cuda_bindings
  • cuda_pathfinder
  • cuda_python

That means:

  • if git-tag=v13.3.0, then validation expects all four distributions to be version 13.3.0
  • if git-tag=cuda-core-v0.8.0, then validation expects all four distributions to be version 0.8.0
  • if git-tag=cuda-pathfinder-v1.6.0, then validation expects all four distributions to be version 1.6.0

Under the example above, each of those cases fails:

  • git-tag=v13.3.0 fails because cuda-core and cuda-pathfinder are not 13.3.0
  • git-tag=cuda-core-v0.8.0 fails because cuda-bindings, cuda-python, and cuda-pathfinder are not 0.8.0
  • git-tag=cuda-pathfinder-v1.6.0 fails because cuda-bindings, cuda-python, and cuda-core are not 1.6.0

So even though the CI run may have built the right wheels, the release workflow cannot interpret them correctly when component=all.
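A reduced model of the single-expected-version check makes the failure mode easy to see. The expected version for each tag is written out by hand, and mismatches is a hypothetical helper, not validate-release-wheels' code.

```python
# Wheel versions from the multi-tag example above (distribution names).
BUILT = {
    "cuda_bindings": "13.3.0",
    "cuda_python": "13.3.0",
    "cuda_core": "0.8.0",
    "cuda_pathfinder": "1.6.0",
}


def mismatches(built: dict[str, str], expected: str) -> list[str]:
    """Distributions whose version differs from the single expected version."""
    return sorted(name for name, version in built.items() if version != expected)


for tag, expected in [
    ("v13.3.0", "13.3.0"),
    ("cuda-core-v0.8.0", "0.8.0"),
    ("cuda-pathfinder-v1.6.0", "1.6.0"),
]:
    print(f"git-tag={tag}: mismatched {mismatches(BUILT, expected)}")
```

No choice of git-tag leaves the mismatch list empty, because the four distributions legitimately carry three different versions.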

Bottom Line

There are really two different questions here.

  1. Can one multi-tagged commit support a CI run that builds all four package families with correct versions?

Yes, plausibly.

  2. Does the current release.yml workflow make sense with component=all and a single git-tag?

No, not really.

In the current workflow, all does not mean:

release all independently versioned package families from one commit

Instead, it behaves more like:

release all downloaded artifacts as though they belong to one version namespace derived from one tag

That assumption no longer matches the structure of this repo.

So the statement

"all paired with just one git tag does not make sense"

is reasonable for the current workflow on main.

Final Caveat

Even the more optimistic part, "a single tag-triggered CI run can build all four correctly," is somewhat operationally fragile.

If the CI run triggered by, say, v13.3.0 starts before cuda-core-v0.8.0 and cuda-pathfinder-v1.6.0 are present on the remote, then the family-specific setuptools-scm lookups may not see those tags yet. In that situation, cuda-core and cuda-pathfinder may resolve against older family tags and produce dev/local versions instead of the intended release versions.

So all is not just semantically awkward in the current design; it is also timing-sensitive and therefore fragile in practice.

@cpcloud cpcloud requested a review from rwgk April 21, 2026 14:17
Member

Q: Do tests in toolshed/ run?!

Contributor Author

Good catch — they weren't picked up anywhere. Fixed in b5cb9b2: tests moved to ci/tools/tests, registered under pytest testpaths, and the check-release-notes job now self-tests the script via pytest ci/tools/tests before invoking it.

Comment thread on toolshed/check_release_notes.py (outdated)
"cuda-bindings": ["cuda_bindings"],
"cuda-pathfinder": ["cuda_pathfinder"],
"cuda-python": ["cuda_python"],
"all": ["cuda_bindings", "cuda_core", "cuda_pathfinder", "cuda_python"],
Member

@leofang leofang Apr 21, 2026

I think Ralf's post above captured this. The all option is no longer compatible with our new way of tagging releases (via setuptools-scm). In fact, I believe all has never been used in production. Let's just remove this line for now and do a cleanup later.

Contributor Author

Done in b5cb9b2: all dropped from COMPONENT_TO_PACKAGES and the argparse choices, and the tests covering it removed. Agreed that the broader all cleanup (workflow input, validate-release-wheels, etc.) is for a separate PR.

Member

Separate high-level question: is toolshed/ the intended home for this script, or would ci/tools/ be a better fit? I could imagine either, but I expected ci/tools/ first.

I agree with this review comment. This file should be moved to ci/tools. toolshed/ is for convenience scripts that we rarely need to re-run, especially ones that are not used in CI.

Contributor Author

Moved to ci/tools/check_release_notes.py in b5cb9b2, alongside validate-release-wheels.

cpcloud and others added 4 commits April 21, 2026 11:47
Add a check-release-notes job to the release workflow that verifies
the versioned release-notes file (e.g. 13.1.0-notes.rst) exists and
is non-empty for each package being released. The job blocks doc,
upload-archive, and publish-testpypi via needs: gates.

Helper script at toolshed/check_release_notes.py parses the git tag,
maps component to package directories, and checks file presence.
Post-release tags (.postN) are silently skipped. Tests cover tag
parsing, component mapping, missing/empty detection, and the CLI.

Refs NVIDIA#1326

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Ensures the release-notes check validates the tagged tree, not the
default branch HEAD. Without this, manually triggered runs could
validate the wrong commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Addresses review feedback on NVIDIA#1907:

- Relocate check_release_notes.py (and its tests) from toolshed/ to
  ci/tools/, matching validate-release-wheels which is the closest
  conceptual neighbor. toolshed/ is for scripts we rarely re-run; this
  one is invoked from release.yml on every release.
- Drop the `all` component. The shared-version assumption it encoded no
  longer matches the repo's independent tag families (v*, cuda-core-v*,
  cuda-pathfinder-v*). The broader release-workflow cleanup for `all`
  will happen in a separate pass.
- Register ci/tools/tests under pytest testpaths, and run the unit
  tests in the check-release-notes job before invoking the script.
@cpcloud
Contributor Author

cpcloud commented Apr 21, 2026

Pushed b5cb9b2 addressing the review feedback:

  • Moved check_release_notes.py from toolshed/ to ci/tools/ (next to validate-release-wheels, which it pairs with conceptually). Tests moved alongside to ci/tools/tests/.
  • Dropped component=all. Removed the all key from COMPONENT_TO_PACKAGES, removed the two all tests, and simplified the core function to operate on a single package. Per @leofang's guidance, the broader all cleanup (workflow input choice, validate-release-wheels, etc.) is left for a separate pass.
  • Tests now run. Added ci/tools/tests to top-level testpaths, and the check-release-notes job in release.yml now runs pytest ci/tools/tests before invoking the script, so a broken tool fails the job rather than producing a misleading pass.

On @rwgk's low-severity tag-parser divergence from ci/tools/validate-release-wheels: with all gone, the script only needs to recognise v* / cuda-core-v* / cuda-pathfinder-v* plus skip .post tags. Unifying the two parsers (they have different needs — release-notes must identify .post to skip, while wheel-validation must reject .dev/local) seems like it belongs in the same future cleanup as the all removal rather than this PR. Happy to pull it in here if you prefer.
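For the deferred unification, one possible shape is a single parser that exposes every version segment and lets each caller apply its own rule: the release-notes check skips .postN, while wheel validation rejects .devN and +local. Everything here (names, the accepted PEP 440-style grammar) is hypothetical and not taken from either existing script.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed tag grammar: family prefix, "v", X.Y.Z, then optional pre-release,
# .postN, .devN, and +local segments (in that order).
TAG_RE = re.compile(
    r"^(?P<prefix>cuda-core-|cuda-pathfinder-)?v"
    r"(?P<release>\d+\.\d+\.\d+)"
    r"(?P<pre>(?:a|b|rc)\d+)?"
    r"(?:\.post(?P<post>\d+))?"
    r"(?:\.dev(?P<dev>\d+))?"
    r"(?P<local>\+[A-Za-z0-9.]+)?$"
)


@dataclass(frozen=True)
class ReleaseTag:
    family: str         # "v", "cuda-core-v", or "cuda-pathfinder-v"
    release: str        # e.g. "13.1.0"
    pre: str | None     # e.g. "rc1"
    post: int | None
    dev: int | None
    local: str | None


def parse_release_tag(tag: str) -> ReleaseTag:
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    return ReleaseTag(
        family=(m.group("prefix") or "") + "v",
        release=m.group("release"),
        pre=m.group("pre"),
        post=int(m.group("post")) if m.group("post") else None,
        dev=int(m.group("dev")) if m.group("dev") else None,
        local=m.group("local"),
    )


def needs_release_notes(tag: ReleaseTag) -> bool:
    """Release-notes check: .postN releases do not get their own notes file."""
    return tag.post is None


def is_publishable(tag: ReleaseTag) -> bool:
    """Wheel validation: reject development and local versions."""
    return tag.dev is None and tag.local is None
```

Each caller then keeps its own policy, but both read the same parsed fields, so the two definitions of "release version" cannot drift apart.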


Labels

CI/CD CI/CD infrastructure P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: The release workflow should check if the versioned release note is missing

3 participants