feat(ci): release process fixes from testing #4195
Tag push events triggered by GITHUB_TOKEN don't trigger other workflows, but workflow_dispatch does. Dispatch build.yml explicitly after creating the release so Docker images get built without needing a PAT or GitHub App.
Workflow dispatch requires actions:write permission; contents:write alone is not enough.
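As a rough illustration of the dispatch described above, the sketch below builds the REST request for GitHub's "create a workflow dispatch event" endpoint. It is not the code from this PR (the workflow may well use the `gh` CLI instead); the owner, repo, ref, and token values are placeholders, and the token is assumed to carry `actions: write`.

```python
import json
import urllib.request


def build_dispatch_request(
    owner: str, repo: str, workflow_file: str, ref: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for one workflow file."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # "ref" is the branch or tag the workflow should run on (assumed here).
    body = json.dumps({"ref": ref}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; sending would be urllib.request.urlopen(req, timeout=10).
req = build_dispatch_request("example-org", "example-repo", "build.yml", "main", "t0ken")
```

With only `contents: write`, a request like this is expected to be rejected, which is the permission gap the second commit addresses.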
Code Review
This pull request updates the documentation in doc/software-releases.md to clarify the release workflow. It adds a description explaining that create-release.yml dispatches build.yml to ensure Docker images are built, as tag pushes via GITHUB_TOKEN do not automatically trigger other workflows. I have no feedback to provide as there were no review comments.
- Drop YYYYMMDD.rcN git tag concept. Versions are defined once (YYYYMMDD), and if broken a new version is cut (YYYYMMDD.1 or next day).
- Rename decaf.rc/mainnet.rc floating tags to decaf.canary/mainnet.canary since they no longer represent RC versions, just early-adopter tiers.
- Classify YYYYMMDD.N same-day releases as Release (not Pre-release).
- Reject YYYYMMDD-* internal tags from promotion.
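The tag shapes listed above can be told apart with two regular expressions. This is a minimal sketch, not the project's actual promotion check: the function name and the returned labels are made up for illustration, and the rule that decides Pre-release status is not modeled because the commit message does not spell it out.

```python
import re

# YYYYMMDD or YYYYMMDD.N: release versions eligible for promotion.
_RELEASE = re.compile(r"\d{8}(\.\d+)?")
# YYYYMMDD-<anything>: internal tags that must be rejected from promotion.
_INTERNAL = re.compile(r"\d{8}-.+")


def classify_tag(tag: str) -> str:
    """Return "release", "internal", or "other" for a git tag name."""
    if _RELEASE.fullmatch(tag):
        return "release"
    if _INTERNAL.fullmatch(tag):
        return "internal"
    return "other"


def is_promotable(tag: str) -> bool:
    """Only plain release versions may be promoted to a floating tag."""
    return classify_tag(tag) == "release"
```

Using `fullmatch` rather than `match` matters here: it keeps a tag such as `20260408-test` from being accepted on the strength of its date prefix alone.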
Adds a step-by-step walkthrough section referencing screenshots in doc/assets/, including a note to always select main in the "Use workflow from" dropdown.
Shows current floating Docker tag targets, recent YYYYMMDD git tags (with Release/Pre-release classification and originating branch), and active release-* branches with tags not on main. Python stdlib only; requires gh with read:packages scope. Adds a `just release-status` recipe and a short section in the release versioning doc.
Switch from gh api (which requires read:packages scope) to docker buildx imagetools inspect for floating-tag-to-version mapping -- works anonymously for public packages. Parallelize docker inspects and per-tag git branch --contains calls via ThreadPoolExecutor. Full run drops from ~30s to ~5s.
Compute git tag --merged for main and each release branch once at startup (parallelized), then look up tag->branch via dict instead of per-tag git branch --contains calls. Also fetch gh API calls in parallel. Runs drop from ~5s to ~1.3s.
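The lookup-table idea can be sketched as follows. This is an assumed shape, not the script's actual code: `merged_tags` shells out to `git tag --merged <branch>` once per branch, and `index_tags_by_branch` inverts the result so each tag lookup becomes a dict access. Function names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


def merged_tags(branch: str) -> List[str]:
    """Tags reachable from `branch` (one git call per branch, not per tag)."""
    out = subprocess.run(
        ["git", "tag", "--merged", branch],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def index_tags_by_branch(merged: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Invert {branch: tags} into {tag: [branches containing it]}."""
    index: Dict[str, List[str]] = {}
    for branch, tags in merged.items():
        for tag in tags:
            index.setdefault(tag, []).append(branch)
    return index


def build_index(branches: List[str]) -> Dict[str, List[str]]:
    """Run the per-branch git calls in parallel, then build the lookup dict."""
    with ThreadPoolExecutor() as pool:
        merged = dict(zip(branches, pool.map(merged_tags, branches)))
    return index_tags_by_branch(merged)
```

The cost then scales with the number of branches rather than the number of tags, which is consistent with the speedup reported in the commit message.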
- Replace `docker buildx imagetools inspect` with direct HTTP to the GHCR OCI registry. Anonymous token is fetched once. Removes ~400ms per call; full run drops from ~1.3s to ~0.7s, --floating to ~0.4s.
- Add --loglevel debug which prints per-command and per-stage timings to stderr.
- Skip launching unnecessary work when --floating is passed.
- Hide release branches with no release tags by default. Add --all-branches to include them.
…nches
- Default window reduced from 180 to 60 days.
- git fetch --tags is off by default (use --fetch to enable). Network latency makes it the slowest step and many runs don't need fresh data.
- Hide release branches with no release tags (use --all-branches to include them).
- Doc updated with new flags.
The repo is public so SSH auth is unnecessary. Fetch over HTTPS (0.3-0.5s) instead of SSH (1.6s) and only fetch YYYYMMDD tags plus release/main branches. Full run with fetch drops from ~2.3s to ~1.3s. Restore fetching as the default.
- Drop the --paginate flag and the "][" JSON concatenation hack; for the endpoints we call, per_page=100 is enough.
- Collapse fetch_promote_runs and fetch_release_classification's list/dict fallbacks.
- section_floating takes one digests dict instead of two.
Type annotations throughout (PEP 585/604, Python 3.9+). Introduce ReleaseBranch NamedTuple instead of positional tuples. Start git fetch concurrently with floating tag digest lookups and gh workflow runs; candidate digest lookups start after fetch completes. Full run with fetch drops from ~1.3s to ~1.0s.
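The staging described above (fetch and floating-tag lookups start together, candidate lookups wait for the fetch) might look roughly like this. The four callables are stand-ins for the script's real steps, which are not shown in this PR text, so their names and signatures are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_staged(
    fetch: Callable[[], None],
    floating_lookup: Callable[[], Dict[str, str]],
    list_candidates: Callable[[], List[str]],
    candidate_lookup: Callable[[str], str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Stage 1: independent work starts immediately and runs concurrently.
        fetch_future = pool.submit(fetch)
        floating_future = pool.submit(floating_lookup)

        # Stage 2: candidate tags are only known once the fetch has finished.
        fetch_future.result()
        candidate_futures = {
            tag: pool.submit(candidate_lookup, tag) for tag in list_candidates()
        }

        floating = floating_future.result()
        candidates = {tag: f.result() for tag, f in candidate_futures.items()}
    return floating, candidates
```

Only the step that actually depends on fresh tags blocks on the fetch; everything else overlaps with the network round trip.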
Nextest failures (1) in this run
See the step summary for flaky tests and slowest tests.
Claude finished @sveitser's task in 2m 18s
Code Review: PR #4195: Release process fixes from testing
Summary
Clean PR that solves a real GitHub limitation (tag pushes from CI Workflows
Documentation
The walkthrough is clear and well-structured. Good call adding the "If Something Goes Wrong" section and the note about floating tags vs. manual operator steps. The "always select
    matches = [
        r for r in runs
        if r.get("conclusion") == "success"
        and floating_tag in (r.get("display_title") or r.get("name") or "")
        and release_tag in (r.get("display_title") or r.get("name") or "")
    ]
    return max(matches, key=lambda r: r["created_at"]) if matches else None
Bug: substring matching causes false positives
Both checks use in (substring containment), which leads to incorrect matches in at least two ways:
- Floating tag collision: searching for `decaf` matches a run titled `"… decaf.canary …"`, and `mainnet` matches `mainnet.canary`.
- Release tag collision: searching for `20260408` matches `"… 20260408.1 …"`.
So a promote of 20260408.1 to decaf.canary would incorrectly appear as a match when looking for 20260408 promoted to decaf.
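To make the collision concrete, the sketch below compares plain substring containment with a whole-token match. The run title format is an assumption (the real `display_title` may differ), and `has_token` is a hypothetical helper, not part of the PR. A plain `\b` word boundary is not enough here because `.` is a non-word character, so the pattern uses explicit lookarounds that treat `.` and `-` as part of a tag.

```python
import re


def has_token(title: str, token: str) -> bool:
    """True if `token` appears in `title` as a whole tag, not as a prefix of a longer one."""
    pattern = r"(?<![\w.-])" + re.escape(token) + r"(?![\w.-])"
    return re.search(pattern, title) is not None


# Assumed title format, for illustration only.
title = "Promote 20260408.1 to decaf.canary"

substring_hit = "decaf" in title and "20260408" in title              # false positive
token_hit = has_token(title, "decaf") and has_token(title, "20260408")
```

One limitation of this sketch: a title that ends a sentence with a tag followed by a period (for example `"... to decaf."`) would not match, so the lookarounds would need adjusting if titles can contain such punctuation.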
Consider using word-boundary matching or parsing the workflow run inputs instead. The inputs are available on workflow_dispatch runs:
    # e.g. r.get("inputs", {}).get("floating-tag") == floating_tag

    _ghcr_token: str | None = None


    def ghcr_token() -> str:
        global _ghcr_token
        if _ghcr_token is None:
            url = f"https://ghcr.io/token?scope=repository:{GHCR_REPO}:pull"
            with urllib.request.urlopen(url, timeout=10) as resp:
                _ghcr_token = json.loads(resp.read())["token"]
        return _ghcr_token
Nit: ghcr_token() is called from multiple threads (via tag_digest submitted to the ThreadPoolExecutor), but the global _ghcr_token check-then-set isn't synchronized. In practice this just means a few redundant token fetches on the first batch, not a correctness bug. A threading.Lock would make it explicit, or you could fetch the token once before submitting the digest futures.
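A minimal sketch of the lock-based variant, assuming the token fetch is wrapped in a callable; `TokenCache` is a hypothetical name and the fetch function is injected so the example does not depend on network access:

```python
import threading
from typing import Callable, Optional


class TokenCache:
    """Fetch a token at most once, even when get() is called from many threads."""

    def __init__(self, fetch: Callable[[], str]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._token: Optional[str] = None

    def get(self) -> str:
        # Holding the lock across the check and the fetch closes the
        # check-then-set window; later callers wait and reuse the result.
        with self._lock:
            if self._token is None:
                self._token = self._fetch()
            return self._token
```

The trade-off is that the first batch of callers serializes behind a single fetch instead of issuing a few redundant ones, which is usually what is wanted for a token endpoint.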