feat(ci): release process fixes from testing by sveitser · Pull Request #4195 · EspressoSystems/espresso-network

sveitser · 2026-04-17T09:51:38Z

Tag push events triggered by GITHUB_TOKEN don't trigger other workflows, but workflow_dispatch does. Dispatch build.yml explicitly after creating the release so Docker images get built without needing a PAT or GitHub App.

Workflow dispatch requires actions:write permission; contents:write alone is not enough.
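As a rough illustration of what an explicit dispatch involves, the sketch below builds (but does not send) a request against GitHub's REST endpoint for creating a workflow dispatch event. It is not necessarily how create-release.yml does it; the helper name `build_dispatch_request`, the tag `20260417`, and the token placeholder are assumptions for illustration only.

```python
import json
import urllib.request

API = "https://api.github.com"


def build_dispatch_request(repo: str, workflow: str, ref: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for `workflow` at `ref`."""
    url = f"{API}/repos/{repo}/actions/workflows/{workflow}/dispatches"
    # `ref` is the branch or tag the dispatched workflow should run against.
    body = json.dumps({"ref": ref}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            # The token must carry actions:write; contents:write alone is rejected.
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical tag and token placeholder.
req = build_dispatch_request(
    "EspressoSystems/espresso-network", "build.yml", "20260417", token="<GITHUB_TOKEN>"
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be read and run without credentials.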

gemini-code-assist

Code Review

This pull request updates the documentation in doc/software-releases.md to clarify the release workflow. It adds a description explaining that create-release.yml dispatches build.yml to ensure Docker images are built, as tag pushes via GITHUB_TOKEN do not automatically trigger other workflows. I have no feedback to provide as there were no review comments.

- Drop YYYYMMDD.rcN git tag concept. Versions are defined once (YYYYMMDD), and if broken a new version is cut (YYYYMMDD.1 or next day).
- Rename decaf.rc/mainnet.rc floating tags to decaf.canary/mainnet.canary since they no longer represent RC versions, just early-adopter tiers.
- Classify YYYYMMDD.N same-day releases as Release (not Pre-release).
- Reject YYYYMMDD-* internal tags from promotion.

Adds a step-by-step walkthrough section referencing screenshots in doc/assets/, including a note to always select main in the "Use workflow from" dropdown.

Shows current floating Docker tag targets, recent YYYYMMDD git tags (with Release/Pre-release classification and originating branch), and active release-* branches with tags not on main. Python stdlib only; requires gh with read:packages scope. Adds a `just release-status` recipe and a short section in the release versioning doc.

Switch from gh api (which requires read:packages scope) to docker buildx imagetools inspect for floating-tag-to-version mapping -- works anonymously for public packages. Parallelize docker inspects and per-tag git branch --contains calls via ThreadPoolExecutor. Full run drops from ~30s to ~5s.
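The parallelization described above can be sketched roughly as follows. This is a minimal illustration, not the script's code: `lookup_all` and `fake_digest` are hypothetical names, and `fake_digest` stands in for whatever network call is being fanned out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def lookup_all(tags: list[str], lookup: Callable[[str], str]) -> dict[str, str]:
    """Run `lookup` for every tag concurrently and return {tag: result}."""
    if not tags:
        return {}
    with ThreadPoolExecutor(max_workers=min(8, len(tags))) as pool:
        # pool.map preserves input order, so zip pairs each tag with its own result.
        return dict(zip(tags, pool.map(lookup, tags)))


def fake_digest(tag: str) -> str:
    """Hypothetical stand-in for a network call such as a registry digest lookup."""
    return f"sha256:{tag}"


print(lookup_all(["decaf", "mainnet"], fake_digest))
```

Threads suit this workload because each call spends most of its time waiting on the network or a subprocess rather than running Python code.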

Compute git tag --merged for main and each release branch once at startup (parallelized), then look up tag->branch via dict instead of per-tag git branch --contains calls. Also fetch gh API calls in parallel. Runs drop from ~5s to ~1.3s.
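A minimal sketch of the lookup-table idea, assuming one `git tag --merged <branch>` call per branch: the function names, the branch name `release-20260408`, and the sample tags are hypothetical, and "first branch wins" is one possible tie-break rather than the script's confirmed behavior.

```python
import subprocess


def merged_tags(branch: str) -> list[str]:
    """Tags reachable from `branch`, using one `git tag --merged` call per branch."""
    out = subprocess.run(
        ["git", "tag", "--merged", branch],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def index_tags(merged: dict[str, list[str]]) -> dict[str, str]:
    """Map each tag to the first branch (in dict order) that contains it."""
    tag_to_branch: dict[str, str] = {}
    for branch, tags in merged.items():
        for tag in tags:
            # setdefault keeps the earliest branch, so listing main first
            # attributes shared tags to main.
            tag_to_branch.setdefault(tag, branch)
    return tag_to_branch


# Hypothetical data; in a real run each list would come from merged_tags(branch).
merged = {
    "main": ["20260401", "20260408"],
    "release-20260408": ["20260401", "20260408", "20260408.1"],
}
index = index_tags(merged)
print(index["20260408.1"])
```

The cost moves from one git invocation per tag to one per branch, and every later tag lookup is a dictionary access.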

- Replace `docker buildx imagetools inspect` with direct HTTP to the GHCR OCI registry. Anonymous token is fetched once. Removes ~400ms per call; full run drops from ~1.3s to ~0.7s, --floating to ~0.4s.
- Add --loglevel debug which prints per-command and per-stage timings to stderr.
- Skip launching unnecessary work when --floating is passed.
- Hide release branches with no release tags by default. Add --all-branches to include them.
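For readers unfamiliar with the registry API, a direct digest lookup might look roughly like the sketch below. It assumes the standard OCI distribution manifest endpoint and that the registry reports the digest in the `Docker-Content-Digest` response header; the helper names and the repository placeholder `owner/image` are hypothetical, and the script's actual requests may differ.

```python
import urllib.request
from typing import Optional

REGISTRY = "https://ghcr.io"
# Manifest media types to accept: OCI and Docker, single-image and multi-arch.
ACCEPT = ", ".join([
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.docker.distribution.manifest.v2+json",
])


def manifest_head_request(repo: str, tag: str, token: str) -> urllib.request.Request:
    """Build a HEAD request for the manifest that `tag` currently points at."""
    url = f"{REGISTRY}/v2/{repo}/manifests/{tag}"
    return urllib.request.Request(
        url,
        method="HEAD",
        headers={"Authorization": f"Bearer {token}", "Accept": ACCEPT},
    )


def digest_of(repo: str, tag: str, token: str) -> Optional[str]:
    """Send the request and return the digest header, if the registry sets one."""
    with urllib.request.urlopen(manifest_head_request(repo, tag, token), timeout=10) as resp:
        return resp.headers.get("Docker-Content-Digest")


# Hypothetical repository and token; nothing is sent here.
req = manifest_head_request("owner/image", "decaf", "anonymous-token")
print(req.get_method(), req.full_url)
```

Two floating tags that resolve to the same digest point at the same image, which is what makes a floating-tag-to-version mapping possible without pulling anything.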

…nches

- Default window reduced from 180 to 60 days.
- git fetch --tags is off by default (use --fetch to enable). Network latency makes it the slowest step and many runs don't need fresh data.
- Hide release branches with no release tags (use --all-branches to include them).
- Doc updated with new flags.

The repo is public so SSH auth is unnecessary. Fetch over HTTPS (0.3-0.5s) instead of SSH (1.6s) and only fetch YYYYMMDD tags plus release/main branches. Full run with fetch drops from ~2.3s to ~1.3s. Restore fetching as the default.

- Drop the --paginate flag and the "][" JSON concatenation hack; for the endpoints we call, per_page=100 is enough.
- Collapse fetch_promote_runs and fetch_release_classification's list/dict fallbacks.
- section_floating takes one digests dict instead of two.

Type annotations throughout (PEP 585/604, Python 3.9+). Introduce ReleaseBranch NamedTuple instead of positional tuples. Start git fetch concurrently with floating tag digest lookups and gh workflow runs; candidate digest lookups start after fetch completes. Full run with fetch drops from ~1.3s to ~1.0s.

claude · 2026-04-27T14:58:19Z

Claude encountered an error.

I'll analyze this and get back to you.

github-actions · 2026-04-27T15:45:28Z

Nextest failures (1) in this run

| Test | Attempts | Time (s) | Main history |
| --- | --- | --- | --- |
| `hotshot-testing::tests_5::tests_5::combined_network::test_combined_network_cdn_crash` | 1 | 76.30 | passing |

See the step summary for flaky tests and slowest tests.

claude · 2026-04-29T11:36:56Z

Claude finished @sveitser's task in 2m 18s

Code Review: PR #4195 — Release process fixes from testing

Summary

Clean PR that solves a real GitHub limitation (tag pushes from GITHUB_TOKEN don't trigger other workflows) and renames the rc concept to canary for clarity. The documentation rewrite is a significant improvement — the step-by-step walkthrough with screenshots will save people a lot of trial and error. The release-status script is a thoughtful operational tool with nice performance work (parallelized fetches, GHCR HTTP API, etc.).

CI Workflows

create-release.yml — The core fix is well-designed: after gh release create, dispatch build.yml via workflow_dispatch against the new tag. Adding actions: write permission is necessary and correctly scoped. The classify regex update ((\..+)?$ instead of just $) correctly makes YYYYMMDD.1 a Release rather than Pre-release.

promote-docker-tag.yml — The rc → canary rename is consistent throughout. The validation regex tightening to reject YYYYMMDD-* tags from promotion is a good safety measure, since those are internal-only. Progression chain is correctly updated.
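The two tag rules discussed above can be sketched as a pair of regex checks. The patterns here are reconstructed from the review text and are assumptions: the Release pattern follows the quoted `(\..+)?$` suffix, the promotion pattern is a guess at "reject YYYYMMDD-*", and the sample tags are hypothetical. The workflows' real expressions may differ.

```python
import re

# Assumed patterns, reconstructed from the review text above.
RELEASE_RE = re.compile(r"^[0-9]{8}(\..+)?$")          # YYYYMMDD or YYYYMMDD.N
PROMOTABLE_RE = re.compile(r"^[0-9]{8}(\.[0-9]+)?$")   # rejects YYYYMMDD-* tags


def classify(tag: str) -> str:
    """Label a git tag as Release or Pre-release."""
    return "Release" if RELEASE_RE.match(tag) else "Pre-release"


def promotable(tag: str) -> bool:
    """Whether a tag may be promoted to a floating Docker tag."""
    return PROMOTABLE_RE.match(tag) is not None


for tag in ["20260408", "20260408.1", "20260408-test"]:
    print(tag, classify(tag), promotable(tag))
```

With the older pattern ending in a bare `$`, `20260408.1` would have fallen through to Pre-release; the optional `(\..+)?` group is what admits same-day follow-up releases.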

Documentation

The walkthrough is clear and well-structured. Good call adding the "If Something Goes Wrong" section and the note about floating tags vs. manual operator steps. The "always select main in the workflow dropdown" instruction will prevent a common source of confusion.

`release-status` script

Well-written overall — good use of ThreadPoolExecutor for parallelizing network calls, type hints, and structured output. Two issues noted via inline comments:

- promote_run_for substring matching (moderate): `in`-based matching on display titles causes false positives: "decaf" matches "decaf.canary", and "20260408" matches "20260408.1". Using the workflow run's inputs field would give exact matching.
- ghcr_token() thread-safety (minor): the global token is read/written from multiple threads without synchronization. Harmless in practice (worst case: duplicate fetches), but easy to fix.

Minor note: fetch_promote_runs and fetch_release_classification both cap at per_page=100. This is fine for now, but could silently truncate results as the project accumulates workflow runs over time.
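If truncation ever becomes a concern, one way to guard against it is to keep requesting pages until a short page comes back. The sketch below is a generic illustration under that assumption; `fetch_all`, `fetch_page`, and `fake_page` are hypothetical names, and `fetch_page` stands in for whatever call returns one page of results.

```python
from typing import Callable

Page = list[dict]


def fetch_all(fetch_page: Callable[[int], Page], per_page: int = 100, max_pages: int = 10) -> Page:
    """Collect pages until one comes back short, or max_pages is reached."""
    items: Page = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        items.extend(batch)
        # A page shorter than per_page means there is nothing left to fetch.
        if len(batch) < per_page:
            break
    return items


def fake_page(page: int) -> Page:
    """Hypothetical backend holding 250 records, served 100 per page."""
    data = [{"id": i} for i in range(250)]
    return data[(page - 1) * 100 : page * 100]


print(len(fetch_all(fake_page)))
```

The `max_pages` cap bounds the extra latency, which matters for a tool whose run time has been tuned down to about a second.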

Justfile

Simple passthrough — looks good.

Overall

The PR is in good shape. The only item I'd suggest addressing before merge is the promote_run_for substring matching, since it can produce visibly wrong output (showing the wrong promote date or source tag).

claude · 2026-04-29T11:39:01Z

+    matches = [
+        r for r in runs
+        if r.get("conclusion") == "success"
+        and floating_tag in (r.get("display_title") or r.get("name") or "")
+        and release_tag in (r.get("display_title") or r.get("name") or "")
+    ]
+    return max(matches, key=lambda r: r["created_at"]) if matches else None


Bug: substring matching causes false positives

Both checks use `in` (substring containment), which leads to incorrect matches in at least two ways:

Floating tag collision — searching for decaf matches a run titled "… decaf.canary …", and mainnet matches mainnet.canary.

Release tag collision — searching for 20260408 matches "… 20260408.1 …".

So a promote of 20260408.1 to decaf.canary would incorrectly appear as a match when looking for 20260408 promoted to decaf.

Consider using word-boundary matching or parsing the workflow run inputs instead. The inputs are available on workflow_dispatch runs:

# e.g. r.get("inputs", {}).get("floating-tag") == floating_tag
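A sketch of the word-boundary option, which works on the display title alone and so does not depend on an inputs field being present in the API response. The helper `mentions` and the title format "Promote X to Y" are assumptions for illustration. Note that a plain `\b` boundary is not enough here, because "." and "-" are non-word characters and `\bdecaf\b` would still match inside "decaf.canary".

```python
import re
from typing import Optional


def mentions(title: str, tag: str) -> bool:
    """True if `tag` appears in `title` as a whole token."""
    # Treat word characters, "." and "-" as part of a tag, so "decaf" does not
    # match inside "decaf.canary" and "20260408" does not match "20260408.1".
    pattern = rf"(?<![\w.-]){re.escape(tag)}(?![\w.-])"
    return re.search(pattern, title) is not None


def promote_run_for(runs: list[dict], floating_tag: str, release_tag: str) -> Optional[dict]:
    """Latest successful run whose title names both tags exactly."""
    matches = [
        r for r in runs
        if r.get("conclusion") == "success"
        and mentions(r.get("display_title") or r.get("name") or "", floating_tag)
        and mentions(r.get("display_title") or r.get("name") or "", release_tag)
    ]
    return max(matches, key=lambda r: r["created_at"]) if matches else None


# Hypothetical run records and title format.
runs = [
    {"conclusion": "success", "display_title": "Promote 20260408 to decaf",
     "created_at": "2026-04-08T10:00:00Z"},
    {"conclusion": "success", "display_title": "Promote 20260408.1 to decaf.canary",
     "created_at": "2026-04-09T10:00:00Z"},
]
print(promote_run_for(runs, "decaf", "20260408")["display_title"])
```

One limitation of this approach: a title that ends a sentence with a tag followed by a period would not match, since the trailing "." is treated as part of a longer tag.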

claude · 2026-04-29T11:39:05Z

+_ghcr_token: str | None = None
+
+
+def ghcr_token() -> str:
+    global _ghcr_token
+    if _ghcr_token is None:
+        url = f"https://ghcr.io/token?scope=repository:{GHCR_REPO}:pull"
+        with urllib.request.urlopen(url, timeout=10) as resp:
+            _ghcr_token = json.loads(resp.read())["token"]
+    return _ghcr_token


Nit: ghcr_token() is called from multiple threads (via tag_digest submitted to the ThreadPoolExecutor), but the global _ghcr_token check-then-set isn't synchronized. In practice this just means a few redundant token fetches on the first batch, not a correctness bug. A threading.Lock would make it explicit, or you could fetch the token once before submitting the digest futures.
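A minimal sketch of the lock-based variant, assuming the fetch is passed in as a callable so the example runs without network access. `cached_token` and `fake_fetch` are hypothetical names; in the script the fetch would be the token request shown in the diff above.

```python
import threading
from typing import Callable, Optional

_token: Optional[str] = None
_token_lock = threading.Lock()


def cached_token(fetch: Callable[[], str]) -> str:
    """Return the cached token, calling `fetch` at most once across threads."""
    global _token
    # Holding the lock across check-and-set closes the window in which
    # several threads could all see None and fetch at the same time.
    with _token_lock:
        if _token is None:
            _token = fetch()
        return _token


calls: list[int] = []


def fake_fetch() -> str:
    """Hypothetical stand-in for the HTTP token request; records each call."""
    calls.append(1)
    return "token-abc"


threads = [threading.Thread(target=cached_token, args=(fake_fetch,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))
```

The trade-off is that threads arriving during the first fetch wait for it to finish instead of issuing their own request, which is the desired behavior for a token that is reused by every later call.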

sveitser added 2 commits April 17, 2026 11:51

feat(ci): dispatch build.yml from create-release

2b0fa0c

Tag push events triggered by GITHUB_TOKEN don't trigger other workflows, but workflow_dispatch does. Dispatch build.yml explicitly after creating the release so Docker images get built without needing a PAT or GitHub App.

fix(ci): grant actions:write to dispatch build.yml

38021aa

Workflow dispatch requires actions:write permission; contents:write alone is not enough.

gemini-code-assist Bot reviewed Apr 17, 2026


sveitser added 2 commits April 17, 2026 13:17

docs: add release walkthrough with screenshots

6f4f3ce

Adds a step-by-step walkthrough section referencing screenshots in doc/assets/, including a note to always select main in the "Use workflow from" dropdown.

sveitser changed the title ~~feat(ci): dispatch build.yml from create-release~~ feat(ci): release process fixes from testing Apr 17, 2026

sveitser added 9 commits April 17, 2026 15:07

chore(scripts): default --days to 45

ec81411

sveitser marked this pull request as ready for review April 27, 2026 14:57

sveitser requested review from bfish713, imabdulbasit, jbearer, lukaszrzasik and ss-es as code owners April 27, 2026 14:58

Merge branch 'main' into ma/version-scheme-2

00679af

claude Bot reviewed Apr 29, 2026


feat(ci): release process fixes from testing #4195
sveitser wants to merge 14 commits into main from ma/version-scheme-2
