
fix(ci): use full image tag for prover-agent in deploy-network #23267

Closed
AztecBot wants to merge 1 commit into next from claudebox/investigate-next-net-deploy-fail

Conversation

@AztecBot
Collaborator

Summary

Next-net nightly deploys have been failing with ImagePullBackOff for aztecprotocol/aztec-prover-agent:5.0.0, a tag that doesn't exist. The actual published tag is aztec-prover-agent:5.0.0-nightly.YYYYMMDD-amd64.

Root cause: .github/workflows/deploy-network.yml constructs PROVER_AGENT_DOCKER_IMAGE from bare inputs.semver instead of the resolved aztec image tag, dropping the -nightly.YYYYMMDD-amd64 suffix. The aztec image itself is constructed correctly from inputs.aztec_docker_image. Regression landed in 717a90c4 (2026-04-16) when this workflow was refactored.
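To make the regression concrete, here is a minimal Bash sketch contrasting the two constructions. It assumes, for illustration only, that the aztec image reference looks like aztecprotocol/aztec:5.0.0-nightly.YYYYMMDD-amd64 (the repository name and the literal YYYYMMDD placeholder are assumptions, not values read from the workflow) and that inputs.semver held 5.0.0, as the failing tag suggests.

```bash
#!/usr/bin/env bash
# Illustrative only: the repository name and the literal YYYYMMDD
# placeholder below are assumptions, not values read from the workflow.
set -euo pipefail

semver="5.0.0"   # what inputs.semver held, judging by the failing tag
AZTEC_DOCKER_IMAGE="aztecprotocol/aztec:5.0.0-nightly.YYYYMMDD-amd64"

# Regressed construction: rebuild the tag from the bare semver,
# which silently drops the -nightly.YYYYMMDD-amd64 suffix.
old_image="aztecprotocol/aztec-prover-agent:${semver}"

# Proposed construction: reuse the tag the aztec image already resolved to.
new_image="aztecprotocol/aztec-prover-agent:${AZTEC_DOCKER_IMAGE##*:}"

echo "old: ${old_image}"   # a tag that was never published
echo "new: ${new_image}"   # matches the published nightly tag format
```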

The deploy step times out after ~12 minutes waiting on the failed terraform helm_release for the prover stack, whose prover-agent pods are stuck in ImagePullBackOff.

Proposed fix

ci_allow is false in this session, so I wrote the fix to .github-new/workflows/deploy-network.yml rather than touching .github/. The diff against .github/workflows/deploy-network.yml:

-          # Only use the separate prover-agent image for official semver builds;
-          # for custom images, let the deploy script fall back to AZTEC_DOCKER_IMAGE
-          if [[ -n "${{ inputs.semver }}" ]]; then
-            echo "PROVER_AGENT_DOCKER_IMAGE=aztecprotocol/aztec-prover-agent:${{ inputs.semver }}" >> $GITHUB_ENV
-          fi
+          # Construct prover-agent image using the same tag as the aztec image,
+          # since both are published together for official builds (incl. nightlies).
+          # Using bare `inputs.semver` dropped the `-nightly.YYYYMMDD-amd64` suffix
+          # and made the prover-agent ImagePullBackOff every nightly deploy.
+          IMAGE_TAG="${AZTEC_DOCKER_IMAGE##*:}"
+          echo "PROVER_AGENT_DOCKER_IMAGE=aztecprotocol/aztec-prover-agent:$IMAGE_TAG" >> $GITHUB_ENV

A maintainer needs to either start a new claudebox session with ci-allow to apply the same change to .github/, or move the file from .github-new/ to .github/ manually.
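The replacement relies on Bash parameter expansion: ${AZTEC_DOCKER_IMAGE##*:} removes the longest prefix matching *:, leaving everything after the last colon. For a reference like the nightly one, that is the tag. However, the last colon is not always a tag separator. One example is a registry written as host:port with no tag, which is the kind of custom image the removed guard used to leave alone. The helpers below (naive_tag and safe_tag) are hypothetical and not part of the PR. They only show where the one-liner differs from a slightly stricter check.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the PR.

# Same expansion the proposed diff uses: keep whatever follows the last ':'.
naive_tag() {
  printf '%s\n' "${1##*:}"
}

# Stricter variant: succeed only when the text after the last ':' looks like
# a tag. If there is no ':' at all, or the remainder still contains a '/',
# the colon belonged to a registry host:port and there is no explicit tag.
# Digest references (name@sha256:...) are out of scope for this sketch.
safe_tag() {
  local ref="$1"
  local candidate="${ref##*:}"
  if [[ "$candidate" == "$ref" || "$candidate" == */* ]]; then
    return 1
  fi
  printf '%s\n' "$candidate"
}

naive_tag "aztecprotocol/aztec:5.0.0-nightly.YYYYMMDD-amd64"
naive_tag "localhost:5000/aztec"   # prints 5000/aztec, which is not a tag
safe_tag "localhost:5000/aztec" || echo "no explicit tag"
```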

Analysis details

Full investigation (events, timeline, edge-case discussion): https://gist.github.com/AztecBot/d5503dc619fb15d3c69afb89fd03fee1

Test plan

ClaudeBox log: https://claudebox.work/s/05c193221e4b7d52?run=1

AztecBot added the ci-draft (Run CI on draft PRs.) and claudebox (Owned by claudebox. It can push to this PR.) labels on May 14, 2026
@alexghr
Contributor

alexghr commented May 14, 2026

Closing in favour of #23268, which first checks whether the image is available and otherwise falls back to the default aztec image.

alexghr closed this on May 14, 2026
alexghr deleted the claudebox/investigate-next-net-deploy-fail branch on May 14, 2026, 09:04
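The closing comment describes a different strategy: check whether the prover-agent image is available, and otherwise fall back to the default aztec image. Below is a minimal Bash sketch of that idea. It is not the code from #23268, and the function names are hypothetical. It uses docker manifest inspect, which exits non-zero when the registry has no manifest for the reference. It also exits non-zero on auth or network errors, and this sketch treats those the same as a missing image.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "use the prover-agent image if it exists, otherwise
# fall back to the aztec image". Not the implementation from #23268.

# Succeeds if the registry returns a manifest for the reference. Auth or
# network failures also make this fail, i.e. they trigger the fallback.
image_exists() {
  docker manifest inspect "$1" >/dev/null 2>&1
}

# Prints the image the prover agent should use.
resolve_prover_agent_image() {
  local aztec_image="$1"
  local candidate="aztecprotocol/aztec-prover-agent:${aztec_image##*:}"
  if image_exists "$candidate"; then
    printf '%s\n' "$candidate"
  else
    printf '%s\n' "$aztec_image"
  fi
}

# Example wiring inside a workflow step (assumes AZTEC_DOCKER_IMAGE and
# GITHUB_ENV are set, as in the proposed diff):
# echo "PROVER_AGENT_DOCKER_IMAGE=$(resolve_prover_agent_image "$AZTEC_DOCKER_IMAGE")" >> "$GITHUB_ENV"
```

Checking at deploy time costs an extra registry round trip. In exchange, the prover is never pointed at a tag that was not published, and custom images still fall back to the aztec image.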


2 participants