fix(ci): use full image tag for prover-agent in deploy-network#23267
Closed
AztecBot wants to merge 1 commit into
Closing in favour of #23268, which first checks whether the image is available and otherwise falls back to the default aztec image.
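The check-then-fall-back approach mentioned in this comment could look roughly like the sketch below. This is not the code from #23268; the function names `image_exists` and `choose_prover_image` are hypothetical, and using `docker manifest inspect` as the availability probe is an assumption about how such a check might be done.

```shell
# Hypothetical probe: succeeds if the registry can resolve the image reference.
# Assumes a Docker CLI with the `manifest` subcommand and registry access.
image_exists() {
  docker manifest inspect "$1" >/dev/null 2>&1
}

# Print the candidate image if it is available, otherwise print the fallback.
choose_prover_image() {
  candidate="$1"
  fallback="$2"
  if image_exists "$candidate"; then
    printf '%s\n' "$candidate"
  else
    printf '%s\n' "$fallback"
  fi
}
```

Keeping the probe in its own function means it can be swapped for a stub in tests, or for a different registry query, without touching the selection logic.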
Summary
Next-net nightly deploys have been failing with ImagePullBackOff for aztecprotocol/aztec-prover-agent:5.0.0, a tag that doesn't exist. The actual published tag is aztec-prover-agent:5.0.0-nightly.YYYYMMDD-amd64.
Root cause: .github/workflows/deploy-network.yml constructs PROVER_AGENT_DOCKER_IMAGE from bare inputs.semver instead of the resolved aztec image tag, dropping the -nightly.YYYYMMDD-amd64 suffix. The aztec image itself is constructed correctly from inputs.aztec_docker_image. The regression landed in 717a90c4 (2026-04-16) when this workflow was refactored.
The deploy step times out after ~12 minutes on the failed terraform helm_release for the prover stack (prover-agent pods stuck in ImagePullBackOff).
Proposed fix
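As a rough illustration of the direction of the fix (not the actual diff), the prover-agent image can be derived from the tag of the already-resolved aztec image instead of from the bare semver. The helper name `prover_agent_image_from`, the `aztecprotocol/aztec` repository name, and the date in the usage line are assumptions made for the example; only the `aztecprotocol/aztec-prover-agent` repository and the suffix pattern come from the description above.

```shell
# Hypothetical helper: reuse the tag of the resolved aztec image for the
# prover-agent image so the -nightly.YYYYMMDD-amd64 suffix is preserved.
# Assumes a plain repo:tag reference (no registry port, no @digest).
prover_agent_image_from() {
  aztec_image="$1"
  case "$aztec_image" in
    *:*) ;;                                   # has a tag, continue
    *) echo "image reference has no tag: $aztec_image" >&2; return 1 ;;
  esac
  tag="${aztec_image##*:}"                    # strip everything up to the last colon
  printf 'aztecprotocol/aztec-prover-agent:%s\n' "$tag"
}
```

For example, `prover_agent_image_from aztecprotocol/aztec:5.0.0-nightly.20260416-amd64` prints `aztecprotocol/aztec-prover-agent:5.0.0-nightly.20260416-amd64`, whereas building the reference from the semver alone would yield the nonexistent `:5.0.0` tag.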
ci_allow is false on this session, so I wrote the fix to .github-new/workflows/deploy-network.yml rather than touching .github/. The diff against .github/workflows/deploy-network.yml:
A maintainer needs to either start a new claudebox session with ci-allow to apply the same change to .github/, or move the file from .github-new/ to .github/ manually.
Analysis details
Full investigation (events, timeline, edge-case discussion): https://gist.github.com/AztecBot/d5503dc619fb15d3c69afb89fd03fee1
Test plan
.github/ and confirm the prover-agent pods pull the nightly tag and become ready.
NoCommitteeError after ~77 min (this is expected during warm-up, not a regression).
ClaudeBox log: https://claudebox.work/s/05c193221e4b7d52?run=1