ci: PR/post-merge/RC pipelines + buildx worker pattern across all build workflows #951
Status: Open. saturley-hall wants to merge 16 commits into `main` from `harrison/improve-ci-from-release-automation`.
Commits (all by saturley-hall):

- 079212d ci(actions): add bootstrap-buildkit composite action
- 7e44cf0 ci(actions): add release-metadata composite action
- 51e1ce2 ci: add pr.yaml workflow for pull-request/* mirror branches
- 9572126 ci: add post-merge.yaml workflow for main and release/* branches
- 7c73593 ci(nightly): refactor to single-buildx multi-platform; drop registry …
- 8cf5c92 ci: add rc.yaml workflow (promote-only, manual dispatch)
- 25fe3ca chore: remove stale ATTRIBUTIONS-Python.md and ATTRIBUTIONS-container.md
- daa9ae4 docs(contributing): document the four release pipelines
- cdabbd8 ci(fix): stop exporting container_image/container_ref as job outputs
- 88f1498 Merge branch 'main' into harrison/improve-ci-from-release-automation
- 8ece9df feat(pyproject): canonical [project.urls] (Homepage / Repository / Is…
- 3b5683e ci: extract commit-SHA embed to tools/embed_commit_sha.py
- 40ba3b4 ci(pr,post-merge): split into build + stage-wheel; jf rt upload
- 455e482 ci(actions): bootstrap-buildkit K8s fallback uses purpose=build
- 63dde60 ci(actions): bootstrap-buildkit K8s fallback tolerates buildkit-worker
- b1fb143 ci: mark tools/embed_commit_sha.py executable
New file (hunk `@@ -0,0 +1,179 @@`): the `Bootstrap BuildKit` composite action.

```yaml
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
name: 'Bootstrap BuildKit'
description: 'Bootstrap a multi-arch buildx builder using the centralized aiperf-builder pods (remote driver) with Kubernetes driver fallback'

# Connects to the aiperf-builder StatefulSet pods at index 0 for each
# requested architecture via headless service DNS. Falls back to the
# Kubernetes driver if the pods are unreachable.
#
# DNS pattern: buildkit-{arch}-0.buildkit-{arch}-headless.buildkit.svc.cluster.local
#
# Usage:
#   - uses: ./.github/actions/bootstrap-buildkit
#     with:
#       builder_name: my-builder
#       arch: 'linux/amd64,linux/arm64'
#
# Cleanup usage (skip_bootstrap: true re-registers the builder so it can be removed):
#   - uses: ./.github/actions/bootstrap-buildkit
#     with:
#       builder_name: my-builder
#       skip_bootstrap: 'true'

inputs:
  builder_name:
    description: 'Name for the buildx builder'
    required: true
  arch:
    description: 'Comma-separated Docker platform(s): linux/amd64, linux/arm64, or linux/amd64,linux/arm64'
    required: false
    default: 'linux/amd64,linux/arm64'
  skip_bootstrap:
    description: 'Skip the bootstrap step — only re-registers the builder context. Use in cleanup jobs.'
    required: false
    default: 'false'
  fresh_builder:
    description: 'Force removal and recreation of the builder before bootstrapping. Use in pre-warm jobs.'
    required: false
    default: 'false'

  # Kubernetes fallback driver options
  namespace:
    description: 'Kubernetes namespace for fallback buildkit pods'
    required: false
    default: 'buildkit'
  replicas:
    description: 'Number of fallback buildkit replicas'
    required: false
    default: '1'
  requests_cpu:
    description: 'CPU requests for fallback buildkit pods'
    required: false
    default: '12'
  requests_memory:
    description: 'Memory requests for fallback buildkit pods'
    required: false
    default: '26Gi'
  limits_memory:
    description: 'Memory limits for fallback buildkit pods'
    required: false
    default: '29Gi'
  tolerations:
    description: 'Tolerations for fallback buildkit pods'
    required: false
    default: 'key=buildkit-worker,value=true,operator=Equal,effect=NoSchedule'

runs:
  using: "composite"
  steps:
    - name: Ensure nslookup is available
      shell: bash
      run: |
        if ! command -v nslookup >/dev/null 2>&1; then
          sudo apt-get update -qq && sudo apt-get install -y -qq dnsutils
        fi

    - name: Resolve worker addresses
      id: resolve
      shell: bash
      run: |
        NAMESPACE="buildkit"
        PORT=1234

        # Strip linux/ prefix to get bare arch names
        ARCH="${{ inputs.arch }}"
        ARCH="${ARCH//linux\//}"

        WORKER_ADDRS=""
        IFS=',' read -ra ARCHS <<< "$ARCH"
        for arch in "${ARCHS[@]}"; do
          DNS="buildkit-${arch}-0.buildkit-${arch}-headless.${NAMESPACE}.svc.cluster.local"
          if nslookup "$DNS" >/dev/null 2>&1; then
            WORKER_ADDRS="${WORKER_ADDRS:+${WORKER_ADDRS},}tcp://${DNS}:${PORT}"
            echo "Resolved ${arch} worker: ${DNS}"
          else
            echo "No DNS for ${arch} worker (${DNS})"
          fi
        done

        echo "worker_addresses=${WORKER_ADDRS}" >> "$GITHUB_OUTPUT"

    - name: Handle fresh builder
      if: inputs.fresh_builder == 'true'
      shell: bash
      run: |
        if docker buildx inspect "${{ inputs.builder_name }}" > /dev/null 2>&1; then
          echo "Forcing fresh builder: removing existing '${{ inputs.builder_name }}'."
          docker buildx rm "${{ inputs.builder_name }}" || true
        fi

    - name: Create builder (remote driver)
      if: steps.resolve.outputs.worker_addresses != ''
      shell: bash
      run: |
        IFS=',' read -ra ADDRS <<< "${{ steps.resolve.outputs.worker_addresses }}"
        FIRST=true
        for addr in "${ADDRS[@]}"; do
          if $FIRST; then
            docker buildx create --use --name "${{ inputs.builder_name }}" --driver remote "$addr"
            FIRST=false
          else
            docker buildx create --append --name "${{ inputs.builder_name }}" --driver remote "$addr"
          fi
        done

    - name: Create builder (Kubernetes fallback)
      if: steps.resolve.outputs.worker_addresses == ''
      shell: bash
      run: |
        echo "::warning::Remote aiperf-builder pods unreachable — falling back to Kubernetes driver"
        echo "## Fallback Build Warning" >> $GITHUB_STEP_SUMMARY
        echo "aiperf-builder pods unavailable. Running on fallback Kubernetes pod. Please alert the ops team." >> $GITHUB_STEP_SUMMARY

        # Strip linux/ prefix
        ARCH="${{ inputs.arch }}"
        ARCH="${ARCH//linux\//}"

        FIRST=true
        IFS=',' read -ra ARCHS <<< "$ARCH"
        for arch in "${ARCHS[@]}"; do
          # Comma-containing values (nodeselector, tolerations) must be wrapped in
          # inner double quotes so buildx's CSV parser treats them as a single field
          # rather than splitting on commas into separate driver options.
          OPTS=(
            '--driver-opt=namespace=${{ inputs.namespace }}'
            '--driver-opt=loadbalance=sticky'
            '--driver-opt=replicas=${{ inputs.replicas }}'
            '--driver-opt=requests.cpu=${{ inputs.requests_cpu }}'
            '--driver-opt=requests.memory=${{ inputs.requests_memory }}'
            '--driver-opt=limits.memory=${{ inputs.limits_memory }}'
            "--driver-opt=\"nodeselector=kubernetes.io/arch=${arch},purpose=build\""
            '--driver-opt="tolerations=${{ inputs.tolerations }}"'
          )
          if $FIRST; then
            docker buildx create --use --name "${{ inputs.builder_name }}" \
              --driver kubernetes --platform "linux/${arch}" "${OPTS[@]}"
            FIRST=false
          else
            docker buildx create --append --name "${{ inputs.builder_name }}" \
              --driver kubernetes --platform "linux/${arch}" "${OPTS[@]}"
          fi
        done

    - name: Bootstrap builder
      if: inputs.skip_bootstrap != 'true'
      shell: bash
      run: |
        for i in 1 2 3; do
          if docker buildx inspect "${{ inputs.builder_name }}" --bootstrap; then
            echo "Bootstrap succeeded on attempt $i"
            break
          fi
          if [ "$i" -eq 3 ]; then
            echo "::error::Bootstrap failed after 3 attempts"
            exit 1
          fi
          echo "::warning::Bootstrap attempt $i failed, retrying in 10s..."
          sleep 10
        done
```
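A minimal sketch of how the "Resolve worker addresses" step turns the `arch` input into the `worker_addresses` output, lifted into a standalone bash function so it can be exercised without a cluster. The function name `worker_addresses_for` and the `LOOKUP_CMD` override (a stand-in for `nslookup`) are hypothetical; the DNS pattern, the `buildkit` namespace, and port 1234 are taken from the action. Like the step it mirrors, it keeps whichever workers resolve and skips the rest.

```shell
#!/usr/bin/env bash
# Sketch of the "Resolve worker addresses" logic as a standalone function.
# Hypothetical: the function name and the LOOKUP_CMD hook used to fake DNS.

# DNS probe command; the action itself calls nslookup directly.
LOOKUP_CMD="${LOOKUP_CMD:-nslookup}"

worker_addresses_for() {
  local platforms="$1"
  local namespace="buildkit" port=1234
  local arch dns addrs=""
  local -a archs

  # Strip every "linux/" prefix: "linux/amd64,linux/arm64" -> "amd64,arm64".
  platforms="${platforms//linux\//}"
  IFS=',' read -ra archs <<< "$platforms"

  for arch in "${archs[@]}"; do
    dns="buildkit-${arch}-0.buildkit-${arch}-headless.${namespace}.svc.cluster.local"
    if "$LOOKUP_CMD" "$dns" >/dev/null 2>&1; then
      # ${addrs:+${addrs},} inserts a comma only when addrs is already non-empty.
      addrs="${addrs:+${addrs},}tcp://${dns}:${port}"
    fi
    # Unresolved arches are skipped, matching the action's current behavior.
  done
  printf '%s\n' "$addrs"
}
```

With every lookup succeeding, `worker_addresses_for 'linux/amd64,linux/arm64'` prints both `tcp://` addresses joined by a comma. If only the amd64 name resolves, it prints just the amd64 address; that output is still non-empty, so the `worker_addresses != ''` condition sends the build down the remote-driver path with a single worker.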
(The diff also deletes a file whose contents are not shown here.)
Review comment:

**Normalize `arch` tokens and require full resolution before taking the remote path.**

The input docs allow `linux/amd64, linux/arm64`, but the parser never trims per-token whitespace, so the second entry becomes ` arm64` (with a leading space) and generates invalid DNS and platform strings. Also, `worker_addresses != ''` accepts a partially resolved set, so a requested multi-arch build can proceed with only one worker instead of falling back. Trim each token before use and only emit `worker_addresses` when every requested arch resolves.

Suggested fix:

```diff
 FIRST=true
 IFS=',' read -ra ARCHS <<< "$ARCH"
 for arch in "${ARCHS[@]}"; do
+  arch="${arch//[[:space:]]/}"
   # Comma-containing values (nodeselector, tolerations) must be wrapped in
```

Also applies to: 84-100, 112-127, 134-140
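One way to apply both requested changes, sketched as a standalone bash function: each token is stripped of whitespace with the same `${arch//[[:space:]]/}` expansion as the suggested fix, and the function prints nothing unless every requested arch resolves, so the action's existing `worker_addresses == ''` condition would route a partially resolved multi-arch request to the Kubernetes fallback. The function name `resolve_all_or_none` and the `LOOKUP_CMD` hook are hypothetical; the fallback step's own loop would still need the same trimming.

```shell
#!/usr/bin/env bash
# Sketch of the review's suggested behavior (hypothetical function name):
#   1. trim whitespace from each arch token;
#   2. emit addresses only if every requested arch resolves.

# DNS probe command; the action uses nslookup. Overridable for testing (hypothetical hook).
LOOKUP_CMD="${LOOKUP_CMD:-nslookup}"

resolve_all_or_none() {
  local platforms="${1//linux\//}"   # drop every "linux/" prefix
  local namespace="buildkit" port=1234
  local arch dns addrs=""
  local -a archs

  IFS=',' read -ra archs <<< "$platforms"
  for arch in "${archs[@]}"; do
    # Remove all whitespace so "linux/amd64, linux/arm64" yields "arm64", not " arm64".
    arch="${arch//[[:space:]]/}"
    [ -n "$arch" ] || continue   # tolerate empty tokens from stray commas
    dns="buildkit-${arch}-0.buildkit-${arch}-headless.${namespace}.svc.cluster.local"
    if ! "$LOOKUP_CMD" "$dns" >/dev/null 2>&1; then
      echo "No DNS for ${arch} worker (${dns}); using Kubernetes fallback" >&2
      return 0   # print nothing: a partial set counts as unresolved
    fi
    addrs="${addrs:+${addrs},}tcp://${dns}:${port}"
  done
  printf '%s\n' "$addrs"
}
```

Returning an empty string rather than a non-zero status keeps the contract the action's `if:` conditions already rely on, so only the resolve step would need to change for the remote path.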