
Commit 0c6a593

fix(ci3): scope build-instance name by repo to stop cross-repo reaping (v5-next) (#23988)
## Problem

`ci3/bootstrap_ec2` terminates any existing instance that shares the target `Name` tag before launching. This intentionally reaps orphans left when a GA run is cancelled (e.g. by a new push) on the same ref. But the name was `<ref>_<arch>[_<postfix>]` with **no repo component**, so `aztec-packages` and `aztec-packages-private`, which build the same tags/refs concurrently under the **same OIDC role**, computed identical names and reaped each other's live instances.

### Observed incident

Nightly tag `v5.0.0-nightly.20260610` was built in **both** repos. Instance `i-02e5d6a6c148ec726` (`v5_0_0-nightly_20260610_arm64_a-release`) was launched by the private repo's run at 03:06:01 UTC and **terminated at 03:13:12 UTC by the public repo's run** for the same tag (its pre-launch reap step), ~7 min in, failing the private build. CloudTrail confirms a `TerminateInstances` from a different `ci3-<run_id>` session, not a spot interruption.

## Fix

Prefix the instance name with the repo basename (`${GITHUB_REPOSITORY##*/}`, defaulting to `aztec-packages` for local runs):

- **Within a repo**, the key is unchanged in spirit (`<repo>_<ref>_<arch>`) and stays stable across re-runs/new-pushes of the same ref, so the intended orphan-on-cancel cleanup still works.
- **Across repos**, public → `aztec-packages_…` and private → `aztec-packages-private_…`, so they no longer match and can't reap each other.

`ci.sh`'s helper `instance_name` (used by the `shell`/`kill`/`get-ip` dev commands) is kept in sync so it still resolves instances launched by a CI run for the same repo.

### Notes

- The EC2 `Name` tag limit is 256 chars; the longest prefixed name is ~61 chars. The reap match uses the full `Name` tag, so the cosmetic 63-char `docker_hostname` truncation doesn't affect correctness.
- One-time transition: instances launched by the old (un-prefixed) code won't be reaped by name-match from new runs; they fall back to the shutdown timer / 1.5h reaper. Self-heals within a couple hours.
- This stops the *collision*. Whether public **and** private *should* both build the same nightly tag (duplicated work) is a separate question; happy to follow up if you want one gated off.
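The collision and the fix can be reproduced without touching AWS. The following is a minimal sketch, not the repo's code: `old_instance_name` and `new_instance_name` are hypothetical helpers that mirror the naming scheme described in this commit, and the `example-org/` owner prefix is an assumed stand-in for whatever `GITHUB_REPOSITORY` holds in CI.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the pre-fix key, <ref>_<arch>[_<postfix>].
old_instance_name() {
  local ref=$1 arch=$2 postfix=${3:-}
  local name
  # Same sanitisation as the diff: cap at 50 chars, then map every
  # character outside [a-zA-Z0-9-] to "_".
  name="$(echo -n "$ref" | head -c 50 | tr -c 'a-zA-Z0-9-' '_')_${arch}"
  if [ -n "$postfix" ]; then name+="_$postfix"; fi
  echo "$name"
}

# Hypothetical helper: the post-fix key, prefixed with the repo basename.
# $1 stands in for GITHUB_REPOSITORY ("owner/repo", or empty for a local run).
new_instance_name() {
  local repo=${1##*/}            # strip everything up to the last "/"
  repo=${repo:-aztec-packages}   # default used for local runs
  echo "${repo}_$(old_instance_name "$2" "$3" "${4:-}")"
}

tag=v5.0.0-nightly.20260610

# Before the fix both repos computed this same key, so each run's
# pre-launch reap matched the other repo's live instance.
old_instance_name "$tag" arm64 a-release

# After the fix the keys differ per repo.
new_instance_name example-org/aztec-packages "$tag" arm64 a-release
new_instance_name example-org/aztec-packages-private "$tag" arm64 a-release
```

The first line printed is the `Name` tag from the incident; the other two differ only in their repo prefix, which is what keeps a name-based reap from crossing repos.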
2 parents 6f5a421 + ee34ae0 commit 0c6a593

2 files changed

Lines changed: 15 additions & 4 deletions


ci.sh

Lines changed: 5 additions & 1 deletion
@@ -53,7 +53,11 @@ function print_usage {
 
 [ -n "$cmd" ] && shift
 
-instance_name=${INSTANCE_NAME:-$(echo -n "$BRANCH" | tr -c 'a-zA-Z0-9-' '_')_${arch}}
+# Keep this in sync with bootstrap_ec2's instance_name scheme (repo-scoped) so the
+# shell/kill/get-ip helpers find instances launched by a CI run for this repo.
+repo=${GITHUB_REPOSITORY##*/}
+repo=${repo:-aztec-packages}
+instance_name=${INSTANCE_NAME:-${repo}_$(echo -n "$BRANCH" | tr -c 'a-zA-Z0-9-' '_')_${arch}}
 [ -n "${INSTANCE_POSTFIX:-}" ] && instance_name+="_$INSTANCE_POSTFIX"
 
 function get_ip_for_instance {
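One thing worth checking for local runs: the commit does not show whether `ci.sh` runs with `set -u`, but the `${INSTANCE_POSTFIX:-}` guard suggests it might. If it does, and `GITHUB_REPOSITORY` is unset, a bare `${GITHUB_REPOSITORY##*/}` aborts before the default on the next line can apply. The sketch below demonstrates that behaviour and one defensive variant; `repo_basename` is a hypothetical helper, not something in the repo.

```shell
#!/usr/bin/env bash

# Hypothetical helper: repo basename that tolerates an unset variable.
repo_basename() {
  local repo=${GITHUB_REPOSITORY:-}   # the ":-" form is exempt from nounset errors
  repo=${repo##*/}                    # strip "owner/" if present
  echo "${repo:-aztec-packages}"      # same default as the commit
}

# Probe in a subshell so the nounset failure does not kill this script.
if (set -u; unset GITHUB_REPOSITORY; : "${GITHUB_REPOSITORY##*/}") 2>/dev/null; then
  echo "plain expansion succeeded"
else
  echo "plain expansion aborted under set -u"
fi

# The defensive variant works with the variable unset or set.
(set -u; unset GITHUB_REPOSITORY; repo_basename)
(set -u; GITHUB_REPOSITORY=example-org/aztec-packages-private; repo_basename)
```

If the script does not enable `set -u`, or `GITHUB_REPOSITORY` is always exported before this point, the two-line form in the diff behaves the same as the helper.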

ci3/bootstrap_ec2

Lines changed: 10 additions & 3 deletions
@@ -65,11 +65,18 @@ if [[ "$(git fetch origin --negotiate-only --negotiation-tip="$current_commit")"
 fi
 
 # Our instance_name acts as a uniqueness key for the instance.
-# Instances are terminated if they exist with the same name.
+# Instances are terminated if they exist with the same name; this reaps orphans
+# left when a GA run is cancelled (e.g. by a new push) on the same ref.
+# Scope the key to the repo: aztec-packages and aztec-packages-private can build
+# the same tag/ref concurrently under the same role, and must not reap each
+# other's instances. The key stays stable across re-runs within a repo, so the
+# orphan cleanup still works.
+repo=${GITHUB_REPOSITORY##*/}
+repo=${repo:-aztec-packages}
 if [[ "$REF_NAME" =~ ^gh-readonly-queue/.*(pr-[0-9]+) ]]; then
-  instance_name="${BASH_REMATCH[1]}_$arch"
+  instance_name="${repo}_${BASH_REMATCH[1]}_$arch"
 else
-  instance_name=$(echo -n "$REF_NAME" | head -c 50 | tr -c 'a-zA-Z0-9-' '_')_$arch
+  instance_name="${repo}_$(echo -n "$REF_NAME" | head -c 50 | tr -c 'a-zA-Z0-9-' '_')_$arch"
 fi
 
 state_dir=$(mktemp -d /tmp/aws_request_instance.XXXXXX)
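The two branches in this hunk can be exercised in isolation. This is a rough sketch under assumptions: `name_for_ref` is a hypothetical wrapper around the same conditional, and the ref names passed to it are made-up examples rather than refs from the repo.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper around the two branches in the hunk above.
# Args: <repo basename> <REF_NAME> <arch>
name_for_ref() {
  local repo=$1 ref_name=$2 arch=$3
  if [[ "$ref_name" =~ ^gh-readonly-queue/.*(pr-[0-9]+) ]]; then
    # Merge-queue refs: key on the PR number only, so every queue entry
    # for the same PR maps to the same instance name.
    echo "${repo}_${BASH_REMATCH[1]}_$arch"
  else
    # Everything else: cap the ref at 50 chars, then replace characters
    # outside [a-zA-Z0-9-] with "_".
    echo "${repo}_$(echo -n "$ref_name" | head -c 50 | tr -c 'a-zA-Z0-9-' '_')_$arch"
  fi
}

# Same merge-queue ref in both repos now yields different keys.
name_for_ref aztec-packages gh-readonly-queue/next/pr-1234-0123abcd amd64
name_for_ref aztec-packages-private gh-readonly-queue/next/pr-1234-0123abcd amd64

# An ordinary branch ref takes the sanitising branch.
name_for_ref aztec-packages feature/some.branch arm64
```

Within one repo the key depends only on the repo, the ref (or PR number) and the arch, so a re-run or a new push on the same ref still resolves to the name it needs to reap.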
