Commit 0c6a593
fix(ci3): scope build-instance name by repo to stop cross-repo reaping (v5-next) (#23988)
## Problem
`ci3/bootstrap_ec2` terminates any existing instance that shares the
target `Name` tag before launching — this intentionally reaps orphans
left when a GA run is cancelled (e.g. by a new push) on the same ref.
But the name was `<ref>_<arch>[_<postfix>]` with **no repo component**,
so `aztec-packages` and `aztec-packages-private` — which build the same
tags/refs concurrently under the **same OIDC role** — computed identical
names and reaped each other's live instances.
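The reap-before-launch step described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `ci3/bootstrap_ec2`: the function name `reap_by_name` is hypothetical, and the real script may filter or terminate differently. It shows why the match is only as specific as the `Name` tag: any caller whose role can see the instance and who computes the same name will terminate it.

```shell
#!/bin/sh
# Hypothetical helper (not from the repo): terminate every pending/running
# instance whose Name tag equals the given name.
reap_by_name() {
  local name="$1" ids
  # Look up instance IDs by exact Name tag; --output text yields a
  # whitespace-separated list (empty when nothing matches).
  ids="$(aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=${name}" \
              "Name=instance-state-name,Values=pending,running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text)"
  if [ -n "$ids" ]; then
    # Word-splitting of $ids is intentional: one argument per instance ID.
    aws ec2 terminate-instances --instance-ids $ids
  fi
}
```

Nothing in this lookup distinguishes which repo launched the instance, so two repos that compute the same name under the same role will each see the other's instance as an orphan.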
### Observed incident
Nightly tag `v5.0.0-nightly.20260610` was built in **both** repos.
Instance `i-02e5d6a6c148ec726`
(`v5_0_0-nightly_20260610_arm64_a-release`) was launched by the private
repo's run at 03:06:01 UTC and **terminated at 03:13:12 UTC by the
public repo's run** for the same tag (its pre-launch reap step), ~7 min
in — failing the private build. CloudTrail confirms a
`TerminateInstances` from a different `ci3-<run_id>` session, not a spot
interruption.
## Fix
Prefix the instance name with the repo basename
(`${GITHUB_REPOSITORY##*/}`, defaulting to `aztec-packages` for local
runs):
- **Within a repo**, the key is unchanged in spirit
(`<repo>_<ref>_<arch>`) and stays stable across re-runs/new-pushes of
the same ref — so the intended orphan-on-cancel cleanup still works.
- **Across repos**, public → `aztec-packages_…` and private →
`aztec-packages-private_…`, so they no longer match and can't reap each
other.
`ci.sh`'s helper `instance_name` (used by the `shell`/`kill`/`get-ip`
dev commands) is kept in sync so it still resolves instances launched by
a CI run for the same repo.
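The naming rule above can be sketched as a small shell function. This is an illustration under stated assumptions, not the code from the diff: the function name `build_instance_name`, its argument order, and the sanitisation step (replacing characters other than letters, digits, and `-` with `_`, inferred from the `v5_0_0-nightly_20260610` example) are all hypothetical. Only the `${GITHUB_REPOSITORY##*/}` expansion and the `aztec-packages` default come from the description.

```shell
#!/bin/sh
# Hypothetical sketch of the repo-scoped name: <repo>_<ref>_<arch>[_<postfix>]
build_instance_name() {
  local ref="$1" arch="$2" postfix="${3:-}"
  # Default to aztec-packages for local runs, then keep only the basename,
  # e.g. "owner/aztec-packages-private" -> "aztec-packages-private".
  local repo="${GITHUB_REPOSITORY:-aztec-packages}"
  repo="${repo##*/}"
  # Assumed sanitisation: anything outside [a-zA-Z0-9-] becomes "_".
  local safe_ref
  safe_ref="$(printf '%s' "$ref" | tr -c 'a-zA-Z0-9-' '_')"
  local name="${repo}_${safe_ref}_${arch}"
  if [ -n "$postfix" ]; then
    name="${name}_${postfix}"
  fi
  printf '%s\n' "$name"
}
```

With `GITHUB_REPOSITORY=owner/aztec-packages-private` (the owner is a placeholder), calling `build_instance_name v5.0.0-nightly.20260610 arm64 a-release` prints `aztec-packages-private_v5_0_0-nightly_20260610_arm64_a-release`. The same call from the public repo yields an `aztec-packages_` prefix instead, so the two names no longer match.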
### Notes
- The EC2 `Name` tag limit is 256 chars; the longest prefixed name is
~61 chars. The reap match uses the full `Name` tag, so the cosmetic
63-char `docker_hostname` truncation doesn't affect correctness.
- One-time transition: instances launched by the old (un-prefixed) code
won't be reaped by name-match from new runs; they fall back to the
shutdown timer / 1.5h reaper. Self-heals within a couple hours.
- This stops the *collision*. Whether public **and** private *should*
both build the same nightly tag (duplicated work) is a separate question;
happy to follow up if you want one gated off.

2 files changed
Lines changed: 15 additions & 4 deletions