feat(ci3): run uploadable benchmarks on a dedicated on-demand instance #24028

Merged: AztecBot merged 1 commit into next from ci3-dedicated-bench on Jun 16, 2026
@charlielye (Contributor)

> [!IMPORTANT]
> Depends on the IAM change aztec-labs-eng/iac#6 (grants `ci3-build-instance-role` the launch/SSM/PassRole surface). **That must apply first**, else the build instance's `create-fleet` hits `UnauthorizedOperation`.

## Problem

Spot diversification (`create-fleet`) means build instances now land on variable EC2 types: m6a/m7a/m6i/r6a/r7a at 16/32/48xlarge, AMD vs Intel. The in-build benchmark phase runs on that box, so wall-time numbers vary by hardware family far more than the 105% regression alert threshold, which produces false regressions. (The instance type isn't even recorded in the bench JSON.)
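
To make the threshold arithmetic concrete, here is a minimal sketch. It assumes the alert fires when the current wall time exceeds 105% of the baseline; the exact comparison used by the alerting is not stated here. `is_regression` and the timings are hypothetical, chosen only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeeds when current_ms exceeds 105% of baseline_ms.
# Integer arithmetic avoids floating point: current*100 > baseline*105.
is_regression() {
  local baseline_ms=$1 current_ms=$2
  [ $(( current_ms * 100 )) -gt $(( baseline_ms * 105 )) ]
}

# Made-up timings: identical code, but the second run landed on a slower
# instance family and took 12% longer, so the alert would fire.
if is_regression 10000 11200; then
  echo "alert: over the 105% threshold"
fi
```

Under that assumption, any hardware-induced spread above 5% is indistinguishable from a code regression, which is why the series needs a fixed instance type.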

## Approach

Only the canonical **merge-queue→next** series (the one used for real regression tracking) runs benches on a **dedicated, fixed, on-demand m6a.16xlarge**. PR `ci-full` runs keep running benches inline on the contended build box purely as a **breakage check** — no dedicated box, no upload.

Benches are scheduled by the existing test engine: when the build completes in `build_and_test` (full builds only),
- **upload runs** (`SHOULD_UPLOAD_BENCHMARKS=1`): launch the dedicated box via `./ci.sh bench` as a backgrounded, colored, denoised job (logged like the test engine) and `wait` on it (non-fatal) before returning;
- **otherwise**: `bench_cmds >> $test_cmds_file` — benches become ordinary test commands.

`ci.sh bench` → `bootstrap_ec2` blocks until the remote `ci-bench` finishes (ending in `cache_upload bench-<treehash>`), so the `wait` is the whole rendezvous. Results reach the GA `Upload benchmarks` step unchanged via that cache key (`ci3_success.sh` `gh-bench`).
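
The branch described above could look roughly like the following. This is a sketch under stated assumptions, not the actual `bootstrap.sh` code: `schedule_benches` and `launch_bench_box` are hypothetical names, `launch_bench_box` stands in for `./ci.sh bench`, and `bench_cmds` is stubbed so the example runs on its own. In the PR the launch and the `wait` bracket the rest of `build_and_test`; here they are adjacent for brevity.

```bash
#!/usr/bin/env bash
# Stubs so the sketch is self-contained (hypothetical, for illustration only).
launch_bench_box() { echo "bench box finished"; }   # stands in for ./ci.sh bench
bench_cmds() { echo "bench-a"; echo "bench-b"; }    # stands in for the real command list

schedule_benches() {
  local test_cmds_file=$1
  if [ "${SHOULD_UPLOAD_BENCHMARKS:-0}" = "1" ]; then
    # Upload run: start the dedicated box in the background, keep its PID.
    launch_bench_box &
    local bench_pid=$!
    # Non-fatal rendezvous: a bench-box failure is logged, the run proceeds.
    if ! wait "$bench_pid"; then
      echo "WARNING: dedicated bench run failed; no fresh numbers" >&2
    fi
  else
    # Breakage-check run: benches become ordinary test commands.
    bench_cmds >> "$test_cmds_file"
  fi
}
```

Wrapping `wait` in `if !` is what keeps the failure soft: the exit status is inspected rather than propagated, so it would not abort a script running under `set -e`.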

## Changes

- **`bootstrap.sh`**: drop inline `bench` from `ci-full`/`ci-full-no-test-cache`; add the `build_and_test` launch/append hook + non-fatal `wait`; new `ci-bench` mode = cache-hit `make full` + `bench` (no test engine).
- **`ci.sh`**: new `bench` launcher — `AWS_INSTANCE=m6a.16xlarge NO_SPOT=1` (pins a fixed on-demand type; `CPUS` not needed since `AWS_INSTANCE` bypasses pool sizing).
- **`ci3/bench_engine`**: drop the 8-core OS isolation / HT-disable / pinning. Dedicated box → benches use the full machine, honouring per-bench `CPUS` via the strict scheduler (defaults to `nproc/2` without `BENCH_CPU_COUNT`). This is what lets the 64-vCPU 16xlarge satisfy the `CPUS=32` bb rollup bench.
- **`.github/ci3_labels_to_env.sh`**: scope `SHOULD_UPLOAD_BENCHMARKS` to merge-queue→next (it now also gates the dedicated box).
- **`ci3/bootstrap_ec2`**: pass `SHOULD_UPLOAD_BENCHMARKS` through to the instance.
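
The CPU-budget default mentioned for `ci3/bench_engine` can be illustrated as below. This is a sketch of the stated rule only (use `BENCH_CPU_COUNT` if set, else `nproc/2`); `bench_cpu_budget` and `bench_fits` are hypothetical helpers, and how the strict scheduler actually enforces the budget is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: CPU budget for benches.
# Uses BENCH_CPU_COUNT when set, otherwise half of the logical CPUs.
bench_cpu_budget() {
  local total=${1:-$(nproc)}   # logical CPU count; pass explicitly to illustrate
  echo "${BENCH_CPU_COUNT:-$(( total / 2 ))}"
}

# Hypothetical check: does a bench's CPUS request fit within the budget?
bench_fits() {
  local requested=$1 total=${2:-$(nproc)}
  [ "$requested" -le "$(bench_cpu_budget "$total")" ]
}

# On a 64-vCPU machine with BENCH_CPU_COUNT unset the budget is 32,
# so a CPUS=32 bench fits.
if bench_fits 32 64; then
  echo "CPUS=32 fits within a budget of $(bench_cpu_budget 64)"
fi
```

With the old 8-core isolation a `CPUS=32` request could not have been satisfied; with the budget derived from the full machine it fits exactly on the 16xlarge.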

## Notes
- **One-time baseline shift** in `bench/next`: different machine + no isolation changes absolute numbers once; stable thereafter. May want to annotate the series.
- **Soft failure**: a bench-box failure is logged and the run proceeds (no fresh numbers) rather than blocking the merge.
- **PR benches-as-tests**: `:PARALLEL=0` serial benches lose one-at-a-time isolation and run contended — fine for breakage-only; real numbers come from the dedicated box's `bench_engine` path.
- **Validated**: all touched scripts pass `bash -n`; the `AWS_INSTANCE`+`NO_SPOT` fixed-on-demand launch mechanism was verified live during the create-fleet work. Full e2e is exercised by a merge-queue→next run once the iac PR lands.

@AztecBot force-pushed the ci3-dedicated-bench branch from eadb025 to 19de9f1 on June 16, 2026 14:58
@AztecBot added this pull request to the merge queue on Jun 16, 2026
Merged via the queue into next with commit 61baf08 on Jun 16, 2026
19 checks passed
@AztecBot deleted the ci3-dedicated-bench branch on June 16, 2026 15:32
3 participants