
Commit 98a9030

Move benchmarks to daily cron (#1302)
* refactor: consolidate Benchmarks.yml into dep_benchmarks.yml

Delete Benchmarks.yml and add its features (artifact upload, baseline_tag, baseline_run_id, retention_days inputs) to dep_benchmarks.yml. Update CreateRelease.yml to call dep_benchmarks.yml with a matrix directly.

Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>

* feat: move benchmarks from per-PR to daily cron

Remove the benchmarks job from ValidatePullRequest.yml and add a new DailyBenchmarks.yml workflow that runs benchmarks daily, comparing against the previous day's run artifacts with 90-day retention.

Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>

* docs: update benchmarking docs to reflect daily cron workflow

Replace references to per-PR benchmarks and Benchmarks.yml with the new DailyBenchmarks.yml and dep_benchmarks.yml workflows.

Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>

* Add permission and fix docs

Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>

* Update issue title

Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>

---------

Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
1 parent e8414f4 commit 98a9030


6 files changed (+143 -99 lines changed)


.github/workflows/Benchmarks.yml

Lines changed: 0 additions & 69 deletions
This file was deleted.

.github/workflows/CreateRelease.yml

Lines changed: 9 additions & 1 deletion
@@ -72,8 +72,16 @@ jobs:
 
   benchmarks:
     needs: [build-guests]
-    uses: ./.github/workflows/Benchmarks.yml
+    strategy:
+      fail-fast: true
+      matrix:
+        hypervisor: [hyperv, 'hyperv-ws2025', mshv3, kvm]
+        cpu: [amd, intel]
+    uses: ./.github/workflows/dep_benchmarks.yml
     secrets: inherit
+    with:
+      hypervisor: ${{ matrix.hypervisor }}
+      cpu: ${{ matrix.cpu }}
     permissions:
       contents: read
 
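To make the new `strategy.matrix` concrete, here is a small Python sketch (not part of the commit; `expand_matrix` is a hypothetical name) that expands the 4 hypervisor values and 2 CPU values into the 8 input combinations, each of which corresponds to one call of `dep_benchmarks.yml`.

```python
from itertools import product

# Matrix values copied from the workflow YAML above.
HYPERVISORS = ["hyperv", "hyperv-ws2025", "mshv3", "kvm"]
CPUS = ["amd", "intel"]


def expand_matrix(hypervisors, cpus):
    """Return one `with:` input mapping per hypervisor/cpu combination.

    Simplified model: each mapping stands for one call of the reusable
    workflow dep_benchmarks.yml. The ordering here is illustrative only.
    """
    return [{"hypervisor": h, "cpu": c} for h, c in product(hypervisors, cpus)]


if __name__ == "__main__":
    for inputs in expand_matrix(HYPERVISORS, CPUS):
        print(inputs)
```

With `fail-fast: true`, GitHub cancels the remaining in-progress and queued matrix jobs as soon as any one combination fails.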

.github/workflows/DailyBenchmarks.yml

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+# yaml-language-server: $schema=https://json.schemastore.org/github-workflow.json
+
+name: Daily Benchmarks
+
+on:
+  schedule:
+    - cron: '0 0 * * *' # Runs at 00:00 UTC every day
+  workflow_dispatch: # Allow manual triggering
+
+permissions:
+  contents: read
+  actions: read
+
+jobs:
+  # Find the most recent successful run of this workflow so we can download
+  # its benchmark artifacts as a baseline for day-over-day comparison.
+  find-baseline:
+    runs-on: ubuntu-latest
+    outputs:
+      run-id: ${{ steps.find-run.outputs.run_id }}
+    steps:
+      - name: Find latest successful run
+        id: find-run
+        # gh run list returns runs sorted by creation date descending (implicit).
+        # On the first-ever run, this outputs empty and dep_benchmarks.yml
+        # will skip the baseline download (continue-on-error).
+        run: |
+          run_id=$(gh run list --repo "${{ github.repository }}" --workflow DailyBenchmarks.yml --status success --limit 1 --json databaseId --jq '.[0].databaseId // empty')
+          echo "run_id=$run_id" >> "$GITHUB_OUTPUT"
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
+  # Build release guest binaries needed by the benchmark suite.
+  build-guests:
+    uses: ./.github/workflows/dep_build_guests.yml
+    secrets: inherit
+    with:
+      config: release
+
+  # Run benchmarks across all hypervisor/cpu combos, comparing against
+  # the previous day's results. Artifacts are retained for 90 days.
+  benchmarks:
+    needs: [build-guests, find-baseline]
+    strategy:
+      fail-fast: true
+      matrix:
+        hypervisor: [hyperv, 'hyperv-ws2025', mshv3, kvm]
+        cpu: [amd, intel]
+    uses: ./.github/workflows/dep_benchmarks.yml
+    secrets: inherit
+    with:
+      hypervisor: ${{ matrix.hypervisor }}
+      cpu: ${{ matrix.cpu }}
+      baseline_run_id: ${{ needs.find-baseline.outputs.run-id }}
+      retention_days: 90
+
+  # File a GitHub issue if any job fails.
+  notify-failure:
+    runs-on: ubuntu-latest
+    needs: [build-guests, benchmarks]
+    if: always() && (needs.build-guests.result == 'failure' || needs.benchmarks.result == 'failure')
+    permissions:
+      issues: write
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v6
+
+      - name: Notify Benchmark Failure
+        run: ./dev/notify-ci-failure.sh --title="Benchmark Failure - ${{ github.run_number }}" --labels="area/benchmarks,area/testing,lifecycle/needs-review,release-blocker"
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
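The `find-baseline` job and the `notify-failure` condition above each reduce to a small decision. The Python sketch below models both under stated assumptions: `runs` stands in for the JSON that `gh run list --json databaseId` prints (assumed newest-first, as the workflow comment states), and `needs_results` stands in for the `needs.<job>.result` values. Both function names are hypothetical.

```python
def latest_successful_run_id(runs):
    """Model of `--jq '.[0].databaseId // empty'` applied to `gh run list` JSON.

    `runs` is assumed to be already filtered to successful runs and sorted
    newest-first. An empty list yields "", which the workflow writes to
    $GITHUB_OUTPUT as `run_id=` (an empty output).
    """
    if not runs:
        return ""
    run_id = runs[0].get("databaseId")
    # jq's `//` falls back to its right side when the left side is null or false.
    return "" if run_id is None or run_id is False else str(run_id)


def should_notify(needs_results):
    """Model of the notify-failure `if:` expression.

    always() makes GitHub evaluate the job even when its needs did not
    succeed; only an explicit 'failure' result of build-guests or
    benchmarks triggers the issue.
    """
    return (
        needs_results.get("build-guests") == "failure"
        or needs_results.get("benchmarks") == "failure"
    )


if __name__ == "__main__":
    print(latest_successful_run_id([]))
    print(should_notify({"build-guests": "success", "benchmarks": "failure"}))
```

Two consequences follow from this model. The baseline is the most recent *successful* run of the workflow, so it can be older than one day if intervening daily runs failed, or newer if a manual `workflow_dispatch` run succeeded in between. And if `find-baseline` itself fails, `benchmarks` is skipped rather than failed, so `should_notify` returns `False` for that case.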

.github/workflows/ValidatePullRequest.yml

Lines changed: 0 additions & 22 deletions
@@ -125,27 +125,6 @@ jobs:
       cpu: ${{ matrix.cpu }}
       config: ${{ matrix.config }}
 
-  # Run benchmarks - release only, needs guest artifacts, runs in parallel with build-test
-  benchmarks:
-    needs:
-      - docs-pr
-      - build-guests
-    # Required because update-guest-locks is skipped on non-dependabot PRs,
-    # and a skipped dependency transitively skips all downstream jobs.
-    # See: https://github.com/actions/runner/issues/2205
-    if: ${{ !cancelled() && !failure() }}
-    strategy:
-      fail-fast: true
-      matrix:
-        hypervisor: [hyperv, 'hyperv-ws2025', mshv3, kvm]
-        cpu: [amd, intel]
-    uses: ./.github/workflows/dep_benchmarks.yml
-    secrets: inherit
-    with:
-      docs_only: ${{ needs.docs-pr.outputs.docs-only }}
-      hypervisor: ${{ matrix.hypervisor }}
-      cpu: ${{ matrix.cpu }}
-
   fuzzing:
     needs:
       - docs-pr
@@ -187,7 +166,6 @@ jobs:
       - code-checks
       - build-test
       - run-examples
-      - benchmarks
       - fuzzing
       - spelling
       - license-headers
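The comment on the removed `if:` line refers to how GitHub Actions gates a job on its upstream jobs. The following Python sketch is a simplified model (an assumption, not the runner's actual implementation) contrasting the default `success()` condition with `!cancelled() && !failure()` when one upstream job, such as `update-guest-locks` on a non-Dependabot PR, was skipped. `ancestor_results` holds the results of all upstream jobs, direct and transitive; the function names are hypothetical.

```python
def runs_by_default(ancestor_results, cancelled=False):
    """Default job condition (`if: success()`), simplified.

    A job runs only if the workflow is not cancelled and every ancestor job
    succeeded, so a skipped ancestor skips this job too.
    """
    return not cancelled and all(r == "success" for r in ancestor_results)


def runs_unless_failed(ancestor_results, cancelled=False):
    """`if: ${{ !cancelled() && !failure() }}`, simplified.

    Skipped ancestors are tolerated; a failed ancestor or a cancelled
    workflow still prevents the job from running.
    """
    return not cancelled and not any(r == "failure" for r in ancestor_results)


if __name__ == "__main__":
    # Hypothetical non-Dependabot PR: docs-pr and build-guests succeeded,
    # update-guest-locks was skipped.
    ancestors = ["success", "success", "skipped"]
    print(runs_by_default(ancestors))     # False
    print(runs_unless_failed(ancestors))  # True
```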

.github/workflows/dep_benchmarks.yml

Lines changed: 58 additions & 2 deletions
@@ -1,5 +1,28 @@
 # yaml-language-server: $schema=https://json.schemastore.org/github-workflow.json
 
+# Reusable workflow to run benchmarks on a single hypervisor/cpu combination.
+#
+# Baseline comparison:
+#   The workflow supports two mutually exclusive ways to load a baseline for
+#   Criterion to compare against:
+#
+#   1. baseline_run_id — Downloads benchmark artifacts from a previous workflow
+#      run (by run ID). Used by DailyBenchmarks.yml for day-over-day comparison.
+#
+#   2. baseline_tag — Downloads benchmark tarballs from a GitHub Release (by tag).
+#      If empty (the default), `gh release download` fetches from the latest
+#      stable release. Used by CreateRelease.yml.
+#
+#   If baseline_run_id is set, baseline_tag is ignored.
+#   If neither is set, the latest stable release is used.
+#   Both downloads use continue-on-error so the first-ever run (no baseline
+#   available) succeeds without comparison.
+#
+# Artifact upload:
+#   Benchmark results are always uploaded as workflow artifacts named
+#   benchmarks_<OS>_<hypervisor>_<cpu>. The retention_days input controls
+#   how long they are kept (default: 5 days).
+
 name: Run Benchmarks
 
 on:
@@ -18,6 +41,21 @@ on:
         description: CPU architecture for the build (passed from caller matrix)
         required: true
         type: string
+      baseline_tag:
+        description: Release tag to download baseline benchmarks from (e.g. dev-latest). Ignored if baseline_run_id is set. If empty, downloads from the latest stable release.
+        required: false
+        type: string
+        default: ""
+      baseline_run_id:
+        description: Workflow run ID to download baseline benchmark artifacts from. Takes precedence over baseline_tag.
+        required: false
+        type: string
+        default: ""
+      retention_days:
+        description: Number of days to retain benchmark artifacts
+        required: false
+        type: number
+        default: 5
 
 env:
   CARGO_TERM_COLOR: always
@@ -74,11 +112,29 @@ jobs:
       - name: Build
         run: just build release
 
-      - name: Download benchmarks from "latest"
-        run: just bench-download ${{ runner.os }} ${{ inputs.hypervisor }} ${{ inputs.cpu }} dev-latest # compare to prerelease
+      - name: Download baseline from previous run
+        if: ${{ inputs.baseline_run_id != '' }}
+        uses: actions/download-artifact@v8
+        with:
+          name: benchmarks_${{ runner.os }}_${{ inputs.hypervisor }}_${{ inputs.cpu }}
+          path: ./target/criterion/
+          run-id: ${{ inputs.baseline_run_id }}
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+        continue-on-error: true
+
+      - name: Download baseline from release
+        if: ${{ inputs.baseline_run_id == '' }}
+        run: just bench-download ${{ runner.os }} ${{ inputs.hypervisor }} ${{ inputs.cpu }} ${{ inputs.baseline_tag }}
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         continue-on-error: true
 
       - name: Run benchmarks
         run: just bench-ci main
+
+      - uses: actions/upload-artifact@v7
+        with:
+          name: benchmarks_${{ runner.os }}_${{ inputs.hypervisor }}_${{ inputs.cpu }}
+          path: ./target/criterion/
+          if-no-files-found: error
+          retention-days: ${{ inputs.retention_days }}
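The header comment in `dep_benchmarks.yml` describes a precedence order for choosing the baseline. A minimal Python sketch of that order follows, together with the artifact name shared by the upload step and the run-ID download step. `select_baseline`, `artifact_name`, and `BaselineSource` are hypothetical names, and the sketch only models the decision; it performs no downloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineSource:
    kind: str  # "run", "release-tag", or "latest-release"
    ref: str   # run ID or tag; "" for the latest stable release


def select_baseline(baseline_run_id="", baseline_tag=""):
    """Mirror the precedence documented at the top of dep_benchmarks.yml."""
    if baseline_run_id != "":
        # Only the artifact download step runs; baseline_tag is ignored.
        return BaselineSource("run", baseline_run_id)
    if baseline_tag != "":
        return BaselineSource("release-tag", baseline_tag)
    # Empty tag: per the workflow comment, `gh release download` then
    # fetches from the latest stable release.
    return BaselineSource("latest-release", "")


def artifact_name(runner_os, hypervisor, cpu):
    """Name used by both the upload step and the run-ID download step."""
    return f"benchmarks_{runner_os}_{hypervisor}_{cpu}"
```

Because both steps derive the same name from `runner.os`, `hypervisor`, and `cpu`, a daily run for a given combination finds the artifact that the baseline run uploaded for that same combination; for example, `artifact_name("Linux", "kvm", "amd")` returns `"benchmarks_Linux_kvm_amd"`.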

docs/benchmarking-hyperlight.md

Lines changed: 5 additions & 5 deletions
@@ -2,10 +2,10 @@
 
 Hyperlight uses the [Criterion](https://bheisler.github.io/criterion.rs/book/index.html) framework to run and analyze benchmarks. A benefit to this framework is that it doesn't require the nightly toolchain.
 
-## When Benchmarks are ran
+## When Benchmarks are run
 
-1. Every time a branch gets a push
-    - Compares the current branch benchmarking results to the "dev-latest" release (which is the most recent push to "main" branch). This is done as part of `dep_rust.yml`, which is invoked by `ValidatePullRequest.yml`. These benchmarks are for the developer to compare their branch to main, and the results can only be seen in the GitHub action logs, and nothing is saved.
+1. Daily (scheduled)
+    - Benchmarks run daily via `DailyBenchmarks.yml`, comparing results against the previous day's run. Results are stored as workflow artifacts with 90-day retention.
 
 ```
 sandboxes/create_sandbox
@@ -15,9 +15,9 @@ Hyperlight uses the [Criterion](https://bheisler.github.io/criterion.rs/book/ind
 ```
 
 2. For each release
-    - For each release, benchmarks are ran as part of the release pipeline in `CreateRelease.yml`, which invokes `Benchmarks.yml`. These benchmark results are compared to the previous release, and are uploaded as port of the "Release assets" on the GitHub release page.
+    - For each release, benchmarks are run as part of the release pipeline in `CreateRelease.yml`, which invokes `dep_benchmarks.yml`. These benchmark results are compared to the previous release, and are uploaded as part of the "Release assets" on the GitHub release page.
 
-Currently, benchmarks are ran on windows, linux-kvm (ubuntu), and linux-hyperv (mariner). Only release builds are benchmarked, not debug.
+Currently, benchmarks are run on windows, linux-kvm (ubuntu), and linux-hyperv (mariner). Only release builds are benchmarked, not debug.
 
 ## Criterion artifacts
 
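The daily schedule and 90-day retention described in the updated docs reduce to two date calculations. The Python sketch below assumes the `'0 0 * * *'` cron from `DailyBenchmarks.yml` and the `retention_days: 90` it passes; `next_daily_run` and `artifact_expiry` are hypothetical helpers. It computes nominal times only: GitHub documents that scheduled runs can start late under heavy load, and actual artifact expiry is handled by GitHub.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now):
    """Next nominal trigger of cron '0 0 * * *' (00:00 UTC) strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    # Today's midnight is never later than `now`, so the next trigger is tomorrow's.
    return midnight + timedelta(days=1)


def artifact_expiry(uploaded_at, retention_days=90):
    """Approximate expiry: upload time plus the retention period."""
    return uploaded_at + timedelta(days=retention_days)
```

With the default `retention_days` of 5 in `dep_benchmarks.yml`, the same calculation gives a much shorter window, which is why the daily workflow passes 90 explicitly.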