Commit bb75b1f

ci: integrate CodSpeed continuous benchmarking
Wire the existing criterion benches into CodSpeed (https://codspeed.io)
for continuous performance tracking. CodSpeed runs benches under CPU
simulation in CI and posts per-PR comparison reports vs. the base
branch's latest main run.

Highlights
==========

- `criterion` workspace dependency renamed to `codspeed-criterion-compat`:
  a drop-in passthrough that wraps real criterion when running outside
  cargo-codspeed, so bench source code needs no changes
  (`use criterion::*` keeps working) and `cargo bench` locally is
  unaffected.

- Two workflows:
  - `.github/workflows/codspeed.yml` runs on every push to main and
    populates the base-branch baseline.
  - `.github/workflows/codspeed-pr.yml` runs on PRs when a `bench:*`
    label is attached, so external contributors don't blindly burn CI
    capacity. Labels are namespaced per crate:

        bench:all                        # whole workspace
        bench:arrow                      # all of arrow's benches
        bench:parquet bench:arrow-cast   # union

- Sharded one job per `[[bench]]` target (~78 shards after exclusions).
  Required because (a) the full workspace produces >1000 individual
  benchmarks per upload, and (b) the parquet crate alone produces >1000
  due to heavy criterion parameterization, both of which exceed
  CodSpeed's per-upload limit. Jobs in the same workflow are
  auto-aggregated by CodSpeed into a single report.
  Ref: https://codspeed.io/docs/features/sharded-benchmarks

- Build-once / run-many topology:

      setup ─┐
             ├──→ bench (matrix, N shards)
      build ─┘

  `build` does the full-workspace `cargo codspeed build` exactly once
  and uploads `target/codspeed/` as a tar artifact (tar preserves the
  +x bit, which `actions/upload-artifact` strips otherwise). Each bench
  shard downloads the artifact and invokes
  `cargo codspeed run -p <crate> --bench <bench>`. No rebuild per shard,
  so CI cost scales with N shards × ~2 min instead of ×10 min.

- Dynamic matrix: `setup` parses every workspace member's Cargo.toml for
  `[[bench]]` entries with awk + jq and emits a JSON `{crate, bench}`
  array, so new bench targets are picked up automatically without
  touching the workflow.

- Auth: GitHub OIDC. No `CODSPEED_TOKEN` secret needed for the public
  repo; the workflow's `id-token` claim is what CodSpeed verifies.

Exclusions
==========

Ten bench targets currently fail at runtime (e.g. `merge_kernels` panics
in `arrow-data/src/transform/primitive.rs:31`); these are pre-existing
issues in the benches themselves, not the integration. They're listed in
an `EXCLUDED_BENCHES` env in both workflows so the remaining ~78 shards
run clean. Each excluded target should be fixed and removed from the
list one by one.

Prerequisites for activation
============================

- Install the CodSpeed GitHub App on `apache/arrow-rs`:
  https://github.com/apps/codspeed
- Enroll the repository at https://codspeed.io (the OIDC integration is
  automatic for public repos; no secret token configuration required)

Once both are done, the first push to main will populate the baseline
and PRs labeled `bench:*` will receive automated CodSpeed comparison
comments.
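The dynamic-matrix step described in the message above can be tried in isolation. The following is a minimal POSIX shell sketch, not the workflow's exact script: it reuses the same awk pattern, but writes the exclusion list to a temporary file instead of using bash process substitution, and it omits the final jq conversion to JSON. The `demo` crate, its bench names, and the helper functions `list_benches` and `filter_excluded` are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: list the [[bench]] targets of one crate, then drop excluded ones.
# The manifest written below and its bench names are made-up sample data.

# Print "<crate> <bench>" for every [[bench]] entry in a manifest.
list_benches() {
  crate="$1"
  manifest="$2"
  awk -v crate="$crate" '
    /^\[\[bench\]\]/ { in_bench = 1; next }   # entering a [[bench]] table
    /^\[/            { in_bench = 0 }         # any other table header ends it
    in_bench && /^name = / {
      sub(/^name = "/, ""); sub(/"$/, "")     # strip the key and the quotes
      printf "%s %s\n", crate, $0
    }
  ' "$manifest"
}

# Drop every line that exactly matches an entry in the exclusion file.
filter_excluded() {
  grep -vxF -f "$1" || true   # grep exits 1 when nothing survives the filter
}

workdir=$(mktemp -d)
cat > "$workdir/Cargo.toml" <<'EOF'
[package]
name = "demo"

[[bench]]
name = "fast_path"
harness = false

[[bench]]
name = "known_broken"
harness = false

[[example]]
name = "not_a_bench"
EOF
printf '%s\n' 'demo known_broken' > "$workdir/excluded.txt"

result=$(list_benches demo "$workdir/Cargo.toml" | filter_excluded "$workdir/excluded.txt")
echo "$result"
rm -rf "$workdir"
```

On the sample manifest this prints the single line `demo fast_path`: the package name and the `[[example]]` name are ignored because they sit outside a `[[bench]]` table, and `known_broken` is removed by the exact-line exclusion match.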
1 parent 2b2a95a commit bb75b1f

5 files changed

Lines changed: 547 additions & 70 deletions

File tree

.github/workflows/codspeed-pr.yml

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Opt-in CodSpeed benchmarking for pull requests, gated by labels and
# sharded one job per `[[bench]]` target in each selected crate.
#
# Label convention (managed manually on each PR):
#
#   bench:all                      # every [[bench]] in the workspace
#   bench:<crate>                  # every [[bench]] in that crate
#   bench:<crate> bench:<crate>    # union
#
# Where <crate> is a workspace member name, e.g. `bench:arrow`,
# `bench:parquet`, `bench:arrow-cast`. `bench:all` short-circuits and
# supersedes any per-crate labels.
#
# Topology mirrors codspeed.yml (setup + build run in parallel; bench
# is a matrix that downloads the build artifact and runs one bench
# target per shard). The `setup` job additionally filters the matrix
# by labels.
#
# Authorization: only users with write access to the repo can add
# labels, so the label is itself the authorization gate.
#
# Baseline: native `pull_request` event → CodSpeed compares against
# the base branch's latest CodSpeed report automatically.
#
# Fork PR caveat: workflows triggered by `pull_request` from fork PRs
# do not get an OIDC token. For benches on fork PRs, push the branch
# to this repo and label it there.

name: codspeed-pr

on:
  pull_request:
    types: [labeled, synchronize, opened, reopened]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
  cancel-in-progress: true

permissions:
  contents: read
  id-token: write
  pull-requests: write

env:
  CODSPEED_FEATURES: arrow/test_utils,arrow/csv,arrow/json,arrow/chrono-tz,arrow/prettyprint,arrow-schema/ffi,parquet/arrow,parquet/async,parquet/test_common,parquet/experimental,parquet/object_store

jobs:
  setup:
    # Run only if at least one `bench:*` label is currently attached.
    # The toJSON serialization wraps each label name in double quotes,
    # so searching for `"bench:` matches only at the start of a label
    # name.
    if: contains(toJSON(github.event.pull_request.labels.*.name), '"bench:')
    name: Generate bench matrix
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.gen.outputs.matrix }}
      scope: ${{ steps.gen.outputs.scope }}
    steps:
      - uses: actions/checkout@v6

      - name: Resolve crates from labels and emit per-bench-target matrix
        id: gen
        env:
          LABELS: ${{ toJSON(github.event.pull_request.labels.*.name) }}
          # Keep this list in sync with codspeed.yml — bench targets that
          # currently panic/error at runtime and should not be benched
          # until fixed in their respective crates.
          EXCLUDED_BENCHES: |
            arrow merge_kernels
            arrow buffer_bit_ops
            arrow buffer_create
            arrow sort_kernel
            arrow string_run_builder
            arrow primitive_run_accessor
            arrow-array union_array
            arrow-cast parse_date
            parquet row_selection_cursor
            parquet-variant-compute variant_kernels
        run: |
          all_crates="arrow arrow-array arrow-avro arrow-buffer arrow-cast arrow-ipc arrow-json arrow-schema parquet parquet-variant parquet-variant-compute"

          suffixes=$(jq -r '.[] | select(startswith("bench:")) | sub("^bench:"; "")' <<<"$LABELS")

          if echo "$suffixes" | grep -qx "all"; then
            selected_crates="$all_crates"
            scope="full workspace (bench:all)"
          else
            for pkg in $suffixes; do
              if ! [[ "$pkg" =~ ^[a-z][a-z0-9_-]*$ ]]; then
                echo "::error::Invalid bench label suffix 'bench:$pkg'"
                exit 1
              fi
            done
            selected_crates="$(echo $suffixes | tr '\n' ' ')"
            scope="$selected_crates"
          fi

          {
            for crate in $selected_crates; do
              if [ ! -f "$crate/Cargo.toml" ]; then
                echo "::warning::No Cargo.toml found for '$crate' (bench:$crate); skipping"
                continue
              fi
              awk -v crate="$crate" '
                /^\[\[bench\]\]/ { in_bench=1; next }
                /^\[/ { in_bench=0 }
                in_bench && /^name = / {
                  sub(/^name = "/, ""); sub(/"$/, "");
                  printf "%s %s\n", crate, $0
                }
              ' "$crate/Cargo.toml"
            done
          } | grep -vxF -f <(printf '%s\n' "$EXCLUDED_BENCHES" | sed '/^$/d') \
            | jq -Rcs 'split("\n") | map(select(length>0) | split(" ") | {crate: .[0], bench: .[1]})' > matrix.json

          echo "matrix=$(cat matrix.json)" >> "$GITHUB_OUTPUT"
          echo "scope=$scope" >> "$GITHUB_OUTPUT"
          echo "::notice::Scope: $scope ($(jq length matrix.json) bench shards after excluding known-broken targets)"

  build:
    # Gate on the same label condition as setup so we don't build when
    # there are no benches to run.
    if: contains(toJSON(github.event.pull_request.labels.*.name), '"bench:')
    name: Build workspace benchmarks
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v6
        with:
          submodules: true

      - name: Install protoc
        run: sudo apt-get update && sudo apt-get install -y protobuf-compiler

      - name: Setup Rust toolchain, cache and cargo-codspeed
        uses: moonrepo/setup-rust@v1
        with:
          channel: stable
          cache-target: release
          bins: cargo-codspeed

      - name: Build benchmarks
        run: cargo codspeed build --workspace --features "$CODSPEED_FEATURES"

      - name: Pack bench binaries into a tarball
        # actions/upload-artifact does not preserve Unix executable
        # bits, so bench binaries downloaded by shards would otherwise
        # land as 644 and fail with EACCES under `cargo codspeed run`.
        run: tar -cf codspeed-binaries.tar -C target codspeed

      - name: Upload built bench binaries
        uses: actions/upload-artifact@v4
        with:
          name: codspeed-binaries
          path: codspeed-binaries.tar
          retention-days: 1
          if-no-files-found: error

  bench:
    needs: [setup, build]
    name: ${{ matrix.config.crate }} / ${{ matrix.config.bench }}
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.setup.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v6
        with:
          submodules: true

      - name: Install cargo-codspeed
        uses: moonrepo/setup-rust@v1
        with:
          channel: stable
          bins: cargo-codspeed

      - name: Download built bench binaries
        uses: actions/download-artifact@v4
        with:
          name: codspeed-binaries
          path: .

      - name: Unpack bench binaries (preserves executable bits)
        run: |
          mkdir -p target
          tar -xf codspeed-binaries.tar -C target

      - name: Run single bench target
        uses: CodSpeedHQ/action@v4
        with:
          mode: simulation
          run: cargo codspeed run -p ${{ matrix.config.crate }} --bench ${{ matrix.config.bench }}
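
The pack and unpack steps in this workflow rely on tar recording file modes inside the archive. A minimal sketch of that round trip follows, assuming a temporary directory and a made-up `fake_bench` script standing in for a real bench binary; the `src` and `dst` directory names are likewise hypothetical and only mimic the build job and a bench shard.

```shell
#!/bin/sh
# Sketch: show that a tar round trip keeps the executable bit.
# File and directory names here are invented for the demo.

workdir=$(mktemp -d)
mkdir -p "$workdir/src/target/codspeed" "$workdir/dst/target"

# Stand-in for a built bench binary, marked executable.
printf '#!/bin/sh\necho bench-ran\n' > "$workdir/src/target/codspeed/fake_bench"
chmod 755 "$workdir/src/target/codspeed/fake_bench"

# Pack relative to target/, the same shape the build job uses.
tar -cf "$workdir/codspeed-binaries.tar" -C "$workdir/src/target" codspeed

# Unpack into a fresh target/, as a bench shard would.
tar -xf "$workdir/codspeed-binaries.tar" -C "$workdir/dst/target"

# Check whether the extracted file is still executable.
if [ -x "$workdir/dst/target/codspeed/fake_bench" ]; then
  exec_bit="preserved"
else
  exec_bit="lost"
fi
echo "executable bit: $exec_bit"
rm -rf "$workdir"
```

Because the mode travels inside the tar file, the artifact store only ever sees one opaque file, so whatever it does to per-file permissions no longer affects the bench binaries.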
