Stop running benchmarks on non-performance PRs; add full-ci-no-benchmarks label#3950
Conversation
…arks label Audit of the last two days of PRs found benchmark suites (30+ min each, sometimes producing garbage Bencher data) running on PRs that cannot move runtime performance. Three distinct root causes, plus a missing manual override: 1. Base-SHA drift x merge-ref checkout (benchmark.yml). detect-changes checked out the pull/N/merge ref (PR merged into CURRENT main) but diffed against the PR's stale recorded base.sha. When base.sha lagged main, the diff swept in every change merged in between, miscategorizing CI-only PRs and firing all suites (#3919: a bin/-only PR ran all four suites via 8 intervening commits). Fix: check out the PR head SHA so detection sees only the PR's own commits. 2. Uncategorized catch-all force-ran every benchmark "for safety" (#3855, #3915, #3785). Unknown files are almost always tooling/config, not hot runtime paths. Fix: the catch-all still runs the full TEST suite but sets no BENCH_* flag. 3. Spec-only changes triggered core+pro benchmarks (#3854). Specs are not shipped and cannot move runtime performance. Fix: the rspec / pro-rspec arms no longer set BENCH_* flags. The benchmark label still forces a run for a genuinely new runtime path, and every push to main benchmarks unconditionally, so main remains the regression backstop for anything suppressed on a PR. New full-ci-no-benchmarks label (generate_matrix.rb) hard-suppresses all benchmark suites for a PR, overriding change detection and the benchmark label. This is the "full CI but no benchmarks" escape hatch for the operator-error case where full-ci + benchmark were both applied to a lint/config PR (#3841, #3836). Tests: new regression coverage in ci-changes-detector-test.bash (uncategorized and spec-only -> no benchmarks) and benchmark_matrix_spec.rb (suppression label). AGENTS.md documents the new label. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
WalkthroughThis PR implements benchmark suppression for non-source changes and adds a PR label escape hatch. It refines change detection to prevent benchmark triggering for specs and uncategorized files, fixes the workflow's change detection accuracy by using the PR head SHA, and introduces the ChangesBenchmark Suppression Strategy
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df65e4ede3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # suites because eight intervening main commits leaked into the diff). | ||
| # Check out the PR head SHA instead so change detection sees only the | ||
| # PR's own commits. On push, this falls back to the pushed SHA. | ||
| ref: ${{ github.event.pull_request.head.sha || github.sha }} |
There was a problem hiding this comment.
Keep fork PR history reachable for diff detection
On fork PRs with more than 50 commits since the base, checking out only pull_request.head.sha leaves the PR side shallow and not deepen-able by script/lib/git-diff-base: that helper tries to deepen the current side by fetching GITHUB_HEAD_REF from origin, but in this workflow origin is the base repository, where fork branches usually do not exist. In that scenario script/ci-changes-detector cannot find a merge base and the benchmark workflow fails before it can emit the empty fork matrix, even though fork PRs are supposed to skip benchmark suites.
Useful? React with 👍 / 👎.
Code ReviewOverview: This PR correctly addresses a real CI waste problem — 30+ minute benchmark suites running on PRs that cannot affect runtime performance. The root-cause analysis in the PR description is thorough and backed by concrete examples. The fixes are sound and well-tested. Strengths
Issues / Questions1. Fork PR security note (low risk, worth acknowledging)Checking out 2. Missing
|
| # suites because eight intervening main commits leaked into the diff). | ||
| # Check out the PR head SHA instead so change detection sees only the | ||
| # PR's own commits. On push, this falls back to the pushed SHA. | ||
| ref: ${{ github.event.pull_request.head.sha || github.sha }} |
There was a problem hiding this comment.
The existing comment explains the SHA drift problem well, but it could also note that checking out the fork's head SHA is safe because the benchmark jobs themselves are gated by the BENCHMARK_PULL_REQUEST_HEAD_REPO == GITHUB_REPOSITORY check in generate_matrix.rb — so even if a fork's modified ci-changes-detector emits run_core_benchmarks=true, the matrix script drops it. A one-liner here would save the next reviewer from tracing that chain:
| ref: ${{ github.event.pull_request.head.sha || github.sha }} | |
| ref: ${{ github.event.pull_request.head.sha || github.sha }} |
Consider appending to the comment block above: "Fork safety: even if a fork modifies script/ci-changes-detector, generate_matrix.rb gates benchmark suites on BENCHMARK_PULL_REQUEST_HEAD_REPO == GITHUB_REPOSITORY, so a fork can never enable suites — only disable them via full-ci-no-benchmarks."
| setup_repo | ||
| write_file_change "react_on_rails/spec/react_on_rails/some_bench_spec.rb" | ||
|
|
||
| local out |
There was a problem hiding this comment.
This test doesn't assert the non_runtime_only flag, unlike the uncategorized-file test above which asserts "non_runtime_only": false. It's worth adding an assertion here to document the expected value — if spec-only changes correctly produce non_runtime_only: true (specs are not shipped, so the PR is runtime-non-affecting), that should be captured:
| local out | |
| assert_contains "$out" '"run_ruby_tests": true' "rspec-bench output" | |
| assert_contains "$out" '"non_runtime_only": true' "rspec-bench output" | |
| assert_contains "$out" '"run_core_benchmarks": false' "rspec-bench output" |
(Adjust true/false to match the actual detector output if it differs.)
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/benchmark.yml (1)
55-56:⚠️ Potential issue | 🟠 Major | ⚡ Quick winInclude
unlabeledin the PR trigger types.Right now the workflow reevaluates benchmark selection when a label is added, but not when one is removed. That leaves the new
full-ci-no-benchmarksescape hatch sticky: if a maintainer removes that label to re-enable benchmarks, this workflow will not rerun until another push/reopen. The same stale-state problem applies to removingbenchmarklabels.Suggested fix
pull_request: - types: [opened, synchronize, reopened, labeled] + types: [opened, synchronize, reopened, labeled, unlabeled]As per coding guidelines, workflow reviews must inspect trigger changes, and this trigger set currently misses the label-removal path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/benchmark.yml around lines 55 - 56, The pull_request trigger in the workflow is missing the "unlabeled" event so label removals don't retrigger the job; update the pull_request.types array in the benchmark.yml workflow (the pull_request trigger) to include "unlabeled" alongside opened, synchronize, reopened, and labeled so the workflow re-evaluates when a label is removed.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In @.github/workflows/benchmark.yml:
- Around line 55-56: The pull_request trigger in the workflow is missing the
"unlabeled" event so label removals don't retrigger the job; update the
pull_request.types array in the benchmark.yml workflow (the pull_request
trigger) to include "unlabeled" alongside opened, synchronize, reopened, and
labeled so the workflow re-evaluates when a label is removed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 83fff31c-a317-4668-8f1f-64e3962df4a8
📒 Files selected for processing (6)
.github/workflows/benchmark.ymlAGENTS.mdbenchmarks/generate_matrix.rbbenchmarks/spec/benchmark_matrix_spec.rbscript/ci-changes-detectorscript/ci-changes-detector-test.bash
Greptile SummaryThis PR fixes three root causes of spurious benchmark CI runs — base-SHA drift on merge-commit checkout, spec-only changes triggering benchmark suites, and uncategorized files force-running all benchmarks — and adds a
Confidence Score: 4/5Safe to merge; all benchmark-suppression paths are well-tested and the main-push backstop is untouched. The three changed behavioral paths (head-SHA checkout, spec-arm no-bench, catch-all no-bench) each have dedicated regression tests. The only open edge case is the fetch-depth: 50 window on very long PR branches — git_diff_base_resolve strict would hard-fail rather than silently skip, so the failure mode is loud. .github/workflows/benchmark.yml — the fetch-depth: 50 + head-SHA checkout interaction on deep PR branches. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[PR push / label event] --> B[detect-changes job checkout PR HEAD SHA]
B --> C[ci-changes-detector git diff base.sha..HEAD]
C --> D{File category?}
D -->|lib/app source| E[Set BENCH_* flags]
D -->|spec only| F[Set RSPEC_CHANGED no BENCH_* flags]
D -->|CI infra / script| G[Set CI_INFRA_CHANGED no BENCH_* flags]
D -->|uncategorized| H[Set UNCATEGORIZED_CHANGED no BENCH_* flags]
E & F & G & H --> I[generate_matrix.rb]
I --> J{BENCHMARK_NON_RUNTIME_ONLY OR full-ci-no-benchmarks label?}
J -->|Yes| K[Empty matrix suite_id = none]
J -->|No| L{suite_enabled? check fork + run_output + labels}
L -->|benchmark label| M[Force suite ON]
L -->|no matching signal| N[Suite skipped]
M & N --> O[benchmark job gated by run_benchmark_suites]
K --> P[benchmark job skipped sentinel]
|
| ref: ${{ github.event.pull_request.head.sha || github.sha }} | ||
| fetch-depth: 50 |
There was a problem hiding this comment.
Shallow-clone depth may not reach base SHA after ref switch
With fetch-depth: 50 and ref set to the PR head SHA, actions/checkout fetches exactly 50 commits from the PR HEAD's ancestry. The base SHA (github.event.pull_request.base.sha) is an ancestor of the PR HEAD, so it is normally within those 50 commits — but a PR branch with 50+ of its own commits would push the fork point outside the fetched window. In that case git_diff_base_resolve strict exits 1 and the detect-changes job fails, gating all benchmark jobs, rather than silently producing an empty diff. This is the safe-failure direction, but could be a surprise on long-lived squash-heavy feature branches. Consider bumping fetch-depth to a higher value (e.g. 100 or 0 for full history) if PR branches regularly exceed 50 commits.
…wind-generator * origin/main: Add preload link helper for generated component packs (#3935) Add experimental View Transitions recipe, Turbo interplay doc, flagged dummy demo (#3947) Stop running benchmarks on non-performance PRs; add full-ci-no-benchmarks label (#3950) Add doctor FORMAT=json machine-readable output with stable schema (#3948) Docs: add Inertia Rails to React on Rails migration guide v1 (#3946) Add stable SmartError codes and generated error reference (#3936) # Conflicts: # CHANGELOG.md
Problem
An audit of benchmark CI runs on the last ~2 days of PRs found that benchmark suites (30+ minutes each, sometimes producing garbage Bencher data — e.g. #3919's Pro suite logged
✗ 0/4726successful requests) ran on PRs that cannot move runtime performance. The system never under-runs benchmarks; every defect is wasteful over-running.Verified examples (actual workflow runs, not just rollups):
bin/ci-rerun-failuresonly.ci-dependency-versions+ bin + specscreate-react-on-rails-app(CLI scaffolder).tool-versions+ specsbenchmarklabel applied)Root causes (all confirmed in CI logs)
detect-changesinbenchmark.ymlchecks out thepull/N/mergeref (PR merged into current main) but diffs against the PR's stale recordedbase.sha. Whenbase.shalags main,git diff stale-base..merge-refsweeps in everything merged in between → CI-only PRs get miscategorized → all suites fire. (Fix CI rerun mapping for split dummy app tests (#3909) #3919: abin/-only PR ran all four suites because 8 intervening main commits leaked into the diff.)Plus an operator-error class (#3841, #3836):
full-ci+benchmarkwere both applied to lint-config PRs, and there was no way to say "full CI but no benchmarks."Fixes
benchmark.yml: check out the PR head SHA indetect-changesso change detection sees only the PR's own commits (not main's intervening commits). Scoped to the benchmark workflow — the test workflows share the pattern but only over-run tests (safe direction).script/ci-changes-detector: the uncategorized catch-all still runs the full test suite but no longer sets anyBENCH_*flag.script/ci-changes-detector: therspec/pro-rspecarms no longer setBENCH_*flags.full-ci-no-benchmarkslabel (benchmarks/generate_matrix.rb): hard-suppresses every benchmark suite for a PR, overriding both change detection and thebenchmarklabel. Pair withfull-cifor full test coverage without benchmarks.Safety net: the
benchmarklabel still forces a run for a genuinely new runtime path, and every push tomainbenchmarks unconditionally, so main remains the regression backstop for anything suppressed on a PR.Tests
script/ci-changes-detector-test.bash: +2 regression tests (uncategorized → no benchmarks; spec-only → no benchmarks). 47 run, 0 failed.benchmarks/spec/benchmark_matrix_spec.rb: +3 tests for the suppression label (overrides detection, beats thebenchmarklabel, honored from forks). Fullbenchmarks/specsuite: 271 examples, 0 failures.script/lib/git-diff-base-test.bash: 41/41 still pass. RuboCop clean.The
full-ci-no-benchmarkslabel has been created in the repo.🤖 Generated with Claude Code
Note
Medium Risk
Changes CI change-detection and benchmark gating; misclassification could skip benchmarks on a PR, though every push to main still runs benchmarks unconditionally as the backstop.
Overview
Stops 30+ minute benchmark suites from running on PRs that cannot affect runtime performance, while keeping broad tests and
mainas the performance backstop.benchmark.ymlnow checks out the PR head SHA indetect-changesso diffs reflect only the PR’s commits—not unrelated commits onmainthat leaked in when comparing a stalebase.shato the merge ref (#3919).script/ci-changes-detectornarrows when PR benchmarks are requested: spec-only gem/Pro spec edits no longer setBENCH_*; the uncategorized catch-all still runs the full test matrix but no longer forces all benchmark suites. Runtime source paths and thebenchmarklabel can still trigger suites on PRs.full-ci-no-benchmarksis added inbenchmarks/generate_matrix.rb(documented inAGENTS.md): it hard-suppresses every benchmark suite on a PR, overriding change detection and evenbenchmark; it is honored from forks because it only disables runs.Regression coverage:
ci-changes-detector-test.bash(+2) andbenchmark_matrix_spec.rb(+3).Reviewed by Cursor Bugbot for commit df65e4e. Bugbot is set up for automated code reviews on this repo. Configure here.
Summary by CodeRabbit
full-ci-no-benchmarksPR label option for comprehensive CI testing without benchmark suite execution.