Commit c331ad8
ci: warn (don't fail) on Bencher main regression (#3168)
## Summary
- Downgrade the "fail on main regression" step in the benchmark workflow
to a `::warning::` annotation so `main` is no longer permanently red.
- Keep everything else intact: the Bencher dashboard upload, the
auto-opened tracking issue (#3116), and the per-run report in the job
summary all still run.
## Why
The gate has been failing nearly every non-docs push to `main` for
weeks. Investigation of the Bencher alert history makes clear it's
runner noise, not a real regression:
- The failing run cited in the issue ([run
24521366756](https://github.com/shakacode/react_on_rails/actions/runs/24521366756/job/71679754726)
on `4eb83648b`) fired **43 alerts** across Core and Pro, spanning routes
unrelated to the committed change (which only edits
`packages/react-on-rails/src/pageLifecycle.ts`).
- Comparing alert sets across six consecutive main runs, **Jaccard
similarity between adjacent runs is 0.01–0.08** — i.e., each run flags a
near-disjoint random subset of `(benchmark, measure)` pairs. A real
regression would produce persistent, overlapping alerts.
- Docs-only commits "pass" simply because they skip benchmarks via
`paths-ignore`. Every run that actually exercises the suite trips the
t-test on at least one metric.
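The Jaccard comparison above can be sketched as follows. This assumes each run's alerts are collected as a set of `(benchmark, measure)` tuples. The benchmark and measure names are hypothetical placeholders, not the real Bencher alert data.

```python
from typing import List, Set, Tuple

Alert = Tuple[str, str]  # (benchmark, measure) pair flagged by Bencher


def jaccard(a: Set[Alert], b: Set[Alert]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty alert sets count as identical (1.0)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def adjacent_similarities(runs: List[Set[Alert]]) -> List[float]:
    """Jaccard similarity between each run and the run immediately before it."""
    return [jaccard(prev, curr) for prev, curr in zip(runs, runs[1:])]


def persistent_alerts(runs: List[Set[Alert]]) -> Set[Alert]:
    """Alerts present in every run: where a real regression would show up."""
    return set.intersection(*runs) if runs else set()


# Hypothetical alert sets for three consecutive main runs.
runs = [
    {("ssr_home", "latency_p99"), ("rsc_stream", "throughput"), ("hydrate", "latency_p50")},
    {("ssr_home", "latency_p99"), ("api_list", "throughput"), ("spa_nav", "latency_p95")},
    {("pro_cache", "latency_p99"), ("spa_nav", "throughput")},
]
print(adjacent_similarities(runs))
print(persistent_alerts(runs))
```

Low similarity for every adjacent pair, together with an empty `persistent_alerts` result, is the noise signature described above. A genuine regression would keep the same pair in every set.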
Single-run, 95%-CI t-test alerts on GitHub-hosted runners, combined with a
70-minute suite, will always produce tail-latency noise. The right
long-term fix is threshold/sample-size tuning or self-hosted runners,
tracked in #3169.
## Change
`.github/workflows/benchmark.yml` step 7c: replace
`exit ${BENCHER_EXIT_CODE:-1}` with a GitHub Actions warning annotation.
Steps 7a (Bencher report) and 7b (regression issue) are untouched.
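The behavior change in step 7c can be modeled as below. The real step is shell inside `benchmark.yml`, and this Python sketch only illustrates the logic and the `::warning::` workflow-command format. The escaping mirrors what `@actions/core` applies to command messages and properties. The function names and the `title` text are hypothetical.

```python
from typing import Optional


def escape_data(message: str) -> str:
    """Escape a workflow-command message (% first, then CR and LF)."""
    return message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(value: str) -> str:
    """Escape a workflow-command property value such as `title`."""
    return escape_data(value).replace(":", "%3A").replace(",", "%2C")


def warning_annotation(message: str, title: Optional[str] = None) -> str:
    """Build a `::warning::` command, which GitHub renders as an annotation."""
    props = f" title={escape_property(title)}" if title else ""
    return f"::warning{props}::{escape_data(message)}"


def report_bencher_result(bencher_exit_code: Optional[int]) -> int:
    """Warn instead of failing; an unset code is treated as 1, like `${BENCHER_EXIT_CODE:-1}`."""
    code = 1 if bencher_exit_code is None else bencher_exit_code
    if code != 0:
        print(warning_annotation(
            f"Bencher reported a regression (exit code {code}); see the job summary.",
            title="Benchmark regression",  # hypothetical title text
        ))
    return 0  # the step now succeeds instead of exiting with Bencher's code
```

In the workflow itself, this amounts to an `echo` of the annotation line in place of the `exit`.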
## Follow-ups (not in this PR)
- Tuning and re-enabling the hard gate: **#3169**.
- Close #3116 once #3169's work lands.
- Reassess after #3148 merges (Bencher baseline reporting fix).
## Test plan
- [ ] CI passes on this PR (the change is workflow-only).
- [ ] After merge, next push to `main` shows a Bencher warning
annotation instead of a red check when alerts fire.
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Changes `main` benchmark gating behavior by no longer failing the
workflow on performance regressions, which could let real regressions
merge if reviewers rely on CI status. Adds new stderr-based
classification logic that could mis-detect alerts and alter when the
workflow fails vs warns.
>
> **Overview**
> **Benchmark CI on `main` no longer hard-fails on Bencher regression
alerts.** The workflow now emits a `::warning::` for regressions, while
continuing to open/update the regression tracking issue.
>
> Adds `BENCHER_HAS_ALERT` detection by parsing Bencher stderr and gates
downstream steps on it: regression-related actions run only when an
actual alert is detected, and a new step explicitly fails `main` only
for *non-regression* Bencher failures (auth/API/network/CLI).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
12fa8f2. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
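The alert-versus-failure split described in the summary above can be sketched as follows, for `main` only. The `ALERT_PATTERN` regex is a hypothetical stand-in, because the PR description does not show the Bencher stderr wording that the real `BENCHER_HAS_ALERT` check matches.

```python
import re
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    WARN = "warn"  # regression alert: emit ::warning::, open/update tracking issue
    FAIL = "fail"  # non-regression failure (auth/API/network/CLI): fail the job


# HYPOTHETICAL marker: the real step matches Bencher's actual stderr wording.
ALERT_PATTERN = re.compile(r"\balerts?\b", re.IGNORECASE)


def has_alert(stderr: str) -> bool:
    """Stand-in for BENCHER_HAS_ALERT: did Bencher's stderr report an alert?"""
    return ALERT_PATTERN.search(stderr) is not None


def classify_main_run(exit_code: int, stderr: str) -> Outcome:
    """Decide how a Bencher run on `main` should surface in CI."""
    if has_alert(stderr):
        return Outcome.WARN
    if exit_code != 0:
        return Outcome.FAIL
    return Outcome.PASS


print(classify_main_run(1, "Found 3 alerts"))
print(classify_main_run(1, "Error: 401 Unauthorized"))
```

This also shows the risk the summary flags: if the pattern is too broad, an operational error whose text happens to mention an alert would be downgraded to a warning instead of failing the job.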
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Release Notes
* **Improvements**
  * Benchmarks now detect and flag regression alerts separately from
    general failures.
  * Main-branch workflow creates issues and emits warnings only for
    confirmed regression alerts.
  * CI now fails only for non-regression operational errors, reducing
    false positives and improving reliability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent 23c1234, commit c331ad8
1 file changed
Lines changed: 53 additions & 11 deletions