
Commit c331ad8

Authored by justin808 and claude
ci: warn (don't fail) on Bencher main regression (#3168)
## Summary

- Downgrade the "fail on main regression" step in the benchmark workflow to a `::warning::` annotation so main stops being permanently red.
- Keep everything else intact: the Bencher dashboard upload, the auto-opened tracking issue #3116, and the per-run report in the job summary still run.

## Why

The gate has been failing nearly every non-docs push to `main` for weeks. Investigation of the Bencher alert history makes clear it's runner noise, not a real regression:

- The failing run cited in the issue ([run 24521366756](https://github.com/shakacode/react_on_rails/actions/runs/24521366756/job/71679754726) on `4eb83648b`) fired **43 alerts** across Core and Pro, spanning routes unrelated to the committed change (which only edits `packages/react-on-rails/src/pageLifecycle.ts`).
- Comparing alert sets across six consecutive main runs, **Jaccard similarity between adjacent runs is 0.01–0.08** — i.e., each run flags a near-disjoint random subset of `(benchmark, measure)` pairs. A real regression would produce persistent, overlapping alerts.
- Docs-only commits "pass" simply because they skip benchmarks via `paths-ignore`. Every run that actually exercises the suite trips the t-test on at least one metric.

Single-run, 95%-CI t-test alerts on GitHub-hosted runners plus a 70-minute suite will always produce tail-latency noise. The right long-term fix is threshold/sample-size tuning or self-hosted runners, tracked in #3169.

## Change

`.github/workflows/benchmark.yml` step 7c: replace `exit ${BENCHER_EXIT_CODE:-1}` with a GitHub Actions warning annotation. Steps 7a (Bencher report) and 7b (regression issue) are untouched.

## Follow-ups (not in this PR)

- Tuning and re-enabling the hard gate: **#3169**.
- Close #3116 once #3169's work lands.
- Reassess after #3148 merges (Bencher baseline reporting fix).

## Test plan

- [ ] CI passes on this PR (the change is workflow-only).
- [ ] After merge, the next push to `main` shows a Bencher warning annotation instead of a red check when alerts fire.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes `main` benchmark gating behavior by no longer failing the workflow on performance regressions, which could let real regressions merge if reviewers rely on CI status. Adds new stderr-based classification logic that could mis-detect alerts and alter when the workflow fails vs. warns.
>
> **Overview**
> **Benchmark CI on `main` no longer hard-fails on Bencher regression alerts.** The workflow now emits a `::warning::` for regressions, while continuing to open/update the regression tracking issue.
>
> Adds `BENCHER_HAS_ALERT` detection by parsing Bencher stderr and gates downstream steps on it: regression-related actions run only when an actual alert is detected, and a new step explicitly fails `main` only for *non-regression* Bencher failures (auth/API/network/CLI).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 12fa8f2.</sup>
<!-- /CURSOR_SUMMARY -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Improvements**
  * Benchmarks now detect and flag regression alerts separately from general failures.
  * The main-branch workflow creates issues and emits warnings only for confirmed regression alerts.
  * CI now fails only for non-regression operational errors, reducing false positives and improving reliability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
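The near-disjoint-alert-sets observation above can be reproduced with standard Unix tools. A minimal sketch of the Jaccard computation (the `(benchmark, measure)` pairs and file names below are invented for illustration; the real runs fired dozens of alerts each):

```shell
# Jaccard similarity |A ∩ B| / |A ∪ B| between the alert sets of two
# hypothetical adjacent runs, one "benchmark,measure" pair per line.
sort -u <<'EOF' > run_n.txt
ssr_basic,latency_p99
ssr_props,latency_p99
hydrate,throughput
EOF
sort -u <<'EOF' > run_n_plus_1.txt
ssr_stream,latency_p99
hydrate,throughput
EOF

inter=$(comm -12 run_n.txt run_n_plus_1.txt | wc -l)   # |A ∩ B| (comm needs sorted input)
union=$(sort -u run_n.txt run_n_plus_1.txt | wc -l)    # |A ∪ B|
awk -v i="$inter" -v u="$union" 'BEGIN { printf "jaccard = %.2f\n", i / u }'
```

A genuine regression would keep the same pairs in both files across runs, pushing the ratio toward 1.0; the PR's observed 0.01–0.08 indicates the flagged pairs are nearly uncorrelated run to run.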
1 parent 23c1234 commit c331ad8

1 file changed: `.github/workflows/benchmark.yml` (+53 −11)
```diff
@@ -157,7 +157,12 @@ jobs:
       # ============================================
 
       - name: Install Bencher CLI
-        uses: bencherdev/bencher@main
+        # Pinned: step 7a's alert-detection heuristic greps stderr for
+        # "Alert[s]" / "threshold violation" / "boundary violation". Those strings
+        # are not a documented API contract, so upgrading the CLI requires
+        # re-verifying that the expected phrases still appear in alert output.
+        # Expected stderr phrases at v0.6.2: "🚨 N Alerts", "Alert detected".
+        uses: bencherdev/bencher@v0.6.2
 
       - name: Add tools directory to PATH
         run: |
```
```diff
@@ -650,11 +655,27 @@ jobs:
             echo "::warning::Bencher baseline not found for start-point hash '$START_POINT_HASH' — regression comparison unavailable for this run"
             BENCHER_EXIT_CODE=0
           fi
-          rm -f "$BENCHER_STDERR"
 
-          # Export exit code early so downstream alerting steps (7b/7c) always see it,
-          # even if the post-processing below (step summary, PR comments) fails.
+          # Distinguish regression alerts from operational failures (auth/API/network/CLI)
+          # so that main can warn-only on the former while still failing hard on the latter.
+          # Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via
+          # a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already
+          # prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"),
+          # so the case-insensitive flag adds robustness against future CLI wording changes
+          # (e.g., lowercase "1 alert") without reintroducing the URL-path false positive.
+          BENCHER_HAS_ALERT=0
+          if [ $BENCHER_EXIT_CODE -ne 0 ] && \
+             { grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \
+               grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then
+            BENCHER_HAS_ALERT=1
+          fi
+          # Cleanup is also handled by the `trap 'rm -f "$BENCHER_STDERR"' EXIT` above,
+          # so we don't need an explicit rm here.
+
+          # Export exit code and alert flag early so downstream steps (7b/7c/7d) always
+          # see them, even if the post-processing below (step summary, PR comments) fails.
           echo "BENCHER_EXIT_CODE=$BENCHER_EXIT_CODE" >> "$GITHUB_ENV"
+          echo "BENCHER_HAS_ALERT=$BENCHER_HAS_ALERT" >> "$GITHUB_ENV"
 
           # Post report to job summary and PR comment(s) if there's HTML output
           if [ -s bench_results/bencher_report.html ]; then
```
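The stderr classification added in this hunk can be exercised in isolation. A minimal sketch, assuming the alert phrases the workflow's comments document ("🚨 N Alerts", "Alert detected") — the `classify` helper, file names, and sample messages below are invented for illustration, not workflow code:

```shell
# Mirror of the step-7a grep heuristic: alert phrases in stderr mean a
# regression alert; anything else non-zero is an operational failure.
classify() {
  if grep -qiE "\bAlerts?\b" "$1" || \
     grep -qiE "threshold violation|boundary violation" "$1"; then
    echo "regression-alert"
  else
    echo "operational-failure"
  fi
}

# Invented sample stderr captures for the two cases.
printf '🚨 2 Alerts\n' > stderr_alert.txt
printf 'error: failed to authenticate with the Bencher API\n' > stderr_auth.txt

classify stderr_alert.txt    # regression-alert
classify stderr_auth.txt     # operational-failure
```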
```diff
@@ -703,7 +724,7 @@ jobs:
       # STEP 7b: ALERT ON MAIN REGRESSION
       # ============================================
       - name: Create GitHub Issue for main regression
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0'
+        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0' && env.BENCHER_HAS_ALERT == '1'
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         run: |
```
```diff
@@ -789,14 +810,35 @@ jobs:
           fi
 
       # ============================================
-      # STEP 7c: FAIL WORKFLOW ON MAIN REGRESSION
+      # STEP 7c: WARN ON MAIN REGRESSION
+      # ============================================
+      # Regressions are surfaced via the Bencher dashboard, the auto-opened tracking
+      # issue (Step 7b), and the job-summary report. Do not fail the workflow on regression
+      # alerts: single-run alerts on GitHub-hosted runners are dominated by environmental
+      # noise (alert sets across consecutive runs have ~0 overlap), so blocking main
+      # produces churn without signal. Re-enable a hard gate once thresholds/sample-size
+      # tolerate that noise. Operational errors (auth/API/network/CLI) are handled in
+      # step 7d and still fail the workflow.
+      - name: Warn if Bencher detected regression on main
+        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0' && env.BENCHER_HAS_ALERT == '1'
+        env:
+          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+        run: |
+          echo "::warning::Bencher flagged a regression on main (exit ${BENCHER_EXIT_CODE:-1}). See the open regression issue (label: performance-regression), the Bencher dashboard, and the workflow run: ${RUN_URL}"
+
+      # ============================================
+      # STEP 7d: FAIL ON NON-REGRESSION BENCHER ERRORS
       # ============================================
-      # Only fail on main — PR benchmarks are informational (triggered by 'benchmark' label).
-      # Regressions on PRs are surfaced via Bencher report comments, not workflow failures.
-      - name: Fail workflow if Bencher detected regression on main
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0'
+      # If bencher exited non-zero but stderr did not contain regression alert
+      # indicators, this is an operational failure (auth/API/network/CLI) rather than
+      # a performance signal. These should still fail the workflow on main so a broken
+      # benchmark pipeline (missing uploads, bad credentials, Bencher outage, etc.) is
+      # not silently hidden behind a warning annotation.
+      - name: Fail on non-regression Bencher error on main
+        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0' && env.BENCHER_HAS_ALERT != '1'
         run: |
-          echo "Bencher detected a regression (exit code: ${BENCHER_EXIT_CODE:-1})"
+          echo "::error::Bencher exited ${BENCHER_EXIT_CODE:-1} on main with no regression alert in stderr — this indicates an operational failure (auth/API/network/CLI), not a performance regression. Check the logs above."
+          # Preserve the original Bencher exit code in CI logs for diagnostic context.
           exit "${BENCHER_EXIT_CODE:-1}"
 
       # ============================================
```
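Taken together, steps 7a/7c/7d reduce to a three-way outcome for pushes to `main`. A condensed sketch of that decision table — the `decide` helper is illustrative, not workflow code; its arguments stand in for the exported `BENCHER_EXIT_CODE` and `BENCHER_HAS_ALERT` env vars:

```shell
# Decision table for main pushes after this change.
# args: $1 = BENCHER_EXIT_CODE, $2 = BENCHER_HAS_ALERT
decide() {
  if [ "$1" = "0" ]; then
    echo "pass"   # bencher succeeded; no gating step runs
  elif [ "$2" = "1" ]; then
    echo "warn"   # step 7c: ::warning:: annotation (+ tracking issue from 7b)
  else
    echo "fail"   # step 7d: ::error:: and exit with bencher's code
  fi
}

decide 0 0   # pass
decide 1 1   # warn
decide 1 0   # fail
```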
