
Commit c331ad8

Authored by justin808 and claude
ci: warn (don't fail) on Bencher main regression (#3168)
## Summary

- Downgrade the "fail on main regression" step in the benchmark workflow to a `::warning::` annotation so main stops being permanently red.
- Keep everything else intact: the Bencher dashboard upload, the auto-opened tracking issue #3116, and the per-run report in the job summary still run.

## Why

The gate has been failing nearly every non-docs push to `main` for weeks. Investigation of the Bencher alert history makes clear it's runner noise, not a real regression:

- The failing run cited in the issue ([run 24521366756](https://github.com/shakacode/react_on_rails/actions/runs/24521366756/job/71679754726) on `4eb83648b`) fired **43 alerts** across Core and Pro, spanning routes unrelated to the committed change (which only edits `packages/react-on-rails/src/pageLifecycle.ts`).
- Comparing alert sets across six consecutive main runs, **Jaccard similarity between adjacent runs is 0.01–0.08** — i.e., each run flags a near-disjoint random subset of `(benchmark, measure)` pairs. A real regression would produce persistent, overlapping alerts.
- Docs-only commits "pass" simply because they skip benchmarks via `paths-ignore`. Every run that actually exercises the suite trips the t-test on at least one metric.

Single-run, 95%-CI t-test alerts on GitHub-hosted runners plus a 70-minute suite will always produce tail-latency noise. The right long-term fix is threshold/sample-size tuning or self-hosted runners, tracked in #3169.

## Change

`.github/workflows/benchmark.yml` step 7c: replace `exit ${BENCHER_EXIT_CODE:-1}` with a GitHub Actions warning annotation. Steps 7a (Bencher report) and 7b (regression issue) are untouched.

## Follow-ups (not in this PR)

- Tuning and re-enabling the hard gate: **#3169**.
- Close #3116 once #3169's work lands.
- Reassess after #3148 merges (Bencher baseline reporting fix).

## Test plan

- [ ] CI passes on this PR (the change is workflow-only).
- [ ] After merge, the next push to `main` shows a Bencher warning annotation instead of a red check when alerts fire.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes `main` benchmark gating behavior by no longer failing the workflow on performance regressions, which could let real regressions merge if reviewers rely on CI status. Adds new stderr-based classification logic that could mis-detect alerts and alter when the workflow fails vs. warns.
>
> **Overview**
> **Benchmark CI on `main` no longer hard-fails on Bencher regression alerts.** The workflow now emits a `::warning::` for regressions, while continuing to open/update the regression tracking issue.
>
> Adds `BENCHER_HAS_ALERT` detection by parsing Bencher stderr and gates downstream steps on it: regression-related actions run only when an actual alert is detected, and a new step explicitly fails `main` only for *non-regression* Bencher failures (auth/API/network/CLI).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 12fa8f2.</sup>
<!-- /CURSOR_SUMMARY -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Improvements**
  * Benchmarks now detect and flag regression alerts separately from general failures.
  * The main-branch workflow creates issues and emits warnings only for confirmed regression alerts.
  * CI now fails only for non-regression operational errors, reducing false positives and improving reliability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
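The near-disjoint-alert-sets observation above can be reproduced with standard Unix tools. A minimal sketch of the Jaccard computation (the `(benchmark, measure)` pairs and file names below are invented for illustration; the real runs fired dozens of alerts each):

```shell
# Jaccard similarity |A ∩ B| / |A ∪ B| between the alert sets of two
# hypothetical adjacent runs, one "benchmark,measure" pair per line.
sort -u <<'EOF' > run_n.txt
ssr_basic,latency_p99
ssr_props,latency_p99
hydrate,throughput
EOF
sort -u <<'EOF' > run_n_plus_1.txt
ssr_stream,latency_p99
hydrate,throughput
EOF

inter=$(comm -12 run_n.txt run_n_plus_1.txt | wc -l)   # |A ∩ B| (comm needs sorted input)
union=$(sort -u run_n.txt run_n_plus_1.txt | wc -l)    # |A ∪ B|
awk -v i="$inter" -v u="$union" 'BEGIN { printf "jaccard = %.2f\n", i / u }'
```

A genuine regression would keep the same pairs in both files across runs, pushing the ratio toward 1.0; the PR's observed 0.01–0.08 indicates the flagged pairs are nearly uncorrelated run to run.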
1 parent 23c1234 commit c331ad8

1 file changed: `.github/workflows/benchmark.yml` (+53 −11)
```diff
@@ -157,7 +157,12 @@ jobs:
       # ============================================
 
       - name: Install Bencher CLI
-        uses: bencherdev/bencher@main
+        # Pinned: step 7a's alert-detection heuristic greps stderr for
+        # "Alert[s]" / "threshold violation" / "boundary violation". Those strings
+        # are not a documented API contract, so upgrading the CLI requires
+        # re-verifying that the expected phrases still appear in alert output.
+        # Expected stderr phrases at v0.6.2: "🚨 N Alerts", "Alert detected".
+        uses: bencherdev/bencher@v0.6.2
 
       - name: Add tools directory to PATH
         run: |
```
```diff
@@ -650,11 +655,27 @@ jobs:
             echo "::warning::Bencher baseline not found for start-point hash '$START_POINT_HASH' — regression comparison unavailable for this run"
             BENCHER_EXIT_CODE=0
           fi
-          rm -f "$BENCHER_STDERR"
 
-          # Export exit code early so downstream alerting steps (7b/7c) always see it,
-          # even if the post-processing below (step summary, PR comments) fails.
+          # Distinguish regression alerts from operational failures (auth/API/network/CLI)
+          # so that main can warn-only on the former while still failing hard on the latter.
+          # Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via
+          # a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already
+          # prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"),
+          # so the case-insensitive flag adds robustness against future CLI wording changes
+          # (e.g., lowercase "1 alert") without reintroducing the URL-path false positive.
+          BENCHER_HAS_ALERT=0
+          if [ $BENCHER_EXIT_CODE -ne 0 ] && \
+             { grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \
+               grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then
+            BENCHER_HAS_ALERT=1
+          fi
+          # Cleanup is also handled by the `trap 'rm -f "$BENCHER_STDERR"' EXIT` above,
+          # so we don't need an explicit rm here.
+
+          # Export exit code and alert flag early so downstream steps (7b/7c/7d) always
+          # see them, even if the post-processing below (step summary, PR comments) fails.
           echo "BENCHER_EXIT_CODE=$BENCHER_EXIT_CODE" >> "$GITHUB_ENV"
+          echo "BENCHER_HAS_ALERT=$BENCHER_HAS_ALERT" >> "$GITHUB_ENV"
 
           # Post report to job summary and PR comment(s) if there's HTML output
           if [ -s bench_results/bencher_report.html ]; then
```
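The stderr classification added in this hunk can be exercised in isolation. A minimal sketch, assuming the alert phrases the workflow's comments document ("🚨 N Alerts", "Alert detected") — the `classify` helper, file names, and sample messages below are invented for illustration, not workflow code:

```shell
# Mirror of the step-7a grep heuristic: alert phrases in stderr mean a
# regression alert; anything else non-zero is an operational failure.
classify() {
  if grep -qiE "\bAlerts?\b" "$1" || \
     grep -qiE "threshold violation|boundary violation" "$1"; then
    echo "regression-alert"
  else
    echo "operational-failure"
  fi
}

# Invented sample stderr captures for the two cases.
printf '🚨 2 Alerts\n' > stderr_alert.txt
printf 'error: failed to authenticate with the Bencher API\n' > stderr_auth.txt

classify stderr_alert.txt    # regression-alert
classify stderr_auth.txt     # operational-failure
```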
```diff
@@ -703,7 +724,7 @@ jobs:
       # STEP 7b: ALERT ON MAIN REGRESSION
       # ============================================
       - name: Create GitHub Issue for main regression
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0'
+        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0' && env.BENCHER_HAS_ALERT == '1'
         env:
           GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         run: |
```
```diff
@@ -789,14 +810,35 @@ jobs:
           fi
 
       # ============================================
-      # STEP 7c: FAIL WORKFLOW ON MAIN REGRESSION
+      # STEP 7c: WARN ON MAIN REGRESSION
+      # ============================================
+      # Regressions are surfaced via the Bencher dashboard, the auto-opened tracking
+      # issue (Step 7b), and the job-summary report. Do not fail the workflow on regression
+      # alerts: single-run alerts on GitHub-hosted runners are dominated by environmental
+      # noise (alert sets across consecutive runs have ~0 overlap), so blocking main
+      # produces churn without signal. Re-enable a hard gate once thresholds/sample-size
+      # tolerate that noise. Operational errors (auth/API/network/CLI) are handled in
+      # step 7d and still fail the workflow.
+      - name: Warn if Bencher detected regression on main
+        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0' && env.BENCHER_HAS_ALERT == '1'
+        env:
+          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+        run: |
+          echo "::warning::Bencher flagged a regression on main (exit ${BENCHER_EXIT_CODE:-1}). See the open regression issue (label: performance-regression), the Bencher dashboard, and the workflow run: ${RUN_URL}"
+
+      # ============================================
+      # STEP 7d: FAIL ON NON-REGRESSION BENCHER ERRORS
       # ============================================
-      # Only fail on main — PR benchmarks are informational (triggered by 'benchmark' label).
-      # Regressions on PRs are surfaced via Bencher report comments, not workflow failures.
-      - name: Fail workflow if Bencher detected regression on main
-        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0'
+      # If bencher exited non-zero but stderr did not contain regression alert
+      # indicators, this is an operational failure (auth/API/network/CLI) rather than
+      # a performance signal. These should still fail the workflow on main so a broken
+      # benchmark pipeline (missing uploads, bad credentials, Bencher outage, etc.) is
+      # not silently hidden behind a warning annotation.
+      - name: Fail on non-regression Bencher error on main
+        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0' && env.BENCHER_HAS_ALERT != '1'
         run: |
-          echo "Bencher detected a regression (exit code: ${BENCHER_EXIT_CODE:-1})"
+          echo "::error::Bencher exited ${BENCHER_EXIT_CODE:-1} on main with no regression alert in stderr — this indicates an operational failure (auth/API/network/CLI), not a performance regression. Check the logs above."
+          # Preserve the original Bencher exit code in CI logs for diagnostic context.
           exit "${BENCHER_EXIT_CODE:-1}"
 
       # ============================================
```
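Taken together, steps 7a/7c/7d reduce to a three-way outcome for pushes to `main`. A condensed sketch of that decision table — the `decide` helper is illustrative, not workflow code; its arguments stand in for the exported `BENCHER_EXIT_CODE` and `BENCHER_HAS_ALERT` env vars:

```shell
# Decision table for main pushes after this change.
# args: $1 = BENCHER_EXIT_CODE, $2 = BENCHER_HAS_ALERT
decide() {
  if [ "$1" = "0" ]; then
    echo "pass"   # bencher succeeded; no gating step runs
  elif [ "$2" = "1" ]; then
    echo "warn"   # step 7c: ::warning:: annotation (+ tracking issue from 7b)
  else
    echo "fail"   # step 7d: ::error:: and exit with bencher's code
  fi
}

decide 0 0   # pass
decide 1 1   # warn
decide 1 0   # fail
```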
