Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 53 additions & 11 deletions .github/workflows/benchmark.yml
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,12 @@ jobs:
# ============================================

- name: Install Bencher CLI
uses: bencherdev/bencher@main
# Pinned: step 7a's alert-detection heuristic greps stderr for
# "Alert[s]" / "threshold violation" / "boundary violation". Those strings
# are not a documented API contract, so upgrading the CLI requires
# re-verifying that the expected phrases still appear in alert output.
# Expected stderr phrases at v0.6.2: "🚨 N Alerts", "Alert detected".
uses: bencherdev/bencher@v0.6.2
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pin to a mutable tag (@v0.6.2) is a meaningful improvement over @main, but a tag can still be force-pushed by the upstream maintainer, silently swapping the binary that runs in your workflow.

For a stronger supply-chain guarantee, pin to the immutable commit SHA:

Suggested change
uses: bencherdev/bencher@v0.6.2
uses: bencherdev/bencher@v0.6.2 # TODO: pin to SHA once verified, e.g. bencherdev/bencher@<sha>

Run git ls-remote https://github.com/bencherdev/bencher refs/tags/v0.6.2 to get the SHA for the tag, then use bencherdev/bencher@<sha> (with # v0.6.2 as a comment for readability). This is a low-effort hardening for a step that runs with BENCHER_API_TOKEN.


- name: Add tools directory to PATH
run: |
Expand Down Expand Up @@ -650,11 +655,27 @@ jobs:
echo "::warning::Bencher baseline not found for start-point hash '$START_POINT_HASH' — regression comparison unavailable for this run"
BENCHER_EXIT_CODE=0
fi
rm -f "$BENCHER_STDERR"

# Export exit code early so downstream alerting steps (7b/7c) always see it,
# even if the post-processing below (step summary, PR comments) fails.
# Distinguish regression alerts from operational failures (auth/API/network/CLI)
# so that main can warn-only on the former while still failing hard on the latter.
# Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via
# a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already
# prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"),
# so the case-insensitive flag adds robustness against future CLI wording changes
# (e.g., lowercase "1 alert") without reintroducing the URL-path false positive.
Comment on lines +661 to +665
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment's claim about \b is incorrect. \b is a word-boundary assertion between \w and \W. Since / is \W and a is \w, there is a word boundary at the start of alerts in /v0/.../alerts — so \bAlerts?\b would match alerts inside a URL path.

In practice this may not matter (Bencher probably doesn't emit raw URL paths like /v0/alerts in stderr), but the stated justification is wrong and could mislead future editors into over-trusting the heuristic.

Suggested correction:

Suggested change
# Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via
# a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already
# prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"),
# so the case-insensitive flag adds robustness against future CLI wording changes
# (e.g., lowercase "1 alert") without reintroducing the URL-path false positive.
# Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via
# a word-bounded, case-insensitive "Alert[s]". Note: \b does NOT prevent matching
# "alerts" inside URL paths like "/v0/alerts" (/ is \W so a word boundary exists
# before "a"). False positives are unlikely in practice because Bencher does not
# emit raw API endpoint paths in its stderr alert output.

BENCHER_HAS_ALERT=0
if [ $BENCHER_EXIT_CODE -ne 0 ] && \
{ grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \
grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then
Comment on lines +668 to +669
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid classifying '/alerts' paths as regressions

The new BENCHER_HAS_ALERT gate can still misclassify operational Bencher failures as regressions, because grep -qiE "\bAlerts?\b" matches lowercase URL segments like /v0/.../alerts (word boundaries exist at / and end-of-token). When that happens on main, step 7d is skipped and the workflow only emits a warning instead of failing for auth/API/network errors. Fresh evidence in this commit is the added case-insensitive \bAlerts?\b matcher here, which reintroduces the false-positive path-match behavior.

Useful? React with 👍 / 👎.

BENCHER_HAS_ALERT=1
Comment on lines +666 to +670
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern inconsistency with the 404 check creates a misclassification gap.

The 404 guard above (line 645) uses case-insensitive alert to detect alerts:

! grep -qiE "alert|threshold violation|boundary violation" "$BENCHER_STDERR"

But this block uses case-sensitive \bAlerts?\b. If Bencher ever emits lowercase "alert detected" or "1 alert" (e.g. after a CLI version bump), the sequence would be:

  1. 404 guard: no 404 Not Found → skip → BENCHER_EXIT_CODE stays non-zero ✓
  2. This block: \bAlerts?\b does not match lowercase → BENCHER_HAS_ALERT stays 0
  3. Step 7d fires and fails the workflow with "operational failure" — a misclassified regression.

The case-sensitivity was chosen to avoid matching /v0/.../alerts URL paths, but the same goal is achievable with a word boundary and case-insensitive flag:

Suggested change
BENCHER_HAS_ALERT=0
if [ $BENCHER_EXIT_CODE -ne 0 ] && \
{ grep -qE "\bAlerts?\b" "$BENCHER_STDERR" || \
grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then
BENCHER_HAS_ALERT=1
BENCHER_HAS_ALERT=0
if [ $BENCHER_EXIT_CODE -ne 0 ] && \
{ grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \
grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then
BENCHER_HAS_ALERT=1

Word boundaries (\b) already prevent matching alerts inside URL path segments like /v0/orgs/foo/alerts, so the case-insensitive flag adds robustness without reintroducing the false-positive risk.

fi
Comment on lines +659 to +671
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

tmp="$(mktemp)"
trap 'rm -f "$tmp"' EXIT

printf '%s\n' 'GET https://api.bencher.dev/v0/projects/react-on-rails-t8a9ncxo/alerts failed: 500' > "$tmp"

echo "Current regex:"
if grep -qiE "\bAlerts?\b" "$tmp"; then
  echo "FAIL: case-insensitive regex matched lowercase /alerts URL path"
else
  echo "PASS: no URL-path match"
fi

echo "Case-sensitive regex:"
if grep -qE "\bAlerts?\b" "$tmp"; then
  echo "FAIL: case-sensitive regex still matched lowercase /alerts URL path"
else
  echo "PASS: lowercase /alerts URL path is not treated as an alert"
fi

printf '%s\n' '🚨 2 Alerts' 'Alert detected' > "$tmp"
grep -qE "\bAlerts?\b" "$tmp"
echo "PASS: pinned Bencher v0.6.2 alert phrases still match"

Repository: shakacode/react_on_rails

Length of output: 282


Make the Alerts? grep case-sensitive to prevent false positives from URL paths.

Line 668 uses -i flag, causing \bAlerts?\b to match lowercase /alerts in URL paths within error messages. This incorrectly classifies auth/API/network failures as performance regressions, skipping the hard-fail step 7d and downgrading to warn-only.

🐛 Proposed fix
-          # Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via
-          # a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already
-          # prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"),
-          # so the case-insensitive flag adds robustness against future CLI wording changes
-          # (e.g., lowercase "1 alert") without reintroducing the URL-path false positive.
+          # Match Bencher's pinned v0.6.2 alert output (e.g., "🚨 2 Alerts", "Alert detected")
+          # with a case-sensitive "Alert[s]" so lowercase URL paths such as "/v0/.../alerts"
+          # in operational errors are not classified as performance regressions.
           BENCHER_HAS_ALERT=0
           if [ $BENCHER_EXIT_CODE -ne 0 ] && \
-             { grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \
+             { grep -qE "\bAlerts?\b" "$BENCHER_STDERR" || \
                grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then
             BENCHER_HAS_ALERT=1
           fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Distinguish regression alerts from operational failures (auth/API/network/CLI)
# so that main can warn-only on the former while still failing hard on the latter.
# Match Bencher's actual alert output (e.g., "🚨 2 Alerts", "Alert detected") via
# a word-bounded, case-insensitive "Alert[s]". The word boundary (\b) already
# prevents matching lowercase "alerts" inside URL paths (e.g., "/v0/.../alerts"),
# so the case-insensitive flag adds robustness against future CLI wording changes
# (e.g., lowercase "1 alert") without reintroducing the URL-path false positive.
BENCHER_HAS_ALERT=0
if [ $BENCHER_EXIT_CODE -ne 0 ] && \
{ grep -qiE "\bAlerts?\b" "$BENCHER_STDERR" || \
grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then
BENCHER_HAS_ALERT=1
fi
# Match Bencher's pinned v0.6.2 alert output (e.g., "🚨 2 Alerts", "Alert detected")
# with a case-sensitive "Alert[s]" so lowercase URL paths such as "/v0/.../alerts"
# in operational errors are not classified as performance regressions.
BENCHER_HAS_ALERT=0
if [ $BENCHER_EXIT_CODE -ne 0 ] && \
{ grep -qE "\bAlerts?\b" "$BENCHER_STDERR" || \
grep -qiE "threshold violation|boundary violation" "$BENCHER_STDERR"; }; then
BENCHER_HAS_ALERT=1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/benchmark.yml around lines 659 - 671, The pattern for
detecting Bencher alerts is currently using case-insensitive grep (grep -qiE
"\bAlerts?\b") which matches lowercase "/alerts" in URLs; change that grep to be
case-sensitive (remove the -i flag) so only "Alert" or "Alerts" word-bound
matches trigger setting BENCHER_HAS_ALERT=1, leaving the other grep for
"threshold violation|boundary violation" unchanged; update the combined
condition that references BENCHER_EXIT_CODE and BENCHER_STDERR accordingly.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Word boundary doesn't prevent URL-path false positives

Medium Severity

The comment claims \b prevents matching "alerts" inside URL paths like /v0/.../alerts, but this is incorrect. \b matches at transitions between word and non-word characters — / is non-word and a is word, so \bAlerts?\b with -i does match "alerts" in URL paths. If Bencher's stderr during an operational failure includes a URL containing /alerts, BENCHER_HAS_ALERT is incorrectly set to 1, causing step 7d's hard-fail gate to be skipped and the error silently downgraded to a warning in step 7c.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 6f2bb80. Configure here.

# Cleanup is also handled by the `trap 'rm -f "$BENCHER_STDERR"' EXIT` above,
# so we don't need an explicit rm here.

# Export exit code and alert flag early so downstream steps (7b/7c/7d) always
# see them, even if the post-processing below (step summary, PR comments) fails.
echo "BENCHER_EXIT_CODE=$BENCHER_EXIT_CODE" >> "$GITHUB_ENV"
echo "BENCHER_HAS_ALERT=$BENCHER_HAS_ALERT" >> "$GITHUB_ENV"

# Post report to job summary and PR comment(s) if there's HTML output
if [ -s bench_results/bencher_report.html ]; then
Expand Down Expand Up @@ -703,7 +724,7 @@ jobs:
# STEP 7b: ALERT ON MAIN REGRESSION
# ============================================
- name: Create GitHub Issue for main regression
if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0'
if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0' && env.BENCHER_HAS_ALERT == '1'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
Expand Down Expand Up @@ -789,14 +810,35 @@ jobs:
fi

# ============================================
# STEP 7c: FAIL WORKFLOW ON MAIN REGRESSION
# STEP 7c: WARN ON MAIN REGRESSION
# ============================================
# Regressions are surfaced via the Bencher dashboard, the auto-opened tracking
# issue (Step 7b), and the job-summary report. Do not fail the workflow on regression
# alerts: single-run alerts on GitHub-hosted runners are dominated by environmental
# noise (alert sets across consecutive runs have ~0 overlap), so blocking main
# produces churn without signal. Re-enable a hard gate once thresholds/sample-size
# tolerate that noise. Operational errors (auth/API/network/CLI) are handled in
# step 7d and still fail the workflow.
- name: Warn if Bencher detected regression on main
if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0' && env.BENCHER_HAS_ALERT == '1'
env:
RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
echo "::warning::Bencher flagged a regression on main (exit ${BENCHER_EXIT_CODE:-1}). See the open regression issue (label: performance-regression), the Bencher dashboard, and the workflow run: ${RUN_URL}"

# ============================================
# STEP 7d: FAIL ON NON-REGRESSION BENCHER ERRORS
# ============================================
# Only fail on main — PR benchmarks are informational (triggered by 'benchmark' label).
# Regressions on PRs are surfaced via Bencher report comments, not workflow failures.
- name: Fail workflow if Bencher detected regression on main
if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0'
# If bencher exited non-zero but stderr did not contain regression alert
# indicators, this is an operational failure (auth/API/network/CLI) rather than
# a performance signal. These should still fail the workflow on main so a broken
# benchmark pipeline (missing uploads, bad credentials, Bencher outage, etc.) is
# not silently hidden behind a warning annotation.
- name: Fail on non-regression Bencher error on main
if: github.event_name == 'push' && github.ref == 'refs/heads/main' && env.BENCHER_EXIT_CODE != '0' && env.BENCHER_HAS_ALERT != '1'
run: |
echo "Bencher detected a regression (exit code: ${BENCHER_EXIT_CODE:-1})"
echo "::error::Bencher exited ${BENCHER_EXIT_CODE:-1} on main with no regression alert in stderr — this indicates an operational failure (auth/API/network/CLI), not a performance regression. Check the logs above."
# Preserve the original Bencher exit code in CI logs for diagnostic context.
exit "${BENCHER_EXIT_CODE:-1}"

# ============================================
Expand Down
Loading