Skip to content

[mirror] ci: warn (don't fail) on Bencher main regression#5

Open
yashwant86 wants to merge 4 commits intomm-base-3168from
mm-pr-3168
Open

[mirror] ci: warn (don't fail) on Bencher main regression#5
yashwant86 wants to merge 4 commits intomm-base-3168from
mm-pr-3168

Conversation

@yashwant86
Copy link
Copy Markdown

Mirror of upstream shakacode#3168 for benchmark. Do not merge.

justin808 and others added 4 commits April 17, 2026 21:20
Single-run alerts on GitHub-hosted runners are dominated by environmental
noise: the set of (benchmark, measure) pairs flagged varies almost entirely
across consecutive pushes on main (Jaccard ~0 between adjacent runs), while
every non-docs commit fails the gate. Tracking issue shakacode#3116 has accumulated
auto-regression comments with no actionable signal.

Retain the Bencher dashboard upload, the auto-opened tracking issue, and the
job-summary report (steps 7a/7b). Downgrade step 7c from `exit 1` to a
`::warning::` annotation so main stops being permanently red. Re-enable a
hard gate later when thresholds/sample-size tolerate the runner noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Distinguish regression alerts from operational failures so the warn-only
policy on main only applies to genuine performance signals:

- Set BENCHER_HAS_ALERT=1 when bencher stderr mentions alert/threshold/
  boundary indicators; leave it 0 for auth/API/network/CLI failures.
- Step 7b (auto-open issue) and step 7c (::warning::) now require
  BENCHER_HAS_ALERT == 1 so operational errors do not create bogus
  regression issues.
- New step 7d fails the workflow on non-regression Bencher errors on
  main, preserving the original safety net for a broken benchmark
  pipeline (missing uploads, bad credentials, Bencher outage).
- Step 7c warning no longer hardcodes a specific issue number
  (shakacode#3116 will churn as issues are closed/reopened) and now includes
  the workflow run URL for a direct clickable link.

Addresses review feedback on shakacode#3168.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Narrows BENCHER_HAS_ALERT from case-insensitive "alert" to word-bounded
case-sensitive "Alert" / "Alerts" so that lowercase "alert" appearing in
bencher URL paths (e.g., "/v0/.../alerts") or CLI text cannot misclassify
an operational failure as a regression and let main go green.

"threshold violation" and "boundary violation" remain case-insensitive
since those specific phrases are unlikely to appear outside real alert
output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses follow-up review feedback on the Bencher main-regression gate:

- Pin bencherdev/bencher to v0.6.2 and document the expected stderr
  phrases ("Alert[s]", "threshold violation", "boundary violation") so a
  silent CLI upgrade can't flip the alert classification.
- Switch the BENCHER_HAS_ALERT grep from case-sensitive `\bAlerts?\b` to
  case-insensitive `\bAlerts?\b` to stay consistent with the 404 guard
  and avoid misclassifying a lowercase `alert` variant as operational.
  Word boundaries still prevent matches against URL paths like
  `/v0/.../alerts`, so there's no new false-positive risk.
- Propagate the original Bencher exit code from step 7d (`exit
  "${BENCHER_EXIT_CODE:-1}"` instead of bare `exit 1`) to preserve
  diagnostic context in CI logs.
- Drop the redundant `rm -f "$BENCHER_STDERR"` now that the
  `trap 'rm -f' EXIT` registered above handles cleanup.
@bot-mergemonkey
Copy link
Copy Markdown

No Reviewable Files

All files in this PR were filtered out (possibly due to size limits or invalid format).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants