Confirm benchmark regressions on a fresh runner before filing the main issue by justin808 · Pull Request #3810 · shakacode/react_on_rails

justin808 · 2026-06-08T20:43:10Z

Summary

Fixes #3670. A single noisy Bencher alert on main could previously create/update a performance-regression issue. This adds a fresh-runner confirmation step between the initial benchmark run and the main issue: a main-push alert is now a non-fatal candidate that must re-alert on an independent rerun before any issue is filed. Requiring a second, independent alert turns the per-commit false-positive rate from ~p into ~p² for runner noise.

Scope is exactly as the issue specifies: only main issue creation is gated. PR summary comments still post unconditionally (initial mode, unchanged).

Pipeline (`benchmark.yml`)

benchmark (initial)   on a main alert -> write a NON-FATAL candidate artifact
                      (regression-candidate-*); the suite stays GREEN
        │
plan-confirmation     read candidates; short-circuit IGNORED_REGRESSION_BENCHMARKS
                      BEFORE spending a rerun; emit a matrix of just the alerting suite/shard(s)
        │
confirm-regressions   rerun those on fresh runners against main's baseline WITHOUT polluting
                      main's series; write a confirmed artifact only if the SAME
                      benchmark+measure re-alerts
        │
report-regressions    consume confirmed artifacts; file one issue with the first-run and
                      confirmation summaries side by side; fail the workflow (it owns pass/fail)

The per-suite execution is extracted into a reusable workflow (benchmark-suite.yml, modes initial/confirm) so the initial run and the confirmation rerun share one set of steps with no drift.

Behavior

Situation	Workflow result	Issue filed?
Alert does not reproduce (noise)	🟢 green	no
Same benchmark+measure re-alerts	🔴 red	yes (first-run + confirmation evidence)
Inconclusive rerun (bench/Bencher failure, corrupt/unmatched candidate)	🔴 red	no
Every alert is in `IGNORED_REGRESSION_BENCHMARKS`	🟢 green (notice)	no

The workflow summary states confirmed / cleared-as-noise / inconclusive per alert.

Key implementation notes

Don't pollute the baseline: the confirmation submits to a unique synthetic Bencher branch confirm-main-<run_id>-<suite-slug> (never main) with --start-point main --start-point-clone-thresholds --start-point-reset, so the confirmation sample is compared against main's baseline but never appended to main's series — exactly the anchoring the PR path already uses.
Same benchmark+measure matching uses the structured alert pairs now carried in the candidate payload (BencherReport::Alert already exposes the measure slug). Falls back to benchmark-name-only when a measure is absent, as the issue permits.
Exit-code contract change: the first-run alert no longer fails the suite job; report-regressions owns the final pass/fail. A confirmed regression fails the workflow there.
IGNORED_REGRESSION_BENCHMARKS (Suppress regression issues when only /posts_page: Pro regresses #3668) moved to the shared benchmarks/lib/regression_report.rb and is now applied in plan-confirmation, short-circuiting a rerun for benchmarks that would be suppressed anyway.
Candidate vs confirmed artifacts/payloads are kept distinct (regression-candidate-* / regression-confirmed-*).

Out of scope (sequenced follow-up, per #3670)

Loosening the THRESHOLDS to buy back the statistical power that double-gating costs is intentionally a separate, empirical follow-up: it requires measuring the post-confirmation false-positive and missed-regression rates on real main data, so it cannot be tuned blindly here.

Known tradeoff (acknowledged in the issue)

The confirmation clones main's baseline after the initial run appended its (regressing) sample to main's series, so a borderline regression can be slightly diluted by its own already-merged sample. This is inherent to confirming post-merge and is what the threshold-loosening follow-up addresses. A future refinement could pin the clone to the commit's parent via --start-point-hash.

Testing

bundle exec rspec benchmarks/spec — 212 examples, 0 failures (26 new: regression_report_spec, plan_confirmation_spec, confirmation-mode cases in track_benchmarks_spec; report_regressions_spec reworked for the confirmed-artifact contract).
actionlint clean on both benchmark.yml and benchmark-suite.yml.
rubocop clean on all changed/new Ruby files.
End-to-end candidate→confirm→report file round-trip verified locally.

Live validation gap: the multi-job GitHub Actions DAG (reusable-workflow calls, dynamic confirmation matrix, fresh-runner reruns, Bencher submissions) cannot be exercised locally — it needs a real main push with a Bencher token. A maintainer should watch the first alerting main run to confirm the candidate→confirm→report handoff end to end. No CHANGELOG entry: per the repo's changelog guidelines this is internal CI/benchmark tooling, not a user-visible source change.

🤖 Generated with Claude Code

Note

Medium Risk
Large CI workflow refactor changes when benchmark workflows fail and when regression issues are filed; mistakes could miss real regressions or spam issues, but scope is internal benchmark tooling only.

Overview
Adds a two-stage main regression pipeline: Bencher alerts on the first main run are non-fatal candidates (regression-candidate.json artifacts) instead of failing the suite or filing immediately. New jobs plan-confirmation (filter ignored benchmarks, build rerun matrix) and confirm-regressions rerun only alerting suite/shards on fresh runners via a shared reusable workflow benchmark-suite.yml (initial / confirm modes).

track_benchmarks.rb writes candidates with structured benchmark+measure alert pairs; confirmation submits to a synthetic Bencher branch against main's baseline without appending to main's series, and emits regression-confirmed.json only when the same pairs re-alert. report_regressions.rb files issues from confirmed payloads only, shows first-run vs confirmation tables, and owns final workflow failure on confirmed regressions. IGNORED_REGRESSION_BENCHMARKS moves to regression_report.rb and short-circuits before confirmation reruns in plan_confirmation.rb.

^{Reviewed by Cursor Bugbot for commit 985121d. Bugbot is set up for automated code reviews on this repo. Configure here.}

Summary by CodeRabbit

New Features
- Two-stage benchmark regression flow: initial candidate runs then optional confirmation reruns before filing issues.
- Reports now include side-by-side "First run" vs "Confirmation" summaries for clearer comparison.
Refactor
- Benchmark CI converted to a reusable, modular workflow for shard-level end-to-end runs and confirmation orchestration.
Tests
- Added comprehensive tests covering confirmation planning, regression matching, and reporting behaviors.

…n issue Implements #3670. A single noisy Bencher alert on main could previously create/update a `performance-regression` issue. Now a main-push alert is a non-fatal *candidate* that must be reproduced by an independent rerun on a fresh GitHub Actions runner before any issue is filed, turning the per-commit false-positive rate from ~p into ~p² for runner noise. Pipeline (benchmark.yml): benchmark (initial) -> on a main alert, write a NON-FATAL candidate artifact (regression-candidate-*); the suite stays green plan-confirmation -> read candidates, short-circuit IGNORED_REGRESSION_BENCHMARKS BEFORE spending a rerun, emit a matrix of just the alerting suite/shard(s) confirm-regressions -> rerun those on fresh runners against main's baseline WITHOUT polluting main's series (synthetic Bencher branch + --start-point main --start-point-clone-thresholds --start-point-reset); write a confirmed artifact only if the SAME benchmark+measure re-alerts report-regressions -> consume confirmed artifacts, file one issue with the first-run and confirmation summaries side by side, and fail the workflow (it owns the final pass/fail) The per-suite execution is extracted into a reusable workflow (benchmark-suite.yml, modes initial/confirm) so the initial run and the confirmation rerun share one set of steps with no drift. Behavior: - A noisy alert that does not reproduce is cleared as noise (no issue, green). - A confirmed regression fails the workflow AND files the issue. - An inconclusive rerun (benchmark execution / Bencher reporting failure, or a corrupt/unmatched candidate) fails the workflow WITHOUT filing. - The workflow summary states confirmed / cleared / inconclusive per alert. - PR summary comments still post unconditionally (initial mode, unchanged); the confirmation machinery is main-push only. Same benchmark+measure matching uses the structured alert pairs now carried in the candidate payload (BencherReport::Alert already exposes measure), with a name-only fallback when a measure is absent. IGNORED_REGRESSION_BENCHMARKS moved to the shared regression_report.rb and is now applied in plan-confirmation. Threshold loosening to buy back the power double-gating costs is intentionally a sequenced, empirical follow-up (see #3670), not part of this change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-08T20:43:28Z

Walkthrough

Adds a two-stage benchmark regression confirmation flow: initial runs emit non-fatal candidate artifacts; a plan step filters/schedules reruns; confirmation runs re-alerting the same benchmark+measure produce confirmed artifacts; reporting files issues only for confirmed regressions. Integrates via a new reusable benchmark-suite workflow and updates tracking/reporting scripts and specs.

Changes

Benchmark regression confirmation workflow

Layer / File(s)	Summary
Regression report contract and alert helpers `benchmarks/lib/regression_report.rb`	RegressionReport expanded to candidate/confirmed filenames, alert builders, measure normalization, alerts_match? logic, confirmed_alerts, and ignore-list/actionable helpers.
Track benchmarks confirmation mode and artifact writers `benchmarks/track_benchmarks.rb`	Adds `BENCHMARK_MODE=confirm` flow: confirmation branch naming, deduped alert-pairs, candidate loading, confirmation outcome classification, and writers for candidate/confirmed JSON artifacts and summaries.
Plan confirmation candidate filtering and matrix selection `benchmarks/plan_confirmation.rb`	New script discovers candidate artifacts, short-circuits fully-ignored candidates, matches remaining candidates to benchmark matrix rows, emits `confirmation_matrix` or fails on unreadable/unmatched candidates, and announces suppressed benchmarks.
Reusable benchmark-suite workflow `.github/workflows/benchmark-suite.yml`	New reusable workflow runs one suite/shard in `initial` or `confirm` mode: unpacks matrix_row, installs Bencher/tooling, builds assets and optional Rails Pro DB, starts server/renderer with PID readiness polling, runs benchmarks, validates results/logs, uploads outputs, and in confirm mode downloads candidate artifact and runs `benchmarks/track_benchmarks.rb`.
Main workflow orchestration `.github/workflows/benchmark.yml`	Refactors top-level CI to call the reusable workflow per matrix row and orchestrates: `benchmark` (initial) → `plan-confirmation` → `confirm-regressions` (confirm mode, safe empty-matrix fallback) → `report-regressions` (consumes confirmed artifacts).
Regression reporting: confirmed-artifact consumption `benchmarks/report_regressions.rb`	Now scans `confirmed` payloads, validates payload shapes, renders first-run vs confirmation shard summaries side-by-side, files issues only for confirmed regressions, and returns `:clean`/`:filed`/`:error` with workflow-failure semantics when confirmed regressions were filed or payloads unreadable.
Specs and test harness `benchmarks/spec/*`	Adds RSpec coverage for RegressionReport, plan_confirmation behavior, report_regressions confirmed-artifact handling and exit semantics, and track_benchmarks confirmation-mode helpers and outcomes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related issues

#3670: Directly implements the confirm-regressions-before-posting-issue proposal (candidate/confirmed hand-off, fresh-run confirmations, reset baseline, and benchmark+measure matching).

Possibly related PRs

shakacode/react_on_rails#3668: Related changes to regression hand-off payload keys and ignore-list handling that the current PR composes with.
shakacode/react_on_rails#3586: Related Bencher JSON/alert parsing and table-rendering code that confirmation logic builds upon.
shakacode/react_on_rails#3528: Related benchmark matrix/unified runner work that the new reusable workflow extends.

Suggested labels

enhancement, review-needed, full-ci, benchmark, ci-tooling, P2

Suggested reviewers

alexeyr-ci2

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 53.85% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: implementing a confirmation step on a fresh runner before filing benchmark regression issues on main.
Linked Issues check	✅ Passed	The PR implements all major requirements from `#3670`: non-fatal candidate artifacts on first alert, plan-confirmation job gating reruns, fresh-runner confirmation without polluting main's baseline using synthetic Bencher branch, benchmark+measure matching with name-only fallback, ignored-benchmark short-circuit, confirmed-only issue filing, and preservation of PR comments.
Out of Scope Changes check	✅ Passed	All changes directly support the confirmation-before-filing workflow: new reusable benchmark-suite.yml and related helpers, regression_report.rb expanded with confirmation logic, plan_confirmation.rb and report_regressions.rb refactored for candidate/confirmed artifact handoff, and track_benchmarks.rb updated for initial/confirm modes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch justin808/fix-issue-3670

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.}

^{Reviewed by Cursor Bugbot for commit acc4f36. Configure here.}

claude · 2026-06-08T20:51:06Z

test

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/benchmark-suite.yml:
- Line 83: Replace all GitHub Actions uses: entries that reference tags with
full commit SHAs: update actions/checkout@v6, bencherdev/bencher@v0.6.2,
grafana/setup-k6-action@v1, actions/cache@v5 (both occurrences),
pnpm/action-setup@v6, actions/setup-node@v6, actions/upload-artifact@v7 (all
occurrences), and actions/download-artifact@v7 to owner/repo@<full-commit-sha>
form; locate each occurrence by the exact uses string (e.g.,
"actions/checkout@v6", "bencherdev/bencher@v0.6.2", etc.) and replace the tag
with the corresponding full commit SHA from the action's repository, ensuring
every duplicate occurrence is updated and commit SHAs are immutable values.
- Around line 430-435: The workflow step named "Download first-run candidate"
currently can abort the job on failure and prevents the Ruby confirmation
handler from running; update that step to allow the job to continue on
missing/corrupt artifacts by adding continue-on-error: true (or enabling the
action’s native missing-artifact option) so the workflow proceeds to run
benchmarks/track_benchmarks.rb which expects load_candidate to return [nil, ""]
and then executes run_confirmation.

In @.github/workflows/benchmark.yml:
- Line 210: Replace the floating tag uses: actions/download-artifact@v7 with a
pinned commit SHA for both occurrences where the workflow references the
download-artifact action (search for the literal "uses:
actions/download-artifact@v7"); update each to the specific commit SHA form
actions/download-artifact@<commit-sha> (obtain the correct commit SHA from the
actions/download-artifact repository) so both download-artifact steps are
immutably pinned.
- Around line 179-183: Replace the broad secrets inheritance with an explicit
secrets mapping in the reusable workflow call: remove "secrets: inherit" in
.github/workflows/benchmark.yml (the step using the reusable workflow with with:
matrix_row/mode) and instead provide a limited secrets: mapping that only passes
REACT_ON_RAILS_PRO_LICENSE_V2, BENCHER_API_TOKEN, and GITHUB_TOKEN to the
reusable workflow; ensure the reusable workflow (benchmark-suite.yml) expects
those exact inputs. Additionally, update action pins in both
.github/workflows/benchmark.yml and benchmark-suite.yml (e.g.,
actions/download-artifact@v7, actions/checkout@v6) to use reviewed commit SHAs
rather than tag versions to comply with the supply-chain policy.

In `@benchmarks/plan_confirmation.rb`:
- Around line 33-38: The load_candidate_payload method currently parses any JSON
and returns it, which can later cause build_plan to crash when it expects a
Hash; update load_candidate_payload to validate that the parsed value is a Hash
(and contains required keys used by build_plan such as the matrix/fields you
later index) and return nil if the check fails so the existing
inconclusive/unreadable-candidate handling is used; reference
load_candidate_payload and build_plan to locate where to enforce
parsed.is_a?(Hash) and required key presence and ensure you still rescue
StandardError and warn with the same error message when parsing fails.

In `@benchmarks/report_regressions.rb`:
- Around line 366-373: The load_payload function currently parses JSON but
doesn't validate structure; update load_payload to ensure the parsed value is a
Hash and contains required keys (e.g., "suite_name", plus any other keys
expected later around where payload is used at lines 410-411) and return nil if
the schema is invalid; specifically, after JSON.parse in load_payload check that
the result.is_a?(Hash) and required_keys.all? { |k| result.key?(k) } and if not
warn with the same ::error:: message (including e.class/message when parse
fails) and return nil so schema-invalid payloads follow the existing error path
instead of causing crashes when consumers of load_payload (the code that reads
payload["suite_name"] etc.) assume a Hash.

In `@benchmarks/track_benchmarks.rb`:
- Around line 335-347: The current regressed_alert_pairs method drops alerts
with no benchmark (report.alerts.select(&:benchmark)), which lets incomplete
payloads be treated later as empty matches and classified as :cleared; instead
preserve unmatchable alerts and surface them as inconclusive so they cannot be
noise-cleared. Change regressed_alert_pairs to include alerts with nil benchmark
(don’t filter them out) and convert those to an explicit inconclusive sentinel
(e.g. call or construct a RegressionReport.inconclusive(alert) or return a pair
with a nil benchmark/measure marker) so downstream matching logic can detect and
return :inconclusive; also update the empty-match handling (the classifier
currently emitting :cleared for empty matches) to return :inconclusive when it
sees that sentinel or a nil benchmark/measure pair. Ensure you reference and
modify RegressionReport.alert usage and the empty-match classification code to
recognize the new inconclusive sentinel.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 155fe7b5-0611-4415-ac42-9e1b825cc540

📥 Commits

Reviewing files that changed from the base of the PR and between 2c2cbb7 and acc4f36.

📒 Files selected for processing (10)

.github/workflows/benchmark-suite.yml
.github/workflows/benchmark.yml
benchmarks/lib/regression_report.rb
benchmarks/plan_confirmation.rb
benchmarks/report_regressions.rb
benchmarks/spec/plan_confirmation_spec.rb
benchmarks/spec/regression_report_spec.rb
benchmarks/spec/report_regressions_spec.rb
benchmarks/spec/track_benchmarks_spec.rb
benchmarks/track_benchmarks.rb

coderabbitai · 2026-06-08T20:52:03Z

+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE=".github/workflows/benchmark-suite.yml" if [ ! -f "$FILE" ]; then echo "Missing $FILE" exit 1 fi echo "== uses: lines ==" rg -n "uses:\s*[^@]+@[^ ]+" "$FILE" echo echo "== line excerpts around specified lines ==" for L in 83 98 107 114 139 149 156 418 432 451 462; do echo "--- $L ---" sed -n "$((L-3)),$((L+3))p" "$FILE" | nl -ba done

Repository: shakacode/react_on_rails

Length of output: 659

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE=".github/workflows/benchmark-suite.yml" if [ ! -f "$FILE" ]; then echo "Missing $FILE" exit 1 fi echo "== uses: lines ==" rg -n "uses:\s*[^@]+@[^ ]+" "$FILE" echo echo "== line excerpts around specified lines ==" for L in 83 98 107 114 139 149 156 418 432 451 462; do echo "--- $L ---" sed -n "$((L-3)),$((L+3))p" "$FILE" | nl -ba done

Repository: shakacode/react_on_rails

Length of output: 659

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE=".github/workflows/benchmark-suite.yml" rg -n "uses:\s*[^@]+@[^ ]+" "$FILE" || true for L in 83 98 107 114 139 149 156 418 432 451 462; do echo "--- $L ---" sed -n "$((L-3)),$((L+3))p" "$FILE" | nl -ba done

Repository: shakacode/react_on_rails

Length of output: 596

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE=".github/workflows/benchmark-suite.yml" [ -f "$FILE" ] || { echo "Missing $FILE"; exit 1; } echo "== uses: entries ==" rg -n "^\s*uses:\s*[^@]+@[^ ]+" "$FILE" echo echo "== Excerpts around lines ==" for L in 83 98 107 114 139 149 156 418 432 451 462; do echo "--- $L ---" sed -n "$((L-2)),$((L+2))p" "$FILE" | nl -ba done

Repository: shakacode/react_on_rails

Length of output: 646

Pin GitHub Actions uses: entries to full commit SHAs (not tags) to prevent tag retargeting.
In .github/workflows/benchmark-suite.yml, these uses: lines use tag refs instead of SHAs: actions/checkout@v6 (83), bencherdev/bencher@v0.6.2 (98), grafana/setup-k6-action@v1 (107), actions/cache@v5 (114,139), pnpm/action-setup@v6 (149), actions/setup-node@v6 (156), actions/upload-artifact@v7 (418,451,462), actions/download-artifact@v7 (432). Pin each to owner/repo@<full-commit-sha>.

🧰 Tools

🪛 zizmor (1.25.2)

[error] 83-83: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/benchmark-suite.yml at line 83, Replace all GitHub Actions uses: entries that reference tags with full commit SHAs: update actions/checkout@v6, bencherdev/bencher@v0.6.2, grafana/setup-k6-action@v1, actions/cache@v5 (both occurrences), pnpm/action-setup@v6, actions/setup-node@v6, actions/upload-artifact@v7 (all occurrences), and actions/download-artifact@v7 to owner/repo@<full-commit-sha> form; locate each occurrence by the exact uses string (e.g., "actions/checkout@v6", "bencherdev/bencher@v0.6.2", etc.) and replace the tag with the corresponding full commit SHA from the action's repository, ensuring every duplicate occurrence is updated and commit SHAs are immutable values.

Source: Linters/SAST tools

coderabbitai · 2026-06-08T20:52:03Z

+      - name: Download first-run candidate
+        if: inputs.mode == 'confirm'
+        uses: actions/download-artifact@v7
+        with:
+          name: ${{ env.CANDIDATE_ARTIFACT_NAME }}
+          path: candidate


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show the referenced section of the workflow sed -n '380,480p' .github/workflows/benchmark-suite.yml | cat -n # Locate and inspect track_benchmarks.rb fd -a "track_benchmarks\.rb" . || true TRACK_FILE="$(fd -a "track_benchmarks\.rb" . | head -n 1 || true)" if [ -n "${TRACK_FILE}" ]; then echo "Using: $TRACK_FILE" # Search for missing-candidate handling and run_confirmation classification rg -n "missing|candidate|inconclusive|run_confirmation|confirmation" "$TRACK_FILE" || true # Show surrounding relevant code blocks (best-effort) # (Print a few chunks around any matches) for pat in "run_confirmation" "inconclusive" "missing" "candidate"; do echo "---- Matches for $pat ----" rg -n --context 5 "$pat" "$TRACK_FILE" || true done else echo "track_benchmarks.rb not found via fd" fi

Repository: shakacode/react_on_rails

Length of output: 30976

Avoid short-circuiting the confirmation failure path in benchmark-suite.yml

track_benchmarks.rb already treats a missing/corrupt downloaded candidate as :inconclusive (it returns [nil, ""] from load_candidate, writes the inconclusive confirmation summary, and exits 1). If actions/download-artifact@v7 fails instead (e.g., candidate artifact missing), the job would stop before run_confirmation runs, so the Ruby handler can’t apply that logic. Add continue-on-error: true to the download step (or use the step’s native “missing artifact” option) so the workflow can reach ruby benchmarks/track_benchmarks.rb.

Suggested change

- name: Download first-run candidate if: inputs.mode == 'confirm' + continue-on-error: true uses: actions/download-artifact@v7 with: name: ${{ env.CANDIDATE_ARTIFACT_NAME }} path: candidate

🧰 Tools

🪛 zizmor (1.25.2)

[error] 432-432: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/benchmark-suite.yml around lines 430 - 435, The workflow step named "Download first-run candidate" currently can abort the job on failure and prevents the Ruby confirmation handler from running; update that step to allow the job to continue on missing/corrupt artifacts by adding continue-on-error: true (or enabling the action’s native missing-artifact option) so the workflow proceeds to run benchmarks/track_benchmarks.rb which expects load_candidate to return [nil, ""] and then executes run_confirmation.

coderabbitai · 2026-06-08T20:52:03Z

+    uses: ./.github/workflows/benchmark-suite.yml
+    with:
+      matrix_row: ${{ toJSON(matrix) }}
+      mode: initial
+    secrets: inherit


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show benchmark.yml around the mentioned lines echo "=== benchmark.yml lines 150-270 ===" nl -ba .github/workflows/benchmark.yml | sed -n '150,270p' echo "=== benchmark.yml lines 270-330 ===" nl -ba .github/workflows/benchmark.yml | sed -n '270,340p' # Locate all occurrences of "secrets: inherit" in benchmark.yml echo "=== occurrences of 'secrets: inherit' in benchmark.yml ===" rg -n "secrets:\s*inherit" .github/workflows/benchmark.yml # Inspect reusable workflow benchmark-suite.yml around its workflow_call secrets if [ -f .github/workflows/benchmark-suite.yml ]; then echo "=== benchmark-suite.yml on.workflow_call secrets (search) ===" rg -n "workflow_call:|secrets:" .github/workflows/benchmark-suite.yml echo "=== benchmark-suite.yml lines 1-220 ===" nl -ba .github/workflows/benchmark-suite.yml | sed -n '1,220p' echo "=== benchmark-suite.yml lines 220-400 ===" nl -ba .github/workflows/benchmark-suite.yml | sed -n '220,400p' else echo "benchmark-suite.yml not found at .github/workflows/benchmark-suite.yml" fi # Also check if any other reusable workflow files are used echo "=== benchmark.yml reusable workflow uses (workflow_call) ===" rg -n "uses:\s*\./\.github/workflows/" .github/workflows/benchmark.yml

Repository: shakacode/react_on_rails

Length of output: 165

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== benchmark.yml lines 150-270 ===" nl -ba .github/workflows/benchmark.yml | sed -n '150,270p' echo "=== benchmark.yml lines 270-360 ===" nl -ba .github/workflows/benchmark.yml | sed -n '270,360p' echo "=== occurrences of 'secrets: inherit' in benchmark.yml ===" rg -n "secrets:\s*inherit" .github/workflows/benchmark.yml || true # Determine exact path of reusable workflow being called # It's referenced as "./.github/workflows/benchmark-suite.yml" in the snippet. echo "=== check existence of reusable workflow ===" ls -la .github/workflows/benchmark-suite.yml .github/workflows/benchmark-suite*.yml .github/workflows/benchmark-suite*.yaml 2>/dev/null || true # Inspect reusable workflow benchmark-suite.yml if [ -f .github/workflows/benchmark-suite.yml ]; then echo "=== benchmark-suite.yml: header through workflow_call secrets ===" # Print first 160 lines and any workflow_call secrets block nl -ba .github/workflows/benchmark-suite.yml | sed -n '1,200p' echo "=== benchmark-suite.yml: workflow_call secrets section (search) ===" rg -n "workflow_call:|secrets:" .github/workflows/benchmark-suite.yml echo "=== benchmark-suite.yml: around secrets block ===" # Show context around secrets: definitions rg -n "on:\s*workflow_call" -n .github/workflows/benchmark-suite.yml || true # Use grep for "secrets:" and show 40 lines around each match while read -r line; do ln=$(echo "$line" | cut -d: -f1) start=$((ln-20)) end=$((ln+60)) echo "--- benchmark-suite.yml around line $ln ---" nl -ba .github/workflows/benchmark-suite.yml | sed -n "${start},${end}p" done < <(rg -n "secrets:" .github/workflows/benchmark-suite.yml | head -n 20 || true) echo "=== benchmark-suite.yml: search for secret usages ===" rg -n "BENCHER_API_TOKEN|REACT_ON_RAILS_PRO_LICENSE_V2|GITHUB_TOKEN|\bsecrets\." .github/workflows/benchmark-suite.yml || true fi # Also check if benchmark-suite.yml uses any action without pinning (sha) # We'll just check uses lines that reference github actions. Without pinning means using `@v1` or `@v2` etc. if [ -f .github/workflows/benchmark-suite.yml ]; then echo "=== benchmark-suite.yml: action uses without SHA (heuristic) ===" rg -n "uses:\s*[^#\n]*`@v`[0-9]|uses:\s*[^#\n]*`@v`[0-9]+|uses:\s*[^#\n]*`@master`|uses:\s*[^#\n]*`@main`" .github/workflows/benchmark-suite.yml || true fi

Repository: shakacode/react_on_rails

Length of output: 165

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== benchmark.yml lines 150-270 ===" nl -ba .github/workflows/benchmark.yml | sed -n '150,270p' echo "=== benchmark.yml lines 270-360 ===" nl -ba .github/workflows/benchmark.yml | sed -n '270,360p' echo "=== occurrences of 'secrets: inherit' in benchmark.yml ===" rg -n "secrets:\s*inherit" .github/workflows/benchmark.yml || true echo "=== check existence of reusable workflow ===" ls -la .github/workflows/benchmark-suite.yml 2>/dev/null || true if [ -f .github/workflows/benchmark-suite.yml ]; then echo "=== benchmark-suite.yml lines 1-240 ===" nl -ba .github/workflows/benchmark-suite.yml | sed -n '1,240p' echo "=== benchmark-suite.yml: locate workflow_call secrets definitions ===" rg -n "on:\s*workflow_call|secrets:" .github/workflows/benchmark-suite.yml echo "=== benchmark-suite.yml: locate secret usages ===" rg -n "secrets\.|BENCHER_API_TOKEN|REACT_ON_RAILS_PRO_LICENSE_V2" .github/workflows/benchmark-suite.yml || true echo "=== benchmark-suite.yml: uses lines and sha pinning heuristic ===" # Show uses lines (excluding local ./) to judge pinning rg -n "uses:\s*(?!\./)([^ ]+)" .github/workflows/benchmark-suite.yml | head -n 80 echo "=== benchmark-suite.yml: uses lines without commit SHA heuristic ===" rg -n "uses:\s*(?!\./).*@((v|V)[0-9]+|[a-zA-Z]+)$" .github/workflows/benchmark-suite.yml || true else echo "benchmark-suite.yml not found" fi

Repository: shakacode/react_on_rails

Length of output: 165

🏁 Script executed:

# re-run with a single script (avoid duplication) set -euo pipefail echo "=== benchmark.yml lines 150-270 ===" nl -ba .github/workflows/benchmark.yml | sed -n '150,270p' echo "=== benchmark.yml lines 270-360 ===" nl -ba .github/workflows/benchmark.yml | sed -n '270,360p' echo "=== occurrences of 'secrets: inherit' in benchmark.yml ===" rg -n "secrets:\s*inherit" .github/workflows/benchmark.yml || true echo "=== reusable workflow file listing matching benchmark-suite ===" fd -a "benchmark-suite*.*ya?ml" .github/workflows || true if [ -f .github/workflows/benchmark-suite.yml ]; then echo "=== benchmark-suite.yml lines 1-260 ===" nl -ba .github/workflows/benchmark-suite.yml | sed -n '1,260p' echo "=== benchmark-suite.yml workflow_call secrets definitions ===" rg -n "on:\s*workflow_call|secrets:" .github/workflows/benchmark-suite.yml echo "=== benchmark-suite.yml secret usages ===" rg -n "secrets\.|BENCHER_API_TOKEN|REACT_ON_RAILS_PRO_LICENSE_V2" .github/workflows/benchmark-suite.yml || true else echo "benchmark-suite.yml not present at expected path" fi

Repository: shakacode/react_on_rails

Length of output: 165

🏁 Script executed:

#!/bin/bash set -euo pipefail file=".github/workflows/benchmark.yml" echo "=== benchmark.yml: occurrences of 'secrets: inherit' ===" rg -n "secrets:\s*inherit" "$file" || true echo "=== benchmark.yml: show around first referenced area (170-210) ===" awk 'NR>=170 && NR<=210 {printf "%d\t%s\n", NR, $0}' "$file" || true echo "=== benchmark.yml: show around second referenced area (230-270) ===" awk 'NR>=230 && NR<=270 {printf "%d\t%s\n", NR, $0}' "$file" || true echo "=== benchmark.yml: show reusable workflow call sites ===" rg -n "uses:\s*\./\.github/workflows/" "$file" || true suite=".github/workflows/benchmark-suite.yml" echo "=== check existence of benchmark-suite.yml ===" ls -la "$suite" 2>/dev/null || true if [ -f "$suite" ]; then echo "=== benchmark-suite.yml: locate on.workflow_call and secrets definitions ===" rg -n "on:\s*workflow_call|secrets:" "$suite" || true echo "=== benchmark-suite.yml: print beginning (1-220) ===" awk 'NR>=1 && NR<=220 {printf "%d\t%s\n", NR, $0}' "$suite" || true echo "=== benchmark-suite.yml: search for secret consumption (expected secrets) ===" rg -n "BENCHER_API_TOKEN|REACT_ON_RAILS_PRO_LICENSE_V2|GITHUB_TOKEN|secrets\." "$suite" || true echo "=== benchmark-suite.yml: detect third-party action uses (exclude local ./) ===" rg -n "uses:\s*(?!\./)([^#\n]+)" "$suite" || true echo "=== benchmark-suite.yml: heuristic check for unpinned action refs (`@v1/`@v2/@master) ===" rg -n "uses:\s*(?!\./).*@((v)[0-9]+|master|main)" "$suite" || true fi

Repository: shakacode/react_on_rails

Length of output: 15640

🏁 Script executed:

#!/bin/bash set -euo pipefail suite=".github/workflows/benchmark-suite.yml" file=".github/workflows/benchmark.yml" echo "=== benchmark-suite.yml: lines containing 'secrets:' (declarations only) ===" rg -n "secrets:" "$suite" || true echo "=== benchmark-suite.yml: lines containing 'workflow_call' ===" rg -n "workflow_call" "$suite" || true echo "=== benchmark-suite.yml: show around lines ~35-90 (env and secret usage start) ===" awk 'NR>=40 && NR<=90 {printf "%d\t%s\n", NR, $0}' "$suite" echo "=== benchmark-suite.yml: show around lines 420-480 (secret env mapping) ===" awk 'NR>=420 && NR<=480 {printf "%d\t%s\n", NR, $0}' "$suite" echo "=== benchmark.yml: show around 200-220 (first download-artifact step) ===" awk 'NR>=195 && NR<=220 {printf "%d\t%s\n", NR, $0}' "$file" echo "=== benchmark.yml: show around 265-290 (second download-artifact step) ===" awk 'NR>=260 && NR<=290 {printf "%d\t%s\n", NR, $0}' "$file"

Repository: shakacode/react_on_rails

Length of output: 8503

🏁 Script executed:

#!/bin/bash set -euo pipefail # Search for any repo policy about pinning actions to SHAs rg -n "pin(ned)?( to)? (commit|SHA|sh a|hash)|SHA-pinned|commit SHA|pin to .*@([0-9a-f]{7,40})" .github README* CONTRIBUTING* docs* package* 2>/dev/null || true # Also look for general "actions are pinned" statements rg -n "pinned.*action|actions/.*@([0-9a-f]{7,40})|`@v`[0-9]+(?![^#]*[0-9a-f]{7,40})" .github/workflows 2>/dev/null || true # Check whether benchmark.yml or benchmark-suite.yml contains any action refs that are already commit-SHA pinned rg -n "uses:\s*([^#\n]+)@([0-9a-f]{7,40})" .github/workflows/benchmark.yml .github/workflows/benchmark-suite.yml || true

Repository: shakacode/react_on_rails

Length of output: 496

Stop passing all secrets to the reusable benchmark workflow and align action pinning with supply-chain policy.

In .github/workflows/benchmark.yml, replace secrets: inherit (lines 183 and 245) with an explicit secrets: mapping limited to the secrets benchmark-suite.yml actually uses: REACT_ON_RAILS_PRO_LICENSE_V2, BENCHER_API_TOKEN, and GITHUB_TOKEN (instead of exposing the full repo secret set).

.github/workflows/benchmark.yml (and benchmark-suite.yml) use tag-based actions like actions/download-artifact@v7 / actions/checkout@v6; docs/pro/license-ci-integration.md asks for third-party actions to be pinned to reviewed commit SHAs.

🧰 Tools

🪛 zizmor (1.25.2)

[warning] 179-179: secrets unconditionally inherited by called workflow (secrets-inherit): this reusable workflow

(secrets-inherit)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/benchmark.yml around lines 179 - 183, Replace the broad secrets inheritance with an explicit secrets mapping in the reusable workflow call: remove "secrets: inherit" in .github/workflows/benchmark.yml (the step using the reusable workflow with with: matrix_row/mode) and instead provide a limited secrets: mapping that only passes REACT_ON_RAILS_PRO_LICENSE_V2, BENCHER_API_TOKEN, and GITHUB_TOKEN to the reusable workflow; ensure the reusable workflow (benchmark-suite.yml) expects those exact inputs. Additionally, update action pins in both .github/workflows/benchmark.yml and benchmark-suite.yml (e.g., actions/download-artifact@v7, actions/checkout@v6) to use reviewed commit SHAs rather than tag versions to comply with the supply-chain policy.

Source: Linters/SAST tools

coderabbitai · 2026-06-08T20:52:03Z

-        if: always()
-        uses: actions/upload-artifact@v7
+      - name: Download regression candidates
+        uses: actions/download-artifact@v7


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE=".github/workflows/benchmark.yml" echo "== File existence ==" ls -la "$FILE" || exit 1 echo echo "== Context around line 210 ==" nl -ba "$FILE" | sed -n '195,230p' echo echo "== Context around line 272 ==" nl -ba "$FILE" | sed -n '255,290p' echo echo "== All uses: entries in this workflow (for tag vs SHA pin check) ==" nl -ba "$FILE" | rg -n "uses:\s*[^#]+" || true

Repository: shakacode/react_on_rails

Length of output: 264

🏁 Script executed:

#!/bin/bash set -euo pipefail FILE=".github/workflows/benchmark.yml" python3 - <<'PY' import itertools path = ".github/workflows/benchmark.yml" with open(path, "r", encoding="utf-8") as f: lines = f.readlines() def show(start, end): start_i = max(1, start) end_i = min(len(lines), end) for i in range(start_i-1, end_i): print(f"{i+1:4d}: {lines[i].rstrip()}") print("== Context around line 210 ==") show(195, 230) print("\n== Context around line 272 ==") show(255, 290) print("\n== All uses: entries in this workflow ==") for i, line in enumerate(lines, start=1): if "uses:" in line and not line.strip().startswith("#"): # print the uses line and its line number print(f"{i}: {line.rstrip()}") PY

Repository: shakacode/react_on_rails

Length of output: 3424

Pin actions/download-artifact to commit SHAs (not @v7 tags).

Both artifact download steps use actions/download-artifact@v7, so they remain vulnerable to upstream tag moves. Update to actions/download-artifact@<commit-sha>.

Applies to: 210-210, 272-272

🧰 Tools

🪛 zizmor (1.25.2)

[error] 210-210: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/benchmark.yml at line 210, Replace the floating tag uses: actions/download-artifact@v7 with a pinned commit SHA for both occurrences where the workflow references the download-artifact action (search for the literal "uses: actions/download-artifact@v7"); update each to the specific commit SHA form actions/download-artifact@<commit-sha> (obtain the correct commit SHA from the actions/download-artifact repository) so both download-artifact steps are immutably pinned.

Source: Linters/SAST tools

coderabbitai · 2026-06-08T20:52:03Z

+# The structured benchmark+measure pairs Bencher raised an active alert for, deduped.
+# Read from the same alerts[] as #regression?/#regressed_benchmark_names. Handed off in
+# the candidate so a confirmation rerun can require the SAME pair to re-alert (not just
+# any alert in the suite). An alert with no benchmark name is dropped — it can't be
+# matched. measure may be nil; the matcher falls back to name-only for those.
+def regressed_alert_pairs(report)
+  return [] unless report
+
+  report.alerts
+        .select(&:benchmark)
+        .map { |alert| RegressionReport.alert(alert.benchmark, alert.measure) }
+        .uniq
+end


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat unmatchable regression alerts as inconclusive instead of cleared.

At Line 344, alerts without a benchmark are dropped, and then Line 387 classifies empty matches as :cleared. That can silently clear a real first-run alert when the payload is incomplete (no benchmark identifier), even though it is not actually confirmable. This path should return :inconclusive (operational failure) rather than noise-cleared.

Also applies to: 379-389

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmarks/track_benchmarks.rb` around lines 335 - 347, The current regressed_alert_pairs method drops alerts with no benchmark (report.alerts.select(&:benchmark)), which lets incomplete payloads be treated later as empty matches and classified as :cleared; instead preserve unmatchable alerts and surface them as inconclusive so they cannot be noise-cleared. Change regressed_alert_pairs to include alerts with nil benchmark (don’t filter them out) and convert those to an explicit inconclusive sentinel (e.g. call or construct a RegressionReport.inconclusive(alert) or return a pair with a nil benchmark/measure marker) so downstream matching logic can detect and return :inconclusive; also update the empty-match handling (the classifier currently emitting :cleared for empty matches) to return :inconclusive when it sees that sentinel or a nil benchmark/measure pair. Ensure you reference and modify RegressionReport.alert usage and the empty-match classification code to recognize the new inconclusive sentinel.

greptile-apps · 2026-06-08T20:52:46Z

Greptile Summary

This PR implements a two-run confirmation gate for main-push benchmark regressions: a Bencher alert on the initial run is now a non-fatal candidate that must re-alert on an independent fresh-runner rerun before any GitHub issue is filed. The pipeline is extracted into a reusable workflow (benchmark-suite.yml) shared by both runs to eliminate drift between the initial and confirmation steps.

New 4-job pipeline in benchmark.yml: benchmark (initial, non-fatal on alert) → plan-confirmation (gate: drops ignored benchmarks, emits confirmation matrix) → confirm-regressions (fresh-runner rerun on a synthetic Bencher branch that never pollutes main's series) → report-regressions (files issue with side-by-side first-run + confirmation evidence, owns final pass/fail).
benchmarks/lib/regression_report.rb becomes the shared hand-off contract (candidate/confirmed filenames, structured alert pairs, IGNORED_REGRESSION_BENCHMARKS moved here from report_regressions.rb), with alerts_match? and confirmed_alerts helpers implementing the same-benchmark+measure matching rule including a nil-measure name-only fallback.
26 new specs across track_benchmarks_spec, plan_confirmation_spec, and regression_report_spec cover the new confirmation mode outcomes; report_regressions_spec is reworked for the confirmed-artifact contract and now asserts that a confirmed regression correctly fails the workflow.

Confidence Score: 3/5

The Ruby scripts and specs are well-structured and the hand-off logic is sound, but both download-artifact glob steps in benchmark.yml lack continue-on-error: true, which will likely cause the confirmation pipeline to fail on every clean main push and every cleared-noise run — the two most common outcomes.

The plan-confirmation and report-regressions jobs each use actions/download-artifact@v7 with a pattern: glob but no fallback for zero matches. On any main push where no benchmarks regressed (or where alerts are cleared on confirmation), those download steps will fail before the handling scripts run, causing the workflow to turn red in all the cases it should be green. This would trigger false broken CI signals on every clean commit to main until fixed.

benchmark.yml lines 209–213 (Download regression candidates) and 271–276 (Download confirmed regressions) need continue-on-error: true; the Ruby scripts are otherwise ready.

Important Files Changed

Filename	Overview
.github/workflows/benchmark-suite.yml	New reusable workflow extracting all benchmark suite steps into a single callable unit; correctly gates candidate/confirmed upload on mode; the Download first-run candidate step (confirm mode) downloads by exact name and won't fail on an empty match since it's always expected to exist. Step ordering is correct.
.github/workflows/benchmark.yml	Orchestration refactored into a 4-job confirmation pipeline; both Download regression candidates (plan-confirmation) and Download confirmed regressions (report-regressions) use pattern: glob downloads without continue-on-error: true, which will likely fail when no matching artifacts exist — the normal case on clean runs and cleared-noise runs.
benchmarks/lib/regression_report.rb	Shared module expanded with candidate/confirmed filenames, structured alert helpers, and IGNORED_REGRESSION_BENCHMARKS moved here from report_regressions.rb; logic is clean, helpers are well-tested, and the measure normalisation correctly handles slug vs name drift.
benchmarks/plan_confirmation.rb	New gate script that reads candidates, short-circuits ignored benchmarks, and emits a confirmation matrix; handles empty, corrupt, and unmatched candidates correctly. The script itself is robust but depends on the download step in benchmark.yml not failing first.
benchmarks/report_regressions.rb	Updated to consume confirmed payloads with first-run/confirmation summaries side by side; old IGNORED_REGRESSION_BENCHMARKS filtering removed (now lives in plan_confirmation); exit-code contract correctly changed so confirmed regressions fail the workflow here.
benchmarks/track_benchmarks.rb	Confirmation mode added with clean separation from initial mode; synthetic branch naming prevents main series pollution; edge case where candidate_alerts is empty after filtering causes :cleared instead of :inconclusive (P2 comment left inline).
benchmarks/spec/plan_confirmation_spec.rb	New spec providing good coverage of the gate: empty/corrupt/unmatched candidates, ignored benchmark short-circuit, mixed suppressed+confirmed, and GITHUB_OUTPUT emission all tested.
benchmarks/spec/regression_report_spec.rb	New spec pins alert normalisation, matching (including slug/name equivalence and nil-measure fallback), confirmed_alerts intersection, and ignore-list filtering.
benchmarks/spec/report_regressions_spec.rb	Reworked for confirmed-artifact contract; correctly adds tests asserting exit 1 on confirmed regression and verifying first-run/confirmation evidence rendered side by side in the issue body.
benchmarks/spec/track_benchmarks_spec.rb	26 new examples covering confirmation mode: branch naming, start-point args, alert pair extraction, and all four outcome classifications are unit-tested.

_{Reviews (1): Last reviewed commit: "Confirm benchmark regressions on a fresh..." | Re-trigger Greptile}

greptile-apps · 2026-06-08T20:52:49Z

+      - name: Download regression candidates
+        uses: actions/download-artifact@v7
        with:
-          name: regression-${{ matrix.artifact_name_prefix }}-${{ github.run_number }}${{ matrix.artifact_name_suffix }}
-          path: bench_results/regression.json
-          if-no-files-found: ignore
-          retention-days: 7
-
-      - name: Benchmark workflow summary
-        if: always()
-        # Route context/matrix values through env so the shell treats them as data,
-        # not code (GitHub Actions injection hardening; ref_name can be attacker-set).
+          path: candidate-artifacts
+          pattern: regression-candidate-*


download-artifact with a glob pattern may fail when no candidates exist

actions/download-artifact@v4+ using pattern: (no name:) fails with an error when the glob matches zero artifacts. On any clean main push where no suite raised a Bencher alert, no regression-candidate-* artifacts are uploaded, so this step would fail before plan_confirmation.rb even runs. The job terminates here, emitting no outputs; confirm-regressions is skipped; report-regressions runs under always(), finds nothing, and exits 0 — but the workflow is already marked red from the failing plan-confirmation step. The same pattern appears in the report-regressions job (line 274–275) for regression-confirmed-*, meaning any run without confirmed regressions also fails the download step.

Adding continue-on-error: true to both download steps lets the scripts proceed and handle the empty-directory case they already cover correctly (plan_confirmation.rb returns true with has_confirmations=false when the directory is empty).

greptile-apps · 2026-06-08T20:52:50Z

+      - name: Download confirmed regressions
        uses: actions/download-artifact@v7
        with:
-          path: regression-artifacts
-          pattern: regression-*
+          path: confirmed-artifacts
+          pattern: regression-confirmed-*
      # Stdlib-only script, so the runner's system Ruby is enough (no Bundler/Rails boot).


Same empty-pattern failure risk for report-regressions

Like the plan-confirmation download (line 209–213), this step will fail on any run where no regression-confirmed-* artifact was uploaded — i.e. whenever every alert was cleared as noise or the confirm-regressions job was skipped entirely. Both the cleared and the clean cases should produce a green workflow, but the failing download step prevents report_regressions.rb from running and marks the workflow red. Adding continue-on-error: true here (and at line 209) allows the scripts to reach their empty-directory short-circuits as intended.

greptile-apps · 2026-06-08T20:52:51Z

  report.alerts.filter_map(&:benchmark).uniq
 end

+# The structured benchmark+measure pairs Bencher raised an active alert for, deduped.
+# Read from the same alerts[] as #regression?/#regressed_benchmark_names. Handed off in
+# the candidate so a confirmation rerun can require the SAME pair to re-alert (not just
+# any alert in the suite). An alert with no benchmark name is dropped — it can't be
+# matched. measure may be nil; the matcher falls back to name-only for those.
+def regressed_alert_pairs(report)
+  return [] unless report
+
+  report.alerts
+        .select(&:benchmark)
+        .map { |alert| RegressionReport.alert(alert.benchmark, alert.measure) }
+        .uniq
+end


Empty ALERTS in a candidate silently clears the confirmation

load_candidate normalises a missing or non-Array alerts value to []. When candidate_alerts is [], actionable_alerts([]) returns [], and confirmed_alerts([], ...) returns [], so confirmation_outcome returns :cleared — even if the confirmation run itself did re-alert. This path can be reached if all active alerts in the initial run lacked a benchmark name (dropped by regressed_alert_pairs), which would cause the suite to be silently cleared instead of treated as inconclusive. Returning [:inconclusive, []] when candidate_alerts is empty after actionable filtering would fail safer, consistent with the "can't match what we can't name" intent stated in the docs.

claude · 2026-06-08T20:53:40Z

+
+  parsed = JSON.parse(File.read(path))
+  alerts = parsed[RegressionReport::ALERTS]
+  alerts = [] unless alerts.is_a?(Array)


When ALERTS is missing from the candidate payload (key absent or non-array), this falls back to []. The problem: confirmation_outcome then calls confirmed_alerts(actionable_alerts([]), ...) which always returns [], and the outcome is :cleared — silently treating the regression as noise.

This is reachable if regressed_alert_pairs returns [] (all Bencher alerts lacked a benchmark name and were dropped by .select(&:benchmark)) while regression?(report) was still true, causing write_candidate to write ALERTS: [].

Consider returning [nil, summary] (inconclusive) instead of [[], summary] when the array is empty/absent — a candidate with no matchable alert pairs cannot be confirmed or cleared on evidence:

Suggested change

alerts = [] unless alerts.is_a?(Array)

alerts = parsed[RegressionReport::ALERTS]

# Treat absent/empty ALERTS as inconclusive: without matchable pairs the

# confirmation can neither confirm nor clear on evidence, only on speculation.

unless alerts.is_a?(Array) && !alerts.empty?

warn "::error::Candidate has empty or missing ALERTS (#{RegressionReport::ALERTS}); " \

"treating the confirmation as inconclusive."

return [nil, parsed[RegressionReport::SUMMARY].to_s]

end

claude · 2026-06-08T20:53:45Z

+# NO alert is an operational failure, not a regression, and still fails the suite.
+def report_main_push_candidate(report, report_markdown, bencher_exit_code, suite_name)
+  if regression?(report)
+    exit 1 unless write_candidate(report, report_markdown, suite_name)


The intent here is that a successful write_candidate makes the initial run non-fatal (exit 0), but this is subtle: when write_candidate returns true the unless is skipped, the function returns normally, the surrounding if main_push? && bencher_exit_code != 0 block ends, and the script falls through to exit 0. A reader seeing the exit 1 on this line may be confused about how a Bencher regression alert produces a green suite.

Consider making the non-fatal exit explicit:

Suggested change

exit 1 unless write_candidate(report, report_markdown, suite_name)

# write_candidate succeeded: the alert is now a non-fatal candidate awaiting

# fresh-runner confirmation. Exit 0 so this suite stays green; report-regressions

# owns the final pass/fail.

exit 1 unless write_candidate(report, report_markdown, suite_name)

# (write_candidate returned true — fall through to exit 0)

claude · 2026-06-08T20:53:54Z

+
+def set_output(name, value)
+  path = ENV.fetch("GITHUB_OUTPUT", nil)
+  File.write(path, "#{name}=#{value}\n", mode: "a") if path && !path.empty?


JSON.generate always produces compact single-line JSON, so there is no newline risk today. For future-proofing, GitHub Actions supports the multiline heredoc format for GITHUB_OUTPUT which handles any embedded newlines:

Suggested change

File.write(path, "#{name}=#{value}\n", mode: "a") if path && !path.empty?

if path && !path.empty?

# Use the multiline heredoc delimiter so embedded newlines in `value`

# (e.g. if JSON ever becomes pretty-printed) don't corrupt the output file.

File.open(path, "a") { |f| f.write("#{name}<<__GHEOF__\n#{value}\n__GHEOF__\n") }

end

Not a bug today, but worth the one-line change to avoid a silent breakage if the matrix JSON ever gains newlines.

claude · 2026-06-08T20:53:59Z

+    runs-on: ubuntu-24.04
+    permissions:
+      contents: read
+      issues: write


issues: write and pull-requests: write are only needed for initial mode (posting PR comments and — indirectly — enabling the downstream report-regressions job to file issues). In confirm mode this job only submits to Bencher and writes artifacts; it never touches issues or PRs directly.

GitHub Actions does not support conditional permission blocks, so both modes get the full set. A short comment here would make this over-grant intentional and auditable:

Suggested change

issues: write

# issues: write and pull-requests: write are used by initial mode only

# (track_benchmarks.rb posts PR summary comments). confirm mode does not post

# to PRs or file issues, but workflow YAML does not support conditional

# permission blocks, so both modes inherit the full set.

issues: write

pull-requests: write

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/spec/track_benchmarks_spec.rb`:
- Around line 187-193: The long test line calling write_candidate with
RegressionReport::ALERTS and RegressionReport::SUMMARY exceeds 120 chars; break
that call into multiple lines using Ruby method continuation (e.g., wrap
arguments in parentheses or pass a multi-line hash) so each hash key-value pair
(RegressionReport::ALERTS => "not an array", RegressionReport::SUMMARY => "first
run") is on its own line and the overall line length is under 120 chars; keep
the surrounding Dir.mktmpdir block and the expect(load_candidate(dir)) assertion
unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 79d49ece-155d-4367-a262-fa9ace62799f

📥 Commits

Reviewing files that changed from the base of the PR and between acc4f36 and 985121d.

📒 Files selected for processing (6)

benchmarks/plan_confirmation.rb
benchmarks/report_regressions.rb
benchmarks/spec/plan_confirmation_spec.rb
benchmarks/spec/report_regressions_spec.rb
benchmarks/spec/track_benchmarks_spec.rb
benchmarks/track_benchmarks.rb

🚧 Files skipped from review as they are similar to previous changes (4)

benchmarks/spec/plan_confirmation_spec.rb
benchmarks/report_regressions.rb
benchmarks/plan_confirmation.rb
benchmarks/track_benchmarks.rb

coderabbitai · 2026-06-08T21:45:25Z

+      it "treats non-array alerts as inconclusive" do
+        Dir.mktmpdir do |dir|
+          write_candidate(dir, RegressionReport::ALERTS => "not an array", RegressionReport::SUMMARY => "first run")
+
+          expect(load_candidate(dir)).to eq([nil, "first run"])
+        end
+      end


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Break the long line to comply with the 120-character limit.

Line 189 exceeds the 120-character limit specified in the coding guidelines for Ruby code. As per coding guidelines, Ruby code should have a maximum line length of 120 characters and long chains should be broken using method continuation syntax.

📏 Proposed fix to break the long line

it "treats non-array alerts as inconclusive" do Dir.mktmpdir do |dir| - write_candidate(dir, RegressionReport::ALERTS => "not an array", RegressionReport::SUMMARY => "first run") + write_candidate(dir, + RegressionReport::ALERTS => "not an array", + RegressionReport::SUMMARY => "first run") expect(load_candidate(dir)).to eq([nil, "first run"]) end end

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

it "treats non-array alerts as inconclusive" do

Dir.mktmpdir do |dir|

write_candidate(dir, RegressionReport::ALERTS => "not an array", RegressionReport::SUMMARY => "first run")

expect(load_candidate(dir)).to eq([nil, "first run"])

end

end

it "treats non-array alerts as inconclusive" do

Dir.mktmpdir do |dir|

write_candidate(dir,

RegressionReport::ALERTS => "not an array",

RegressionReport::SUMMARY => "first run")

expect(load_candidate(dir)).to eq([nil, "first run"])

end

end

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmarks/spec/track_benchmarks_spec.rb` around lines 187 - 193, The long test line calling write_candidate with RegressionReport::ALERTS and RegressionReport::SUMMARY exceeds 120 chars; break that call into multiple lines using Ruby method continuation (e.g., wrap arguments in parentheses or pass a multi-line hash) so each hash key-value pair (RegressionReport::ALERTS => "not an array", RegressionReport::SUMMARY => "first run") is on its own line and the overall line length is under 120 chars; keep the surrounding Dir.mktmpdir block and the expect(load_candidate(dir)) assertion unchanged.

Source: Coding guidelines

claude · 2026-06-08T21:49:35Z

PR 3810 Code Review - see inline comments for details

claude · 2026-06-08T21:49:49Z

+    strategy:
+      fail-fast: false
+      # Fall back to an empty (but valid) matrix if plan-confirmation was skipped/crashed
+      # before setting the output: its output is then "" and fromJSON("") would error at


The confirm-regressions job carries issues: write and pull-requests: write permissions, but in confirm mode track_benchmarks.rb explicitly skips replace_pr_comments() and never files issues — that is report-regressions's job. These scopes are unnecessary here and widen the blast radius if the runner token is ever misused. Suggest trimming to contents: read + checks: write (Bencher needs checks: write for a synthetic-branch run; no issue/PR scopes required).

claude · 2026-06-08T21:49:56Z

+    uses: ./.github/workflows/benchmark-suite.yml
+    with:
+      matrix_row: ${{ toJSON(matrix) }}
+      mode: confirm


The {"include":[]} fallback is correct but the comment ("crashed before setting output") understates why it's needed. GitHub Actions expands the matrix: expression before evaluating the if: condition, so if confirmation_matrix is empty string the runner would error at matrix-construction time even though the if: gate would skip the job. Worth noting that ordering explicitly — otherwise a future reader might remove the fallback as apparently redundant given the if: guard.

claude · 2026-06-08T21:50:35Z

+      issues: write
+      pull-requests: write


issues: write and pull-requests: write are not needed in confirm mode. In confirm mode track_benchmarks.rb explicitly skips replace_pr_comments() and never calls the GitHub issue API — that is report-regressions's responsibility. Consider trimming to contents: read + checks: write here (Bencher needs checks: write for the synthetic-branch submission; no issue/PR scopes are required).

claude · 2026-06-08T21:50:43Z

+      # Fall back to an empty (but valid) matrix if plan-confirmation was skipped/crashed
+      # before setting the output: its output is then "" and fromJSON("") would error at
+      # matrix-construction time even though the if: gate skips this job. The empty matrix
+      # produces zero jobs; the if: gate is what actually decides whether confirmation runs.
+      matrix: ${{ fromJSON(needs.plan-confirmation.outputs.confirmation_matrix || '{"include":[]}') }}


The fallback is correct and should stay, but the comment only explains what happens ("its output is then \"\""). It's worth also noting why the fallback is required: GitHub Actions expands the matrix: expression before evaluating the if: condition, so a missing or empty confirmation_matrix would cause a matrix-construction error even though the if: gate would have skipped the job. Without that context a future reader may remove the fallback as redundant given the if: guard.

claude · 2026-06-08T21:50:58Z

+  unless plan[:unmatched].empty?
+    warn_unmatched(plan[:unmatched])
+    emit_no_confirmations
+    return false
+  end


Design question worth confirming: when there are both unmatched candidates and confirmable ones, this bails out entirely — no confirmations are emitted, and the workflow goes red without filing an issue. That is conservative and safe, but it means one operationally-broken shard silences a real regression in a different shard. Is that the intended trade-off? If partial confirmation is acceptable, the logic could emit confirmations for the matched rows and only warn (not fail) on the unmatched ones. Either policy is defensible; this just warrants a comment so the intention is clear to future readers.

claude · 2026-06-08T21:52:02Z

Code Review Summary for PR 3810 - Two-stage benchmark regression confirmation pipeline.

MEDIUM: confirm-regressions job has issues:write and pull-requests:write permissions that are not needed in confirm mode. track_benchmarks.rb skips replace_pr_comments() and never touches the issues API when BENCHMARK_MODE=confirm. Least-privilege fix: trim to contents:read + checks:write. See inline comment at lines 238-239 of benchmark.yml.

LOW: The fallback matrix comment (lines 231-235) explains what happens but not why the fallback is required. GitHub Actions expands matrix: expressions before evaluating if: conditions, so a missing output would error at matrix-construction time regardless of the if: gate. Worth one sentence to prevent future removal.

LOW: The Benchmark workflow summary step in benchmark-suite.yml echoes to stdout, not to GITHUB_STEP_SUMMARY. The meaningful confirmation outcome (confirmed/cleared/inconclusive) is correctly written via append_step_summary in track_benchmarks.rb, so this is informational only.

QUESTION: In plan_confirmation.rb lines 158-162, when there are both unmatched candidates AND confirmable ones the function bails entirely. Safe, but one broken shard silences a real regression in another. See inline comment for detail.

POSITIVE: Alert matching fallback is well-reasoned. The --start-point-reset + unique synthetic branch names correctly isolate the confirmation from mains series. Fail-loud philosophy is consistent throughout. Both write_candidate and write_confirmed have explicit true returns. 212 test examples with good coverage of the new code paths.

Overall: well-engineered. The permissions issue is worth fixing before merge; the rest are minor.

…sitives) (#3829) ## Problem ~99% of recent benchmark "performance-regression" alerts on `main` are false positives from a single **orphaned server-side Bencher threshold**. `p90_latency` (and `p99_latency`) were intentionally dropped from `track_benchmarks.rb` `THRESHOLDS` as too noisy ("tail noise can't meet the 1/20 target"). But Bencher thresholds are persistent server-side objects keyed on (branch, testbed, measure) — removing a measure from the CLI `--threshold-*` args stops *updating* the threshold, it never *deletes* it. The p90 metric is still submitted (for the dashboard), so the orphaned p90 threshold keeps evaluating p90 tail noise and firing alerts on nearly every run. These alerts are doubly bad: - They file a `performance-regression` issue (exit≠0), and - They are **invisible in the summary table** — the p90 column has no `:direction`, so it is never flagged 🔴. The filed issue names nothing actionable (this is the #3782 / #3795 "🔴 only appears in the legend" symptom). ### Evidence (public Bencher API, project `react-on-rails-t8a9ncxo`) - 255 most-recent active alerts: **248 p90-latency, 4 rps, 3 p50-latency**. - `main` branch: **164 active alerts → 162 p90-latency, 2 rps** (98.8% p90). - #3782 (`dec4b8c`) filed on exactly two p90 alerts — `/client_side_hello_world: Core` p90 `24.20 > upper 23.61` and `/rendered_html: Core` p90 — with no rps/p50/failed_pct alert. - `main`/`github-actions` carries thresholds for `rps, p50-latency, p90-latency, p99-latency, failed-pct` — two more than the code manages. (p99 is dormant: no p99 metric is submitted.) ## Fix Thread the measures we actually track (`THRESHOLDS` names) into `BencherReport`. An active alert on any **other** measure is classified as *filtered* rather than a regression. This reuses #3822's existing `filtered_alert?` exit-normalization, so a p90-only run no longer writes a candidate or files an issue. The fix is fully in-repo and makes the orphaned threshold harmless regardless of its server-side state. - Tracked measures (`rps`, `p50_latency`, `failed_pct`) are unaffected — including the "hidden" `failed_pct` regressions #3822 added coverage for. - Measure-less and benchmark-less alerts keep their existing fail-safe (still counted). - `tracked_measures` defaults to `nil` (track every measure) so non-production callers (`BenchmarkTable`, specs) are unchanged. ## Validation - `bundle exec rspec benchmarks/spec` — **234 examples, 0 failures** (5 new specs for tracked-measure filtering). - `bundle exec rubocop benchmarks/...` — no offenses. - **End-to-end against the real #3782 reports** (fetched from the Bencher API): `regression?` flips `true → false` (`filtered_alert? = true`) for both the Core and Pro p90 alerts, while rps/p50 alerts are untouched. - `script/ci-changes-detector origin/main` → Benchmark scripts → Lint (Ruby + JS). ## Companion cleanup (separate, manual — not in this PR) This neutralizes the orphaned threshold in code. To also stop it polluting the Bencher dashboard (162 cosmetic active alerts) and being cloned to every new branch, delete the server-side thresholds (needs `BENCHER_API_TOKEN`): - `main` p90 `51fb6a47-0083-4e84-a745-60ee42e3bba4`, p99 `6faa7a68-1835-4cd7-96dc-959220737172` - `master` p90 `d4ad2066-74cb-41f7-93bb-f0885358c56c`, p99 `61bf5ae6-77fa-474f-b6d8-ded630bd0c20` ## Relationship to other work Completes the noise fix that #3810 (fresh-runner confirmation) and #3822 (stale-alert filtering) started: #3822 only filters alerts whose metric *recovered*; a **live** p90 crossing (the dominant case) still passed through. Substantially removes the p90 tail noise behind #3169 and the issue explosion researched in #3755. Refs #3755, #3169, #3795 🤖 Generated with [Claude Code](https://claude.com/claude-code)  --- > [!NOTE] > **Medium Risk** > Changes which Bencher alerts count as regressions and can affect main-push candidate filing and CI exit behavior, though scoped to benchmark reporting with explicit specs and a nil default for other callers. > > **Overview** > Adds optional **`tracked_measures`** to `BencherReport.parse` / `#initialize` so active Bencher alerts on measures the repo no longer tracks (e.g. orphaned server-side **p90_latency** thresholds) are moved to **`filtered_alert?`** instead of **`regression?`**, reusing the existing exit-code normalization for filtered-only runs. > > **`track_benchmarks.rb`** now passes `THRESHOLDS.map(&:first)` when parsing the JSON report, so p90-only false positives no longer write regression candidates or file issues the summary table cannot flag. Tracked measures, measure-less fail-safe alerts, and callers that omit `tracked_measures` stay backward compatible. > > Specs cover orphaned p90 filtering, slug/name normalization, and a small fix for `BencherReport.new` with a Hash root in a perf-links test. > > Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 7693f64. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).   ## Summary by CodeRabbit ## Release Notes * **Bug Fixes** * Fixed false regression alerts for measures no longer tracked in performance benchmarks. The system now filters regression reports to only flag alerts for actively monitored metrics, preventing issue creation based on orphaned or retired threshold measurements that are no longer part of the current tracking configuration.  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

## Problem A confirmed `performance-regression` issue currently only shows the per-suite summary tables. Those tables flag 🔴 on **rps** and **p50** but never on **failed_pct** (it has no column — it lives in the "Status" cell) and never on any untracked measure. So a reader has to open the tables to see what regressed, and a `failed_pct`-only regression renders with no visible 🔴 at all — the same "the issue names nothing actionable" problem behind #3782/#3795. ## Fix Add a **"What regressed"** section to the issue body, built from the confirmed `ALERTS` payload (#3810's structured `{benchmark, measure}` pairs), aggregated across all suites and computed once in `report_regressions`: ``` ### What regressed The fresh-runner confirmation re-alerted on these benchmark + measure pairs: - `/client_side_log_throw: Pro` — **rps** - `/server_side_redux_app: Core` — **failed_pct** ``` - Threaded through `report_suite` → `RegressionIssueReporter`; only the reporter that *creates* the issue renders the body, so the cross-suite list is complete. - `regressed_overview` defaults to `""` → the section is omitted and the body is byte-for-byte unchanged, so it composes with the existing flow and with older payloads that lack `ALERTS`. ## Validation - `bundle exec rspec benchmarks/spec` — **233 examples, 0 failures** (4 new: helper rendering/dedup + issue-body inclusion/omission). - `bundle exec rubocop benchmarks/...` — no offenses. - Rendered the body locally with and without pairs to confirm clean Markdown spacing and that the no-pairs body is unchanged. ## Relationship to other work Complements #3829 (filter alerts to tracked measures): #3829 stops the p90 false positives from filing at all; this makes the *real* confirmed regressions self-explanatory in the issue body. Refs #3755. 🤖 Generated with [Claude Code](https://claude.com/claude-code)  --- > [!NOTE] > **Low Risk** > CI reporting and GitHub issue markdown only; no runtime app or auth changes. > > **Overview** > Adds a **"What regressed"** section to auto-filed `performance-regression` issue bodies, listing confirmed benchmark + measure pairs as markdown bullets (including measures like **failed_pct** that the per-suite tables do not surface clearly). > > `report_regressions` builds the list once from each confirmed payload's structured `ALERTS` via new `regressed_overview_markdown` (deduped across suites) and passes `regressed_overview` into `RegressionIssueReporter`. The section is omitted when the overview is empty, so older payloads without `ALERTS` keep the previous body shape. > > Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 3068c93. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).   ## Summary by CodeRabbit * **New Features** * Performance regression reports now include a "What regressed" overview section, displaying all confirmed benchmark and measure pairs that regressed across test suites in a single consolidated view. * **Tests** * Added comprehensive test coverage for the new regression overview markdown rendering logic and deduplication.  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* origin/main: Add Pro license header checker RSC: stop serializing props into embedded payload cache key (#3800) Make PR batches skip customer-feedback issues (#3826) Name the regressed benchmark+measure pairs in the issue body (#3830) Clarify agent batch policy handoffs (#3824) Filter Bencher alerts to tracked measures (drop orphaned p90 false positives) (#3829) Fix auto-bundled component pack normalization (#3818) Filter stale Bencher alerts before reporting (#3822) Tighten benchmark confirmation workflow permissions (#3819) Add issue evaluation skill (#3816) Confirm benchmark regressions on a fresh runner before filing the main issue (#3810) Define agent scope and accelerated RC auto-merge policy (#3808) Replace custom MockClient with async-http Mock::Endpoint (#3703) Docs: per-request data sharing in RSC with React.cache() (#3769) Pro RSC: share unstable_cache across renderer workers via Redis (#3705) [codex] Add PR batch planning skill (#3792) Docs: document PR batch operational lessons (#3789) Document dummy Redux state indexing rationale (#3781) Pro RSC: avoid caching failed Flight renders (#3775) # Conflicts: # packages/react-on-rails-pro/tests/getReactServerComponent.client.test.ts

* origin/main: (23 commits) Enforce Pro license headers in CI and pre-commit (#3821) Add RSC payload route-data helper (#3783) [Pro] Fix React.cache request dedupe in generated RSC configs (#3813) Docs: clarify RuboCop autofix ownership (#3827) Add Pro license header checker RSC: stop serializing props into embedded payload cache key (#3800) Make PR batches skip customer-feedback issues (#3826) Name the regressed benchmark+measure pairs in the issue body (#3830) Clarify agent batch policy handoffs (#3824) Filter Bencher alerts to tracked measures (drop orphaned p90 false positives) (#3829) Fix auto-bundled component pack normalization (#3818) Filter stale Bencher alerts before reporting (#3822) Tighten benchmark confirmation workflow permissions (#3819) Add issue evaluation skill (#3816) Confirm benchmark regressions on a fresh runner before filing the main issue (#3810) Define agent scope and accelerated RC auto-merge policy (#3808) Replace custom MockClient with async-http Mock::Endpoint (#3703) Docs: per-request data sharing in RSC with React.cache() (#3769) Pro RSC: share unstable_cache across renderer workers via Redis (#3705) [codex] Add PR batch planning skill (#3792) ...

## Summary The revert gate for the temporary `/posts_page: Pro` regression-issue suppression has been verified PASSED (gate evidence: #3669 (comment) — the failure-era samples have rolled out of Bencher's 64-run t-test window and the baseline reflects real timings). This PR removes the suppression entry so a confirmed regression on `/posts_page: Pro` files an issue again. This is a **minimal revert**: `IGNORED_REGRESSION_BENCHMARKS` is emptied, but the general ignore machinery is kept (see decision log below). The issue's original "How to revert" steps are partially stale after #3670/#3810; this follows the updated plan. Closes #3669 ## Changes - `benchmarks/lib/regression_report.rb`: `IGNORED_REGRESSION_BENCHMARKS` is now `[].freeze`; the /posts_page-specific temporary-suppression comment block is replaced with a short doc of the general-purpose mechanism (exact Bencher name matching, pre-rerun short-circuit, tracking-issue requirement for any future entry). - Specs that read `IGNORED_REGRESSION_BENCHMARKS.first` (regression_report_spec, plan_confirmation_spec, and the confirmation-outcome example in track_benchmarks_spec) now `stub_const` a sample entry (`"/ignored: Pro"`) so they keep pinning the machinery against the now-empty production list. No coverage was deleted. - `"/posts_page: Pro"` remains in track_benchmarks_spec only as an inert fixture name in report-parsing examples. ## Validation - `bundle exec rspec benchmarks/spec/regression_report_spec.rb benchmarks/spec/plan_confirmation_spec.rb benchmarks/spec/track_benchmarks_spec.rb` (root bundle) → 63 examples, 0 failures - `(cd react_on_rails && BUNDLE_GEMFILE=../Gemfile bundle exec rubocop)` → 214 files, no offenses - `BUNDLE_GEMFILE=Gemfile bundle exec rubocop benchmarks/` → 36 files, no offenses - `script/ci-changes-detector origin/main` → Benchmark scripts; recommended job: Lint (Ruby + JS) - `git grep "3669"` / `git grep "posts_page: Pro"` → no live suppression references remain (fixture-only uses kept) - Codex review: skipped — CLI usage limit reached (resets Jun 10 6:30 PM); self-review performed instead Labels: none — this changes benchmark regression-reporting tooling only, fully validated by its specs; it is not a performance-sensitive runtime path, and a benchmark run would not exercise the suppression branch (it only activates when a Bencher alert fires on a main push). ## Codex Decision Log - **Minimal revert scope (coordinator decision):** emptied the ignore list but KEPT `ignored_benchmark?`, `actionable_alerts`, `actionable_benchmarks`, and plan_confirmation.rb's `fully_ignored?`/suppressed handling. Why: #3810 made the ignore list load-bearing general plumbing in the confirmation gate; full machinery removal is out of scope for #3669. Maintainers may revisit whether the machinery should be removed entirely if no new suppression is ever needed. - **Specs converted to `stub_const`** instead of deleted: the examples previously sampled the live list (`.first`), which is now empty. Stubbing `RegressionReport::IGNORED_REGRESSION_BENCHMARKS` keeps the machinery pinned independent of production contents. Note: track_benchmarks_spec.rb line 332 also sampled the live list (the issue's plan assumed it was fixture-only — stale), so it got the same conversion to avoid a nil-driven false `:confirmed` outcome. - **plan_confirmation.rb comments left untouched:** they document the general mechanism and never claim an active suppression. 🤖 Generated with [Claude Code](https://claude.com/claude-code)  --- > [!NOTE] > **Low Risk** > Benchmark CI reporting tooling only; no runtime app behavior, and specs still cover the ignore path via stubs. > > **Overview** > Reverts the temporary **`/posts_page: Pro`** regression-issue suppression (#3669): **`IGNORED_REGRESSION_BENCHMARKS`** is now **`[]`**, so confirmed regressions on that benchmark can open issues again. The long `/posts_page`-specific comment is replaced with a short description of the general ignore mechanism (exact Bencher names, pre-rerun short-circuit, tracking issue required for future entries). > > Specs that previously used **`IGNORED_REGRESSION_BENCHMARKS.first`** now **`stub_const`** a sample **`"/ignored: Pro"`** so ignore-list behavior stays covered while production has no active suppressions. > > Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit a79526d. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).   ## Summary by CodeRabbit * **Chores** * Cleared the regression benchmark suppression list; all confirmed regressions will now be reported. * Updated test setup with deterministic test data to ensure stable regression report behavior.  ## Agent Merge Confidence Mode: accelerated-rc Score: 9/10 Auto-merge recommendation: yes Affected areas: benchmarks regression-reporting tooling (no runtime/perf surface) CI detector: `script/ci-changes-detector origin/main` -> Benchmark scripts; recommended job: Lint (Ruby + JS) Validation run: - `bundle exec rspec benchmarks/spec/regression_report_spec.rb benchmarks/spec/plan_confirmation_spec.rb benchmarks/spec/track_benchmarks_spec.rb` -> 63 examples, 0 failures - `(cd react_on_rails && BUNDLE_GEMFILE=../Gemfile bundle exec rubocop)` -> no offenses; `BUNDLE_GEMFILE=Gemfile bundle exec rubocop benchmarks/` -> no offenses Review/check gate: - Claude review: complete for a79526d (claude-review SUCCESS), no confirmed blocker; one OPTIONAL spec-scoping nit triaged with reply + resolved - Codex review: skipped — CLI usage limit (resets Jun 10 6:30 PM), evidence in PR body; claude-review served as the independent gate - GitHub checks: complete for a79526d, all pass; skips are path-selection per detector Known residual risk: none — suppression-list removal is fully covered by the updated specs; 1-point deduction for the skipped secondary Codex pass Finalized by: Claude Code Review check `claude-review` (SUCCESS for head a79526d, GitHub Checks API record) Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

cursor Bot reviewed Jun 8, 2026

View reviewed changes

Comment thread benchmarks/track_benchmarks.rb

coderabbitai Bot requested changes Jun 8, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 8, 2026

View reviewed changes

claude Bot reviewed Jun 8, 2026

View reviewed changes

Treat empty benchmark confirmations as inconclusive

985121d

coderabbitai Bot requested changes Jun 8, 2026

View reviewed changes

claude Bot reviewed Jun 8, 2026

View reviewed changes

justin808 merged commit 25b94cc into main Jun 8, 2026
55 checks passed

justin808 deleted the justin808/fix-issue-3670 branch June 8, 2026 23:30

This was referenced Jun 8, 2026

Gumroad public comparison repo: React 19 + Rspack + React on Rails Pro RSC findings #3128

Open

Follow-up: Tighten benchmark CI workflow permissions (PR #3810) #3815

Closed

Revert the temporary /posts_page: Pro regression-issue suppression #3669

Closed

This was referenced Jun 9, 2026

Tighten benchmark confirmation workflow permissions #3819

Merged

Filter stale Bencher alerts before reporting #3822

Merged

This was referenced Jun 9, 2026

Filter Bencher alerts to tracked measures (drop orphaned p90 false positives) #3829

Merged

Performance Regression Detected on main (24205ff) #3828

Closed

Performance Regression Detected on main (2c2cbb7) #3807

Closed

justin808 mentioned this pull request Jun 9, 2026

Research performance-regression issue posting explosion #3755

Closed

justin808 mentioned this pull request Jun 10, 2026

Remove temporary /posts_page: Pro regression-issue suppression (#3669 revert) #3853

Merged

coderabbitai Bot mentioned this pull request Jun 22, 2026

Split benchmark tracking script #4176

Merged

-  alerts = [] unless alerts.is_a?(Array)
+  alerts = parsed[RegressionReport::ALERTS]
+  # Treat absent/empty ALERTS as inconclusive: without matchable pairs the
+  # confirmation can neither confirm nor clear on evidence, only on speculation.
+  unless alerts.is_a?(Array) && !alerts.empty?
+    warn "::error::Candidate has empty or missing ALERTS (#{RegressionReport::ALERTS}); " \
+         "treating the confirmation as inconclusive."
+    return [nil, parsed[RegressionReport::SUMMARY].to_s]
+  end

-    exit 1 unless write_candidate(report, report_markdown, suite_name)
+    # write_candidate succeeded: the alert is now a non-fatal candidate awaiting
+    # fresh-runner confirmation. Exit 0 so this suite stays green; report-regressions
+    # owns the final pass/fail.
+    exit 1 unless write_candidate(report, report_markdown, suite_name)
+    # (write_candidate returned true — fall through to exit 0)

-  File.write(path, "#{name}=#{value}\n", mode: "a") if path && !path.empty?
+  if path && !path.empty?
+    # Use the multiline heredoc delimiter so embedded newlines in `value`
+    # (e.g. if JSON ever becomes pretty-printed) don't corrupt the output file.
+    File.open(path, "a") { |f| f.write("#{name}<<__GHEOF__\n#{value}\n__GHEOF__\n") }
+  end

-      issues: write
+      # issues: write and pull-requests: write are used by initial mode only
+      # (track_benchmarks.rb posts PR summary comments). confirm mode does not post
+      # to PRs or file issues, but workflow YAML does not support conditional
+      # permission blocks, so both modes inherit the full set.
+      issues: write
+      pull-requests: write

Uh oh!

Uh oh!

Conversation

justin808 commented Jun 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Pipeline (benchmark.yml)

Behavior

Key implementation notes

Out of scope (sequenced follow-up, per #3670)

Known tradeoff (acknowledged in the issue)

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

claude Bot commented Jun 8, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented Jun 8, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Uh oh!

greptile-apps Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

justin808 commented Jun 8, 2026 •

edited by coderabbitai Bot

Loading

Pipeline (`benchmark.yml`)

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading