Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) by justin808 · Pull Request #4013 · shakacode/react_on_rails

justin808 · 2026-06-14T03:10:59Z

Problem

Benchmarks dominate PR CI time — far more than any other check. Measured on real runs:

Each benchmark suite/shard job runs ~24–27 min, and a PR runs three of them (Core + Pro shard 1/2 + Pro shard 2/2) → ~76 billable minutes/PR.
Step-level timing of a 23.5-min Core job: "Execute benchmark suite" = 21.7 min (92%); all checkout/build/bundle/asset/boot/Bencher steps combined are ~1.8 min. It's the load test itself, not setup.
Cause: the suite benchmarks ~37 routes/shard serially, each route = ~5s warm-up (10 requests × sleep 0.5) + a 30s k6 load test ≈ 35s/route.

Crucially, per-PR benchmarks are informational only — the regression gate, confirmation rerun, and issue filing are all push-to-main only (benchmark.yml). On a PR the job just posts a summary comment, and single-run deltas on shared CI runners are noise-dominated (the thresholds are deliberately wide and correctly refuse to flag them). So we spent ~76 min/PR to produce a comment no one can act on with confidence.

Changes

Per-PR benchmarks are now opt-in. suite_requested_by_event? selects suites on a PR only when a benchmark* label is present (benchmark, benchmark-core, benchmark-pro, benchmark-pro-node-renderer). Change detection (run_output / RUN_*_BENCHMARKS) no longer auto-triggers them. push to main and workflow_dispatch are unchanged, so main-branch tracking and the confirmation/issue-filing gate are untouched. Removed the now-dead run_output keys and the RUN_*_BENCHMARKS workflow env plumbing that fed them.
Trimmed the per-route warm-up from 10×0.5s (5s/route) to 3×0.2s (~1s/route) in benchmarks/bench.rb. At ~37 routes/shard the old warm-up was pure CI time with no effect on the measured 30s k6 window; the load test self-warms remaining workers in its first fraction of a second.

Anyone wanting a perf check on a specific PR adds the benchmark (or a suite-specific) label.

Not in scope (deliberately)

Shortening the 30s k6 DURATION — widens per-run noise on slow routes; needs threshold re-validation (Benchmark CI: tune thresholds / reduce flakiness so the main gate can be re-enabled #3169).
More sharding — cuts wall-clock but not billable minutes, and worsens runner-queue contention (shards already queue 60+ min on busy days).

Testing

bundle exec rspec benchmarks/spec → 271 examples, 0 failures (updated the one matrix test that asserted the old change-detection trigger).
bundle exec rubocop benchmarks/... → clean.
actionlint .github/workflows/benchmark.yml → clean; YAML parses.
Verified matrix selection end-to-end: PR + no label (change-detection on) → ["none"]; PR + benchmark label → all suites; push → all suites.

Docs updated: AGENTS.md PR-label guidance and the full-ci-no-benchmarks suppress-label comment.

Labels: full-ci — this is a CI-workflow/detector change, so run the broad test matrix; it does not need benchmarks.

Merge criteria (self-merge by Claude Code)

I'll squash-merge this once all of the following hold:

Base is healthy — main is not red at merge time (not stacking onto a broken base).
All non-skipped PR checks are green — the full-ci test matrix, lint/RuboCop, and the Benchmark Workflow's detect-changes (which runs bundle exec rspec benchmarks/spec) all conclude success. The only acceptable non-success states are the benchmark suite jobs being skipped (see Add linting and CI scripts #3) and the superseded push-run jobs that concurrency cancelled.
The behavioral change is validated on this PR — the Core/Pro benchmarks … run-suite jobs are absent/skipped because this PR has no benchmark* label, proving the opt-in gate works. ✅ already confirmed.
No unresolved review feedback requesting changes, and mergeable stays true.

Not required: an actual labeled benchmark suite run on this PR. The matrix-selection logic and the warm-up code path are covered by benchmarks/spec (271 examples), every selection scenario was verified locally, and the next main push exercises the full suite + new warm-up end-to-end — spending ~67 min to run a suite here would re-introduce the exact cost this PR removes.

Fixes #4012

🤖 Generated with Claude Code

Benchmarks dominated PR CI time at ~76 billable min/PR (three ~24-27 min suite jobs), and step timing showed 92% of each job was the load test itself, not build/setup. Per-PR benchmark runs are informational only — the regression gate, confirmation rerun, and issue filing are all main-push only — and single-run deltas on shared CI runners are noise-dominated, so auto-running the full suite on every code change cost far more than it informed. 1. Per-PR benchmarks are now opt-in. suite_requested_by_event? selects suites on a PR only when a benchmark* label is present; change detection (run_output / RUN_*_BENCHMARKS) no longer auto-triggers them. push to main and workflow_dispatch are unchanged, so main-branch tracking and the confirmation/issue-filing gate are untouched. Removed the now-dead run_output keys and the RUN_*_BENCHMARKS workflow plumbing that fed them. 2. Trimmed the per-route warm-up from 10x0.5s (5s/route) to 3x0.2s (~1s/route). At ~37 routes/shard the old warm-up was pure CI time with no effect on the measured 30s k6 window; the load test self-warms remaining workers. Updated AGENTS.md PR-label guidance, the suppress-label comment, and the matrix spec to match. Fixes #4012 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector · 2026-06-14T03:11:04Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

coderabbitai · 2026-06-14T03:11:41Z

Walkthrough

Per-PR benchmark suite selection is changed from change-detection-driven auto-inclusion to label-only opt-in. The run_output fields are removed from all SUITES entries in generate_matrix.rb, the suite_requested_by_event? PR path now checks only labels, the corresponding RUN_*_BENCHMARKS env vars are removed from the workflow, and per-route warm-up is trimmed from 10×0.5s to 3×0.2s.

Changes

Benchmark opt-in and warm-up trim

Layer / File(s)	Summary
Suite selection: remove `run_output` and label-only PR gating `benchmarks/generate_matrix.rb`, `benchmarks/spec/benchmark_matrix_spec.rb`	Removes `run_output` keys from all three `SUITES` entries; updates `suite_requested_by_event?` so `pull_request` events select suites only by label intersection (no `truthy_env?`/run-output path). Replaces the spec that expected Pro suites to run on `RUN_PRO_BENCHMARKS` alone with one asserting `suite_id: "none"` when no benchmark label is present.
*Workflow: remove `RUN__BENCHMARKS` env vars and update comments** `.github/workflows/benchmark.yml`	Removes `RUN_CORE_BENCHMARKS`, `RUN_PRO_BENCHMARKS`, and `RUN_PRO_NODE_RENDERER_BENCHMARKS` from the benchmark-matrices step. Updates `detect-changes` and suite-trigger comments to describe label-driven opt-in for PRs and non-fatal main-push alerts.
Warm-up reduction and docs update `benchmarks/bench.rb`, `AGENTS.md`	Reduces per-route warm-up in `benchmark_route` from 10 requests × `sleep 0.5` to 3 requests × `sleep 0.2`. Updates `AGENTS.md` with expanded `benchmark`/`benchmark-*` label descriptions and `full-ci-no-benchmarks` override semantics.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

shakacode/react_on_rails#3950: Modifies benchmarks/generate_matrix.rb suite selection logic for pull_request events, including label-driven gating and full-ci-no-benchmarks suppression — the same code paths changed in this PR.
shakacode/react_on_rails#3747: Introduced the detect-changes → RUN_*_BENCHMARKS env var wiring in .github/workflows/benchmark.yml that this PR removes.
shakacode/react_on_rails#3528: Unified benchmark matrix generation in generate_matrix.rb, establishing the SUITES/suite_requested_by_event? structure that this PR modifies.

Suggested labels

enhancement, benchmark, documentation

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the two main changes: making per-PR benchmarks opt-in via labels and reducing per-route warm-up overhead.
Linked Issues check	✅ Passed	All code changes directly address issue `#4012`'s objectives: opt-in PR benchmarks via label selection logic and warm-up reduction from 10×0.5s to 3×0.2s.
Out of Scope Changes check	✅ Passed	All changes are scoped to the linked issue: benchmark workflow configuration, documentation, matrix generation logic, warm-up timing, and related tests.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch jg/4012-benchmark-pr-opt-in

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

greptile-apps · 2026-06-14T03:13:10Z

Greptile Summary

This PR makes per-PR benchmarks opt-in via a benchmark* label (replacing automatic change-detection triggering), and trims the per-route warm-up from 10×0.5s to 3×0.2s in bench.rb. Push-to-main and workflow_dispatch paths are unchanged, preserving the regression gate.

Removed run_output keys from all three suite definitions and the corresponding RUN_*_BENCHMARKS env vars from the workflow, replacing the truthy_env? check in suite_requested_by_event? with a label-only check for PRs.
Warm-up trimmed from 5s/route to ~0.6s/route with a clear comment explaining that the 30s k6 window absorbs first-request latency.
One spec updated to assert the new opt-in behavior; two other suppression-label tests retain now-dead RUN_*_BENCHMARKS env vars that are harmless but stale.

Confidence Score: 4/5

Safe to merge — the main-branch regression gate is untouched and the opt-in mechanism is well-tested; the only rough edge is a couple of stale env vars in two spec examples.

The core logic change — removing truthy_env? + run_output from PR suite selection — is small, clearly correct, and directly validated by the updated spec. The warm-up reduction is straightforward arithmetic. The only rough edge is that two existing suppression-label tests still pass RUN_*_BENCHMARKS env vars that generate_matrix.rb no longer reads; those vars are now silent no-ops, so the tests pass but their descriptions and setup are misleading for future readers.

benchmarks/spec/benchmark_matrix_spec.rb — the two full-ci-no-benchmarks suppression examples still reference the removed RUN_* env vars.

Important Files Changed

Filename	Overview
.github/workflows/benchmark.yml	Removed three dead RUN_*_BENCHMARKS env vars from the Set benchmark matrices step and updated comments to reflect the new opt-in model; no logic changed in the workflow itself.
AGENTS.md	Documentation updated to reflect opt-in benchmark labels and clarify full-ci-no-benchmarks is now an override for an explicit benchmark* label rather than a change-detection override.
benchmarks/bench.rb	Trimmed per-route warm-up from 10×0.5s (5s) to 3×0.2s (~0.6s); comment explains rationale that the 30s k6 window self-warms remaining workers.
benchmarks/generate_matrix.rb	Removed run_output keys from all three SUITES and replaced truthy_env?(suite.fetch(:run_output))
benchmarks/spec/benchmark_matrix_spec.rb	Updated one test description to document the new opt-in behavior, but two existing suppression-label tests still pass RUN_*_BENCHMARKS env vars that are now no-ops, leaving stale setup and a misleading description.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Benchmark workflow triggered] --> B{Event type?}
    B -->|push to main| C[All suites selected]
    B -->|workflow_dispatch| C
    B -->|pull_request| D{Same-repo PR?}
    D -->|No - fork| E[No suites - placeholder row]
    D -->|Yes| F{full-ci-no-benchmarks label?}
    F -->|Yes| E
    F -->|No| G{BENCHMARK_NON_RUNTIME_ONLY?}
    G -->|true| E
    G -->|false| H{benchmark* label present?}
    H -->|No label| E
    H -->|benchmark| I[All suites selected]
    H -->|benchmark-core| J[Core suite only]
    H -->|benchmark-pro| K[Pro + Pro Node Renderer suites]
    H -->|benchmark-pro-node-renderer| L[Pro Node Renderer suite only]
    C --> M[suite_enabled? filtered by app_version]
    I --> M
    J --> M
    K --> M
    L --> M
    M --> N[build_matrix: shard rows emitted]
    E --> O[empty_matrix_row: suite_id=none]

Comments Outside Diff (1)

benchmarks/spec/benchmark_matrix_spec.rb, line 87-116 (link)

Stale env vars and misleading description in suppression tests

Two examples still pass RUN_CORE_BENCHMARKS, RUN_PRO_BENCHMARKS, and RUN_PRO_NODE_RENDERER_BENCHMARKS as env vars, but generate_matrix.rb no longer reads them (the run_output key and its truthy_env? call were removed). Those vars are now silent no-ops, so the setup doesn't actually exercise the case described by "suppresses every suite even when change detection requested them." A reader will wonder why those vars are there, and a future developer may mistakenly believe removing them would break something. The descriptions in both examples should be updated and the dead env-var lines dropped.

_{Reviews (1): Last reviewed commit: "Make per-PR benchmarks opt-in and trim p..." | Re-trigger Greptile}

…error-callbacks * origin/main: Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013) Treat docs-internal/ tree as documentation in CI change detection (#4016) Add issue triage prompt skill (#3983) Point coordination docs at agent-coord bootstrap (#4008) Docs: backfill async RSC manifest changelog entry (#3993) Migrate conductor.json to .conductor/settings.toml (#4007) Bump react-hooks lint to v6 and document RSC compiler boundary (#3963) # Conflicts: # llms-full.txt

* origin/main: Fix Pro dummy lockfile drift and rich text demo (#3989) Add FOUC integration tests for generated CSS (#4005) Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013) Treat docs-internal/ tree as documentation in CI change detection (#4016) Add issue triage prompt skill (#3983) Point coordination docs at agent-coord bootstrap (#4008) Docs: backfill async RSC manifest changelog entry (#3993) Migrate conductor.json to .conductor/settings.toml (#4007) Bump react-hooks lint to v6 and document RSC compiler boundary (#3963) # Conflicts: # llms-full.txt

…t192-phase1 * origin/main: (43 commits) Split llms-full.txt into OSS and Pro tiers to clear the 2048 KiB gate (#4021) Codify maintainer attention contract (#3987) Docs: correct vm.Script caching analysis caveats (#3997) CI: handle unavailable PR head repo for +ci-run-full (#3986) Fix Pro dummy lockfile drift and rich text demo (#3989) Add FOUC integration tests for generated CSS (#4005) Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013) Treat docs-internal/ tree as documentation in CI change detection (#4016) Add issue triage prompt skill (#3983) Point coordination docs at agent-coord bootstrap (#4008) Docs: backfill async RSC manifest changelog entry (#3993) Migrate conductor.json to .conductor/settings.toml (#4007) Bump react-hooks lint to v6 and document RSC compiler boundary (#3963) Docs: add OSS flagship demo to examples gallery (closes #3876) (#3959) Docs: rewrite RSC CSS guide with FOUC pipeline and per-approach sections (#3992) Docs: add 21 animated SVG diagrams for React Server Components (#3806) Fix Rspack generator: missing @rspack/dev-server and wrong ReactRefreshPlugin import (#3926) Docs: add React Performance Tracks profiling guide (#3961) Add cross-agent coordination backend workflow (#3977) Validate llms sidebar coverage and hosted pointers (#3960) ...

…e-demo-cost-docs * origin/main: Add RSC FOUC acceptance coverage (#4033) Keep plan-pr-batch goal prompts under 4000 chars (#4025) docs(skills): file-collision check + goal-prompt size discipline for plan-pr-batch (#3914) Verify React 19.2 <Activity> with react_component (CSR + SSR-hydrate) + docs (#3938) Allow Pro RSC peer check for React 19.2 (#4026) Split llms-full.txt into OSS and Pro tiers to clear the 2048 KiB gate (#4021) Codify maintainer attention contract (#3987) Docs: correct vm.Script caching analysis caveats (#3997) CI: handle unavailable PR head repo for +ci-run-full (#3986) Fix Pro dummy lockfile drift and rich text demo (#3989) Add FOUC integration tests for generated CSS (#4005) Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013) Treat docs-internal/ tree as documentation in CI change detection (#4016) Add issue triage prompt skill (#3983) Point coordination docs at agent-coord bootstrap (#4008) Docs: backfill async RSC manifest changelog entry (#3993) Migrate conductor.json to .conductor/settings.toml (#4007) Bump react-hooks lint to v6 and document RSC compiler boundary (#3963)

justin808 added the full-ci label Jun 14, 2026

coderabbitai Bot approved these changes Jun 14, 2026

View reviewed changes

justin808 merged commit 93fa46b into main Jun 14, 2026
23 of 26 checks passed

justin808 deleted the jg/4012-benchmark-pr-opt-in branch June 14, 2026 03:47

justin808 mentioned this pull request Jun 14, 2026

CI: handle unavailable PR head repo for +ci-run-full #3986

Merged

justin808 mentioned this pull request Jun 14, 2026

Release gate: react_on_rails 17.0.0 #3823

Open

22 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Make per-PR benchmarks opt-in and trim per-route warm-up (#4012)#4013

Make per-PR benchmarks opt-in and trim per-route warm-up (#4012)#4013
justin808 merged 1 commit into
mainfrom
jg/4012-benchmark-pr-opt-in

justin808 commented Jun 14, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot commented Jun 14, 2026

Uh oh!

coderabbitai Bot commented Jun 14, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Jun 14, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Uh oh!

Conversation

justin808 commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

Not in scope (deliberately)

Testing

Merge criteria (self-merge by Claude Code)

Uh oh!

chatgpt-codex-connector Bot commented Jun 14, 2026

Uh oh!

coderabbitai Bot commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

greptile-apps Bot commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

justin808 commented Jun 14, 2026 •

edited

Loading

coderabbitai Bot commented Jun 14, 2026 •

edited

Loading

greptile-apps Bot commented Jun 14, 2026 •

edited

Loading