Skip to content

Make per-PR benchmarks opt-in and trim per-route warm-up (#4012)#4013

Merged
justin808 merged 1 commit into
mainfrom
jg/4012-benchmark-pr-opt-in
Jun 14, 2026
Merged

Make per-PR benchmarks opt-in and trim per-route warm-up (#4012)#4013
justin808 merged 1 commit into
mainfrom
jg/4012-benchmark-pr-opt-in

Conversation

@justin808

@justin808 justin808 commented Jun 14, 2026

Copy link
Copy Markdown
Member

Problem

Benchmarks dominate PR CI time — far more than any other check. Measured on real runs:

  • Each benchmark suite/shard job runs ~24–27 min, and a PR runs three of them (Core + Pro shard 1/2 + Pro shard 2/2) → ~76 billable minutes/PR.
  • Step-level timing of a 23.5-min Core job: "Execute benchmark suite" = 21.7 min (92%); all checkout/build/bundle/asset/boot/Bencher steps combined are ~1.8 min. It's the load test itself, not setup.
  • Cause: the suite benchmarks ~37 routes/shard serially, each route = ~5s warm-up (10 requests × sleep 0.5) + a 30s k6 load test ≈ 35s/route.

Crucially, per-PR benchmarks are informational only — the regression gate, confirmation rerun, and issue filing are all push-to-main only (benchmark.yml). On a PR the job just posts a summary comment, and single-run deltas on shared CI runners are noise-dominated (the thresholds are deliberately wide and correctly refuse to flag them). So we spent ~76 min/PR to produce a comment no one can act on with confidence.

Changes

  1. Per-PR benchmarks are now opt-in. suite_requested_by_event? selects suites on a PR only when a benchmark* label is present (benchmark, benchmark-core, benchmark-pro, benchmark-pro-node-renderer). Change detection (run_output / RUN_*_BENCHMARKS) no longer auto-triggers them. push to main and workflow_dispatch are unchanged, so main-branch tracking and the confirmation/issue-filing gate are untouched. Removed the now-dead run_output keys and the RUN_*_BENCHMARKS workflow env plumbing that fed them.
  2. Trimmed the per-route warm-up from 10×0.5s (5s/route) to 3×0.2s (~1s/route) in benchmarks/bench.rb. At ~37 routes/shard the old warm-up was pure CI time with no effect on the measured 30s k6 window; the load test self-warms remaining workers in its first fraction of a second.

Anyone wanting a perf check on a specific PR adds the benchmark (or a suite-specific) label.

Not in scope (deliberately)

Testing

  • bundle exec rspec benchmarks/spec → 271 examples, 0 failures (updated the one matrix test that asserted the old change-detection trigger).
  • bundle exec rubocop benchmarks/... → clean.
  • actionlint .github/workflows/benchmark.yml → clean; YAML parses.
  • Verified matrix selection end-to-end: PR + no label (change-detection on) → ["none"]; PR + benchmark label → all suites; push → all suites.

Docs updated: AGENTS.md PR-label guidance and the full-ci-no-benchmarks suppress-label comment.

Labels: full-ci — this is a CI-workflow/detector change, so run the broad test matrix; it does not need benchmarks.

Merge criteria (self-merge by Claude Code)

I'll squash-merge this once all of the following hold:

  1. Base is healthymain is not red at merge time (not stacking onto a broken base).
  2. All non-skipped PR checks are green — the full-ci test matrix, lint/RuboCop, and the Benchmark Workflow's detect-changes (which runs bundle exec rspec benchmarks/spec) all conclude success. The only acceptable non-success states are the benchmark suite jobs being skipped (see Add linting and CI scripts #3) and the superseded push-run jobs that concurrency cancelled.
  3. The behavioral change is validated on this PR — the Core/Pro benchmarks … run-suite jobs are absent/skipped because this PR has no benchmark* label, proving the opt-in gate works. ✅ already confirmed.
  4. No unresolved review feedback requesting changes, and mergeable stays true.

Not required: an actual labeled benchmark suite run on this PR. The matrix-selection logic and the warm-up code path are covered by benchmarks/spec (271 examples), every selection scenario was verified locally, and the next main push exercises the full suite + new warm-up end-to-end — spending ~67 min to run a suite here would re-introduce the exact cost this PR removes.

Fixes #4012

🤖 Generated with Claude Code

Benchmarks dominated PR CI time at ~76 billable min/PR (three ~24-27 min suite
jobs), and step timing showed 92% of each job was the load test itself, not
build/setup. Per-PR benchmark runs are informational only — the regression gate,
confirmation rerun, and issue filing are all main-push only — and single-run
deltas on shared CI runners are noise-dominated, so auto-running the full suite
on every code change cost far more than it informed.

1. Per-PR benchmarks are now opt-in. suite_requested_by_event? selects suites on
   a PR only when a benchmark* label is present; change detection (run_output /
   RUN_*_BENCHMARKS) no longer auto-triggers them. push to main and
   workflow_dispatch are unchanged, so main-branch tracking and the
   confirmation/issue-filing gate are untouched. Removed the now-dead run_output
   keys and the RUN_*_BENCHMARKS workflow plumbing that fed them.

2. Trimmed the per-route warm-up from 10x0.5s (5s/route) to 3x0.2s (~1s/route).
   At ~37 routes/shard the old warm-up was pure CI time with no effect on the
   measured 30s k6 window; the load test self-warms remaining workers.

Updated AGENTS.md PR-label guidance, the suppress-label comment, and the matrix
spec to match.

Fixes #4012

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

Per-PR benchmark suite selection is changed from change-detection-driven auto-inclusion to label-only opt-in. The run_output fields are removed from all SUITES entries in generate_matrix.rb, the suite_requested_by_event? PR path now checks only labels, the corresponding RUN_*_BENCHMARKS env vars are removed from the workflow, and per-route warm-up is trimmed from 10×0.5s to 3×0.2s.

Changes

Benchmark opt-in and warm-up trim

Layer / File(s) Summary
Suite selection: remove run_output and label-only PR gating
benchmarks/generate_matrix.rb, benchmarks/spec/benchmark_matrix_spec.rb
Removes run_output keys from all three SUITES entries; updates suite_requested_by_event? so pull_request events select suites only by label intersection (no truthy_env?/run-output path). Replaces the spec that expected Pro suites to run on RUN_PRO_BENCHMARKS alone with one asserting suite_id: "none" when no benchmark label is present.
Workflow: remove RUN_*_BENCHMARKS env vars and update comments
.github/workflows/benchmark.yml
Removes RUN_CORE_BENCHMARKS, RUN_PRO_BENCHMARKS, and RUN_PRO_NODE_RENDERER_BENCHMARKS from the benchmark-matrices step. Updates detect-changes and suite-trigger comments to describe label-driven opt-in for PRs and non-fatal main-push alerts.
Warm-up reduction and docs update
benchmarks/bench.rb, AGENTS.md
Reduces per-route warm-up in benchmark_route from 10 requests × sleep 0.5 to 3 requests × sleep 0.2. Updates AGENTS.md with expanded benchmark/benchmark-* label descriptions and full-ci-no-benchmarks override semantics.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • shakacode/react_on_rails#3950: Modifies benchmarks/generate_matrix.rb suite selection logic for pull_request events, including label-driven gating and full-ci-no-benchmarks suppression — the same code paths changed in this PR.
  • shakacode/react_on_rails#3747: Introduced the detect-changesRUN_*_BENCHMARKS env var wiring in .github/workflows/benchmark.yml that this PR removes.
  • shakacode/react_on_rails#3528: Unified benchmark matrix generation in generate_matrix.rb, establishing the SUITES/suite_requested_by_event? structure that this PR modifies.

Suggested labels

enhancement, benchmark, documentation

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the two main changes: making per-PR benchmarks opt-in via labels and reducing per-route warm-up overhead.
Linked Issues check ✅ Passed All code changes directly address issue #4012's objectives: opt-in PR benchmarks via label selection logic and warm-up reduction from 10×0.5s to 3×0.2s.
Out of Scope Changes check ✅ Passed All changes are scoped to the linked issue: benchmark workflow configuration, documentation, matrix generation logic, warm-up timing, and related tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch jg/4012-benchmark-pr-opt-in

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Copy link
Copy Markdown

Greptile Summary

This PR makes per-PR benchmarks opt-in via a benchmark* label (replacing automatic change-detection triggering), and trims the per-route warm-up from 10×0.5s to 3×0.2s in bench.rb. Push-to-main and workflow_dispatch paths are unchanged, preserving the regression gate.

  • Removed run_output keys from all three suite definitions and the corresponding RUN_*_BENCHMARKS env vars from the workflow, replacing the truthy_env? check in suite_requested_by_event? with a label-only check for PRs.
  • Warm-up trimmed from 5s/route to ~0.6s/route with a clear comment explaining that the 30s k6 window absorbs first-request latency.
  • One spec updated to assert the new opt-in behavior; two other suppression-label tests retain now-dead RUN_*_BENCHMARKS env vars that are harmless but stale.

Confidence Score: 4/5

Safe to merge — the main-branch regression gate is untouched and the opt-in mechanism is well-tested; the only rough edge is a couple of stale env vars in two spec examples.

The core logic change — removing truthy_env? + run_output from PR suite selection — is small, clearly correct, and directly validated by the updated spec. The warm-up reduction is straightforward arithmetic. The only rough edge is that two existing suppression-label tests still pass RUN_*_BENCHMARKS env vars that generate_matrix.rb no longer reads; those vars are now silent no-ops, so the tests pass but their descriptions and setup are misleading for future readers.

benchmarks/spec/benchmark_matrix_spec.rb — the two full-ci-no-benchmarks suppression examples still reference the removed RUN_* env vars.

Important Files Changed

Filename Overview
.github/workflows/benchmark.yml Removed three dead RUN_*_BENCHMARKS env vars from the Set benchmark matrices step and updated comments to reflect the new opt-in model; no logic changed in the workflow itself.
AGENTS.md Documentation updated to reflect opt-in benchmark labels and clarify full-ci-no-benchmarks is now an override for an explicit benchmark* label rather than a change-detection override.
benchmarks/bench.rb Trimmed per-route warm-up from 10×0.5s (5s) to 3×0.2s (~0.6s); comment explains rationale that the 30s k6 window self-warms remaining workers.
benchmarks/generate_matrix.rb Removed run_output keys from all three SUITES and replaced truthy_env?(suite.fetch(:run_output))
benchmarks/spec/benchmark_matrix_spec.rb Updated one test description to document the new opt-in behavior, but two existing suppression-label tests still pass RUN_*_BENCHMARKS env vars that are now no-ops, leaving stale setup and a misleading description.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Benchmark workflow triggered] --> B{Event type?}
    B -->|push to main| C[All suites selected]
    B -->|workflow_dispatch| C
    B -->|pull_request| D{Same-repo PR?}
    D -->|No - fork| E[No suites - placeholder row]
    D -->|Yes| F{full-ci-no-benchmarks label?}
    F -->|Yes| E
    F -->|No| G{BENCHMARK_NON_RUNTIME_ONLY?}
    G -->|true| E
    G -->|false| H{benchmark* label present?}
    H -->|No label| E
    H -->|benchmark| I[All suites selected]
    H -->|benchmark-core| J[Core suite only]
    H -->|benchmark-pro| K[Pro + Pro Node Renderer suites]
    H -->|benchmark-pro-node-renderer| L[Pro Node Renderer suite only]
    C --> M[suite_enabled? filtered by app_version]
    I --> M
    J --> M
    K --> M
    L --> M
    M --> N[build_matrix: shard rows emitted]
    E --> O[empty_matrix_row: suite_id=none]
Loading

Comments Outside Diff (1)

  1. benchmarks/spec/benchmark_matrix_spec.rb, line 87-116 (link)

    P2 Stale env vars and misleading description in suppression tests

    Two examples still pass RUN_CORE_BENCHMARKS, RUN_PRO_BENCHMARKS, and RUN_PRO_NODE_RENDERER_BENCHMARKS as env vars, but generate_matrix.rb no longer reads them (the run_output key and its truthy_env? call were removed). Those vars are now silent no-ops, so the setup doesn't actually exercise the case described by "suppresses every suite even when change detection requested them." A reader will wonder why those vars are there, and a future developer may mistakenly believe removing them would break something. The descriptions in both examples should be updated and the dead env-var lines dropped.

Reviews (1): Last reviewed commit: "Make per-PR benchmarks opt-in and trim p..." | Re-trigger Greptile

@justin808 justin808 merged commit 93fa46b into main Jun 14, 2026
23 of 26 checks passed
@justin808 justin808 deleted the jg/4012-benchmark-pr-opt-in branch June 14, 2026 03:47
justin808 added a commit that referenced this pull request Jun 14, 2026
…error-callbacks

* origin/main:
  Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013)
  Treat docs-internal/ tree as documentation in CI change detection (#4016)
  Add issue triage prompt skill (#3983)
  Point coordination docs at agent-coord bootstrap (#4008)
  Docs: backfill async RSC manifest changelog entry (#3993)
  Migrate conductor.json to .conductor/settings.toml (#4007)
  Bump react-hooks lint to v6 and document RSC compiler boundary (#3963)

# Conflicts:
#	llms-full.txt
justin808 added a commit that referenced this pull request Jun 14, 2026
* origin/main:
  Fix Pro dummy lockfile drift and rich text demo (#3989)
  Add FOUC integration tests for generated CSS (#4005)
  Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013)
  Treat docs-internal/ tree as documentation in CI change detection (#4016)
  Add issue triage prompt skill (#3983)
  Point coordination docs at agent-coord bootstrap (#4008)
  Docs: backfill async RSC manifest changelog entry (#3993)
  Migrate conductor.json to .conductor/settings.toml (#4007)
  Bump react-hooks lint to v6 and document RSC compiler boundary (#3963)

# Conflicts:
#	llms-full.txt
justin808 added a commit that referenced this pull request Jun 14, 2026
* origin/main:
  Fix Pro dummy lockfile drift and rich text demo (#3989)
  Add FOUC integration tests for generated CSS (#4005)
  Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013)
  Treat docs-internal/ tree as documentation in CI change detection (#4016)
  Add issue triage prompt skill (#3983)
  Point coordination docs at agent-coord bootstrap (#4008)
  Docs: backfill async RSC manifest changelog entry (#3993)
  Migrate conductor.json to .conductor/settings.toml (#4007)
  Bump react-hooks lint to v6 and document RSC compiler boundary (#3963)

# Conflicts:
#	llms-full.txt
justin808 added a commit that referenced this pull request Jun 14, 2026
…t192-phase1

* origin/main: (43 commits)
  Split llms-full.txt into OSS and Pro tiers to clear the 2048 KiB gate (#4021)
  Codify maintainer attention contract (#3987)
  Docs: correct vm.Script caching analysis caveats (#3997)
  CI: handle unavailable PR head repo for +ci-run-full (#3986)
  Fix Pro dummy lockfile drift and rich text demo (#3989)
  Add FOUC integration tests for generated CSS (#4005)
  Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013)
  Treat docs-internal/ tree as documentation in CI change detection (#4016)
  Add issue triage prompt skill (#3983)
  Point coordination docs at agent-coord bootstrap (#4008)
  Docs: backfill async RSC manifest changelog entry (#3993)
  Migrate conductor.json to .conductor/settings.toml (#4007)
  Bump react-hooks lint to v6 and document RSC compiler boundary (#3963)
  Docs: add OSS flagship demo to examples gallery (closes #3876) (#3959)
  Docs: rewrite RSC CSS guide with FOUC pipeline and per-approach sections (#3992)
  Docs: add 21 animated SVG diagrams for React Server Components (#3806)
  Fix Rspack generator: missing @rspack/dev-server and wrong ReactRefreshPlugin import (#3926)
  Docs: add React Performance Tracks profiling guide (#3961)
  Add cross-agent coordination backend workflow (#3977)
  Validate llms sidebar coverage and hosted pointers (#3960)
  ...
justin808 added a commit that referenced this pull request Jun 15, 2026
…e-demo-cost-docs

* origin/main:
  Add RSC FOUC acceptance coverage (#4033)
  Keep plan-pr-batch goal prompts under 4000 chars (#4025)
  docs(skills): file-collision check + goal-prompt size discipline for plan-pr-batch (#3914)
  Verify React 19.2 <Activity> with react_component (CSR + SSR-hydrate) + docs (#3938)
  Allow Pro RSC peer check for React 19.2 (#4026)
  Split llms-full.txt into OSS and Pro tiers to clear the 2048 KiB gate (#4021)
  Codify maintainer attention contract (#3987)
  Docs: correct vm.Script caching analysis caveats (#3997)
  CI: handle unavailable PR head repo for +ci-run-full (#3986)
  Fix Pro dummy lockfile drift and rich text demo (#3989)
  Add FOUC integration tests for generated CSS (#4005)
  Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013)
  Treat docs-internal/ tree as documentation in CI change detection (#4016)
  Add issue triage prompt skill (#3983)
  Point coordination docs at agent-coord bootstrap (#4008)
  Docs: backfill async RSC manifest changelog entry (#3993)
  Migrate conductor.json to .conductor/settings.toml (#4007)
  Bump react-hooks lint to v6 and document RSC compiler boundary (#3963)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Benchmarks dominate PR CI time (~76 billable min/PR); make per-PR runs opt-in

1 participant