Make per-PR benchmarks opt-in and trim per-route warm-up (#4012)#4013
Conversation
Benchmarks dominated PR CI time at ~76 billable min/PR (three ~24-27 min suite jobs), and step timing showed 92% of each job was the load test itself, not build/setup. Per-PR benchmark runs are informational only — the regression gate, confirmation rerun, and issue filing are all main-push only — and single-run deltas on shared CI runners are noise-dominated, so auto-running the full suite on every code change cost far more than it informed. 1. Per-PR benchmarks are now opt-in. suite_requested_by_event? selects suites on a PR only when a benchmark* label is present; change detection (run_output / RUN_*_BENCHMARKS) no longer auto-triggers them. push to main and workflow_dispatch are unchanged, so main-branch tracking and the confirmation/issue-filing gate are untouched. Removed the now-dead run_output keys and the RUN_*_BENCHMARKS workflow plumbing that fed them. 2. Trimmed the per-route warm-up from 10x0.5s (5s/route) to 3x0.2s (~1s/route). At ~37 routes/shard the old warm-up was pure CI time with no effect on the measured 30s k6 window; the load test self-warms remaining workers. Updated AGENTS.md PR-label guidance, the suppress-label comment, and the matrix spec to match. Fixes #4012 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
WalkthroughPer-PR benchmark suite selection is changed from change-detection-driven auto-inclusion to label-only opt-in. The ChangesBenchmark opt-in and warm-up trim
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR makes per-PR benchmarks opt-in via a
Confidence Score: 4/5Safe to merge — the main-branch regression gate is untouched and the opt-in mechanism is well-tested; the only rough edge is a couple of stale env vars in two spec examples. The core logic change — removing truthy_env? + run_output from PR suite selection — is small, clearly correct, and directly validated by the updated spec. The warm-up reduction is straightforward arithmetic. The only rough edge is that two existing suppression-label tests still pass RUN_*_BENCHMARKS env vars that generate_matrix.rb no longer reads; those vars are now silent no-ops, so the tests pass but their descriptions and setup are misleading for future readers. benchmarks/spec/benchmark_matrix_spec.rb — the two full-ci-no-benchmarks suppression examples still reference the removed RUN_* env vars. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Benchmark workflow triggered] --> B{Event type?}
B -->|push to main| C[All suites selected]
B -->|workflow_dispatch| C
B -->|pull_request| D{Same-repo PR?}
D -->|No - fork| E[No suites - placeholder row]
D -->|Yes| F{full-ci-no-benchmarks label?}
F -->|Yes| E
F -->|No| G{BENCHMARK_NON_RUNTIME_ONLY?}
G -->|true| E
G -->|false| H{benchmark* label present?}
H -->|No label| E
H -->|benchmark| I[All suites selected]
H -->|benchmark-core| J[Core suite only]
H -->|benchmark-pro| K[Pro + Pro Node Renderer suites]
H -->|benchmark-pro-node-renderer| L[Pro Node Renderer suite only]
C --> M[suite_enabled? filtered by app_version]
I --> M
J --> M
K --> M
L --> M
M --> N[build_matrix: shard rows emitted]
E --> O[empty_matrix_row: suite_id=none]
|
…error-callbacks * origin/main: Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013) Treat docs-internal/ tree as documentation in CI change detection (#4016) Add issue triage prompt skill (#3983) Point coordination docs at agent-coord bootstrap (#4008) Docs: backfill async RSC manifest changelog entry (#3993) Migrate conductor.json to .conductor/settings.toml (#4007) Bump react-hooks lint to v6 and document RSC compiler boundary (#3963) # Conflicts: # llms-full.txt
* origin/main: Fix Pro dummy lockfile drift and rich text demo (#3989) Add FOUC integration tests for generated CSS (#4005) Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013) Treat docs-internal/ tree as documentation in CI change detection (#4016) Add issue triage prompt skill (#3983) Point coordination docs at agent-coord bootstrap (#4008) Docs: backfill async RSC manifest changelog entry (#3993) Migrate conductor.json to .conductor/settings.toml (#4007) Bump react-hooks lint to v6 and document RSC compiler boundary (#3963) # Conflicts: # llms-full.txt
* origin/main: Fix Pro dummy lockfile drift and rich text demo (#3989) Add FOUC integration tests for generated CSS (#4005) Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013) Treat docs-internal/ tree as documentation in CI change detection (#4016) Add issue triage prompt skill (#3983) Point coordination docs at agent-coord bootstrap (#4008) Docs: backfill async RSC manifest changelog entry (#3993) Migrate conductor.json to .conductor/settings.toml (#4007) Bump react-hooks lint to v6 and document RSC compiler boundary (#3963) # Conflicts: # llms-full.txt
…t192-phase1 * origin/main: (43 commits) Split llms-full.txt into OSS and Pro tiers to clear the 2048 KiB gate (#4021) Codify maintainer attention contract (#3987) Docs: correct vm.Script caching analysis caveats (#3997) CI: handle unavailable PR head repo for +ci-run-full (#3986) Fix Pro dummy lockfile drift and rich text demo (#3989) Add FOUC integration tests for generated CSS (#4005) Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013) Treat docs-internal/ tree as documentation in CI change detection (#4016) Add issue triage prompt skill (#3983) Point coordination docs at agent-coord bootstrap (#4008) Docs: backfill async RSC manifest changelog entry (#3993) Migrate conductor.json to .conductor/settings.toml (#4007) Bump react-hooks lint to v6 and document RSC compiler boundary (#3963) Docs: add OSS flagship demo to examples gallery (closes #3876) (#3959) Docs: rewrite RSC CSS guide with FOUC pipeline and per-approach sections (#3992) Docs: add 21 animated SVG diagrams for React Server Components (#3806) Fix Rspack generator: missing @rspack/dev-server and wrong ReactRefreshPlugin import (#3926) Docs: add React Performance Tracks profiling guide (#3961) Add cross-agent coordination backend workflow (#3977) Validate llms sidebar coverage and hosted pointers (#3960) ...
…e-demo-cost-docs * origin/main: Add RSC FOUC acceptance coverage (#4033) Keep plan-pr-batch goal prompts under 4000 chars (#4025) docs(skills): file-collision check + goal-prompt size discipline for plan-pr-batch (#3914) Verify React 19.2 <Activity> with react_component (CSR + SSR-hydrate) + docs (#3938) Allow Pro RSC peer check for React 19.2 (#4026) Split llms-full.txt into OSS and Pro tiers to clear the 2048 KiB gate (#4021) Codify maintainer attention contract (#3987) Docs: correct vm.Script caching analysis caveats (#3997) CI: handle unavailable PR head repo for +ci-run-full (#3986) Fix Pro dummy lockfile drift and rich text demo (#3989) Add FOUC integration tests for generated CSS (#4005) Make per-PR benchmarks opt-in and trim per-route warm-up (#4012) (#4013) Treat docs-internal/ tree as documentation in CI change detection (#4016) Add issue triage prompt skill (#3983) Point coordination docs at agent-coord bootstrap (#4008) Docs: backfill async RSC manifest changelog entry (#3993) Migrate conductor.json to .conductor/settings.toml (#4007) Bump react-hooks lint to v6 and document RSC compiler boundary (#3963)
Problem
Benchmarks dominate PR CI time — far more than any other check. Measured on real runs:
sleep 0.5) + a 30sk6load test ≈ 35s/route.Crucially, per-PR benchmarks are informational only — the regression gate, confirmation rerun, and issue filing are all
push-to-mainonly (benchmark.yml). On a PR the job just posts a summary comment, and single-run deltas on shared CI runners are noise-dominated (the thresholds are deliberately wide and correctly refuse to flag them). So we spent ~76 min/PR to produce a comment no one can act on with confidence.Changes
suite_requested_by_event?selects suites on a PR only when abenchmark*label is present (benchmark,benchmark-core,benchmark-pro,benchmark-pro-node-renderer). Change detection (run_output/RUN_*_BENCHMARKS) no longer auto-triggers them.pushtomainandworkflow_dispatchare unchanged, so main-branch tracking and the confirmation/issue-filing gate are untouched. Removed the now-deadrun_outputkeys and theRUN_*_BENCHMARKSworkflow env plumbing that fed them.benchmarks/bench.rb. At ~37 routes/shard the old warm-up was pure CI time with no effect on the measured 30sk6window; the load test self-warms remaining workers in its first fraction of a second.Anyone wanting a perf check on a specific PR adds the
benchmark(or a suite-specific) label.Not in scope (deliberately)
k6DURATION— widens per-run noise on slow routes; needs threshold re-validation (Benchmark CI: tune thresholds / reduce flakiness so the main gate can be re-enabled #3169).Testing
bundle exec rspec benchmarks/spec→ 271 examples, 0 failures (updated the one matrix test that asserted the old change-detection trigger).bundle exec rubocop benchmarks/...→ clean.actionlint .github/workflows/benchmark.yml→ clean; YAML parses.["none"]; PR +benchmarklabel → all suites;push→ all suites.Docs updated: AGENTS.md PR-label guidance and the
full-ci-no-benchmarkssuppress-label comment.Labels:
full-ci— this is a CI-workflow/detector change, so run the broad test matrix; it does not need benchmarks.Merge criteria (self-merge by Claude Code)
I'll squash-merge this once all of the following hold:
mainis not red at merge time (not stacking onto a broken base).full-citest matrix, lint/RuboCop, and the Benchmark Workflow'sdetect-changes(which runsbundle exec rspec benchmarks/spec) all conclude success. The only acceptable non-success states are the benchmark suite jobs being skipped (see Add linting and CI scripts #3) and the superseded push-run jobs that concurrency cancelled.Core/Pro benchmarks … run-suitejobs are absent/skipped because this PR has nobenchmark*label, proving the opt-in gate works. ✅ already confirmed.mergeablestays true.Not required: an actual labeled benchmark suite run on this PR. The matrix-selection logic and the warm-up code path are covered by
benchmarks/spec(271 examples), every selection scenario was verified locally, and the nextmainpush exercises the full suite + new warm-up end-to-end — spending ~67 min to run a suite here would re-introduce the exact cost this PR removes.Fixes #4012
🤖 Generated with Claude Code