Fix Pro dummy integration test gating#3697
Conversation
|
Need the big picture first? Review this PR in Change Stack to see what changed before going file by file. No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThe PR wires the ChangesPro dummy tests detection and gating
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code Review: Fix Pro dummy integration test gating Overview: This PR fixes a real CI gating gap. The ci-changes-detector script was already computing and emitting run_pro_dummy_tests to GITHUB_OUTPUT, but the detect-changes job in pro-integration-tests.yml never declared it as a job-level output, so the three downstream integration jobs (webpack build, RSpec, Playwright) could never see it. Changes scoped to react_on_rails_pro/spec/dummy/** silently skipped all Pro integration tests. Correctness: The fix is correct and minimal. It exposes the missing output (run_pro_dummy_tests) in the detect-changes job outputs block, sets it to true in the force-run path (consistent with all other flags), and ORs it into the job conditions for all three Pro integration jobs. The pattern exactly mirrors how run_pro_tests is used. The detector script itself needed no changes. Test Coverage: The new test test_pro_dummy_only_change_runs_pro_dummy_tests_without_pro_unit_tests covers the key regression: a Pro dummy-only change sets run_pro_dummy_tests=true while keeping run_pro_tests=false. Minor observation: Comparable tests (e.g. test_pro_app_comment_only_change_runs_pro_lint_only) assert several outputs to prevent false negatives. The new test only asserts two. Since PRO_DUMMY_CHANGED=true also sets RUN_PRO_LINT=true in the detector, asserting 'run_pro_lint: true' in the new test would complete the coverage picture. This is a polish nit, not a bug. No other issues found:
Verdict: Approved. Clean, focused fix with a targeted regression test. |
Greptile SummaryThis PR wires up an already-computed
Confidence Score: 5/5Safe to merge. The change is a one-line output declaration and symmetric condition additions in the workflow; the underlying The workflow fix is mechanical: the output was already computed but never forwarded to downstream jobs. All three job conditions receive the same treatment in the same pattern as the existing No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[detect-changes job] -->|outputs| B{run_pro_tests?}
A -->|outputs NEW| C{run_pro_dummy_tests?}
A -->|outputs| D{non_runtime_only?}
B -->|true| E[build-dummy-app-webpack-test-bundles]
C -->|true| E
D -->|true on main push| F[skip all downstream jobs]
E --> G[rspec-dummy-app-node-renderer]
E --> H[dummy-app-node-renderer-e2e-tests]
B -->|true| G
C -->|true| G
B -->|true| H
C -->|true| H
style C fill:#90EE90,stroke:#228B22
Reviews (1): Last reviewed commit: "fix pro dummy integration test gating" | Re-trigger Greptile |
| test_pro_dummy_only_change_runs_pro_dummy_tests_without_pro_unit_tests() { | ||
| setup_repo | ||
| perl -0pi -e 's/renders/renders seeded data/' react_on_rails_pro/spec/dummy/spec/requests/posts_page_spec.rb | ||
| commit_change "pro dummy spec" | ||
|
|
||
| local out | ||
| out="$(detector_output)" | ||
| assert_contains "$out" '"run_pro_dummy_tests": true' "pro dummy output" | ||
| assert_contains "$out" '"run_pro_tests": false' "pro dummy output" | ||
| } |
There was a problem hiding this comment.
Missing scope-narrowing assertions
The test confirms the two key flags but doesn't assert that a Pro-dummy-only change doesn't bleed into the core gem suite. The analogous test test_pro_app_source_change_runs_pro_tests_only explicitly checks run_ruby_tests: false and run_js_tests: false with the comment "Scoped to Pro: a Pro app change must not drag in the core gem suite." Adding the same guards here would catch a future regression where the Pro-dummy path accidentally triggers core-gem jobs.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Pro Node Renderer Benchmark Summary
▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline |
Core Benchmark Summary
▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline |
Pro (shard 2/2) Benchmark Summary
▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline |
Pro (shard 1/2) Benchmark Summary
▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline |
Bencher posted benchmark results on PR #3697, a CI-config-only change (.github/workflows/pro-integration-tests.yml + the detector's own test harness). Root cause: benchmark.yml derived which suites to run from the test run_* flags, and those flags are deliberately inflated by CI plumbing — the CI-infrastructure catch-all (script/, bin/, actions) ran the full suite "for safety," and the suite-specific workflow YAML files (pro-integration-tests.yml → PRO_DUMMY, node-renderer-tests.yml → PRO_NODE_RENDERER, …) map to test categories. None of that can move runtime performance, so the benchmark runs were pure noise. Decouple benchmark selection from test selection: - The detector now tracks BENCH_CORE / BENCH_PRO / BENCH_PRO_NODE_RENDERER, set ONLY by genuine product-source arms (gem/app/package/dummy code) and the uncategorized catch-all. CI plumbing (new CI_INFRA_CHANGED flag) and the suite-workflow YAML files (is_benchmark_irrelevant_path guard) set no BENCH flag, so they still run their TESTS but never benchmarks. The targeted per-workflow test mapping (ci-optimization.md) is preserved. - The detector emits run_core_benchmarks / run_pro_benchmarks / run_pro_node_renderer_benchmarks; benchmark.yml reads those directly instead of re-deriving from run_* test flags. The mapping mirrors the old derivation exactly for every genuine-source category, so legit benchmark runs are unchanged — only CI-plumbing-triggered runs are dropped. push-to-main, workflow_dispatch, and the 'benchmark' PR label still run suites regardless. Adds regression tests for the #3697 file set, a workflow-only change (tests yes, benchmarks no), CI-infra + real source (benchmarks yes), and node-renderer source (node-renderer benchmark yes). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Why Bencher kept posting benchmark results on PRs that can't possibly affect runtime performance. Two reported cases: - **[#3717](#3717 (comment) — a markdown/agent-tooling PR whose only non-`.md` file was a new `.claude/skills` symlink. - **[#3697](#3697 (comment) — a CI-config-only PR (`.github/workflows/pro-integration-tests.yml` + the detector's own test harness). Both slipped through because the change detector (`script/ci-changes-detector`) is the single gate feeding every test workflow **and** the benchmark matrix. ### Root cause 1 — non-`.md` repo metadata hits the catch-all (#3717) The extensionless `.claude/skills` symlink matched no detector category → fell into the `*` catch-all ("run everything for safety") → `non_runtime_only=false` → full suite + benchmarks. Same class as #3597. ### Root cause 2 — benchmarks piggyback on the test run-flags (#3697) `benchmark.yml` derived which suites to run from the test `run_*` flags. Those flags are intentionally inflated by CI plumbing: the CI-infrastructure catch-all (`script/`, `bin/`, actions) runs the full suite, and the suite-specific workflow YAMLs map to test categories (`pro-integration-tests.yml` → `PRO_DUMMY`, `node-renderer-tests.yml` → `PRO_NODE_RENDERER`, …). None of that moves runtime performance, but it all dragged in benchmarks. ## Fix **Commit 1 — agent/editor tooling is non-runtime.** Categorize `.claude/**`, `.agents/**`, `.cursor/**` alongside docs/`internal/**`/issue-templates, so agent-tooling PRs are `non_runtime_only` (no tests, no benchmarks). **Commit 2 — decouple benchmark selection from test selection.** - The detector tracks `BENCH_CORE` / `BENCH_PRO` / `BENCH_PRO_NODE_RENDERER`, set **only** by genuine product-source arms (gem/app/package/dummy code) and the uncategorized catch-all. CI plumbing (new `CI_INFRA_CHANGED` flag) and the suite-workflow YAMLs (`is_benchmark_irrelevant_path` guard) set no `BENCH` flag — they still run their **tests** but never benchmarks. The targeted per-workflow test mapping (`ci-optimization.md`) is preserved. - The detector emits `run_core_benchmarks` / `run_pro_benchmarks` / `run_pro_node_renderer_benchmarks`; `benchmark.yml` reads them directly instead of re-deriving from `run_*`. The mapping mirrors the old derivation **exactly** for every genuine-source category, so legitimate benchmark runs are unchanged — only CI-plumbing-triggered runs are dropped. - `push`-to-main, `workflow_dispatch`, and the `benchmark` PR label still run suites regardless (the label is the escape hatch for validating `benchmark.yml`/workflow edits). ## Verification - Replayed both PRs' exact file sets against the patched detector: #3717 → `non_runtime_only: true`; #3697 → all three `run_*_benchmarks: false` while tests still run. - Confirmed legit benchmark behavior is unchanged: core source → all 3 suites; node-renderer source → pro + node-renderer; pro-ruby → pro only; a suite-workflow-only edit → its tests but no benchmark. - `script/ci-changes-detector-test.bash`: **43 run, 0 failed** (added regressions for both root causes). `actionlint` clean on `benchmark.yml`. 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * CI now separates benchmark gating from test-selection, treats additional agent/editor/tooling paths as docs/non-runtime-only, treats CI-infrastructure edits as infra-only (forcing tests but suppressing benchmarks), and exposes per-suite benchmark run flags in CI outputs while preventing docs-only when CI infra changes. * **Tests** * Added regression tests covering docs/tooling paths, CI-infra-only edits, suite-specific gating, combined infra+runtime edits, and node-renderer benchmark behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai --> <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Medium Risk** > Changes central CI gating logic for all workflows and benchmarks; behavior is heavily regression-tested but misclassification could skip needed benchmarks or still run Bencher on non-runtime PRs. > > **Overview** > Stops Bencher from running on PRs that only touch docs, agent tooling, or CI config by tightening **what counts as non-runtime** and **when benchmarks are selected**. > > **`script/ci-changes-detector`** now treats `.claude/**`, `.agents/**`, and `.cursor/**` like docs (fixes extensionless paths such as `.claude/skills` hitting the catch-all). CI plumbing uses a dedicated **`CI_INFRA_CHANGED`** flag: it still forces the full **test** matrix but does **not** set benchmark flags. Benchmark gating is decoupled from test `run_*` flags via **`BENCH_*`** flags and new outputs **`run_core_benchmarks`**, **`run_pro_benchmarks`**, and **`run_pro_node_renderer_benchmarks`**; workflow YAML under `.github/workflows/*` is excluded from benchmark relevance via **`is_benchmark_irrelevant_path`**. > > **`.github/workflows/benchmark.yml`** drops inline benchmark derivation and reads those detector outputs directly. **`ci-commands.yml`** aligns the docs-only heuristic with the same agent/editor paths. > > Regression coverage in **`script/ci-changes-detector-test.bash`** covers agent tooling, CI-only vs workflow-only edits, mixed CI + source, and node-renderer benchmarks. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 92ba2bd. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> ## Codex Decision Log - Full-matrix validation: accepted CodeRabbit's current-head nit and added the existing `full-ci` label so detector/benchmark-gating changes receive full CI coverage before merge. --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
run_pro_dummy_testsdetector output frompro-integration-tests.ymlrun_pro_dummy_testsstays true while Pro unit tests remain falseCloses #3633.
Validation
rg -q 'run_pro_dummy_tests' .github/workflows/pro-integration-tests.yml(failed before the workflow change, passes after)bash script/ci-changes-detector-test.bash(38 run, 0 failed)actionlint .github/workflows/pro-integration-tests.ymlscript/ci-changes-detector origin/main(workflow change recommends broad CI)git diff --checkbash -n script/ci-changes-detector-test.bashpnpm exec prettier --check .github/workflows/pro-integration-tests.yml/Users/justin/.agents/skills/autoreview/scripts/autoreview --mode local(clean)bundle exec rubocopwas attempted; it fails on pre-existingreact_on_rails/spike/3313_prism_gemfile_rewriter/*offenses unrelated to this branch.Summary by CodeRabbit