Split benchmark tracking script by justin808 · Pull Request #4176 · shakacode/react_on_rails

justin808 · 2026-06-22T10:50:05Z

Summary

Fixes #3459 by splitting benchmarks/track_benchmarks.rb into focused modules without intended behavior changes:

Keeps the executable entrypoint in benchmarks/track_benchmarks.rb.
Adds focused implementation files under benchmarks/lib/track_benchmarks/**.
Adds focused specs under benchmarks/spec/track_benchmarks/**.

Validation

script/ci-changes-detector origin/main
bundle exec rspec benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks
bundle exec rspec benchmarks/spec
Ruby syntax checks for benchmarks/track_benchmarks.rb, benchmarks/lib/track_benchmarks.rb, benchmarks/lib/track_benchmarks/**/*.rb, and focused specs.
bundle exec rubocop benchmarks/track_benchmarks.rb benchmarks/lib/track_benchmarks.rb benchmarks/lib/track_benchmarks benchmarks/spec/track_benchmarks
Safe missing-input smoke: BENCHMARK_JSON=/tmp/codex-missing-benchmark.json bundle exec ruby benchmarks/track_benchmarks.rb exits 1 as expected.
Pre-push hook: branch RuboCop and markdown-links passed.

Review / Churn Notes

codex review --base origin/main completed with no correctness/security/maintainability findings.
claude -p '/code-review origin/main' completed with no correctness findings; it noted optional maintainability cleanup around compatibility shims and memoization, intentionally kept out of scope for a behavior-preserving split.
bin/ci-local --changed was attempted by the worker and failed in pre-existing repo-wide RuboCop offenses under react_on_rails/spike/3313_prism_gemfile_rewriter/**, outside this lane's file-touch map.
claude -p '/simplify origin/main' was run under a 180s timeout and exited with no output.

Codex Decision Log

Non-blocking: Multiple downloaded confirmation candidate payloads are now treated as inconclusive instead of choosing Dir.glob(...).first.
- Decision: Keep the defensive guard and document it as artifact-corruption handling.
- Why: Arbitrary .first could confirm or clear the wrong benchmark payload when the artifact tree is malformed.
- Review later: None; this does not change normal benchmark execution or valid single-candidate confirmation behavior.

Changelog

Not user-visible; no changelog entry.

Confidence note:

Validated: commands above plus pre-push hook.
Evidence: local command output; PR branch codex/3459-track-benchmarks-split.
UNKNOWN: hosted CI/review status until GitHub checks and reviewers complete.
Residual risk: benchmark behavior is intended to be unchanged, but full live benchmark execution is left to hosted/maintainer-selected benchmark validation.

Summary by CodeRabbit

New Features
- Added confirmation rerun flow that classifies confirmed, cleared, or inconclusive results and publishes step-summary output.
- Added regression handoff candidate/confirmed JSON payloads and display-sidecar rendering for improved report content.
Bug Fixes
- Improved resiliency with normalized exit codes and automated retries when start-point hashes aren’t usable.
- Made main-branch regression candidate handling more non-fatal and added safer PR comment updating.
Refactor
- Modularized benchmark tracking into dedicated components and converted the main entry script into a thin delegator.
Tests
- Expanded RSpec coverage for branching/start-point args, summary rendering, payload writing, confirmation loading, and CLI orchestration.

Merge Rationale

Maintainer merge authority was granted on 2026-06-24 after this PR reached ready state.

Reasons to merge:

Implements issue Benchmark workflow follow-ups: unify suite jobs, split track_benchmarks, JSON-based regression detection #3459 by splitting benchmarks/track_benchmarks.rb into focused modules/specs without intended benchmark behavior changes.
Current head d30b12e8186ca2f7bc983b5c76d8f8f957031e46 is mergeable (mergeStateStatus: CLEAN) with 44 checks passed, 0 pending, and 0 failing.
Required readiness is green, strict merge ledger is clean (complete_allowed: true, no unknown fields, no violations), and unresolved review threads are 0.
The diff stays inside the assigned benchmark file-touch map: benchmarks/track_benchmarks.rb, benchmarks/lib/**, and benchmarks/spec/**.

Confidence note:

Validated: focused benchmark specs and syntax/RuboCop checks listed above, pr-ci-readiness, full hosted checks, review-thread triage, and script/pr-merge-ledger 4176 --strict --pretty --changelog-classification not_user_visible.
Evidence: closeout comment Split benchmark tracking script #4176 (comment); ledger artifact /tmp/pr-4176-merge-ledger-premerge.json.
UNKNOWN: none.
Residual risk: behavior-preserving benchmark tooling split; no intended benchmark behavior change.

coderabbitai · 2026-06-22T10:50:22Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

benchmarks/track_benchmarks.rb (~400 lines of inline logic) is extracted into eight focused modules under benchmarks/lib/track_benchmarks/: Config, BranchArgs, BencherRun, Summary, RegressionPayloads, Confirmation, PrComments, and Cli. The original script is reduced to Config-aliased constants, thin delegating wrappers, and a single Cli.new(...).run call. RSpec tests are added for BranchArgs, RegressionPayloads, Summary, Confirmation, and Cli.

Changes

Modular TrackBenchmarks extraction

Layer / File(s)	Summary
Config constants and module entry-point `benchmarks/lib/track_benchmarks/config.rb`, `benchmarks/lib/track_benchmarks.rb`	`TrackBenchmarks::Config` centralizes ENV-backed JSON path constants for benchmark/report/display files and candidate/confirmed hand-off payloads, defines a candidate input directory, and provides `env!` to enforce required environment variables with `::warning::` and exit `1` on missing keys. `benchmarks/lib/track_benchmarks.rb` wires `require_relative` for all internal modules and GitHub/CLI/reporting components.
Branch selection and Bencher execution `benchmarks/lib/track_benchmarks/branch_args.rb`, `benchmarks/lib/track_benchmarks/bencher_run.rb`, `benchmarks/spec/track_benchmarks/branch_args_spec.rb`	`BranchArgs` routes `GITHUB_EVENT_NAME` to branch and start-point args: `push` returns `"main"` with no start-point args; `pull_request` returns head ref plus base SHA start-point args with clone/reset flags; `workflow_dispatch` queries GitHub API for merge-base SHA and embeds `--start-point-hash` (falls back gracefully on API failure); confirmation mode synthesizes a unique branch name and main start-point args. `BencherRun` executes the Bencher runner, classifies errors, normalizes exit code for stale/filtered alerts (forced to `0` with notice), detects retry eligibility when stderr indicates missing Head Version baseline and report is not a regression, and strips `--start-point-hash` for retry. RSpec specs cover all event types, GitHub API success/failure paths, and unexpected event names.
Summary rendering and PR comment gating `benchmarks/lib/track_benchmarks/summary.rb`, `benchmarks/lib/track_benchmarks/pr_comments.rb`, `benchmarks/spec/track_benchmarks/summary_spec.rb`	`Summary` safely ingests JSON display-sidecar rows (with `Github.warning` diagnostics for malformed/missing/unreadable files and safe `[]` fallbacks), renders Markdown benchmark tables, appends suite-prefixed reports to GitHub step summary, generates regression handoff summaries with run-URL fallback when tables are empty, and extracts/deduplicates regressed benchmark names and alert pairs from report alerts. `PrComments.replace` gates comment replacement to `pull_request` events with non-empty markdown. RSpec specs validate sidecar parsing (success, malformed JSON, missing files, ENOENT/EACCES/EIO errors) and alert-pair deduplication.
Candidate and confirmed regression payload writing `benchmarks/lib/track_benchmarks/regression_payloads.rb`, `benchmarks/spec/track_benchmarks/regression_payloads_spec.rb`	`RegressionPayloads` detects main-push context via GitHub event/ref environment variables; `report_main_push_candidate` treats detected regressions as non-fatal by writing the candidate payload (fails only if write fails) and treats operational failures as fatal. `write_candidate` produces suite/shard identity, regression handoff summary, regressed benchmark names, and alert-pair JSON to the candidate path and emits `::notice::` with return `true`; returns `false` on exceptions with `::error::`. `write_confirmed` writes confirmed hand-off JSON including both first-run and confirmation markdown summaries, confirmed alert pairs, and unique regressed benchmark names extracted from `confirmed_alerts`. RSpec specs validate payload structure, alert preservation, benchmark deduplication, summary fallback wrapping, and return values.
Confirmation rerun flow `benchmarks/lib/track_benchmarks/confirmation.rb`, `benchmarks/spec/track_benchmarks_spec.rb`	`Confirmation` loads and validates candidate JSON from glob (returns `[nil, ""]` for missing/multiple/unreadable candidates with `::error::` output), classifies rerun outcomes as `:inconclusive` (missing report or non-zero exit without regression), `:cleared` (confirmed overlap empty), or `:confirmed` (non-empty confirmed overlap) by combining candidate actionable alerts and report regressed alert pairs, appends suite-level markdown summary indicating status and confirmed alerts, orchestrates confirmed payload writing (downgrades to `:inconclusive` on write failure), and exits with status `0` for `:confirmed`/`:cleared` and `1` for inconclusive. RSpec specs validate all candidate discovery inconclusive scenarios (missing, multiple, access failures) with expected `::error::` stderr output.
Cli orchestrator and top-level script refactoring `benchmarks/lib/track_benchmarks/cli.rb`, `benchmarks/track_benchmarks.rb`, `benchmarks/spec/track_benchmarks/cli_spec.rb`	`Cli` orchestrates the full run: branch/start-point selection, Bencher execution, optional retry without `--start-point-hash`, exit-code normalization, single Markdown render and post to job summary, then mode-dependent dispatch (confirmation via `Confirmation.run` vs PR report posting with `post_pull_request_report`), and main-push regression payload reporting on nonzero exit. `post_pull_request_report` gates on `pull_request` events, preserves prior comment on operational failure, substitutes regression handoff fallback with run-URL when table is empty, otherwise replaces with rendered report via `PrComments.replace` and `PrReportPoster`. `track_benchmarks.rb` reduced to Config-aliased constants, delegator wrapper methods forwarding to `TrackBenchmarks::*` modules, and single `Cli.new(suite_name: SUITE_NAME, report_marker: REPORT_MARKER).run` invocation. RSpec spec validates PR posting logic with and without `GITHUB_EVENT_NAME` environment variable.

Sequence Diagram(s)

sequenceDiagram
  participant Cli
  participant BranchArgs
  participant GithubCli
  participant BencherRun
  participant BencherRunner

  Cli->>BranchArgs: branch_and_start_point_args(env)
  alt confirmation mode
    BranchArgs-->>Cli: [synthetic_branch, main start-point args]
  else push
    BranchArgs-->>Cli: ["main", []]
  else pull_request
    BranchArgs-->>Cli: [head_ref, base_sha start-point args]
  else workflow_dispatch (not main)
    BranchArgs->>GithubCli: capture compare main...branch
    GithubCli-->>BranchArgs: merge-base SHA or failure
    BranchArgs-->>Cli: [branch, start-point args ± hash]
  end
  Cli->>BencherRun: run_bencher!(runner, branch, start_point_args)
  BencherRun->>BencherRunner: run(branch, start_point_args)
  BencherRunner-->>BencherRun: exit_code, report, stderr
  alt retry_without_start_point_hash?
    BencherRun->>BencherRun: without_start_point_hash(args)
    BencherRun->>BencherRunner: run(branch, stripped_args)
    BencherRunner-->>BencherRun: retry_exit_code, report
  end
  BencherRun->>BencherRun: normalized_exit_code(exit_code, report)
  BencherRun-->>Cli: final_exit_code, report

sequenceDiagram
  participant Script as track_benchmarks.rb
  participant Cli
  participant BranchArgs
  participant BencherRun
  participant Summary
  participant PrComments
  participant PrReportPoster
  participant Confirmation
  participant RegressionPayloads

  Script->>Cli: new(suite_name, report_marker).run
  Cli->>BranchArgs: branch_and_start_point_args
  BranchArgs-->>Cli: branch, start_point_args
  Cli->>BencherRun: run_bencher!(runner, branch, args)
  BencherRun-->>Cli: exit_code, report, stderr
  alt retry eligibility met
    Cli->>BencherRun: retry_without_start_point_hash
    BencherRun-->>Cli: retry_exit_code, report
  end
  Cli->>BencherRun: normalized_exit_code(exit_code, report)
  Cli->>Summary: rendered_report(report)
  Cli->>Summary: post_report_to_summary(markdown, suite_name)
  alt confirmation mode
    Cli->>Confirmation: run(report, exit_code, markdown, suite_name)
    Confirmation-->>Script: exit(status)
  else PR mode
    Cli->>PrComments: replace(markdown)
    PrComments->>PrReportPoster: replace(markdown) if pull_request
  end
  alt main-push with nonzero exit
    Cli->>RegressionPayloads: report_main_push_candidate
    RegressionPayloads->>RegressionPayloads: write_candidate
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

shakacode/react_on_rails#3448: The main PR refactors the benchmarks/track_benchmarks.rb CI tracking entrypoint into new benchmarks/lib/track_benchmarks/* modules, which directly extracts and modularizes the inline logic introduced in PR #3448.
shakacode/react_on_rails#3810: Introduced the candidate→confirm→confirmed artifact flow and confirmation-mode branching in track_benchmarks.rb that this PR modularizes into Confirmation, RegressionPayloads, and BranchArgs.
shakacode/react_on_rails#3830: Consumes the confirmed regression payload's ALERTS benchmark+measure pairs written by RegressionPayloads.write_confirmed introduced in this PR for rendering the "What regressed" section in regression issue bodies.

Suggested labels

enhancement, review-needed, benchmark

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.36% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Split benchmark tracking script' accurately describes the main refactoring of dividing benchmarks/track_benchmarks.rb into focused modules.
Linked Issues check	✅ Passed	The PR directly addresses the 'Split benchmarks/track_benchmarks.rb into focused modules' task from issue `#3459` by extracting seven responsibilities into distinct, testable modules with full test coverage.
Out of Scope Changes check	✅ Passed	All changes align with the scope of splitting track_benchmarks.rb into modular implementations with test coverage. No unrelated refactoring or scope creep detected.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/3459-track-benchmarks-split

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

justin808 · 2026-06-22T10:50:52Z

+ci-status

justin808 · 2026-06-22T10:50:54Z

+ci-run-hosted

github-actions · 2026-06-22T10:51:02Z

CI Status

Head SHA: b59acb55bf85
Changed files: 13
Docs-only heuristic (matches ci-changes-detector metadata paths): no
ready-for-hosted-ci label: absent
force-full-hosted-ci label: absent
Current hosted-CI waiver: not present for this SHA

Only the required gate is active unless hosted CI is requested.

github-actions · 2026-06-22T10:51:13Z

Hosted CI Requested

Triggered 9 workflow(s) for b59acb55bf85.
Mode: optimized hosted CI (path-selected by script/ci-changes-detector).
Added ready-for-hosted-ci, so future commits will keep running optimized hosted CI until +ci-stop-hosted is used.

View progress in the Actions tab.

greptile-apps · 2026-06-22T10:55:21Z

Greptile Summary

This PR splits the monolithic benchmarks/track_benchmarks.rb script into focused modules under benchmarks/lib/track_benchmarks/** without intended behavior changes. The entrypoint becomes a thin shim that delegates to TrackBenchmarks::Cli and provides top-level compatibility functions for callers that previously relied on the script's top-level methods.

Extraction of seven modules (Config, BranchArgs, Summary, BencherRun, PrComments, RegressionPayloads, Confirmation) and one orchestrating class (Cli) with a correct dependency load order in lib/track_benchmarks.rb.
Three new spec files covering BranchArgs.branch_and_start_point_args (including the workflow_dispatch merge-base paths), RegressionPayloads.write_candidate/write_confirmed, and Summary.display_rows/regressed_alert_pairs.
Compatibility shim constants (BENCHMARK_JSON, etc.) and top-level wrapper functions in track_benchmarks.rb are preserved for require-only callers, while actual script execution routes through Cli#run.

Confidence Score: 5/5

Safe to merge — this is a pure structural refactoring with no logic changes and the added specs exercise the newly isolated units.

All extracted modules faithfully reproduce the original script's logic. The load order in the new loader file respects every inter-module dependency. The compatibility shims in the entrypoint preserve the existing top-level API for require-only callers, and lazy evaluation of PrReportPoster env vars on non-PR runs is correctly maintained via block-yielded poster in PrComments.replace. The retry-without-start-point-hash path, the confirmation flow, and the candidate/confirmed hand-off payloads all match the pre-split behaviour line-for-line. Three well-structured specs are added that cover the units most amenable to unit testing.

No files require special attention.

Important Files Changed

Filename	Overview
benchmarks/track_benchmarks.rb	Entrypoint trimmed from 445 to ~155 lines; now delegates all logic to Cli#run and keeps shim constants/functions for require-only callers. Behavior is unchanged.
benchmarks/lib/track_benchmarks.rb	New loader file that requires all submodules in the correct dependency order (Config → BranchArgs → Summary → BencherRun → PrComments → RegressionPayloads → Confirmation → Cli).
benchmarks/lib/track_benchmarks/cli.rb	New Cli class orchestrating the full benchmark tracking run; faithfully mirrors the old FILE==$PROGRAM_NAME block, including the start-point-hash retry, summary posting, confirmation branch, and PR comment replacement logic.
benchmarks/lib/track_benchmarks/config.rb	Centralises path constants and the env! helper; constants are evaluated once at load time from ENV, matching the old top-level constant definitions exactly.
benchmarks/lib/track_benchmarks/branch_args.rb	Extracts branch/start-point resolution for push, pull_request, workflow_dispatch, and confirmation modes; workflow_dispatch_args correctly marked private_class_method.
benchmarks/lib/track_benchmarks/confirmation.rb	Confirmation rerun classification and hand-off; rescue block in load_candidate correctly references path only after it is guaranteed non-nil.
benchmarks/lib/track_benchmarks/regression_payloads.rb	Candidate and confirmed hand-off writers extracted cleanly; error handling and exit behavior match the original script.
benchmarks/lib/track_benchmarks/summary.rb	Report rendering and step-summary helpers extracted; display_rows now takes display_json as a parameter (vs the old DISPLAY_JSON constant), enabling clean unit testing.
benchmarks/lib/track_benchmarks/pr_comments.rb	Thin PR-comment gate; uses a keyword-argument default for event_name and a block-yielded poster, preserving lazy evaluation of PrReportPoster env vars on non-PR runs.
benchmarks/lib/track_benchmarks/bencher_run.rb	Bencher execution helpers and exit-code classification extracted faithfully; without_start_point_hash correctly dups the args array before slicing.
benchmarks/spec/track_benchmarks/branch_args_spec.rb	New spec covers pull_request, workflow_dispatch success, and workflow_dispatch API-failure paths for branch_and_start_point_args.
benchmarks/spec/track_benchmarks/regression_payloads_spec.rb	New spec validates write_candidate and write_confirmed payload structure including deduplication of benchmark names.
benchmarks/spec/track_benchmarks/summary_spec.rb	New spec covers display_rows happy path, malformed-JSON fallback, and regressed_alert_pairs deduplication.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["track_benchmarks.rb\n(__FILE__ == $PROGRAM_NAME)"] -->|"require_relative 'lib/track_benchmarks'"| B["lib/track_benchmarks.rb\n(loader)"]
    B --> C["Config"]
    B --> D["BranchArgs"]
    B --> E["Summary"]
    B --> F["BencherRun"]
    B --> G["PrComments"]
    B --> H["RegressionPayloads"]
    B --> I["Confirmation"]
    B --> J["Cli"]

    A -->|"Cli.new(suite_name:, report_marker:).run"| J

    J --> D
    J --> F
    J --> E
    J --> G
    J --> H
    J --> I

    H --> E
    H --> C
    I --> H
    I --> E
    I --> C
    F --> E

    subgraph "Compatibility shims (top-level methods)"
        K["confirmation_mode?\nslugify\nrun_bencher!\nmain_push?\n..."]
    end
    A --> K
    K -->|"delegates"| D
    K -->|"delegates"| F
    K -->|"delegates"| H

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["track_benchmarks.rb\n(__FILE__ == $PROGRAM_NAME)"] -->|"require_relative 'lib/track_benchmarks'"| B["lib/track_benchmarks.rb\n(loader)"]
    B --> C["Config"]
    B --> D["BranchArgs"]
    B --> E["Summary"]
    B --> F["BencherRun"]
    B --> G["PrComments"]
    B --> H["RegressionPayloads"]
    B --> I["Confirmation"]
    B --> J["Cli"]

    A -->|"Cli.new(suite_name:, report_marker:).run"| J

    J --> D
    J --> F
    J --> E
    J --> G
    J --> H
    J --> I

    H --> E
    H --> C
    I --> H
    I --> E
    I --> C
    F --> E

    subgraph "Compatibility shims (top-level methods)"
        K["confirmation_mode?\nslugify\nrun_bencher!\nmain_push?\n..."]
    end
    A --> K
    K -->|"delegates"| D
    K -->|"delegates"| F
    K -->|"delegates"| H

_{Reviews (1): Last reviewed commit: "Split benchmark tracking script" | Re-trigger Greptile}

claude · 2026-06-22T10:56:04Z

Code Review — PR #4176: Split benchmark tracking script

Summary: This PR cleanly refactors a 511-line monolithic script into focused modules under benchmarks/lib/track_benchmarks/ plus a Cli orchestrator, while adding a focused spec sub-directory. Behavior is preserved. The individual modules are well-organized and the test suite is solid. A few areas worth attention:

1. Orchestration path (`Cli#run`) has no direct test coverage

The new Cli class is the runtime entry point that wires all modules together, but no spec exercises it. track_benchmarks_spec.rb calls the top-level shim methods (which delegate to individual modules), so a bug in Cli#run's orchestration — wrong method call order, wrong argument, missing guard — would be invisible.

For example, the retry logic in Cli#retry_without_start_point_hash is only tested indirectly via the retry_without_start_point_hash? shim, but the end-to-end [retry_result.exit_code, retry_result.report] unwrapping path is never exercised by specs.

A minimal describe TrackBenchmarks::Cli spec with a stubbed BencherRunner would close this gap.

2. `BranchArgs` missing coverage for `"push"` and unknown-event branches

branch_args_spec.rb covers pull_request and workflow_dispatch (both happy and failing paths). The "push" branch (which returns ["main", []]) and the unknown-event early-exit are untested. The unknown-event branch calls exit 1 — worth pinning with an expect { ... }.to raise_error(SystemExit) test since it's a safety net for misconfigured callers.

3. `regression_handoff_summary` emits a misleading "on main" error in confirmation context

In RegressionPayloads#write_confirmed (line 68):

RegressionReport::CONFIRMATION_SUMMARY => Summary.regression_handoff_summary(confirmation_markdown),

If confirmation_markdown is empty (display sidecar missing during the confirmation run), regression_handoff_summary emits:

::error::Bencher flagged a regression on main but the summary table could not be rendered

This fires during a confirmation run, not an initial main-push run. The phrase "on main" is misleading in that context. Consider passing a context label to the method or handling the empty case at the call site in write_confirmed.

4. `load_candidate` rescue references potentially-uninitialized `path`

In Confirmation#load_candidate (line 36):

rescue StandardError => e
  warn "::error::Could not read confirmation candidate #{path} ..."

If Dir.glob itself raises, path = never completes its assignment, and Ruby yields nil for path in the rescue block, producing a misleading message like "Could not read confirmation candidate ". Cosmetic fix: warn "...candidate #{path || dir} ...".

5. `ENV.fetch` inconsistency between `Cli` and `PrComments` (pre-existing)

Cli#post_pull_request_report uses ENV.fetch("GITHUB_EVENT_NAME") (raises KeyError if unset), while PrComments.replace uses ENV.fetch("GITHUB_EVENT_NAME", nil) (returns nil). This is a pre-existing inconsistency now isolated in its own method — worth aligning to the nil-default form for consistency and safety in non-CI environments.

Minor observations (no action required)

The 25+ shim methods in track_benchmarks.rb are dead code at runtime (exist only for test compatibility). Future cleanup could update track_benchmarks_spec.rb to call module methods directly and remove the shims.
BranchArgs.workflow_dispatch_args uses private_class_method on a module_function-declared method — correct Ruby but uncommon; a short comment explaining the intent would help future readers.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/lib/track_benchmarks/confirmation.rb`:
- Around line 13-18: The current implementation at line 13 blindly takes the
first glob result when searching for the candidate filename, which can select
the wrong file if multiple candidate files exist under the artifact directory
tree. Instead of using .first on the Dir.glob result, implement logic to
disambiguate between multiple matches by validating there is exactly one
candidate file or by selecting the correct one based on appropriate criteria
(such as shard or suite information). Add appropriate handling for the ambiguous
case where multiple candidates are found, either by raising an error or treating
it as inconclusive.

In `@benchmarks/lib/track_benchmarks/summary.rb`:
- Around line 21-35: The code in the display_rows method checks file existence
before reading, but a race condition can occur where the file is deleted or
becomes unreadable between the File.exist? check on line 21 and the File.read
call on line 23, causing unhandled Errno exceptions to crash the path. Extend
the rescue clause to catch file I/O errors (Errno exceptions) in addition to
JSON::ParserError, logging a warning and returning an empty array when file read
failures occur, ensuring graceful degradation with a warning instead of a crash.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e4f23a64-28a1-45a3-80b4-c3addffc697f

📥 Commits

Reviewing files that changed from the base of the PR and between 5290f17 and b59acb5.

📒 Files selected for processing (13)

benchmarks/lib/track_benchmarks.rb
benchmarks/lib/track_benchmarks/bencher_run.rb
benchmarks/lib/track_benchmarks/branch_args.rb
benchmarks/lib/track_benchmarks/cli.rb
benchmarks/lib/track_benchmarks/config.rb
benchmarks/lib/track_benchmarks/confirmation.rb
benchmarks/lib/track_benchmarks/pr_comments.rb
benchmarks/lib/track_benchmarks/regression_payloads.rb
benchmarks/lib/track_benchmarks/summary.rb
benchmarks/spec/track_benchmarks/branch_args_spec.rb
benchmarks/spec/track_benchmarks/regression_payloads_spec.rb
benchmarks/spec/track_benchmarks/summary_spec.rb
benchmarks/track_benchmarks.rb

justin808 · 2026-06-23T08:27:33Z

Address-review summary for PR #4176

Scan scope: full history (no prior address-review checkpoint).
Head SHA: 6443ebc6f8b90dd1da853a90df030195873d1de4.

Outcomes:

Fixed inline: confirmation candidate disambiguation, safe candidate path reporting, display sidecar read/error handling, unset GITHUB_EVENT_NAME PR detection, push/unexpected-event branch-arg specs, and empty confirmation-summary preservation.
Added focused specs for the review cases, including benchmarks/spec/track_benchmarks/cli_spec.rb.
No deferred review items remain from the fetched inline threads.

Validation after fixes:

ruby -c on changed benchmark Ruby files
bundle exec rspec benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks (59 examples, 0 failures)
bundle exec rubocop benchmarks/track_benchmarks.rb benchmarks/lib/track_benchmarks benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks
BENCHMARK_JSON=/tmp/react-on-rails-missing-benchmark.json ruby benchmarks/track_benchmarks.rb exited 1 with the expected missing JSON message
script/ci-changes-detector origin/main
git diff --check

Remaining: current-head hosted CI, benchmark CI, and bot review reruns are pending after this push.

justin808 · 2026-06-23T08:27:33Z

+ci-status

justin808 · 2026-06-23T08:27:34Z

+ci-run-hosted

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6443ebc6f8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

claude · 2026-06-23T08:29:46Z

Code Review

Overall: Clean, well-structured refactor. The module boundaries are sensible, the naming is consistent, and the new specs cover the right edge cases. A few things to resolve before merge:

Two undocumented behavior changes

The PR description says "no intended behavior changes" but two real ones exist:

write_confirmed no longer applies regression_handoff_summary to the confirmation markdown (inline comment on regression_payloads.rb:68). write_candidate still applies the fallback on line 40, so the two hand-off writers are now asymmetric. If this is intentional — because the downstream report-regressions job tolerates an empty CONFIRMATION_SUMMARY — the spec title makes that clear but the PR description should acknowledge it.
Multiple regression-candidate.json files now return [nil, ""] (inconclusive) rather than silently picking .first (inline comment on confirmation.rb:23). This is a better choice and has a spec, but it is a behavior change.

Minor observations

private_class_method :workflow_dispatch_args (branch_args.rb:170): correct placement — just noting it is now properly encapsulated where the original was a public top-level helper.
Backward-compat shims in track_benchmarks.rb (the def run_bencher!, def load_candidate, etc. that now delegate to the modules): these keep the existing spec surface intact, which is pragmatic. If the spec file ever migrates to calling the module methods directly, the shims can be dropped.
BencherRun and Confirmation modules have no dedicated unit specs; coverage comes via the shims in track_benchmarks_spec.rb. Fine for a behavior-preserving refactor, but worth revisiting if these modules grow independently.
Top-level constant aliases (BENCHMARK_JSON = TrackBenchmarks::Config::BENCHMARK_JSON, etc.) freeze the ENV.fetch result at require time — same caveat as the old code, just noted for completeness in test contexts.

github-actions · 2026-06-23T08:32:21Z

CI Status

Head SHA: 6443ebc6f8b9
Changed files: 15
Docs-only heuristic (matches ci-changes-detector metadata paths): no
ready-for-hosted-ci label: present
force-full-hosted-ci label: absent
Current hosted-CI waiver: not present for this SHA

Optimized hosted CI is enabled for this PR.

justin808 · 2026-06-23T09:32:22Z

+ci-status

justin808 · 2026-06-23T09:32:22Z

Address-review update for head 5c14a9d007dfdb8a0ef15eb1ca599dbd70c9a154:

Fixed the empty confirmed-summary regression in 5c14a9d00 by routing confirmation summaries through the existing fallback with confirmation-specific wording.
Added specs covering empty confirmation fallback and non-empty confirmation preservation.
Updated the PR body with the multiple-candidate confirmation decision log.
Replied to and resolved the four remaining review threads; the require-time constant concern was auto-deferred as out of scope for this behavior-preserving split.

Validated locally after the final fix:

ruby -c benchmarks/track_benchmarks.rb
ruby -c benchmarks/lib/track_benchmarks/branch_args.rb
ruby -c benchmarks/lib/track_benchmarks/cli.rb
ruby -c benchmarks/lib/track_benchmarks/confirmation.rb
ruby -c benchmarks/lib/track_benchmarks/regression_payloads.rb
ruby -c benchmarks/lib/track_benchmarks/summary.rb
bundle exec rspec benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks
bundle exec rubocop benchmarks/track_benchmarks.rb benchmarks/lib/track_benchmarks benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks
BENCHMARK_JSON=/tmp/react-on-rails-missing-benchmark.json ruby benchmarks/track_benchmarks.rb exits 1 as expected
script/ci-changes-detector origin/main
git diff --check

Confidence note:

Validated: commands above plus pre-commit/pre-push hooks.
Evidence: final commit 5c14a9d00; resolved review threads; GitHub checks are rerunning on the new head.
UNKNOWN: final hosted CI result until the queued checks complete.
Residual risk: no intended benchmark behavior change; live benchmark behavior remains covered by hosted/maintainer-selected benchmark validation.

justin808 · 2026-06-23T09:32:22Z

+ci-run-hosted

github-actions · 2026-06-23T09:48:01Z

Hosted CI Requested

Triggered 9 workflow(s) for 2c46caf3ea1e.
Mode: optimized hosted CI (path-selected by script/ci-changes-detector).
Added ready-for-hosted-ci, so future commits will keep running optimized hosted CI until +ci-stop-hosted is used.

View progress in the Actions tab.

claude · 2026-06-23T09:49:50Z

Code Review

Good structural work — the split into focused modules is clean and the new specs cover the previously un-tested confirmation/payload paths well. Two correctness issues and one maintenance trap to address before merge.

Bug: `Cli#run` does not propagate injected `env` (three call sites)

Cli.new accepts env: for test isolation and uses it correctly inside post_pull_request_report. But the three other env-sensitive calls in run all fall through to the real ENV:

# cli.rb – current (broken)
branch, start_point_args = BranchArgs.branch_and_start_point_args          # env: ENV
if BranchArgs.confirmation_mode?                                            # env: ENV
if RegressionPayloads.main_push? && bencher_exit_code != 0                 # env: ENV

Each of those methods already accepts env: as a keyword argument. The fix is:

BranchArgs.branch_and_start_point_args(env:)
BranchArgs.confirmation_mode?(env:)
RegressionPayloads.main_push?(env:)

Because cli_spec only tests post_pull_request_report via send (not the full run path), this slipped through. See inline comments.

Maintenance trap: `post_report_to_summary` shim references `SUITE_NAME`

In benchmarks/track_benchmarks.rb:

def post_report_to_summary(markdown)
  TrackBenchmarks::Summary.post_report_to_summary(markdown, SUITE_NAME)
end

SUITE_NAME is only assigned inside the if __FILE__ == $PROGRAM_NAME block. The old code explicitly avoided this pattern (see the now-removed comment on the old rendered_report). Calling this shim from a require-only context raises NameError: uninitialized constant SUITE_NAME. The current specs use modules directly so this won't surface in CI — but it's a footgun for anyone who reaches for the legacy shim. Either add a comment documenting the restriction or remove the shim (it is never called by any existing spec).

Everything else looks good

Require order in lib/track_benchmarks.rb is correct (summary before confirmation).
The defensive multi-candidate guard in Confirmation.load_candidate (treating >1 candidate as inconclusive instead of .first) is the right call and is well-tested.
write_candidate/write_confirmed error handling and the Github.run_url stubs in specs are consistent.
Config constants evaluated at load time from ENV matches the prior behavior; no regression.
The Confirmation.finish process-exit pattern is unchanged from the original script; no regression.

justin808 · 2026-06-23T09:58:15Z

+ci-run-hosted

justin808 · 2026-06-23T09:58:15Z

Address-review update for head d30b12e8186ca2f7bc983b5c76d8f8f957031e46:

Propagated injected env: through Cli#run to branch selection, confirmation-mode detection, and main-push candidate detection.
Added a run-level Cli spec that verifies those injected-env calls.
Updated the compatibility shim so post_report_to_summary accepts an explicit suite name and only falls back to the script constant when it exists.
Replied to and resolved the five latest review threads.

Validated locally after this fix:

ruby -c benchmarks/track_benchmarks.rb
ruby -c benchmarks/lib/track_benchmarks/cli.rb
ruby -c benchmarks/spec/track_benchmarks/cli_spec.rb
bundle exec rspec benchmarks/spec/track_benchmarks/cli_spec.rb benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks
bundle exec rubocop benchmarks/track_benchmarks.rb benchmarks/lib/track_benchmarks/cli.rb benchmarks/spec/track_benchmarks/cli_spec.rb
script/ci-changes-detector origin/main
git diff --check
Pre-commit/pre-push hooks passed, including branch RuboCop.

Confidence note:

Validated: commands above plus hooks.
Evidence: final commit d30b12e81; resolved review threads; hosted checks are rerunning on the new head.
UNKNOWN: final hosted CI result until queued checks complete.
Residual risk: no intended benchmark behavior change; live benchmark workflow remains covered by hosted benchmark validation.

justin808 · 2026-06-23T09:58:15Z

+ci-status

github-actions · 2026-06-23T09:58:29Z

CI Status

Head SHA: d30b12e8186c
Changed files: 15
Docs-only heuristic (matches ci-changes-detector metadata paths): no
ready-for-hosted-ci label: present
force-full-hosted-ci label: absent
Current hosted-CI waiver: not present for this SHA

Optimized hosted CI is enabled for this PR.

github-actions · 2026-06-23T09:59:04Z

Hosted CI Requested

Triggered 9 workflow(s) for d30b12e8186c.
Mode: optimized hosted CI (path-selected by script/ci-changes-detector).
Added ready-for-hosted-ci, so future commits will keep running optimized hosted CI until +ci-stop-hosted is used.

View progress in the Actions tab.

claude · 2026-06-23T10:04:22Z

The modularization is clean and the intent (behavior-preserving split) is well-executed. Two correctness improvements over the original worth noting: the multiple-candidate inconclusive guard and the failure_context: 'during confirmation' fix in write_confirmed (the original would have said 'on main' in a confirmation-run error, which was misleading). Three things to flag:

post_report_to_summary shim uses Object.const_get to read SUITE_NAME — see inline. This reflection path is dead in normal execution now that Cli handles orchestration; callers should pass suite_name explicitly.
Confirmation.run hardcodes Config::CANDIDATE_INPUT_DIR with no injectable seam — see inline. No test exercises the full run_confirmation / Confirmation.run path; only load_candidate is tested in isolation.
Minor: regression_payloads_spec.rb covers only happy paths. The rescue branches in write_candidate and write_confirmed (when File.write raises) are load-bearing for the confirmation flow and remain untested. Pre-existing gap, but worth a follow-up.

github-actions · 2026-06-23T10:05:19Z

Pro Node Renderer Benchmark Summary

Benchmark	RPS	p50(ms)	p90(ms)	Status
Pro Node Renderer: simple_eval (non-RSC)	2158.44 ▲16.6% (1851.85)	4.23 ▼7.4% (4.57)	5.2 ▼42.6% (9.06)	200=64760
Pro Node Renderer: react_ssr (non-RSC)	1833.28 ▲7.9% (1698.87)	4.95 ▼3.9% (5.15)	6.06 ▼28.7% (8.51)	200=55010

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

github-actions · 2026-06-23T10:22:28Z

Core Benchmark Summary

Benchmark	RPS	p50(ms)	p90(ms)	Status
/: Core	3.84 ▲11.2% (3.45)	2146.05 ▼1.6% (2180.12)	2573.82 ▼16.0% (3065.18)	200=123
/client_side_hello_world: Core	745.5 ▲14.9% (648.56)	8.76 ▼8.9% (9.61)	21.58 ▲10.3% (19.56)	200=22523
/client_side_rescript_hello_world: Core	754.04 ▲10.8% (680.35)	9.04 ▼1.8% (9.2)	17.57 ▼6.9% (18.87)	200=22782
/client_side_hello_world_shared_store: Core	696.03 ▲12.3% (619.77)	8.67 ▼13.9% (10.07)	21.93 ▲15.4% (19.0)	200=21027
/client_side_hello_world_shared_store_controller: Core	710.19 ▲14.0% (622.92)	11.31 ▲14.4% (9.88)	20.04 ▼1.5% (20.34)	200=21314
/client_side_hello_world_shared_store_defer: Core	492.28 ▼22.9% (638.16)	11.79 ▲17.0% (10.08)	17.01 ▼15.2% (20.06)	200=14873
/server_side_hello_world_shared_store: Core	16.78 ▲12.4% (14.93)	492.09 ▼4.9% (517.19)	659.38 ▼3.9% (685.79)	200=515
/server_side_hello_world_shared_store_controller: Core	16.97 ▲14.4% (14.84)	379.87 ▼22.8% (492.38)	609.0 ▼18.7% (749.42)	200=523
/server_side_hello_world_shared_store_defer: Core	16.95 ▲14.2% (14.84)	466.03 ▼6.7% (499.24)	653.32 ▼3.8% (679.25)	200=520
/server_side_hello_world: Core	33.95 ▲12.3% (30.23)	241.91 ▼5.5% (255.94)	293.99 ▼6.6% (314.69)	200=1035
/server_side_hello_world_hooks: Core	34.05 ▲14.9% (29.63)	264.3 ▲6.5% (248.29)	298.69 ▼5.0% (314.34)	200=1031
/server_side_hello_world_props: Core	34.22 ▲16.7% (29.31)	241.77 ▼6.4% (258.22)	297.72 ▼7.6% (322.28)	200=1039
/client_side_log_throw: Core	764.79 ▲12.2% (681.58)	10.35 ▲12.5% (9.2)	18.28 ▼2.9% (18.84)	200=23103
/server_side_log_throw: Core	33.14 ▲13.9% (29.09)	248.34 ▼4.2% (259.35)	311.95 ▼1.9% (318.08)	200=1004
/server_side_log_throw_plain_js: Core	33.57 ▲12.7% (29.78)	250.09 ▼3.1% (258.2)	302.02 ▼5.8% (320.55)	200=1021
/server_side_log_throw_raise: Core	33.86 ▲12.9% (30.0)	243.75 ▼6.3% (260.19)	300.51 ▼5.1% (316.53)	3xx=1028
/server_side_log_throw_raise_invoker: Core	642.06 ▼16.5% (768.6)	9.22 ▲11.7% (8.25)	14.21 ▼12.0% (16.15)	200=19396
/server_side_hello_world_es5: Core	34.62 ▲15.5% (29.98)	235.97 ▼8.7% (258.59)	292.42 ▼8.4% (319.33)	200=1053
/server_side_redux_app: Core	33.32 ▲16.9% (28.51)	248.12 ▼0.1% (248.28)	306.23 ▼3.8% (318.4)	200=1012
/server_side_hello_world_with_options: Core	34.8 ▲16.2% (29.96)	235.87 ▼5.6% (249.89)	291.7 ▼7.6% (315.78)	200=1056
/server_side_redux_app_cached: Core	766.16 ▲17.3% (653.4)	8.88 ▼10.2% (9.89)	17.75 ▲1.7% (17.46)	200=23147
/client_side_manual_render: Core	784.28 ▲18.5% (661.83)	8.48 ▼9.9% (9.41)	16.78 ▼14.8% (19.71)	200=23694
/render_js: Core	37.39 ▲16.9% (31.99)	221.15 ▼5.5% (234.08)	271.05 ▼8.2% (295.37)	200=1137
/react_router: Core	32.72 ▲15.0% (28.46)	278.9 ▲3.8% (268.76)	311.34 ▼7.4% (336.24)	200=991
/pure_component: Core	34.92 ▲13.5% (30.76)	190.16 ▼26.1% (257.17)	275.84 ▼11.9% (313.23)	200=1062
/react_compiler_example: Core	34.41 ▲15.3% (29.85)	241.29 ▼4.7% (253.16)	296.33 ▼6.9% (318.17)	200=1044
/css_modules_images_fonts_example: Core	34.35 ▲13.4% (30.28)	241.55 ▼6.9% (259.47)	301.9 ▼4.0% (314.63)	200=1043
/turbolinks_cache_disabled: Core	756.71 ▲8.7% (696.33)	8.23 ▼9.4% (9.09)	19.9 ▲6.3% (18.73)	200=22861
/rendered_html: Core	34.74 ▲14.9% (30.25)	238.87 ▼7.2% (257.45)	295.91 ▼6.3% (315.71)	200=1053
/xhr_refresh: Core	17.56 ▲13.8% (15.43)	328.53 ▼33.5% (494.4)	554.33 ▼16.9% (667.43)	200=541
/react_helmet: Core	33.45 ▲11.2% (30.09)	186.32 ▼29.3% (263.55)	285.72 ▼11.2% (321.84)	200=1022
/broken_app: Core	33.56 ▲15.2% (29.12)	245.37 ▼4.8% (257.84)	303.99 ▼5.7% (322.4)	200=1019
/image_example: Core	34.28 ▲14.4% (29.98)	242.81 ▼8.5% (265.45)	298.72 ▼7.9% (324.28)	200=1042
/font_optimization_example: Core	873.63 ▲14.9% (760.2)	7.29 ▼11.2% (8.21)	16.73 ▲6.7% (15.68)	200=26391
/client_side_activity: Core	765.44 ▲15.5% (662.91)	6.6 ▼31.6% (9.64)	14.26 ▼26.0% (19.27)	200=23280
/server_side_activity: Core	34.1 ▲14.2% (29.86)	267.22 ▲5.3% (253.7)	301.26 ▼6.7% (323.02)	200=1034
/turbo_frame_tag_hello_world: Core	836.56 ▲14.0% (733.51)	7.51 ▼13.9% (8.72)	17.67 ▲3.3% (17.1)	200=25273
/manual_render_test: Core	550.78 ▼18.9% (679.42)	10.48 ▲13.5% (9.23)	12.18 ▼34.7% (18.66)	200=16641
/root_error_callbacks: Core	29.38 ▼3.0% (30.3)	196.12 ▼23.3% (255.74)	256.59 ▼18.4% (314.53)	200=899
/hydration_scheduling: Core	11.91 ▲13.9% (10.45)	682.48 ▼9.5% (754.4)	1031.11 ▼2.6% (1058.2)	200=367
/rails_form: Core	779.15 ▲16.7% (667.62)	9.98 ▲7.2% (9.31)	17.6 ▼3.9% (18.32)	200=23535

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

justin808 · 2026-06-23T10:22:41Z

Address-review update for head d30b12e8186ca2f7bc983b5c76d8f8f957031e46.

Resolved current-head Claude threads without code changes:

Kept the post_report_to_summary(markdown, suite_name = nil) SUITE_NAME fallback as an intentional compatibility shim for the old top-level helper surface; normal script execution goes through Cli#run.
Deferred adding a new dir: keyword through Confirmation.run/run_confirmation; existing specs already exercise load_candidate(dir) directly, and adding a new injection API is testability surface beyond the no-behavior-change split.
Deferred the env.fetch("GITHUB_EVENT_NAME", nil) style preference to avoid restarting final-candidate gates for a no-behavior nit.

Readiness evidence:

Review threads after cutoff: 0 unresolved.
PR_BATCH_SKILL_DIR=.agents/skills/pr-batch .agents/skills/pr-batch/bin/pr-ci-readiness 4176 --repo shakacode/react_on_rails -> READY with required checks.
script/pr-merge-ledger 4176 --strict --pretty --changelog-classification not_user_visible -> complete_allowed: true, no violations or unknown fields (/tmp/pr-4176-merge-ledger.json).
Hosted benchmark/build rows are still running/pending; no failures observed in the latest poll.

github-actions · 2026-06-23T10:24:00Z

Pro (shard 2/2) Benchmark Summary

Benchmark	RPS	p50(ms)	p90(ms)	Status
/empty: Pro	1171.36 ▼1.0% (1183.34)	6.75 ▲14.1% (5.92)	10.35 ▲4.4% (9.91)	200=35381
/ssr_shell_error: Pro	124.93 ▼31.1% (181.22)	47.36 ▲11.8% (42.37)	96.15 ▲40.8% (68.31)	200=3778
/ssr_sync_error: Pro	152.82 ▼16.2% (182.26)	48.18 ▲13.1% (42.61)	71.92 ▲4.5% (68.81)	200=4621
/rsc_component_error: Pro	152.38 ▼7.8% (165.31)	47.22 ▲13.0% (41.77)	77.97 ▲12.0% (69.6)	200=4593,3xx=13
/non_existing_stream_react_component: Pro	182.46 ▼6.8% (195.78)	47.8 ▲28.3% (37.27)	66.03 ▲4.4% (63.27)	200=5513
/server_side_redux_app_cached: Pro	300.51 ▼16.2% (358.61)	19.27 ▼4.5% (20.18)	49.67 ▲48.3% (33.49)	200=9082
/loadable: Pro	147.27 ▼13.5% (170.22)	52.15 ▲17.3% (44.45)	79.43 ▲8.8% (73.02)	200=4451
/apollo_graphql: Pro	124.6 ▼0.6% (125.32)	49.68 ▼0.1% (49.71)	76.77 ▼9.1% (84.43)	200=4185,3xx=17
/console_logs_in_async_server: Pro	3.57 ▲11.7% (3.2)	2126.17 ▲0.1% (2123.6)	2143.0 ▼1.8% (2182.84)	200=122
/stream_error_demo: Pro	2.9 ▼94.7% (54.97)	2012.3 ▲18.3% (1700.73)	2044.59 ▲18.3% (1728.09)	200=94
/stream_async_components: Pro	142.25 ▼11.4% (160.55)	37.7 ▼14.3% (44.02)	69.32 ▼3.3% (71.68)	200=4285,3xx=46
/rsc_posts_page_over_http: Pro	141.13 ▼15.4% (166.83)	51.82 ▲10.8% (46.76)	83.93 ▲4.1% (80.64)	200=4268
/rsc_echo_props: Pro	61.34 ▼29.3% (86.75)	126.87 ▲22.3% (103.71)	177.07 ▲4.9% (168.73)	200=1861
/client_side_fouc_probe: Pro	349.34 ▼1.8% (355.84)	22.11 ▲11.0% (19.92)	35.38 ▲9.4% (32.34)	200=10555
/async_on_server_sync_on_client_client_render: Pro	342.81 ▲2.6% (334.03)	22.43 ▲9.8% (20.42)	34.63 ▼5.3% (36.56)	200=10360
/server_router_client_render: Pro	352.93 ▲0.9% (349.81)	21.68 ▲5.0% (20.66)	34.89 ▲3.6% (33.68)	200=10664
/unwrapped_rsc_route_stream_render: Pro	175.38 ▼11.0% (197.14)	42.69 ▲9.0% (39.16)	68.94 ▲5.0% (65.63)	200=5302
/async_render_function_returns_component: Pro	188.88 ▼8.1% (205.51)	29.15 ▼19.6% (36.25)	54.54 ▼5.1% (57.47)	200=5747
/native_metadata: Pro	167.73 ▼12.7% (192.06)	44.31 ▲26.3% (35.08)	69.36 ▲15.8% (59.88)	200=5053,3xx=18
/hybrid_metadata_streaming: Pro	162.63 ▼15.4% (192.2)	46.45 ▲17.9% (39.4)	73.6 ▲15.8% (63.54)	200=4921
/cache_demo: Pro	122.98 ▼23.2% (160.08)	44.38 ▼9.7% (49.14)	74.15 ▼9.4% (81.86)	200=3722
/client_side_hello_world: Pro	324.71 ▼6.9% (348.87)	23.81 ▲18.0% (20.18)	35.43 ▲7.0% (33.11)	200=9752
/client_side_hello_world_shared_store_controller: Pro	218.39 ▼32.5% (323.5)	26.0 ▲21.6% (21.39)	31.7 ▼11.8% (35.96)	200=6643
/server_side_hello_world_shared_store: Pro	107.67 ▼22.4% (138.77)	73.39 ▲29.0% (56.9)	103.68 ▲16.8% (88.76)	200=3256
/server_side_hello_world_shared_store_defer: Pro	92.85 ▼35.0% (142.76)	62.2 ▲13.4% (54.84)	79.39 ▼9.8% (87.97)	200=2829
/server_side_hello_world_hooks: Pro	177.37 ▼11.8% (201.02)	44.23 ▲23.0% (35.96)	68.02 ▲13.5% (59.93)	200=5362
/server_side_log_throw: Pro	177.67 ▼7.8% (192.64)	34.55 ▼8.3% (37.68)	58.87 ▼8.4% (64.25)	200=5356,3xx=16
/source_mapped_prerender_error_probe: Pro	264.65 ▲2.0% (259.41)	32.93 ▲19.3% (27.61)	45.1 ▲3.4% (43.62)	3xx=7997
/server_side_log_throw_raise_invoker: Pro	413.76 ▲0.2% (413.09)	13.57 ▼19.4% (16.84)	25.21 ▼9.2% (27.75)	200=12504
/server_side_redux_app: Pro	174.1 ▼13.0% (200.21)	35.99 ▼7.0% (38.71)	61.15 ▼3.8% (63.59)	200=5264
/server_side_redux_app_cached: Pro	369.42 ▲3.0% (358.61)	21.52 ▲6.7% (20.18)	34.27 ▲2.3% (33.49)	200=11162
/render_js: Pro	388.14 ▲1.7% (381.76)	20.19 ▲7.0% (18.88)	29.38 ▼7.1% (31.61)	200=11728
/pure_component: Pro	191.08 ▼7.9% (207.42)	45.77 ▲24.9% (36.64)	61.68 ▼3.2% (63.71)	200=5776
/turbolinks_cache_disabled: Pro	361.93 ▼3.0% (373.12)	21.11 ▲9.8% (19.23)	31.61 ▲1.3% (31.19)	200=10952
/xhr_refresh: Pro	128.81 ▼15.0% (151.49)	59.22 ▲10.3% (53.67)	82.28 ▼1.8% (83.79)	200=3897
/broken_app: Pro	182.07 ▼12.7% (208.53)	43.23 ▲16.9% (36.97)	66.12 ▲9.7% (60.26)	200=5470
/server_render_with_timeout: Pro	60.46 ▼47.6% (115.29)	113.25 ▲15.6% (97.93)	121.53 ▲5.7% (114.93)	200=1839

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

github-actions · 2026-06-23T10:25:05Z

Pro (shard 1/2) Benchmark Summary

Benchmark	RPS	p50(ms)	p90(ms)	Status
/: Pro	50.68 ▼29.3% (71.7)	156.04 ▲14.6% (136.13)	222.55 ▲17.0% (190.22)	200=1535
/error_scenarios_hub: Pro	388.15 ▲7.5% (360.92)	19.55 ▼2.4% (20.04)	29.53 ▼9.6% (32.66)	200=11731
/ssr_async_error: Pro	2.9 ▼95.5% (64.03)	2011.5 ▲20.5% (1669.01)	2062.99 ▲20.6% (1710.65)	200=94
/ssr_async_prop_error: Pro	1.52 ▼97.5% (60.73)	5021.5 ▲20.7% (4159.95)	5451.1 ▼24.5% (7217.46)	200=55
/non_existing_react_component: Pro	158.19 ▼24.5% (209.58)	36.79 ▼1.9% (37.52)	80.09 ▲14.8% (69.77)	200=4783
/non_existing_rsc_payload: Pro	186.01 ▼13.2% (214.4)	39.94 ▲9.6% (36.43)	65.19 ▲8.0% (60.36)	200=5622
/cached_react_helmet: Pro	397.68 ▲6.2% (374.62)	19.76 ▲1.3% (19.51)	31.49 ▼1.0% (31.8)	200=12020
/cached_redux_component: Pro	339.63 ▼10.8% (380.58)	13.55 ▼29.5% (19.22)	19.87 ▼38.1% (32.09)	200=10331
/lazy_apollo_graphql: Pro	146.84 ▲3.8% (141.51)	49.22 ▼3.6% (51.06)	72.68 ▼11.5% (82.09)	200=4443
/redis_receiver: Pro	87.85 ▼7.9% (95.38)	60.48 ▼11.3% (68.19)	141.09 ▼1.7% (143.5)	200=2678
/stream_shell_error_demo: Pro	139.54 ▼20.9% (176.45)	53.81 ▲35.7% (39.67)	80.19 ▲12.2% (71.48)	200=4201,3xx=19
/test_incremental_rendering: Pro	2.9 ▼95.7% (67.9)	2011.25 ▲20.6% (1667.8)	2027.43 ▲20.2% (1687.21)	200=94
/rsc_posts_page_over_redis: Pro	92.15 ▼3.8% (95.83)	63.93 ▼10.2% (71.18)	128.77 ▲9.2% (117.97)	200=2787
/rsc_fouc_probe: Pro	151.21 ▼11.0% (169.97)	30.28 ▼29.2% (42.77)	48.9 ▼31.3% (71.18)	200=4605
/async_on_server_sync_on_client: Pro	2.17 ▼96.4% (60.69)	3011.47 ▲20.6% (2496.65)	3490.78 ▼1.6% (3546.24)	200=73
/server_router: Pro	165.71 ▼12.8% (190.11)	44.81 ▲9.7% (40.86)	74.4 ▲5.3% (70.63)	200=5009
/unwrapped_rsc_route_client_render: Pro	402.72 ▲7.6% (374.28)	13.78 ▼27.7% (19.07)	25.93 ▼16.3% (30.99)	200=12173
/async_render_function_returns_string: Pro	158.23 ▼21.2% (200.9)	36.09 ▼1.0% (36.44)	113.2 ▲88.6% (60.02)	200=4784
/async_components_demo: Pro	7.62 ▼83.2% (45.28)	1027.11 ▲19.8% (857.17)	1059.03 ▲17.8% (899.17)	200=239
/stream_native_metadata: Pro	180.23 ▼11.3% (203.25)	30.18 ▼22.6% (39.01)	56.75 ▼11.1% (63.81)	200=5488
/rsc_native_metadata: Pro	8.63 ▼87.4% (68.5)	1011.07 ▲20.3% (840.28)	1021.16 ▲17.3% (870.51)	200=271
/react_intl_rsc_demo: Pro	87.68 ▼35.9% (136.87)	85.38 ▲29.7% (65.82)	128.55 ▲20.2% (106.92)	200=2638
/client_side_hello_world_shared_store: Pro	367.8 ▲6.3% (345.94)	20.78 ▲1.5% (20.47)	30.52 ▼8.2% (33.26)	200=11122
/client_side_hello_world_shared_store_defer: Pro	369.58 ▲7.9% (342.65)	20.43 ▼1.8% (20.81)	30.61 ▼6.2% (32.65)	200=11169
/server_side_hello_world_shared_store_controller: Pro	131.2 ▼6.7% (140.58)	66.64 ▲24.4% (53.57)	91.0 ▲2.5% (88.77)	200=3944
/server_side_hello_world: Pro	191.96 ▼5.6% (203.25)	38.18 ▲5.8% (36.09)	61.26 ▲2.2% (59.93)	200=5794,3xx=9
/client_side_log_throw: Pro	401.75 ▲8.7% (369.52)	18.72 ▼1.7% (19.05)	28.18 ▼11.8% (31.97)	200=12140
/server_side_log_throw_plain_js: Pro	420.44 ▲11.7% (376.57)	18.86 ▼0.2% (18.9)	29.25 ▼8.8% (32.07)	200=12703
/server_side_log_throw_raise: Pro	198.44 ▼36.6% (313.19)	29.95 ▲12.1% (26.72)	65.75 ▲49.7% (43.91)	3xx=5999
/server_side_hello_world_es5: Pro	163.21 ▼21.0% (206.66)	32.64 ▼11.7% (36.95)	60.11 ▲0.4% (59.88)	200=4950,3xx=18
/server_side_hello_world_with_options: Pro	194.23 ▼0.2% (194.68)	38.78 ▲5.6% (36.74)	62.12 ▼0.4% (62.37)	200=5871
/client_side_manual_render: Pro	408.41 ▲13.3% (360.35)	18.48 ▼4.7% (19.39)	27.95 ▼7.6% (30.24)	200=12343
/react_router: Pro	212.07 ▼0.8% (213.85)	34.42 ▼4.6% (36.07)	57.7 ▼2.1% (58.93)	200=6410
/css_modules_images_fonts_example: Pro	199.01 ▼2.7% (204.44)	30.76 ▼18.7% (37.83)	53.01 ▼10.4% (59.17)	200=6003,3xx=13
/rendered_html: Pro	208.1 ▲4.3% (199.5)	37.12 ▲0.8% (36.82)	54.93 ▼6.8% (58.92)	200=6291
/react_helmet: Pro	168.06 ▲10.4% (152.2)	39.79 ▼28.0% (55.28)	70.32 ▼15.0% (82.71)	200=5084
/image_example: Pro	206.64 ▲3.5% (199.6)	36.54 ▲1.2% (36.1)	57.81 ▼0.3% (58.01)	200=6244
/posts_page: Pro	138.09 ▼6.4% (147.47)	52.23 ▲2.8% (50.8)	75.59 ▼5.4% (79.92)	200=4171,3xx=5

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

justin808 · 2026-06-23T11:13:02Z

Batch closeout confidence note for head d30b12e8186ca2f7bc983b5c76d8f8f957031e46:

Validated locally: focused benchmark tracker specs and local CI evidence recorded earlier in this PR; final branch diff remains inside benchmarks/track_benchmarks.rb, benchmarks/lib/**, and benchmarks/spec/**.
Hosted/PR gates: pr-ci-readiness reports READY; full check list is 44 passed, 0 pending, 0 failing; mergeStateStatus is CLEAN.
Review triage: current cutoff 2026-06-23T10:22:41Z; unresolved review threads: 0.
Merge ledger: script/pr-merge-ledger 4176 --strict --pretty --changelog-classification not_user_visible is clean (complete_allowed: true, no unknown fields, no violations).
UNKNOWN: none.
Residual risk: behavior-preserving benchmark tooling split; no intended benchmark behavior change.

Final batch state: ready-no-merge-authority. The batch goal has merge_authority: ask, so I am not merging.

Split benchmark tracking script

b59acb5

justin808 added the benchmark label Jun 22, 2026

github-actions Bot added the ready-for-hosted-ci Run optimized hosted GitHub CI for this PR label Jun 22, 2026

claude Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread benchmarks/lib/track_benchmarks/regression_payloads.rb Outdated

claude Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread benchmarks/lib/track_benchmarks/confirmation.rb Outdated

claude Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread benchmarks/lib/track_benchmarks/cli.rb Outdated

claude Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread benchmarks/spec/track_benchmarks/branch_args_spec.rb

coderabbitai Bot requested changes Jun 22, 2026

View reviewed changes

Comment thread benchmarks/lib/track_benchmarks/confirmation.rb Outdated

Comment thread benchmarks/lib/track_benchmarks/summary.rb

Address benchmark tracker review feedback

6443ebc

chatgpt-codex-connector Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread benchmarks/lib/track_benchmarks/regression_payloads.rb Outdated