Skip to content

Split benchmark tracking script#4176

Merged
justin808 merged 5 commits into
mainfrom
codex/3459-track-benchmarks-split
Jun 25, 2026
Merged

Split benchmark tracking script#4176
justin808 merged 5 commits into
mainfrom
codex/3459-track-benchmarks-split

Conversation

@justin808

@justin808 justin808 commented Jun 22, 2026

Copy link
Copy Markdown
Member

Summary

Fixes #3459 by splitting benchmarks/track_benchmarks.rb into focused modules without intended behavior changes:

  • Keeps the executable entrypoint in benchmarks/track_benchmarks.rb.
  • Adds focused implementation files under benchmarks/lib/track_benchmarks/**.
  • Adds focused specs under benchmarks/spec/track_benchmarks/**.

Validation

  • script/ci-changes-detector origin/main
  • bundle exec rspec benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks
  • bundle exec rspec benchmarks/spec
  • Ruby syntax checks for benchmarks/track_benchmarks.rb, benchmarks/lib/track_benchmarks.rb, benchmarks/lib/track_benchmarks/**/*.rb, and focused specs.
  • bundle exec rubocop benchmarks/track_benchmarks.rb benchmarks/lib/track_benchmarks.rb benchmarks/lib/track_benchmarks benchmarks/spec/track_benchmarks
  • Safe missing-input smoke: BENCHMARK_JSON=/tmp/codex-missing-benchmark.json bundle exec ruby benchmarks/track_benchmarks.rb exits 1 as expected.
  • Pre-push hook: branch RuboCop and markdown-links passed.

Review / Churn Notes

  • codex review --base origin/main completed with no correctness/security/maintainability findings.
  • claude -p '/code-review origin/main' completed with no correctness findings; it noted optional maintainability cleanup around compatibility shims and memoization, intentionally kept out of scope for a behavior-preserving split.
  • bin/ci-local --changed was attempted by the worker and failed in pre-existing repo-wide RuboCop offenses under react_on_rails/spike/3313_prism_gemfile_rewriter/**, outside this lane's file-touch map.
  • claude -p '/simplify origin/main' was run under a 180s timeout and exited with no output.

Codex Decision Log

  • Non-blocking: Multiple downloaded confirmation candidate payloads are now treated as inconclusive instead of choosing Dir.glob(...).first.
    • Decision: Keep the defensive guard and document it as artifact-corruption handling.
    • Why: Arbitrary .first could confirm or clear the wrong benchmark payload when the artifact tree is malformed.
    • Review later: None; this does not change normal benchmark execution or valid single-candidate confirmation behavior.

Changelog

Not user-visible; no changelog entry.

Confidence note:

  • Validated: commands above plus pre-push hook.
  • Evidence: local command output; PR branch codex/3459-track-benchmarks-split.
  • UNKNOWN: hosted CI/review status until GitHub checks and reviewers complete.
  • Residual risk: benchmark behavior is intended to be unchanged, but full live benchmark execution is left to hosted/maintainer-selected benchmark validation.

Summary by CodeRabbit

  • New Features
    • Added confirmation rerun flow that classifies confirmed, cleared, or inconclusive results and publishes step-summary output.
    • Added regression handoff candidate/confirmed JSON payloads and display-sidecar rendering for improved report content.
  • Bug Fixes
    • Improved resiliency with normalized exit codes and automated retries when start-point hashes aren’t usable.
    • Made main-branch regression candidate handling more non-fatal and added safer PR comment updating.
  • Refactor
    • Modularized benchmark tracking into dedicated components and converted the main entry script into a thin delegator.
  • Tests
    • Expanded RSpec coverage for branching/start-point args, summary rendering, payload writing, confirmation loading, and CLI orchestration.

Merge Rationale

Maintainer merge authority was granted on 2026-06-24 after this PR reached ready state.

Reasons to merge:

  • Implements issue Benchmark workflow follow-ups: unify suite jobs, split track_benchmarks, JSON-based regression detection #3459 by splitting benchmarks/track_benchmarks.rb into focused modules/specs without intended benchmark behavior changes.
  • Current head d30b12e8186ca2f7bc983b5c76d8f8f957031e46 is mergeable (mergeStateStatus: CLEAN) with 44 checks passed, 0 pending, and 0 failing.
  • Required readiness is green, strict merge ledger is clean (complete_allowed: true, no unknown fields, no violations), and unresolved review threads are 0.
  • The diff stays inside the assigned benchmark file-touch map: benchmarks/track_benchmarks.rb, benchmarks/lib/**, and benchmarks/spec/**.

Confidence note:

  • Validated: focused benchmark specs and syntax/RuboCop checks listed above, pr-ci-readiness, full hosted checks, review-thread triage, and script/pr-merge-ledger 4176 --strict --pretty --changelog-classification not_user_visible.
  • Evidence: closeout comment Split benchmark tracking script #4176 (comment); ledger artifact /tmp/pr-4176-merge-ledger-premerge.json.
  • UNKNOWN: none.
  • Residual risk: behavior-preserving benchmark tooling split; no intended benchmark behavior change.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

benchmarks/track_benchmarks.rb (~400 lines of inline logic) is extracted into eight focused modules under benchmarks/lib/track_benchmarks/: Config, BranchArgs, BencherRun, Summary, RegressionPayloads, Confirmation, PrComments, and Cli. The original script is reduced to Config-aliased constants, thin delegating wrappers, and a single Cli.new(...).run call. RSpec tests are added for BranchArgs, RegressionPayloads, Summary, Confirmation, and Cli.

Changes

Modular TrackBenchmarks extraction

Layer / File(s) Summary
Config constants and module entry-point
benchmarks/lib/track_benchmarks/config.rb, benchmarks/lib/track_benchmarks.rb
TrackBenchmarks::Config centralizes ENV-backed JSON path constants for benchmark/report/display files and candidate/confirmed hand-off payloads, defines a candidate input directory, and provides env! to enforce required environment variables with ::warning:: and exit 1 on missing keys. benchmarks/lib/track_benchmarks.rb wires require_relative for all internal modules and GitHub/CLI/reporting components.
Branch selection and Bencher execution
benchmarks/lib/track_benchmarks/branch_args.rb, benchmarks/lib/track_benchmarks/bencher_run.rb, benchmarks/spec/track_benchmarks/branch_args_spec.rb
BranchArgs routes GITHUB_EVENT_NAME to branch and start-point args: push returns "main" with no start-point args; pull_request returns head ref plus base SHA start-point args with clone/reset flags; workflow_dispatch queries GitHub API for merge-base SHA and embeds --start-point-hash (falls back gracefully on API failure); confirmation mode synthesizes a unique branch name and main start-point args. BencherRun executes the Bencher runner, classifies errors, normalizes exit code for stale/filtered alerts (forced to 0 with notice), detects retry eligibility when stderr indicates missing Head Version baseline and report is not a regression, and strips --start-point-hash for retry. RSpec specs cover all event types, GitHub API success/failure paths, and unexpected event names.
Summary rendering and PR comment gating
benchmarks/lib/track_benchmarks/summary.rb, benchmarks/lib/track_benchmarks/pr_comments.rb, benchmarks/spec/track_benchmarks/summary_spec.rb
Summary safely ingests JSON display-sidecar rows (with Github.warning diagnostics for malformed/missing/unreadable files and safe [] fallbacks), renders Markdown benchmark tables, appends suite-prefixed reports to GitHub step summary, generates regression handoff summaries with run-URL fallback when tables are empty, and extracts/deduplicates regressed benchmark names and alert pairs from report alerts. PrComments.replace gates comment replacement to pull_request events with non-empty markdown. RSpec specs validate sidecar parsing (success, malformed JSON, missing files, ENOENT/EACCES/EIO errors) and alert-pair deduplication.
Candidate and confirmed regression payload writing
benchmarks/lib/track_benchmarks/regression_payloads.rb, benchmarks/spec/track_benchmarks/regression_payloads_spec.rb
RegressionPayloads detects main-push context via GitHub event/ref environment variables; report_main_push_candidate treats detected regressions as non-fatal by writing the candidate payload (fails only if write fails) and treats operational failures as fatal. write_candidate produces suite/shard identity, regression handoff summary, regressed benchmark names, and alert-pair JSON to the candidate path and emits ::notice:: with return true; returns false on exceptions with ::error::. write_confirmed writes confirmed hand-off JSON including both first-run and confirmation markdown summaries, confirmed alert pairs, and unique regressed benchmark names extracted from confirmed_alerts. RSpec specs validate payload structure, alert preservation, benchmark deduplication, summary fallback wrapping, and return values.
Confirmation rerun flow
benchmarks/lib/track_benchmarks/confirmation.rb, benchmarks/spec/track_benchmarks_spec.rb
Confirmation loads and validates candidate JSON from glob (returns [nil, ""] for missing/multiple/unreadable candidates with ::error:: output), classifies rerun outcomes as :inconclusive (missing report or non-zero exit without regression), :cleared (confirmed overlap empty), or :confirmed (non-empty confirmed overlap) by combining candidate actionable alerts and report regressed alert pairs, appends suite-level markdown summary indicating status and confirmed alerts, orchestrates confirmed payload writing (downgrades to :inconclusive on write failure), and exits with status 0 for :confirmed/:cleared and 1 for inconclusive. RSpec specs validate all candidate discovery inconclusive scenarios (missing, multiple, access failures) with expected ::error:: stderr output.
Cli orchestrator and top-level script refactoring
benchmarks/lib/track_benchmarks/cli.rb, benchmarks/track_benchmarks.rb, benchmarks/spec/track_benchmarks/cli_spec.rb
Cli orchestrates the full run: branch/start-point selection, Bencher execution, optional retry without --start-point-hash, exit-code normalization, single Markdown render and post to job summary, then mode-dependent dispatch (confirmation via Confirmation.run vs PR report posting with post_pull_request_report), and main-push regression payload reporting on nonzero exit. post_pull_request_report gates on pull_request events, preserves prior comment on operational failure, substitutes regression handoff fallback with run-URL when table is empty, otherwise replaces with rendered report via PrComments.replace and PrReportPoster. track_benchmarks.rb reduced to Config-aliased constants, delegator wrapper methods forwarding to TrackBenchmarks::* modules, and single Cli.new(suite_name: SUITE_NAME, report_marker: REPORT_MARKER).run invocation. RSpec spec validates PR posting logic with and without GITHUB_EVENT_NAME environment variable.

Sequence Diagram(s)

sequenceDiagram
  participant Cli
  participant BranchArgs
  participant GithubCli
  participant BencherRun
  participant BencherRunner

  Cli->>BranchArgs: branch_and_start_point_args(env)
  alt confirmation mode
    BranchArgs-->>Cli: [synthetic_branch, main start-point args]
  else push
    BranchArgs-->>Cli: ["main", []]
  else pull_request
    BranchArgs-->>Cli: [head_ref, base_sha start-point args]
  else workflow_dispatch (not main)
    BranchArgs->>GithubCli: capture compare main...branch
    GithubCli-->>BranchArgs: merge-base SHA or failure
    BranchArgs-->>Cli: [branch, start-point args ± hash]
  end
  Cli->>BencherRun: run_bencher!(runner, branch, start_point_args)
  BencherRun->>BencherRunner: run(branch, start_point_args)
  BencherRunner-->>BencherRun: exit_code, report, stderr
  alt retry_without_start_point_hash?
    BencherRun->>BencherRun: without_start_point_hash(args)
    BencherRun->>BencherRunner: run(branch, stripped_args)
    BencherRunner-->>BencherRun: retry_exit_code, report
  end
  BencherRun->>BencherRun: normalized_exit_code(exit_code, report)
  BencherRun-->>Cli: final_exit_code, report
Loading
sequenceDiagram
  participant Script as track_benchmarks.rb
  participant Cli
  participant BranchArgs
  participant BencherRun
  participant Summary
  participant PrComments
  participant PrReportPoster
  participant Confirmation
  participant RegressionPayloads

  Script->>Cli: new(suite_name, report_marker).run
  Cli->>BranchArgs: branch_and_start_point_args
  BranchArgs-->>Cli: branch, start_point_args
  Cli->>BencherRun: run_bencher!(runner, branch, args)
  BencherRun-->>Cli: exit_code, report, stderr
  alt retry eligibility met
    Cli->>BencherRun: retry_without_start_point_hash
    BencherRun-->>Cli: retry_exit_code, report
  end
  Cli->>BencherRun: normalized_exit_code(exit_code, report)
  Cli->>Summary: rendered_report(report)
  Cli->>Summary: post_report_to_summary(markdown, suite_name)
  alt confirmation mode
    Cli->>Confirmation: run(report, exit_code, markdown, suite_name)
    Confirmation-->>Script: exit(status)
  else PR mode
    Cli->>PrComments: replace(markdown)
    PrComments->>PrReportPoster: replace(markdown) if pull_request
  end
  alt main-push with nonzero exit
    Cli->>RegressionPayloads: report_main_push_candidate
    RegressionPayloads->>RegressionPayloads: write_candidate
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • shakacode/react_on_rails#3448: The main PR refactors the benchmarks/track_benchmarks.rb CI tracking entrypoint into new benchmarks/lib/track_benchmarks/* modules, which directly extracts and modularizes the inline logic introduced in PR #3448.
  • shakacode/react_on_rails#3810: Introduced the candidate→confirm→confirmed artifact flow and confirmation-mode branching in track_benchmarks.rb that this PR modularizes into Confirmation, RegressionPayloads, and BranchArgs.
  • shakacode/react_on_rails#3830: Consumes the confirmed regression payload's ALERTS benchmark+measure pairs written by RegressionPayloads.write_confirmed introduced in this PR for rendering the "What regressed" section in regression issue bodies.

Suggested labels

enhancement, review-needed, benchmark

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.36% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Split benchmark tracking script' accurately describes the main refactoring of dividing benchmarks/track_benchmarks.rb into focused modules.
Linked Issues check ✅ Passed The PR directly addresses the 'Split benchmarks/track_benchmarks.rb into focused modules' task from issue #3459 by extracting seven responsibilities into distinct, testable modules with full test coverage.
Out of Scope Changes check ✅ Passed All changes align with the scope of splitting track_benchmarks.rb into modular implementations with test coverage. No unrelated refactoring or scope creep detected.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/3459-track-benchmarks-split

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@justin808

Copy link
Copy Markdown
Member Author

+ci-status

@justin808

Copy link
Copy Markdown
Member Author

+ci-run-hosted

@github-actions

Copy link
Copy Markdown
Contributor

CI Status

Head SHA: b59acb55bf85
Changed files: 13
Docs-only heuristic (matches ci-changes-detector metadata paths): no
ready-for-hosted-ci label: absent
force-full-hosted-ci label: absent
Current hosted-CI waiver: not present for this SHA

Only the required gate is active unless hosted CI is requested.

@github-actions github-actions Bot added the ready-for-hosted-ci Run optimized hosted GitHub CI for this PR label Jun 22, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Hosted CI Requested

Triggered 9 workflow(s) for b59acb55bf85.
Mode: optimized hosted CI (path-selected by script/ci-changes-detector).
Added ready-for-hosted-ci, so future commits will keep running optimized hosted CI until +ci-stop-hosted is used.

View progress in the Actions tab.

@greptile-apps

greptile-apps Bot commented Jun 22, 2026

Copy link
Copy Markdown

Greptile Summary

This PR splits the monolithic benchmarks/track_benchmarks.rb script into focused modules under benchmarks/lib/track_benchmarks/** without intended behavior changes. The entrypoint becomes a thin shim that delegates to TrackBenchmarks::Cli and provides top-level compatibility functions for callers that previously relied on the script's top-level methods.

  • Extraction of seven modules (Config, BranchArgs, Summary, BencherRun, PrComments, RegressionPayloads, Confirmation) and one orchestrating class (Cli) with a correct dependency load order in lib/track_benchmarks.rb.
  • Three new spec files covering BranchArgs.branch_and_start_point_args (including the workflow_dispatch merge-base paths), RegressionPayloads.write_candidate/write_confirmed, and Summary.display_rows/regressed_alert_pairs.
  • Compatibility shim constants (BENCHMARK_JSON, etc.) and top-level wrapper functions in track_benchmarks.rb are preserved for require-only callers, while actual script execution routes through Cli#run.

Confidence Score: 5/5

Safe to merge — this is a pure structural refactoring with no logic changes and the added specs exercise the newly isolated units.

All extracted modules faithfully reproduce the original script's logic. The load order in the new loader file respects every inter-module dependency. The compatibility shims in the entrypoint preserve the existing top-level API for require-only callers, and lazy evaluation of PrReportPoster env vars on non-PR runs is correctly maintained via block-yielded poster in PrComments.replace. The retry-without-start-point-hash path, the confirmation flow, and the candidate/confirmed hand-off payloads all match the pre-split behaviour line-for-line. Three well-structured specs are added that cover the units most amenable to unit testing.

No files require special attention.

Important Files Changed

Filename Overview
benchmarks/track_benchmarks.rb Entrypoint trimmed from 445 to ~155 lines; now delegates all logic to Cli#run and keeps shim constants/functions for require-only callers. Behavior is unchanged.
benchmarks/lib/track_benchmarks.rb New loader file that requires all submodules in the correct dependency order (Config → BranchArgs → Summary → BencherRun → PrComments → RegressionPayloads → Confirmation → Cli).
benchmarks/lib/track_benchmarks/cli.rb New Cli class orchestrating the full benchmark tracking run; faithfully mirrors the old FILE==$PROGRAM_NAME block, including the start-point-hash retry, summary posting, confirmation branch, and PR comment replacement logic.
benchmarks/lib/track_benchmarks/config.rb Centralises path constants and the env! helper; constants are evaluated once at load time from ENV, matching the old top-level constant definitions exactly.
benchmarks/lib/track_benchmarks/branch_args.rb Extracts branch/start-point resolution for push, pull_request, workflow_dispatch, and confirmation modes; workflow_dispatch_args correctly marked private_class_method.
benchmarks/lib/track_benchmarks/confirmation.rb Confirmation rerun classification and hand-off; rescue block in load_candidate correctly references path only after it is guaranteed non-nil.
benchmarks/lib/track_benchmarks/regression_payloads.rb Candidate and confirmed hand-off writers extracted cleanly; error handling and exit behavior match the original script.
benchmarks/lib/track_benchmarks/summary.rb Report rendering and step-summary helpers extracted; display_rows now takes display_json as a parameter (vs the old DISPLAY_JSON constant), enabling clean unit testing.
benchmarks/lib/track_benchmarks/pr_comments.rb Thin PR-comment gate; uses a keyword-argument default for event_name and a block-yielded poster, preserving lazy evaluation of PrReportPoster env vars on non-PR runs.
benchmarks/lib/track_benchmarks/bencher_run.rb Bencher execution helpers and exit-code classification extracted faithfully; without_start_point_hash correctly dups the args array before slicing.
benchmarks/spec/track_benchmarks/branch_args_spec.rb New spec covers pull_request, workflow_dispatch success, and workflow_dispatch API-failure paths for branch_and_start_point_args.
benchmarks/spec/track_benchmarks/regression_payloads_spec.rb New spec validates write_candidate and write_confirmed payload structure including deduplication of benchmark names.
benchmarks/spec/track_benchmarks/summary_spec.rb New spec covers display_rows happy path, malformed-JSON fallback, and regressed_alert_pairs deduplication.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["track_benchmarks.rb\n(__FILE__ == $PROGRAM_NAME)"] -->|"require_relative 'lib/track_benchmarks'"| B["lib/track_benchmarks.rb\n(loader)"]
    B --> C["Config"]
    B --> D["BranchArgs"]
    B --> E["Summary"]
    B --> F["BencherRun"]
    B --> G["PrComments"]
    B --> H["RegressionPayloads"]
    B --> I["Confirmation"]
    B --> J["Cli"]

    A -->|"Cli.new(suite_name:, report_marker:).run"| J

    J --> D
    J --> F
    J --> E
    J --> G
    J --> H
    J --> I

    H --> E
    H --> C
    I --> H
    I --> E
    I --> C
    F --> E

    subgraph "Compatibility shims (top-level methods)"
        K["confirmation_mode?\nslugify\nrun_bencher!\nmain_push?\n..."]
    end
    A --> K
    K -->|"delegates"| D
    K -->|"delegates"| F
    K -->|"delegates"| H
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["track_benchmarks.rb\n(__FILE__ == $PROGRAM_NAME)"] -->|"require_relative 'lib/track_benchmarks'"| B["lib/track_benchmarks.rb\n(loader)"]
    B --> C["Config"]
    B --> D["BranchArgs"]
    B --> E["Summary"]
    B --> F["BencherRun"]
    B --> G["PrComments"]
    B --> H["RegressionPayloads"]
    B --> I["Confirmation"]
    B --> J["Cli"]

    A -->|"Cli.new(suite_name:, report_marker:).run"| J

    J --> D
    J --> F
    J --> E
    J --> G
    J --> H
    J --> I

    H --> E
    H --> C
    I --> H
    I --> E
    I --> C
    F --> E

    subgraph "Compatibility shims (top-level methods)"
        K["confirmation_mode?\nslugify\nrun_bencher!\nmain_push?\n..."]
    end
    A --> K
    K -->|"delegates"| D
    K -->|"delegates"| F
    K -->|"delegates"| H
Loading

Reviews (1): Last reviewed commit: "Split benchmark tracking script" | Re-trigger Greptile

@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Code Review — PR #4176: Split benchmark tracking script

Summary: This PR cleanly refactors a 511-line monolithic script into focused modules under benchmarks/lib/track_benchmarks/ plus a Cli orchestrator, while adding a focused spec sub-directory. Behavior is preserved. The individual modules are well-organized and the test suite is solid. A few areas worth attention:


1. Orchestration path (Cli#run) has no direct test coverage

The new Cli class is the runtime entry point that wires all modules together, but no spec exercises it. track_benchmarks_spec.rb calls the top-level shim methods (which delegate to individual modules), so a bug in Cli#run's orchestration — wrong method call order, wrong argument, missing guard — would be invisible.

For example, the retry logic in Cli#retry_without_start_point_hash is only tested indirectly via the retry_without_start_point_hash? shim, but the end-to-end [retry_result.exit_code, retry_result.report] unwrapping path is never exercised by specs.

A minimal describe TrackBenchmarks::Cli spec with a stubbed BencherRunner would close this gap.


2. BranchArgs missing coverage for "push" and unknown-event branches

branch_args_spec.rb covers pull_request and workflow_dispatch (both happy and failing paths). The "push" branch (which returns ["main", []]) and the unknown-event early-exit are untested. The unknown-event branch calls exit 1 — worth pinning with an expect { ... }.to raise_error(SystemExit) test since it's a safety net for misconfigured callers.


3. regression_handoff_summary emits a misleading "on main" error in confirmation context

In RegressionPayloads#write_confirmed (line 68):

RegressionReport::CONFIRMATION_SUMMARY => Summary.regression_handoff_summary(confirmation_markdown),

If confirmation_markdown is empty (display sidecar missing during the confirmation run), regression_handoff_summary emits:

::error::Bencher flagged a regression on main but the summary table could not be rendered

This fires during a confirmation run, not an initial main-push run. The phrase "on main" is misleading in that context. Consider passing a context label to the method or handling the empty case at the call site in write_confirmed.


4. load_candidate rescue references potentially-uninitialized path

In Confirmation#load_candidate (line 36):

rescue StandardError => e
  warn "::error::Could not read confirmation candidate #{path} ..."

If Dir.glob itself raises, path = never completes its assignment, and Ruby yields nil for path in the rescue block, producing a misleading message like "Could not read confirmation candidate ". Cosmetic fix: warn "...candidate #{path || dir} ...".


5. ENV.fetch inconsistency between Cli and PrComments (pre-existing)

Cli#post_pull_request_report uses ENV.fetch("GITHUB_EVENT_NAME") (raises KeyError if unset), while PrComments.replace uses ENV.fetch("GITHUB_EVENT_NAME", nil) (returns nil). This is a pre-existing inconsistency now isolated in its own method — worth aligning to the nil-default form for consistency and safety in non-CI environments.


Minor observations (no action required)

  • The 25+ shim methods in track_benchmarks.rb are dead code at runtime (exist only for test compatibility). Future cleanup could update track_benchmarks_spec.rb to call module methods directly and remove the shims.
  • BranchArgs.workflow_dispatch_args uses private_class_method on a module_function-declared method — correct Ruby but uncommon; a short comment explaining the intent would help future readers.

Comment thread benchmarks/lib/track_benchmarks/regression_payloads.rb Outdated
Comment thread benchmarks/lib/track_benchmarks/confirmation.rb Outdated
Comment thread benchmarks/lib/track_benchmarks/cli.rb Outdated
Comment thread benchmarks/spec/track_benchmarks/branch_args_spec.rb

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/lib/track_benchmarks/confirmation.rb`:
- Around line 13-18: The current implementation at line 13 blindly takes the
first glob result when searching for the candidate filename, which can select
the wrong file if multiple candidate files exist under the artifact directory
tree. Instead of using .first on the Dir.glob result, implement logic to
disambiguate between multiple matches by validating there is exactly one
candidate file or by selecting the correct one based on appropriate criteria
(such as shard or suite information). Add appropriate handling for the ambiguous
case where multiple candidates are found, either by raising an error or treating
it as inconclusive.

In `@benchmarks/lib/track_benchmarks/summary.rb`:
- Around line 21-35: The code in the display_rows method checks file existence
before reading, but a race condition can occur where the file is deleted or
becomes unreadable between the File.exist? check on line 21 and the File.read
call on line 23, causing unhandled Errno exceptions to crash the path. Extend
the rescue clause to catch file I/O errors (Errno exceptions) in addition to
JSON::ParserError, logging a warning and returning an empty array when file read
failures occur, ensuring graceful degradation with a warning instead of a crash.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e4f23a64-28a1-45a3-80b4-c3addffc697f

📥 Commits

Reviewing files that changed from the base of the PR and between 5290f17 and b59acb5.

📒 Files selected for processing (13)
  • benchmarks/lib/track_benchmarks.rb
  • benchmarks/lib/track_benchmarks/bencher_run.rb
  • benchmarks/lib/track_benchmarks/branch_args.rb
  • benchmarks/lib/track_benchmarks/cli.rb
  • benchmarks/lib/track_benchmarks/config.rb
  • benchmarks/lib/track_benchmarks/confirmation.rb
  • benchmarks/lib/track_benchmarks/pr_comments.rb
  • benchmarks/lib/track_benchmarks/regression_payloads.rb
  • benchmarks/lib/track_benchmarks/summary.rb
  • benchmarks/spec/track_benchmarks/branch_args_spec.rb
  • benchmarks/spec/track_benchmarks/regression_payloads_spec.rb
  • benchmarks/spec/track_benchmarks/summary_spec.rb
  • benchmarks/track_benchmarks.rb

Comment thread benchmarks/lib/track_benchmarks/confirmation.rb Outdated
Comment thread benchmarks/lib/track_benchmarks/summary.rb
@justin808

Copy link
Copy Markdown
Member Author

Address-review summary for PR #4176

Scan scope: full history (no prior address-review checkpoint).
Head SHA: 6443ebc6f8b90dd1da853a90df030195873d1de4.

Outcomes:

  • Fixed inline: confirmation candidate disambiguation, safe candidate path reporting, display sidecar read/error handling, unset GITHUB_EVENT_NAME PR detection, push/unexpected-event branch-arg specs, and empty confirmation-summary preservation.
  • Added focused specs for the review cases, including benchmarks/spec/track_benchmarks/cli_spec.rb.
  • No deferred review items remain from the fetched inline threads.

Validation after fixes:

  • ruby -c on changed benchmark Ruby files
  • bundle exec rspec benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks (59 examples, 0 failures)
  • bundle exec rubocop benchmarks/track_benchmarks.rb benchmarks/lib/track_benchmarks benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks
  • BENCHMARK_JSON=/tmp/react-on-rails-missing-benchmark.json ruby benchmarks/track_benchmarks.rb exited 1 with the expected missing JSON message
  • script/ci-changes-detector origin/main
  • git diff --check

Remaining: current-head hosted CI, benchmark CI, and bot review reruns are pending after this push.

@justin808

Copy link
Copy Markdown
Member Author

+ci-status

@justin808

Copy link
Copy Markdown
Member Author

+ci-run-hosted

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6443ebc6f8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread benchmarks/lib/track_benchmarks/regression_payloads.rb Outdated
Comment thread benchmarks/lib/track_benchmarks/regression_payloads.rb Outdated
Comment thread benchmarks/lib/track_benchmarks/confirmation.rb
Comment thread benchmarks/track_benchmarks.rb
@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Code Review

Overall: Clean, well-structured refactor. The module boundaries are sensible, the naming is consistent, and the new specs cover the right edge cases. A few things to resolve before merge:

Two undocumented behavior changes

The PR description says "no intended behavior changes" but two real ones exist:

  1. write_confirmed no longer applies regression_handoff_summary to the confirmation markdown (inline comment on regression_payloads.rb:68). write_candidate still applies the fallback on line 40, so the two hand-off writers are now asymmetric. If this is intentional — because the downstream report-regressions job tolerates an empty CONFIRMATION_SUMMARY — the spec title makes that clear but the PR description should acknowledge it.

  2. Multiple regression-candidate.json files now return [nil, ""] (inconclusive) rather than silently picking .first (inline comment on confirmation.rb:23). This is a better choice and has a spec, but it is a behavior change.

Minor observations

  • private_class_method :workflow_dispatch_args (branch_args.rb:170): correct placement — just noting it is now properly encapsulated where the original was a public top-level helper.
  • Backward-compat shims in track_benchmarks.rb (the def run_bencher!, def load_candidate, etc. that now delegate to the modules): these keep the existing spec surface intact, which is pragmatic. If the spec file ever migrates to calling the module methods directly, the shims can be dropped.
  • BencherRun and Confirmation modules have no dedicated unit specs; coverage comes via the shims in track_benchmarks_spec.rb. Fine for a behavior-preserving refactor, but worth revisiting if these modules grow independently.
  • Top-level constant aliases (BENCHMARK_JSON = TrackBenchmarks::Config::BENCHMARK_JSON, etc.) freeze the ENV.fetch result at require time — same caveat as the old code, just noted for completeness in test contexts.

@github-actions

Copy link
Copy Markdown
Contributor

CI Status

Head SHA: 6443ebc6f8b9
Changed files: 15
Docs-only heuristic (matches ci-changes-detector metadata paths): no
ready-for-hosted-ci label: present
force-full-hosted-ci label: absent
Current hosted-CI waiver: not present for this SHA

Optimized hosted CI is enabled for this PR.

@justin808

Copy link
Copy Markdown
Member Author

+ci-status

@justin808

Copy link
Copy Markdown
Member Author

Address-review update for head 5c14a9d007dfdb8a0ef15eb1ca599dbd70c9a154:

  • Fixed the empty confirmed-summary regression in 5c14a9d00 by routing confirmation summaries through the existing fallback with confirmation-specific wording.
  • Added specs covering empty confirmation fallback and non-empty confirmation preservation.
  • Updated the PR body with the multiple-candidate confirmation decision log.
  • Replied to and resolved the four remaining review threads; the require-time constant concern was auto-deferred as out of scope for this behavior-preserving split.

Validated locally after the final fix:

  • ruby -c benchmarks/track_benchmarks.rb
  • ruby -c benchmarks/lib/track_benchmarks/branch_args.rb
  • ruby -c benchmarks/lib/track_benchmarks/cli.rb
  • ruby -c benchmarks/lib/track_benchmarks/confirmation.rb
  • ruby -c benchmarks/lib/track_benchmarks/regression_payloads.rb
  • ruby -c benchmarks/lib/track_benchmarks/summary.rb
  • bundle exec rspec benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks
  • bundle exec rubocop benchmarks/track_benchmarks.rb benchmarks/lib/track_benchmarks benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks
  • BENCHMARK_JSON=/tmp/react-on-rails-missing-benchmark.json ruby benchmarks/track_benchmarks.rb exits 1 as expected
  • script/ci-changes-detector origin/main
  • git diff --check

Confidence note:

  • Validated: commands above plus pre-commit/pre-push hooks.
  • Evidence: final commit 5c14a9d00; resolved review threads; GitHub checks are rerunning on the new head.
  • UNKNOWN: final hosted CI result until the queued checks complete.
  • Residual risk: no intended benchmark behavior change; live benchmark behavior remains covered by hosted/maintainer-selected benchmark validation.

@justin808

Copy link
Copy Markdown
Member Author

+ci-run-hosted

@github-actions

Copy link
Copy Markdown
Contributor

Hosted CI Requested

Triggered 9 workflow(s) for 2c46caf3ea1e.
Mode: optimized hosted CI (path-selected by script/ci-changes-detector).
Added ready-for-hosted-ci, so future commits will keep running optimized hosted CI until +ci-stop-hosted is used.

View progress in the Actions tab.

Comment thread benchmarks/lib/track_benchmarks/cli.rb Outdated
Comment thread benchmarks/lib/track_benchmarks/cli.rb Outdated
Comment thread benchmarks/lib/track_benchmarks/cli.rb Outdated
Comment thread benchmarks/track_benchmarks.rb
Comment thread benchmarks/spec/track_benchmarks/cli_spec.rb
@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Code Review

Good structural work — the split into focused modules is clean and the new specs cover the previously un-tested confirmation/payload paths well. Two correctness issues and one maintenance trap to address before merge.

Bug: Cli#run does not propagate injected env (three call sites)

Cli.new accepts env: for test isolation and uses it correctly inside post_pull_request_report. But the three other env-sensitive calls in run all fall through to the real ENV:

# cli.rb – current (broken)
branch, start_point_args = BranchArgs.branch_and_start_point_args          # env: ENV
if BranchArgs.confirmation_mode?                                            # env: ENV
if RegressionPayloads.main_push? && bencher_exit_code != 0                 # env: ENV

Each of those methods already accepts env: as a keyword argument. The fix is:

BranchArgs.branch_and_start_point_args(env:)
BranchArgs.confirmation_mode?(env:)
RegressionPayloads.main_push?(env:)

Because cli_spec only tests post_pull_request_report via send (not the full run path), this slipped through. See inline comments.

Maintenance trap: post_report_to_summary shim references SUITE_NAME

In benchmarks/track_benchmarks.rb:

def post_report_to_summary(markdown)
  TrackBenchmarks::Summary.post_report_to_summary(markdown, SUITE_NAME)
end

SUITE_NAME is only assigned inside the if __FILE__ == $PROGRAM_NAME block. The old code explicitly avoided this pattern (see the now-removed comment on the old rendered_report). Calling this shim from a require-only context raises NameError: uninitialized constant SUITE_NAME. The current specs use modules directly so this won't surface in CI — but it's a footgun for anyone who reaches for the legacy shim. Either add a comment documenting the restriction or remove the shim (it is never called by any existing spec).

Everything else looks good

  • Require order in lib/track_benchmarks.rb is correct (summary before confirmation).
  • The defensive multi-candidate guard in Confirmation.load_candidate (treating >1 candidate as inconclusive instead of .first) is the right call and is well-tested.
  • write_candidate/write_confirmed error handling and the Github.run_url stubs in specs are consistent.
  • Config constants evaluated at load time from ENV matches the prior behavior; no regression.
  • The Confirmation.finish process-exit pattern is unchanged from the original script; no regression.

@justin808

Copy link
Copy Markdown
Member Author

+ci-run-hosted

@justin808

Copy link
Copy Markdown
Member Author

Address-review update for head d30b12e8186ca2f7bc983b5c76d8f8f957031e46:

  • Propagated injected env: through Cli#run to branch selection, confirmation-mode detection, and main-push candidate detection.
  • Added a run-level Cli spec that verifies those injected-env calls.
  • Updated the compatibility shim so post_report_to_summary accepts an explicit suite name and only falls back to the script constant when it exists.
  • Replied to and resolved the five latest review threads.

Validated locally after this fix:

  • ruby -c benchmarks/track_benchmarks.rb
  • ruby -c benchmarks/lib/track_benchmarks/cli.rb
  • ruby -c benchmarks/spec/track_benchmarks/cli_spec.rb
  • bundle exec rspec benchmarks/spec/track_benchmarks/cli_spec.rb benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/track_benchmarks
  • bundle exec rubocop benchmarks/track_benchmarks.rb benchmarks/lib/track_benchmarks/cli.rb benchmarks/spec/track_benchmarks/cli_spec.rb
  • script/ci-changes-detector origin/main
  • git diff --check
  • Pre-commit/pre-push hooks passed, including branch RuboCop.

Confidence note:

  • Validated: commands above plus hooks.
  • Evidence: final commit d30b12e81; resolved review threads; hosted checks are rerunning on the new head.
  • UNKNOWN: final hosted CI result until queued checks complete.
  • Residual risk: no intended benchmark behavior change; live benchmark workflow remains covered by hosted benchmark validation.

@justin808

Copy link
Copy Markdown
Member Author

+ci-status

@github-actions

Copy link
Copy Markdown
Contributor

CI Status

Head SHA: d30b12e8186c
Changed files: 15
Docs-only heuristic (matches ci-changes-detector metadata paths): no
ready-for-hosted-ci label: present
force-full-hosted-ci label: absent
Current hosted-CI waiver: not present for this SHA

Optimized hosted CI is enabled for this PR.

@github-actions

Copy link
Copy Markdown
Contributor

Hosted CI Requested

Triggered 9 workflow(s) for d30b12e8186c.
Mode: optimized hosted CI (path-selected by script/ci-changes-detector).
Added ready-for-hosted-ci, so future commits will keep running optimized hosted CI until +ci-stop-hosted is used.

View progress in the Actions tab.

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

The modularization is clean and the intent (behavior-preserving split) is well-executed. Two correctness improvements over the original worth noting: the multiple-candidate inconclusive guard and the failure_context: 'during confirmation' fix in write_confirmed (the original would have said 'on main' in a confirmation-run error, which was misleading). Three things to flag:

  1. post_report_to_summary shim uses Object.const_get to read SUITE_NAME — see inline. This reflection path is dead in normal execution now that Cli handles orchestration; callers should pass suite_name explicitly.

  2. Confirmation.run hardcodes Config::CANDIDATE_INPUT_DIR with no injectable seam — see inline. No test exercises the full run_confirmation / Confirmation.run path; only load_candidate is tested in isolation.

  3. Minor: regression_payloads_spec.rb covers only happy paths. The rescue branches in write_candidate and write_confirmed (when File.write raises) are load-bearing for the confirmation flow and remain untested. Pre-existing gap, but worth a follow-up.

Comment thread benchmarks/track_benchmarks.rb
Comment thread benchmarks/lib/track_benchmarks/confirmation.rb
Comment thread benchmarks/lib/track_benchmarks/cli.rb
@github-actions

Copy link
Copy Markdown
Contributor

Pro Node Renderer Benchmark Summary

Benchmark RPS p50(ms) p90(ms) Status
Pro Node Renderer: simple_eval (non-RSC) 2158.44 ▲16.6% (1851.85) 4.23 ▼7.4% (4.57) 5.2 ▼42.6% (9.06) 200=64760
Pro Node Renderer: react_ssr (non-RSC) 1833.28 ▲7.9% (1698.87) 4.95 ▼3.9% (5.15) 6.06 ▼28.7% (8.51) 200=55010

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

@github-actions

Copy link
Copy Markdown
Contributor

Core Benchmark Summary

Benchmark RPS p50(ms) p90(ms) Status
/: Core 3.84 ▲11.2% (3.45) 2146.05 ▼1.6% (2180.12) 2573.82 ▼16.0% (3065.18) 200=123
/client_side_hello_world: Core 745.5 ▲14.9% (648.56) 8.76 ▼8.9% (9.61) 21.58 ▲10.3% (19.56) 200=22523
/client_side_rescript_hello_world: Core 754.04 ▲10.8% (680.35) 9.04 ▼1.8% (9.2) 17.57 ▼6.9% (18.87) 200=22782
/client_side_hello_world_shared_store: Core 696.03 ▲12.3% (619.77) 8.67 ▼13.9% (10.07) 21.93 ▲15.4% (19.0) 200=21027
/client_side_hello_world_shared_store_controller: Core 710.19 ▲14.0% (622.92) 11.31 ▲14.4% (9.88) 20.04 ▼1.5% (20.34) 200=21314
/client_side_hello_world_shared_store_defer: Core 492.28 ▼22.9% (638.16) 11.79 ▲17.0% (10.08) 17.01 ▼15.2% (20.06) 200=14873
/server_side_hello_world_shared_store: Core 16.78 ▲12.4% (14.93) 492.09 ▼4.9% (517.19) 659.38 ▼3.9% (685.79) 200=515
/server_side_hello_world_shared_store_controller: Core 16.97 ▲14.4% (14.84) 379.87 ▼22.8% (492.38) 609.0 ▼18.7% (749.42) 200=523
/server_side_hello_world_shared_store_defer: Core 16.95 ▲14.2% (14.84) 466.03 ▼6.7% (499.24) 653.32 ▼3.8% (679.25) 200=520
/server_side_hello_world: Core 33.95 ▲12.3% (30.23) 241.91 ▼5.5% (255.94) 293.99 ▼6.6% (314.69) 200=1035
/server_side_hello_world_hooks: Core 34.05 ▲14.9% (29.63) 264.3 ▲6.5% (248.29) 298.69 ▼5.0% (314.34) 200=1031
/server_side_hello_world_props: Core 34.22 ▲16.7% (29.31) 241.77 ▼6.4% (258.22) 297.72 ▼7.6% (322.28) 200=1039
/client_side_log_throw: Core 764.79 ▲12.2% (681.58) 10.35 ▲12.5% (9.2) 18.28 ▼2.9% (18.84) 200=23103
/server_side_log_throw: Core 33.14 ▲13.9% (29.09) 248.34 ▼4.2% (259.35) 311.95 ▼1.9% (318.08) 200=1004
/server_side_log_throw_plain_js: Core 33.57 ▲12.7% (29.78) 250.09 ▼3.1% (258.2) 302.02 ▼5.8% (320.55) 200=1021
/server_side_log_throw_raise: Core 33.86 ▲12.9% (30.0) 243.75 ▼6.3% (260.19) 300.51 ▼5.1% (316.53) 3xx=1028
/server_side_log_throw_raise_invoker: Core 642.06 ▼16.5% (768.6) 9.22 ▲11.7% (8.25) 14.21 ▼12.0% (16.15) 200=19396
/server_side_hello_world_es5: Core 34.62 ▲15.5% (29.98) 235.97 ▼8.7% (258.59) 292.42 ▼8.4% (319.33) 200=1053
/server_side_redux_app: Core 33.32 ▲16.9% (28.51) 248.12 ▼0.1% (248.28) 306.23 ▼3.8% (318.4) 200=1012
/server_side_hello_world_with_options: Core 34.8 ▲16.2% (29.96) 235.87 ▼5.6% (249.89) 291.7 ▼7.6% (315.78) 200=1056
/server_side_redux_app_cached: Core 766.16 ▲17.3% (653.4) 8.88 ▼10.2% (9.89) 17.75 ▲1.7% (17.46) 200=23147
/client_side_manual_render: Core 784.28 ▲18.5% (661.83) 8.48 ▼9.9% (9.41) 16.78 ▼14.8% (19.71) 200=23694
/render_js: Core 37.39 ▲16.9% (31.99) 221.15 ▼5.5% (234.08) 271.05 ▼8.2% (295.37) 200=1137
/react_router: Core 32.72 ▲15.0% (28.46) 278.9 ▲3.8% (268.76) 311.34 ▼7.4% (336.24) 200=991
/pure_component: Core 34.92 ▲13.5% (30.76) 190.16 ▼26.1% (257.17) 275.84 ▼11.9% (313.23) 200=1062
/react_compiler_example: Core 34.41 ▲15.3% (29.85) 241.29 ▼4.7% (253.16) 296.33 ▼6.9% (318.17) 200=1044
/css_modules_images_fonts_example: Core 34.35 ▲13.4% (30.28) 241.55 ▼6.9% (259.47) 301.9 ▼4.0% (314.63) 200=1043
/turbolinks_cache_disabled: Core 756.71 ▲8.7% (696.33) 8.23 ▼9.4% (9.09) 19.9 ▲6.3% (18.73) 200=22861
/rendered_html: Core 34.74 ▲14.9% (30.25) 238.87 ▼7.2% (257.45) 295.91 ▼6.3% (315.71) 200=1053
/xhr_refresh: Core 17.56 ▲13.8% (15.43) 328.53 ▼33.5% (494.4) 554.33 ▼16.9% (667.43) 200=541
/react_helmet: Core 33.45 ▲11.2% (30.09) 186.32 ▼29.3% (263.55) 285.72 ▼11.2% (321.84) 200=1022
/broken_app: Core 33.56 ▲15.2% (29.12) 245.37 ▼4.8% (257.84) 303.99 ▼5.7% (322.4) 200=1019
/image_example: Core 34.28 ▲14.4% (29.98) 242.81 ▼8.5% (265.45) 298.72 ▼7.9% (324.28) 200=1042
/font_optimization_example: Core 873.63 ▲14.9% (760.2) 7.29 ▼11.2% (8.21) 16.73 ▲6.7% (15.68) 200=26391
/client_side_activity: Core 765.44 ▲15.5% (662.91) 6.6 ▼31.6% (9.64) 14.26 ▼26.0% (19.27) 200=23280
/server_side_activity: Core 34.1 ▲14.2% (29.86) 267.22 ▲5.3% (253.7) 301.26 ▼6.7% (323.02) 200=1034
/turbo_frame_tag_hello_world: Core 836.56 ▲14.0% (733.51) 7.51 ▼13.9% (8.72) 17.67 ▲3.3% (17.1) 200=25273
/manual_render_test: Core 550.78 ▼18.9% (679.42) 10.48 ▲13.5% (9.23) 12.18 ▼34.7% (18.66) 200=16641
/root_error_callbacks: Core 29.38 ▼3.0% (30.3) 196.12 ▼23.3% (255.74) 256.59 ▼18.4% (314.53) 200=899
/hydration_scheduling: Core 11.91 ▲13.9% (10.45) 682.48 ▼9.5% (754.4) 1031.11 ▼2.6% (1058.2) 200=367
/rails_form: Core 779.15 ▲16.7% (667.62) 9.98 ▲7.2% (9.31) 17.6 ▼3.9% (18.32) 200=23535

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

@justin808

Copy link
Copy Markdown
Member Author

Address-review update for head d30b12e8186ca2f7bc983b5c76d8f8f957031e46.

Resolved current-head Claude threads without code changes:

  • Kept the post_report_to_summary(markdown, suite_name = nil) SUITE_NAME fallback as an intentional compatibility shim for the old top-level helper surface; normal script execution goes through Cli#run.
  • Deferred adding a new dir: keyword through Confirmation.run/run_confirmation; existing specs already exercise load_candidate(dir) directly, and adding a new injection API is testability surface beyond the no-behavior-change split.
  • Deferred the env.fetch("GITHUB_EVENT_NAME", nil) style preference to avoid restarting final-candidate gates for a no-behavior nit.

Readiness evidence:

  • Review threads after cutoff: 0 unresolved.
  • PR_BATCH_SKILL_DIR=.agents/skills/pr-batch .agents/skills/pr-batch/bin/pr-ci-readiness 4176 --repo shakacode/react_on_rails -> READY with required checks.
  • script/pr-merge-ledger 4176 --strict --pretty --changelog-classification not_user_visible -> complete_allowed: true, no violations or unknown fields (/tmp/pr-4176-merge-ledger.json).
  • Hosted benchmark/build rows are still running/pending; no failures observed in the latest poll.

@github-actions

Copy link
Copy Markdown
Contributor

Pro (shard 2/2) Benchmark Summary

Benchmark RPS p50(ms) p90(ms) Status
/empty: Pro 1171.36 ▼1.0% (1183.34) 6.75 ▲14.1% (5.92) 10.35 ▲4.4% (9.91) 200=35381
/ssr_shell_error: Pro 124.93 ▼31.1% (181.22) 47.36 ▲11.8% (42.37) 96.15 ▲40.8% (68.31) 200=3778
/ssr_sync_error: Pro 152.82 ▼16.2% (182.26) 48.18 ▲13.1% (42.61) 71.92 ▲4.5% (68.81) 200=4621
/rsc_component_error: Pro 152.38 ▼7.8% (165.31) 47.22 ▲13.0% (41.77) 77.97 ▲12.0% (69.6) 200=4593,3xx=13
/non_existing_stream_react_component: Pro 182.46 ▼6.8% (195.78) 47.8 ▲28.3% (37.27) 66.03 ▲4.4% (63.27) 200=5513
/server_side_redux_app_cached: Pro 300.51 ▼16.2% (358.61) 19.27 ▼4.5% (20.18) 49.67 ▲48.3% (33.49) 200=9082
/loadable: Pro 147.27 ▼13.5% (170.22) 52.15 ▲17.3% (44.45) 79.43 ▲8.8% (73.02) 200=4451
/apollo_graphql: Pro 124.6 ▼0.6% (125.32) 49.68 ▼0.1% (49.71) 76.77 ▼9.1% (84.43) 200=4185,3xx=17
/console_logs_in_async_server: Pro 3.57 ▲11.7% (3.2) 2126.17 ▲0.1% (2123.6) 2143.0 ▼1.8% (2182.84) 200=122
/stream_error_demo: Pro 2.9 ▼94.7% (54.97) 2012.3 ▲18.3% (1700.73) 2044.59 ▲18.3% (1728.09) 200=94
/stream_async_components: Pro 142.25 ▼11.4% (160.55) 37.7 ▼14.3% (44.02) 69.32 ▼3.3% (71.68) 200=4285,3xx=46
/rsc_posts_page_over_http: Pro 141.13 ▼15.4% (166.83) 51.82 ▲10.8% (46.76) 83.93 ▲4.1% (80.64) 200=4268
/rsc_echo_props: Pro 61.34 ▼29.3% (86.75) 126.87 ▲22.3% (103.71) 177.07 ▲4.9% (168.73) 200=1861
/client_side_fouc_probe: Pro 349.34 ▼1.8% (355.84) 22.11 ▲11.0% (19.92) 35.38 ▲9.4% (32.34) 200=10555
/async_on_server_sync_on_client_client_render: Pro 342.81 ▲2.6% (334.03) 22.43 ▲9.8% (20.42) 34.63 ▼5.3% (36.56) 200=10360
/server_router_client_render: Pro 352.93 ▲0.9% (349.81) 21.68 ▲5.0% (20.66) 34.89 ▲3.6% (33.68) 200=10664
/unwrapped_rsc_route_stream_render: Pro 175.38 ▼11.0% (197.14) 42.69 ▲9.0% (39.16) 68.94 ▲5.0% (65.63) 200=5302
/async_render_function_returns_component: Pro 188.88 ▼8.1% (205.51) 29.15 ▼19.6% (36.25) 54.54 ▼5.1% (57.47) 200=5747
/native_metadata: Pro 167.73 ▼12.7% (192.06) 44.31 ▲26.3% (35.08) 69.36 ▲15.8% (59.88) 200=5053,3xx=18
/hybrid_metadata_streaming: Pro 162.63 ▼15.4% (192.2) 46.45 ▲17.9% (39.4) 73.6 ▲15.8% (63.54) 200=4921
/cache_demo: Pro 122.98 ▼23.2% (160.08) 44.38 ▼9.7% (49.14) 74.15 ▼9.4% (81.86) 200=3722
/client_side_hello_world: Pro 324.71 ▼6.9% (348.87) 23.81 ▲18.0% (20.18) 35.43 ▲7.0% (33.11) 200=9752
/client_side_hello_world_shared_store_controller: Pro 218.39 ▼32.5% (323.5) 26.0 ▲21.6% (21.39) 31.7 ▼11.8% (35.96) 200=6643
/server_side_hello_world_shared_store: Pro 107.67 ▼22.4% (138.77) 73.39 ▲29.0% (56.9) 103.68 ▲16.8% (88.76) 200=3256
/server_side_hello_world_shared_store_defer: Pro 92.85 ▼35.0% (142.76) 62.2 ▲13.4% (54.84) 79.39 ▼9.8% (87.97) 200=2829
/server_side_hello_world_hooks: Pro 177.37 ▼11.8% (201.02) 44.23 ▲23.0% (35.96) 68.02 ▲13.5% (59.93) 200=5362
/server_side_log_throw: Pro 177.67 ▼7.8% (192.64) 34.55 ▼8.3% (37.68) 58.87 ▼8.4% (64.25) 200=5356,3xx=16
/source_mapped_prerender_error_probe: Pro 264.65 ▲2.0% (259.41) 32.93 ▲19.3% (27.61) 45.1 ▲3.4% (43.62) 3xx=7997
/server_side_log_throw_raise_invoker: Pro 413.76 ▲0.2% (413.09) 13.57 ▼19.4% (16.84) 25.21 ▼9.2% (27.75) 200=12504
/server_side_redux_app: Pro 174.1 ▼13.0% (200.21) 35.99 ▼7.0% (38.71) 61.15 ▼3.8% (63.59) 200=5264
/server_side_redux_app_cached: Pro 369.42 ▲3.0% (358.61) 21.52 ▲6.7% (20.18) 34.27 ▲2.3% (33.49) 200=11162
/render_js: Pro 388.14 ▲1.7% (381.76) 20.19 ▲7.0% (18.88) 29.38 ▼7.1% (31.61) 200=11728
/pure_component: Pro 191.08 ▼7.9% (207.42) 45.77 ▲24.9% (36.64) 61.68 ▼3.2% (63.71) 200=5776
/turbolinks_cache_disabled: Pro 361.93 ▼3.0% (373.12) 21.11 ▲9.8% (19.23) 31.61 ▲1.3% (31.19) 200=10952
/xhr_refresh: Pro 128.81 ▼15.0% (151.49) 59.22 ▲10.3% (53.67) 82.28 ▼1.8% (83.79) 200=3897
/broken_app: Pro 182.07 ▼12.7% (208.53) 43.23 ▲16.9% (36.97) 66.12 ▲9.7% (60.26) 200=5470
/server_render_with_timeout: Pro 60.46 ▼47.6% (115.29) 113.25 ▲15.6% (97.93) 121.53 ▲5.7% (114.93) 200=1839

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

@github-actions

Copy link
Copy Markdown
Contributor

Pro (shard 1/2) Benchmark Summary

Benchmark RPS p50(ms) p90(ms) Status
/: Pro 50.68 ▼29.3% (71.7) 156.04 ▲14.6% (136.13) 222.55 ▲17.0% (190.22) 200=1535
/error_scenarios_hub: Pro 388.15 ▲7.5% (360.92) 19.55 ▼2.4% (20.04) 29.53 ▼9.6% (32.66) 200=11731
/ssr_async_error: Pro 2.9 ▼95.5% (64.03) 2011.5 ▲20.5% (1669.01) 2062.99 ▲20.6% (1710.65) 200=94
/ssr_async_prop_error: Pro 1.52 ▼97.5% (60.73) 5021.5 ▲20.7% (4159.95) 5451.1 ▼24.5% (7217.46) 200=55
/non_existing_react_component: Pro 158.19 ▼24.5% (209.58) 36.79 ▼1.9% (37.52) 80.09 ▲14.8% (69.77) 200=4783
/non_existing_rsc_payload: Pro 186.01 ▼13.2% (214.4) 39.94 ▲9.6% (36.43) 65.19 ▲8.0% (60.36) 200=5622
/cached_react_helmet: Pro 397.68 ▲6.2% (374.62) 19.76 ▲1.3% (19.51) 31.49 ▼1.0% (31.8) 200=12020
/cached_redux_component: Pro 339.63 ▼10.8% (380.58) 13.55 ▼29.5% (19.22) 19.87 ▼38.1% (32.09) 200=10331
/lazy_apollo_graphql: Pro 146.84 ▲3.8% (141.51) 49.22 ▼3.6% (51.06) 72.68 ▼11.5% (82.09) 200=4443
/redis_receiver: Pro 87.85 ▼7.9% (95.38) 60.48 ▼11.3% (68.19) 141.09 ▼1.7% (143.5) 200=2678
/stream_shell_error_demo: Pro 139.54 ▼20.9% (176.45) 53.81 ▲35.7% (39.67) 80.19 ▲12.2% (71.48) 200=4201,3xx=19
/test_incremental_rendering: Pro 2.9 ▼95.7% (67.9) 2011.25 ▲20.6% (1667.8) 2027.43 ▲20.2% (1687.21) 200=94
/rsc_posts_page_over_redis: Pro 92.15 ▼3.8% (95.83) 63.93 ▼10.2% (71.18) 128.77 ▲9.2% (117.97) 200=2787
/rsc_fouc_probe: Pro 151.21 ▼11.0% (169.97) 30.28 ▼29.2% (42.77) 48.9 ▼31.3% (71.18) 200=4605
/async_on_server_sync_on_client: Pro 2.17 ▼96.4% (60.69) 3011.47 ▲20.6% (2496.65) 3490.78 ▼1.6% (3546.24) 200=73
/server_router: Pro 165.71 ▼12.8% (190.11) 44.81 ▲9.7% (40.86) 74.4 ▲5.3% (70.63) 200=5009
/unwrapped_rsc_route_client_render: Pro 402.72 ▲7.6% (374.28) 13.78 ▼27.7% (19.07) 25.93 ▼16.3% (30.99) 200=12173
/async_render_function_returns_string: Pro 158.23 ▼21.2% (200.9) 36.09 ▼1.0% (36.44) 113.2 ▲88.6% (60.02) 200=4784
/async_components_demo: Pro 7.62 ▼83.2% (45.28) 1027.11 ▲19.8% (857.17) 1059.03 ▲17.8% (899.17) 200=239
/stream_native_metadata: Pro 180.23 ▼11.3% (203.25) 30.18 ▼22.6% (39.01) 56.75 ▼11.1% (63.81) 200=5488
/rsc_native_metadata: Pro 8.63 ▼87.4% (68.5) 1011.07 ▲20.3% (840.28) 1021.16 ▲17.3% (870.51) 200=271
/react_intl_rsc_demo: Pro 87.68 ▼35.9% (136.87) 85.38 ▲29.7% (65.82) 128.55 ▲20.2% (106.92) 200=2638
/client_side_hello_world_shared_store: Pro 367.8 ▲6.3% (345.94) 20.78 ▲1.5% (20.47) 30.52 ▼8.2% (33.26) 200=11122
/client_side_hello_world_shared_store_defer: Pro 369.58 ▲7.9% (342.65) 20.43 ▼1.8% (20.81) 30.61 ▼6.2% (32.65) 200=11169
/server_side_hello_world_shared_store_controller: Pro 131.2 ▼6.7% (140.58) 66.64 ▲24.4% (53.57) 91.0 ▲2.5% (88.77) 200=3944
/server_side_hello_world: Pro 191.96 ▼5.6% (203.25) 38.18 ▲5.8% (36.09) 61.26 ▲2.2% (59.93) 200=5794,3xx=9
/client_side_log_throw: Pro 401.75 ▲8.7% (369.52) 18.72 ▼1.7% (19.05) 28.18 ▼11.8% (31.97) 200=12140
/server_side_log_throw_plain_js: Pro 420.44 ▲11.7% (376.57) 18.86 ▼0.2% (18.9) 29.25 ▼8.8% (32.07) 200=12703
/server_side_log_throw_raise: Pro 198.44 ▼36.6% (313.19) 29.95 ▲12.1% (26.72) 65.75 ▲49.7% (43.91) 3xx=5999
/server_side_hello_world_es5: Pro 163.21 ▼21.0% (206.66) 32.64 ▼11.7% (36.95) 60.11 ▲0.4% (59.88) 200=4950,3xx=18
/server_side_hello_world_with_options: Pro 194.23 ▼0.2% (194.68) 38.78 ▲5.6% (36.74) 62.12 ▼0.4% (62.37) 200=5871
/client_side_manual_render: Pro 408.41 ▲13.3% (360.35) 18.48 ▼4.7% (19.39) 27.95 ▼7.6% (30.24) 200=12343
/react_router: Pro 212.07 ▼0.8% (213.85) 34.42 ▼4.6% (36.07) 57.7 ▼2.1% (58.93) 200=6410
/css_modules_images_fonts_example: Pro 199.01 ▼2.7% (204.44) 30.76 ▼18.7% (37.83) 53.01 ▼10.4% (59.17) 200=6003,3xx=13
/rendered_html: Pro 208.1 ▲4.3% (199.5) 37.12 ▲0.8% (36.82) 54.93 ▼6.8% (58.92) 200=6291
/react_helmet: Pro 168.06 ▲10.4% (152.2) 39.79 ▼28.0% (55.28) 70.32 ▼15.0% (82.71) 200=5084
/image_example: Pro 206.64 ▲3.5% (199.6) 36.54 ▲1.2% (36.1) 57.81 ▼0.3% (58.01) 200=6244
/posts_page: Pro 138.09 ▼6.4% (147.47) 52.23 ▲2.8% (50.8) 75.59 ▼5.4% (79.92) 200=4171,3xx=5

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

@justin808

Copy link
Copy Markdown
Member Author

Batch closeout confidence note for head d30b12e8186ca2f7bc983b5c76d8f8f957031e46:

  • Validated locally: focused benchmark tracker specs and local CI evidence recorded earlier in this PR; final branch diff remains inside benchmarks/track_benchmarks.rb, benchmarks/lib/**, and benchmarks/spec/**.
  • Hosted/PR gates: pr-ci-readiness reports READY; full check list is 44 passed, 0 pending, 0 failing; mergeStateStatus is CLEAN.
  • Review triage: current cutoff 2026-06-23T10:22:41Z; unresolved review threads: 0.
  • Merge ledger: script/pr-merge-ledger 4176 --strict --pretty --changelog-classification not_user_visible is clean (complete_allowed: true, no unknown fields, no violations).
  • UNKNOWN: none.
  • Residual risk: behavior-preserving benchmark tooling split; no intended benchmark behavior change.

Final batch state: ready-no-merge-authority. The batch goal has merge_authority: ask, so I am not merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmark ready-for-hosted-ci Run optimized hosted GitHub CI for this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Benchmark workflow follow-ups: unify suite jobs, split track_benchmarks, JSON-based regression detection

1 participant