Filter stale Bencher alerts before reporting by justin808 · Pull Request #3822 · shakacode/react_on_rails

justin808 · 2026-06-09T00:34:06Z

Summary

Filter active Bencher alerts against the current report boundaries before treating them as regressions.
Keep malformed or unmatchable active alerts fail-safe so report-shape drift still fails visibly.
Normalize stale-only Bencher alert exits after the start-point retry and operational checks so cleared alerts do not file regression issues.
Add parser and benchmark tracking specs for stale alerts, measure-less alerts, fail-safe cases, and stale-only exit normalization.

Validation

bundle exec rubocop benchmarks/lib/bencher_report.rb benchmarks/track_benchmarks.rb benchmarks/spec/bencher_report_spec.rb benchmarks/spec/track_benchmarks_spec.rb -> 4 files inspected, no offenses
bundle exec rspec benchmarks/spec/bencher_report_spec.rb benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/report_table_integration_spec.rb -> 83 examples, 0 failures
bundle exec rspec benchmarks/spec -> 229 examples, 0 failures
git diff --check -> passed
script/ci-changes-detector origin/main -> Benchmark scripts; recommends Lint (Ruby + JS)
(cd react_on_rails && bundle exec rubocop) -> 214 files inspected, no offenses
codex review --base origin/main -> no actionable correctness issues found
git push pre-push hook -> branch Ruby RuboCop passed on changed Ruby files; Markdown link check had no Markdown files

Labels: benchmark. Benchmark reporting is performance-sensitive; full-ci is not recommended because this is a focused benchmark script/parser/spec change and the CI detector only recommends lint.

Agent Merge Confidence

Mode: development (no open "Release gate:" tracker found by title search; batch assignment: rc-accelerated-2026-06-08-two-machine)
Score: 8/10
Auto-merge recommendation: no (user requested no auto-merge; no independent finalizer)
Affected areas: benchmark regression reporting, Bencher report parsing, benchmark confirmation handoff
CI detector: script/ci-changes-detector origin/main -> Benchmark scripts; recommends Lint (Ruby + JS)
Validation run:

bundle exec rubocop benchmarks/lib/bencher_report.rb benchmarks/track_benchmarks.rb benchmarks/spec/bencher_report_spec.rb benchmarks/spec/track_benchmarks_spec.rb -> 4 files inspected, no offenses
bundle exec rspec benchmarks/spec/bencher_report_spec.rb benchmarks/spec/track_benchmarks_spec.rb benchmarks/spec/report_table_integration_spec.rb -> 83 examples, 0 failures
bundle exec rspec benchmarks/spec -> 229 examples, 0 failures
git diff --check -> passed
(cd react_on_rails && bundle exec rubocop) -> 214 files inspected, no offenses
codex review --base origin/main -> no actionable correctness issues found
Review/check gate:
Codex review: complete for c84a4be3, no actionable correctness issues
GitHub checks: pending after PR creation
Known residual risk: No live Bencher service run was available locally; coverage uses pinned JSON fixtures/specs for the Bencher CLI v0.6.2 report shape.
Finalized by: not finalized; authoring agent only

Note

Medium Risk
Changes benchmark regression gating and CI exit codes; incorrect filtering could hide real regressions or clear jobs that should fail, though fail-safe paths and extensive specs mitigate this.

Overview
Bencher regression detection now cross-checks active alerts[] against current report boundaries before counting a regression. Stale alerts (metrics back within limits) are dropped from #alerts / #regression? but tracked via new #filtered_alert?.

Fail-safe behavior is unchanged for ambiguous cases: missing benchmark, unmatchable boundary, unknown limit side, or measure-less alerts with no regression on the alert side still count as regressions so schema drift stays visible.

CI exit handling adds normalized_bencher_exit_code: after the start-point-hash retry, a non-zero Bencher exit with only filtered (stale) alerts and no real regression is normalized to 0 with a ::notice::, which avoids filing a false regression. The raw exit code is left untouched until the retry logic has run.

Specs cover stale vs current alerts, measure-less alerts, fail-safe paths, and exit normalization.

Reviewed by Cursor Bugbot for commit c84a4be.

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved alert classification to distinguish actual performance regressions from other active alerts.
- Enhanced exit code handling to prevent false workflow failures when only non-regression alerts are present.
Tests
- Expanded test coverage for edge cases in alert filtering and regression detection.
- Added tests for retry behavior and exit code normalization logic.

coderabbitai · 2026-06-09T00:34:20Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 24dc8cb6-f570-40d0-b9de-723c5f394e92

📥 Commits

Reviewing files that changed from the base of the PR and between 25b94cc and c84a4be.

📒 Files selected for processing (4)

benchmarks/lib/bencher_report.rb
benchmarks/spec/bencher_report_spec.rb
benchmarks/spec/track_benchmarks_spec.rb
benchmarks/track_benchmarks.rb

Walkthrough

BencherReport now distinguishes current regression alerts from filtered/stale alerts during initialization by partitioning active alerts into separate collections. A new normalized_bencher_exit_code function converts non-zero Bencher exits to success (0) when only stale/filtered alerts remain. Test fixtures are refactored to use consistent helpers and expanded with new cases for exit-code and retry-handling behavior.
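The partitioning described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `SketchAlert`, `SketchBoundary`, and `ReportSketch` are hypothetical stand-ins, and the assumption is that each boundary carries the current metric value plus optional lower and upper limits.

```ruby
# Hypothetical, simplified stand-ins for illustration; not the real BencherReport internals.
SketchAlert = Struct.new(:benchmark, :measure, :limit, keyword_init: true)
SketchBoundary = Struct.new(:value, :lower_limit, :upper_limit, keyword_init: true)

class ReportSketch
  attr_reader :alerts, :filtered_alerts

  # boundaries: { benchmark name => { measure name => SketchBoundary } }
  def initialize(active_alerts, boundaries)
    @boundaries = boundaries
    # Alerts still backed by a crossed boundary go left; stale ones go right.
    @alerts, @filtered_alerts = active_alerts.partition { |alert| current_regression_alert?(alert) }
  end

  def regression?
    !@alerts.empty?
  end

  def filtered_alert?
    !@filtered_alerts.empty?
  end

  private

  def current_regression_alert?(alert)
    boundary = @boundaries.dig(alert.benchmark, alert.measure)
    return true if boundary.nil? # fail-safe: nothing to compare against

    case alert.limit
    when "upper" then boundary.upper_limit.nil? || boundary.value > boundary.upper_limit
    when "lower" then boundary.lower_limit.nil? || boundary.value < boundary.lower_limit
    else true # fail-safe: unknown limit side
    end
  end
end
```

In this sketch an alert whose metric is back within its limit lands in `filtered_alerts`, while anything that cannot be matched stays in `alerts`, mirroring the fail-safe stance the walkthrough describes.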

Changes

Bencher Alert Classification and Exit Normalization

Layer / File(s)	Summary
Alert Classification Core Implementation `benchmarks/lib/bencher_report.rb`	`BencherReport` partitions active alerts into `@alerts` (current regressions via `current_regression_alert?`) and `@filtered_alerts` (other active alerts). Adds `filtered_alert?` predicate and refactors `regression?` and `perf_links_unavailable?` as Ruby shorthand. Alert classification considers inferred direction from the alert limit and compares against configured boundaries, treating missing data as potential regressions.
Alert Classification Test Coverage `benchmarks/spec/bencher_report_spec.rb`	Comprehensive test helpers and cases cover regression classification: active alerts ignored when metric is not a regression, malformed/missing alert data handling (missing benchmark name, missing boundary), measure-less alert behavior depending on benchmark regression presence, and `failed_pct` regressions detected when crossing upper boundary. Tests validate both `regression?` and `filtered_alert?` outcomes.
Exit Code Normalization Implementation `benchmarks/track_benchmarks.rb`	New `normalized_bencher_exit_code(exit_code, report)` converts non-zero Bencher exits to 0 when the report contains only stale/filtered active alerts and no current regression, emitting a GitHub notice. Applied immediately after `--start-point-hash` retry logic and before confirmation/candidate flows.
Test Fixtures Modernization and Integration Tests `benchmarks/spec/track_benchmarks_spec.rb`	Introduces JSON-building helpers (`result`, `rps_measure`, `p50_measure`, `active_alert`) to replace hardcoded alert/result hashes. Refactors `regressed_benchmark_names`, `regressed_alert_pairs`, and `confirmation_outcome` tests using consistent helper-built structures. Adds retry-handling tests: one preserves original alert exit code to enable `retry_without_start_point_hash?`, and another verifies `normalized_bencher_exit_code` converts stale-alert exits to 0 with notice emission.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

performance-regression issues are being posted when no significant regressions are detected #3795: Changes address false-positive performance-regression issue posting by filtering out stale/non-regression alerts and normalizing exit codes to treat them as non-fatal operational results.

Possibly related PRs

shakacode/react_on_rails#3627: Modified the same perf_links_unavailable? method in benchmarks/lib/bencher_report.rb (added there and now re-expressed in shorthand).
shakacode/react_on_rails#3586: Introduced earlier regression classification logic in BencherReport that this PR extends with alert filtering and boundary-based regression detection.
shakacode/react_on_rails#3810: Refines the confirmation-stage workflow that depends on the BencherReport current-regression classification and exit-code normalization introduced here.

Suggested labels

enhancement, review-needed, full-ci, benchmark, P2

Suggested reviewers

alexeyr-ci2

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 26.32% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Filter stale Bencher alerts before reporting' is specific and clearly describes the main change: filtering of stale/non-regression alerts before they are treated as regressions in reporting.
Linked Issues check	✅ Passed	The PR implements the core objective from issue `#3795` by filtering non-regression alerts (stale/unmatchable) so they no longer cause false-positive regression issues, matching the requirement to prevent spurious performance-regression posts.
Out of Scope Changes check	✅ Passed	All changes are directly related to filtering Bencher alerts and normalizing exit codes to prevent false-positive regression reporting, with no unrelated modifications detected.


claude · 2026-06-09T00:37:39Z

Code Review — Filter stale Bencher alerts before reporting

Overview

This PR correctly addresses a false-positive regression problem: when Bencher carries an active alert from a previous run, the old code treated it as a current regression even when the current metric is healthy. The fix partitions alerts into current regressions vs. stale ones and normalizes the Bencher exit code after retry handling.

The fail-safe philosophy (missing benchmark name, invalid direction, no boundary on the expected side → treat as regression) is the right default for a reporting system where a silent miss is worse than a noisy false-positive.

What's solid

Fail-safe coverage — every uncertain case (malformed alert, unknown limit string, no matching boundary) keeps the alert live rather than silently dropping it. This is the correct default for regression detection.
Retry ordering — normalized_bencher_exit_code is called after the start-point-hash retry, which is tested explicitly. A stale alert won't mask a legitimate hash-miss retry.
Test breadth — all the interesting cases (stale alert filtered, measure-less alert, opposite-side improvement ignored, fail-safe passthrough, exit-code normalization) have coverage.
Boundary mirroring — the Boundary#significance logic correctly uses the symmetric t-test interval so one-sided thresholds still classify both directions.

Issues

1. `.values` can yield duplicate `Boundary` objects in the measure-less path

index_measure stores each boundary under both its slug key and its name key (when they normalize differently), so @boundaries.fetch(benchmark, {}).values can return the same Boundary instance twice. For any? this only causes a redundant evaluation, but it is subtle enough to warrant .uniq:

matching_boundaries = @boundaries.fetch(alert.benchmark, {}).values.uniq.select do |boundary|
  threshold_side?(boundary, direction)
end
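A small demonstration of the duplicate-value effect, assuming a hypothetical index that stores one boundary object under two keys (the key names and `DupBoundary` are illustrative, not taken from the PR):

```ruby
# Hypothetical boundary type; the real Boundary class has more fields.
DupBoundary = Struct.new(:measure, :lower_limit)

rps = DupBoundary.new("rps", 90.0)

# The same object indexed under a slug-style key and a name-style key.
index = { "rps" => rps, "RPS" => rps }

values = index.values        # the same instance appears twice
unique = index.values.uniq   # collapsed to a single instance

puts values.size
puts unique.size
```

Because `any?` only asks whether at least one element matches, both collections give the same answer; `.uniq` just avoids evaluating the same boundary twice.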

2. `filtered_alert?` naming is ambiguous

filtered_alert? reads naturally as "is this alert filtered?" (instance predicate) rather than "does this report contain filtered alerts?". Since callers use it to decide whether to suppress a non-zero exit, a name that conveys the cardinality is clearer:

def stale_alerts? = !@filtered_alerts.empty?
# or
def filtered_alerts? = !@filtered_alerts.empty?

3. Double-negative condition in `normalized_bencher_exit_code`

return exit_code unless exit_code != 0 && ... is tricky to parse quickly. Flipping to a positive guard reads more naturally:

def normalized_bencher_exit_code(exit_code, report)
  return exit_code if exit_code.zero? || !report&.filtered_alert? || regression?(report)

  Github.notice("Bencher reported only stale active alert(s); no current boundary-backed regression remains.")
  0
end

Minor observations

The rubocop:disable Metrics/CyclomaticComplexity wrapper is a signal worth noting: current_regression_alert? could be decomposed into two private helpers — one for the measure-less path and one for the measure-specific path — which would satisfy RuboCop without a disable and make the two branches individually testable. Not blocking, but worth a follow-up.
threshold_side? returns the limit value (a Float or nil) rather than an explicit boolean. In Ruby this works correctly since 0.0 is truthy, but an explicit !boundary.lower_limit.nil? form would make the intent unmistakable for future readers.
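To illustrate the truthiness point, here is a hedged sketch with hypothetical helper names (`threshold_side_value` and `threshold_side_boolean?` are not the PR's methods, and `LimitBoundary` is an assumed shape). It shows that a `0.0` limit still passes a `select` in Ruby, and what the explicit boolean form would look like.

```ruby
# Hypothetical boundary shape for illustration only.
LimitBoundary = Struct.new(:lower_limit, :upper_limit, keyword_init: true)

# Returns the limit itself (Float or nil), relying on Ruby truthiness.
def threshold_side_value(boundary, direction)
  direction == :lower ? boundary.lower_limit : boundary.upper_limit
end

# Explicit boolean form: true only when a limit exists on that side.
def threshold_side_boolean?(boundary, direction)
  !threshold_side_value(boundary, direction).nil?
end

zero_limit = LimitBoundary.new(lower_limit: 0.0, upper_limit: nil)

# 0.0 is truthy in Ruby, so both forms keep this boundary on the lower side.
kept_by_value = [zero_limit].select { |b| threshold_side_value(b, :lower) }
kept_by_boolean = [zero_limit].select { |b| threshold_side_boolean?(b, :lower) }
```

Both forms select the same boundaries; the boolean version only makes the intent visible at the call site.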

Verdict

The logic is correct, the fail-safe decisions are sound, and the test coverage is thorough. The issues above are all minor: .uniq avoids subtle redundant work, renaming filtered_alert? improves readability at every call site, and straightening the guard condition makes the normalizer easier to audit. None block merge, but the renaming in particular will show up in every caller going forward, so it's worth fixing before this lands.

greptile-apps · 2026-06-09T00:39:13Z

Greptile Summary

This PR adds stale-alert filtering to the Bencher benchmark regression reporter: active alerts from Bencher are now cross-referenced against the current report's boundaries before being treated as regressions, and a normalized_bencher_exit_code helper converts a stale-only non-zero exit to success after retry and operational checks have run.

BencherReport#current_regression_alert? partitions active alerts into current regressions (@alerts) and filtered/stale ones (@filtered_alerts), with fail-safe return true for any alert that can't be matched (no benchmark name, unknown direction, or missing boundary), so report-shape drift still surfaces loudly.
normalized_bencher_exit_code in track_benchmarks.rb is placed after the start-point-hash retry, so a stale-only exit does not suppress an otherwise retriable "Head Version not found" condition.
Specs cover stale alerts, measure-less alerts, fail-safe cases, opposite-side improvements, and the new stale-only exit normalization path; existing fixtures were updated to include matching boundary results.
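The ordering concern in the second point can be sketched as below. This is an assumption-laden illustration: `SketchRun`, `retriable_hash_miss?`, `normalize_stale_only`, and `final_exit_code` are hypothetical names, and the `stale_only` flag stands in for "the report has filtered alerts and no current regression".

```ruby
# Hypothetical result of one Bencher invocation.
SketchRun = Struct.new(:exit_code, :stderr, :stale_only, keyword_init: true)

# Retry condition as described in the review: non-zero exit plus a hash miss.
def retriable_hash_miss?(result)
  result.exit_code != 0 && result.stderr.include?("Head Version not found")
end

# Stale-only non-zero exits become success; everything else passes through.
def normalize_stale_only(result)
  return 0 if result.exit_code != 0 && result.stale_only

  result.exit_code
end

# The raw exit code drives the retry decision; normalization happens last.
def final_exit_code(run_bencher)
  result = run_bencher.call(with_start_point_hash: true)
  result = run_bencher.call(with_start_point_hash: false) if retriable_hash_miss?(result)
  normalize_stale_only(result)
end
```

If normalization ran before the retry check in this sketch, a first run that was both stale-only and a hash miss would be rewritten to 0 and the retry would never fire.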

Confidence Score: 4/5

Safe to merge; the filtering logic is well-guarded with fail-safe defaults and is thoroughly covered by specs.

The core current_regression_alert? method uses explicit fail-safes so any unrecognised alert shape keeps the alert active rather than silently dropping a real regression. The placement of normalized_bencher_exit_code after the retry block is correct and directly tested. All findings are style/clarity notes with no impact on runtime behaviour.

No files require special attention; the two style suggestions in bencher_report.rb and track_benchmarks.rb are non-blocking.

Important Files Changed

Filename	Overview
benchmarks/lib/bencher_report.rb	Adds `current_regression_alert?` to partition active alerts into current vs stale. Logic is sound with good fail-safe defaults; minor style note on `threshold_side?` returning a numeric value used as boolean.
benchmarks/track_benchmarks.rb	Adds `normalized_bencher_exit_code` and calls it after retry handling. Guard condition uses `return … unless` with `!= 0 && … && !regression?`; semantics are correct but slightly dense to read.
benchmarks/spec/bencher_report_spec.rb	Comprehensive new specs for stale-alert filtering, measure-less alerts, fail-safe cases, and opposite-side improvements. Coverage is thorough and fixtures are well-constructed.
benchmarks/spec/track_benchmarks_spec.rb	Updated fixtures now include matching `results` boundaries so the new filtering logic resolves correctly; adds specs for stale-only exit normalization and the preserved retry ordering.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Bencher CLI exits with JSON report] --> B{Parse JSON\nBencherReport.parse}
    B -->|FormatError| C[exit 1 — fail loud]
    B -->|OK| D[index_boundaries from results]
    D --> E[parse_alerts → partition via current_regression_alert?]
    E --> F["@alerts (current regressions)"]
    E --> G["@filtered_alerts (stale alerts)"]
    F --> H{regression?}
    H -->|true| J[Real regression — keep exit code]
    H -->|false| K{retry_without_start_point_hash?}
    K -->|exit≠0 + Head Version not found| L[Retry without --start-point-hash]
    L --> M[Re-run Bencher → new report]
    M --> N[normalized_bencher_exit_code]
    K -->|no| N
    N -->|exit≠0 + filtered_alert? + !regression?| O[emit notice → return 0]
    N -->|otherwise| P[Return original exit code]
    J --> Q[Downstream regression reporting]
    O --> Q
    P --> Q

Reviews (1): Last reviewed commit: "Filter stale Bencher alerts before repor..."

github-actions · 2026-06-09T00:40:55Z

Pro Node Renderer Benchmark Summary

Benchmark	RPS	p50(ms)	p90(ms)	Status
Pro Node Renderer: simple_eval (non-RSC)	2180.64 ▼1.1% (2205.6)	3.98 ▼3.4% (4.12)	6.14 ▲17.4% (5.23)	200=65433
Pro Node Renderer: react_ssr (non-RSC)	1944.53 ▲0.3% (1939.63)	4.57 ▼2.6% (4.69)	6.03 ▲0.1% (6.02)	200=58346

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline
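The change figures in these summary tables are consistent with a plain percentage change against the baseline, rounded to one decimal place. A minimal sketch under that assumption follows; `percent_change` and `format_change` are hypothetical helpers, not the benchmark scripts' own functions.

```ruby
# Percentage change of current vs baseline, rounded to one decimal place.
def percent_change(current, baseline)
  ((current - baseline) / baseline * 100).round(1)
end

# Renders the change with the arrow convention used in the legend.
def format_change(current, baseline)
  pct = percent_change(current, baseline)
  return "0.0%" if pct.zero?

  arrow = pct.positive? ? "▲" : "▼"
  "#{arrow}#{pct.abs}%"
end

puts format_change(2180.64, 2205.6) # simple_eval RPS row
puts format_change(6.14, 5.23)      # simple_eval p90 row
```

The arrow only records the direction of change; whether a change counts as a regression depends on the measure (lower RPS is worse, lower latency is better).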

github-actions · 2026-06-09T00:58:22Z

Core Benchmark Summary

Benchmark	RPS	p50(ms)	p90(ms)	Status
/: Core	3.54 ▲11.7% (3.17)	2329.06 ▼1.6% (2367.87)	2878.36 ▼8.5% (3146.48)	200=113
/client_side_hello_world: Core	780.07 ▲17.8% (662.39)	8.61 ▼5.2% (9.09)	16.68 ▼5.9% (17.72)	200=23568
/client_side_rescript_hello_world: Core	734.49 ▲10.6% (664.26)	8.49 ▼10.0% (9.43)	17.51 ▼1.7% (17.81)	200=22192
/client_side_hello_world_shared_store: Core	682.39 ▲6.9% (638.6)	8.9 ▼8.0% (9.68)	19.04 ▼2.1% (19.45)	200=20618
/client_side_hello_world_shared_store_controller: Core	492.75 ▼23.0% (639.79)	11.62 ▲21.2% (9.59)	16.19 ▼21.1% (20.52)	200=14888
/client_side_hello_world_shared_store_defer: Core	707.05 ▲8.4% (652.39)	4.34 ▼55.2% (9.69)	16.11 ▼20.7% (20.3)	200=21508
/server_side_hello_world_shared_store: Core	12.59 ▼7.9% (13.67)	468.52 ▼17.7% (568.98)	818.07 ▲8.9% (751.32)	200=386
/server_side_hello_world_shared_store_controller: Core	15.1 ▲10.4% (13.67)	598.98 ▲9.1% (548.97)	745.53 ▲0.7% (740.57)	200=461
/server_side_hello_world_shared_store_defer: Core	14.87 ▲7.8% (13.8)	212.54 ▼62.5% (566.64)	691.83 ▼9.3% (762.95)	200=461
/server_side_hello_world: Core	30.43 ▲9.8% (27.72)	277.6 ▲1.8% (272.81)	311.37 ▼10.3% (347.25)	200=928
/server_side_hello_world_hooks: Core	29.52 ▲6.4% (27.74)	305.96 ▲10.0% (278.2)	338.31 ▼3.8% (351.77)	200=896
/server_side_hello_world_props: Core	28.69 ▲3.1% (27.83)	294.91 ▲4.6% (282.01)	359.84 ▲3.9% (346.46)	200=873
/client_side_log_throw: Core	711.51 ▲7.8% (660.18)	8.52 ▼11.8% (9.66)	21.13 ▲13.6% (18.6)	200=21496
/server_side_log_throw: Core	29.69 ▲10.1% (26.98)	280.24 ▼3.3% (289.76)	324.19 ▼8.1% (352.89)	200=905
/server_side_log_throw_plain_js: Core	30.33 ▲10.3% (27.5)	280.48 ▲2.7% (273.02)	334.15 ▼4.7% (350.47)	200=921
/server_side_log_throw_raise: Core	25.33 ▼8.1% (27.55)	232.31 ▼17.0% (279.96)	344.76 ▼0.4% (345.99)	3xx=770
/server_side_log_throw_raise_invoker: Core	902.58 ▲15.1% (784.42)	7.68 ▼8.7% (8.41)	14.07 ▼10.1% (15.66)	200=27267
/server_side_hello_world_es5: Core	30.99 ▲15.9% (26.75)	276.19 ▼0.7% (278.02)	316.22 ▼9.9% (351.02)	200=942
/server_side_redux_app: Core	29.68 ▲10.3% (26.92)	288.46 ▼2.9% (296.98)	332.14 ▼8.1% (361.5)	200=903
/server_side_hello_world_with_options: Core	31.08 ▲12.1% (27.73)	290.64 ▲2.6% (283.15)	316.4 ▼9.3% (348.72)	200=941
/server_side_redux_app_cached: Core	510.12 ▼22.5% (658.3)	11.31 ▲12.0% (10.1)	17.69 ▼3.2% (18.28)	200=15415
/client_side_manual_render: Core	747.13 ▲9.2% (684.28)	8.25 ▼11.2% (9.29)	19.64 ▲6.4% (18.46)	200=22573
/render_js: Core	32.21 ▲9.9% (29.3)	259.75 ▼1.3% (263.06)	298.29 ▼9.1% (328.12)	200=982
/react_router: Core	28.28 ▲8.9% (25.96)	299.72 ▼1.8% (305.27)	342.8 ▼7.3% (369.72)	200=859
/pure_component: Core	23.79 ▼15.2% (28.07)	250.42 ▼11.4% (282.55)	337.34 ▼3.1% (348.1)	200=726
/css_modules_images_fonts_example: Core	21.7 ▼18.9% (26.77)	282.95 ▲4.2% (271.66)	379.16 ▲8.3% (349.99)	200=661
/turbolinks_cache_disabled: Core	773.62 ▲13.0% (684.68)	8.38 ▼9.9% (9.3)	16.95 ▼8.9% (18.61)	200=23373
/rendered_html: Core	30.43 ▲9.8% (27.72)	282.63 0.0% (282.64)	327.33 ▼5.8% (347.6)	200=923
/xhr_refresh: Core	15.28 ▲7.4% (14.22)	558.0 ▲4.4% (534.26)	737.8 ▲1.4% (727.37)	200=470
/react_helmet: Core	30.08 ▲11.1% (27.07)	300.95 ▲5.8% (284.37)	325.61 ▼8.1% (354.33)	200=912
/broken_app: Core	29.86 ▲9.5% (27.28)	303.61 ▲10.7% (274.18)	330.06 ▼5.1% (347.95)	200=905
/image_example: Core	29.99 ▲9.7% (27.35)	278.49 ▼3.0% (286.97)	319.84 ▼8.6% (349.84)	200=915
/turbo_frame_tag_hello_world: Core	822.47 ▲11.0% (740.66)	8.2 ▼5.3% (8.66)	15.77 ▼5.1% (16.62)	200=24849
/manual_render_test: Core	501.92 ▼26.2% (679.81)	11.36 ▲20.6% (9.42)	20.58 ▲10.3% (18.66)	200=15164

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

github-actions · 2026-06-09T01:01:42Z

Pro (shard 1/2) Benchmark Summary

Benchmark	RPS	p50(ms)	p90(ms)	Status
/: Pro	174.26 ▼3.0% (179.73)	51.26 ▲19.4% (42.93)	64.71 ▲4.9% (61.66)	200=5235
/error_scenarios_hub: Pro	360.34 ▲2.0% (353.23)	21.1 ▲5.8% (19.94)	31.99 ▲4.5% (30.63)	200=10891
/ssr_async_error: Pro	341.28 ▲1.1% (337.46)	22.8 ▲8.8% (20.95)	34.38 ▼2.7% (35.34)	200=10310
/ssr_async_prop_error: Pro	311.56 ▼4.1% (324.77)	24.34 ▲14.1% (21.33)	37.54 ▲4.5% (35.93)	200=9417
/non_existing_react_component: Pro	347.81 ▼0.3% (348.87)	22.53 ▲10.6% (20.37)	36.37 ▲12.4% (32.36)	200=10509
/non_existing_rsc_payload: Pro	363.23 ▼0.2% (363.93)	21.26 ▲7.0% (19.88)	34.78 ▲3.8% (33.49)	200=10976
/cached_react_helmet: Pro	374.41 ▲1.8% (367.77)	21.04 ▲10.9% (18.97)	32.64 ▲6.3% (30.69)	200=11310
/cached_redux_component: Pro	369.93 ▼3.3% (382.59)	23.6 ▲21.1% (19.49)	31.63 ▲2.7% (30.79)	200=11175
/lazy_apollo_graphql: Pro	145.15 ▼3.0% (149.69)	45.05 ▼7.4% (48.63)	71.02 ▼6.6% (76.01)	200=4389
/redis_receiver: Pro	93.06 ▲8.0% (86.18)	70.7 ▲3.2% (68.53)	134.36 ▼11.6% (152.02)	200=2783,3xx=32
/stream_shell_error_demo: Pro	328.79 ▼0.8% (331.44)	23.71 ▲14.9% (20.64)	35.5 ▲2.5% (34.64)	200=9936
/test_incremental_rendering: Pro	321.42 ▼5.1% (338.64)	23.63 ▲10.9% (21.32)	36.73 ▲6.4% (34.52)	200=9715
/rsc_posts_page_over_redis: Pro	94.56 ▼2.6% (97.03)	78.12 ▲11.0% (70.39)	127.86 ▲11.3% (114.91)	200=2862
/async_on_server_sync_on_client: Pro	253.95 ▼20.9% (320.9)	22.97 ▲1.5% (22.63)	74.25 ▲86.4% (39.84)	200=7674
/server_router: Pro	336.19 ▲1.0% (332.95)	23.01 ▲7.0% (21.5)	34.66 ▼2.1% (35.39)	200=10157
/unwrapped_rsc_route_client_render: Pro	364.4 ▼4.0% (379.45)	15.41 ▼19.1% (19.06)	28.66 ▼4.6% (30.03)	200=11083
/async_render_function_returns_string: Pro	263.16 ▼22.0% (337.34)	21.47 ▲1.8% (21.1)	27.24 ▼18.6% (33.46)	200=8004
/async_components_demo: Pro	210.68 ▲3.7% (203.14)	42.21 ▲14.5% (36.85)	53.4 ▲5.0% (50.88)	200=6337
/stream_native_metadata: Pro	339.6 ▲0.9% (336.42)	22.89 ▲9.4% (20.93)	34.29 ▼5.1% (36.14)	200=10268
/rsc_native_metadata: Pro	333.03 ▲3.1% (323.14)	22.98 ▲3.6% (22.19)	39.19 ▲3.4% (37.91)	200=10064
/react_intl_rsc_demo: Pro	321.1 ▲3.4% (310.5)	17.31 ▼15.9% (20.57)	33.25 ▼31.2% (48.31)	200=9706
/client_side_hello_world_shared_store: Pro	327.79 ▼3.1% (338.11)	23.18 ▲9.5% (21.16)	37.94 ▲15.9% (32.73)	200=9905
/client_side_hello_world_shared_store_defer: Pro	354.38 ▲6.3% (333.38)	24.47 ▲14.5% (21.37)	33.37 ▼3.5% (34.56)	200=10705
/server_side_hello_world_shared_store_controller: Pro	232.71 ▼18.2% (284.51)	24.32 ▼8.6% (26.62)	30.07 ▼27.1% (41.26)	200=7084
/server_side_hello_world: Pro	332.02 ▼0.2% (332.58)	23.52 ▲6.0% (22.19)	34.63 ▼0.3% (34.74)	200=10031
/client_side_log_throw: Pro	345.16 ▼6.8% (370.47)	22.4 ▲15.3% (19.42)	33.26 ▲6.7% (31.19)	200=10430
/server_side_log_throw_plain_js: Pro	293.44 ▼21.9% (375.66)	19.43 ▲0.6% (19.32)	24.91 ▼21.8% (31.84)	200=8870
/server_side_log_throw_raise_invoker: Pro	411.91 ▲0.9% (408.09)	13.44 ▼21.0% (17.0)	25.31 ▼12.1% (28.78)	200=12527
/server_side_redux_app: Pro	273.83 ▼17.5% (331.86)	20.87 ▼6.8% (22.38)	26.24 ▼26.9% (35.89)	200=8277
/server_side_redux_app_cached: Pro	354.87 ▼3.7% (368.48)	15.82 ▼20.4% (19.86)	29.94 ▼4.4% (31.31)	200=10730
/render_js: Pro	309.65 ▼17.7% (376.28)	18.71 ▼4.4% (19.57)	50.38 ▲57.2% (32.04)	200=9357
/pure_component: Pro	349.18 ▲4.7% (333.43)	22.21 ▲4.2% (21.32)	32.8 ▼7.1% (35.31)	200=10551
/turbolinks_cache_disabled: Pro	355.25 ▼2.7% (364.99)	21.78 ▲9.7% (19.86)	32.01 ▼1.7% (32.55)	200=10735
/xhr_refresh: Pro	264.13 ▼8.2% (287.86)	29.76 ▲16.8% (25.48)	42.62 ▲8.7% (39.2)	200=7984
/broken_app: Pro	256.88 ▼24.2% (338.93)	22.67 ▲5.3% (21.53)	52.47 ▲49.5% (35.11)	200=7764
/server_render_with_timeout: Pro	305.7 ▼8.2% (332.88)	28.89 ▲34.9% (21.42)	38.16 ▲9.2% (34.96)	200=9178

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

github-actions · 2026-06-09T01:01:49Z

Pro (shard 2/2) Benchmark Summary

Benchmark	RPS	p50(ms)	p90(ms)	Status
/empty: Pro	1284.01 ▲2.9% (1247.45)	4.06 ▼30.9% (5.87)	8.48 ▼5.5% (8.98)	200=39046
/ssr_shell_error: Pro	352.17 ▲3.4% (340.54)	22.02 ▲0.2% (21.98)	35.68 ▲2.2% (34.91)	200=10642
/ssr_sync_error: Pro	356.93 ▲6.2% (336.18)	21.85 ▲5.9% (20.63)	32.3 ▼6.2% (34.42)	200=10788
/rsc_component_error: Pro	342.57 ▲2.5% (334.1)	22.64 ▲6.2% (21.32)	33.65 ▼6.0% (35.81)	200=10351
/non_existing_stream_react_component: Pro	320.38 ▼6.5% (342.59)	17.91 ▼12.5% (20.48)	21.69 ▼37.3% (34.6)	200=9683
/server_side_redux_app_cached: Pro	416.82 ▲13.1% (368.48)	19.09 ▼3.9% (19.86)	27.99 ▼10.6% (31.31)	200=12593
/loadable: Pro	356.51 ▲16.4% (306.28)	15.52 ▼34.0% (23.51)	30.28 ▼15.7% (35.9)	200=10779
/apollo_graphql: Pro	151.66 ▲8.2% (140.2)	46.88 ▼6.2% (49.96)	70.66 ▼15.4% (83.5)	200=4587
/console_logs_in_async_server: Pro	3.56 ▲11.1% (3.2)	2119.59 ▼0.1% (2121.52)	2154.04 ▼9.8% (2386.88)	200=122
/stream_error_demo: Pro	367.34 ▲12.7% (326.01)	14.9 ▼28.6% (20.88)	28.55 ▼19.2% (35.33)	200=11174
/stream_async_components: Pro	368.95 ▲12.1% (329.01)	21.1 ▼3.3% (21.83)	33.43 ▼8.3% (36.45)	200=11150
/rsc_posts_page_over_http: Pro	309.93 ▼5.5% (328.04)	18.65 ▼16.4% (22.3)	31.6 ▼14.0% (36.73)	200=9377
/rsc_echo_props: Pro	255.01 ▲14.2% (223.36)	31.7 ▼2.5% (32.51)	48.1 ▼6.5% (51.47)	200=7707
/async_on_server_sync_on_client_client_render: Pro	398.01 ▲13.6% (350.33)	19.31 ▼7.4% (20.84)	31.65 ▼4.8% (33.26)	200=12027
/server_router_client_render: Pro	396.44 ▲14.2% (347.24)	19.15 ▼6.1% (20.39)	28.56 ▼12.3% (32.57)	200=11981
/unwrapped_rsc_route_stream_render: Pro	383.45 ▲14.1% (336.14)	19.66 ▼5.8% (20.86)	34.07 ▼6.1% (36.29)	200=11590
/async_render_function_returns_component: Pro	386.62 ▲15.3% (335.23)	20.43 ▼0.3% (20.49)	32.0 ▼5.6% (33.92)	200=11684
/native_metadata: Pro	375.0 ▲11.2% (337.23)	21.22 ▼3.6% (22.0)	31.76 ▼5.0% (33.44)	200=11333
/hybrid_metadata_streaming: Pro	317.99 ▼4.6% (333.44)	18.2 ▼16.4% (21.77)	61.99 ▲74.6% (35.5)	200=9608
/cache_demo: Pro	364.82 ▲12.8% (323.4)	20.98 ▼6.2% (22.36)	34.82 ▼6.6% (37.3)	200=11024
/client_side_hello_world: Pro	401.02 ▲9.8% (365.12)	19.22 ▼0.3% (19.28)	28.12 ▼10.7% (31.49)	200=12117
/client_side_hello_world_shared_store_controller: Pro	379.66 ▲11.6% (340.31)	16.68 ▼19.5% (20.72)	27.38 ▼17.9% (33.36)	200=11473
/server_side_hello_world_shared_store: Pro	297.09 ▲3.1% (288.11)	26.43 ▲1.7% (25.99)	37.53 ▼4.1% (39.14)	200=8984
/server_side_hello_world_shared_store_defer: Pro	290.82 ▲1.5% (286.4)	27.55 ▲8.6% (25.37)	41.95 ▲7.5% (39.04)	200=8789
/server_side_hello_world_hooks: Pro	376.81 ▲7.3% (351.19)	20.24 ▼3.8% (21.04)	34.12 ▼4.4% (35.68)	200=11386
/server_side_log_throw: Pro	370.7 ▲8.9% (340.55)	21.18 ▼1.4% (21.49)	30.23 ▼14.4% (35.33)	200=11203
/server_side_log_throw_raise: Pro	727.62 ▲12.0% (649.73)	3.78 ▼65.3% (10.89)	16.96 ▼9.9% (18.83)	3xx=22131
/server_side_hello_world_es5: Pro	369.7 ▲9.0% (339.28)	15.22 ▼28.1% (21.17)	28.14 ▼22.6% (36.35)	200=11246
/server_side_hello_world_with_options: Pro	380.94 ▲13.2% (336.47)	20.58 ▼5.2% (21.7)	30.6 ▼12.1% (34.83)	200=11513
/client_side_manual_render: Pro	402.06 ▲8.2% (371.65)	15.66 ▼19.6% (19.48)	26.05 ▼18.4% (31.91)	200=12151
/react_router: Pro	301.48 ▼23.7% (395.12)	19.44 ▲10.7% (17.56)	22.47 ▼22.9% (29.14)	200=9113
/css_modules_images_fonts_example: Pro	372.76 ▲8.7% (342.92)	21.11 ▼1.9% (21.51)	33.36 ▲0.3% (33.24)	200=11263
/rendered_html: Pro	381.14 ▲10.2% (345.91)	20.64 ▼0.3% (20.69)	32.9 ▼2.8% (33.86)	200=11516
/react_helmet: Pro	371.41 ▲11.3% (333.74)	21.33 ▼0.9% (21.51)	33.03 ▼6.5% (35.32)	200=11222
/image_example: Pro	376.45 ▲10.5% (340.64)	21.09 ▼5.7% (22.36)	31.64 ▼7.2% (34.08)	200=11373
/posts_page: Pro	257.98 ▲5.2% (245.22)	30.34 ▼50.6% (61.4)	42.16 ▼51.4% (86.72)	200=7798

▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline

justin808 · 2026-06-09T01:54:11Z

Worker B review-thread triage for c84a4be36a94906ada15115dfabde33df51145ad.

Unresolved review threads triaged:

Thread	Reviewer	Triage	Decision
r3377147372	Claude	Optional	`filtered_alert?` naming is readability-only; no behavior or merge risk found. Waived to avoid churn.
r3377147524	Claude	Optional	Duplicate boundary values can repeat the same `Boundary` in the measure-less path, but `any?` is idempotent and side-effect-free. No correctness blocker. Waived.
r3377147839	Claude	Optional	`normalized_bencher_exit_code` guard can be written more positively, but current condition is correct and covered. Waived.
r3377151207	Greptile	Optional	`threshold_side?` returns numeric-or-nil for truthiness. Current use is correct, including `0.0` truthiness. Waived.
r3377151231	Greptile	Noise duplicate	Same positive-guard readability point as Claude r3377147839. No code change.
r3377151283	Greptile	Noise duplicate	Same duplicate-boundary point as Claude r3377147524. No code change.

Validation run from the PR head:

script/ci-changes-detector origin/main -> Benchmark scripts; recommends Lint (Ruby + JS).
bundle exec rubocop benchmarks/lib/bencher_report.rb benchmarks/track_benchmarks.rb benchmarks/spec/bencher_report_spec.rb benchmarks/spec/track_benchmarks_spec.rb -> 4 files inspected, no offenses. RuboCop printed existing rubocop-ast deprecation warnings only.
bundle exec rspec benchmarks/spec/bencher_report_spec.rb benchmarks/spec/track_benchmarks_spec.rb -> 79 examples, 0 failures.
git diff --check -> passed.

Live CI/review state observed:

Release mode: strict-rc, from the release gate issue "Release gate: react_on_rails 17.0.0" (#3823). No auto-merge from this lane.
PR state: open, not draft, mergeStateStatus: CLEAN, head c84a4be36a94906ada15115dfabde33df51145ad.
Required checks: none reported by gh pr checks --required.
Visible checks: complete; claude-review, Greptile Review, Cursor Bugbot, CodeRabbit, CodeQL/build/analyze checks, and Core/Pro benchmark suites passed; path-skipped jobs are consistent with the change detector and benchmark selection.

Full-CI decision: not requested. This PR is benchmark-related, the benchmark label is present, benchmark suites passed, and the live change detector only recommends lint. Full CI would be extra churn unless the coordinator or maintainer wants a stricter final gate.

Worker B lane result: merge-qualified from this lane after waiving optional/noise advisory comments. Coordinator owns final merge sequencing.

…sitives) (#3829)

## Problem

~99% of recent benchmark "performance-regression" alerts on `main` are false positives from a single **orphaned server-side Bencher threshold**.

`p90_latency` (and `p99_latency`) were intentionally dropped from `track_benchmarks.rb` `THRESHOLDS` as too noisy ("tail noise can't meet the 1/20 target"). But Bencher thresholds are persistent server-side objects keyed on (branch, testbed, measure); removing a measure from the CLI `--threshold-*` args stops *updating* the threshold, it never *deletes* it. The p90 metric is still submitted (for the dashboard), so the orphaned p90 threshold keeps evaluating p90 tail noise and firing alerts on nearly every run.

These alerts are doubly bad:

- They file a `performance-regression` issue (exit≠0), and
- They are **invisible in the summary table**: the p90 column has no `:direction`, so it is never flagged 🔴. The filed issue names nothing actionable (this is the #3782 / #3795 "🔴 only appears in the legend" symptom).

### Evidence (public Bencher API, project `react-on-rails-t8a9ncxo`)

- 255 most-recent active alerts: **248 p90-latency, 4 rps, 3 p50-latency**.
- `main` branch: **164 active alerts → 162 p90-latency, 2 rps** (98.8% p90).
- #3782 (`dec4b8c`) filed on exactly two p90 alerts, `/client_side_hello_world: Core` p90 `24.20 > upper 23.61` and `/rendered_html: Core` p90, with no rps/p50/failed_pct alert.
- `main`/`github-actions` carries thresholds for `rps, p50-latency, p90-latency, p99-latency, failed-pct`, two more than the code manages. (p99 is dormant: no p99 metric is submitted.)

## Fix

Thread the measures we actually track (`THRESHOLDS` names) into `BencherReport`. An active alert on any **other** measure is classified as *filtered* rather than a regression. This reuses #3822's existing `filtered_alert?` exit-normalization, so a p90-only run no longer writes a candidate or files an issue.

The fix is fully in-repo and makes the orphaned threshold harmless regardless of its server-side state.

- Tracked measures (`rps`, `p50_latency`, `failed_pct`) are unaffected, including the "hidden" `failed_pct` regressions #3822 added coverage for.
- Measure-less and benchmark-less alerts keep their existing fail-safe (still counted).
- `tracked_measures` defaults to `nil` (track every measure) so non-production callers (`BenchmarkTable`, specs) are unchanged.

## Validation

- `bundle exec rspec benchmarks/spec`: **234 examples, 0 failures** (5 new specs for tracked-measure filtering).
- `bundle exec rubocop benchmarks/...`: no offenses.
- **End-to-end against the real #3782 reports** (fetched from the Bencher API): `regression?` flips `true → false` (`filtered_alert? = true`) for both the Core and Pro p90 alerts, while rps/p50 alerts are untouched.
- `script/ci-changes-detector origin/main` → Benchmark scripts → Lint (Ruby + JS).

## Companion cleanup (separate, manual, not in this PR)

This neutralizes the orphaned threshold in code. To also stop it polluting the Bencher dashboard (162 cosmetic active alerts) and being cloned to every new branch, delete the server-side thresholds (needs `BENCHER_API_TOKEN`):

- `main` p90 `51fb6a47-0083-4e84-a745-60ee42e3bba4`, p99 `6faa7a68-1835-4cd7-96dc-959220737172`
- `master` p90 `d4ad2066-74cb-41f7-93bb-f0885358c56c`, p99 `61bf5ae6-77fa-474f-b6d8-ded630bd0c20`

## Relationship to other work

Completes the noise fix that #3810 (fresh-runner confirmation) and #3822 (stale-alert filtering) started: #3822 only filters alerts whose metric *recovered*; a **live** p90 crossing (the dominant case) still passed through. Substantially removes the p90 tail noise behind #3169 and the issue explosion researched in #3755.

Refs #3755, #3169, #3795

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

> [!NOTE]
> **Medium Risk**
> Changes which Bencher alerts count as regressions and can affect main-push candidate filing and CI exit behavior, though scoped to benchmark reporting with explicit specs and a nil default for other callers.
>
> **Overview**
> Adds optional **`tracked_measures`** to `BencherReport.parse` / `#initialize` so active Bencher alerts on measures the repo no longer tracks (e.g. orphaned server-side **p90_latency** thresholds) are moved to **`filtered_alert?`** instead of **`regression?`**, reusing the existing exit-code normalization for filtered-only runs.
>
> **`track_benchmarks.rb`** now passes `THRESHOLDS.map(&:first)` when parsing the JSON report, so p90-only false positives no longer write regression candidates or file issues the summary table cannot flag. Tracked measures, measure-less fail-safe alerts, and callers that omit `tracked_measures` stay backward compatible.
>
> Specs cover orphaned p90 filtering, slug/name normalization, and a small fix for `BencherReport.new` with a Hash root in a perf-links test.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 7693f64. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>

## Summary by CodeRabbit

## Release Notes

* **Bug Fixes**
  * Fixed false regression alerts for measures no longer tracked in performance benchmarks. The system now filters regression reports to only flag alerts for actively monitored metrics, preventing issue creation based on orphaned or retired threshold measurements that are no longer part of the current tracking configuration.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
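
The tracked-measure filtering described in the #3829 message can be sketched roughly as follows. This is a minimal illustration, not the actual `BencherReport` code: the `Alert` struct, `SKETCH_THRESHOLDS`, `normalize_measure`, and `partition_alerts` are hypothetical names, the benchmark paths are made up, and the slug/name normalization rule is an assumption.

```ruby
require "set"

# Hypothetical, simplified alert record; a real Bencher report carries more fields.
Alert = Struct.new(:benchmark, :measure, keyword_init: true)

# Hypothetical stand-in for THRESHOLDS: pairs whose first element is the measure name.
# The second element is a placeholder, not the repo's real configuration.
SKETCH_THRESHOLDS = [
  ["rps", :placeholder],
  ["p50_latency", :placeholder],
  ["failed_pct", :placeholder]
].freeze

# Assumption: a slug ("p90-latency") and a name ("p90_latency") differ only
# by case and by hyphen versus underscore.
def normalize_measure(measure)
  measure.to_s.downcase.tr("-", "_")
end

# Returns [counted, filtered].
# tracked_measures: nil means "track every measure".
# Measure-less alerts stay counted as a fail-safe.
def partition_alerts(alerts, tracked_measures: nil)
  tracked = tracked_measures&.map { |m| normalize_measure(m) }&.to_set
  alerts.partition do |alert|
    tracked.nil? || alert.measure.nil? || tracked.include?(normalize_measure(alert.measure))
  end
end

alerts = [
  Alert.new(benchmark: "/example_a", measure: "p90-latency"),
  Alert.new(benchmark: "/example_b", measure: "rps"),
  Alert.new(benchmark: "/example_c", measure: nil)
]

counted, filtered = partition_alerts(alerts, tracked_measures: SKETCH_THRESHOLDS.map(&:first))
all_counted, none_filtered = partition_alerts(alerts)
```

With the tracked list supplied, only the `p90-latency` alert lands in `filtered`; with the `nil` default every alert is counted, which matches the backward-compatibility claim for callers that omit `tracked_measures`.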

* origin/main: Add Pro license header checker RSC: stop serializing props into embedded payload cache key (#3800) Make PR batches skip customer-feedback issues (#3826) Name the regressed benchmark+measure pairs in the issue body (#3830) Clarify agent batch policy handoffs (#3824) Filter Bencher alerts to tracked measures (drop orphaned p90 false positives) (#3829) Fix auto-bundled component pack normalization (#3818) Filter stale Bencher alerts before reporting (#3822) Tighten benchmark confirmation workflow permissions (#3819) Add issue evaluation skill (#3816) Confirm benchmark regressions on a fresh runner before filing the main issue (#3810) Define agent scope and accelerated RC auto-merge policy (#3808) Replace custom MockClient with async-http Mock::Endpoint (#3703) Docs: per-request data sharing in RSC with React.cache() (#3769) Pro RSC: share unstable_cache across renderer workers via Redis (#3705) [codex] Add PR batch planning skill (#3792) Docs: document PR batch operational lessons (#3789) Document dummy Redux state indexing rationale (#3781) Pro RSC: avoid caching failed Flight renders (#3775) # Conflicts: # packages/react-on-rails-pro/tests/getReactServerComponent.client.test.ts

…o-rsc-rspack-ci * origin/main: Add Pro license header checker RSC: stop serializing props into embedded payload cache key (#3800) Make PR batches skip customer-feedback issues (#3826) Name the regressed benchmark+measure pairs in the issue body (#3830) Clarify agent batch policy handoffs (#3824) Filter Bencher alerts to tracked measures (drop orphaned p90 false positives) (#3829) Fix auto-bundled component pack normalization (#3818) Filter stale Bencher alerts before reporting (#3822) Tighten benchmark confirmation workflow permissions (#3819) Add issue evaluation skill (#3816) # Conflicts: # react_on_rails_pro/spec/dummy/config/webpack/clientWebpackConfig.js

* origin/main: (23 commits) Enforce Pro license headers in CI and pre-commit (#3821) Add RSC payload route-data helper (#3783) [Pro] Fix React.cache request dedupe in generated RSC configs (#3813) Docs: clarify RuboCop autofix ownership (#3827) Add Pro license header checker RSC: stop serializing props into embedded payload cache key (#3800) Make PR batches skip customer-feedback issues (#3826) Name the regressed benchmark+measure pairs in the issue body (#3830) Clarify agent batch policy handoffs (#3824) Filter Bencher alerts to tracked measures (drop orphaned p90 false positives) (#3829) Fix auto-bundled component pack normalization (#3818) Filter stale Bencher alerts before reporting (#3822) Tighten benchmark confirmation workflow permissions (#3819) Add issue evaluation skill (#3816) Confirm benchmark regressions on a fresh runner before filing the main issue (#3810) Define agent scope and accelerated RC auto-merge policy (#3808) Replace custom MockClient with async-http Mock::Endpoint (#3703) Docs: per-request data sharing in RSC with React.cache() (#3769) Pro RSC: share unstable_cache across renderer workers via Redis (#3705) [codex] Add PR batch planning skill (#3792) ...

…-floor-fix * origin/main: (29 commits) Docs: align pr-batch closeout confidence handoff (#3835) Align adversarial review CI polling guidance (#3794) CI: add Pro RSC rspack runtime gate (#3817) Make RSCRoute refetch failures recoverable in production (#3786) Fix Pro node renderer license headers (#3834) Docs: fix anti-patterns in RSC tutorials (#3801) fix(pro): add RSC peer compatibility gate (#3831) Enforce Pro license headers in CI and pre-commit (#3821) Add RSC payload route-data helper (#3783) [Pro] Fix React.cache request dedupe in generated RSC configs (#3813) Docs: clarify RuboCop autofix ownership (#3827) Add Pro license header checker RSC: stop serializing props into embedded payload cache key (#3800) Make PR batches skip customer-feedback issues (#3826) Name the regressed benchmark+measure pairs in the issue body (#3830) Clarify agent batch policy handoffs (#3824) Filter Bencher alerts to tracked measures (drop orphaned p90 false positives) (#3829) Fix auto-bundled component pack normalization (#3818) Filter stale Bencher alerts before reporting (#3822) Tighten benchmark confirmation workflow permissions (#3819) ... # Conflicts: # .github/workflows/benchmark.yml

Filter stale Bencher alerts before reporting

c84a4be

justin808 added the benchmark label Jun 9, 2026

claude Bot reviewed Jun 9, 2026

Comment thread benchmarks/lib/bencher_report.rb

claude Bot reviewed Jun 9, 2026

Comment thread benchmarks/lib/bencher_report.rb

claude Bot reviewed Jun 9, 2026

Comment thread benchmarks/track_benchmarks.rb

coderabbitai Bot approved these changes Jun 9, 2026

greptile-apps Bot reviewed Jun 9, 2026

Comment thread benchmarks/lib/bencher_report.rb

Comment thread benchmarks/track_benchmarks.rb

Comment thread benchmarks/lib/bencher_report.rb

justin808 merged commit 38835e9 into main Jun 9, 2026
56 of 59 checks passed

justin808 deleted the codex/b-3795-benchmark-regression-filter branch June 9, 2026 02:04

justin808 mentioned this pull request Jun 9, 2026

Performance Regression Detected on main (8027078) #3799

Closed

justin808 mentioned this pull request Jun 10, 2026

Follow-up: tune accelerated RC agent policy after batch usage #3814

Closed

coderabbitai Bot mentioned this pull request Jun 10, 2026

Remove temporary /posts_page: Pro regression-issue suppression (#3669 revert) #3853

Merged

coderabbitai Bot mentioned this pull request Jun 17, 2026

Local dedicated-hardware benchmark runner (#4073) #4088

Merged

justin808 commented Jun 9, 2026 • edited by coderabbitai Bot

Summary

Validation

Agent Merge Confidence

Summary by CodeRabbit

Release Notes

coderabbitai Bot commented Jun 9, 2026 • edited

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

claude Bot commented Jun 9, 2026

Code Review — Filter stale Bencher alerts before reporting

Overview

What's solid

Issues

1. `.values` can yield duplicate `Boundary` objects in the measure-less path

2. `filtered_alert?` naming is ambiguous

3. Double-negative condition in `normalized_bencher_exit_code`

Minor observations

Verdict

greptile-apps Bot commented Jun 9, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

github-actions Bot commented Jun 9, 2026

Pro Node Renderer Benchmark Summary

github-actions Bot commented Jun 9, 2026

Core Benchmark Summary

github-actions Bot commented Jun 9, 2026

Pro (shard 1/2) Benchmark Summary

github-actions Bot commented Jun 9, 2026

Pro (shard 2/2) Benchmark Summary

justin808 commented Jun 9, 2026
