Skip to content

Commit cc3a44b

Browse files
justin808claude
andauthored
docs(benchmarks): add design doc for #3601 summary-table improvements
## Summary Adds the design doc for the benchmark **summary-table improvements** shipped in **#3627** (issue **#3601**). The doc was written during that PR's design phase but accidentally left untracked, so it never landed. This matches the established `docs/superpowers/specs/` convention — prior features committed their design docs the same way (e.g. `2026-05-21-otel-node-renderer-design.md` from #3382, `2026-05-25-main-ci-release-guard-design.md` from #3407). Status header updated to `Implemented (PR #3627)` to mirror those. ## Why keep it The merged code + comments capture the *what*; this doc captures the *why these and not the alternatives* — the decision rationale that code comments don't: - inline baseline + delta **in the same cell** (vs column-doubling) - **one perf link per benchmark name** (vs per-metric-cell) - send `p90_latency` to Bencher **boundary-less** (recorded for a baseline, never alerted) - **drop the Fail% column but keep the `failed_pct` threshold** (alert safety net) …plus the CI-only verification risks that were called out before the first live benchmark run. ## Notes - **Docs-only.** No code, no behavior change. - Base: freshly fetched `main`. Not auto-closing anything — issue #3601 was already closed by #3627; this just backfills the record. 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Low Risk** > Documentation-only addition with no code or CI behavior changes. > > **Overview** > Adds **`docs/superpowers/specs/2026-06-04-benchmark-table-improvements-design.md`**, backfilling the design record for issue **#3601** / implementation **#3627** under the existing `docs/superpowers/specs/` pattern. > > The doc is marked **Implemented (PR #3627)** and spells out the product choices that drove the merged benchmark table work: inline baseline + delta in metric cells, one Bencher perf link per benchmark name, boundary-less `p90_latency` with value-only fallback, and hiding Fail% while keeping `failed_pct` alerting. It also documents cell rendering rules, the intended code touchpoints, and CI-only verification risks. > > **Docs-only** — no application or benchmark runtime behavior changes in this PR. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 08448d7. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Marked the benchmark table spec as implemented and linked related issues. * Defined four visible behaviors: inline baseline+delta in metric cells; one performance link per benchmark; p90 shown from a boundary-less p90 latency measure with value-only fallback when no baseline; removed visible Fail% column (thresholds still trigger alerts). * Clarified cell rendering rules (nil/non-numeric handling, rounding, significance markers, arrows vs red/green indicators) and included a final example table. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent e26d659 commit cc3a44b

1 file changed

Lines changed: 107 additions & 0 deletions

File tree

Lines changed: 107 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,107 @@
1+
# Benchmark summary table improvements (#3601)
2+
3+
- **Status:** Implemented ([PR #3627](https://github.com/shakacode/react_on_rails/pull/3627))
4+
- **Date:** 2026-06-04
5+
- **Follows:** [#3586](https://github.com/shakacode/react_on_rails/issues/3586) — Consume Bencher reports as JSON for summary-table significance
6+
- **Related:** [#3602](https://github.com/shakacode/react_on_rails/issues/3602) — Fix `PostsPage` Pro route (drives the Fail% decision)
7+
8+
## Background
9+
10+
After #3586 the Pro/Core benchmark summary table renders:
11+
12+
```markdown
13+
| Benchmark | RPS | p50(ms) | p90(ms) | Fail% | Status |
14+
| ---------------------------------------- | ------- | ------- | ------- | ----- | --------- |
15+
| Pro Node Renderer: simple_eval (non-RSC) | 2895.81 | 3.13 | 3.99 | 0.0 | 200=86878 |
16+
```
17+
18+
It bolds + tags 🔴/🟢 only the tracked cells that crossed a Bencher boundary. Each
19+
`BencherReport::Boundary` already carries the `baseline`, but the table never shows
20+
it. Issue #3601 asks for four improvements; all four are in scope for this work.
21+
22+
## Decisions
23+
24+
| Decision | Choice |
25+
| ------------ | -------------------------------------------------------------------------------------------------------------------------------- |
26+
| Scope | All four issue items in one PR |
27+
| Layout | Baseline + delta **inline in the same cell** (no column doubling) |
28+
| Links | **One perf link per benchmark name** (not per metric cell) |
29+
| p90 baseline | Send `p90_latency` to Bencher **boundary-less**; value-only fallback |
30+
| Fail% | **Drop the display column, keep the `failed_pct` threshold** (alert safety net; #3602 made Fail% unreliable as a visible column) |
31+
32+
## Final rendered table
33+
34+
```markdown
35+
| Benchmark | RPS | p50(ms) | p90(ms) | Status |
36+
| --------------------------------- | ---------------------------- | ----------------- | ------- | --------- |
37+
| [simple_eval (non-RSC)](perf_url) | 2895.81 ▲2.3% (2830.0) | 3.13 ▼1.3% (3.17) | 3.99 | 200=86878 |
38+
| [react_ssr (non-RSC)](perf_url) | **2472.91** 🔴 8.4% (2700.0) | 3.65 ▲1.0% (3.61) | 4.70 | 200=74197 |
39+
```
40+
41+
## Cell rendering rules (RPS, p50, p90)
42+
43+
| Case | Render |
44+
| ------------------------------------ | ----------------------------------------------------------------------------------------------------- |
45+
| nil value | `` |
46+
| non-numeric rps (`FAILED`/`MISSING`) | plain text, no delta |
47+
| numeric, **no baseline** in report | value only (e.g. `3.99`) — the expected p90 case |
48+
| numeric, baseline, not significant | `value ▲N.N% (baseline)` — ▲ up / ▼ down; `%` = magnitude |
49+
| numeric, baseline, **significant** | `**value** 🔴 N.N% (baseline)` (regression) / `🟢` (improvement) — emoji replaces arrow, value bolded |
50+
| value == baseline exactly | `value 0.0% (baseline)` — explicit exact/near-zero match, no misleading arrow |
51+
52+
- Delta `%` rounded to 1 decimal; baseline rounded to 2.
53+
- Arrow/emoji marks the **direction of the raw value change** (▲ = value rose). For a
54+
significant move the 🔴/🟢 replaces the arrow; combined with the column's
55+
known direction (RPS higher-is-better, latency lower-is-better) it implies up/down.
56+
- The **benchmark name** is the only linked cell → that benchmark's Bencher perf plot.
57+
- Legend: `▲/▼ non-zero change vs baseline · 0.0% exact/near-zero match · 🔴 significant regression · 🟢 significant improvement (tracked measures) · (n) = baseline`.
58+
59+
## Code changes
60+
61+
1. **`benchmarks/lib/bmf_helpers.rb`**`to_bmf` emits `p90_latency`
62+
(boundary-less: Bencher records history, never alerts). Drop the now-unused
63+
`failed_pct` from `display_rows`. Update the p90 "summary-only" comment (it
64+
now reaches Bencher).
65+
2. **`benchmarks/lib/bencher_perf_url.rb` / `benchmarks/lib/bencher_report.rb`**
66+
leniently extract perf-URL primitives (project slug, branch + testbed UUIDs,
67+
per-benchmark + per-measure UUIDs, start/end time) in `BencherPerfUrl`, while
68+
`BencherReport` wires it in via `perf_url(benchmark_name)`. URL building is
69+
defensive (returns `nil` if any primitive is missing → name renders unlinked).
70+
Strict regression/boundary parsing is untouched; URL fields follow the existing
71+
"informational → read leniently, never raise" rule. Expose the per-(name,
72+
measure) baseline for the renderer (via the existing `boundary`).
73+
3. **`benchmarks/lib/benchmark_table.rb`**`COLUMNS` drops Fail%, gives p90 a
74+
`p90_latency` measure key (baseline lookup only — never significant, no
75+
threshold). Name cell links via `perf_url`; metric cells render value + delta +
76+
`(baseline)` from `boundary(name, measure).baseline` and `significance(...)`.
77+
Update legend. Keep the existing Markdown escaping.
78+
4. **`benchmarks/track_benchmarks.rb`**`THRESHOLDS` **unchanged** (rps,
79+
p50_latency, failed_pct still alert; p90 not added). No display column for
80+
failed_pct.
81+
5. **Specs / fixtures** — update `benchmarks/spec/benchmark_table_spec.rb`,
82+
`benchmarks/spec/bencher_report_spec.rb`,
83+
`benchmarks/spec/bencher_perf_url_spec.rb`
84+
(+ `benchmarks/spec/fixtures/bencher_report_sample.json` gains a p90 measure
85+
and URL primitives), `benchmarks/spec/bmf_collector_spec.rb` (p90 in BMF,
86+
display-row shape), `benchmarks/spec/report_table_integration_spec.rb`, and
87+
the `benchmarks/spec/track_benchmarks_spec.rb` lockstep test → `failed_pct` is
88+
**tracked-but-not-displayed** (intentional; rps/p50 still require a
89+
highlightable column).
90+
6. **`.github/workflows/benchmark.yml`** — no functional change; comment only if the
91+
p90/Fail% rationale needs noting.
92+
93+
## Risks / CI-only verification
94+
95+
- **Perf URL format** and whether the JSON report exposes the UUIDs/times can only be
96+
confirmed on the first benchmark CI run — hence the defensive `nil`→unlinked fallback,
97+
so a wrong/absent field never breaks the table.
98+
- **p90 baseline** likely stays `nil` (boundary-less measure) → p90 shows value-only; if
99+
Bencher returns one, the delta appears automatically. Either way the table is correct.
100+
- Sending `p90_latency` creates a new measure in the Bencher project (intended).
101+
102+
## Testing
103+
104+
- `bundle exec rspec benchmarks/spec` and `bundle exec rubocop benchmarks` locally — the
105+
parser/renderer/sidecar are fully unit-testable against fixtures.
106+
- The live `bencher run` link/baseline behavior is validated on the PR's first benchmark
107+
CI run.

0 commit comments

Comments
 (0)