React Rendering Benchmark

Browser-based benchmark for @data-client/react measuring mount/update scenarios. Includes TanStack Query, SWR, and a plain-React baseline for reference. Built with Webpack via @anansi/webpack-config. Results are reported to CI via rhysd/github-action-benchmark.

Comparison to Node benchmarks

The repo has two benchmark suites:

  • examples/benchmark (Node) — Measures the JS engine only: normalize/denormalize, Controller.setResponse/getResponse, reducer throughput. No browser, no React. Use it to validate core and normalizr changes.
  • examples/benchmark-react (this app) — Measures the full React rendering pipeline: same operations driven in a real browser, with layout and paint. Use it to validate @data-client/react changes; other libraries are included for reference.

Methodology

  • What we measure: Wall-clock time from triggering an action (e.g. init(100) or updateUser('user0')) until a MutationObserver detects the expected DOM change in the benchmark container. Optionally we also record React Profiler commit duration and, with BENCH_TRACE=true, Chrome trace duration.
  • Why: Scenarios are chosen to exercise areas where caching strategies differ: shared-entity updates, referential stability, and derived-view memoization. See js-framework-benchmark "How the duration is measured" for a similar timeline-based approach.
  • Statistical: Warmup runs are discarded; we report median and 95% CI (as percentage of median). Timing scenarios (navigation and mutation) use convergent mode: a single page load per scenario, with warmup iterations followed by adaptive measurement iterations where each iteration produces one sample and convergence is checked inline. This eliminates page-reload overhead between samples for faster, lower-variance results. Deterministic scenarios (ref-stability) run once. Memory scenarios use a separate outer loop with a fresh page per round.
  • No CPU throttling: Runs at native speed with more samples for statistical significance rather than artificial slowdown. Convergent timing scenarios use 5 warmup + up to 50 measurement iterations (small) or 3 warmup + up to 40 (large). Early stopping triggers when 95% CI margin drops below the target percentage.
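The statistical loop described above can be sketched roughly as follows. This is a minimal illustration, not the runner's actual stats.ts: the helper names (`median`, `ciMarginPct`, `runConvergent`), the normal-approximation CI, and the synchronous `measureOnce` callback are all assumptions made for the example, and the target margin is a placeholder value.

```typescript
// Hypothetical helpers; the real logic lives in the runner and may differ.
function median(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// 95% CI half-width expressed as a percentage of the median.
// Assumption: normal approximation, 1.96 * standard error of the mean.
function ciMarginPct(samples: number[]): number {
  const n = samples.length;
  if (n < 2) return Infinity;
  const mean = samples.reduce((sum, x) => sum + x, 0) / n;
  const variance =
    samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  const margin = 1.96 * Math.sqrt(variance / n);
  return (margin / median(samples)) * 100;
}

interface ConvergentConfig {
  warmup: number;          // iterations whose samples are discarded
  minMeasure: number;      // never stop before this many samples
  maxMeasure: number;      // hard cap on measurement iterations
  targetMarginPct: number; // stop early once the CI margin is below this
}

// One sample per iteration, convergence checked inline after each sample.
function runConvergent(measureOnce: () => number, cfg: ConvergentConfig) {
  for (let i = 0; i < cfg.warmup; i++) measureOnce(); // warmup, discarded
  const samples: number[] = [];
  while (samples.length < cfg.maxMeasure) {
    samples.push(measureOnce());
    if (
      samples.length >= cfg.minMeasure &&
      ciMarginPct(samples) < cfg.targetMarginPct
    ) {
      break; // early stop: margin is tight enough
    }
  }
  return {
    median: median(samples),
    marginPct: ciMarginPct(samples),
    n: samples.length,
  };
}
```

With a small-scenario style config such as `{ warmup: 5, minMeasure: 5, maxMeasure: 50, targetMarginPct: 5 }`, a perfectly stable measurement stops after 5 samples, while a noisy one runs until the cap.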

Comparison philosophy

The primary purpose is to track data-client's own performance — catch regressions and validate improvements. Other libraries are included for context; CI runs data-client only.

Scenarios are designed to isolate the data framework layer: fetching, caching, update propagation, and rendering in response to data changes. Real-world applications will have additional performance considerations (routing, animation, third-party scripts, etc.) beyond what is measured here.

All implementations share presentational components, fixture data, fetch functions, and the useBenchState harness. They only diverge where each library's data layer requires it, using idiomatic patterns from that library's documentation. No implementation builds custom state management on top of its library.

Scenario categories

  • Hot path (in CI, data-client only) — JS-only: init (fetch + render), update propagation, ref-stability, sorted-view. No simulated network. CI runs only data-client scenarios to track regressions; other libraries are benchmarked locally.
  • With network (local) — The same shared-user update, but with a simulated network delay (consistent ms per "request"). Normalized caches propagate via a single store update; query-keyed caches invalidate and refetch affected queries. Not run in CI; run locally with yarn bench (without the CI env var set) to include these.
  • Memory (local only) — Heap delta after repeated mount/unmount cycles.
  • Startup (local only) — FCP and task duration via CDP Performance.getMetrics.

Scenarios

Hot path (CI)

  • Get list (getlist-100, getlist-500) — Time to show a ListView component that auto-fetches 100 or 500 issues from the list endpoint, then renders (unit: ops/s). Exercises the full fetch + normalization + render pipeline.
  • Get list sorted (getlist-500-sorted) — Mount 500 issues through a sorted/derived view. data-client uses useQuery(sortedIssuesQuery) with Query schema memoization; other libraries use useMemo + sort.
  • Update entity (update-entity) — Time to update one issue and propagate to the UI (unit: ops/s).
  • Update entity sorted (update-entity-sorted) — After mounting a sorted view, update one entity. data-client's Query memoization avoids re-sorting when sort keys are unchanged.
  • Update entity multi-view (update-entity-multi-view) — Update one issue that appears simultaneously in a list, a detail panel, and a pinned-cards strip. Normalized caches propagate via a single store write; query-keyed caches invalidate and refetch each query.
  • Update user (scaling) (update-user, update-user-10000) — Update one shared user with 1,000 or 10,000 mounted issues to test subscriber scaling.
  • Ref-stability (ref-stability-issue-changed, ref-stability-user-changed) — Count of components that received a new object reference after an update (unit: count; smaller is better).
  • Invalidate and resolve (invalidate-and-resolve) — data-client only; invalidates a cached endpoint and immediately re-resolves. Measures Suspense boundary round-trip.

With network (local comparison)

  • Update shared user with network (update-shared-user-with-network) — The same shared-user update as in the hot path, with a simulated delay (e.g. 50 ms) per "request."

Memory (local only)

  • Memory mount/unmount cycle (memory-mount-unmount-cycle) — Mount 500 issues, unmount, repeat 10 times; report JS heap delta (bytes) via CDP. Surfaces leaks or unbounded growth.

Startup (local only)

  • Startup FCP (startup-fcp) — First Contentful Paint time via CDP Performance.getMetrics.
  • Startup task duration (startup-task-duration) — Total main-thread task duration via CDP (proxy for TBT).

Expected results

Illustrative relative results with baseline = 100% (plain React useState/useEffect, no data library). For throughput rows, each value is (library ops/s ÷ baseline ops/s) × 100 — higher is faster. For ref-stability rows, the ratio uses the “refs changed” count — lower is fewer components that saw a new object reference. Figures are rounded from the Latest measured results table below (network simulation on); absolute ops/s will vary by machine, but library-to-library ratios are usually similar.

| Category | Scenarios (representative) | data-client | tanstack-query | swr | baseline |
| --- | --- | --- | --- | --- | --- |
| Navigation | getlist-100, getlist-500, getlist-500-sorted | ~98% | ~99% | ~99% | 100% |
| Navigation | list-detail-switch-10 | ~2381% | ~225% | ~218% | 100% |
| Mutations | update-entity, update-user, update-entity-sorted, update-entity-multi-view, unshift-item, delete-item, move-item | ~8672% | ~97% | ~99% | 100% |
| Scaling (10k items) | update-user-10000 | ~9290% | ~96% | ~100% | 100% |
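As a worked example of the ratio, the following sketch applies (library ops/s ÷ baseline ops/s) × 100 to medians reported in this document for update-user-10000 and list-detail-switch-10. The function name `relativePct` is made up for the illustration.

```typescript
// Relative throughput: 100 means "same speed as the plain-React baseline".
function relativePct(
  libraryOpsPerSec: number,
  baselineOpsPerSec: number,
): number {
  return (libraryOpsPerSec / baselineOpsPerSec) * 100;
}

// update-user-10000: data-client 144.93 ops/s, baseline 1.56 ops/s
const scalingDataClient = Math.round(relativePct(144.93, 1.56));
// update-user-10000: tanstack-query 1.49 ops/s, baseline 1.56 ops/s
const scalingTanstack = Math.round(relativePct(1.49, 1.56));
// list-detail-switch-10: data-client 17.38 ops/s, baseline 0.73 ops/s
const switchDataClient = Math.round(relativePct(17.38, 0.73));

console.log(scalingDataClient); // 9290
console.log(scalingTanstack);   // 96
console.log(switchDataClient);  // 2381
```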

Latest measured results (network simulation on)

Median ops/s per scenario; range is approximate 95% CI margin from the runner (stats.ts). Network simulation uses response-size-based delays (NETWORK_SIM_CONFIG in bench/scenarios.ts: 40 ms base latency + 1 ms per 20 records) so list refetches after an author update pay extra latency compared to normalized propagation.
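To make the delay rule concrete, here is a minimal sketch of a response-size-based delay. The config shape, field names, and the choice to round the per-record part up to a whole millisecond are assumptions; the actual NETWORK_SIM_CONFIG in bench/scenarios.ts may be structured differently.

```typescript
// Hypothetical config shape, for illustration only.
interface NetworkSimConfig {
  baseLatencyMs: number; // fixed cost per simulated request
  recordsPerMs: number;  // how many records add 1 ms of delay
}

const simConfig: NetworkSimConfig = { baseLatencyMs: 40, recordsPerMs: 20 };

function simulatedDelayMs(
  recordCount: number,
  cfg: NetworkSimConfig = simConfig,
): number {
  // Assumption: the size-dependent part is rounded up to whole milliseconds.
  return cfg.baseLatencyMs + Math.ceil(recordCount / cfg.recordsPerMs);
}

// Resolve a value after the simulated latency for its record count.
function respondWithDelay<T>(records: T[]): Promise<T[]> {
  return new Promise(resolve =>
    setTimeout(() => resolve(records), simulatedDelayMs(records.length)),
  );
}
```

Under these assumptions a 100-record list costs 45 ms and a 500-record list costs 65 ms per refetch, which is the extra latency a query-keyed cache pays when an author update invalidates a list.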

Run: 2026-03-22, Linux (WSL2), yarn build:benchmark-react, static preview + env -u CI npx tsx bench/runner.ts --network-sim true (all libraries; memory scenarios not included). Numbers are machine-specific; use them for relative comparison between libraries, not as absolutes.

| Scenario | data-client | tanstack-query | swr | baseline |
| --- | --- | --- | --- | --- |
| Navigation | | | | |
| getlist-100 | 20.45 ± 2.3% | 20.62 ± 0.8% | 20.73 ± 0.2% | 20.73 ± 0.5% |
| getlist-500 | 12.53 ± 2.8% | 12.80 ± 0.2% | 12.71 ± 0.3% | 12.84 ± 0.2% |
| getlist-500-sorted | 12.92 ± 5.1% | 12.93 ± 1.1% | 12.90 ± 0.7% | 13.16 ± 3.6% |
| list-detail-switch-10 | 17.38 ± 8.7% | 1.64 ± 1.7% | 1.59 ± 1.4% | 0.73 ± 0.1% |
| Mutations | | | | |
| update-entity | 666.67 ± 9.0% | 6.98 ± 0.4% | 7.09 ± 0.4% | 7.23 ± 0.8% |
| update-user | 801.28 ± 9.4% | 7.04 ± 0.5% | 7.18 ± 0.1% | 7.24 ± 1.3% |
| update-entity-sorted | 625.00 ± 10.8% | 7.10 ± 0.0% | 7.10 ± 1.2% | 7.29 ± 0.9% |
| update-entity-multi-view | 645.83 ± 7.6% | 7.14 ± 0.2% | 7.16 ± 0.1% | 7.29 ± 0.3% |
| update-user-10000 | 144.93 ± 1.7% | 1.49 ± 0.6% | 1.56 ± 1.7% | 1.56 ± 1.5% |
| unshift-item | 465.37 ± 3.6% | 6.90 ± 0.4% | 7.18 ± 0.2% | 7.21 ± 0.3% |
| delete-item | 833.33 ± 6.0% | 6.93 ± 0.1% | 7.17 ± 0.7% | 7.19 ± 0.7% |
| move-item | 333.33 ± 8.9% | 6.76 ± 0.6% | 6.99 ± 0.3% | 6.97 ± 0.2% |

[Measured on a Ryzen 9 7950X; 64 GB RAM; Ubuntu (WSL2); Node 24.12.0; Chromium (Playwright)]

Expected variance

| Category | Scenarios | Typical run-to-run spread |
| --- | --- | --- |
| Stable | getlist-*, update-entity, update-entity-sorted, ref-stability-* | 2-5% |
| Moderate | update-user-*, update-entity-multi-view, list-detail-switch-10 | 5-10% |
| Volatile | memory-mount-unmount-cycle, startup-*, (react commit) suffixes | 10-25% |

Regressions >5% on stable scenarios or >15% on volatile scenarios are worth investigating.
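The thresholds above could be applied to two throughput runs along these lines. This is only a sketch: the names are hypothetical, it covers throughput (ops/s, higher is better) only, and it deliberately has no entry for the moderate category because the text does not give a threshold for it.

```typescript
// Thresholds taken from the sentence above; "moderate" is intentionally absent.
const THRESHOLD_PCT = { stable: 5, volatile: 15 } as const;
type VarianceClass = keyof typeof THRESHOLD_PCT;

// Percentage drop in throughput relative to the earlier run.
function throughputDropPct(baselineOps: number, currentOps: number): number {
  return ((baselineOps - currentOps) / baselineOps) * 100;
}

function worthInvestigating(
  kind: VarianceClass,
  baselineOps: number,
  currentOps: number,
): boolean {
  return throughputDropPct(baselineOps, currentOps) > THRESHOLD_PCT[kind];
}
```

For lower-is-better metrics such as heap delta, the sign of the comparison would need to be flipped.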

Interpreting results

  • Higher is better for throughput (ops/s). Lower is better for ref-stability counts and heap delta (bytes).
  • Ref-stability: issueRefChanged and userRefChanged count how many components received a new object reference. Normalized caches preserve referential equality for unchanged entities; query-keyed caches typically create new references on each cache write.
  • React commit: Reported as (react commit) suffix entries. These measure React Profiler actualDuration and isolate React reconciliation cost from layout/paint.
  • Report viewer: Toggle the "Base metrics", "React commit", and "Trace" checkboxes to filter the comparison table. Use "Load history" to compare multiple runs over time.
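The ref-stability idea can be illustrated without React. The sketch below counts how many list positions hold a different object identity after an update, once for an identity-preserving update and once for a list rebuilt from fresh JSON. The types and the `countChangedRefs` helper are invented for the example and are not the harness's captureRefSnapshot or getRefStabilityReport implementation.

```typescript
interface User { id: string; name: string }
interface Issue { id: string; title: string; author: User }

// Count positions whose object identity differs between two snapshots.
function countChangedRefs<T extends object>(before: T[], after: T[]): number {
  let changed = 0;
  for (let i = 0; i < after.length; i++) {
    if (!Object.is(before[i], after[i])) changed++;
  }
  return changed;
}

const alice: User = { id: "u0", name: "Alice" };
const bob: User = { id: "u1", name: "Bob" };
const issues: Issue[] = [
  { id: "i0", title: "A", author: alice },
  { id: "i1", title: "B", author: bob },
  { id: "i2", title: "C", author: alice },
];

// Identity-preserving update: only issue i1 becomes a new object.
const preserved: Issue[] = issues.map(issue =>
  issue.id === "i1" ? { ...issue, title: "B (edited)" } : issue,
);

// Rebuilding the whole list from serialized data gives every item a new identity.
const recreated: Issue[] = JSON.parse(JSON.stringify(preserved));

console.log(countChangedRefs(issues, preserved)); // 1
console.log(countChangedRefs(issues, recreated)); // 3
```

A memoized row component that compares props by reference would re-render once in the first case and three times in the second, which is what the count is meant to surface.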

Adding a new library

  1. Add a new app under src/<lib>/index.tsx (e.g. src/urql/index.tsx).
  2. Implement the BenchAPI interface on window.__BENCH__: init, updateEntity, updateUser, unmountAll, getRenderedCount, captureRefSnapshot, getRefStabilityReport, and optionally mountUnmountCycle, mountSortedView. Use the shared presentational IssuesRow from @shared/components and fixtures from @shared/data. The harness (useBenchState) provides default init, unmountAll, mountUnmountCycle, getRenderedCount, and ref-stability methods; libraries only need to supply updateEntity, updateUser, and any overrides.
  3. Add the library to LIBRARIES in bench/scenarios.ts.
  4. Add a webpack entry in webpack.config.cjs for the new app and an HtmlWebpackPlugin entry so the app is served at /<lib>/.
  5. Add the dependency to package.json and run yarn install.
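For orientation, step 2 might look roughly like the sketch below. Only the method names come from the list above, plus the `init(100)` and `updateUser('user0')` call shapes from the Methodology section and the report field names from the ref-stability description; every other parameter type, return type, and the `createBenchAPI` helper are assumptions, so check the shared types in the repo before relying on them.

```typescript
// Field names follow the ref-stability description; the shape is assumed.
interface RefStabilityReport {
  issueRefChanged: number;
  userRefChanged: number;
}

// Assumed signatures; only the method names are taken from the README.
interface BenchAPI {
  init(count: number): void;
  updateEntity(id: string): void;
  updateUser(id: string): void;
  unmountAll(): void;
  getRenderedCount(): number;
  captureRefSnapshot(): void;
  getRefStabilityReport(): RefStabilityReport;
  mountUnmountCycle?(count: number, cycles: number): Promise<void>;
  mountSortedView?(count: number): void;
}

type HarnessDefaults = Omit<BenchAPI, "updateEntity" | "updateUser">;
type LibraryMethods = Pick<BenchAPI, "updateEntity" | "updateUser"> &
  Partial<BenchAPI>;

// Hypothetical helper: harness defaults first, library methods and overrides last.
function createBenchAPI(
  defaults: HarnessDefaults,
  library: LibraryMethods,
): BenchAPI {
  return { ...defaults, ...library };
}

// In a real app the result would be exposed for the runner, e.g.:
// (window as unknown as { __BENCH__: BenchAPI }).__BENCH__ = createBenchAPI(...);
```

The split mirrors the description above: the harness supplies the defaults, and a new library contributes updateEntity, updateUser, and any overrides.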

Running locally

  1. Install system dependencies (Linux / WSL)

    Playwright needs system libraries to run Chromium. If you see "Host system is missing dependencies to run browsers":

    sudo env PATH="$PATH" npx playwright install-deps chromium

    The env PATH="$PATH" is needed because sudo doesn't inherit your shell's PATH (where nvm-managed node/npx live).

  2. Build and run

    yarn build:benchmark-react
    yarn workspace example-benchmark-react preview &
    sleep 5
    cd examples/benchmark-react && yarn bench

    Or from repo root after a build: start preview in one terminal, then in another run yarn workspace example-benchmark-react bench.

  3. Without React Compiler

    The default build includes React Compiler. To measure impact without it:

    cd examples/benchmark-react
    yarn build:no-compiler     # builds without babel-plugin-react-compiler
    yarn preview &
    sleep 5
    yarn bench:no-compiler     # labels results with [no-compiler] suffix

    Or as a single command: yarn bench:run:no-compiler.

    Results are labelled [no-compiler] so you can compare side-by-side with the default run by loading both JSON files into the report viewer's history feature.

    Env vars for custom combinations:

    • REACT_COMPILER=false — disables the Babel plugin at build time
    • BENCH_LABEL=<tag> — appends [<tag>] to all result names at bench time
    • BENCH_PORT=<port> — port for preview server and bench runner (default 5173)
    • BENCH_BASE_URL=<url> — full base URL override (takes precedence over BENCH_PORT)

  4. Filtering scenarios

    The runner supports CLI flags (with env var fallbacks) to select a subset of scenarios:

    | CLI flag | Env var | Description |
    | --- | --- | --- |
    | --lib <names> | BENCH_LIB | Comma-separated library names (e.g. data-client,swr) |
    | --size <small\|large> | BENCH_SIZE | Run only small (cheap, full rigor) or large (expensive, reduced runs) scenarios |
    | --action <group\|action> | BENCH_ACTION | Filter by action group (mount, update, mutation, memory) or exact action name. Memory is not run by default; use --action memory to include. |
    | --scenario <pattern> | BENCH_SCENARIO | Substring filter on scenario name |

    CLI flags take precedence over env vars. Examples:

    yarn bench --lib data-client                # only data-client
    yarn bench --size small                      # only cheap scenarios (full warmup/measurement)
    yarn bench --action mount                    # init, mountSortedView
    yarn bench --action memory                   # memory-mount-unmount-cycle (heap delta; opt-in category)
    yarn bench --action update --lib swr         # update scenarios for swr only
    yarn bench --scenario sorted-view            # only sorted-view scenarios

    Convenience scripts:

    yarn bench:small       # --size small
    yarn bench:large       # --size large
    yarn bench:dc          # --lib data-client

  5. Scenario sizes

    Scenarios are classified as small or large based on their cost:

    • Small (convergent: 5 warmup + 5–50 measurement iterations): getlist-100, update-entity, invalidate-and-resolve, unshift-item, delete-item
    • Small (deterministic, single run): ref-stability-*
    • Large (convergent: 3 warmup + 5–40 measurement iterations): getlist-500, getlist-500-sorted, update-user, update-user-10000, update-entity-sorted, update-entity-multi-view, list-detail-switch-10
    • Memory (opt-in, 1 warmup + 3 measurement rounds): memory-mount-unmount-cycle — run with --action memory

    Timing scenarios use convergent mode (single page load, inline convergence per scenario). Each group uses its own warmup/measurement config. Use --size to run only one group.
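The flag-over-env precedence and the substring scenario filter described under "Filtering scenarios" can be sketched as below. The parsing is deliberately simplistic (each flag is followed by its value) and the function names are made up; the runner's real argument handling may differ.

```typescript
interface Filters {
  lib?: string[];
  size?: "small" | "large";
  action?: string;
  scenario?: string;
}

// Return the value following `--name`, if the flag is present.
function readFlag(argv: string[], name: string): string | undefined {
  const index = argv.indexOf(`--${name}`);
  return index >= 0 ? argv[index + 1] : undefined;
}

function resolveFilters(
  argv: string[],
  env: Record<string, string | undefined>,
): Filters {
  // CLI flag wins; the env var is only a fallback.
  const pick = (flag: string, envVar: string) =>
    readFlag(argv, flag) ?? env[envVar];
  const size = pick("size", "BENCH_SIZE");
  return {
    lib: pick("lib", "BENCH_LIB")?.split(",").map(name => name.trim()),
    size: size === "small" || size === "large" ? size : undefined,
    action: pick("action", "BENCH_ACTION"),
    scenario: pick("scenario", "BENCH_SCENARIO"),
  };
}

// Substring match on the scenario name; no filter means everything matches.
function matchesScenario(name: string, filters: Filters): boolean {
  return filters.scenario === undefined || name.includes(filters.scenario);
}
```

For example, `resolveFilters(["--lib", "data-client"], { BENCH_LIB: "swr", BENCH_SIZE: "small" })` selects data-client (the flag overrides the env var) and keeps the small size from the environment.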

Output

The runner prints a JSON array in customBiggerIsBetter format (name, unit, value, range) to stdout. In CI this is written to react-bench-output.json and sent to the benchmark action.
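As an illustration of that shape, the sketch below builds two entries with the four fields named above. How the runner actually composes names (for instance library prefixes or the exact placement of a BENCH_LABEL tag) and formats the range string is an assumption here; the scenario values are medians quoted in this document.

```typescript
// Fields accepted for custom benchmark results; `range` is optional.
interface BenchmarkEntry {
  name: string;
  unit: string;
  value: number;
  range?: string;
}

// Hypothetical formatter; the label handling mirrors the "[<tag>]" convention.
function toEntry(
  scenario: string,
  medianOps: number,
  marginPct: number,
  label?: string,
): BenchmarkEntry {
  return {
    name: label ? `${scenario} [${label}]` : scenario,
    unit: "ops/s",
    value: medianOps,
    range: `± ${marginPct.toFixed(1)}%`,
  };
}

const entries: BenchmarkEntry[] = [
  toEntry("update-entity", 666.67, 9.0),
  toEntry("getlist-100", 20.45, 2.3, "no-compiler"),
];

// The runner's stdout would be a JSON array along these lines.
console.log(JSON.stringify(entries, null, 2));
```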

To view results locally, open bench/report-viewer.html in a browser and paste the JSON (or upload react-bench-output.json) to see a comparison table and bar chart.

Profiling

Chrome trace (timeline duration)

Set BENCH_TRACE=true when running the bench to enable Chrome tracing for duration scenarios. Trace files are written to disk; parsing and reporting trace duration is best-effort and may require additional tooling for the trace zip format.

V8 opt/deopt investigation

For the same granularity of V8 optimization investigation available in examples/benchmark (Node), two modes pass --js-flags to Chromium via Playwright:

yarn bench:trace    # --trace-opt --trace-deopt → v8-trace.log
yarn bench:deopt    # --prof → v8-logs/v8-<pid>.log

Both default to --lib data-client --size small for focused, fast investigation. Add other flags as needed (e.g. --scenario update-entity).

bench:trace (BENCH_V8_TRACE=true) launches Chromium with --js-flags="--trace-opt --trace-deopt". The runner uses Playwright launchServer (not launch) so the root browser process stdout/stderr can be piped to v8-trace.log; the Browser returned by launch() does not expose process(). This is the browser equivalent of examples/benchmark's start:trace. Look for optimization and deoptimization lines for functions of interest.

bench:deopt (BENCH_V8_DEOPT=true) launches Chromium with --js-flags="--prof". V8 writes per-process profiling logs to v8-logs/v8-<pid>.log. Chromium is multi-process, so several files are created; the renderer log (typically the largest) contains the benchmark's hot path. Process it with:

node --prof-process v8-logs/v8-<pid>.log > processed.txt
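Since the renderer log is typically the largest file, a small Node script can pick a candidate automatically. This is a convenience sketch, not part of the benchmark: the helper names are invented, and "largest file is the renderer" is the heuristic stated above rather than a guarantee.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

interface LogFile {
  path: string;
  bytes: number;
}

// Pure helper: return the largest file, or undefined for an empty list.
function pickLargest(files: LogFile[]): LogFile | undefined {
  return files.reduce<LogFile | undefined>(
    (best, file) =>
      best === undefined || file.bytes > best.bytes ? file : best,
    undefined,
  );
}

// Collect v8-<pid>.log files with their sizes from a directory.
function listV8Logs(dir: string): LogFile[] {
  return readdirSync(dir)
    .filter(name => /^v8-.*\.log$/.test(name))
    .map(name => {
      const path = join(dir, name);
      return { path, bytes: statSync(path).size };
    });
}

// Usage (assumes a v8-logs directory from a bench:deopt run exists):
// const renderer = pickLargest(listV8Logs("v8-logs"));
// if (renderer) {
//   console.log(`node --prof-process ${renderer.path} > processed.txt`);
// }
```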

Both env vars can be combined for simultaneous trace output and profiling logs. The convenience scripts can be overridden:

BENCH_V8_TRACE=true yarn bench --lib data-client --scenario update-entity
BENCH_V8_DEOPT=true yarn bench --lib data-client --size large