internal(bench): Reduce benchmark variance for tighter CI results (#3880)
* internal(bench-react): Reduce benchmark variance for tighter CI results
Tighten convergent config (15/10 warmup, 80/60 max iterations, 2%/3% CI
targets), add Chromium stability flags, double-GC between scenarios with
longer pauses, tune CI system (CPU governor, swap off, robust server wait).
Made-with: Cursor
* internal(bench): Add system tuning to Node benchmark CI
Same CPU governor and swap tuning as bench-react for consistent results.
Made-with: Cursor
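The system tuning mentioned in these two commits might look roughly like the following on a Linux runner. This is a sketch under assumptions — `cpupower` availability and root access are assumed; the actual CI steps may use different tools:

```shell
# Hypothetical CI tuning step (assumes a Linux runner with root access).
# Pin the CPU frequency governor to "performance" so the clock does not
# scale mid-run, and disable swap so paging cannot perturb timings.
sudo cpupower frequency-set -g performance
sudo swapoff -a
```

On shared runners these commands may be restricted; the commits also note that config tuning alone was not enough without core pinning.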
* internal(bench): Pin benchmarks to CPU cores via taskset
Config tuning alone didn't reduce variance — CI runner noise from CPU
migration and shared-infrastructure scheduling is the dominant factor.
Pin benchmark processes to cores 0,1 via taskset to eliminate L1/L2
cache thrashing from core migration. Moderate warmup/iteration counts
back to reasonable levels since extra iterations can't fix environmental
noise.
Made-with: Cursor
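The pinning described in this commit reduces to a one-line wrapper via `taskset` (part of util-linux). The entry-point script name below is hypothetical:

```shell
# Restrict the benchmark process (and its children) to cores 0 and 1,
# avoiding L1/L2 cache thrashing from core migration.
# "run-benchmarks.js" is a placeholder for the actual entry point.
taskset -c 0,1 node run-benchmarks.js
```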
```diff
-Regressions >5% on stable scenarios or >15% on volatile scenarios are worth investigating.
+CI convergence targets: 2% (small scenarios), 3% (large scenarios). Reported margins should not exceed 5%. Regressions >5% on stable scenarios or >10% on moderate scenarios are worth investigating.
```
**examples/benchmark-react/README.md** (+7 −7)
```diff
@@ -14,7 +14,7 @@ The repo has two benchmark suites:
 - **What we measure:** Wall-clock time from triggering an action (e.g. `init(100)` or `updateUser('user0')`) until a MutationObserver detects the expected DOM change in the benchmark container. Optionally we also record React Profiler commit duration and, with `BENCH_TRACE=true`, Chrome trace duration.
 - **Why:** Scenarios are chosen to exercise areas where caching strategies differ: shared-entity updates, referential stability, and derived-view memoization. See [js-framework-benchmark "How the duration is measured"](https://github.com/krausest/js-framework-benchmark/wiki/How-the-duration-is-measured) for a similar timeline-based approach.
 - **Statistical:** Warmup runs are discarded; we report median and 95% CI (as percentage of median). Timing scenarios (navigation and mutation) use **convergent mode**: a single page load per scenario, with warmup iterations followed by adaptive measurement iterations where each iteration produces one sample and convergence is checked inline. This eliminates page-reload overhead between samples for faster, lower-variance results. Deterministic scenarios (ref-stability) run once. Memory scenarios use a separate outer loop with a fresh page per round.
-- **No CPU throttling:** Runs at native speed with more samples for statistical significance rather than artificial slowdown. Convergent timing scenarios use 5 warmup + up to 50 measurement iterations (small) or 3 warmup + up to 40 (large). Early stopping triggers when 95% CI margin drops below the target percentage.
+- **No CPU throttling:** Runs at native speed with more samples for statistical significance rather than artificial slowdown. Convergent timing scenarios use 8 warmup + up to 60 measurement iterations (small) or 5 warmup + up to 50 (large). Early stopping triggers when 95% CI margin drops below the target percentage (2% small / 3% large in CI). CI pins the benchmark to dedicated CPU cores via `taskset` to reduce scheduling noise.
```
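The convergent-mode loop described above (warmup discarded, then adaptive measurement with an inline CI check) can be sketched as follows. This is a minimal illustration, not the harness's actual code — the helper names and the normal-approximation confidence interval are assumptions:

```typescript
// Median of a sample set.
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const m = Math.floor(s.length / 2);
  return s.length % 2 ? s[m] : (s[m - 1] + s[m]) / 2;
}

// 95% CI half-width as a percentage of the median, using a normal
// approximation — a stand-in for whatever statistic the harness uses.
function ciMarginPct(xs: number[]): number {
  const n = xs.length;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  const variance = xs.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  const halfWidth = 1.96 * Math.sqrt(variance / n);
  return (halfWidth / median(xs)) * 100;
}

// Adaptive loop: warmup iterations are discarded, then each iteration
// produces one sample; stop early once the CI margin converges below
// the target, or give up at maxIterations.
function runConvergent(
  sample: () => number,
  warmup: number,
  maxIterations: number,
  targetPct: number,
): { median: number; marginPct: number; iterations: number } {
  for (let i = 0; i < warmup; i++) sample(); // discarded
  const xs: number[] = [];
  while (xs.length < maxIterations) {
    xs.push(sample());
    if (xs.length >= 5 && ciMarginPct(xs) <= targetPct) break;
  }
  return { median: median(xs), marginPct: ciMarginPct(xs), iterations: xs.length };
}
```

With a perfectly stable sample, the loop stops at the minimum sample count; on a noisy CI runner it keeps measuring up to the cap, which is why the commits attack the noise itself (pinning, governor) rather than only raising iteration counts.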
```diff
-Regressions >5% on stable scenarios or >15% on volatile scenarios are worth investigating.
+CI convergence targets: 2% (small scenarios), 3% (large scenarios). Reported margins should not exceed 5%. Regressions >5% on stable scenarios or >10% on moderate scenarios are worth investigating.
 
 ## Interpreting results
```
```diff
@@ -197,9 +197,9 @@ Regressions >5% on stable scenarios or >15% on volatile scenarios are worth investigating.
 
 Scenarios are classified as `small` or `large` based on their cost:
 - **Memory** (opt-in, 1 warmup + 3 measurement rounds): `memory-mount-unmount-cycle` — run with `--action memory`
 
 Timing scenarios use convergent mode (single page load, inline convergence per scenario). Each group uses its own warmup/measurement config. Use `--size` to run only one group.
```