Bench(reactivity): Add five hardening benches (#203)

jlukic · web-flow · commit 1abc9573d360 · 2026-05-14T13:48:36.000-04:00
diff --git a/ai/plans/reactivity-hardening.md b/ai/plans/reactivity-hardening.md
@@ -112,7 +112,7 @@ for (const r of toRun) {
 }
 ```
 
-Benchmark (lands alongside in PR A): `flush-fanout-allocation-200x500` — 200 flush cycles, 500 invalidations each. Captures the fan-out shape. Expected: measurable improvement, allocation count drops sharply.
+Benchmark precursor: `flush-fanout-allocation-1000x500` — 1000 flush cycles, 500 invalidations each. Captures the fan-out shape. Expected: measurable improvement, allocation count drops sharply.
 
 ### Item 6: `boundRun` removal + shared `setContext` helper
 
@@ -124,26 +124,26 @@ Benchmark (lands alongside in PR A): `flush-fanout-allocation-200x500` — 200 f
 - Drop `boundRun`. `Reaction.create` calls `reaction.run()` directly.
 - Extract shared `mergeContext(target, additional, defaults)` helper. Each class passes its own seed values (`{ value }` for Signal, `{ firstRun }` for Reaction, raw bag for Dependency).
 
-Benchmark addition: `reaction-create-stop-200kx10` — page-render shape with many short-lived reactions. Expected: small but measurable improvement in `sub-unsub-100k`.
+Benchmark: existing `sub-unsub-100k` measures this directly. Expected: small but measurable improvement. (Currently runs at 22ms on CI — near the noise floor; if the win is small the bench may need amplification to clear σ.)
 
 ### Item 7: Benchmark additions
 
-These workloads aren't gating Items 1-6 (those land on correctness merits), but they baseline the perf claims and gate the larger Item 9 rewrite. Each follows the existing pattern in `bench-signal.js` — `performance.mark` + `performance.measure`, sink-anchored, iteration counts sized to clear the σ-floor.
+These workloads aren't gating Items 1-6 (those land on correctness merits), but they baseline the perf claims and gate the larger Item 9 rewrite. Each follows the existing pattern in `bench-signal.js` — `performance.mark` + `performance.measure`, sink-anchored, iteration counts grounded in actual CI durations of existing benches to clear the σ-floor with headroom.
+
+Ships as a precursor PR to main so the `tip-of-tree` side of tachometer-CI emits the same measurements as `this-change` when the hardening PR runs.
 
 Stable-dependency churn (gates Item 9):
-- `reactive-stable-fanout-5000x500` — 5000 reactions each reading the same single signal, 500 invalidations
-- `reactive-stable-deps-3reads-50kx200` — 50k reactions × 3 signals × 200 cycles (median templating shape)
-- `reaction-stable-deps-10kx1k` — 10k reactions × 1 signal × 1000 invalidations
+- `reactive-stable-fanout-5000x100` — 5000 reactions each reading the same single signal, 100 invalidations
+- `reactive-stable-deps-3reads-5000x100` — 5000 reactions × 3 signals × 100 cycles (median templating shape)
 
 Computed lifecycle (informs Item 8):
-- `computed-unobserved-1000x1000` — 1000 computed signals derived from a root, no external subscriber, root updated 1000 times
-- `computed-subscribe-unsubscribe-50k` — create computed, attach subscriber, detach, repeat
+- `computed-unobserved-200x500` — 200 computed signals derived from a root, no external subscriber, root updated 500 times
+- `computed-subscribe-unsubscribe-10k` — create computed, attach subscriber, detach, repeat 10k times
 
 Scheduler allocation (verifies Item 5):
-- `flush-fanout-allocation-200x500` — already specified under Item 5
+- `flush-fanout-allocation-1000x500` — 1000 flush cycles, 500-subscriber fanout each
 
-Reaction lifecycle (verifies Item 6):
-- `reaction-create-stop-200kx10` — already specified under Item 6
+**Dropped from original list:** `reaction-stable-deps-10kx1k` (companion measurement redundant with the wide-fan + median-shape pair) and `reaction-create-stop-200kx10` (overlaps existing `sub-unsub-100k`). Five new benches; Item 6 cites existing `sub-unsub-100k` directly.
 
 ### Item 8: Unify `derive` / `computed` with lazy reference counting
 
@@ -165,8 +165,8 @@ Tests:
 - Existing `computed-chain-10x60k` benchmark stays flat or improves (subscribers exist throughout the run)
 
 Acceptance criteria (vs Item 7's baselines):
-- `computed-unobserved-1000x1000` improves dramatically — near-zero work for the unobserved case
-- `computed-subscribe-unsubscribe-50k` shows acceptable reference-counting overhead
+- `computed-unobserved-200x500` improves dramatically — near-zero work for the unobserved case
+- `computed-subscribe-unsubscribe-10k` shows acceptable reference-counting overhead
 
 ### Item 9: Dependency-tracking rewrite (gated)
 
@@ -175,8 +175,8 @@ Acceptance criteria (vs Item 7's baselines):
 **Counter:** `Set.delete` + `Set.add` on small sets is fast and allocation-free on modern V8. The existing `reaction-dep-diff-45k` benchmark measures the changing-dependency case; nothing measures stable-deps churn today. The hypothesis may not survive measurement.
 
 **Gating:** Item 7's stable-dep benchmarks must show meaningful headroom before this PR lands. Specific thresholds:
-- `reactive-stable-fanout-5000x500` shows ≥2× headroom attributable to Set churn
-- `reactive-stable-deps-3reads-50kx200` confirms in the median shape
+- `reactive-stable-fanout-5000x100` shows ≥2× headroom attributable to Set churn
+- `reactive-stable-deps-3reads-5000x100` confirms in the median shape
 
 **If proceeding — versioned mark-and-sweep edges:**
 - Each reaction has an iteration counter, incremented per run
@@ -187,8 +187,8 @@ Acceptance criteria (vs Item 7's baselines):
 Side benefit: this gives natural transactional error recovery. If the callback throws, the partial sweep is skipped and dependencies remain intact for the next run.
 
 **Acceptance criteria:**
-- `reactive-stable-fanout-5000x500` improves ≥2×
-- `reactive-stable-deps-3reads-50kx200` improves ≥1.5×
+- `reactive-stable-fanout-5000x100` improves ≥2×
+- `reactive-stable-deps-3reads-5000x100` improves ≥1.5×
 - `reaction-dep-diff-45k` flat or better (changing-set case must not regress)
 - `sub-unsub-100k` flat (creation/teardown path unchanged)
 
diff --git a/packages/reactivity/bench/tachometer/bench-signal.js b/packages/reactivity/bench/tachometer/bench-signal.js
@@ -325,6 +325,121 @@ let sink = null;
   r.stop();
 }
 
+/*******************************
+      Reactivity Hardening — Items 5 / 8 / 9
+*******************************/
+
+// Stable-dependency churn — gates Item 9 dep-tracking rewrite.
+// N reactions × stable deps × M invalidations. Existing reaction-dep-diff-45k
+// measures the changing-dependency case; these isolate the stable-set churn
+// hypothesized to dominate per-expression workloads.
+
+// reactive-stable-fanout-5000x100 — wide-fan case, single stable dep per reaction
+{
+  const sig = new Signal(0);
+  const reactions = new Array(5000);
+  for (let i = 0; i < 5000; i++) {
+    reactions[i] = Reaction.create(() => {
+      sink = sig.get();
+    });
+  }
+  // purpose: 5000 reactions × 1 signal × 100 invalidations. Per-run Set.delete + add on a stable dep edge.
+  performance.mark(startMark('reactive-stable-fanout-5000x100'));
+  for (let i = 0; i < 100; i++) {
+    sig.set(i + 1);
+    Reaction.flush();
+  }
+  performance.measure('reactive-stable-fanout-5000x100', startMark('reactive-stable-fanout-5000x100'));
+  for (let i = 0; i < 5000; i++) { reactions[i].stop(); }
+}
+
+// reactive-stable-deps-3reads-5000x100 — median templating shape, 3 stable deps per reaction
+{
+  const sigA = new Signal(0);
+  const sigB = new Signal(0);
+  const sigC = new Signal(0);
+  const reactions = new Array(5000);
+  for (let i = 0; i < 5000; i++) {
+    reactions[i] = Reaction.create(() => {
+      sink = sigA.get() + sigB.get() + sigC.get();
+    });
+  }
+  // purpose: 5000 reactions × 3 signals × 100 cycles. Each run clears + re-adds 3 stable dep edges.
+  performance.mark(startMark('reactive-stable-deps-3reads-5000x100'));
+  for (let i = 0; i < 100; i++) {
+    sigA.set(i + 1);
+    Reaction.flush();
+  }
+  performance.measure('reactive-stable-deps-3reads-5000x100', startMark('reactive-stable-deps-3reads-5000x100'));
+  for (let i = 0; i < 5000; i++) { reactions[i].stop(); }
+}
+
+// Computed lifecycle — informs Item 8 lazy refcounted computed.
+
+// computed-unobserved-200x500 — eager-recompute baseline.
+// Under the current code, computeds re-run on every source change regardless
+// of whether anyone observes them. Post-Item-8 this drops to near-zero — the
+// computed stays dormant without subscribers.
+{
+  const root = new Signal(0);
+  const computeds = new Array(200);
+  for (let i = 0; i < 200; i++) {
+    computeds[i] = Signal.computed(() => root.get() + i);
+  }
+  // anchor the computed values in sink so DCE cannot drop them; they are read outside any reaction, so no subscriber attaches
+  let preamble = 0;
+  for (let i = 0; i < 200; i++) { preamble += computeds[i].get(); }
+  sink = preamble;
+  // purpose: 200 unobserved computed signals, root updated 500 times. Measures the eager-recompute cost the refcount removes.
+  performance.mark(startMark('computed-unobserved-200x500'));
+  for (let i = 0; i < 500; i++) {
+    root.set(i + 1);
+    Reaction.flush();
+  }
+  performance.measure('computed-unobserved-200x500', startMark('computed-unobserved-200x500'));
+}
+
+// computed-subscribe-unsubscribe-10k — refcount machinery overhead on the subscribe/unsubscribe path.
+// Function-scoped fixture so each cycle's computed + observer are GC-eligible.
+{
+  const root = new Signal(0);
+  const cycle = () => {
+    const c = Signal.computed(() => root.get() + 1);
+    const r = Reaction.create(() => {
+      sink = c.get();
+    });
+    r.stop();
+  };
+  // purpose: 10000 create-computed + attach-observer + detach cycles. Lifecycle cost the refcount path must keep acceptable.
+  performance.mark(startMark('computed-subscribe-unsubscribe-10k'));
+  for (let i = 0; i < 10_000; i++) { cycle(); }
+  performance.measure('computed-subscribe-unsubscribe-10k', startMark('computed-subscribe-unsubscribe-10k'));
+}
+
+// Scheduler allocation — verifies Item 5 set-swap.
+
+// flush-fanout-allocation-1000x500 — amplified per-flush spread cost.
+// Existing reactive-fanout-500x1200 measures the same shape at 90ms; this
+// runs more flushes (1000) so per-flush array allocation is proportionally
+// more of the total. Set-swap eliminates the spread.
+{
+  const sig = new Signal(0);
+  const reactions = new Array(500);
+  for (let i = 0; i < 500; i++) {
+    reactions[i] = Reaction.create(() => {
+      sink = sig.get();
+    });
+  }
+  // purpose: 500 subscribers fanout across 1000 flush cycles. Each flush spreads pendingReactions; tests per-flush allocation churn.
+  performance.mark(startMark('flush-fanout-allocation-1000x500'));
+  for (let i = 0; i < 1000; i++) {
+    sig.set(i + 1);
+    Reaction.flush();
+  }
+  performance.measure('flush-fanout-allocation-1000x500', startMark('flush-fanout-allocation-1000x500'));
+  for (let i = 0; i < 500; i++) { reactions[i].stop(); }
+}
+
 /*******************************
       Results
 *******************************/
diff --git a/packages/reactivity/bench/tachometer/tachometer-ci-signal.json b/packages/reactivity/bench/tachometer/tachometer-ci-signal.json
@@ -22,7 +22,12 @@
         { "mode": "performance", "entryName": "sub-unsub-100k" },
         { "mode": "performance", "entryName": "reaction-flush-noop-5m" },
         { "mode": "performance", "entryName": "reaction-coalesce-400x100" },
-        { "mode": "performance", "entryName": "reaction-dep-diff-45k" }
+        { "mode": "performance", "entryName": "reaction-dep-diff-45k" },
+        { "mode": "performance", "entryName": "reactive-stable-fanout-5000x100" },
+        { "mode": "performance", "entryName": "reactive-stable-deps-3reads-5000x100" },
+        { "mode": "performance", "entryName": "computed-unobserved-200x500" },
+        { "mode": "performance", "entryName": "computed-subscribe-unsubscribe-10k" },
+        { "mode": "performance", "entryName": "flush-fanout-allocation-1000x500" }
       ]
     },
     {
@@ -42,7 +47,12 @@
         { "mode": "performance", "entryName": "sub-unsub-100k" },
         { "mode": "performance", "entryName": "reaction-flush-noop-5m" },
         { "mode": "performance", "entryName": "reaction-coalesce-400x100" },
-        { "mode": "performance", "entryName": "reaction-dep-diff-45k" }
+        { "mode": "performance", "entryName": "reaction-dep-diff-45k" },
+        { "mode": "performance", "entryName": "reactive-stable-fanout-5000x100" },
+        { "mode": "performance", "entryName": "reactive-stable-deps-3reads-5000x100" },
+        { "mode": "performance", "entryName": "computed-unobserved-200x500" },
+        { "mode": "performance", "entryName": "computed-subscribe-unsubscribe-10k" },
+        { "mode": "performance", "entryName": "flush-fanout-allocation-1000x500" }
       ]
     }
   ]
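
The new benches all share one measurement shape: set a start mark, run the workload while writing into a module-level `sink`, then record a measure whose name matches the `entryName` listed in `tachometer-ci-signal.json`. A minimal sketch of that shape follows. It assumes a runtime with the global User Timing API (a browser, or a recent Node.js release). The `startMark` helper here is a hypothetical stand-in, since the real one in `bench-signal.js` is not shown in this diff, and `runBench` and the `example-sum-1000` name are illustrative only.

```javascript
// Hypothetical stand-in: the real startMark helper is not shown in this diff.
const startMark = (name) => `${name}-start`;

// Module-level sink so the measured work has an observable effect
// and is less likely to be optimized away.
let sink = null;

// Illustrative helper (not part of bench-signal.js): runs `work` a fixed
// number of times between a start mark and a named measure.
function runBench(name, iterations, work) {
  performance.mark(startMark(name));
  for (let i = 0; i < iterations; i++) {
    sink = work(i);
  }
  // measure(name, startMarkName) spans from the start mark to "now"
  performance.measure(name, startMark(name));
  // Look the entry up by name, which is how a harness keyed on
  // entryName would find it.
  return performance.getEntriesByName(name, 'measure')[0];
}

const entry = runBench('example-sum-1000', 1000, (i) => i * 2);
console.log(entry.name, entry.duration >= 0);
```

The measure name is the only link between the bench file and the CI config, so a bench renamed in one place but not the other would presumably produce no matching entry.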
