fix(bench): widen WASM_TIMING_THRESHOLD to 0.75 and add pts-param to TECHNIQUE_MAP

carlos-alm · carlos-alm · commit a2e599d9e6a5 · 2026-06-09T05:42:36.000-06:00
Observed a 71% run-to-run swing in WASM Build ms/file (18.7 → 32 ms) on
byte-identical code, exceeding the prior 70% ceiling. WASM_TIMING_THRESHOLD is
meant to absorb WASM runner jitter structurally so that per-version
KNOWN_REGRESSIONS entries are not needed. Widen it to 0.75, which covers the
observed maximum (71%) with about 4 points of headroom; the native engine keeps
its strict 25%/50% thresholds.
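
A minimal sketch of how the widened threshold might be applied, for
illustration only. WASM_TIMING_THRESHOLD and SIZE_METRICS are names from the
test file; STRICT_THRESHOLD, thresholdFor, relativeChange, and isRegression are
hypothetical helpers, the SIZE_METRICS contents are assumed, and the 50% tier
for noisy native metrics is left out for brevity.

```typescript
// Illustrative only: not the actual regression-guard implementation.
const WASM_TIMING_THRESHOLD = 0.75; // value introduced by this commit
const STRICT_THRESHOLD = 0.25; // assumed: the strict native default

// Assumed contents; the commit only says size metrics such as DB bytes/file
// are excluded from the widening.
const SIZE_METRICS = new Set<string>(['DB bytes/file']);

type Engine = 'native' | 'wasm';

// Pick the allowed relative increase for a metric on a given engine.
function thresholdFor(engine: Engine, metric: string): number {
  // Only WASM *timing* metrics get the wide threshold; size metrics stay strict.
  if (engine === 'wasm' && !SIZE_METRICS.has(metric)) {
    return WASM_TIMING_THRESHOLD;
  }
  return STRICT_THRESHOLD;
}

// Relative change from the previous value to the current one (0.71 = +71%).
function relativeChange(prev: number, curr: number): number {
  return (curr - prev) / prev;
}

// A metric regresses when its relative increase exceeds its threshold.
function isRegression(engine: Engine, metric: string, prev: number, curr: number): boolean {
  return relativeChange(prev, curr) > thresholdFor(engine, metric);
}
```

Under these assumptions, the observed 18.7 → 32 swing (about +71%) passes on
WASM at 0.75, while the same swing on native would still be flagged.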

Also add pts-param to TECHNIQUE_MAP so inline-array spread edges are attributed
to the points-to technique bucket instead of falling through to "other".
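
The fall-through behavior can be sketched as follows. The map entries mirror
the ones visible in the diff (the real map may contain more); techniqueBucket
and countByBucket are hypothetical helpers, and the 'other' fallback label is
taken from the description above rather than from the benchmark source.

```typescript
// Illustrative only: a subset of TECHNIQUE_MAP as shown in the diff.
const TECHNIQUE_MAP: Record<string, string> = {
  'pts-set': 'points-to',
  'pts-array-from': 'points-to',
  'pts-spread': 'points-to',
  'pts-param': 'points-to', // entry added by this commit
  'define-property': 'ts-native',
};

// Map a technique tag to its bucket; unmapped tags land in 'other'.
function techniqueBucket(tag: string): string {
  // Own-property check so inherited keys like 'constructor' are not matched.
  return Object.prototype.hasOwnProperty.call(TECHNIQUE_MAP, tag)
    ? TECHNIQUE_MAP[tag]
    : 'other';
}

// Tally how many edges fall into each bucket.
function countByBucket(tags: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const tag of tags) {
    const bucket = techniqueBucket(tag);
    counts[bucket] = (counts[bucket] ?? 0) + 1;
  }
  return counts;
}
```

Without the 'pts-param' entry, an edge tagged that way would be counted under
'other' in this sketch, which is the misattribution the commit addresses.
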
diff --git a/tests/benchmarks/regression-guard.test.ts b/tests/benchmarks/regression-guard.test.ts
@@ -72,21 +72,22 @@ const NOISY_METRICS = new Set<string>(['No-op rebuild', '1-file rebuild', 'fnDep
  * than native and dominated by interpreter + GC overhead. The same ±10–20ms
  * of shared-runner jitter therefore lands as a much larger *percentage* swing
  * than on native. Empirically, WASM timing metrics on the publish runner swing
- * run-to-run by +27–67% on byte-identical code (No-op rebuild 15→25 = +67%,
+ * run-to-run by +27–71% on byte-identical code (No-op rebuild 15→25 = +67%,
  * Query time 32.5→44.2 = +36%, fnDeps depth 3/5 ~+31%, Full build 7664→9833
- * = +28%), which previously required a per-version KNOWN_REGRESSIONS entry for
- * each metric on every release — an endless whack-a-mole.
+ * = +28%, Build ms/file 18.7→32 = +71%), which previously required a
+ * per-version KNOWN_REGRESSIONS entry for each metric on every release — an
+ * endless whack-a-mole.
  *
  * Why this is safe: the native engine shares all extraction, resolution, and
  * query logic with WASM (the WASM path only swaps the parser/runtime), so any
  * *real* algorithmic regression shows up on the native numbers too — and native
  * keeps the strict 25% / 50% thresholds. Native is the canary. WASM timing only
  * needs to catch gross WASM-specific catastrophes (the 100–220% blowups seen in
- * v3.0.1–3.4.0), which 70% still flags, while absorbing the ≤67% shared-runner
+ * v3.0.1–3.4.0), which 75% still flags, while absorbing the ≤71% shared-runner
  * jitter. Size metrics (DB bytes/file) are engine-independent and excluded from
  * this widening via SIZE_METRICS below — they keep the strict threshold.
  */
-const WASM_TIMING_THRESHOLD = 0.7;
+const WASM_TIMING_THRESHOLD = 0.75;
 
 /**
  * Metric labels that measure size/count rather than wall-clock time. These are
diff --git a/tests/benchmarks/resolution/resolution-benchmark.test.ts b/tests/benchmarks/resolution/resolution-benchmark.test.ts
@@ -97,6 +97,7 @@ const TECHNIQUE_MAP: Record<string, string> = {
   'pts-set': 'points-to',
   'pts-array-from': 'points-to',
   'pts-spread': 'points-to',
+  'pts-param': 'points-to',
   'define-property': 'ts-native',
 };