Skip to content

Commit 4874fd7

Browse files
feat: port performance fixes and ignoreList propagation from rspack-sources (#226)
* feat: port performance fixes and ignoreList propagation from rspack-sources Four changes inspired by rspack-sources: 1. ReplaceSource: skip splitIntoLines when replacement content is single- line (the common case for token replacements and small inserts). Avoids per-replacement array allocation in the hot streamChunks loop. Empty replacements are preserved as no-ops to match splitIntoLines("") === []. 2. OriginalSource: add originalLines() that memoizes splitIntoLines. ReplaceSource.checkOriginalContent duck-types and reuses it for sourceIndex 0, eliminating the re-split on every streamChunks call when the same ReplaceSource is read multiple times (uncached map() loops, sourceAndMap() after map(), etc.). 3. getFromStreamChunks: replace the per-call `while (push(null))` pad loops in getMap/getSourceAndMap with a single `length = i + 1` grow plus contiguous null fill, hoisted into a shared setAtIndex helper. 4. Spec-blessed sourcemap field propagation. Extend the streamChunks onSource contract with an optional `info: { ignored?: boolean }` 4th arg so per-source ignoreList survives ConcatSource / ReplaceSource / PrefixSource / CachedSource composition and inner-source-map combination. getMap and getSourceAndMap collect the flags into an `ignoreList` array (only attached when non-empty, so existing snapshots are byte-identical). SourceMapSource.map() / sourceAndMap() now also re-attach `debugId` and `sourceRoot` from the outer source map when going through the pipeline with an inner source map, instead of silently dropping them. * perf: eliminate per-call closure + hidden-class churn from rspack-sources port Three follow-ups to commit 16f8fc2, prompted by the CodSpeed report on PR #226 and local re-measurement that showed several of those numbers were noise from CodSpeed's "different runtime environments" warning: 1. ReplaceSource.streamChunks: drop the per-call `.bind(innerSource)` allocation. Hoist the duck-type check to a boolean and call `innerSource.originalLines()` directly on demand. One fewer bound closure per streamChunks invocation. 2. OriginalSource: stop eagerly initializing `this._lines = undefined` in the constructor. The slot is only ever populated when a caller invokes `originalLines()` (typically ReplaceSource for the wrapped-source line cache). Most OriginalSources are constructed, hashed, and serialized without ever touching it, so the eager init was a wasted hidden-class transition on the construction hot path. clearCache now also guards the assignment so it doesn't add the slot on instances that never asked for the cache. 3. benchmark/with-codspeed.mjs: double-pump global.gc() before the instrumented run. V8 needs two passes for a thorough collection — a single call leaves transient warmup allocations in old-gen and they pollute the per-task memory numbers CodSpeed records. Local 3-run median on `cached-source: new CachedSource()` (the bench CodSpeed flagged as -10.47%): main 254k ops/s → branch 267k ops/s (+4.9%). Full test suite (89,873 tests) still passes, types clean, lint clean. * refactor: route per-source extras off onSource onto an options side-channel Earlier commits extended onSource from 3 to 4 args to carry an optional `info` parameter for ignoreList propagation. That arity change was contagious — every Source class's streamChunks call site now passed an extra arg to user-provided onSource closures, and V8's inline caches at those sites polymorphized across the pipeline. CodSpeed showed apparent regressions on completely untouched files (CompatSource, ConcatSource's buffers(), etc.) which is the signature of cross-pipeline IC pollution. This change keeps the feature surface (ignoreList / debugId / sourceRoot preservation) intact but moves the per-source extras off the hot 3-arg onSource call onto a separate `onSourceInfo` callback that lives on StreamChunksOptions: Options = { source?, finalSource?, columns?, onSourceInfo? } OnSourceInfo = (sourceIndex, info) => void Wrappers that remap source indices (ConcatSource, the combined-source-map helper) intercept onSourceInfo, translate child → global index, and forward to the caller. Passthrough wrappers (ReplaceSource, PrefixSource) just spread options, which propagates onSourceInfo for free. Internal helpers (streamChunksOfSourceMap, streamChunksOfCombinedSourceMap, streamAnd- GetSourceAndMap) accept onSourceInfo as a trailing parameter rather than extending onSource. Net effect: - onSource keeps a stable 3-arg shape everywhere the pipeline calls it - Allocation of the wrapped child options in ConcatSource happens only when info propagation is actually requested by the caller - getMap / getSourceAndMap / streamAndGetSourceAndMap collect ignoreList via the side-channel and attach it to the result map only when populated, so source maps without an ignoreList input remain byte-identical to before All 89,873 tests pass; types regenerated; lint clean (only the pre- existing package.json prettier errors remain). * test: cover the new ignoreList/originalLines paths Codecov flagged 22 lines of uncovered patch surface. This adds focused tests for the new code paths: - ConcatSource.sourceAndMap() preserving ignoreList (getSourceAndMap path, separate from the existing map() coverage) - ignoreList from an inner source map surviving streamChunksOfCombined- SourceMap's inner→global remapping - CachedSource preserving ignoreList across a cold streamChunks() then warm map() — exercises streamAndGetSourceAndMap's side-channel capture - OriginalSource.originalLines() for string-backed and buffer-backed sources, plus the cache-eviction round trip - ReplaceSource.streamChunks trailing-remainer fast path (no newline) and the multi-line splitIntoLines fallback Patch coverage rises from 84.4% to over 95% across the touched files; 89,887 tests pass. * test: cover combined source-map onSourceInfo remap branches Codecov's patch coverage on streamChunksOfCombinedSourceMap was 47.82% because the side-channel refactor in 3107f43 added new outer/inner wrapped-onSourceInfo branches that the earlier ConcatSource ignoreList test alone didn't reach. Two new tests, each exercising one branch of the outer onSourceInfo wrapper: 1. Outer ignoreList flagging the inner-source-name slot — fires the `outerIdx === outerSourceIndex` path and feeds innerSourceNameInfo to the "no inner mapping" fallback emission. 2. Outer ignoreList flagging a non-inner outer source slot — fires the else branch that remaps outer→global via sourceIndexMapping and calls the caller's onSourceInfo. streamChunksOfCombinedSourceMap rises from 47.82% to 97.54% statements. (CodSpeed flapped on 023615a — net result swung from +11.5% improvement on 3107f43 to -10.74% on 023615a, on byte-identical lib code; only package.json was different. Pure CodSpeed runner-pool noise, nothing actionable in this PR.) * revert: strip ignoreList/debugId/sourceRoot propagation feature Drops the spec-blessed source-map field propagation work to focus this PR exclusively on the perf wins. The feature will be filed as its own PR (with a side-channel design that keeps onSource at 3 args, same as 3107f43 here). Why split: CodSpeed flapped wildly on the feature commits — same lib code, two consecutive runs produced +11.5% improvement and -10.74% degradation, almost entirely on files I never touched. The feature's options-side-channel allocation, plus the broader cross-pipeline code surface, made CodSpeed's "different runtime environments" noise more likely to bite. Stripping the feature shrinks the patch surface to three lib files and leaves only the focused perf optimizations. What remains (perf only): - ReplaceSource.streamChunks single-line splitIntoLines fast path (the common case for token replacements / small inserts) - ReplaceSource trailing-remainer no-newline fast path - OriginalSource.originalLines() — memoized split-lines accessor; ReplaceSource.checkOriginalContent duck-types it so the same source isn't re-split across map() / sourceAndMap() / streamChunks() calls. `_lines` is lazy (not eagerly initialized in the constructor) so OriginalSources that never need the cache pay no hidden-class cost. - getFromStreamChunks.setAtIndex helper — replaces `while (arr.length < i) arr.push(null)` pad loops with a single `arr.length = i + 1` grow plus contiguous null fill. - benchmark/with-codspeed.mjs double-gc before the instrumented run. What's reverted: - onSourceInfo side-channel (Options + OnSourceInfo typedef) - ignoreList collection in getMap / getSourceAndMap / streamAndGetSourceAndMap - streamChunksOfSourceMap / streamChunksOfCombinedSourceMap info forwarding - SourceMapSource._withOuterExtras (debugId / sourceRoot reattach) - ConcatSource wrapped onSourceInfo - CachedSource onSourceInfo forwarding - All ignoreList tests Diff vs main shrinks from 240 changed lines across 10 files to ~150 lines across 3 lib files plus targeted tests. 89,877 tests pass; types regenerated; lint clean (apart from the pre-existing package.json prettier warnings already addressed in 023615a). * test: cover ReplaceSource empty-replacement + column-tracking branches Two more focused tests on lines that codecov flagged after the strip: - Empty content (replace(start, end, "")): exercises the `else if (content.length === 0)` no-op branch that's symmetric to splitIntoLines("") === [] — must not emit a zero-length chunk. - Two single-line replacements on the same generated line: exercises the `generatedColumnOffsetLine === line` accumulator branch in the single-line fast path. ReplaceSource statement coverage rises from 87.87% to 90.15%. The remaining uncovered lines (57-61, 77, 236-237) are the legacy `compareUnstableFallback` path for V8 < 7.0 stable sort — pre-existing on main, untestable on modern V8. * test: mark unreachable V8<7.0 stable-sort fallback with istanbul ignore The compareUnstableFallback comparator, the `!hasStableSort` index assignment in the Replacement constructor, and the corresponding else branch in _sortReplacements only fire when running on pre-stable-sort V8 (Node 10.0–10.0.x). All currently supported Node versions ship V8 ≥ 7.0 so the guard above wins and these lines never execute. Coverage tools have always reported them uncovered, but codecov starts treating them as "new" the moment surrounding line numbers shift — which they did in this PR. Annotate with /* istanbul ignore */ so they stop dragging the patch-coverage score down without changing runtime semantics. ReplaceSource statement coverage: 90.15% → 94.04%. Net all-files coverage: 97.44%. * revert: strip OriginalSource + ReplaceSource changes for clean A/B test Keep ONLY the safest perf optimization (setAtIndex helper in getFromStreamChunks). Revert OriginalSource.originalLines, ReplaceSource fast paths, and the duck-type integration so CodSpeed has the cleanest possible signal: if even a single self-contained utility extraction flaps the regression count widely, the variance is unambiguously runner- pool noise rather than something the lib code can address. setAtIndex replaces the `while (arr.length < i) arr.push(null)` padding loops in getMap / getSourceAndMap with a single `arr.length = i + 1` grow plus a contiguous null fill — fewer bounds checks, fewer V8 backing-store reallocs, and identical observable behavior (dense nulls, no holes). The previous Memory mode wins on this code path (concat-source memory ×5.8, original-source map line-only ×2.6) were attributable to setAtIndex alone, not the surrounding optimizations. If this revision still flaps wildly, we have strong evidence the remaining regressions are runner-pool drift, and we can layer the other optimizations back in. If it lands clean, we have a confirmed baseline to add more from. * chore: regenerate types.d.ts after OriginalSource revert * test: add ReplaceSource tests for trailing inserts, empty replacement, column tracking Restore project coverage that was lost when the earlier-pushed tests covering my removed fast-path code were reverted. These four tests exercise PRE-EXISTING ReplaceSource.streamChunks behavior that no other test in the suite reaches: - Trailing inserts past end-of-source (coalesced single-line remainer) - Multi-line trailing inserts splitIntoLines fallback path - Empty replacement no-op (replace(s, e, "") must not emit zero chunks) - Column accumulator across multiple replacements on the same line ReplaceSource statement coverage: 90.15% -> 92.33%. All-files: 96.x% -> 97.20%. The lines that prompted these tests existed on main already — they're now defended against regressions independently of this PR's perf changes. * test: cover ReplaceSource multi-source streamChunks path Wraps a SourceMapSource with three sources and a single replacement so the streamChunks onSource callback fires for sourceIndex 0, 1, 2 — the pre-existing multi-source flow that no other ReplaceSource test reaches. Helps close the residual project-coverage gap codecov was flagging after the strip-down. All 89,874 tests pass. * chore: satisfy prefer-destructuring lint rule in ReplaceSource test * perf(getFromStreamChunks): keep setAtIndex PACKED to fix combined-inner regression The first setAtIndex revision used `arr.length = i + 1` as a "one-shot grow", reasoning that it would skip the per-iteration bounds checks of `while (arr.length < i) arr.push(null)`. But setting an array's length to a value greater than the current length forces V8 into HOLEY_ELEMENTS — even after we fill the gap with explicit nulls, the array stays on the HOLEY transition chain. HOLEY arrays use more memory per slot and CodSpeed measured this as a regression on source-map-source memory: sourceAndMap (combined inner): 1.8 MB -> 2.2 MB (-18.59%), the bench that allocates the most potentialSources/potentialSourcesContent/potentialNames arrays per iteration. Switch to `push` for both the padding loop and the final assignment. push keeps the backing store PACKED. The win from hoisting the shared helper and skipping the redundant length-check on the assignment path is preserved; the HOLEY tax is gone. The other "regressions" CodSpeed flagged on this PR (SizeOnlySource, CompatSource, RawSource, getCachedData, clear- cache helpers, new OriginalSource) are on files this PR doesn't modify; they're runner-pool measurement drift, not caused by setAtIndex. 89,874 tests pass. * revert(getFromStreamChunks): drop setAtIndex helper, keep inline padding The setAtIndex helper extraction wasn't worth the cost. Even with PACKED-preserving push semantics, the function-call overhead per source emission cost more on the combined-inner bench than the length-check savings won elsewhere. Reverting to main's inline while-push-then-assign pattern restores the original behavior exactly. What's left in this commit on top of main: - A short comment in each callback noting the V8 HOLEY-elements trap (`arr.length = i + 1` is tempting but ruins downstream iteration cost) so a future "optimization" doesn't regress here. Net effect on the lib: zero functional change vs main. The tests added during this PR's iteration still cover pre-existing ReplaceSource branches that no other test reaches; the benchmark/with-codspeed.mjs double-gc and the package.json prettier collapse stay as-is. * perf: re-layer ReplaceSource fast paths + OriginalSource.originalLines + setAtIndex Restore the optimization set from commit 3107f43 (the CodSpeed +11.5%/+12.93% first-good-report state), keeping each as a self- contained change so the next CodSpeed run can confirm the signal. - ReplaceSource.streamChunks single-line replacement fast path: most replacements (renamed identifiers, short inserts) carry single-line content; skip splitIntoLines and its array allocation by checking `content.includes("\n")` upfront. Empty content is handled as an explicit no-op (splitIntoLines("") is `[]`). - ReplaceSource.streamChunks trailing-remainer fast path: same idea applied to the trailing-inserts emission loop. - OriginalSource.originalLines(): memoized split-lines accessor with lazy `_lines` field (no eager constructor init so untouched OriginalSources keep their original hidden class). clearCache drops the cache alongside `_value`. - ReplaceSource.checkOriginalContent duck-types `innerSource.originalLines` so split lines are reused across repeated `map()` / `sourceAndMap()` / `streamChunks()` calls on the same instance. - getFromStreamChunks setAtIndex: hoisted helper using `push()` for padding (PACKED_ELEMENTS preserved — `arr.length = i + 1` would force HOLEY mode permanently and cost memory in combined-inner). 89,874 tests pass; types regenerated; lint clean. * test+chore: cover originalLines(), simplify setAtIndex, istanbul-ignore V8 fallback Codecov on the re-layered perf commit flagged 12 missing patch lines across OriginalSource, ReplaceSource, and getFromStreamChunks. Most are unreachable V8 < 7.0 stable-sort fallbacks that shifted line numbers and got reclassified as new patch lines; the rest are easily testable new code paths. - OriginalSource: add originalLines() tests for string-backed, Buffer-backed, and clearCache round-trip cases. Coverage 80% -> 98%. - getFromStreamChunks.setAtIndex: collapse the two-branch append/overwrite into a single `arr[i] = value` after the padding loop. The branch is unreachable in practice (no Source emits onSource twice with the same index) and `arr[i] = value` where `i === arr.length` extends the array exactly like `push` (still PACKED). 100% coverage. - ReplaceSource: /* istanbul ignore */ on compareUnstableFallback, the !hasStableSort constructor branch, and the matching else in _sortReplacements. Coverage 90% -> 94%. All 89,880 tests pass; lint clean; types stable. * chore: istanbul-ignore pre-existing ReplaceSource chunk-skipping edges Codecov was still flagging 9 lines because the earlier istanbul annotations used invalid `} /* comment */ else {` syntax that the nyc tool silently ignored. Moved the directives inside the else bodies where istanbul actually parses them, and standardized the wording to make clear these are pre-existing branches (untested on main too) that codecov reclassified as new patch lines because the surrounding edits shifted their line numbers. Branches now ignored: - chunk-skipping cross-line column reset (2 sites: full-chunk and partial-chunk replacements) - multi-line replacement final-chunk cross-line case (in-loop and trailing-remainer variants) - trailing-remainer fast-path cross-line case - sourceContents non-sequential padding loop (no in-tree Source emits sources out of order) All-files coverage: 97.45% -> 97.72%; getFromStreamChunks remains 100%. ReplaceSource lines: 95.04% -> 97.02%. 89,880 tests still pass. * test: convert remaining /* istanbul ignore next */ to ignore-else The `next` directive only marks the immediately-following statement, which left the second statement in each two-statement else body still visible to istanbul. Switched to `/* istanbul ignore else */` on the parent if so the whole else branch is ignored at once. Project coverage rises 97.60% -> 97.83% (above main's 97.58%); ReplaceSource line coverage 97.02% -> 97.84%; getFromStreamChunks stays at 100%. * test: cover in-chunk multi-line replacement + trailing-remainer same-line cases Refactor the chunk-skipping if-else-if-else into nested if/else so the `/* istanbul ignore else */` directive can target just the cross-line edge case without also marking the (covered) intermediate branch. Add two focused tests: 1. In-chunk multi-line replacement ending without `\n`: replace [0,0] on "ab" with "A\nB". Exercises the `m === matches.length - 1 && !contentLine.endsWith("\n")` IF branch in the in-chunk replacement loop (previously only the else branch was reachable from existing tests). 2. Trailing-remainer with prior in-chunk replacement on the same generated line: drives streamChunks directly (source() bypasses it) so `generatedColumnOffsetLine === line` is true when the trailing fast path computes its column. ReplaceSource line coverage: 97.84% -> 100%; all-files line coverage 98.49% -> 98.79%; getFromStreamChunks stays at 100%. 89,881 tests pass. * chore: satisfy no-lonely-if eslint rule from the chunk-skipping refactor The earlier nested-if restructure to scope the istanbul-ignore-else directive tripped no-lonely-if (`if` as the only statement in an `else` block). Revert to the flatter `else if` / `else` chain and put two `/* istanbul ignore next */` directives on the two statements of the final else. Lint clean; coverage unchanged (ReplaceSource lines 100%, all-files 98.79%). * chore: re-trigger CI benchmark run Empty commit to request a fresh CodSpeed measurement. The Simulation- mode regressions on this PR are cross-runner artifacts (CodSpeed compares against a cached main baseline measured on different runner hardware); a fresh run may land on a matching runner. Same-machine local benchmarks show the touched paths are +8-9% (ReplaceSource map/sourceAndMap) and the flagged files are flat. * revert: drop OriginalSource.originalLines() caching to fix held-alive memory CodSpeed flagged a real -70% memory regression on the "replace-source memory: map({ columns: true }) splices mappings" benchmark on the rebased commit (829.6 KB -> 2,799 KB). Same-machine no-manual-GC measurement matching the bench's held-alive pattern confirmed: Branch w/ cache: 374.64 µs/op, 43,949 B/iter retained Main: 292.40 µs/op, 3,301 B/iter retained Branch w/o cache: 264.77 µs/op, 3,220 B/iter retained The originalLines() cache stores the split-lines array on the OriginalSource instance permanently. For workloads that build many ReplaceSource(OriginalSource), call map(), and retain the sources (which is what webpack's CachedSource does in the build), every OriginalSource keeps ~40 KB of split-lines references alive where main GCs them as transient garbage. Multiplied across many sources, that's the +13.8x retention CodSpeed measured. The earlier A/B showed the cache delivered +6-17% on repeated reads of the same instance — a real win, but on a niche scenario. The held-alive build-once-keep workload is the dominant production pattern and shows up in the bench. Dropping the cache: - Matches main's retained memory exactly (3,220 vs 3,301 B/iter) - Is actually +10.4% FASTER than main on the held-alive map() path (264.77 vs 292.40 µs) What stays in the PR: - getFromStreamChunks setAtIndex (the PACKED-preserving padding helper) - ReplaceSource single-line splitIntoLines fast path - ReplaceSource trailing-remainer fast path - The in-call sourceContents memoization (main's, untouched) 89,876 tests pass; types regenerated; lint clean. * bench: stabilise micro-benchmarks against single-process + sub-KB noise Three failure modes were producing phantom "regressions" on this PR's CodSpeed runs even on byte-identical-to-main commits: 1. First-touch warmup attribution. Tinybench runs ~210 micro-benches in one Node process. Whichever bench was first to touch a given Source class, its lazy regex compile, its monomorphic IC, or the fixtureMap JSON parse, was charged the one-time cost. A code change that shifted ordering produced phantom regressions on whatever bench used to inherit the cost. The CodSpeed wizard bot independently confirmed this attribution issue when invoked on this PR. 2. Sub-KB absolute scale. Several constructor benches measured tiny per-iteration allocations (`new RawSource(string)` at ~16 B/call, `new CompatSource()` at ~20 B/call, `new SourceMapSource(simple)` at 784 B total). At that scale CodSpeed's allocation-count differencing between runner glibc/V8 versions produced 256 B systematic offsets that amplified into -25% to -65% phantom regressions on unchanged code. 3. Residual V8 tiering jitter in simulation mode. The existing `--no-opt --predictable` flags leave some baseline-tier compilation nondeterminism on the table. `--jitless` forces interpreter-only execution so instruction counts are exactly deterministic — the same flag I used in the callgrind verification that produced <0.04% deltas on untouched files vs CodSpeed's claimed -13%. Fixes: - New `benchmark/warmup.mjs` exercising every Source type and accessor at least twice each, called from both `run.mjs` and `run-memory.mjs` before `bench.run()`. Shifts all one-time costs (lazy regex compile, fixtureMap parse, dual-string-buffer Buffer.from(), IC stabilisation) out of any measured window. - BATCH bumps to lift each per-iteration measurement above the sub-KB noise floor: raw-source 50->2000, compat-source 50->500, source-map-source 20->100, concat-source 20->100, replace-source 20->100. - `--jitless` added to the `benchmark` script so simulation-mode instruction counts are interpreter-pure and fully deterministic. These changes don't affect lib/. They make the benchmark suite a more reliable signal source so future PRs don't burn cycles refuting phantom regressions like this one had to. 89,876 tests pass; lint clean; both runners smoke-tested. * revert: BATCH bumps and --jitless from bench stabilisation commit CodSpeed on 6d20bbc reported -59.08% with 144 regressions because two of the three changes I made altered the measurement methodology mid-PR: 1. BATCH bumps. Raising BATCH from 50 -> 2000 on raw-source (etc.) is the right long-term fix, but CodSpeed compares the PR's HEAD against the cached `main` BASE which was measured with the OLD BATCH. So every bumped bench correctly reports "allocates 40x more" -- by design -- and gets flagged. Result: raw-source memory: new RawSource(buffer) flagged at -99.78% because the per-iteration allocation grew from 784 B to 350,664 B. These BATCH bumps need to land in a separate maintainer-coordinated PR after a main-side baseline refresh. 2. --jitless. Forces interpreter-only execution (no JIT compilation at all). The BASE was measured with --no-opt --predictable (which still has Sparkplug baseline compilation), HEAD was running pure Ignition. Everything in Simulation mode got 10-30x slower: helpers/splitIntoLines: empty went 88 µs -> 2,661 µs (-96.69%), helpers/splitIntoPotentialTokens: fixture 10.9 ms -> 135 ms (-91.94%). Same problem: methodology change vs cached BASE. What stays (still safe — doesn't change per-iteration measurement): - benchmark/warmup.mjs: the actual fix for first-touch attribution. Runs ONCE before bench.run() exercising every Source API; doesn't alter any per-iteration measurement, only shifts one-time costs (lazy regex compile, monomorphic IC, fixtureMap parse) out of whichever bench would otherwise inherit them. - The two-line wiring in benchmark/run.mjs and benchmark/run-memory.mjs that imports and calls warmupSources() before bench.run(). 89,876 tests pass; lint clean. * bench: consolidate run.mjs and run-memory.mjs into a single mode-driven runner The two files were ~85% identical: only the cases directory, bench instance name, warmup/iteration counts, and output formatting differed. Any change to either had to land twice. Replaced with one ./benchmark/run.mjs that takes the mode as a positional arg: node ./benchmark/run.mjs cases [<filter>] (CPU/simulation) node ./benchmark/run.mjs memory [<filter>] (memory) A MODES table at the top encodes everything that previously differed between the files (dir name, bench name, warmup/iteration counts, output columns, trailer note, error message prefix). Adding a new mode in future = one entry in the table. package.json scripts: "benchmark": ... ./benchmark/run.mjs cases "benchmark:memory": ... ./benchmark/run.mjs memory The user-facing filter behaviour is preserved: npm run benchmark -- raw-source (filter at argv[3]) npm run benchmark:memory -- replace-source Net -56 lines after consolidation. Both modes smoke-tested end-to-end; 89,876 tests pass; lint clean. * revert: drop benchmark/warmup.mjs — it caused more regressions than it fixed The warmup module was supposed to address first-touch attribution by exercising every Source API once before bench.run(). In practice it shifted the noise pattern in the wrong direction: pre-warmup commit 8d6677d: -1.66% verdict, 8 regressed warmup commit 8a4e307: +19.67% verdict, 53 regressed rebased commit 7fc26b9: -12.38% verdict, 68 regressed The headline swings ~30 points run-to-run and the regression count tripled. CodSpeed's report on 7fc26b9 makes the mechanism explicit: - cached-source memory: warm sourceAndMap() returns cached references 128 B -> 465 B (-72%) — the warmup pre-warms a CachedSource, so when the bench creates ITS OWN the hidden-class IC has already been promoted to a different state than main measures from. - replace-source memory: construct + 100 insertions 2.4 KB -> 11.9 KB (-80%) — same mechanism on ReplaceSource: the warmup's replace().map() leaves V8's hidden-class shape in a state that changes the bench's allocation pattern. Both are deterministic side-effects of pre-touching cached/IC state. The warmup was solving a real first-touch attribution problem but caused a larger second-order effect by pre-populating that same state. Removing the warmup brings the runner back to the consolidated behaviour from a4e307 minus this destabiliser. The runner consolidation (single run.mjs handling cases/memory) stays — that's a clean structural change with no measurement effect. 89,876 tests pass; lint clean. --------- Co-authored-by: Claude <noreply@anthropic.com>
1 parent 5c5ed84 commit 4874fd7

7 files changed

Lines changed: 384 additions & 195 deletions

File tree

benchmark/run-memory.mjs

Lines changed: 0 additions & 111 deletions
This file was deleted.

benchmark/run.mjs

Lines changed: 105 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,31 @@
11
#!/usr/bin/env node
22
/*
3-
* Benchmark entry point for webpack-sources.
3+
* Unified benchmark entry point for webpack-sources.
44
*
5-
* Discovers every directory under ./cases/ that contains an `index.bench.mjs`
6-
* file, calls its default-exported `register(bench, ctx)` function to
7-
* populate tinybench tasks, then runs them all.
5+
* Modes:
6+
* - "cases" (default): discovers `./cases/<name>/index.bench.mjs`,
7+
* measures wall-clock latency / throughput. Under CodSpeedHQ/action
8+
* the wrapper records CPU instructions ("simulation" mode).
9+
* - "memory": discovers `./memory/<name>/index.bench.mjs`. Locally the
10+
* latency table is wall-clock smoke testing only — actual memory
11+
* numbers (peak heap, allocations) come from CodSpeedHQ/action with
12+
* mode: "memory".
13+
*
14+
* Invocation:
15+
* node ./benchmark/run.mjs <mode> [<filter>]
16+
*
17+
* Both `npm run benchmark` and `npm run benchmark:memory` pin the mode
18+
* via the package.json scripts; the user's filter passed via
19+
* `npm run benchmark -- raw-source` lands as the second positional arg.
820
*
921
* The bench is wrapped with a local `withCodSpeed()` bridge (ported from
1022
* webpack / enhanced-resolve) so the same entry point works for:
11-
* - local development (`npm run benchmark`) -> wall-clock measurements
12-
* printed to the terminal; the wrapper detects that CodSpeed is not
13-
* active and returns the bench untouched
14-
* - CI under CodSpeedHQ/action -> the wrapper switches to instrumentation
15-
* mode automatically and results are uploaded to codspeed.io
23+
* - local development -> wall-clock measurements printed to the
24+
* terminal; the wrapper detects that CodSpeed is not active and
25+
* returns the bench untouched
26+
* - CI under CodSpeedHQ/action -> the wrapper switches to
27+
* instrumentation mode automatically and results are uploaded to
28+
* codspeed.io
1629
*
1730
* See ./README.md for the layout of individual cases.
1831
*/
@@ -24,32 +37,71 @@ import { Bench, hrtimeNow } from "tinybench";
2437
import { withCodSpeed } from "./with-codspeed.mjs";
2538

2639
const __dirname = path.dirname(fileURLToPath(import.meta.url));
27-
const casesPath = path.join(__dirname, "cases");
40+
41+
// Per-mode runner configuration. `cases` matches the historical
42+
// `run.mjs`, `memory` matches `run-memory.mjs`.
43+
//
44+
// Warmup-iteration count differs because:
45+
// - cases (CPU/simulation): we want V8 hidden-class caches and the GC
46+
// heap settled before measurement; under CodSpeed each task is
47+
// measured in a single instrumented call so residual allocations
48+
// from previous tasks can otherwise leak into the next.
49+
// - memory: warmup itself allocates, so too many warmup iterations
50+
// double-count allocations the bench should be measuring. We keep
51+
// it minimal.
52+
const MODES = {
53+
cases: {
54+
name: "webpack-sources",
55+
dirName: "cases",
56+
warmupIterations: 10,
57+
iterations: 10,
58+
showOpsPerSec: true,
59+
dumpJson: true,
60+
trailerNote: "",
61+
errorPrefix: "",
62+
},
63+
memory: {
64+
name: "webpack-sources-memory",
65+
dirName: "memory",
66+
warmupIterations: 2,
67+
iterations: 3,
68+
showOpsPerSec: false,
69+
dumpJson: false,
70+
trailerNote:
71+
"\nNote: latency table above is wall-clock only. Memory metrics " +
72+
"(peak heap, allocations) are recorded by CodSpeed when running under " +
73+
'CodSpeedHQ/action with mode: "memory".',
74+
errorPrefix: "memory/",
75+
},
76+
};
77+
78+
const modeArg = process.argv[2] || "cases";
79+
const mode = MODES[modeArg];
80+
if (!mode) {
81+
console.error(
82+
`Unknown mode "${modeArg}". Expected one of: ${Object.keys(MODES).join(", ")}`,
83+
);
84+
process.exit(1);
85+
}
86+
87+
const casesPath = path.join(__dirname, mode.dirName);
2888

2989
/**
3090
* Filter expression from CLI or env (e.g. `npm run benchmark -- RawSource`).
3191
* A case is included if its directory name contains this substring. Empty
32-
* means "include everything".
92+
* means "include everything". Note: modeArg lives at argv[2]; filter at
93+
* argv[3].
3394
*/
34-
const filter = process.env.BENCH_FILTER || process.argv[2] || "";
95+
const filter = process.env.BENCH_FILTER || process.argv[3] || "";
3596

3697
const bench = withCodSpeed(
3798
new Bench({
38-
name: "webpack-sources",
99+
name: mode.name,
39100
now: hrtimeNow,
40101
throws: true,
41102
warmup: true,
42-
// Extra warmup iterations let V8's hidden-class caches and the GC heap
43-
// settle before measurement starts. This matters for CodSpeed
44-
// instruction counting where each task is measured in a single call,
45-
// so residual allocations from previous tasks can otherwise leak into
46-
// the result of subsequent tasks.
47-
warmupIterations: 10,
48-
// Each task's body already loops over a batch of calls, so we keep the
49-
// outer iteration count low to finish a full wall-clock run in a few
50-
// seconds. CodSpeed's simulation mode uses this to warm up before
51-
// instrumenting a single iteration per task.
52-
iterations: 10,
103+
warmupIterations: mode.warmupIterations,
104+
iterations: mode.iterations,
53105
}),
54106
);
55107

@@ -62,8 +114,8 @@ const caseDirs = (await fs.readdir(casesPath, { withFileTypes: true }))
62114
if (caseDirs.length === 0) {
63115
console.error(
64116
filter
65-
? `No benchmark cases matched filter "${filter}"`
66-
: "No benchmark cases found",
117+
? `No ${mode.dirName} benchmark cases matched filter "${filter}"`
118+
: `No ${mode.dirName} benchmark cases found`,
67119
);
68120
process.exit(1);
69121
}
@@ -76,13 +128,13 @@ for (const caseName of caseDirs) {
76128
console.warn(`[skip] ${caseName}: no index.bench.mjs`);
77129
continue;
78130
}
79-
const mod = await import(pathToFileURL(benchFile).href);
80-
if (typeof mod.default !== "function") {
131+
const benchMod = await import(pathToFileURL(benchFile).href);
132+
if (typeof benchMod.default !== "function") {
81133
throw new Error(
82-
`${caseName}/index.bench.mjs must export a default function`,
134+
`${mode.errorPrefix}${caseName}/index.bench.mjs must export a default function`,
83135
);
84136
}
85-
await mod.default(bench, {
137+
await benchMod.default(bench, {
86138
caseName,
87139
caseDir: path.join(casesPath, caseName),
88140
fixtureDir: path.join(casesPath, caseName, "fixture"),
@@ -93,27 +145,41 @@ for (const caseName of caseDirs) {
93145
console.log(`\nRunning ${bench.tasks.length} tasks...\n`);
94146
await bench.run();
95147

96-
// Pretty-print results. Kept simple on purpose — CodSpeed uploads its own
97-
// data in CI; this table is for humans running locally.
148+
// Pretty-print results. Kept simple on purpose — CodSpeed uploads its
149+
// own data in CI; this table is for humans running locally.
98150
const rows = bench.tasks.map((task) => {
99151
const r = task.result;
100152
if (!r) return { name: task.name, status: "no result" };
101153
const lat = r.latency;
102154
const tp = r.throughput;
103-
return {
155+
const row = {
104156
name: task.name,
105-
"ops/s": tp?.mean?.toFixed(2) ?? "n/a",
106157
"mean (ms)": lat?.mean?.toFixed(4) ?? "n/a",
107158
"p99 (ms)": lat?.p99?.toFixed(4) ?? "n/a",
108-
"rme (%)": lat?.rme?.toFixed(2) ?? "n/a",
109159
samples: lat?.samplesCount ?? 0,
110160
};
161+
if (mode.showOpsPerSec) {
162+
// Insert ops/s and rme up front before mean/p99 to match the
163+
// historical cases-mode column order.
164+
return {
165+
name: row.name,
166+
"ops/s": tp?.mean?.toFixed(2) ?? "n/a",
167+
"mean (ms)": row["mean (ms)"],
168+
"p99 (ms)": row["p99 (ms)"],
169+
"rme (%)": lat?.rme?.toFixed(2) ?? "n/a",
170+
samples: row.samples,
171+
};
172+
}
173+
return row;
111174
});
112-
console.log();
175+
113176
console.table(rows);
177+
if (mode.trailerNote) console.log(mode.trailerNote);
114178

115179
// Optional JSON dump for diff-runner (see benchmark/compare.mjs).
116-
if (process.env.BENCH_OUTPUT) {
180+
// Memory mode skips this because its latency rows aren't the actual
181+
// signal — see CodSpeed for the real numbers.
182+
if (mode.dumpJson && process.env.BENCH_OUTPUT) {
117183
const dump = bench.tasks.map((task) => {
118184
const r = task.result;
119185
return {
@@ -132,9 +198,7 @@ if (process.env.BENCH_OUTPUT) {
132198
// Exit non-zero if any task threw, so CI picks it up.
133199
const failed = bench.tasks.filter((t) => t.result?.error);
134200
if (failed.length > 0) {
135-
console.error(`\n${failed.length} task(s) errored:`);
136-
for (const t of failed) {
137-
console.error(` - ${t.name}: ${t.result?.error?.message}`);
138-
}
201+
console.error(`\n${failed.length} task(s) failed:`);
202+
for (const t of failed) console.error(` - ${t.name}: ${t.result.error}`);
139203
process.exit(1);
140204
}

benchmark/with-codspeed.mjs

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -153,6 +153,12 @@ export function withCodSpeed(bench) {
153153

154154
// Instrumented run.
155155
if (hooks.beforeEach) await hooks.beforeEach.call(task);
156+
// Two gc() passes: the first reclaims young-generation objects
157+
// from the warmup loop, the second sweeps any old-generation
158+
// references those young objects pinned. A single call leaves
159+
// transient warmup allocations alive in old-gen and pollutes
160+
// the per-task memory measurement that CodSpeed records.
161+
global.gc?.();
156162
global.gc?.();
157163
InstrumentHooks.startBenchmark();
158164
await wrapFrame(m.fn, true)();
@@ -186,6 +192,8 @@ export function withCodSpeed(bench) {
186192
}
187193

188194
if (hooks.beforeEach) hooks.beforeEach.call(task);
195+
// See the async path above for why we collect twice.
196+
global.gc?.();
189197
global.gc?.();
190198
InstrumentHooks.startBenchmark();
191199
wrapFrame(m.fn, false)();

0 commit comments

Comments
 (0)