System Info
  System:
    OS: macOS 26.4.1
    CPU: (16) arm64 Apple M4 Max
    Memory: 147.38 MB / 64.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 24.12.0 - /Users/ebidelman/.nvm/versions/node/v24.12.0/bin/node
    Yarn: 4.13.0 - /Users/ebidelman/.nvm/versions/node/v24.12.0/bin/yarn
    npm: 11.6.2 - /Users/ebidelman/.nvm/versions/node/v24.12.0/bin/npm
    bun: 1.3.13 - /opt/homebrew/bin/bun
    Watchman: 2026.05.04.00 - /opt/homebrew/bin/watchman
  Browsers:
    Chrome: 147.0.7727.138
    Safari: 26.4
  npmPackages:
    @rspack/cli: 2.0.2 => 2.0.2
    @rspack/core: 2.0.2 => 2.0.2
    @rspack/dev-server: 2.0.1 => 2.0.1
    @rspack/plugin-react-refresh: 2.0.0 => 2.0.0
Details
Actual behavior
On a large project with ~3,900 .less files going through less-loader (parallel: true), cold builds regressed from ~49s to ~110s (2.3x) when upgrading from rspack 1.7.11 to 2.0.2. The regression is entirely in the make phase (module resolution + loaders). The seal and emit phases are unchanged or slightly improved.
Expected Behavior
rspack 2.0 cold builds should be at least as fast as 1.7.11 for the same project and configuration. The make phase should not regress by 2.8x.
Profiling Data
The regression reproduces consistently across 5+ runs with cleared cache (cache: false or fresh persistent cache).
Build times (5 runs each, cold cache)
| Run | rspack 1.7.11 | rspack 2.0.2 |
|---|---|---|
| 1 | 50.92s | 112.92s |
| 2 | 48.09s | 108.88s |
| 3 | 48.43s | 109.67s |
| 4 | 47.62s | 114.65s |
| 5 | 48.54s | 107.71s |
| Avg | 48.7s | 110.8s |
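The averages and the ~2.3x figure quoted in this report can be re-derived from the five runs. A small sketch (variable names are illustrative only):

```javascript
// Cold-build wall-clock times in seconds, copied from the table above.
const runs = {
  "1.7.11": [50.92, 48.09, 48.43, 47.62, 48.54],
  "2.0.2": [112.92, 108.88, 109.67, 114.65, 107.71],
};

// Arithmetic mean of a list of numbers.
const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

const avgOld = mean(runs["1.7.11"]);
const avgNew = mean(runs["2.0.2"]);
const slowdown = avgNew / avgOld;

// prints "48.7 110.8 2.3x"
console.log(avgOld.toFixed(1), avgNew.toFixed(1), slowdown.toFixed(1) + "x");
```

The spread within each column (about 3s on 1.7.11, about 7s on 2.0.2) is small compared with the ~62s gap between the averages.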
Phase breakdown
| Phase | rspack 1.7.11 | rspack 2.0.2 | Δ |
|---|---|---|---|
| make | 32.1s | 89.6s | +57.5s (2.8x) |
| seal | 15.3s | 12.7s | -2.6s (improved) |
| emit | 3.6s | 3.5s | unchanged |
The make phase accounts for the entire regression.
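For anyone who wants to collect a comparable breakdown on their own project, one possible approach is a small timing plugin. This is only a sketch: `PhaseTimingPlugin` is a hypothetical name, it is not necessarily how the numbers above were measured, and it assumes the webpack-compatible `make`, `finishMake`, `emit`, `afterEmit`, and `done` compiler hooks. "seal" is approximated here as everything between `finishMake` and `emit`.

```javascript
// Hypothetical plugin: timestamps phase boundaries via compiler hooks.
class PhaseTimingPlugin {
  // `now` is injectable so the plugin can be exercised with a fake clock.
  constructor(now = () => performance.now()) {
    this.now = now;
    this.marks = {};
  }

  apply(compiler) {
    const name = "PhaseTimingPlugin";
    // Returns a hook callback that stores the current time under `label`.
    const mark = (label) => () => {
      this.marks[label] = this.now();
    };
    compiler.hooks.make.tap(name, mark("makeStart"));
    compiler.hooks.finishMake.tap(name, mark("makeEnd"));
    compiler.hooks.emit.tap(name, mark("emitStart"));
    compiler.hooks.afterEmit.tap(name, mark("emitEnd"));
    compiler.hooks.done.tap(name, () => console.log(this.report()));
  }

  // Converts the recorded millisecond marks into per-phase durations.
  report() {
    const m = this.marks;
    const seconds = (ms) => (ms / 1000).toFixed(1) + "s";
    return {
      make: seconds(m.makeEnd - m.makeStart),
      seal: seconds(m.emitStart - m.makeEnd), // approximation, see lead-in
      emit: seconds(m.emitEnd - m.emitStart),
    };
  }
}
```

Adding `new PhaseTimingPlugin()` to `plugins` in both the 1.7.11 and 2.0.2 configs would give numbers that can be compared side by side.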
less-loader wall clock (first file start → last file end)
| Metric | rspack 1.7.11 | rspack 2.0.2 |
|---|---|---|
| Wall clock | 31.3s | 90.3s |
| % of build | 64% | 82% |
Root Cause Analysis
To isolate whether the regression is in the loader, the resolver, the parallelism strategy, or rspack's core, we tested every userland optimization we could identify:
| Experiment | Build time | Conclusion |
|---|---|---|
| Baseline (parallel: true, 15 workers) | ~110s | (baseline) |
| parallel: false (serial) | ~152s | Parallelism still helps, but less |
| parallel: { maxWorkers: 4 } | ~118s | Worker count doesn't matter |
| parallel: { maxWorkers: 8 } | ~116s | Worker count doesn't matter |
| webpackImporter: false (skip async resolver) | ~118s | Resolver isn't the bottleneck |
| Pre-resolve cache plugin (eliminate resolver round-trips) | ~126s | Resolution isn't the bottleneck |
| useAtomics: true (SharedArrayBuffer for tinypool) | deadlock | Broken |
| less-loader 12.3.2 upgrade | ~110s | No change |
| Custom 0ms loader (pre-compiled CSS, parallel: false) | ~120s | Loader speed doesn't matter |
| incremental: { buildModuleGraph: false } | ~120s | Disabling incremental graph doesn't help |
| incremental: false | ~108s | -2s (noise) |
| cache: false | ~107s | -3s (noise) |
The 0ms loader test (decisive)
We wrote a plugin that:
- Pre-compiles all 4,701 .less files in beforeCompile using worker_threads (completes in 6.5s)
- Stores results in an in-memory Map
- Replaces less-loader with a custom loader that does only callback(null, map.get(this.resourcePath)), a synchronous Map lookup taking ~0ms per file
- Runs with parallel: false (no tinypool dispatch at all)
Result: still ~120s. This proves the bottleneck is not the loader execution, not the tinypool worker dispatch, and not the import resolution. It's in rspack's Rust-side per-module graph processing — the overhead of resolving, creating, scheduling, and inserting each of 3,900 modules through the build pipeline is ~3x slower in 2.0 vs 1.7.
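For readers who want to repeat this experiment, the shape of the plugin and loader is roughly as follows. This is a minimal sketch, not our exact code: `PrecompileLessPlugin`, `precompiledLessLoader`, and the `files` / `compile` options are hypothetical names, and the `compile` function stands in for the worker_threads pool used in the real test.

```javascript
// Shared in-memory cache: absolute .less path -> pre-compiled CSS string.
const precompiled = new Map();

// Hypothetical plugin: fills the cache before compilation starts.
class PrecompileLessPlugin {
  // files:   absolute paths of the .less files to pre-compile
  // compile: async (path) => css; a stand-in for the worker_threads pool
  constructor({ files, compile }) {
    this.files = files;
    this.compile = compile;
  }

  apply(compiler) {
    compiler.hooks.beforeCompile.tapPromise("PrecompileLessPlugin", async () => {
      await Promise.all(
        this.files.map(async (file) => {
          precompiled.set(file, await this.compile(file));
        })
      );
    });
  }
}

// Hypothetical replacement for less-loader: a synchronous Map lookup and
// nothing else. It only works with parallel: false, because the Map lives
// in the main thread and is not visible to loader workers.
function precompiledLessLoader() {
  this.callback(null, precompiled.get(this.resourcePath));
}
```

In a real config the loader function would be exported from its own module and referenced in place of less-loader, and that module would need to share the same `Map` instance with the plugin (for example by both living in one file that the config requires).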
What Changed Between 1.7 and 2.0
Based on the profiling data, the regression appears to be in the module-building pipeline itself:
- Module resolution internals
- Module graph orchestration (how modules are scheduled for building)
- Per-module hook dispatch overhead
- Possibly: expanded tree-shaking analysis per module, refactored parser hooks
The regression is proportional to module count — a project with fewer .less files would see a smaller absolute regression but the same ~2.3x multiplier.
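As a rough back-of-the-envelope check of that proportionality, the make-phase numbers can be turned into a per-module cost. The sketch below assumes a purely linear model and attributes the whole make phase to the ~3,900 .less modules, which overstates the per-module figure since other module types are built in the same phase.

```javascript
// Figures from the phase breakdown above. Attributing the entire make
// phase to the .less modules is a simplifying assumption.
const lessModules = 3900;
const makeSeconds = { "1.7.11": 32.1, "2.0.2": 89.6 };

// Average make-phase cost per module, in milliseconds.
const perModuleMs = (seconds) => (seconds * 1000) / lessModules;

const oldMs = perModuleMs(makeSeconds["1.7.11"]);
const newMs = perModuleMs(makeSeconds["2.0.2"]);

// Projected extra make time (seconds) for a project with n similar modules.
const extraSeconds = (n) => ((newMs - oldMs) * n) / 1000;

// prints "8.2 23.0 14.7"
console.log(oldMs.toFixed(1), newMs.toFixed(1), extraSeconds(1000).toFixed(1));
```

Under these assumptions each module costs roughly 15ms more in 2.0.2, so a project with about 1,000 similar modules would lose on the order of 15s in the make phase.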
Additional Context
- incremental: 'advance' is enabled but does not help on cold builds (expected)
- cache: { type: 'persistent' } is enabled; the regression is measured with fresh cache
- Disabling both (incremental: false, cache: false) saves ~8s each but does not explain the 60s regression
- The experiments.parallelLoader flag (removed in 2.0) was replaced by per-loader parallel: true, which we confirmed is working correctly (15 workers, max 165 concurrent compilations observed)
- less-loader version: 12.3.2 (also tested 12.2.0, no difference)
Reproduce link
No response
Reproduce Steps
Minimal config shape:
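// rspack.config.js
const { CssExtractRspackPlugin } = require("@rspack/core");

module.exports = {
  experiments: { css: false },
  // The extract loader below needs its plugin instance registered.
  plugins: [new CssExtractRspackPlugin()],
  module: {
    rules: [
      {
        test: /\.less$/,
        type: "javascript/auto",
        use: [
          CssExtractRspackPlugin.loader,
          "css-loader",
          "lightningcss-loader",
          // Per-loader parallelism; replaces experiments.parallelLoader from 1.x.
          { loader: "less-loader", parallel: true },
        ],
      },
    ],
  },
};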