[Bug]: 2.0 regression: ~2.3x slower cold builds for projects with many JS-loader modules #13996

@ebidel

System Info

System:
  OS: macOS 26.4.1
  CPU: (16) arm64 Apple M4 Max
  Memory: 147.38 MB / 64.00 GB
  Shell: 5.9 - /bin/zsh
Binaries:
  Node: 24.12.0 - /Users/ebidelman/.nvm/versions/node/v24.12.0/bin/node
  Yarn: 4.13.0 - /Users/ebidelman/.nvm/versions/node/v24.12.0/bin/yarn
  npm: 11.6.2 - /Users/ebidelman/.nvm/versions/node/v24.12.0/bin/npm
  bun: 1.3.13 - /opt/homebrew/bin/bun
  Watchman: 2026.05.04.00 - /opt/homebrew/bin/watchman
Browsers:
  Chrome: 147.0.7727.138
  Safari: 26.4
npmPackages:
  @rspack/cli: 2.0.2 => 2.0.2
  @rspack/core: 2.0.2 => 2.0.2
  @rspack/dev-server: 2.0.1 => 2.0.1
  @rspack/plugin-react-refresh: 2.0.0 => 2.0.0

Details

Actual behavior

On a large project with ~3,900 .less files going through less-loader (parallel: true), cold builds regressed from ~49s to ~110s (2.3x) when upgrading from rspack 1.7.11 to 2.0.2. The regression is entirely in the make phase (module resolution + loaders). The seal and emit phases are unchanged or slightly improved.

Expected Behavior

rspack 2.0 cold builds should be at least as fast as 1.7.11 for the same project and configuration. The make phase should not regress by 2.8x.

Profiling Data

The regression reproduces consistently across 5+ runs with cleared cache (cache: false or fresh persistent cache).

Build times (5 runs each, cold cache)

| Run | rspack 1.7.11 | rspack 2.0.2 |
| --- | --- | --- |
| 1 | 50.92s | 112.92s |
| 2 | 48.09s | 108.88s |
| 3 | 48.43s | 109.67s |
| 4 | 47.62s | 114.65s |
| 5 | 48.54s | 107.71s |
| Avg | 48.7s | 110.8s |
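
For anyone who wants to re-check the averages and the headline multiplier, here is a small sketch that recomputes them from the five runs listed above. The helper names (`mean`, `stddev`) are ours, not part of any tool used in the report.

```javascript
// Build times in seconds, copied from the table above.
const runs = {
  "1.7.11": [50.92, 48.09, 48.43, 47.62, 48.54],
  "2.0.2": [112.92, 108.88, 109.67, 114.65, 107.71],
};

// Arithmetic mean of a list of numbers.
const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Sample standard deviation, to show how tightly the runs cluster.
const stddev = (xs) => {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1));
};

const before = mean(runs["1.7.11"]);
const after = mean(runs["2.0.2"]);
const slowdown = after / before;

console.log(`1.7.11: ${before.toFixed(1)}s (sd ${stddev(runs["1.7.11"]).toFixed(1)}s)`);
console.log(`2.0.2:  ${after.toFixed(1)}s (sd ${stddev(runs["2.0.2"]).toFixed(1)}s)`);
console.log(`slowdown: ${slowdown.toFixed(2)}x`);
```

The spread within each version is small compared with the gap between the two versions, which is why the difference is treated as a real regression and not as noise.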

Phase breakdown

| Phase | rspack 1.7.11 | rspack 2.0.2 | Δ |
| --- | --- | --- | --- |
| make | 32.1s | 89.6s | +57.5s (2.8x) |
| seal | 15.3s | 12.7s | -2.6s (improved) |
| emit | 3.6s | 3.5s | unchanged |

The make phase accounts for the entire regression.
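
The report does not say how the phase timings were collected. One way to get comparable numbers is a small timing plugin along these lines. This is a sketch: `PhaseTimingPlugin` is a hypothetical name, and using `make`/`finishMake`, `seal`/`afterSeal`, and `emit`/`afterEmit` as phase boundaries is our assumption, so the figures it prints are approximations of each phase.

```javascript
// Hypothetical plugin: records approximate wall-clock time for make, seal and emit.
class PhaseTimingPlugin {
  // `now` is injectable so the plugin can be exercised with a fake clock.
  constructor(now = () => performance.now()) {
    this.now = now;
    this.timings = {}; // phase name -> elapsed milliseconds
  }

  apply(compiler) {
    const name = "PhaseTimingPlugin";
    const starts = {};
    const begin = (phase) => () => {
      starts[phase] = this.now();
    };
    const end = (phase) => () => {
      this.timings[phase] = this.now() - starts[phase];
    };

    // make: from the make hook until finishMake fires.
    compiler.hooks.make.tap(name, begin("make"));
    compiler.hooks.finishMake.tap(name, end("make"));

    // seal: hooks live on the compilation object.
    compiler.hooks.thisCompilation.tap(name, (compilation) => {
      compilation.hooks.seal.tap(name, begin("seal"));
      compilation.hooks.afterSeal.tap(name, end("seal"));
    });

    // emit: from emit until afterEmit.
    compiler.hooks.emit.tap(name, begin("emit"));
    compiler.hooks.afterEmit.tap(name, end("emit"));

    // Print a summary once the build is done.
    compiler.hooks.done.tap(name, () => {
      for (const [phase, ms] of Object.entries(this.timings)) {
        console.log(`${phase}: ${(ms / 1000).toFixed(1)}s`);
      }
    });
  }
}

// Usage (in rspack.config.js): plugins: [new PhaseTimingPlugin()]
```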

less-loader wall clock (first file start → last file end)

| Metric | rspack 1.7.11 | rspack 2.0.2 |
| --- | --- | --- |
| Wall clock | 31.3s | 90.3s |
| % of build | 64% | 82% |
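
To make the metric explicit: "wall clock" here is the time between the earliest loader start and the latest loader end, not the sum of per-file durations. A minimal sketch of that calculation follows. `createWallClockTracker` is a hypothetical helper, and how the start/end events are collected (for example, across worker threads when `parallel: true` is set) is left open.

```javascript
// Hypothetical tracker: call start() when a file enters the loader and
// end() when it leaves. Only the earliest start and latest end are kept.
function createWallClockTracker(now = Date.now) {
  let firstStart = Infinity;
  let lastEnd = -Infinity;
  let finished = 0;

  return {
    start() {
      firstStart = Math.min(firstStart, now());
    },
    end() {
      lastEnd = Math.max(lastEnd, now());
      finished += 1;
    },
    // Wall clock in ms (first start to last end) and its share of the full build.
    summary(totalBuildMs) {
      const wallMs = finished === 0 ? 0 : lastEnd - firstStart;
      return { files: finished, wallMs, shareOfBuild: wallMs / totalBuildMs };
    },
  };
}
```

With the 1.7.11 figures above, a 31.3s window inside a 48.7s build gives a share of about 0.64, matching the 64% in the table.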

Root Cause Analysis

We tested every userland change we could think of to isolate whether the regression is in the loader, the resolver, the parallelism strategy, or rspack's core:

| Experiment | Build time | Conclusion |
| --- | --- | --- |
| Baseline (parallel: true, 15 workers) | ~110s | |
| parallel: false (serial) | ~152s | Parallelism still helps, but less |
| parallel: { maxWorkers: 4 } | ~118s | Worker count doesn't matter |
| parallel: { maxWorkers: 8 } | ~116s | Worker count doesn't matter |
| webpackImporter: false (skip async resolver) | ~118s | Resolver isn't the bottleneck |
| Pre-resolve cache plugin (eliminate resolver round-trips) | ~126s | Resolution isn't the bottleneck |
| useAtomics: true (SharedArrayBuffer for tinypool) | deadlock | Broken |
| less-loader 12.3.2 upgrade | ~110s | No change |
| Custom 0ms loader (pre-compiled CSS, parallel: false) | ~120s | Loader speed doesn't matter |
| incremental: { buildModuleGraph: false } | ~120s | Disabling incremental graph doesn't help |
| incremental: false | ~108s | -2s (noise) |
| cache: false | ~107s | -3s (noise) |
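
For reference, the loader-level experiments in the table map onto the config roughly as follows. This is a sketch: `lessRule` and `variants` are hypothetical helpers, and we assume `webpackImporter` is passed through less-loader's `options` and that each variant was otherwise identical to the baseline.

```javascript
// Builds the .less rule for one experiment. `extractLoader` is expected to be
// CssExtractRspackPlugin.loader; it is a parameter so this file has no imports.
function lessRule(extractLoader, { parallel = true, webpackImporter = true } = {}) {
  return {
    test: /\.less$/,
    type: "javascript/auto",
    use: [
      extractLoader,
      "css-loader",
      "lightningcss-loader",
      { loader: "less-loader", parallel, options: { webpackImporter } },
    ],
  };
}

// One entry per loader-level row of the experiment table.
const variants = {
  baseline: {},
  serial: { parallel: false },
  fourWorkers: { parallel: { maxWorkers: 4 } },
  eightWorkers: { parallel: { maxWorkers: 8 } },
  noWebpackImporter: { webpackImporter: false },
};

// Usage (in rspack.config.js), selecting a variant via an environment variable:
//   rules: [lessRule(CssExtractRspackPlugin.loader, variants[process.env.VARIANT] ?? {})]
```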

The 0ms loader test (decisive)

We wrote a plugin that:

  1. Pre-compiles all 4,701 .less files in beforeCompile using worker_threads (completes in 6.5s)
  2. Stores results in an in-memory Map
  3. Replaces less-loader with a custom loader that does only callback(null, map.get(this.resourcePath)) — a synchronous Map lookup taking ~0ms per file
  4. Runs with parallel: false (no tinypool dispatch at all)

Result: still ~120s. This rules out loader execution, tinypool worker dispatch, and import resolution as the bottleneck. What remains is rspack's Rust-side per-module graph processing: the overhead of resolving, creating, scheduling, and inserting each of the ~3,900 modules through the build pipeline, which is ~2.8x slower in 2.0 than in 1.7 going by the make-phase numbers (89.6s vs 32.1s).

What Changed Between 1.7 and 2.0

Based on the profiling data, the regression appears to be in the module-building pipeline itself. Suspected areas (not yet confirmed):

  • Module resolution internals
  • Module graph orchestration (how modules are scheduled for building)
  • Per-module hook dispatch overhead
  • Possibly: expanded tree-shaking analysis per module, refactored parser hooks

The regression appears to be proportional to module count: a project with fewer .less files should see a smaller absolute regression but a similar ~2.3x multiplier.
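
As a rough back-of-the-envelope check of what "proportional" would mean, the make-phase delta can be divided by the approximate .less module count. This is a sketch under two assumptions that the report does not verify: that the whole make-phase increase is attributable to the ~3,900 .less modules, and that the cost scales linearly.

```javascript
// Figures from the phase breakdown; the module count is the approximate one
// quoted in the report.
const makeBeforeSec = 32.1; // rspack 1.7.11
const makeAfterSec = 89.6; // rspack 2.0.2
const lessModules = 3900;

// Added make time per .less module, in milliseconds.
const addedMsPerModule = ((makeAfterSec - makeBeforeSec) / lessModules) * 1000;

// Linear extrapolation to another project size (an assumption, not a measurement).
const estimateAddedSeconds = (moduleCount) => (addedMsPerModule * moduleCount) / 1000;

console.log(`added per module: ${addedMsPerModule.toFixed(1)}ms`);
console.log(`estimated added make time for 1,000 modules: ${estimateAddedSeconds(1000).toFixed(1)}s`);
```

Under those assumptions the added cost works out to roughly 15ms per module.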

Additional Context

  • incremental: 'advance' is enabled but does not help on cold builds (expected)
  • cache: { type: 'persistent' } is enabled; the regression is measured with fresh cache
  • Disabling either one (incremental: false or cache: false) saves only ~2-3s, which is within run-to-run noise and does not explain the ~60s regression
  • The experiments.parallelLoader flag (removed in 2.0) was replaced by per-loader parallel: true, which we confirmed is working correctly (15 workers, max 165 concurrent compilations observed)
  • less-loader version: 12.3.2 (also tested 12.2.0, no difference)

Reproduce link

No response

Reproduce Steps

Minimal config shape:

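// rspack.config.js
const { CssExtractRspackPlugin } = require("@rspack/core");

module.exports = {
  experiments: { css: false },
  module: {
    rules: [
      {
        test: /\.less$/,
        // Keep the module as JS so CssExtractRspackPlugin handles the CSS output.
        type: "javascript/auto",
        use: [
          CssExtractRspackPlugin.loader,
          "css-loader",
          "lightningcss-loader",
          // Loaders run last-to-first, so less-loader runs first, in parallel workers.
          { loader: "less-loader", parallel: true },
        ],
      },
    ],
  },
  // CssExtractRspackPlugin.loader requires the plugin to be registered.
  plugins: [new CssExtractRspackPlugin()],
};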
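
Since no reproduction repository is linked, one way to approximate the project shape is to generate a synthetic tree of small .less files plus an entry that imports them all. The script below is a sketch: `generateLessProject` and the `src/styles` layout are hypothetical, and it has not been confirmed that trivial files like these trigger the same slowdown as the real project.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Writes `count` tiny .less files under <rootDir>/src/styles and an entry file
// <rootDir>/src/index.js that imports every one of them.
function generateLessProject(rootDir, count) {
  const stylesDir = path.join(rootDir, "src", "styles");
  fs.mkdirSync(stylesDir, { recursive: true });

  const imports = [];
  for (let i = 0; i < count; i++) {
    const name = `c${i}.less`;
    // Use a variable so each file needs an actual Less compile step.
    const color = (i % 0x1000000).toString(16).padStart(6, "0");
    const body = `@color: #${color};\n.c${i} { color: @color; }\n`;
    fs.writeFileSync(path.join(stylesDir, name), body);
    imports.push(`import "./styles/${name}";`);
  }

  fs.writeFileSync(path.join(rootDir, "src", "index.js"), imports.join("\n") + "\n");
  return imports.length;
}

// Usage: generateLessProject(process.cwd(), 3900), then run a cold build
// with the config above under both rspack versions.
```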
