perf: stream potential tokens in OriginalSource, avoid discarded slices #246

Open: alexander-akait wants to merge 4 commits into main from claude/perf-comparison-release-main-gtxf3i
@alexander-akait

Member

Summary

OriginalSource.streamChunks (and therefore map() / sourceAndMap()) built the full splitIntoPotentialTokens array of substrings and iterated it — even though getMap / getSourceAndMap run streamChunks with finalSource: true, where every chunk substring is dropped (chunk = finalSource ? undefined : match). On the dominant map() / sourceAndMap() paths the code was allocating the whole token array and every per-token slice, only to discard them.

This refactors splitIntoPotentialTokens into a streaming core, eachPotentialToken(str, onToken), that reports each token by [start, end) offset instead of materialising substrings. The array-returning splitIntoPotentialTokens becomes a thin wrapper over it (its unit test and benchmark are unchanged). OriginalSource consumes the streaming core and slices a chunk only when one is actually emitted — never on the final-source map() / sourceAndMap() paths.
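To make the offset-based idea concrete, here is a minimal sketch of a streaming tokenizer and a consumer that slices only when a chunk is wanted. It is not the code from this PR: the delimiter set (`;`, `{`, `}`, trailing space/tab/CR, newline closes a token) and the `collectChunks` helper are assumptions made for illustration.

```javascript
// Hypothetical sketch; the delimiter set and helper names are assumptions.
const isDelimiter = (cc) => cc === 59 /* ; */ || cc === 123 /* { */ || cc === 125 /* } */;
const isTrailing = (cc) =>
  isDelimiter(cc) || cc === 32 /* space */ || cc === 9 /* tab */ || cc === 13 /* CR */;

// Reports each token as a half-open [start, end) range; allocates no substrings.
function eachPotentialToken(str, onToken) {
  const len = str.length;
  let i = 0;
  while (i < len) {
    const start = i;
    // Phase 1: scan up to the next delimiter or newline.
    while (i < len) {
      const cc = str.charCodeAt(i);
      if (cc === 10 || isDelimiter(cc)) break;
      i++;
    }
    // Phase 2: absorb delimiters and trailing whitespace.
    while (i < len && isTrailing(str.charCodeAt(i))) i++;
    // Phase 3: a newline, if present, closes the token.
    if (i < len && str.charCodeAt(i) === 10) i++;
    onToken(start, i);
  }
}

// Illustrative consumer: slices only when chunks are actually needed.
function collectChunks(str, finalSource) {
  const chunks = [];
  let count = 0;
  eachPotentialToken(str, (start, end) => {
    count++;
    if (!finalSource) chunks.push(str.slice(start, end));
  });
  return { count, chunks };
}
```

With `finalSource` set, the consumer still visits every token boundary (enough to emit mappings) but never creates a substring, which is the saving described above.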

This is the same class of fix as #240 (lookup-table token classification) and #226 (single-line ReplaceSource fast path), sitting right on top of the path #240 just optimized.

Measured impact

Measured with an in-process interleaved A/B: both lib versions are loaded in one process and alternated each round, so shared-host CPU drift cancels in the ratio. CPU figures are min/median over 80 rounds; allocation is the single-call heap delta taken with gc(). This methodology was chosen because separate-process wall-clock timing has a ±17% noise floor on the measurement host.
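The interleaving can be sketched roughly as follows. This is an illustration of the approach, not the harness used for the numbers in this PR; the function names, the order-swapping per round, and the nanosecond timer are assumptions.

```javascript
// Hypothetical harness; names and details are illustrative only.
const median = (xs) => {
  const s = [...xs].sort((a, b) => a - b);
  const mid = s.length >> 1;
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
};

// Times one call in nanoseconds using Node's monotonic clock.
const timeOnce = (fn) => {
  const t0 = process.hrtime.bigint();
  fn();
  return Number(process.hrtime.bigint() - t0);
};

// Runs both variants in every round so host-level drift hits both equally.
function interleavedAB(runA, runB, rounds = 80) {
  const a = [];
  const b = [];
  for (let r = 0; r < rounds; r++) {
    // Swap the order each round so neither variant always runs second.
    if (r % 2 === 0) {
      a.push(timeOnce(runA));
      b.push(timeOnce(runB));
    } else {
      b.push(timeOnce(runB));
      a.push(timeOnce(runA));
    }
  }
  return {
    minRatio: Math.min(...a) / Math.min(...b),
    medianRatio: median(a) / median(b),
  };
}
```

Here `runA` and `runB` would wrap the same call (for example `sourceAndMap()` on the same input) against the two loaded lib versions; a ratio above 1 means the second variant was faster.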

| Path | CPU | Allocation |
| --- | --- | --- |
| OriginalSource.map() | +15–18% | −38…−46% |
| OriginalSource.sourceAndMap() | +37–40% | −38…−46% (−4.1 MB/call on a 20k-line source) |
| streamChunks() (chunks genuinely needed, non-final) | +30% | no intermediate array |

Correctness

  • All 89,876 tests pass, including the randomized Fuzzy suite and 1373 snapshots — output is byte-identical.
  • lint (eslint + tsc + types generation) is clean.
  • A changeset is included (patch).

Notes

I also prototyped extending the same idea to the non-final streamChunksOfSourceMap variants, but measured it as perf-neutral (0.0% allocation, <2% CPU) and dropped it: V8 represents String.slice of long strings as a zero-copy SlicedString, so splitIntoLines's array was already cheap, and in the non-final path the emitted chunks are retained into the concatenated output. The gain here is specifically from not producing strings that get discarded (finalSource mode), which is unique to the OriginalSource map/sourceAndMap paths.

🤖 Generated with Claude Code



OriginalSource.streamChunks built the full splitIntoPotentialTokens array
of substrings and iterated it, even though map()/sourceAndMap() run with
finalSource:true and discard every chunk substring.

Refactor splitIntoPotentialTokens into a streaming core eachPotentialToken
that reports each token by [start,end) offset; the array-returning helper
becomes a thin wrapper (its unit test and benchmark are unchanged).
OriginalSource consumes the streaming core and slices only when a chunk is
actually emitted — never on the final-source map/sourceAndMap paths.

Measured (interleaved in-process A/B vs current main):
  OriginalSource.map()         ~+15-18% CPU, -38..-46% allocation
  OriginalSource.sourceAndMap() ~+37-40% CPU, -38..-46% allocation
  streamChunks (slices needed)  ~+30% CPU (no intermediate array)

All 89,876 tests (incl. Fuzzy + 1373 snapshots) pass; output is byte-identical.
@changeset-bot

changeset-bot Bot commented Jun 26, 2026


🦋 Changeset detected

Latest commit: ec58f39

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| webpack-sources | Patch |


@linux-foundation-easycla

linux-foundation-easycla Bot commented Jun 26, 2026


CLA Signed
The committers listed above are authorized under a signed CLA.

@codecov

codecov Bot commented Jun 26, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.13%. Comparing base (0872bcd) to head (ec58f39).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #246      +/-   ##
==========================================
+ Coverage   98.11%   98.13%   +0.01%     
==========================================
  Files          25       25              
  Lines        2068     2089      +21     
  Branches      669      674       +5     
==========================================
+ Hits         2029     2050      +21     
  Misses         37       37              
  Partials        2        2              
| Flag | Coverage Δ |
| --- | --- |
| integration | 98.13% <100.00%> (+0.01%) ⬆️ |


@codspeed-hq

codspeed-hq Bot commented Jun 26, 2026


Merging this PR will not alter performance

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.


⚡ 35 improved benchmarks
❌ 41 regressed benchmarks
✅ 135 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Memory | replace-source memory: construct + 100 insertions | 2.4 KB | 11.8 KB | -80.01% |
| Memory | cached-source memory: warm sourceAndMap() returns cached references | 96 B | 256 B | -62.5% |
| Memory | clear-cache memory: shared modules (visited set — single allocation) | 177.8 KB | 453 KB | -60.75% |
| Memory | concat-source memory: source() concatenates children | 115.2 KB | 195.8 KB | -41.17% |
| Memory | source-map-source memory: new SourceMapSource(simple) | 520 B | 784 B | -33.67% |
| Memory | raw-source memory: new RawSource(string) | 784 B | 1,168 B | -32.88% |
| Simulation | helpers/stringBufferUtils: internString (enabled) | 66.4 µs | 91.9 µs | -27.7% |
| Memory | compat-source memory: delegated source() + map() through wrapper | 22.4 KB | 30.4 KB | -26.23% |
| Memory | source-map-source memory: map({ columns: true }) | 784 B | 1,040 B | -24.62% |
| Simulation | size-only-source: new SizeOnlySource() | 127.1 µs | 168.6 µs | -24.61% |
| Simulation | original-source: new OriginalSource(string) | 86.2 µs | 111.1 µs | -22.4% |
| Simulation | original-source: new OriginalSource(buffer) | 86.5 µs | 111.2 µs | -22.23% |
| Simulation | size-only-source: size() | 166.9 µs | 212.1 µs | -21.33% |
| Simulation | original-source: source() | 94.1 µs | 118.9 µs | -20.89% |
| Simulation | source-map-source: new (buffer map) | 174.6 µs | 219.4 µs | -20.38% |
| Simulation | source-map-source: new (object map) | 174.9 µs | 219.6 µs | -20.34% |
| Simulation | raw-source: new RawSource(buffer, true) | 103.6 µs | 129 µs | -19.7% |
| Simulation | raw-source: new RawSource(buffer) | 103.8 µs | 129 µs | -19.56% |
| Simulation | original-source: buffers() (from buffer) | 101.8 µs | 126.4 µs | -19.49% |
| Simulation | raw-source: new RawSource(string) | 105.4 µs | 130.9 µs | -19.43% |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.



Comparing claude/perf-comparison-release-main-gtxf3i (ec58f39) with main (0872bcd)


…sion)

The previous commit turned the array-returning splitIntoPotentialTokens
into a thin wrapper over eachPotentialToken, driving it with a per-token
callback. CodSpeed flagged the helper's own benchmark regressing ~11-15%
(instruction count): the callback indirection stops V8 inlining the
slice/push of the hot scan.

Restore the standalone direct loop for splitIntoPotentialTokens and keep
eachPotentialToken separate for OriginalSource. They share only the small
classification table, so there is no behavioural duplication. The
OriginalSource map()/sourceAndMap() win is unchanged (it uses
eachPotentialToken), and the helper benchmark is back to parity.

splitIntoPotentialTokens is now a standalone loop (production goes through
eachPotentialToken), so its phase-2 end-of-string break and phase-3 newline
branches were no longer exercised by the OriginalSource suite, dropping
coverage. Add round-trip and branch-targeted cases so the helper is fully
covered on its own.
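A sketch of what a standalone direct loop with round-trip and branch-targeted checks might look like is shown below. The loop body and the delimiter set are assumptions reconstructed from the description above, not the code in this PR; here an empty input returns an empty array purely to keep the round-trip check simple.

```javascript
// Hypothetical direct-loop splitter; delimiter set and empty-input behaviour are assumptions.
function splitIntoPotentialTokens(str) {
  const len = str.length;
  const results = [];
  let i = 0;
  while (i < len) {
    const start = i;
    block: {
      let cc = str.charCodeAt(i);
      // Phase 1: advance to the next newline, ';', '{' or '}'.
      while (cc !== 10 && cc !== 59 && cc !== 123 && cc !== 125) {
        if (++i >= len) break block;
        cc = str.charCodeAt(i);
      }
      // Phase 2: absorb delimiters plus trailing space, tab and CR.
      while (cc === 59 || cc === 32 || cc === 123 || cc === 125 || cc === 13 || cc === 9) {
        if (++i >= len) break block; // end of string reached inside phase 2
        cc = str.charCodeAt(i);
      }
      // Phase 3: a newline closes the token.
      if (cc === 10) i++;
    }
    // Slice and push inline, with no callback between the scan and the push.
    results.push(str.slice(start, i));
  }
  return results;
}

// Round-trip property: joining the tokens must reproduce the input exactly.
const roundTrips = (input) => splitIntoPotentialTokens(input).join("") === input;

// Branch-targeted inputs: "a;" ends inside phase 2, "a;\nb" takes the phase-3 newline.
const cases = ["a;", "a;\nb", "x = 1;  \r\n}", "no delimiters"];
const allRoundTrip = cases.every(roundTrips);
```

Checking the join against the input catches dropped or duplicated characters at token boundaries without hard-coding every expected token.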
@alexander-akait force-pushed the claude/perf-comparison-release-main-gtxf3i branch from 5bbef09 to 8d47d1f on June 26, 2026 15:31

Member Author

/easycla



@alexander-akait force-pushed the claude/perf-comparison-release-main-gtxf3i branch from 8d47d1f to 62d5e9b on June 26, 2026 16:13

CodSpeed flags "Different runtime environments detected" and shows phantom
regressions across untouched benchmarks because `ubuntu-latest` migrates
between underlying images (22.04 -> 24.04), so the stored main BASE and the
PR HEAD can run on different system libraries.

Pin the OS image to ubuntu-24.04 for both benchmark jobs so base and head
share an identical runtime environment. Node is deliberately left at
`lts/*` rather than pinned: main and PRs resolve it to the same release on
a given day, whereas pinning a specific Node would itself create a
base/head mismatch until main is re-benchmarked under the pin.