[wasm-split] Do multi-split at once#7956

Merged
aheejin merged 11 commits into
WebAssembly:mainfrom
aheejin:wasm_split_multi_once
Oct 10, 2025
Conversation

@aheejin
Member

@aheejin aheejin commented Oct 8, 2025

This does multi-splitting of modules in a single pass, rather than splitting them one by one by doing a 2-way split n times. Previously, when we did multi-splitting, we split off the 1st module as the "secondary" module, treating all functions belonging to the 2nd-nth modules as part of the "primary" module. We then repeated the same task for the 2nd module, treating the functions of the 3rd-nth modules as part of the "primary" module, and so on. This unnecessarily repeated some tasks that could have been done once.

This reduces the running time on a reproducer provided by @biggs0125 before (to fix #7725) from 232s to 80s on my machine, reducing it by around 66%.
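To make the difference concrete, here is a minimal toy model of the two strategies described above. It is only a sketch and not the code in `module-splitting.cpp`: the names `split_repeated`, `split_at_once`, and `module_of` are hypothetical, and "visits" simply counts how many times a function is looked at while deciding where it belongs.

```python
def split_repeated(funcs, module_of, secondaries):
    """Toy model of the old approach: one 2-way split per secondary module."""
    remaining = list(funcs)  # functions still treated as "primary"
    groups = {}
    visits = 0
    for name in secondaries:
        # Every 2-way split rescans everything still in the primary module.
        visits += len(remaining)
        groups[name] = [f for f in remaining if module_of.get(f) == name]
        remaining = [f for f in remaining if module_of.get(f) != name]
    return remaining, groups, visits


def split_at_once(funcs, module_of, secondaries):
    """Toy model of the new approach: classify every function in one scan."""
    groups = {name: [] for name in secondaries}
    primary = []
    visits = 0
    for f in funcs:
        visits += 1
        target = module_of.get(f)  # None means "stays in primary"
        if target in groups:
            groups[target].append(f)
        else:
            primary.append(f)
    return primary, groups, visits


funcs = [f"f{i}" for i in range(6)]
module_of = {"f1": "a", "f2": "b", "f4": "a"}
old = split_repeated(funcs, module_of, ["a", "b"])
new = split_at_once(funcs, module_of, ["a", "b"])
```

Both functions produce the same partition, but the repeated version visits functions once per secondary module, so its scanning cost grows with the number of modules, while the single-pass version visits each function exactly once.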

Some side-products of this PR are:

  • Now we only create a single table to host placeholders (or ref.nulls in case of --no-placeholders) even when reference-types is enabled. Previously we created a table per secondary module, resulting in n tables.
  • The names of trampoline functions have changed in the tests (e.g. in test/lit/wasm-split/multi-split.wast), but semantically they are the same. The reason is that we previously split modules one by one: when we split off the first module, functions belonging to other secondary modules were treated as primary functions, and they only turned into trampolines later. Now they are all named as trampolines from the start, arguably enhancing readability.

Some detailed analysis run using the reproducer of #7725, a case where we split a module into 301 (1 primary + 300 secondary) modules:

  • Before this PR:

Time: 232.9s

Task breakdown:

Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
shareImportableItems                               62139.5770          29.26%
classifyFunctions                                  42377.2849          19.95%
removeUnusedSecondaryElements                      33615.1043          15.83%
indirectReferencesToSecondaryFunctions             28047.9460          13.21%
indirectCallsToSecondaryFunctions                  25184.7922          11.86%
moveSecondaryFunctions                             14473.2277           6.82%
setupTablePatching                                  3126.4863           1.47%
initExportedPrimaryFuncs                            1674.4963           0.79%
exportImportCalledPrimaryFunctions                   936.1570           0.44%
thunkExportedSecondaryFunctions                      789.3036           0.37%
---------------------------------------------------------------------------
Overall Total                                     212364.3753         100.00%

  • After this PR (after addressing comments):

Time: 80.1s

Task breakdown:

Task                                          Total Time (ms)      Percentage
------------------------------------------------------------------------------
shareImportableItems                               35386.0000          48.46%
removeUnusedSecondaryElements                      27730.3000          37.98%
moveSecondaryFunctions                              4541.4800           6.22%
writeModule_secondary                               3176.7382           4.35%
writeModule_primary                                  997.8480           1.37%
exportImportCalledPrimaryFunctions                   649.7930           0.89%
indirectReferencesToSecondaryFunctions               269.4650           0.37%
indirectCallsToSecondaryFunctions                    136.9090           0.19%
classifyFunctions                                     88.8081           0.12%
setupTablePatching                                    40.6237           0.06%
thunkExportedSecondaryFunctions                        2.5536           0.00%
initExportedPrimaryFuncs                               0.5960           0.00%
------------------------------------------------------------------------------
Overall Total                                      73021.1146         100.00%

We can see that the time taken in classifyFunctions, indirectReferencesToSecondaryFunctions, and indirectCallsToSecondaryFunctions has dropped to basically nothing. This is because each of those steps now scans all functions only once, where it used to scan them n times or similar.
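As a quick sanity check on "basically nothing", the per-task speedups can be computed directly from the two breakdown tables above. The numbers are copied from those tables; the helper name `speedups` is only illustrative.

```python
# Total time in ms per task, copied from the before/after tables above.
before_ms = {
    "classifyFunctions": 42377.2849,
    "indirectReferencesToSecondaryFunctions": 28047.9460,
    "indirectCallsToSecondaryFunctions": 25184.7922,
}
after_ms = {
    "classifyFunctions": 88.8081,
    "indirectReferencesToSecondaryFunctions": 269.4650,
    "indirectCallsToSecondaryFunctions": 136.9090,
}


def speedups(before, after):
    """Return how many times faster each task became."""
    return {task: before[task] / after[task] for task in before}


ratios = speedups(before_ms, after_ms)
for task, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{task}: about {ratio:.0f}x faster")
```

All three tasks come out at least two orders of magnitude faster, which is in line with going from roughly n scans to one scan when n is 300.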

Now shareImportableItems and removeUnusedSecondaryElements together take up around 85% of the execution time (48.46% + 37.98% in the table above). The reason shareImportableItems takes so long is that the reproducer has 90k globals.

Analysis of shareImportableItems:
Sub-Task                                      Total Time (ms)      Percentage
-----------------------------------------------------------------------------
globals                                            41166.4904          98.35%
tables                                               535.5134           1.28%
tags                                                  10.3355           0.02%
memories                                               7.5937           0.02%
exports                                                1.5482           0.00%
-----------------------------------------------------------------------------
Total                                              41857.1000         100.00%

('exports' meaning processing existing exports)

We can probably improve this by selectively importing module items, as already noted by the existing TODO.
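A rough sketch of what "selectively importing" could mean, as a toy model only. It assumes that today every secondary module imports every shareable global, and that a per-module set of used globals is available; neither `share_everything` nor `share_used_only` is an actual Binaryen function, and the TODO itself is not implemented here.

```python
def share_everything(global_names, secondaries):
    """Assumed current behavior: each secondary imports every global."""
    return {name: list(global_names) for name in secondaries}


def share_used_only(global_names, used_by):
    """Hypothetical selective variant: import only globals a module uses."""
    known = set(global_names)
    return {name: sorted(set(used) & known) for name, used in used_by.items()}


def total_imports(imports):
    """Count import entries across all secondary modules."""
    return sum(len(names) for names in imports.values())


global_names = [f"g{i}" for i in range(10)]
used_by = {"a": ["g1", "g3"], "b": ["g3"]}  # made-up usage data

everything = share_everything(global_names, used_by.keys())
selective = share_used_only(global_names, used_by)
```

Under the same assumption, 300 secondary modules and 90k globals would mean on the order of 300 × 90,000 = 27 million import entries to create and later prune, which would be consistent with globals dominating the sub-task breakdown.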

removeUnusedSecondaryElements basically just runs RemoveUnusedModuleElements on each module. But if we export only used items in shareImportableItems, running this pass may become unnecessary after all.

This does multi-splitting of modules in a single pass, rather than
splitting them one by one by doing a 2-way split n times. Previously,
when we did multi-splitting, we split off the 1st module as the
"secondary" module, treating all functions belonging to the 2nd-nth
modules as part of the "primary" module. We then repeated the same task
for the 2nd module, treating the functions of the 3rd-nth modules as
part of the "primary" module, and so on. This unnecessarily repeated
some tasks that could have been done once.

This reduces the running time on a reproducer provided by @biggs0125
before (to fix WebAssembly#7725) from 236s to 88s, reducing it by around 63%.

Some side-products of this PR are:
- Now we only create a single table to host placeholders (or `ref.null`s
  in case of `--no-placeholders`) even when reference-types is enabled.
  Previously we created a table per secondary module, resulting in n
  tables.
- The names of trampoline functions have changed in the tests (e.g. in
  `test/lit/wasm-split/multi-split.wast`), but semantically they are the
  same. The reason is that we previously split modules one by one: when
  we split off the first module, functions belonging to other secondary
  modules were treated as primary functions, and they only turned into
  trampolines later. Now they are all named as trampolines from the
  start, arguably enhancing readability.

---

Some detailed analysis run using the reproducer of WebAssembly#7725, a case where
we split a module into 301 (1 primary + 300 secondary) modules:

- Before this PR:

Time: 236.8s

Task breakdown:
```
Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
shareImportableItems                               62661.1860          28.24%
classifyFunctions                                  42366.7451          19.09%
removeUnusedSecondaryElements                      33083.6602          14.91%
indirectReferencesToSecondaryFunctions             27852.3143          12.55%
indirectCallsToSecondaryFunctions                  25091.4263          11.31%
moveSecondaryFunctions                             14159.9166           6.38%
writeModule_secondary                               9331.1667           4.20%
setupTablePatching                                  3099.9597           1.40%
initExportedPrimaryFuncs                            1657.0465           0.75%
writeModule_primary                                  901.6800           0.41%
exportImportCalledPrimaryFunctions                   892.0132           0.40%
thunkExportedSecondaryFunctions                      826.8599           0.37%
initSecondary                                          0.2241           0.00%
---------------------------------------------------------------------------
Overall Total                                     221924.1985         100.00%
```

- After this PR:

Time: 88.4s

Task breakdown:
```
Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
shareImportableItems                               40176.7000          50.38%
removeUnusedSecondaryElements                      28635.2000          35.91%
moveSecondaryFunctions                              5998.9600           7.52%
writeModule_secondary                               2611.0099           3.27%
writeModule_primary                                  935.7750           1.17%
exportImportCalledPrimaryFunctions                   646.9860           0.81%
indirectReferencesToSecondaryFunctions               318.2980           0.40%
classifyFunctions                                    238.5780           0.30%
indirectCallsToSecondaryFunctions                    139.1730           0.17%
setupTablePatching                                    44.1466           0.06%
thunkExportedSecondaryFunctions                        3.9405           0.00%
initExportedPrimaryFuncs                               0.6870           0.00%
---------------------------------------------------------------------------
Overall Total                                      79749.4539         100.00%
```

We can see that the time taken in `classifyFunctions`,
`indirectReferencesToSecondaryFunctions`, and
`indirectCallsToSecondaryFunctions` has dropped to basically nothing.
This is because each of those steps now scans all functions only once,
where it used to scan them n times or similar.

Now `shareImportableItems` and `removeUnusedSecondaryElements` together
take up around 85% of the execution time (50.38% + 35.91% in the table
above). The reason `shareImportableItems` takes so long is that the
reproducer has 90k globals.
```
Analysis of shareImportableItems:
Sub-Task                                      Total Time (ms)      Percentage
---------------------------------------------------------------------------
globals                                            41166.4904          98.35%
tables                                               535.5134           1.28%
tags                                                  10.3355           0.02%
memories                                               7.5937           0.02%
exports                                                1.5482           0.00%
---------------------------------------------------------------------------
Total                                              41857.1000         100.00%
```
('exports' meaning processing existing exports)

We can probably improve this by selectively importing module items, as
already noted by the existing TODO.

`removeUnusedSecondaryElements` basically just runs
RemoveUnusedModuleElements on each module. We could also consider
parallelizing it across modules, but it is not clear how much improvement
that would bring, given that the pass is already parallelized at function
granularity. But if we export only used items in `shareImportableItems`,
running this pass may become unnecessary after all.
@aheejin aheejin requested a review from tlively October 8, 2025 01:06
@aheejin aheejin force-pushed the wasm_split_multi_once branch from 2089525 to 9bfbbe6 Compare October 8, 2025 01:08
@aheejin
Member Author

aheejin commented Oct 8, 2025

"Hide whitespace" will make the diff easier to view.

Member

@tlively tlively left a comment

Great work! This is a really nice improvement.

Comment thread src/ir/module-splitting.h Outdated
Comment thread src/ir/module-splitting.h Outdated
Comment thread src/tools/wasm-split/wasm-split.cpp Outdated
Comment thread src/tools/wasm-split/wasm-split.cpp Outdated
Comment thread src/ir/module-splitting.cpp Outdated
Comment thread src/ir/module-splitting.cpp Outdated
Comment thread src/ir/module-splitting.cpp Outdated
Comment thread src/ir/module-splitting.cpp
Comment thread src/ir/module-splitting.cpp
Comment thread test/lit/wasm-split/multi-split-escape-names.wast
aheejin and others added 3 commits October 8, 2025 08:44
- Remove old comments + fix comments
- Rename a variable
- Take primary module's symbolmap and placeholdermap writing out of the
  for loop
Co-authored-by: Thomas Lively <tlively123@gmail.com>
Comment thread src/ir/module-splitting.cpp
@aheejin
Member Author

aheejin commented Oct 10, 2025

After addressing comments, the execution time went down from 88s to 80s, and the improvement compared to main is now around 66%. Probably thanks to fewer map lookups 😀

Member

@tlively tlively left a comment

LGTM!

Comment thread src/tools/wasm-split/wasm-split.cpp Outdated
Comment thread src/ir/module-splitting.cpp
Comment thread src/ir/module-splitting.cpp
Comment thread src/ir/module-splitting.cpp
Comment thread test/lit/wasm-split/multi-split-escape-names.wast
Co-authored-by: Thomas Lively <tlively123@gmail.com>
@aheejin aheejin merged commit dd4dc47 into WebAssembly:main Oct 10, 2025
16 checks passed
@aheejin aheejin deleted the wasm_split_multi_once branch October 10, 2025 23:36

Development

Successfully merging this pull request may close these issues.

[wasm-split] Not splitting correctly after wasm-opt

2 participants