[wasm-split] Do multi-split at once #7956
Merged
This does multi-splitting of modules at once, rather than splitting them one by one by doing a 2-way split n times. Previously, when we did multi-splitting, we split off the 1st module as the "secondary" module, treating all functions belonging to the 2nd-nth modules as part of the "primary" module. We then repeated the same task for the 2nd module, treating the functions of the 3rd-nth modules as part of the "primary" module. This unnecessarily repeated some tasks that could have been done once.

This reduces the running time on a reproducer provided by @biggs0125 before (to fix WebAssembly#7725) from 236s to 88s, reducing it by around 63%.

Some side-products of this PR are:
- Now we only create a single table to host placeholders (or `ref.null`s in case of `--no-placeholders`) even when reference-types is enabled. Previously we created a table per secondary module, resulting in n tables.
- The names of trampoline functions have been changed in the tests, but semantically they are the same (e.g. in `test/lit/wasm-split/multi-split.wast`). The reason for the change is that we previously split modules one by one: when we split the first module, functions belonging to other secondary modules were assumed to be primary functions, but they later changed to trampolines as well. Now they are all named as trampolines, arguably enhancing readability.

---

Some detailed analysis run using the reproducer of WebAssembly#7725, a case where we split a module into 301 (1 primary + 300 secondary) modules:

- Before this PR:
  Time: 236.8s
  Task breakdown:
  ```
  Task                                    Total Time (ms)   Percentage
  --------------------------------------------------------------------
  shareImportableItems                         62661.1860       28.24%
  classifyFunctions                            42366.7451       19.09%
  removeUnusedSecondaryElements                33083.6602       14.91%
  indirectReferencesToSecondaryFunctions       27852.3143       12.55%
  indirectCallsToSecondaryFunctions            25091.4263       11.31%
  moveSecondaryFunctions                       14159.9166        6.38%
  writeModule_secondary                         9331.1667        4.20%
  setupTablePatching                            3099.9597        1.40%
  initExportedPrimaryFuncs                      1657.0465        0.75%
  writeModule_primary                            901.6800        0.41%
  exportImportCalledPrimaryFunctions             892.0132        0.40%
  thunkExportedSecondaryFunctions                826.8599        0.37%
  initSecondary                                    0.2241        0.00%
  --------------------------------------------------------------------
  Overall Total                               221924.1985      100.00%
  ```

- After this PR:
  Time: 88.4s
  Task breakdown:
  ```
  Task                                    Total Time (ms)   Percentage
  --------------------------------------------------------------------
  shareImportableItems                         40176.7000       50.38%
  removeUnusedSecondaryElements                28635.2000       35.91%
  moveSecondaryFunctions                        5998.9600        7.52%
  writeModule_secondary                         2611.0099        3.27%
  writeModule_primary                            935.7750        1.17%
  exportImportCalledPrimaryFunctions             646.9860        0.81%
  indirectReferencesToSecondaryFunctions         318.2980        0.40%
  classifyFunctions                              238.5780        0.30%
  indirectCallsToSecondaryFunctions              139.1730        0.17%
  setupTablePatching                              44.1466        0.06%
  thunkExportedSecondaryFunctions                  3.9405        0.00%
  initExportedPrimaryFuncs                         0.6870        0.00%
  --------------------------------------------------------------------
  Overall Total                                79749.4539      100.00%
  ```

We can see that the time taken in `classifyFunctions`, `indirectReferencesToSecondaryFunctions`, and `indirectCallsToSecondaryFunctions` has been reduced to basically nothing. This is because we can now scan all functions only once in those functions, where we used to scan the functions n times or similar.

Now `shareImportableItems` and `moveSecondaryFunctions` take up around 85% of the execution time. The reason `shareImportableItems` takes so long is that the reproducer has 90k globals.

```
Analysis of shareImportableItems:
Sub-Task                                Total Time (ms)   Percentage
--------------------------------------------------------------------
globals                                      41166.4904       98.35%
tables                                         535.5134        1.28%
tags                                            10.3355        0.02%
memories                                         7.5937        0.02%
exports                                          1.5482        0.00%
--------------------------------------------------------------------
Total                                        41857.1000      100.00%
```
('exports' meaning processing existing exports)

We can probably improve this by selectively importing module items, as already noted by the existing TODO.

`moveSecondaryFunctions` basically just runs RemoveUnusedModuleElements on each module. We can also consider parallelizing `moveSecondaryFunctions` by module, but it is not clear how much improvement that would bring, given that the pass is already parallelized at function granularity. But if we export only used items in `shareImportableItems`, running this pass may become unnecessary after all.
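To illustrate why the per-function tasks shrink so sharply, here is a minimal Python sketch that counts function visits under both strategies. It is not Binaryen's actual code: `split_one_by_one`, `split_at_once`, and the `assignment` map (function name to secondary-module index) are hypothetical names, and the old behavior is modeled as rescanning every function that has not been moved yet.

```python
def split_one_by_one(functions, assignment, num_secondaries):
    """Model n successive 2-way splits; return (modules, visits)."""
    modules = {index: [] for index in range(num_secondaries)}
    remaining = list(functions)
    visits = 0
    for index in range(num_secondaries):
        still_primary = []
        for func in remaining:
            visits += 1  # each round rescans everything not yet moved
            if assignment.get(func) == index:
                modules[index].append(func)
            else:
                still_primary.append(func)
        remaining = still_primary
    modules["primary"] = remaining
    return modules, visits


def split_at_once(functions, assignment, num_secondaries):
    """Model a single multi-way split; return (modules, visits)."""
    modules = {index: [] for index in range(num_secondaries)}
    modules["primary"] = []
    visits = 0
    for func in functions:
        visits += 1  # each function is classified exactly once
        modules[assignment.get(func, "primary")].append(func)
    return modules, visits


# Hypothetical input: 12 functions, 3 secondary modules, every 4th stays primary.
functions = [f"f{i}" for i in range(12)]
assignment = {f: i % 4 for i, f in enumerate(functions) if i % 4 < 3}

old_modules, old_visits = split_one_by_one(functions, assignment, 3)
new_modules, new_visits = split_at_once(functions, assignment, 3)
print(old_modules == new_modules, old_visits, new_visits)
```

Both strategies produce the same partition for this input, but the one-by-one model visits functions 27 times while the single pass visits them 12 times. The gap grows with the number of secondary modules.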
Member
Author
"Hide whitespace" will make the diff easier to view.
tlively
reviewed
Oct 8, 2025
Member
tlively
left a comment
Great work! This is a really nice improvement.
- Remove old comments + fix comments
- Rename a variable
- Take primary module's symbolmap and placeholdermap writing out of the for loop
Co-authored-by: Thomas Lively <tlively123@gmail.com>
aheejin
commented
Oct 9, 2025
Member
Author
|
After addressing comments, the execution time went down from 88s to 80s, and the improvement compared to |
tlively
approved these changes
Oct 10, 2025
Co-authored-by: Thomas Lively <tlively123@gmail.com>
This does multi-splitting of modules at once, rather than splitting them one by one by doing 2-way split n times. Previously when we did multi-splitting we split the 1st module as the "secondary" module assuming all other functions belonging to 2nd-nth modules as "primary" module. And then we repeat the same task for the 2nd module, assuming 3rd-nth module functions belong to the "primary" module. This unnecessarily repeated some tasks that could have been done once.
This reduces the running time on a reproducer provided by @biggs0125 before (to fix #7725) from 232s to 80s on my machine, reducing it by around 66%.
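As a quick check of the quoted figure, the relative reduction can be computed from the two timings above. `percent_reduction` is a helper name introduced here only for illustration.

```python
def percent_reduction(before_s, after_s):
    """Return how much shorter `after_s` is than `before_s`, in percent."""
    return (before_s - after_s) / before_s * 100.0


# Timings quoted in this description, in seconds.
print(f"{percent_reduction(232, 80):.1f}%")  # prints 65.5%
```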
Some side-products of this PR are:
- Now we only create a single table to host placeholders (or `ref.null`s in case of `--no-placeholders`) even when reference-types is enabled. Previously we created a table per secondary module, resulting in n tables.
- The names of trampoline functions have been changed in the tests, but semantically they are the same (e.g. in `test/lit/wasm-split/multi-split.wast`). The reason for the change is that we previously split modules one by one: when we split the first module, functions belonging to other secondary modules were assumed to be primary functions, but they later changed to trampolines as well. Now they are all named as trampolines, arguably enhancing readability.

Some detailed analysis run using the reproducer of #7725, a case where we split a module into 301 (1 primary + 300 secondary) modules:

- Before this PR:
  Time: 232.9s
  Task breakdown:
- After this PR:
  Time: 80.1s
  Task breakdown:

We can see that the time taken in `classifyFunctions`, `indirectReferencesToSecondaryFunctions`, and `indirectCallsToSecondaryFunctions` has been reduced to basically nothing. This is because we can now scan all functions only once in those functions, where we used to scan the functions n times or similar.

Now `shareImportableItems` and `moveSecondaryFunctions` take up around 85% of the execution time. The reason `shareImportableItems` takes so long is that the reproducer has 90k globals.

('exports' meaning processing existing exports)

We can probably improve this by selectively importing module items, as already noted by the existing TODO.

`moveSecondaryFunctions` basically just runs RemoveUnusedModuleElements on each module. But if we export only used items in `shareImportableItems`, running this pass may become unnecessary after all.
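The "export only used items" idea mentioned above could look roughly like the following Python sketch. It does not reflect Binaryen's API: `select_shared_globals` and its inputs are hypothetical, and how each secondary module's set of referenced globals is collected is assumed to be done elsewhere.

```python
def select_shared_globals(primary_globals, secondary_uses):
    """Return (exports, imports_per_module) covering only referenced globals.

    primary_globals: names of globals defined in the primary module.
    secondary_uses: maps a secondary module name to the global names its
        functions reference (assumed to be precomputed).
    """
    defined = set(primary_globals)
    imports = {
        module: sorted(defined & set(used))
        for module, used in secondary_uses.items()
    }
    # The primary module exports a global only if some secondary imports it.
    exports = sorted({name for names in imports.values() for name in names})
    return exports, imports


# Hypothetical input: 1000 globals, but only three are referenced by secondaries.
primary_globals = [f"g{i}" for i in range(1000)]
secondary_uses = {"mod1": {"g1", "g2"}, "mod2": {"g2", "g500"}}

exports, imports = select_shared_globals(primary_globals, secondary_uses)
print(len(exports), imports)
```

In this toy input only 3 of the 1000 globals are shared, so there would be far fewer unused imports left for a later cleanup pass to remove.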