Fix/tiling stack scoping and tiling information corruption#177
Merged
runwangdl merged 3 commits on Mar 26, 2026
…e and after L3 code for each layer to reduce stack usage
- Previously, the tiling information was corrupted after each run because the generated code wrote the next element of the tiling information array into the current location. After one RunNetwork run, the last element of the tiling array ends up in the first location, corrupting the tiling information. The fix in tilingvariablereplacement makes the reference point to the new location instead of assigning a value through the reference.
- The C code generated by Deeploy has been causing high stack usage. This is because all variables defined in RunNetwork live on the stack, including all the tiling pointers and call arguments. Adding braces before and after each layer in RunNetwork makes the call args live for only one layer, which significantly reduces stack usage. The tiling pointers still live on the stack; they need to be moved as well, but this requires more changes.
Victor-Jung
approved these changes
Mar 23, 2026
Victor-Jung
left a comment
Member
I like it, can you quickly check that it runs well with the --profileUntiled flag on please?
runwangdl
added a commit
to runwangdl/Deeploy
that referenced
this pull request
Apr 10, 2026
…nceCode

Upstream PR pulp-platform#177 (13113de, "Fix/tiling stack scoping and tiling information corruption", Pu DENG) wraps each layer's emitted code in a C block:

    layerCode = reduce(lambda a, b: a + b, sections, "")
    callStack += "{\n" + layerCode + "\n}\n"

so that per-layer call args become short-lived stack variables and RunNetwork's overall stack footprint goes down.

cc1f68b silently reverted this hunk during a merge from devel: the training-platform branch was based on a pre-pulp-platform#177 snapshot and the conflict resolution went the wrong way. Restore the wrapping verbatim.

Verified on Siracusa: simplemlp_train passes 0/4 (diff=0.000000 at every step) in both non-tiled and tiled runs.
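As a rough illustration of what the wrapped output amounts to on the C side, the sketch below puts each layer's call arguments inside their own block. The struct, kernel, and function names (`layer_closure_args_t`, `layer_closure`, `run_network_scoped`) are hypothetical stand-ins, not the identifiers Deeploy actually emits. Ending each variable's lifetime at the closing brace allows, but does not oblige, the compiler to reuse the same stack slot for the next layer.

```c
#include <stdint.h>

/* Hypothetical stand-in for one layer's L3 closure arguments. */
typedef struct {
    const int8_t *input;
    int8_t *output;
    uint32_t length;
} layer_closure_args_t;

/* Hypothetical layer kernel: copies input to output and adds a bias. */
static void layer_closure(const layer_closure_args_t *args, int8_t bias) {
    for (uint32_t i = 0; i < args->length; i++) {
        args->output[i] = (int8_t)(args->input[i] + bias);
    }
}

/* Sketch of a RunNetwork-style function with one block per layer. */
void run_network_scoped(const int8_t *in, int8_t *tmp, int8_t *out, uint32_t n) {
    { /* layer 0 */
        layer_closure_args_t args = { in, tmp, n };
        layer_closure(&args, 1);
    } /* layer 0's args go out of scope here */

    { /* layer 1 */
        layer_closure_args_t args = { tmp, out, n };
        layer_closure(&args, 2);
    } /* layer 1's args go out of scope here */
}
```

Without the braces, the two `args` variables would need distinct names and would both stay in scope until the function returns, which is the situation the PR describes for every layer of the network.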
runwangdl
pushed a commit
to runwangdl/Deeploy
that referenced
this pull request
Apr 29, 2026
…form#177)
* [Deeploy PR] put the tiling information into layer code as well
* [Deeploy PR] Fix the tiling information corruption. Add bracket before and after L3 code for each layer to reduce stack usage
  - Previously, the tiling information was corrupted after each run because the generated code wrote the next element of the tiling information array into the current location. After one RunNetwork run, the last element of the tiling array ends up in the first location, corrupting the tiling information. The fix in tilingvariablereplacement makes the reference point to the new location instead of assigning a value through the reference.
  - The C code generated by Deeploy has been causing high stack usage. This is because all variables defined in RunNetwork live on the stack, including all the tiling pointers and call arguments. Adding braces before and after each layer in RunNetwork makes the call args live for only one layer, which significantly reduces stack usage. The tiling pointers still live on the stack; they need to be moved as well, but this requires more changes.
* Update CHANGELOG.md
This pull request addresses two issues:

1) High stack usage in the C code generated by Deeploy

When the `RunNetwork` function is called, all variables defined in it live on the stack. This includes the tiling pointers, such as `uint8_t bu_DeeployNetwork_TILING_CODEGEN_L1_<layer name>_tileIdxPtr`, and the call arguments, such as `DeeployNetwork__<layer name>_closure_L3_args`. Originally, all tiling pointers and L3 closure call arguments were scoped to the entire `RunNetwork` function, so the tiling pointers and args for every layer stayed on the stack for the whole call. This causes a large stack allocation on entry to `RunNetwork`, and it scales with the number of layers: the more layers you have, the more tiling pointers and call args you have, and the higher the stack usage (5KB in L1 for a 90-layer network).

The fix is very simple. First, move the declarations of the tiling pointers into the layer code. Second, wrap each layer in `{}`, so that each layer in the generated `RunNetwork` sits in its own block. This confines the scope of the tiling pointers and call args to their respective layer, so the tiling pointers and call args for all layers no longer need to live on the stack at once.

In my network with 93 layers, wrapping the call args in `{}` reduces the stack increment on entering `RunNetwork` from 5KB to 0.7KB. Putting the tiling pointers into `{}` as well further reduces the increment to 0.6KB.

2) The tiling information is corrupted after the first `RunNetwork` run

Originally, in the L1 tiling pointer code, the tiling pointer was updated by:

    *DeeployNetwork_TILING_CODEGEN_L1_<layer name>_size_ref = DeeployNetwork_TILING_CODEGEN_L1_<name>_size[TILING_I];

This assigns the next value from the tiling information array, `DeeployNetwork_TILING_CODEGEN_L1_<name>_size[TILING_I]`, through the reference. However, the reference is defined as a pointer to the first element of the tiling information array. As a result, the next element of the tiling array is written into the first element, corrupting the tiling information.

The fix is easy: point the reference at the next element instead of assigning a value through it.

    DeeployNetwork_TILING_CODEGEN_L1_<layer name>_size_ref = &DeeployNetwork_TILING_CODEGEN_L1_<name>_size[TILING_I];

PR Merge Checklist

- … `devel` commit and pointing to `devel`.
- `CHANGELOG.md` file has been updated.
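The tiling corruption from issue 2 can be reproduced in isolation. The following is a minimal sketch, assuming a four-entry size table and hypothetical names (`NUM_TILES`, `run_tiles_buggy`, `run_tiles_fixed`, `size_ref`); it is not the generated Deeploy code. Each function returns the sum of the sizes it observed, so a second call reveals whether the table survived the first one.

```c
#include <stdint.h>

#define NUM_TILES 4 /* hypothetical number of tiles for this sketch */

/* Buggy variant: assigns through the reference, which aliases sizes[0],
 * so every iteration overwrites the first table entry. */
uint32_t run_tiles_buggy(uint32_t sizes[NUM_TILES]) {
    uint32_t *size_ref = &sizes[0];
    uint32_t total = 0;
    for (int i = 0; i < NUM_TILES; i++) {
        *size_ref = sizes[i]; /* writes sizes[i] into sizes[0] */
        total += *size_ref;   /* a tile kernel would read the size here */
    }
    return total;
}

/* Fixed variant: re-points the reference and never writes to the table. */
uint32_t run_tiles_fixed(const uint32_t sizes[NUM_TILES]) {
    const uint32_t *size_ref = &sizes[0];
    uint32_t total = 0;
    for (int i = 0; i < NUM_TILES; i++) {
        size_ref = &sizes[i]; /* only the pointer changes */
        total += *size_ref;
    }
    return total;
}
```

With a table of `{10, 20, 30, 5}`, the first call to either variant sees the same sizes, but after `run_tiles_buggy` the table reads `{5, 20, 30, 5}`: the last element has landed in the first slot, matching the symptom described above. Declaring the table `const` in the fixed variant also lets the compiler reject any accidental write through the reference.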