Commit 950b1fd
fix(tiling): MiniMalloc cost variable + SGD rank-2 DMA constraint (#18)
* fix(tiling): force SGD weight spatial dims to full size for rank-2 DMA
SGDTileConstraint now pins dim_i == shape[i] for i >= 2 (kernel_h, kernel_w).
minimizeRectangle can then collapse the trailing dims so every L2<->L3 DMA
tile is rank-2, avoiding the AnydimAsyncDmaTransferAdapter for-loop that
emitted 4096 blocking pi_cl_ram_copy_2d calls per L2 tile for [128,128,3,3]
weights (~49x slowdown on ResNet8 optimizer with MiniMalloc).
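
A minimal sketch of what the pin does to the tile search space. This is not Deeploy's tiler API: `divisors`, `candidate_tiles`, the divisor-only tile sizes, and the 48 KiB budget are all hypothetical stand-ins chosen for illustration. It only shows that fixing dim_i == shape[i] for i >= 2 leaves the solver free to tile the first two dims alone.

```python
from itertools import product
from math import prod


def divisors(n):
    """All positive divisors of n, used here as the allowed tile sizes."""
    return [d for d in range(1, n + 1) if n % d == 0]


def candidate_tiles(shape, budget_bytes, elem_bytes=4, pin_from=None):
    """Yield tile shapes that fit in budget_bytes.

    Dims with index >= pin_from are held at their full size, mimicking a
    'dim_i == shape[i]' constraint. pin_from=None leaves every dim free.
    """
    choices = []
    for i, full in enumerate(shape):
        pinned = pin_from is not None and i >= pin_from
        choices.append([full] if pinned else divisors(full))
    for tile in product(*choices):
        if prod(tile) * elem_bytes <= budget_bytes:
            yield tile


shape = (128, 128, 3, 3)   # [C_out, C_in, kH, kW] from the commit message
budget = 48 * 1024         # hypothetical memory budget in bytes

free = list(candidate_tiles(shape, budget))
pinned = list(candidate_tiles(shape, budget, pin_from=2))

print(any(t[2:] != (3, 3) for t in free))    # True: kH/kW may be split
print(all(t[2:] == (3, 3) for t in pinned))  # True: kH/kW always full
```

With the spatial dims held at full size, every surviving candidate has contiguous kH*kW runs, which is the precondition for collapsing the trailing dims.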
* fix(tiling): force InPlaceAccumulatorV2 weight-grad spatial dims to full size
InPlaceAccumulatorV2TileConstraint now pins dim_i == shape[i] for i >= 2
(kH, kW) on the accum_buffer tensor. BOPTileConstraint already ties
gradient and data_out dims to accum_buffer, so one pin is enough.
Without this, MiniMalloc tiles [C_out, C_in, kH, kW] weight-gradient
tensors along all four dims. For ResNet8 layer3.conv2 [128,128,3,3]
this produced an explicit for-loop of 4096 iterations inside the L3 DMA
closure (pi_cl_ram_copy_2d(4 B) + pi_cl_ram_copy_wait per iteration),
resulting in ~73k blocking L3 DMA calls per training step.
After the fix minimizeRectangle collapses kH×kW -> rank-2 tiles so each
L2->L3 transfer is a single contiguous pi_cl_ram_copy_2d (~41 KB).
Verified: 0 blocking DMA for-loops in ResNet8 TrainingNetwork.c.
1 parent 5b4a394 commit 950b1fd
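
The effect of the pins on DMA rank can be sketched with a simplified model of dimension collapsing. This is an illustration, not the actual minimizeRectangle code: `collapse_tile` and `copies_2d_per_tile` are hypothetical helpers, the model merges a dim into its inner neighbour only when that neighbour is untiled, and it does not squeeze size-1 dims. Both tile shapes are assumptions, picked so the numbers line up with the 4096 iterations of 4 B and the ~41 KB transfer quoted in the message.

```python
from math import prod


def collapse_tile(tile, full):
    """Merge each dim into its inner neighbour when that neighbour is untiled.

    If tile[i + 1] == full[i + 1], stepping along dim i lands exactly at the
    end of the inner run, so both dims form one contiguous run.
    """
    groups = [tile[-1]]
    for i in range(len(tile) - 2, -1, -1):
        if tile[i + 1] == full[i + 1]:
            groups[0] *= tile[i]       # contiguous: fold into the outermost group
        else:
            groups.insert(0, tile[i])  # strided: needs its own dim
    return tuple(groups)


def copies_2d_per_tile(tile, full):
    """Number of 2D copies needed: every dim beyond the inner two is looped."""
    return prod(collapse_tile(tile, full)[:-2])


FULL = (128, 128, 3, 3)      # [C_out, C_in, kH, kW]
before = (64, 64, 1, 1)      # hypothetical tile with kH, kW split
after = (18, 64, 3, 3)       # hypothetical tile with kH, kW pinned

print(collapse_tile(before, FULL), copies_2d_per_tile(before, FULL))
# (64, 64, 1, 1) 4096
print(collapse_tile(after, FULL), copies_2d_per_tile(after, FULL))
# (18, 576) 1
print(prod(after) * 4)       # 41472 bytes for 4-byte elements
```

In this model the unpinned tile stays rank-4 and needs a 4096-iteration loop of single-element copies, while the pinned tile collapses to rank-2 and moves in one 2D copy.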
2 files changed
Lines changed: 28 additions & 0 deletions
File tree
- Deeploy/Targets/PULPOpen/TileConstraints
Lines changed: 9 additions & 0 deletions
@@ -31,6 +31,15 @@ (9 lines added at new lines 34-42; diff body not captured)
Lines changed: 19 additions & 0 deletions
@@ -2,7 +2,11 @@ (4 lines added at new lines 5-7 and 9; diff body not captured)
@@ -11,6 +15,21 @@ (15 lines added at new lines 18-32; diff body not captured)