Commit 5b26788
Fix CCT_2 / AnomalyDetection promote correctness via maxBufferBytes cap
Root cause (Phase-1 §10.1 bisection): when a ConstantBuffer is promoted
L3->L2, the L3 closure that would have refreshed the L2 staging buffer
per tile via cl_ram_copy_2d is elided (the tiler sees the buffer at L2
and emits no L3->L2 transfer). The L1 closure still reads the source
from a fixed L2 address with a +0 offset, so every tile re-reads the
first tile_stride bytes of the static PI_L2 symbol. For
broadcast-per-tile constants (LayerNorm scales/biases, small Gemm
biases) this is correct; for tiled-across-channels weights (Conv,
fused proj_bias DUPLICATEs, positional embedding) it is not, and the
output corrupts.
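
A toy model of the addressing described above, as a minimal sketch rather than Deeploy code. The helper tile_reads and the buffer contents are hypothetical; the only behavior taken from the message is that a fixed +0 source offset makes every tile see the first tile_stride bytes.

```python
# Hypothetical toy model: not the Deeploy tiler or its generated closures.
def tile_reads(l2_buffer, tile_stride, num_tiles, advance_source):
    """Return the bytes each tile's L1 load would see from an L2 buffer."""
    reads = []
    for tile in range(num_tiles):
        # advance_source=False models the fixed "+0 offset" read.
        offset = tile * tile_stride if advance_source else 0
        reads.append(bytes(l2_buffer[offset:offset + tile_stride]))
    return reads


# Tiled-across-channels weight: 3 tiles, each with 4 distinct bytes.
weights = bytes(range(12))
expected = tile_reads(weights, 4, 3, advance_source=True)
buggy = tile_reads(weights, 4, 3, advance_source=False)

# Broadcast-per-tile constant: every tile wants the same 4 bytes anyway.
scale = bytes([7, 7, 7, 7])
broadcast = tile_reads(scale, 4, 3, advance_source=False)

print(expected != buggy)         # tiled weights: tiles 1 and 2 read stale data
print(broadcast == [scale] * 3)  # broadcast constants: fixed offset is harmless
```

In this model the fixed offset only matters when the tiles are supposed to read different slices, which matches the split between the passing and failing constant kinds listed above.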
Adds maxBufferBytes (default 2048) to PromoteTensorsToL2Greedy;
threaded through testMVP.py and deeployRunner.py as
--promoteMaxBufferBytes. Buffers larger than the cap are kept in L3
(where the per-tile staging refill correctly advances the source
pointer).
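
A minimal sketch of how such a size cap could gate a greedy promotion loop. The function promote_greedy, its arguments, and the input-order greedy policy are assumptions made for illustration; the actual PromoteTensorsToL2Greedy pass may order and account for buffers differently. Only the default of 2048, the "larger than the cap stays in L3" rule, and "0 disables the cap" come from this message.

```python
# Hypothetical sketch: not the actual PromoteTensorsToL2Greedy implementation.
def promote_greedy(buffers, l2_budget_bytes, max_buffer_bytes=2048):
    """Split (name, size) pairs into L2-promoted and L3-resident lists.

    Assumption: buffers are visited in the order given. A max_buffer_bytes
    of 0 disables the cap.
    """
    promoted, kept_in_l3 = [], []
    remaining = l2_budget_bytes
    for name, size in buffers:
        over_cap = max_buffer_bytes > 0 and size > max_buffer_bytes
        if over_cap or size > remaining:
            kept_in_l3.append(name)  # stays in L3, refilled per tile
        else:
            promoted.append(name)
            remaining -= size
    return promoted, kept_in_l3


buffers = [("ln_scale", 512), ("conv_weight", 8192), ("gemm_bias", 256)]
capped = promote_greedy(buffers, l2_budget_bytes=16384)
uncapped = promote_greedy(buffers, l2_budget_bytes=16384, max_buffer_bytes=0)
print(capped)    # the large conv weight stays in L3
print(uncapped)  # with the cap disabled, all three fit in the budget
```

The cap is a size heuristic, not a check of how a buffer is tiled: it relies on broadcast-per-tile constants tending to be small and tiled weights tending to be large.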
Verified on the full Phase-1 sweep:
* CCT_2_32_32_128 @ L2=400 KB: PASS 0/10 (was FAIL 10/10)
* MLPerf/AnomalyDetection @ L2=200 KB: PASS 0/640, -15.3 % cycles
  (was FAIL 169/640)
* MLPerf/ImageClassification @ L2=100 KB: PASS 0/10, -3.9 % cycles
* MLPerf/ImageClassification @ L2=120 KB: PASS 0/10, -3.9 % cycles
* microLlama/microLlama1 @ L2=100 KB: PASS 0/2112, -17.8 % cycles
Set --promoteMaxBufferBytes=0 to disable the cap once the codegen
handles tiled L2-resident weights correctly (Phase-2).
1 parent 678f4ba commit 5b26788
3 files changed
Lines changed: 43 additions & 12 deletions
File tree
- DeeployTest
- testUtils
- Deeploy/MemoryLevelExtension/OptimizationPasses
Lines changed: 22 additions & 11 deletions