Commit 1a8ffa3

Phase-2: unblock VariableBuffer promotion (IndexError fix + IO/arena filter)
Two structural fixes that together let PromoteTensorsToL2Greedy run without --promoteOnlyConstants:

1. TilerExtension.computeMemoryMap indexed tilingSolution[idx].nodeConstraints[0] for every entry of the combined memoryMap. That list is inner_scheduler_entries (one per pattern, so len == len(tilingSolution)) followed by exactly one outer_scheduler entry holding global/constant-style buffers, and the outer entry has no per-pattern node constraint. With --promoteOnlyConstants the outer L2 entry was empty, so the 'if len(...) != 0' check skipped it. Promoting a VariableBuffer into L2 populated the outer entry, and the loop crashed with an IndexError at tilingSolution[len(tilingSolution)].nodeConstraints[0]. The fix passes None for the outer entry, the same convention the default-memory-level branch already uses.

2. With --promoteOnlyConstants off, PromoteTensorsToL2Greedy included the graph IO tensors (input_0, output_0) and the MEMORYARENA_L3 meta-buffer in its candidate set. Promoting those produced a binary that crashed gvsoc with 'Platform returned an error (exitcode: 1)', because the runtime contract for IO tensors and the arena pointer requires a fixed memory level. They are now filtered out at candidate-selection time.

End-to-end verified on the Phase-1 sweep, with the 2 KB cap kept on:

* IC100: const 2 389 072 cyc / var 2 389 051 cyc / PASS
* AD200: const 509 738 cyc / var 509 464 cyc / PASS
* ml1: const 3 475 925 cyc / var 3 475 840 cyc / PASS
* CCT_2: const 309 860 361 cyc / var 309 890 036 cyc / PASS
* miniMobileNet @ 16 KB L2: var 134 946 cyc / PASS

The cycle Δ between modes is within noise because the Phase-1 2 KB cap also filters out the VariableBuffer activations (all of them exceed 2 KB on these models). The structural blocker is now removed; the practical performance benefit of VariableBuffer promotion (where reuse > 1 lets cycle-aware actually outrank smallest) awaits the codegen-side fix that will let the cap be lifted.
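To put "within noise" into numbers, the short sketch below computes the relative cycle change of var mode versus const mode from the four paired measurements listed above. It only does arithmetic on the reported figures; the dictionary layout and the `relative_delta_pct` helper are illustrative, not part of Deeploy.

```python
# Cycle counts from the sweep above: (const mode, var mode).
# miniMobileNet is left out because only a var-mode figure was reported.
results = {
    "IC100": (2_389_072, 2_389_051),
    "AD200": (509_738, 509_464),
    "ml1": (3_475_925, 3_475_840),
    "CCT_2": (309_860_361, 309_890_036),
}


def relative_delta_pct(const_cyc: int, var_cyc: int) -> float:
    """Percentage change of var mode relative to const mode."""
    return 100.0 * (var_cyc - const_cyc) / const_cyc


deltas = {name: relative_delta_pct(c, v) for name, (c, v) in results.items()}

for name, pct in deltas.items():
    print(f"{name:6s} {pct:+.4f} %")
```

Every delta stays below 0.1 % in magnitude (the largest, AD200, is roughly 0.05 %), and the sign varies between models, which is consistent with the cap leaving the promoted set essentially unchanged.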
1 parent 5b26788 commit 1a8ffa3

2 files changed

Lines changed: 34 additions & 4 deletions


Deeploy/MemoryLevelExtension/OptimizationPasses/MemoryLevelAnnotationPasses.py

Lines changed: 19 additions & 0 deletions
@@ -180,6 +180,19 @@ def apply(self, ctxt: NetworkContext, graph: gs.Graph) -> Tuple[NetworkContext,
             used += _bufferSizeBytes(buf)
 
         from Deeploy.DeeployTypes import ConstantBuffer
+
+        # Build the set of names we must NOT promote even if they live at the
+        # source level: graph IO and the arena meta-buffer. These have
+        # external semantics (the runtime delivers the input / collects the
+        # output / treats the arena as a malloc bump pointer); flipping their
+        # memory level is meaningless and breaks the runtime contract.
+        ioNames = set()
+        try:
+            for tensor in list(graph.inputs) + list(graph.outputs):
+                ioNames.add(tensor.name)
+        except Exception:
+            pass
+
         candidates: List[Tuple[int, int, int, str, VariableBuffer]] = []
         for name, buf in ctxt.globalObjects.items():
             if not isinstance(buf, VariableBuffer):
@@ -192,6 +205,12 @@ def apply(self, ctxt: NetworkContext, graph: gs.Graph) -> Tuple[NetworkContext,
                 continue
             if self.onlyConstants and not isinstance(buf, ConstantBuffer):
                 continue
+            # Never promote arena meta-buffers, IO tensors, or buffers that the
+            # graph already excluded from deployment.
+            if "MEMORYARENA" in name:
+                continue
+            if name in ioNames or buf.name in ioNames:
+                continue
             if self.excludeNamePatterns and any(pat in name for pat in self.excludeNamePatterns):
                 continue
             if self.maxBufferBytes is not None and _bufferSizeBytes(buf) > self.maxBufferBytes:
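The filter order added in the hunk above can be exercised in isolation. This is a minimal sketch under simplifying assumptions: `Buffer` and `select_candidates` are hypothetical stand-ins (not Deeploy's `VariableBuffer`/`ConstantBuffer` classes or the pass's real API), the `excludeNamePatterns` check and the greedy ranking step are left out, and the buffer names and sizes other than `input_0`, `output_0` and `MEMORYARENA_L3` are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Buffer:
    """Hypothetical stand-in for a global buffer (not Deeploy's class)."""
    name: str
    size_bytes: int
    is_constant: bool


def select_candidates(buffers: Iterable[Buffer], io_names: Set[str],
                      only_constants: bool,
                      max_buffer_bytes: Optional[int]) -> List[str]:
    """Return names of buffers eligible for promotion, in input order."""
    selected: List[str] = []
    for buf in buffers:
        if only_constants and not buf.is_constant:
            continue
        # Arena meta-buffers act as bump-pointer bases; their level is fixed.
        if "MEMORYARENA" in buf.name:
            continue
        # Graph IO is delivered/collected by the runtime at a fixed level.
        if buf.name in io_names:
            continue
        # Size cap (2048 here, mirroring the Phase-1 2 KB cap).
        if max_buffer_bytes is not None and buf.size_bytes > max_buffer_bytes:
            continue
        selected.append(buf.name)
    return selected


# Hypothetical buffers; only the IO and arena names come from the commit.
bufs = [
    Buffer("input_0", 1024, False),
    Buffer("output_0", 512, False),
    Buffer("MEMORYARENA_L3", 65536, False),
    Buffer("weight_0", 1500, True),
    Buffer("act_1", 4096, False),
    Buffer("act_2", 1024, False),
]
io = {"input_0", "output_0"}

print(select_candidates(bufs, io, only_constants=False, max_buffer_bytes=2048))
```

With the cap on, the oversized activation `act_1` is dropped alongside the IO tensors and the arena, which is the same effect the commit message gives for the cycle counts barely moving.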

Deeploy/TilingExtension/TilerExtension.py

Lines changed: 15 additions & 4 deletions
@@ -528,11 +528,22 @@ def computeMemoryMap(self, ctxt: NetworkContext, tilingSolution: TilingSolution)
                     memoryMap[memoryLevel][-1], ctxt, None,
                     self.memoryHierarchy.memoryLevels[memoryLevel].size - constantTensorOffset, memoryLevel)
             else:
+                # memoryMap[memoryLevel] = inner_scheduler_entries (one per pattern,
+                # length == len(tilingSolution)) followed by exactly one
+                # outer_scheduler entry holding global/constant-style buffers.
+                # The outer entry has no per-pattern node constraint — pass None
+                # for it (same convention the default-memory-level branch uses).
+                # Without this, promoting a VariableBuffer into this level
+                # populated the outer entry and the loop crashed with IndexError
+                # at tilingSolution[len(tilingSolution)].nodeConstraints[0].
                 for idx, memMap in enumerate(memoryMap[memoryLevel]):
-                    if len(memoryMap[memoryLevel][idx]) != 0:
-                        memoryMap[memoryLevel][idx] = self.minimalloc(
-                            memMap, ctxt, tilingSolution[idx].nodeConstraints[0],
-                            self.memoryHierarchy.memoryLevels[memoryLevel].size - constantTensorOffset, memoryLevel)
+                    if len(memoryMap[memoryLevel][idx]) == 0:
+                        continue
+                    nodeConstraint = (None
+                                      if idx >= len(tilingSolution) else tilingSolution[idx].nodeConstraints[0])
+                    memoryMap[memoryLevel][idx] = self.minimalloc(
+                        memMap, ctxt, nodeConstraint,
+                        self.memoryHierarchy.memoryLevels[memoryLevel].size - constantTensorOffset, memoryLevel)
         log.info(f" {SUCCESS_MARK} Memory allocation successful!")
 
         return memoryMap
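The IndexError fixed by this hunk can be reproduced without the tiler. In this sketch `PatternSolution` is a hypothetical stand-in for one `tilingSolution` entry, node constraints are string placeholders (`"c0"`, `"c1"`), and the `minimalloc` call is left out: `old_constraints` mirrors the removed loop and `new_constraints` the added one, each returning the constraint that would be handed to the allocator per non-empty entry.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PatternSolution:
    """Hypothetical stand-in for one tilingSolution entry (not Deeploy's class)."""
    node_constraints: List[str]


def old_constraints(entries: List[list],
                    tiling_solution: List[PatternSolution]) -> Dict[int, str]:
    # Removed loop: every non-empty entry indexes tiling_solution, including
    # the trailing outer entry at idx == len(tiling_solution).
    return {
        idx: tiling_solution[idx].node_constraints[0]
        for idx, entry in enumerate(entries)
        if len(entry) != 0
    }


def new_constraints(entries: List[list],
                    tiling_solution: List[PatternSolution]) -> Dict[int, Optional[str]]:
    # Added loop: the outer entry (idx past the last pattern) gets None.
    chosen: Dict[int, Optional[str]] = {}
    for idx, entry in enumerate(entries):
        if len(entry) == 0:
            continue
        chosen[idx] = (None if idx >= len(tiling_solution)
                       else tiling_solution[idx].node_constraints[0])
    return chosen


tiling = [PatternSolution(["c0"]), PatternSolution(["c1"])]
# One inner entry per pattern, then one outer entry (placeholder buffer names).
constants_only = [["w0"], ["w1"], []]        # outer entry empty: old loop skips it
with_variable = [["w0"], ["w1"], ["act_1"]]  # promoted VariableBuffer fills the outer entry

print(old_constraints(constants_only, tiling))
print(new_constraints(with_variable, tiling))
try:
    old_constraints(with_variable, tiling)
except IndexError as err:
    print("old loop:", err)
```

Both versions agree whenever the outer entry is empty, which is why the bug stayed hidden while only constants were promoted.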
