Draft
132 commits
49cddd2
Deeploy Microbenchmark with GVSoC CSR and Demo on GEMM
runwangdl Feb 12, 2026
b260e4e
Add float concat and Change padding pattern of ConV
runwangdl Feb 13, 2026
7b55d0a
Merge branch 'devel' into sleepvit
runwangdl Feb 15, 2026
5a79c79
Support SleepViT on Gap9
runwangdl Feb 15, 2026
609179c
Add microbenchmark to codepass
runwangdl Feb 15, 2026
bedec91
Add GAP9 Container Support
Xeratec Feb 12, 2026
97c2d2b
Fix spelling mistakes and remove dependencies from fork
Xeratec Feb 4, 2026
3423c54
Fix Missing Version Link
Xeratec Feb 6, 2026
d6b6ac9
Temporarily disable GAP9 on forks
Xeratec Feb 8, 2026
daf8cda
Add Shell Format pre-commit
Xeratec Feb 12, 2026
f1c7d57
Update to GAP9 SDK `v5.21.1-staging-1`
Xeratec Feb 12, 2026
646563d
Print memory usage by default
Xeratec Feb 12, 2026
b2b43a5
Cleanup Makefile
Xeratec Feb 13, 2026
1a075d0
Try to fix private GAP9 SDK access issue
Xeratec Feb 13, 2026
2bb1bf5
Use pre-build GAP9 GCC
Xeratec Feb 16, 2026
af405a2
Fix Typos
Xeratec Feb 16, 2026
c6bc2c6
Partially revert a16c1c757928ed37f89d2744609fd97bb3ddd702
Xeratec Feb 16, 2026
d325f79
Build AutoTiler
Xeratec Feb 16, 2026
53b4bb9
CodeAIRabbit Feedback
Xeratec Feb 16, 2026
6d1c8c3
Update Changelog
Xeratec Feb 17, 2026
6db1c52
CodeAIRabbit Feedback
Xeratec Feb 17, 2026
ba6c1e8
Add single kernel tests for random perturbations
JanCSEM Feb 19, 2026
90edc44
Add ZO model tests
JanCSEM Feb 19, 2026
4bcefd3
Cherry picked NCHW->NHWC transform issue
JanCSEM Jan 13, 2026
44699b2
FIX: Bug in conv2D parser assigning kernel shape regardless of channe…
JanCSEM Feb 19, 2026
2b5f611
Fix compilation bugs caused by ZO nodes
JanCSEM Feb 20, 2026
846d4dd
Add option to deploy on the board for the GAP9 platform
Victor-Jung Jan 21, 2026
ae3c4d1
WIP: Better error message when attaching usbip and vid + pid of the n…
Victor-Jung Feb 20, 2026
02ac6c9
remove prints
JanCSEM Feb 20, 2026
5911375
Remove debugging test, add Eggroll test+
JanCSEM Feb 21, 2026
03b6aa9
Add implementation of Eggroll kernel
JanCSEM Feb 21, 2026
874e777
Add proper D flag for GAP9 board
Victor-Jung Feb 23, 2026
fbbea5f
Fix gapy cmd and add comment to dockerfile
Victor-Jung Feb 23, 2026
6b68b73
Live print of the simulator cmd
Victor-Jung Feb 23, 2026
e541c87
Initial Training platform
runwangdl Feb 25, 2026
8fbf03b
Updated training update with gradient accumulation and optimizer update
runwangdl Feb 26, 2026
54f5832
Add MLP_Train Test
runwangdl Feb 26, 2026
675dc33
Fix imports
Victor-Jung Feb 26, 2026
c999a2e
Added support for Eggroll ZO perturbation
JanCSEM Feb 26, 2026
4df0282
Add training test harness for GAP9
Victor-Jung Feb 27, 2026
65aa38c
Debug eggroll kernel
JanCSEM Feb 28, 2026
93724ee
Port Noise kernels to GAP9
JanCSEM Feb 28, 2026
c705477
HACK: explicitly import PULP kernels into network.c
JanCSEM Feb 28, 2026
49a9de6
HACK: Explicitly import PULP kernels into GAP9's network.c
JanCSEM Feb 28, 2026
af7f5d4
HACK: Explicitly import PULP kernels into GAP9's network.c
JanCSEM Feb 28, 2026
69ebf5a
merge sleepvit
JanCSEM Feb 28, 2026
962d809
Fix commented code
Victor-Jung Mar 2, 2026
3bdc0ff
Temporal Changes for Multi-Ouput Kernels to fit the new testtraining …
runwangdl Mar 2, 2026
4f949b0
Add Small Conv+Transformer Test for training untiled platform
runwangdl Mar 2, 2026
a93b6d1
Enable Transformer training for Siracusa GVSoC
Victor-Jung Mar 2, 2026
a77ead9
Alloc failure on board during first inference + bp
Victor-Jung Mar 2, 2026
6210437
RISCV-SUMMIT Demo
runwangdl Mar 3, 2026
e98b7c1
Avoid generation redundant memory copy for the same input during mult…
runwangdl Mar 2, 2026
6916a80
Add relu grad
runwangdl Mar 3, 2026
b060e67
Wrong Free of aliased_input
runwangdl Mar 3, 2026
99bbbf6
LATEST DEMO for RISCV SUBMIT
runwangdl Mar 3, 2026
fd3722e
[GAP9] Loop properly around dateset in train harness
Victor-Jung Mar 4, 2026
565125a
[GAP9] Add relugrad + add debugprint fp32 +
Victor-Jung Mar 4, 2026
9c918bc
Set layer norm esp to 0.0001
Victor-Jung Mar 4, 2026
1f82f07
Adding noise kernel microbenchmarks
JanCSEM Mar 6, 2026
a1a5369
Optimized kernels with function call removals and loop unrolling
JanCSEM Mar 9, 2026
90d75ad
Fix redundant seed mixing
JanCSEM Mar 9, 2026
7a7564e
.gitignore
Mar 10, 2026
f45542a
fix merge conflicts
Mar 10, 2026
43a798b
Optimize Ziggurat and Box-muller kernels
Mar 13, 2026
43902a6
update LiteCNN tests, and added QLiteCNN tests
JanCSEM Mar 13, 2026
143ad15
add labels to ZO graph inputs
JanCSEM Mar 13, 2026
c18af62
Fix inputs.npz for ZO graphs
JanCSEM Mar 13, 2026
66888da
update inputs.npz
JanCSEM Mar 13, 2026
769abe7
Rename loss to log prob to not upset deeploy
JanCSEM Mar 13, 2026
0f7c1a2
ZO softmax cross entropy debug
Mar 14, 2026
12f311a
Fix issues with shapes being integer
JanCSEM Mar 14, 2026
8483963
Fix QLiteCNN, add support for Tiled Quant/Dequant nodes
JanCSEM Mar 14, 2026
701e60c
Noise test updates
JanCSEM Mar 15, 2026
871d07b
Add integer Random Noise kernels
JanCSEM Mar 15, 2026
1b01b67
Add RQS Kernel tests
JanCSEM Mar 15, 2026
b68ad2d
Bug fixes on quantized noise graphs
JanCSEM Mar 16, 2026
7c50c5c
Debug RQSPerturb tiling and type checks for quantized nodes
JanCSEM Mar 17, 2026
c0ab98c
Add Passing QLiteCNN, QSleepViT and ZO versions
JanCSEM Mar 17, 2026
61ebc20
update zo_models_benchmarks
JanCSEM Mar 21, 2026
849f54a
Optimize QSleepViT memory
JanCSEM Mar 21, 2026
ebb7228
update tests with non-tiled versions matching python model
JanCSEM May 8, 2026
bd4d0e4
Add exception o convert requant to cmsis if onputs come from rqspertu…
JanCSEM May 8, 2026
be86c5a
Separate quantization kernels
JanCSEM May 8, 2026
729454c
update perturbation templates
JanCSEM May 8, 2026
0cdba94
Workaround o3 optimization issue for randomnoise.c
JanCSEM May 8, 2026
239d657
Add cores argument to DeeployRunner instead of using default
JanCSEM May 8, 2026
d588548
Update C perturbation kernels
JanCSEM May 8, 2026
fe366d1
restore rounding in adddrequantmergepAss
JanCSEM May 8, 2026
dcece0b
Add ReLU6 support
JanCSEM Jun 8, 2026
b20f8c1
Add MCUNet and TSDR model tests, Fp32, Int8, ZO
JanCSEM Jun 8, 2026
12dfe73
Remove integer logic in FloatReduceMeanTemplate
JanCSEM Jun 8, 2026
842c877
Add number of corse information print
JanCSEM Jun 8, 2026
6166ba3
Fix GAP9 c kernels imports
JanCSEM Jun 8, 2026
5bb97a1
Add mchan 7 api for data transfers
JanCSEM Jun 8, 2026
274ce1b
Fix datatype in GAP9L3DMA node template
JanCSEM Jun 8, 2026
36c6679
Allow channel-wise tiling in addition to geometric tiling for DWconv,…
JanCSEM Jun 8, 2026
4d897c4
FIX fp64 to fp32 representability check
JanCSEM Jun 8, 2026
397c345
On board L3 Bug due to wrong gapy command sequencing and duplicate in…
runwangdl Jun 9, 2026
7148813
Update Changelog
runwangdl Jun 9, 2026
e5d20af
Fix typo in DWConvTileConstraint
JanCSEM Jun 9, 2026
e739c43
Add support for per tile random seeding
JanCSEM Jun 9, 2026
faca7c3
Improve ZO perturbation scheduling
JanCSEM Jun 9, 2026
e1f36d9
update excution.py
JanCSEM Jun 9, 2026
fcc61c6
Merge GAP9 L3 fix*
JanCSEM Jun 9, 2026
24f7f59
Setup script for on-board exp
JanCSEM Jun 9, 2026
f614244
fix sys import missing
JanCSEM Jun 9, 2026
b77f480
Fix RelU6 parser
JanCSEM Jun 9, 2026
5ba9477
Post GAP9 sdk fix merge leftover commit
JanCSEM Jun 9, 2026
92db04b
make output checks more verbose
JanCSEM Jun 9, 2026
f3a8bb0
temp L3 conv debug logic
JanCSEM Jun 9, 2026
f770d6e
Add debug MCUNet nets
JanCSEM Jun 9, 2026
0c35e14
REplace broken pi_cl_copy_2d by 1D
JanCSEM Jun 9, 2026
ddfcc7b
revert L3Dma change
JanCSEM Jun 10, 2026
83ee103
Debug L3 logic
JanCSEM Jun 10, 2026
a2d347d
Zero init L1/L2
JanCSEM Jun 10, 2026
a9235bc
Zero init L3 debug
JanCSEM Jun 10, 2026
6dbd489
Check flash -> L3 loading
JanCSEM Jun 10, 2026
79ae082
debug prints of pre transposed Conv
JanCSEM Jun 10, 2026
6ef25f9
debug prints of pre transposed Conv
JanCSEM Jun 10, 2026
2411c2c
L3 debug: weight corruption in conv layers
JanCSEM Jun 11, 2026
87d8f81
L3 weigths debug : dory mem.
JanCSEM Jun 11, 2026
e799af5
L3 Debug weights, flash -> L3
JanCSEM Jun 11, 2026
6e2fd28
Debug: L3 -flash continued
JanCSEM Jun 11, 2026
207474a
Add gap9 board runner patch for last SPI flash write
JanCSEM Jun 11, 2026
b2158e1
Remove Debug Logic after Fix in GAP9 SDK
JanCSEM Jun 11, 2026
f4d11ab
Fix DWTileConstraint and replace perturbation for fp32 ZO MCUNet and …
JanCSEM Jun 11, 2026
968094d
Optimize QMCUNet to have a quantized input
JanCSEM Jun 11, 2026
c9b118e
Debug hang in QSTDR Dequant
JanCSEM Jun 11, 2026
d749c50
Debug Hang QTSDR continued
JanCSEM Jun 11, 2026
af9fa5f
Debug QSTDR
JanCSEM Jun 11, 2026
e0b9cc8
Debug hang QSTDR: pulp BEacon
JanCSEM Jun 11, 2026
1 change: 1 addition & 0 deletions .github/workflows/ci-platform-gap9-tiled.yml
@@ -25,6 +25,7 @@ concurrency:

jobs:
select-env:
if: github.repository == 'pulp-platform/Deeploy'
uses: ./.github/workflows/_select-env.yml
with:
docker_image_deeploy: ${{ github.event.inputs.docker_image_deeploy || 'ghcr.io/pulp-platform/deeploy-gap9:devel' }}
1 change: 1 addition & 0 deletions .github/workflows/ci-platform-gap9.yml
@@ -26,6 +26,7 @@ concurrency:

jobs:
select-env:
if: github.repository == 'pulp-platform/Deeploy'
uses: ./.github/workflows/_select-env.yml
with:
docker_image_deeploy: ${{ github.event.inputs.docker_image_deeploy || 'ghcr.io/pulp-platform/deeploy-gap9:devel' }}
4 changes: 3 additions & 1 deletion .gitignore
@@ -52,10 +52,12 @@ DeeployTest/Tests/**/*.json
DeeployTest/Tests/**/generateTest.py
DeeployTest/out.txt

venv/
**/.venv/
CHANGELOG_GEN.md

# Container Artifacts
.pyusbip/
.cache/

CLAUDE.md
CLAUDE.md
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
- Update CLI interface Across Project, Fix Tutorial, and Remove Legacy Test [#157](https://github.com/pulp-platform/Deeploy/pull/157)
- Fix for python error when using python 3.12.11 [#189]( https://github.com/pulp-platform/Deeploy/pull/189)
- Add support for Operators for Generic target needed in MAGIA [#193]( https://github.com/pulp-platform/Deeploy/pull/193)
- Fix GAP9 L3 Board Tests: readfs Flash Ordering and Duplicate Input Data [#196](https://github.com/pulp-platform/Deeploy/pull/196)

### Added
- Add many missing docstrings
@@ -42,6 +43,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
- PULP-NN moved to TargetLibraries third-party folder
- Aligned CLI commands across the project
- Added @runwangdl as a code owner
- Skip emitting duplicate `testInputVector` data for inputs placed in L3 (loaded at runtime from the readfs hex instead), reducing test binary size

### Fixed
- Add missing `shell: bash` directive to CI cache generation steps to ensure correct shell execution
@@ -54,6 +56,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
- Fix tiling variable replacement corrupting static arrays by changing pointer update from value copy to address reassignment
- Reduce RunNetwork stack usage by scoping per-layer variables with braces and moving tileIdxPtr allocation into per-layer execution blocks
- Fix invalid escape sequence python error in DeeployTypes.py: appearing when using pytest to launch regressions
- Fix GAP9 board tests with `--defaultMemLevel L3` reading garbage inputs: place all gapy `--flash-property` options before the positional subcommand and use `image flash run` so the readfs partition (input hex files) is flashed to the device

### Removed
- `testDMA.py` was an old test; we now have `test_dmas.py` instead.
2 changes: 0 additions & 2 deletions CMakeLists.txt
@@ -36,7 +36,6 @@ elseif(platform STREQUAL PULPOpen)
elseif(platform STREQUAL GAP9)
message(STATUS "Building for platform 'GAP9'")
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})

# Select SDK config based on simulator type
if(SIMULATOR STREQUAL "board")
set(ENV{KCONFIG_CONFIG} DeeployTest/Platforms/GAP9/sdk_board.config)
@@ -45,7 +44,6 @@ elseif(platform STREQUAL GAP9)
set(ENV{KCONFIG_CONFIG} DeeployTest/Platforms/GAP9/sdk_gvsoc.config)
message(STATUS "[GAP9] Using GVSoC SDK configuration")
endif()

include($ENV{GAP_SDK_HOME}/utils/cmake/setup.cmake)
elseif(platform STREQUAL Generic)
message(STATUS "Building for platform 'Generic'")
27 changes: 21 additions & 6 deletions Deeploy/AbstractDataTypes.py
@@ -289,13 +289,28 @@ def checkValue(cls, value: Union[float, Iterable[float], np.ndarray], ctxt: Opti
continue

# Check if exponent is representable.
if (cls.typeExponentOffset + exponent) > cls.typeExponentMax or (cls.typeExponentOffset + exponent) < 0:
return False

# Check if mantissa is representable. Implicit assumption is that cls.typeMantissa < 52 (like in FP64)
truncated_mantissa = 1 + math.floor((2**cls.typeMantissa) * (mantissa - 1)) / (2**cls.typeMantissa)
if math.fabs(truncated_mantissa - mantissa) > 0.0:
biased_exp = cls.typeExponentOffset + exponent
if biased_exp > cls.typeExponentMax:
return False
elif biased_exp >= 1:
# Normal number: check if mantissa is representable.
# Implicit assumption is that cls.typeMantissa < 52 (like in FP64)
truncated_mantissa = 1 + math.floor((2**cls.typeMantissa) * (mantissa - 1)) / (2**cls.typeMantissa)
if math.fabs(truncated_mantissa - mantissa) > 0.0:
return False
else:
# Subnormal candidate (biased_exp <= 0).
# Minimum subnormal has biased_exp = 1 - typeMantissa (one ULP above zero).
if biased_exp < (1 - cls.typeMantissa):
return False
# Value = mantissa * 2^exponent must be an integer multiple of the subnormal LSB
# (2^(1 - typeExponentOffset - typeMantissa)). The number of LSBs is:
# mantissa * 2^(biased_exp - 1 + typeMantissa)
# which must be an exact integer for the value to be representable.
shift = biased_exp - 1 + cls.typeMantissa
mantissa_bits_float = mantissa * (2**shift)
if math.fabs(mantissa_bits_float - round(mantissa_bits_float)) > 0.0:
return False

return True
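The rewritten `checkValue` branch separates overflow, normal values, and subnormal candidates. The standalone function below is a minimal sketch of the same test, not Deeploy's code: `is_representable` and its parameters are hypothetical, and it assumes IEEE-754-style formats in which `exponent_max` is the largest biased exponent of a finite number (254 for FP32, 30 for FP16). Deeploy's actual `typeExponentMax` values are not shown in this diff.

```python
import math


def is_representable(value: float, mantissa_bits: int, exponent_offset: int, exponent_max: int) -> bool:
    """Check whether a Python float (FP64) is exactly representable in a smaller
    binary format with `mantissa_bits` fraction bits, exponent bias
    `exponent_offset`, and largest finite biased exponent `exponent_max` (assumed)."""
    if value == 0.0:
        return True  # zero is representable in every format
    if math.isinf(value) or math.isnan(value):
        return False  # this sketch only handles finite values
    # frexp returns |value| = m * 2**e with m in [0.5, 1); rescale to [1, 2).
    m, e = math.frexp(abs(value))
    mantissa, exponent = 2.0 * m, e - 1
    biased_exp = exponent_offset + exponent

    if biased_exp > exponent_max:
        return False  # too large: would overflow
    if biased_exp >= 1:
        # Normal number: every fraction bit beyond `mantissa_bits` must be zero.
        return (mantissa * 2**mantissa_bits).is_integer()
    # Subnormal candidate: anything below the smallest subnormal is not representable.
    if biased_exp < 1 - mantissa_bits:
        return False
    # The value must be a whole number of subnormal LSBs,
    # i.e. mantissa * 2**(biased_exp - 1 + mantissa_bits) must be an integer.
    return (mantissa * 2.0**(biased_exp - 1 + mantissa_bits)).is_integer()
```

With the FP32 parameters `(23, 127, 254)`, the smallest subnormal `2.0**-149` is accepted, while `2.0**-150` and `1.5 * 2.0**-149` are rejected because neither is a whole number of subnormal LSBs.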

Original file line number Diff line number Diff line change
@@ -35,6 +35,10 @@ def _convert_requant_to_cmsis_fun(graph: gs.Graph, match: Match, name: str):
if 'Emulate_CMSIS_RequantShift' in rqs.attrs:
return graph

# Skip if inputs are not constants (e.g., when modified by perturbation nodes)
if not isinstance(rqs.inputs[-1], gs.Constant) or not isinstance(rqs.inputs[-2], gs.Constant):
return graph

# WIESEP: Because CMSIS performs add-multiply-divide and we normally do multiply-add-divide
# we can emulate the same behavior by rounding the MUL value
rqs.inputs[-1].values = np.round(copy.deepcopy(rqs.inputs[-1].values) /
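The new early return leaves a RequantShift node untouched whenever its last two inputs (the `mul` and `add` parameters) are not compile-time constants, since a tensor produced by a perturbation node has no `values` to round ahead of time. The check can be illustrated without onnx-graphsurgeon; the `Constant`, `Variable`, and `Node` dataclasses below are hypothetical stand-ins for `gs.Constant`, `gs.Variable`, and `gs.Node`, and `should_emulate_cmsis` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Constant:
    """Stand-in for gs.Constant: values are known when the graph is compiled."""
    name: str
    values: List[float]


@dataclass
class Variable:
    """Stand-in for gs.Variable: produced at runtime, e.g. by a Perturb node."""
    name: str


@dataclass
class Node:
    op: str
    inputs: List[Union[Constant, Variable]]
    attrs: Dict[str, object] = field(default_factory=dict)


def should_emulate_cmsis(rqs: Node) -> bool:
    """Mirror the two early returns of the pass (hypothetical helper)."""
    if "Emulate_CMSIS_RequantShift" in rqs.attrs:
        return False  # node was already rewritten
    # Only constant parameters can be pre-rounded at compile time.
    return isinstance(rqs.inputs[-1], Constant) and isinstance(rqs.inputs[-2], Constant)
```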
Original file line number Diff line number Diff line change
@@ -245,8 +245,15 @@ def _NCHWtoNHWC_fun(graph: gs.Graph, match: Match, name: str, default_channels_f
if node.op in ["Conv", "RequantizedConv"]:
# In the case of Conv: [weights, opt. bias], RequantizedConv: [weights, mul, add, opt. shift]
for tensor in node.inputs[1:]:
_transformLayoutConst(tensor, spatialDims, default_channels_first)

# Standard case: The weight is a direct constant input.
if isinstance(tensor, gs.Constant):
_transformLayoutConst(tensor, spatialDims, default_channels_first)

# MeZO case: The weight is produced by a Perturb node.
elif isinstance(tensor, gs.Variable):
if len(tensor.shape) > 1:
permute_temp = _transformLayoutPermutation(len(tensor.shape), spatialDims, default_channels_first)
graph.nodes.append(_appendTranspose(tensor, node, permute_temp))
node.attrs["channels_first"] = default_channels_first

return graph
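When a Conv weight is a `gs.Variable` (for example, the output of a MeZO Perturb node), its values cannot be relaid offline, so the pass inserts a Transpose in front of the consumer instead. The permutation for that Transpose can be sketched as below, assuming `_transformLayoutPermutation` returns the usual channels-first to channels-last permutation (its body is not part of this diff); `channels_last_permutation` and `permute_shape` are hypothetical helpers.

```python
from typing import List, Sequence


def channels_last_permutation(rank: int, spatial_dims: int) -> List[int]:
    """Permutation that moves the channel axis (directly before the spatial axes)
    to the end, assuming a [leading..., C, spatial...] layout such as NCHW or an
    ONNX Conv weight [O, I, kH, kW]."""
    if rank < spatial_dims + 1:
        raise ValueError("rank must include a channel axis in front of the spatial axes")
    channel_axis = rank - spatial_dims - 1
    leading = list(range(channel_axis))
    spatial = list(range(channel_axis + 1, rank))
    return leading + spatial + [channel_axis]


def permute_shape(shape: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Output shape of an ONNX Transpose: out[i] = shape[perm[i]]."""
    return [shape[axis] for axis in perm]
```

For a rank-4 weight with two spatial dimensions this yields `[0, 2, 3, 1]`, turning an `[O, I, kH, kW]` shape into `[O, kH, kW, I]`.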
38 changes: 34 additions & 4 deletions Deeploy/DeeployTypes.py
@@ -344,7 +344,7 @@ def has_live_aliases(self, ctxt: NetworkContext) -> bool:
next = queue.pop()
buffNext = ctxt.lookup(next)
assert isinstance(buffNext, VariableBuffer)
live |= buffNext._live
live |= buffNext._live or (next in ctxt.globalObjects)
visited.add(next)
queue |= buffNext.aliases - visited
return live
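The one-line change makes any global buffer reached through the alias graph count as live, since global buffers are never deallocated. A minimal sketch of the traversal with plain sets and dicts follows; all names are hypothetical, and starting from the buffer's direct aliases (rather than the buffer itself) is an assumption, because the start of the real method is outside this hunk.

```python
from typing import Dict, Set


def has_live_aliases(start: str, aliases: Dict[str, Set[str]], live: Set[str], global_names: Set[str]) -> bool:
    """Walk the alias graph reachable from `start` and report whether any alias is live.

    A buffer counts as live if it is marked live or if it is a global buffer,
    mirroring `live |= buffNext._live or (next in ctxt.globalObjects)`.
    """
    queue: Set[str] = set(aliases.get(start, set()))  # assumption: begin at the direct aliases
    visited: Set[str] = {start}
    result = False
    while queue:
        name = queue.pop()
        result |= (name in live) or (name in global_names)
        visited.add(name)
        # Follow aliases of aliases, never revisiting a buffer.
        queue |= aliases.get(name, set()) - visited
    return result
```

Counting globals as live presumably keeps a buffer that shares storage with a global one from being treated as freeable.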
@@ -359,7 +359,10 @@ def sizeInBytes(self) -> int:
Size of this VariableBuffer in bytes

"""
return (math.prod(self.shape) * (self._type.referencedType.typeWidth)) // 8
if isinstance(self.shape, int):
return (self.shape * (self._type.referencedType.typeWidth)) // 8
else:
return (math.prod(self.shape) * (self._type.referencedType.typeWidth)) // 8


class TransientBuffer(VariableBuffer):
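The `sizeInBytes` fix accepts a shape stored as a bare `int` alongside the usual sequence, because `math.prod` raises a `TypeError` when given an `int`. A standalone sketch of the same computation, with the hypothetical name `size_in_bytes` and the element width passed in bits:

```python
import math
from typing import Sequence, Union


def size_in_bytes(shape: Union[int, Sequence[int]], type_width_bits: int) -> int:
    """Byte size of a buffer whose shape may be a bare int (treated as a 1-D length)."""
    if isinstance(shape, int):
        num_elements = shape
    else:
        num_elements = math.prod(shape)  # math.prod(()) == 1, so an empty shape counts one element
    # Floor division, as in the diff: sub-byte totals are rounded down.
    return (num_elements * type_width_bits) // 8
```

As in the original expression, the `// 8` floors, so a 3-element buffer of 4-bit values reports 1 byte rather than 2.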
@@ -1322,6 +1325,10 @@ def typeCheckNodeInputs(self, ctxt: NetworkContext, node: gs.Node) -> bool:
reference._instance = _type(inputNode.name, ctxt)
else:
retCheck &= reference._type.referencedType == _type.referencedType

# if node.name == "GradientAccumulator1_InPlaceAccumulatorV2_backward" and retCheck == False:
# import IPython; IPython.embed()

return retCheck

def typeInferGlobalCtxt(self, ctxt: NetworkContext, node: gs.Node) -> NetworkContext:
@@ -2099,11 +2106,15 @@ def bind(self, ctxt: NetworkContext) -> Tuple[NetworkContext, bool]:
# Update shapes and types of tensors in onnx graph based on type inference after binding
for node in (self.node.inputs + self.node.outputs):
if ctxt.is_local(node.name):
if not hasattr(ctxt.localObjects[node.name], '_type'):
continue # skip untyped buffers (e.g. ReduceSum axes, MaxPool mask)
node.shape = ctxt.localObjects[node.name].shape
npType = self._broadcastToNpType(ctxt.localObjects[node.name]._type)
if npType is not None:
node.dtype = npType
elif ctxt.is_global(node.name):
if not hasattr(ctxt.globalObjects[node.name], '_type'):
continue # skip untyped global buffers
npType = self._broadcastToNpType(ctxt.globalObjects[node.name]._type)
if isinstance(ctxt.globalObjects[node.name], ConstantBuffer):
if isinstance(node, gs.Constant):
@@ -2854,6 +2865,12 @@ def generateInferenceInitializationCode(self) -> str:
if isinstance(node, StructBuffer):
continue

# Skip local buffers that were registered but never typed (e.g. optional ONNX
# outputs like the MaxPool indices/mask tensor). These are not referenced by any
# template and must not be emitted as C declarations.
if not hasattr(node, '_type'):
continue

name = node.name
node.name = self.ctxt._mangle(node.name)
callStack += node.init()
@@ -2898,10 +2915,11 @@ def generateIOBufferInitializationCode(self) -> str:

callStack += "static const uint32_t " + self.ctxt._mangle("num_inputs") + f" = {len(inputs)};"
callStack += "static const uint32_t " + self.ctxt._mangle("num_outputs") + f" = {len(outputs)};"

callStack += "static const uint32_t seed = 12345;" # fixed seed for reproducibility
callStack += "static const uint32_t perturbation_sign = 1;" # fixed sign for reproducibility
callStack += "extern void* " + self.ctxt._mangle("inputs") + f"[{len(inputs)}];"
callStack += "extern void* " + self.ctxt._mangle("outputs") + f"[{len(outputs)}];"

callStack += "static const uint32_t " + self.ctxt._mangle("inputs_bytes") + f"[{len(inputs)}] = " + "{"

numBytes = []
@@ -2954,6 +2972,8 @@ def generateBufferInitializationCode(self) -> str:
callStack = ''
for node in ctxt.globalObjects.values():
if isinstance(node, VariableBuffer) and not isinstance(node, StructBuffer):
if not hasattr(node, '_type'):
continue # skip untyped buffers (e.g. ReduceSum axes constants)
assert issubclass(node._type, Pointer), f"Global VariableBuffer {node.name} is not a Pointer!"
if node._deploy:
name = node.name
@@ -2999,6 +3019,8 @@ def generateBufferAllocationCode(self) -> str:

for node in ctxt.globalObjects.values():
if isinstance(node, VariableBuffer) and not isinstance(node, StructBuffer):
if not hasattr(node, '_type'):
continue # skip untyped buffers (e.g. ReduceSum axes constants)
assert issubclass(node._type, Pointer), f"Global VariableBuffer {node.name} is not a Pointer!"
if node._deploy:
name = node.name
@@ -3063,6 +3085,8 @@ def generateIncludeString(self) -> str:
for engine in self.Platform.engines:
for include in engine.includeList:
includeStr += ["#include \"" + include + "\""]
if engine.name == "GAP9Cluster":
includeStr += ["#include \"kernel/RandomNoise.h\""]
return ("\n").join(includeStr)

def generateEngineInitializationCode(self) -> str:
@@ -3124,6 +3148,10 @@ def _exportGraph(self, folderPath, fileName):
if tensor.dtype != tensor.export_dtype:
tensor.values = tensor.values.astype(tensor.export_dtype)

# JANSNO: Shapes of tensors should never be an int.
for tensor in self.graph.tensors().values():
if tensor.shape is not None and isinstance(tensor.shape, int):
tensor.shape = [tensor.shape]
model = gs.export_onnx(self.graph)

# Annotate additional information in doc_string of tensors
@@ -3536,6 +3564,8 @@ def _printMemorySummary(self):
if isinstance(_buffer, ConstantBuffer) or (isinstance(_buffer, VariableBuffer) and _buffer._deploy):
# SCHEREMO: We only
if (hasattr(_buffer, "_memoryLevel") and _buffer._memoryLevel == level) or level == "None":
if not hasattr(_buffer, '_type'):
continue # skip untyped buffers (e.g. ReduceSum axes constants)
staticSize += int((np.prod(_buffer.shape) * _buffer._type.referencedType.typeWidth // 8))
else:
log.warning(f"Buffer {_buffer.name} does not have a valid memory level")
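Several generators in this diff now skip buffers that never received a `_type`; the comments cite ReduceSum axes constants and the MaxPool indices/mask output. The memory-summary variant of that guard can be sketched with a hypothetical `Buffer` class whose `_type_width` attribute (bits per element) stands in for `_type.referencedType.typeWidth` and only exists once the buffer has been typed. `static_size` is likewise an illustrative name.

```python
import math
from typing import Iterable, Optional, Sequence


class Buffer:
    """Hypothetical stand-in for a Deeploy buffer; `_type_width` is only set once typed."""

    def __init__(self, name: str, shape: Sequence[int], memory_level: str, type_width: Optional[int] = None):
        self.name = name
        self.shape = shape
        self._memoryLevel = memory_level
        if type_width is not None:
            self._type_width = type_width  # bits per element, assigned during type inference


def static_size(buffers: Iterable[Buffer], level: str) -> int:
    """Sum the byte size of typed buffers placed on `level`, skipping untyped ones."""
    total = 0
    for buf in buffers:
        if buf._memoryLevel != level:
            continue
        if not hasattr(buf, "_type_width"):
            continue  # e.g. an unused optional output that never received a type
        total += math.prod(buf.shape) * buf._type_width // 8
    return total
```

Checking with `hasattr` keeps the guard working for objects that were registered in the context but never went through type inference, without requiring a sentinel value on every buffer.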