
Commit e010a75

[GAP9] gate NE16AdjustGEMMWeightLayoutPass on engine == "NE16"
The pass was added during the NE16 Linear PR integration (6c8ae2b) and matches every Gemm/RequantizedGemm node without checking the engine attribute. As a result, cluster-bound GEMMs (e.g. MLPerf AnomalyDetection's 10 Gemm+RQ layers, which never run on NE16) had their mul/bias rewritten into the NE16 scale/scale_n/shift-diff layout. The cluster pulp_nn_linear kernel then consumed the rewritten constants under its original integer contract and produced ±1 mismatches versus the int8 reference outputs.

Mirror the existing NE16AdjustWeightMemoryLayoutPass: bail out for nodes whose engine attr isn't "NE16". Pure-GAP9 cluster Gemms keep Deeploy's Generic + PULPGEMMRequantMergePass layout (including the bias += div/2 rounding compensation), matching the reference.

gvsoc gap9.evk (Models/MLPerf/AnomalyDetection L1=64000):
- before: 33/640 errors (all ±1), Runtime 89110 cycles
- after: 0/640 errors, Runtime 79332 cycles
- devel base 3b011bb (where the bug doesn't exist): 0/640 errors, Runtime 78500 cycles

The gap9_tiled L2 single-buffer models go from 9/11 → 10/11 passing. The remaining failure (MLPerf/ImageClassification, parser backtracking on a standalone RequantShift node) is unrelated to GEMM and pre-dates this fix.
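
To illustrate why a layout mismatch around the bias += div/2 rounding compensation can show up as off-by-one errors, here is a minimal sketch. It is not the Deeploy pass or the pulp_nn_linear kernel: the function names, the shift-based division, and the sample constants are assumptions chosen only to show the arithmetic effect of adding or omitting the half-divisor offset.

```python
# Illustrative only: hypothetical requantization helpers, not Deeploy code.

def requant_floor(acc: int, mul: int, bias: int, log2_div: int) -> int:
    """Scale, add bias, then arithmetic right shift (rounds toward -inf)."""
    return (acc * mul + bias) >> log2_div


def requant_compensated(acc: int, mul: int, bias: int, log2_div: int) -> int:
    """Same as above, but the bias carries the div/2 rounding compensation."""
    div = 1 << log2_div
    return (acc * mul + (bias + div // 2)) >> log2_div


# Assumed sample constants: mul=3, bias=5, div=16.
MUL, BIAS, LOG2_DIV = 3, 5, 4

# Difference between the two conventions for a range of accumulator values.
diffs = [
    requant_compensated(acc, MUL, BIAS, LOG2_DIV) - requant_floor(acc, MUL, BIAS, LOG2_DIV)
    for acc in range(-64, 64)
]

for acc, d in zip(range(-64, 64), diffs):
    if d != 0:
        print(f"acc={acc}: outputs differ by {d}")
```

In this sketch the two conventions never differ by more than one, and they differ only for some inputs, which is the same shape of error as a small fraction of ±1 mismatches. Whether the real mismatch came from exactly this offset is not stated in the commit message.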
1 parent 438f100 commit e010a75

1 file changed

Lines changed: 7 additions & 0 deletions

Deeploy/Targets/GAP9/TopologyOptimizationPasses/Passes.py

Lines changed: 7 additions & 0 deletions
@@ -42,6 +42,13 @@ def _ne16_adjust_gemm_weight_layout_fun(graph: gs.Graph, match: Match, name: str
     matched_nodes = list(match.nodes_map.values())
     node = matched_nodes[0]
 
+    # Only act on NE16-colored nodes. Cluster-bound Gemm/RequantizedGemm (e.g.
+    # AnomalyDetection's 10 Gemm+RQ layers in the MLPerf gap9-tiled model
+    # tests) must keep Deeploy's original mul / bias / no-scale_n layout so
+    # pulp_nn_linear stays bit-exact with the int8 reference outputs.
+    if node.attrs.get("engine") != "NE16":
+        return graph
+
     # Weight is input[1] for both Gemm and RequantizedGemm
     weightTensor = node.inputs[1]
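
The guard in this hunk can be exercised in isolation. The sketch below is an assumption-laden stand-in: `FakeNode`, `adjust_layout_if_ne16`, and the `"cluster"` engine label are hypothetical names invented for illustration, and the actual layout rewrite is replaced by a flag. Only the `attrs.get("engine") != "NE16"` check is taken from the patch.

```python
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for a graph node; only `op` and `attrs` are modelled."""
    op: str
    attrs: dict = field(default_factory=dict)
    rewritten: bool = False


def adjust_layout_if_ne16(node: FakeNode) -> FakeNode:
    # Same gate as the patch: leave anything not explicitly colored "NE16" untouched.
    # A node with no "engine" attribute at all is also skipped, because
    # dict.get returns None, which is not equal to "NE16".
    if node.attrs.get("engine") != "NE16":
        return node
    node.rewritten = True  # placeholder for the real weight / mul / bias rewrite
    return node


nodes = [
    FakeNode("Gemm", {"engine": "NE16"}),
    FakeNode("RequantizedGemm", {"engine": "cluster"}),  # hypothetical engine label
    FakeNode("Gemm"),  # no engine attribute
]
flags = [adjust_layout_if_ne16(n).rewritten for n in nodes]
print(flags)
```

Using `attrs.get` rather than indexing means uncolored nodes fall through the early return instead of raising `KeyError`, so the pass is opt-in per node.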
