[GAP9] gate NE16AdjustGEMMWeightLayoutPass on engine == "NE16"

runwangdl · runwangdl · commit e010a75cf467 · 2026-05-14T21:01:58.000Z
The pass was added during the NE16 Linear PR integration (6c8ae2b) and matches every Gemm/RequantizedGemm node without checking the engine attribute, so cluster-bound GEMMs (e.g. MLPerf AnomalyDetection's 10 Gemm+RQ layers, which never run on NE16) had their mul/bias rewritten into the NE16 scale/scale_n/shift-diff layout. The cluster pulp_nn_linear kernel then consumed the rewritten constants under its original integer contract and produced ±1 mismatches versus the int8 reference outputs.

Mirror the existing NE16AdjustWeightMemoryLayoutPass: bail out for nodes whose engine attr isn't "NE16". Pure-GAP9 cluster Gemms keep Deeploy's Generic + PULPGEMMRequantMergePass layout (including the bias += div/2 rounding compensation), matching the reference.

gvsoc gap9.evk (Models/MLPerf/AnomalyDetection L1=64000):
- before: 33/640 errors (all ±1), Runtime 89110 cycles
- after: 0/640 errors, Runtime 79332 cycles
- devel base 3b011bb (where the bug doesn't exist): 0/640, 78500 cycles

gap9_tiled L2 single-buffer models go from 9/11 → 10/11 pass. The remaining failure (MLPerf/ImageClassification, parser backtracking on a standalone RequantShift node) is unrelated to GEMM and pre-dates this fix.
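
The gating behaviour described above can be illustrated with a minimal, self-contained sketch. It is not Deeploy's code: `FakeNode`, `is_ne16_node`, `adjust_layouts`, the `"layout"` marker attribute, and the `"PULPCluster"` engine name are hypothetical stand-ins chosen for illustration. Only the `node.attrs.get("engine") != "NE16"` check is taken from the diff.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeNode:
    """Hypothetical stand-in for a graph node with an ``attrs`` dict."""
    op: str
    attrs: Dict[str, Any] = field(default_factory=dict)


def is_ne16_node(node: FakeNode) -> bool:
    # Same test as the gate in the diff: a missing "engine" attribute
    # yields None from .get(), which is also treated as "not NE16".
    return node.attrs.get("engine") == "NE16"


def adjust_layouts(nodes: List[FakeNode]) -> List[FakeNode]:
    """Mark only NE16-assigned nodes as rewritten; leave the rest untouched."""
    rewritten = []
    for node in nodes:
        if not is_ne16_node(node):
            continue  # cluster-bound node: keep its original constants
        # Placeholder for the real weight/scale rewrite (assumption).
        node.attrs["layout"] = "NE16"
        rewritten.append(node)
    return rewritten


nodes = [
    FakeNode("RequantizedGemm", {"engine": "NE16"}),
    FakeNode("RequantizedGemm", {"engine": "PULPCluster"}),  # hypothetical name
    FakeNode("Gemm"),  # no engine attribute at all
]
rewritten = adjust_layouts(nodes)
print([n.op for n in rewritten])
```

Using `.get("engine")` rather than indexing means nodes that were never assigned an engine are skipped instead of raising `KeyError`, which is the conservative choice for a pass that should only touch NE16 nodes.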
diff --git a/Deeploy/Targets/GAP9/TopologyOptimizationPasses/Passes.py b/Deeploy/Targets/GAP9/TopologyOptimizationPasses/Passes.py
@@ -42,6 +42,13 @@ def _ne16_adjust_gemm_weight_layout_fun(graph: gs.Graph, match: Match, name: str
     matched_nodes = list(match.nodes_map.values())
     node = matched_nodes[0]
 
+    # Only act on NE16-colored nodes. Cluster-bound Gemm/RequantizedGemm (e.g.
+    # AnomalyDetection's 10 Gemm+RQ layers in the MLPerf gap9-tiled model
+    # tests) must keep Deeploy's original mul / bias / no-scale_n layout so
+    # pulp_nn_linear stays bit-exact with the int8 reference outputs.
+    if node.attrs.get("engine") != "NE16":
+        return graph
+
     # Weight is input[1] for both Gemm and RequantizedGemm
     weightTensor = node.inputs[1]