This commit replaces 39bb8f1's experimental Gemm->MatMul lowering pass
(which unblocked the original KeyError 'C' but exposed a deeper Transpose
rank-mismatch bug downstream) with two smaller, locally-verified fixes:
1) Hoist a properly-shaped zero C tensor in GEMMRedmuleParser when an
ONNX Gemm has only A and B (e.g. backward GradFusedMatMul rewrites in
CCT_train). Fixes for the hoist path:
- GEMMRedmuleParser.__init__ used to set self.noBiasHoisting *before*
calling super().__init__(), but MatMulParser.__init__ also writes
self.noBiasHoisting from its own default of True, so the caller's
flag was silently clobbered. Reverse the order and forward the
kwarg (first sketch after this list).
- The hoist used to allocate a 1-element np.zeros((1)) scalar, which
can never satisfy RedmuleGEMMTileConstraint's "C dim equals
output dim" assertion. Allocate a zero array whose shape matches
node.outputs[0].shape instead (second sketch after this list).
- Pass _type=PointerClass(float32_t) to ctxt.hoistConstant so the
buffer is type-annotated up-front. Without it,
MemoryScheduler.getConstantTensorOffset later trips an
AttributeError on the un-annotated buffer.
- Append the hoisted Constant to node.inputs so the tiler picks
it up via its node.inputs + node.outputs walk, and register the
Gemm as a user via newCtxt.addUser so the
MemoryConstraintFlow kill-set assertion (which walks _users)
finds a consumer.
- Engine.GEMMMRedmuleMapper now instantiates with
noBiasHoisting=False so the hoist path is actually taken.
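A condensed sketch of the __init__ fix (Python; the class and kwarg names
follow the Deeploy parser hierarchy, but signatures are not verbatim):

    class GEMMRedmuleParser(MatMulParser):

        def __init__(self, noBiasHoisting: bool = True):
            # Call super() FIRST and forward the kwarg: MatMulParser.__init__
            # assigns self.noBiasHoisting from its own default of True, so an
            # assignment made before the super() call was silently clobbered.
            super().__init__(noBiasHoisting = noBiasHoisting)
            self.noBiasHoisting = noBiasHoisting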
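And a sketch of the hoist itself inside parseNodeCtxt (the context helpers
and the constant's name are assumptions for illustration, not verbatim code):

    import numpy as np
    import onnx_graphsurgeon as gs

    if len(node.inputs) == 2 and not self.noBiasHoisting:
        # Zero C shaped like the output, so RedmuleGEMMTileConstraint's
        # "C dim equals output dim" assertion holds.
        values = np.zeros(node.outputs[0].shape, dtype = np.float32)
        zeroC = gs.Constant(name = node.name + "_C", values = values)

        # Type-annotate the buffer up-front; an un-annotated buffer later
        # raises AttributeError in MemoryScheduler.getConstantTensorOffset.
        newCtxt.hoistConstant(zeroC, _type = PointerClass(float32_t))

        # Expose the tensor on the node for the tiler's
        # node.inputs + node.outputs walk, and register the Gemm as a
        # consumer for the MemoryConstraintFlow kill-set assertion,
        # which walks _users.
        node.inputs.append(zeroC)
        newCtxt.addUser(zeroC.name, node)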
Drop the BiaslessGemmToMatMulPass class (added in 39bb8f1) and its
Deployer registration: the parser-side hoist is the smaller fix and
side-steps the MatMul broadcasting issue entirely.
2) Fix Generic/TransposeTileConstraint and PULPOpen/TransposeTemplate to
use a *spatial-view* interpretation of perm. When MatMulLayer.
computeShapes broadens an already-existing tensor that simultaneously
feeds a forward MatMul as its B input *and* a downstream
non-broadening consumer (Gemm/Transpose), data_in and data_out of
that downstream Transpose can end up with different ranks. Both
addGeometricalConstraint and serializeTilingSolution previously
assumed len(perm) == data_in_rank == data_out_rank; they now offset
their shape lookups by len(shape) - len(perm) so that perm targets
the trailing spatial dims of either tensor. PULPTransposeTemplate's
alignToContext gets the same treatment for its dimLen_<idx> lookup
and parallelDim selection.
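A self-contained illustration of the shared offset logic (tensor and
variable names are assumed):

    def tieTransposeDims(data_in, data_out, perm):
        # Spatial-view interpretation: perm addresses the trailing
        # len(perm) dims of each tensor, so the two ranks may differ.
        inOff = len(data_in.shape) - len(perm)    # 0 in the aligned case
        outOff = len(data_out.shape) - len(perm)  # 0 in the aligned case
        for i, p in enumerate(perm):
            # Output spatial dim i is fed by input spatial dim p; the
            # constraint/template code indexes both shapes with these
            # offsets instead of assuming equal ranks.
            assert data_out.shape[outOff + i] == data_in.shape[inOff + p]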
Aligned cases (the existing kernel fixtures testFloatGEMM /
testFloatGEMMtransB) compute offsets of 0 and behave exactly as
before. The fix was verified locally on
Models/Training/CCT/cct_train: testMVPTraining.py and
testMVPOptimizer.py both exit 0 on Siracusa_w_redmule, producing a
~7.7 MB TrainingNetwork.c and a matching OptimizerNetwork.c.
C compilation + GVSoC simulation still need to be validated on CI
(can't run the runwangdl/gvsoc fork locally in the agent container).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>