Allow ConvGradW C_out tiling to fit dW in smaller L1 budgets
Root cause: ConvGradWTileConstraintBase.addPolicyConstraint hard-pinned
all four dW dimensions to their full shape and also forced dyName[1]
(the C_out dimension of dY) to its full size. At L1=128KB, the
accumulation target dW for layer3.conv2 of ResNet8 (64x64x3x3x4 =
147456 B) alone exceeds the 131072 B L1 budget, which makes the OR-Tools
geometric model infeasible.
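A quick arithmetic check of the numbers above. It assumes the trailing x4 means 4 bytes per fp32 element and that 128KB means 131072 B. Under those assumptions, one C_out slice of dW is 2304 B. That caps a C_out tile at 56 channels even before any other L1 buffer is counted:

```python
# Worked check of the sizes quoted above for layer3.conv2 of ResNet8.
# Assumption: 4 bytes per element (fp32) and 128KB = 128 * 1024 B.
C_OUT, C_IN, KH, KW = 64, 64, 3, 3
BYTES_PER_ELEM = 4
L1_BUDGET = 128 * 1024

dw_bytes = C_OUT * C_IN * KH * KW * BYTES_PER_ELEM  # whole dW, all dims pinned
slice_bytes = C_IN * KH * KW * BYTES_PER_ELEM       # one C_out slice of dW

# Upper bound on Cout_tile if dW were the only buffer in L1 (it is not).
max_cout_tile = L1_BUDGET // slice_bytes

print(dw_bytes, L1_BUDGET, dw_bytes > L1_BUDGET)  # 147456 131072 True
print(slice_bytes, max_cout_tile)                 # 2304 56
```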
Fix: dW[C_out, C_in, kH, kW] has the property that each C_out slice is
computed independently (dW[co] = sum_nhw dY[n,co,h,w] * X[n,:,...]). Drop
the full-size constraint on C_out (dW dim 0 and dyName dim 1). Keep
C_in/kH/kW pinned so that each tile is a contiguous sub-range of the dW
buffer: only the leading dimension is tiled, so a 1D DMA is safe.
Extend serializeTilingSolution with an outer loop over C_out tiles that
pulls Cout_tile_max from the tiler solution, emits per-tile
HyperRectangles with the correct C_out offset, and propagates the tile
size into the ch_im_out replacement. When Cout_tile == Cout_full the
loop runs once, so previously working configurations (e.g. ResNet8 at
L1=300KB, DSCNN) are unchanged.
Verified:
- ResNet8 L1=300KB L3: tiling still feasible (previously working)
- ResNet8 L1=128KB L3: ConvGradW is no longer the blocker; the ConvGradX
  full-weight constraint is now the blocker for layer3.conv2 at 128KB
  and needs C_in tiling + 2D strided DMA (plan B)
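Plan B needs a 2D strided DMA, and the reason shows up when you list the contiguous runs a tile covers. This sketch assumes the weight read by ConvGradX uses the same row-major [C_out, C_in, kH, kW] layout as dW; that is an assumption, and contiguous_runs is a hypothetical helper. A C_out tile is one run. A C_in tile with full C_out instead splits into C_out runs of equal length at a fixed stride, which amounts to a (count, length, stride) 2D transfer:

```python
# Hypothetical helper: which contiguous element runs does a tile cover
# in a row-major buffer? Used to contrast C_out tiling with C_in tiling.
def contiguous_runs(shape, offset, dims):
    """Return merged (start, length) runs, in elements, covered by the
    hyperrectangle (offset, dims) inside a row-major buffer of `shape`."""
    strides = [1] * len(shape)
    for k in range(len(shape) - 2, -1, -1):
        strides[k] = strides[k + 1] * shape[k + 1]
    last = len(shape) - 1
    runs = []

    def walk(d, base):
        if d == last:
            start = base + offset[d] * strides[d]
            if runs and runs[-1][0] + runs[-1][1] == start:
                runs[-1] = (runs[-1][0], runs[-1][1] + dims[d])  # extend previous run
            else:
                runs.append((start, dims[d]))
            return
        for i in range(offset[d], offset[d] + dims[d]):
            walk(d + 1, base + i * strides[d])

    walk(0, 0)
    return runs

shape = (64, 64, 3, 3)  # [C_out, C_in, kH, kW] of layer3.conv2
cout_runs = contiguous_runs(shape, (16, 0, 0, 0), (16, 64, 3, 3))
cin_runs = contiguous_runs(shape, (0, 16, 0, 0), (64, 16, 3, 3))

print(cout_runs)                                # [(9216, 9216)]
print(len(cin_runs), cin_runs[0], cin_runs[1])  # 64 (144, 144) (720, 144)
```

The C_in tile therefore maps to 64 runs of 144 elements spaced 576 elements apart. A single 1D transfer cannot express that pattern.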