Commit d4596f8

debug: add PW ConvGradW entry probe for integration bug diagnosis
Temporary printf at the entry of PULP_PWConvGradW2d_fp32_fp32_fp32_CHW that dumps X[0] and dY[0].

Purpose: determine whether the mismatch in the integrated MobileNet, where blocks.0.pw dW[0] = 0.00177 against a PyTorch reference of 0.00295 (40% low), is caused by upstream data corruption or by the PW kernel itself.

Finding: X[0] and dY[0] in the simulator match PyTorch bit-exactly, so the input data feed is correct. The bug is in pulp_conv_pw_fp32_bw_param_grads_cl (or the mm_add it calls) when it is invoked with a C_out tile size < NUM_CORES (the blocks.0.pw tile has C_out=5 with 8 cores). The sim/ref ratio of 0.6 ≈ 5/8 suggests that ~3 cores' contributions are being lost. The isolated ConvGradW_PW_block_0 test uses the full C_out=16, so it does not hit this path.

Revert this printf after diagnosis.
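The "lost contributions" reading of the 0.6 ≈ 5/8 ratio can be checked with a small host-side model. This is only a sketch of the hypothesis, not the code in pulp_conv_pw_fp32_bw_param_grads_cl: the function name reduce_with_lost_cores, the equal slicing of the reduction across cores, and the uniform test data are all assumptions made for illustration.

```c
#include <stdio.h>

#define NUM_CORES 8 /* core count quoted in the commit message */

/* Hypothetical model: the reduction of length n is cut into NUM_CORES equal
 * slices, one per core, and only the partial sums of the first `active`
 * cores reach the final result. This is NOT the real PULP kernel. */
static double reduce_with_lost_cores(const float *x, const float *dy, int n,
                                     int active) {
  double acc = 0.0;
  int chunk = (n + NUM_CORES - 1) / NUM_CORES; /* ceil(n / NUM_CORES) */
  for (int core = 0; core < NUM_CORES; core++) {
    int start = core * chunk;
    int stop = (start + chunk < n) ? start + chunk : n;
    double partial = 0.0;
    for (int i = start; i < stop; i++) {
      partial += (double)x[i] * (double)dy[i];
    }
    if (core < active) { /* contributions of the other cores are dropped */
      acc += partial;
    }
  }
  return acc;
}

/* Print the modelled ratio next to the ratio observed in the commit. */
static void print_ratio_check(void) {
  float x[64], dy[64];
  for (int i = 0; i < 64; i++) {
    x[i] = 1.0f;
    dy[i] = 0.5f;
  }
  double full = reduce_with_lost_cores(x, dy, 64, NUM_CORES);
  double lossy = reduce_with_lost_cores(x, dy, 64, 5);
  printf("model ratio    = %.4f\n", lossy / full);
  printf("observed ratio = %.4f\n", 0.00177 / 0.00295);
}
```

With uniform data the model gives exactly 5/8 = 0.625. Real activations are not uniform, so a measured ratio near 0.6 is consistent with this picture without proving it.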
1 parent df7e10f commit d4596f8

1 file changed

Lines changed: 6 additions & 0 deletions

TargetLibraries/PULPOpen/src/ConvGrad.c

@@ -732,6 +732,12 @@ void PULP_PWConvGradW2d_fp32_fp32_fp32_CHW(
     uint32_t C_out, const float *__restrict__ pInput, uint32_t H_in,
     uint32_t W_in, uint32_t C_in, float *__restrict__ pGradWeight) {
 
+  if (pi_core_id() == 0) {
+    static int __pw_entry = 0;
+    printf("[PWGRADW_ENTRY call=%d C_in=%u C_out=%u H=%u W=%u X[0]=%.9e dY[0]=%.9e]\r\n",
+           __pw_entry++, C_in, C_out, H_in, W_in, pInput[0], pGradOut[0]);
+  }
+
   struct blob input_blob = {0};
   struct blob output_blob = {0};
   struct blob coeff_blob = {0};
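The probe prints X[0] and dY[0] with %.9e, which is 10 significant digits and so more than the 9 that fp32 needs to round-trip through text. The commit does not say how the bit-exact comparison against PyTorch was done. A minimal host-side sketch, assuming the probe fields are copied out of the log as strings, could look like this (fp32_bits_equal and probe_matches_ref are hypothetical helper names):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 when two fp32 values have identical bit patterns.
 * Unlike ==, this treats +0.0f and -0.0f as different. */
static int fp32_bits_equal(float a, float b) {
  uint32_t ua, ub;
  memcpy(&ua, &a, sizeof ua); /* memcpy avoids strict-aliasing problems */
  memcpy(&ub, &b, sizeof ub);
  return ua == ub;
}

/* Parse one "%.9e" field from the probe output and compare it bit-for-bit
 * with a reference value, for example one exported from PyTorch. */
static int probe_matches_ref(const char *probe_text, float ref) {
  float parsed = strtof(probe_text, NULL);
  return fp32_bits_equal(parsed, ref);
}
```

Comparing bit patterns rather than using a tolerance matches the "bit-exact" wording of the finding and would also flag a sign flip on zero.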
