Commit d4596f8
debug: add PW ConvGradW entry probe for integration bug diagnosis
Temporary printf at entry of PULP_PWConvGradW2d_fp32_fp32_fp32_CHW
dumping X[0] and dY[0]. Purpose: confirm whether integrated
MobileNet's blocks.0.pw dW[0] = 0.00177 vs PyTorch ref 0.00295 (40%
off) is caused by upstream data corruption or by the PW kernel itself.
Finding: X[0] and dY[0] in sim match PyTorch bit-exact, so the input
data feed is correct. The bug is in pulp_conv_pw_fp32_bw_param_grads_cl
(or the mm_add it calls) when invoked with C_out tile size < NUM_CORES
(blocks.0.pw tile is C_out=5 with 8 cores). Sim/ref ratio 0.6 ≈ 5/8
suggests ~3 cores' contributions are being lost. Isolated
ConvGradW_PW_block_0 test uses full C_out=16 so doesn't hit this path.
Revert this printf after diagnosis.
1 parent df7e10f commit d4596f8
1 file changed
Lines changed: 6 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 732 | 732 | |
| 733 | 733 | |
| 734 | 734 | |
| | 735 | + |
| | 736 | + |
| | 737 | + |
| | 738 | + |
| | 739 | + |
| | 740 | + |
| 735 | 741 | |
| 736 | 742 | |
| 737 | 743 | |
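The "0.6 ≈ 5/8" observation in the commit message can be sanity-checked with a small host-side sketch. This is not the kernel code and not the probe added by this commit: `sim_ref_ratio`, `toy_reduce`, and `report` are hypothetical helpers, and the toy model simply assumes that a reduction is split into `NUM_CORES` equal partial sums of which only some are accumulated. Under that assumption, keeping 5 of 8 partials gives 0.625, which is close to (but not exactly) the observed 0.6, so the sketch is consistent with the hypothesis without confirming it.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_CORES 8 /* core count quoted in the commit message */

/* Ratio of a simulated value to its reference value. */
static float sim_ref_ratio(float sim, float ref) {
    assert(ref != 0.0f);
    return sim / ref;
}

/* Toy model (assumption, not the actual kernel): a sum of n ones is split
 * into NUM_CORES equal partial sums, but only the first `cores_kept`
 * partials are added into the final result. */
static float toy_reduce(int n, int cores_kept) {
    assert(n > 0 && n % NUM_CORES == 0);
    assert(cores_kept >= 0 && cores_kept <= NUM_CORES);

    float partial[NUM_CORES] = {0};
    int chunk = n / NUM_CORES;

    /* Each "core" accumulates its own chunk of the reduction. */
    for (int core = 0; core < NUM_CORES; core++)
        for (int i = 0; i < chunk; i++)
            partial[core] += 1.0f;

    /* Only the kept cores contribute to the total. */
    float total = 0.0f;
    for (int core = 0; core < cores_kept; core++)
        total += partial[core];
    return total;
}

/* Prints the observed ratio next to the toy-model ratio for 5 of 8 cores. */
static void report(void) {
    printf("observed sim/ref ratio: %.3f\n",
           sim_ref_ratio(0.00177f, 0.00295f));
    printf("toy 5-of-8 ratio:       %.3f\n",
           toy_reduce(64, 5) / toy_reduce(64, NUM_CORES));
}
```

Calling `report()` from a `main` on the host prints both ratios side by side. The remaining gap between them may come from rounding of the printed dW[0] values or from unequal per-core contributions, neither of which this sketch models.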