Commit 649cd25
Fix BN backward: apply gamma factor to dX's dbeta/dgamma terms

The canonical BatchNorm backward formula is

    dX = (gamma * inv_std / N) * (N * dY - dbeta - x_hat * dgamma)

The gamma factor must multiply ALL three terms inside the parentheses,
not just the N*dY term. Two sites had the same error:

  1. PULP_BNGradNormalize_fp32 (split BN backward, second pass)
  2. PULP_BatchNormGrad_fp32 (monolithic BN backward)

Fix: pull gamma out into the scale factor

    scale = gamma * inv_std / N_total

so that

    dX = scale * (N * dY - dbeta - x_hat * dgamma)

applies gamma uniformly.
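
The formula above can be sanity-checked against central finite differences. The following is a minimal sketch in pure Python for a single channel, not the PULP kernels themselves: the function names, the test values, and the use of float64 (rather than fp32) are illustrative assumptions. It takes dbeta = sum(dY) and dgamma = sum(dY * x_hat), the standard BatchNorm definitions.

```python
import math

def bn_forward(x, gamma, beta, eps=1e-5):
    """Single-channel BatchNorm forward over a list of floats."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    y = [gamma * h + beta for h in x_hat]
    return y, x_hat, inv_std

def bn_backward(dy, x_hat, inv_std, gamma):
    """dX = scale * (N * dY - dbeta - x_hat * dgamma), scale = gamma * inv_std / N."""
    n = len(dy)
    dbeta = sum(dy)
    dgamma = sum(d * h for d, h in zip(dy, x_hat))
    scale = gamma * inv_std / n  # gamma multiplies all three terms
    dx = [scale * (n * d - dbeta - h * dgamma) for d, h in zip(dy, x_hat)]
    return dx, dgamma, dbeta

def numeric_dx(x, dy, gamma, beta, h=1e-6):
    """Central differences of the scalar L = sum(dy_i * y_i) with respect to x."""
    grads = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        lp = sum(d * y for d, y in zip(dy, bn_forward(xp, gamma, beta)[0]))
        lm = sum(d * y for d, y in zip(dy, bn_forward(xm, gamma, beta)[0]))
        grads.append((lp - lm) / (2 * h))
    return grads

# Arbitrary illustrative values; gamma != 1 so the gamma factor matters.
x = [0.5, -1.2, 2.0, 0.3, -0.7]
dy = [0.1, -0.4, 0.25, 0.8, -0.3]
gamma, beta = 1.7, 0.2

_, x_hat, inv_std = bn_forward(x, gamma, beta)
dx, dgamma, dbeta = bn_backward(dy, x_hat, inv_std, gamma)
dx_num = numeric_dx(x, dy, gamma, beta)
max_err = max(abs(a - b) for a, b in zip(dx, dx_num))
print("max |analytic - numeric| =", max_err)
```

Using a gamma other than 1 in a check like this is what lets it catch a misplaced gamma factor.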

Impact on MobileNetV1 training (4 steps, random-init):

    before fix: step 3 loss diff 0.017 (fail)
    after fix : step 3 loss diff 0.003 (pass at TOL=0.01)

The bug was masked at step 0 because gamma is initialized to 1, so
gamma × anything = anything. It becomes visible only after the optimizer
starts updating gamma.
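
The masking effect can be illustrated with a small pure-Python sketch. The pre-fix code is not shown here, so `dx_buggy` is a hypothetical reconstruction from the description (gamma applied to the N*dY term only). All names and values are illustrative assumptions.

```python
import math

def normalize(x, eps=1e-5):
    """Return (x_hat, inv_std) for a single channel."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x], inv_std

def dx_fixed(dy, x_hat, inv_std, gamma):
    """Post-fix form: gamma is folded into the common scale factor."""
    n = len(dy)
    dbeta = sum(dy)
    dgamma = sum(d * h for d, h in zip(dy, x_hat))
    scale = gamma * inv_std / n
    return [scale * (n * d - dbeta - h * dgamma) for d, h in zip(dy, x_hat)]

def dx_buggy(dy, x_hat, inv_std, gamma):
    """Hypothetical pre-fix form: gamma multiplies only the N*dY term."""
    n = len(dy)
    dbeta = sum(dy)
    dgamma = sum(d * h for d, h in zip(dy, x_hat))
    scale = inv_std / n
    return [scale * (gamma * n * d - dbeta - h * dgamma) for d, h in zip(dy, x_hat)]

def max_gap(gamma):
    """Largest element-wise difference between the two variants."""
    x = [0.5, -1.2, 2.0, 0.3, -0.7]
    dy = [0.1, -0.4, 0.25, 0.8, -0.3]
    x_hat, inv_std = normalize(x)
    good = dx_fixed(dy, x_hat, inv_std, gamma)
    bad = dx_buggy(dy, x_hat, inv_std, gamma)
    return max(abs(a - b) for a, b in zip(good, bad))

print("gap at gamma=1.0:", max_gap(1.0))  # the two variants coincide
print("gap at gamma=1.5:", max_gap(1.5))  # the two variants diverge
```

The gap is proportional to (gamma - 1), which is consistent with a test that only inspects step 0 passing while later steps drift.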

Verification: instrumented PULP_BatchNormGrad_fp32 with a per-call
signature print and compared dgamma/dbeta against PyTorch's autograd
across all 27 BN layers at step 0. The results agree to within FP32 rounding
(max 1% rel diff on ~1e-8 magnitude grads, <0.1% on all larger grads).

1 parent 59d3097
1 file changed
Lines changed: 9 additions & 6 deletions
@@ -234,16 +234,18 @@ (5 additions, 3 deletions; line contents not preserved)
@@ -288,16 +290,17 @@ (4 additions, 3 deletions; line contents not preserved)