Commit 7207983

ooples and franklinic authored
fix(#1304 c6): drop Dropout from OccupancyNN defaults; fix memorization invariant (#1391)
PR #1290 CI Cluster 6 #1304: OccupancyNeuralNetworkTests.LossStrictlyDecreasesOnMemorizationTask was reported as fixed by PR #1329's BatchNorm→LayerNorm swap, but the test was still red on master with loss step 1=0.6936, step 100=0.7032 (slightly INCREASING): the model stayed stuck at the BCE-ln(2) baseline through 100 gradient steps.

## Root cause

PR #1329 fixed the BN-at-batch-1 degeneracy (σ²=0 → y=β collapses the gradient through normalization) but the *Dropout layer*'s memorization-blocking effect was not addressed. The default Occupancy layer stack was:

    Dense(64)+ReLU → LayerNorm → Dropout(0.3) →
    Dense(32)+ReLU → LayerNorm → Dropout(0.2) →
    Dense(16)+ReLU → Dense(out)+Sigmoid

Under the model-family LossStrictlyDecreasesOnMemorizationTask invariant (train the SAME (x, target) pair for 100 iterations and assert loss strictly decreases), every forward pass under Dropout sees a DIFFERENT random sub-network (~56% of hidden units active = 0.7 × 0.8). On a 3 → 64 → 32 → 16 → 1 MLP (~2k params), the per-step mask randomness injects more variance than the gradient can subtract over 100 steps, leaving loss flat or slightly RISING at the BCE-ln(2) baseline.

## Fix

Remove Dropout from both `CreateDefaultOccupancyLayers` and `CreateDefaultOccupancyTemporalLayers` in `LayerHelper<T>`. At this network size Dropout adds no useful regularization (the model has fewer params than typical sensor batches have rows); callers who genuinely need regularization on a larger Occupancy MLP can pass an explicit architecture with their preferred Dropout rate.

## Verification

    $ dotnet test --filter "FullyQualifiedName~OccupancyNeuralNetworkTests"
    Passed! - Failed: 0, Passed: 21, Skipped: 0, Total: 21

All 21 OccupancyNN tests pass (was 1 failing).
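The memorization invariant and the ln(2) baseline mentioned under Root cause can be illustrated in isolation. The following is a minimal sketch, not the real test: it assumes a single logistic unit in place of the Occupancy network, plain gradient descent in place of the library's optimizer, and no Dropout. The names `MemorizationInvariantSketch`, `TrainOnSinglePair`, and `IsStrictlyDecreasing` are hypothetical and do not exist in the repository.

```csharp
using System;

// Hypothetical stand-in for illustration only. The real invariant is checked by
// OccupancyNeuralNetworkTests against the full Occupancy network.
public static class MemorizationInvariantSketch
{
    private static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    // Binary cross-entropy for one prediction p in (0, 1) and a target y in {0, 1}.
    private static double Bce(double p, double y) =>
        -(y * Math.Log(p) + (1.0 - y) * Math.Log(1.0 - p));

    // Trains one logistic unit on a single fixed (x, target) pair and returns
    // the loss measured before each gradient step.
    public static double[] TrainOnSinglePair(double[] x, double target, int steps, double learningRate)
    {
        var w = new double[x.Length]; // zero init => p = 0.5 => first loss = ln(2)
        double b = 0.0;
        var losses = new double[steps];

        for (int s = 0; s < steps; s++)
        {
            double z = b;
            for (int i = 0; i < x.Length; i++) z += w[i] * x[i];

            double p = Sigmoid(z);
            losses[s] = Bce(p, target);

            double grad = p - target; // dLoss/dz for sigmoid followed by BCE
            for (int i = 0; i < x.Length; i++) w[i] -= learningRate * grad * x[i];
            b -= learningRate * grad;
        }

        return losses;
    }

    // True when every loss is strictly smaller than the one before it.
    public static bool IsStrictlyDecreasing(double[] losses)
    {
        for (int i = 1; i < losses.Length; i++)
        {
            if (losses[i] >= losses[i - 1]) return false;
        }
        return true;
    }

    public static void Main()
    {
        // Three input features, one positive target, 100 steps (values are arbitrary).
        double[] losses = TrainOnSinglePair(new[] { 0.2, 0.5, 0.1 }, 1.0, 100, 0.1);
        Console.WriteLine($"step 1 loss = {losses[0]:F4}, step 100 loss = {losses[99]:F4}");
        Console.WriteLine($"strictly decreasing: {IsStrictlyDecreasing(losses)}");
    }
}
```

With zero-initialized weights the first prediction is 0.5, so the first recorded loss is ln(2) ≈ 0.6931. This is the baseline the failing test was stuck at (0.6936 at step 1).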
The 4 remaining #1304 tests post-fix:

- SimCSETests.TrainingError_ShouldNotExceedTestError: PASS (was passing already on current master)
- SimCSETests.Training_ShouldChangeParameters: PASS (was passing already on current master)
- DenseNetNetworkTests.MoreData_ShouldNotDegrade: Adam-overshoot divergence (200-iter loss > 50-iter loss); separate follow-up issue
- NEATTests.Training_ShouldReduceLoss: timeout (perf gap, similar to #1390); separate follow-up issue

Closes #1304 partially. DenseNet + NEAT follow-ups tracked separately.

Co-authored-by: franklinic <franklin@ivorycloud.com>
1 parent ce00cfd commit 7207983

1 file changed

Lines changed: 24 additions & 5 deletions

src/Helpers/LayerHelper.cs

@@ -787,14 +787,16 @@ public static IEnumerable<ILayer<T>> CreateDefaultOccupancyTemporalLayers(
  // Dense layers for further processing. LayerNormalization (Ba 2016)
  // rather than BatchNormalization so the head still normalizes at any
  // batch size — memorization-style training runs at batch=1 and BN
- // collapses (σ² = 0) under those conditions.
+ // collapses (σ² = 0) under those conditions. Dropout removed for
+ // the same reason as the non-temporal variant above (#1304
+ // cluster-6 follow-up) — per-step mask randomness exceeds the
+ // gradient signal on a 100-iter memorization task and stalls
+ // loss at the BCE-ln(2) baseline.
  yield return new DenseLayer<T>(64, new ReLUActivation<T>() as IActivationFunction<T>);
  yield return new LayerNormalizationLayer<T>();
- yield return new DropoutLayer<T>(0.3f);

  yield return new DenseLayer<T>(32, new ReLUActivation<T>() as IActivationFunction<T>);
  yield return new LayerNormalizationLayer<T>();
- yield return new DropoutLayer<T>(0.2f);

  // Output layer
  yield return new DenseLayer<T>(architecture.OutputSize, new SigmoidActivation<T>() as IActivationFunction<T>);
@@ -879,13 +881,30 @@ public static IEnumerable<ILayer<T>> CreateDefaultOccupancyLayers(
  // memorization-style training. LayerNorm normalizes across the
  // feature axis within each sample and is the modern default for
  // small dense MLPs.
+ //
+ // Dropout removed (#1304 cluster-6 follow-up): the prior layout
+ // applied Dropout(0.3) + Dropout(0.2) on a tiny 3 → 64 → 32 → 16
+ // → 1 MLP (~2k params). On a memorization task that trains the
+ // same (x, target) pair for 100 iterations, every forward sees a
+ // DIFFERENT random sub-network (roughly 56% of hidden units
+ // active = 0.7 × 0.8) so the optimizer can never learn the pair
+ // — Dropout's per-step mask injects more variance than the
+ // gradient can subtract over 100 steps, leaving loss flat or
+ // slightly RISING at the BCE-ln(2) baseline. PR #1329 fixed the
+ // BN-at-batch-1 layer of this stack but the Dropout layer's
+ // memorization-blocking effect was left. At this network size
+ // Dropout adds no useful regularization (the model has fewer
+ // params than typical sensor batches have rows); callers who
+ // genuinely need regularization on a larger Occupancy MLP can
+ // pass an explicit architecture with their preferred Dropout
+ // rate. Closes the LossStrictlyDecreasesOnMemorizationTask
+ // signal that's been red on OccupancyNeuralNetworkTests since
+ // the cluster-6 sweep.
  yield return new DenseLayer<T>(64, new ReLUActivation<T>() as IActivationFunction<T>);
  yield return new LayerNormalizationLayer<T>();
- yield return new DropoutLayer<T>(0.3f);

  yield return new DenseLayer<T>(32, new ReLUActivation<T>() as IActivationFunction<T>);
  yield return new LayerNormalizationLayer<T>();
- yield return new DropoutLayer<T>(0.2f);

  yield return new DenseLayer<T>(16, new ReLUActivation<T>() as IActivationFunction<T>);
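The "different random sub-network on every forward pass" point in the comments of this diff can be sketched as follows. This assumes textbook inverted dropout, where each unit is dropped independently and the survivors are rescaled. Nothing here is taken from the implementation of `DropoutLayer<T>`, and the names `DropoutMaskSketch`, `SampleMask`, and `KeepFraction` are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of LayerHelper<T> or DropoutLayer<T>.
public static class DropoutMaskSketch
{
    // Samples one inverted-dropout mask: each unit is zeroed with probability
    // dropRate, and kept units are scaled by 1 / (1 - dropRate).
    public static double[] SampleMask(Random rng, int units, double dropRate)
    {
        double keepScale = 1.0 / (1.0 - dropRate);
        var mask = new double[units];
        for (int i = 0; i < units; i++)
        {
            mask[i] = rng.NextDouble() < dropRate ? 0.0 : keepScale;
        }
        return mask;
    }

    // Fraction of units that the mask leaves active.
    public static double KeepFraction(double[] mask) =>
        mask.Count(m => m != 0.0) / (double)mask.Length;

    public static void Main()
    {
        var rng = new Random(1304); // fixed seed so a run is repeatable

        // A fresh mask is drawn for both layers on every training step, so the
        // same (x, target) pair is pushed through a different sub-network each time.
        for (int step = 1; step <= 3; step++)
        {
            double[] afterDense64 = SampleMask(rng, 64, 0.3); // former Dropout(0.3)
            double[] afterDense32 = SampleMask(rng, 32, 0.2); // former Dropout(0.2)
            Console.WriteLine(
                $"step {step}: kept {KeepFraction(afterDense64):P0} of 64 units, " +
                $"{KeepFraction(afterDense32):P0} of 32 units");
        }
    }
}
```

On average the two masks keep 70% and 80% of their units. The kept sets change on every step, which is the per-step randomness the commit identifies as swamping the gradient signal on a 100-iteration memorization run.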
