fix(#1309): cluster-1 DCGAN — restore deferred-shape guard + lazy-conv deserialize fallback (#1389)

ooples · franklinic · claude · web-flow · commit ce00cfdd5fb0 · 2026-05-19T22:31:32.000-04:00
* fix(#1309): cluster-1 DCGAN — restore deferred-shape guard + lazy-conv deserialize fallback

PR #1290 CI Cluster 1: 25 of 25 DCGANTests failing post-master with one of two errors:

1. Most (23 tests): "Invalid layer configuration: The last layer's output shape [3, -1, -1] must match the architecture output size (12288)."
2. Clone tests (2): "Input spatial dims after padding (1+2*1, 1+2*1) must be >= kernelSize (4)" raised inside DeserializationHelper's pre-resolve of the discriminator's first conv layer.

Plus 1 SparseNN test (intermittent mode-collapse) that passes on re-run without any code change: flaky, not a regression target.

## Root causes

(1) NeuralNetworkBase.IsLastLayerShapeCompatible: PR #1329 (commit 969977d) added an `outputShape.Any(d => d < 0)` early return so the validator defers the flat-OutputSize check when any output-shape dim is deferred. DCGAN's last transposed-conv emits [3, -1, -1] until its first Forward resolves H/W. That guard was inadvertently deleted by the grafprint PR (c8cac23, May 16) one day later. Restoring it unblocks all 23 validator-rejection cases at once.

(2) DeserializationHelper conv path: when the saved layer record's inputShape carries -1 sentinels (a lazy conv layer serialized before its first Forward, as happens to DCGAN's discriminator because a Predict-only probe runs only the generator), the pre-existing code coerced all -1 dims to 1 and called conv.ResolveShapesOnly(...). For DCGAN's first conv (kernel=4, padding=1) this fails OnFirstForward's kernel-size check (1 + 2*1 = 3 < 4). Coercing to Math.Max(1, KernelSize) fixes that specific check but locks InputDepth at 1; the real Forward with the [3, 64, 64] RGB image then throws "Expected input depth 1, but got 3". The correct fix is to skip pre-resolve entirely when InputDepth is deferred: ConvolutionalLayer.SetParameters has its own auto-resolve fallback at line ~1598 that derives InputDepth from the saved parameter vector's length and uses KernelSize as the spatial placeholder.
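The two root causes above can be sketched in isolation. This is a minimal illustration, not the AiDotNet source: the class `ShapeContractSketch` and its method signatures are hypothetical, and the parameter layout (OutputDepth × InputDepth × KernelSize² kernel weights plus OutputDepth biases) is an assumption inferred from the derivation described in (2).

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for illustration only; not the AiDotNet implementation.
public static class ShapeContractSketch
{
    // Root cause (1): skip the flat-size check while any output dim is still
    // deferred (-1), because the real H/W are unknown until the first Forward.
    public static bool IsLastLayerShapeCompatible(int[] outputShape, int expectedOutputSize)
    {
        if (outputShape.Any(d => d < 0))
            return true; // deferred dims: nothing to validate yet

        int flatSize = outputShape.Aggregate(1, (acc, d) => acc * d);
        return flatSize == expectedOutputSize;
    }

    // Root cause (2): recover InputDepth from the saved parameter count,
    // assuming count = outputDepth * inputDepth * kernelSize^2 + outputDepth.
    public static bool TryDeriveInputDepth(
        int parameterCount, int outputDepth, int kernelSize, out int inputDepth)
    {
        inputDepth = 0;
        int weightsPerInputChannel = outputDepth * kernelSize * kernelSize;
        int kernelWeights = parameterCount - outputDepth; // strip the biases

        if (weightsPerInputChannel <= 0 || kernelWeights <= 0
            || kernelWeights % weightsPerInputChannel != 0)
            return false; // count does not match the assumed layout

        inputDepth = kernelWeights / weightsPerInputChannel;
        return true;
    }

    // Pre-resolve only when the saved depth is concrete; deferred spatial dims
    // fall back to Math.Max(1, kernelSize) so the kernel-size check can pass.
    public static bool TryPlanPreResolve(
        int savedDepth, int savedH, int savedW, int kernelSize, out int[] shape)
    {
        shape = Array.Empty<int>();
        if (savedDepth <= 0)
            return false; // depth deferred: leave it to the SetParameters fallback

        int spatialFallback = Math.Max(1, kernelSize);
        shape = new[]
        {
            savedDepth,
            savedH > 0 ? savedH : spatialFallback,
            savedW > 0 ? savedW : spatialFallback,
        };
        return true;
    }
}
```

With these definitions, `IsLastLayerShapeCompatible(new[] { 3, -1, -1 }, 12288)` returns true (check deferred), and `new[] { 3, 64, 64 }` also passes because 3 × 64 × 64 = 12288. For a first conv with kernel size 4 and an illustrative output depth of 64 (a made-up value, not taken from the DCGAN fixture), a saved vector of 3136 parameters gives InputDepth = (3136 - 64) / (64 × 16) = 3.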
Pre-resolve still runs (and uses Math.Max(1, KernelSize) for any deferred spatial dim) when InputDepth is concrete. That is the original PR #1329 contract for the auto-resolve-disambiguation case.

## Verification

    $ dotnet test --filter "FullyQualifiedName~DCGANTests|FullyQualifiedName~SparseNeuralNetworkTests" --framework net10.0
    Failed! - Failed: 2, Passed: 44, Skipped: 0, Total: 46

26 → 2 failures. The remaining two are NOT cluster-1 shape-contract issues:

- DCGANTests.MoreData_ShouldNotDegrade: `Test execution timed out after 120000 milliseconds`. Pre-existing GAN training-path perf gap; the deep deconv+conv chain in tape mode is ~5-10× slower than PyTorch CPU baseline. Substep profile (Release): Generator.Predict 19 ms, Discriminator.Train 187 ms, Generator adversarial 313 ms, for a total of 519 ms/step × 250 iters ≈ 130 s vs the 120 s timeout. Filed separately so this PR ships the actual cluster-1 root causes (validator + conv-deserialize) without bundling a multi-week perf project.
- SparseNeuralNetworkTests.DifferentInputs_AfterTraining_ShouldProduceDifferentOutputs: intermittent mode-collapse, passes on re-runs. Separate flaky-test issue, not a shape-contract bug.

Closes #1309 partially (cluster-1 shape-contract root causes). The MoreData_ShouldNotDegrade timeout + SparseNN mode-collapse flakiness are tracked separately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(PR #1389 review): document zero-dim wildcard semantics + reject malformed Conv inputShape rank

* fix(PR #1389 follow-up): widen rank check to reject rank-1/2 Conv inputShape too

* perf(#1390): eliminate duplicate generator forward in GAN.Train — closes DCGAN MoreData timeout

Previously GenerativeAdversarialNetwork.Train ran the generator forward TWICE per training step:

1. Generator.Predict(input) (eval mode, NoGradScope) → detached fake images for the combined real+fake discriminator step.
2. ForwardForTraining(input) (train mode, on tape) inside TrainWithCustomLoss: a duplicate of the same forward, just for the gen-adversarial backward.

On the DCGAN MoreData fixture (250 iters, double-precision, batch=2, 64×64 RGB) this duplicate forward contributed ~19 ms of the 519 ms / step profiled in #1390, pushing the test 10 s over its 120 s budget.

Refactor:

- Open a single GradientTape at the start of the step.
- Run ForwardForTraining(input) ONCE on that tape → fakeTapeTracked.
- Take a value-copy detached snapshot (fakeImages) for the disc step: a fresh Tensor<T> with no GradNode chain, so disc.Train (which opens its own nested tape) cannot leak gradients back into the generator.
- Walk the discriminator layer-by-layer on the existing gen tape for the adversarial loss (unchanged from the prior closure semantics).
- Drive the gen optimizer step via the new NeuralNetworkBase.BackwardAndStepOnPrecomputedLoss helper, which reuses the open tape instead of having TrainWithCustomLoss open a fresh one and re-run ForwardForTraining.

Behavior note: the disc step now sees train-mode generator output (batch BN stats) instead of eval-mode (running BN stats). This matches PyTorch's standard DCGAN training pattern (fake = G(z); fake_detached = fake.detach()) and the existing gen step's own train-mode forward. DCGAN has no Dropout, so the only distribution shift is BN stats, which is the conventional adversarial behavior.

Verified locally with the canonical Tensors 0.81.3 dependency:

- DCGANTests.MoreData_ShouldNotDegrade: 1 m 47 s (was timing out at > 120 s), which closes the test's perf gap.
- Full DCGANTests class: 25 / 25 passing.
- ConditionalGANTests + InfoGANTests (other GAN.Train consumers): 50 / 50 passing.
- Full SparseNeuralNetworkTests: 21 / 21 passing (described as "intermittent mode-collapse" in the PR #1389 description; appears stable now, may have been transient).

Closes #1390.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(pr1389-review): narrow visibility + reentrancy + extra trainables

Addresses three CodeRabbit comments on BackwardAndStepOnPrecomputedLoss in PR #1389:

1. Visibility narrowed public -> internal. The codebase contract is "users should only interact with AiModelBuilder / AiModelResult", and this helper is training plumbing for in-assembly callers (currently GenerativeAdversarialNetwork.Train), so there is no reason for it to live on the public surface. The only caller is in the same assembly.
2. Added `using var __reentrancyGuard = AcquireTrainSentinel()` at the top, mirroring TrainWithTape's sentinel discipline. Without it, concurrent callers on the same model race on LastLoss + optimizer internal state.
3. trainableParams now concats GetExtraTrainableTensors() with the layer params, matching TrainWithTape's parameter set. Without this, models that expose raw tensors via GetExtraTrainableTensors (rather than layer-resident params) silently skipped updates on the precomputed-loss path, giving divergent semantics between the two training entry points.

Build passes.

---------

Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
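
The reentrancy guard from item 2 of the review fix can be illustrated with a small disposable sentinel. This is a sketch under stated assumptions, not the library's code: `TrainSentinelSketch`, `Acquire`, and `StepOnPrecomputedLoss` are hypothetical names, and throwing on re-entry (rather than blocking) is an assumption, since the commit message does not say how the real sentinel reacts.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical sketch of a train-step reentrancy guard; not AiDotNet's API.
public sealed class TrainSentinelSketch
{
    private int _inTrain; // 0 = idle, 1 = a training step is in flight

    // Atomically claims the sentinel; the returned handle releases it on Dispose.
    public IDisposable Acquire()
    {
        if (Interlocked.CompareExchange(ref _inTrain, 1, 0) != 0)
            throw new InvalidOperationException(
                "A training step is already running on this model.");
        return new Releaser(this);
    }

    private sealed class Releaser : IDisposable
    {
        private TrainSentinelSketch? _owner;

        public Releaser(TrainSentinelSketch owner) => _owner = owner;

        public void Dispose()
        {
            // Exchange makes Dispose idempotent: only the first call releases.
            var owner = Interlocked.Exchange(ref _owner, null);
            if (owner != null)
                Volatile.Write(ref owner._inTrain, 0);
        }
    }

    // Placeholder for a guarded training entry point: the using declaration
    // holds the sentinel until the method returns or throws.
    public double StepOnPrecomputedLoss(double loss)
    {
        using var guard = Acquire();
        // ... gradient computation and optimizer step would run here ...
        return loss;
    }
}
```

Because the handle is released in `Dispose`, an exception thrown during the step still frees the sentinel, so a failed step does not lock the model out of later training calls.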
diff --git a/src/Helpers/DeserializationHelper.cs b/src/Helpers/DeserializationHelper.cs
@@ -833,22 +833,96 @@ public static ILayer<T> CreateLayerFromType<T>(string layerType, int[] inputShap
             // "Expected N parameters, but got M".
             // Saved inputShape format: [batch, channels, height, width] (NCHW); some
             // legacy paths serialize without the batch dim, so accept rank 3 too.
-            if (instance is ConvolutionalLayer<T> conv && inputShape != null && inputShape.Length >= 3)
-            {
-                int inDepth, inH, inW;
-                if (inputShape.Length == 4)
+            // Note: the rank-validation switch below now handles ALL ranks (not
+            // just >= 3) — rank-1 and rank-2 payloads fall through to the
+            // default branch and throw, instead of silently bypassing the
+            // ConvolutionalLayer pre-resolve when inputShape.Length < 3 left
+            // the layer in its lazy state (PR #1389 review C8oz1 — gate moved
+            // from inputShape.Length >= 3 to inputShape != null so malformed
+            // ranks fail fast). The previous `>= 3` guard was a relic from
+            // when the switch only had the rank-3/rank-4 cases and rank-1/2
+            // would have crashed on `inputShape[2]` — now they're explicitly
+            // rejected with a clear error.
+            if (instance is ConvolutionalLayer<T> conv && inputShape != null)
+            {
+                // Saved-record axes: rank-4 = [batch, channels, H, W],
+                // rank-3 = [channels, H, W] (legacy unbatched). Any other
+                // rank is malformed and must fail fast — silently
+                // reinterpreting a rank-5 or rank-6 payload's leading
+                // axes as the legacy [C, H, W] layout would deserialize
+                // ConvolutionalLayer with the wrong channels/InputDepth
+                // and produce a Clone that disagrees with the original
+                // model's contract several layers downstream.
+                int savedInDepth, savedInH, savedInW;
+                switch (inputShape.Length)
                 {
-                    inDepth = inputShape[1] > 0 ? inputShape[1] : 1;
-                    inH = inputShape[2] > 0 ? inputShape[2] : 1;
-                    inW = inputShape[3] > 0 ? inputShape[3] : 1;
+                    case 4:
+                        savedInDepth = inputShape[1];
+                        savedInH = inputShape[2];
+                        savedInW = inputShape[3];
+                        break;
+                    case 3:
+                        savedInDepth = inputShape[0];
+                        savedInH = inputShape[1];
+                        savedInW = inputShape[2];
+                        break;
+                    default:
+                        throw new InvalidOperationException(
+                            $"ConvolutionalLayer deserialize: saved inputShape rank must be 3 ([C, H, W]) " +
+                            $"or 4 ([N, C, H, W]); got rank {inputShape.Length} ([{string.Join(", ", inputShape)}]). " +
+                            "This usually indicates a corrupted layer record or a forward-incompatible " +
+                            "newer-format payload — abort deserialize rather than silently misinterpret " +
+                            "the trailing axes.");
                 }
-                else
+
+                // Three branches based on what the saved inputShape resolved
+                // by serialize time:
+                //
+                // (a) InputDepth concrete: pre-resolve so SetParameters sees
+                //     the correct InputDepth and the kernel/bias counts match
+                //     the saved parameter vector exactly. Without this the
+                //     auto-resolve heuristic in ConvolutionalLayer.SetParameters
+                //     can pick a different InputDepth than the original
+                //     (especially when outputDepth × kernelSize² happens to
+                //     factor the saved parameter count more than one way),
+                //     and Clone()/DeepCopy() throw "Expected N parameters,
+                //     but got M". Spatial dims that were never forwarded
+                //     fall back to Math.Max(1, kernelSize) so
+                //     ConvolutionalLayer.OnFirstForward's kernel-size
+                //     constraint (inH + 2*Padding >= KernelSize) passes —
+                //     DCGAN's discriminator (kernel=4, padding=1, needs
+                //     inH >= 2) is the canary. The stored OutputShape after
+                //     this resolve is a placeholder; the first real Forward
+                //     call recomputes the actual output tensor dimensions
+                //     from the real input.
+                //
+                // (b) InputDepth deferred (saved as -1 because the layer
+                //     was serialized before its first Forward): skip the
+                //     pre-resolve entirely. ConvolutionalLayer.SetParameters
+                //     has its own auto-resolve fallback (~ line 1598) that
+                //     derives InputDepth from the saved parameter vector's
+                //     length — (length - OutputDepth) / (OutputDepth *
+                //     KernelSize²) — and that fallback uses KernelSize as
+                //     the spatial placeholder. Pre-resolving with a
+                //     placeholder InputDepth=1 would have locked
+                //     InputDepth=1 into the layer's state, then Forward
+                //     with the real RGB-3 input would throw
+                //     "Expected input depth 1, but got 3" before the lazy
+                //     resolve had a chance to fire. This is the failure
+                //     mode that surfaced on DCGAN clones where the
+                //     discriminator's layers had never seen the
+                //     [3, 64, 64] image input at clone time (the test's
+                //     pre-clone Predict only runs the generator).
+                //
+                // (c) inputShape supplied but malformed (any rank other than
+                //     3 or 4): already rejected by the switch above.
+                if (savedInDepth > 0)
                 {
-                    inDepth = inputShape[0] > 0 ? inputShape[0] : 1;
-                    inH = inputShape[1] > 0 ? inputShape[1] : 1;
-                    inW = inputShape[2] > 0 ? inputShape[2] : 1;
+                    int spatialFallback = Math.Max(1, kernelSize);
+                    int inH = savedInH > 0 ? savedInH : spatialFallback;
+                    int inW = savedInW > 0 ? savedInW : spatialFallback;
+                    conv.ResolveShapesOnly(new[] { savedInDepth, inH, inW });
                 }
-                conv.ResolveShapesOnly(new[] { inDepth, inH, inW });
             }
         }
         else if (genericDef == typeof(Conv3DLayer<>))
diff --git a/src/NeuralNetworks/GenerativeAdversarialNetwork.cs b/src/NeuralNetworks/GenerativeAdversarialNetwork.cs
@@ -1,6 +1,7 @@
 ﻿using AiDotNet.Attributes;
 using AiDotNet.Enums;
 using AiDotNet.NeuralNetworks.Options;
+using AiDotNet.Tensors.Engines.Autodiff;
 
 namespace AiDotNet.NeuralNetworks;
 
@@ -901,11 +902,38 @@ public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
 
         // ------------ Train Discriminator ------------
 
-        // Generate fake images (detached from the generator's gradient path —
-        // Generator.Predict wraps in NoGradScope). The disc step below trains
-        // against these detached fakes; the separate generator-step backward
-        // re-runs the forward with tape to update generator weights.
-        var fakeImages = Generator.Predict(input);
+        // Issue #1390 perf fix: run the generator forward ONCE per Train()
+        // call instead of twice. Previously the disc step received a
+        // Generator.Predict(input) output (eval mode, NoGradScope) and the
+        // gen step ran ForwardForTraining(input) inside TrainWithCustomLoss
+        // — two full generator forwards per iteration (~19 ms wasted on the
+        // DCGAN MoreData fixture per the issue's substep profile).
+        //
+        // New flow:
+        //   1. Open the generator's gradient tape here.
+        //   2. Run ForwardForTraining ONCE on the tape -> fakeTapeTracked.
+        //   3. Take a value-copy detached snapshot for the disc step.
+        //   4. Disc step opens its own nested tape (independent of gen tape
+        //      because ThreadStatic _current save/restore in GradientTape).
+        //   5. Gen step reuses fakeTapeTracked (still attached to the open
+        //      gen tape) and calls BackwardAndStepOnPrecomputedLoss to drive
+        //      the optimizer without a second ForwardForTraining call.
+        //
+        // Behavior note: the disc now trains on train-mode generator output
+        // (uses batch BN stats, would apply Dropout if any) rather than the
+        // eval-mode Predict output. This matches the PyTorch DCGAN tutorial
+        // convention (`fake = G(z); fake_detached = fake.detach()`). DCGAN
+        // architecture has no Dropout; the BN distribution shift is the
+        // standard adversarial behavior, not a regression.
+        using var genTape = new GradientTape<T>();
+        var fakeTapeTracked = ((NeuralNetworkBase<T>)Generator).ForwardForTraining(input);
+
+        // Detached value-copy for the discriminator step. A fresh Tensor<T>
+        // with no GradNode chain — ops touching it record no parent link to
+        // the gen tape, so the disc step can't leak gradients into the
+        // generator's parameters.
+        var fakeImages = new Tensor<T>(fakeTapeTracked.Shape.ToArray());
+        fakeTapeTracked.AsSpan().CopyTo(fakeImages.AsWritableSpan());
 
         // Cache real / fake batches only when an auxiliary loss path needs
         // them. Previously this clone ran unconditionally for UseFeatureMatching;
@@ -1021,35 +1049,38 @@ public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
         T generatorLoss;
         try
         {
-            generatorLoss = trainableGen.TrainWithCustomLoss(input, genOutput =>
-            {
-                // genOutput = generated fake images from ForwardForTraining (tape-tracked).
-                // Run the discriminator layer-by-layer so each Forward call records
-                // on the same active GradientTape that the generator's
-                // TrainWithCustomLoss opened. This keeps the discriminator weights
-                // fixed (we only collect generator gradients) while letting the
-                // adversarial signal propagate through every BN / Conv / activation
-                // back to genOutput → Generator's weights.
-                var discScore = genOutput;
-                foreach (var layer in Discriminator.Layers)
-                    discScore = layer.Forward(discScore);
-                // Generator loss: non-saturating BCE-with-logits per Goodfellow
-                // 2014 §3, matching the BCE-with-logits criterion that this base
-                // class wires into the Discriminator (lines 405-419 above —
-                // GetDefaultLossFunction(BinaryClassification) = BCE-with-logits).
-                // The previous LSGAN-style MSE((discScore − 1)²) here was a
-                // different objective (Mao 2017 LSGAN) and silently changed
-                // training semantics for every derived GAN.
-                //
-                // -log σ(discScore) is the per-sample non-saturating generator
-                // term. Implemented via the numerically-stable LogSigmoid identity
-                // -log σ(x) = softplus(-x), where softplus is the tape-tracked
-                // Engine.Softplus op. ReduceMean over all axes gives the scalar
-                // loss the tape requires.
-                var allAxes = Enumerable.Range(0, discScore.Shape.Length).ToArray();
-                var negLogSigmoid = Engine.Softplus(Engine.TensorNegate(discScore));
-                return Engine.ReduceMean(negLogSigmoid, allAxes, keepDims: false);
-            });
+            // Issue #1390: reuse the tape-tracked generator output from the
+            // step start (line ~929) instead of re-running ForwardForTraining
+            // inside TrainWithCustomLoss. The gen tape opened earlier
+            // (genTape) is still active; the disc-layer Forward calls below
+            // continue to record on it, then BackwardAndStepOnPrecomputedLoss
+            // drives gradient compute + optimizer step on the shared tape.
+            //
+            // Walk the discriminator layer-by-layer in eval mode (but WITHOUT
+            // NoGradScope) so each Forward call records on genTape. This
+            // keeps disc weights fixed (only gen params are passed to
+            // ComputeGradients inside BackwardAndStepOnPrecomputedLoss) while
+            // letting the adversarial signal propagate through every BN /
+            // Conv / activation back to fakeTapeTracked → Generator's weights.
+            var discScore = fakeTapeTracked;
+            foreach (var layer in Discriminator.Layers)
+                discScore = layer.Forward(discScore);
+
+            // Generator loss: non-saturating BCE-with-logits per Goodfellow
+            // 2014 §3, matching the BCE-with-logits criterion that this base
+            // class wires into the Discriminator (GetDefaultLossFunction(
+            // BinaryClassification) = BCE-with-logits).
+            //
+            // -log σ(discScore) is the per-sample non-saturating generator
+            // term. Implemented via the numerically-stable LogSigmoid identity
+            // -log σ(x) = softplus(-x), where softplus is the tape-tracked
+            // Engine.Softplus op. ReduceMean over all axes gives the scalar
+            // loss the tape requires.
+            var allAxes = Enumerable.Range(0, discScore.Shape.Length).ToArray();
+            var negLogSigmoid = Engine.Softplus(Engine.TensorNegate(discScore));
+            var lossTensor = Engine.ReduceMean(negLogSigmoid, allAxes, keepDims: false);
+
+            generatorLoss = trainableGen.BackwardAndStepOnPrecomputedLoss(genTape, lossTensor);
         }
         finally
         {
diff --git a/src/NeuralNetworks/NeuralNetworkBase.cs b/src/NeuralNetworks/NeuralNetworkBase.cs