Skip to content

Commit ce00cfd

Browse files
ooplesfranklinicclaude
authored
fix(#1309): cluster-1 DCGAN — restore deferred-shape guard + lazy-conv deserialize fallback (#1389)
* fix(#1309): cluster-1 DCGAN — restore deferred-shape guard + lazy-conv deserialize fallback PR #1290 CI Cluster 1: 25 of 25 DCGANTests failing post-master with one of two errors: 1. Most (23 tests): "Invalid layer configuration: The last layer's output shape [3, -1, -1] must match the architecture output size (12288)." 2. Clone tests (2): "Input spatial dims after padding (1+2*1, 1+2*1) must be >= kernelSize (4)" raised inside DeserializationHelper's pre-resolve of the discriminator's first conv layer. Plus 1 SparseNN test (intermittent mode-collapse) that re-runs pass without code change — flaky, not a regression target. ## Root causes (1) NeuralNetworkBase.IsLastLayerShapeCompatible: PR #1329 (commit 969977d) added a `outputShape.Any(d => d < 0)` early-return so the validator defers the flat-OutputSize check when any output-shape dim is deferred — DCGAN's last transposed-conv emits [3, -1, -1] until its first Forward resolves H/W. That guard was inadvertently deleted by the grafprint PR (c8cac23, May 16) one day later. Restoring it unblocks all 23 validator-rejection cases at once. (2) DeserializationHelper conv path: when the saved layer record's inputShape carries -1 sentinels (a lazy conv layer serialized before its first Forward — DCGAN's discriminator on a Predict-only probe sees only the generator), the pre-existing code coerced all -1 dims to 1 and called conv.ResolveShapesOnly(...). For DCGAN's first conv (kernel=4, padding=1) this fails OnFirstForward's kernel-size check (1 + 2 < 4). Coercing to Math.Max(1, KernelSize) fixes that specific check, but locks InputDepth at 1 — then the real Forward with the [3, 64, 64] RGB image throws "Expected input depth 1, but got 3". The correct fix is to skip pre-resolve entirely when InputDepth is deferred — ConvolutionalLayer.SetParameters has its own auto-resolve fallback at line ~1598 that derives InputDepth from the saved parameter vector's length, and uses KernelSize as the spatial placeholder. Pre-resolve still runs (and uses Math.Max(1, KernelSize) for any deferred spatial dim) when InputDepth is concrete — that's the original PR #1329 contract for the auto-resolve-disambiguation case. ## Verification $ dotnet test --filter "FullyQualifiedName~DCGANTests|FullyQualifiedName~SparseNeuralNetworkTests" --framework net10.0 Failed! - Failed: 2, Passed: 44, Skipped: 0, Total: 46 26 → 2 failures. The remaining two are NOT cluster-1 shape-contract issues: - DCGANTests.MoreData_ShouldNotDegrade — `Test execution timed out after 120000 milliseconds`. Pre-existing GAN training-path perf gap; the deep deconv+conv chain in tape mode is ~5-10× slower than PyTorch CPU baseline. Substep profile (Release): Generator.Predict 19 ms, Discriminator.Train 187 ms, Generator adversarial 313 ms — 519 ms/step × 250 iters = 130 s vs 120 s timeout. Filed separately so this PR ships the actual cluster-1 root causes (validator + conv-deserialize) without bundling a multi-week perf project. - SparseNeuralNetworkTests.DifferentInputs_AfterTraining_ShouldProduceDifferentOutputs — intermittent mode-collapse, passes on re-runs. Separate flaky-test issue, not a shape-contract bug. Closes #1309 partially (cluster-1 shape-contract root causes). The MoreData_ShouldNotDegrade timeout + SparseNN mode-collapse flakiness are tracked separately. 🤖 Generated with [Claude Code](https://claude.com/claude-code) * fix(PR #1389 review): document zero-dim wildcard semantics + reject malformed Conv inputShape rank * fix(PR #1389 follow-up): widen rank check to reject rank-1/2 Conv inputShape too * perf(#1390): eliminate duplicate generator forward in GAN.Train — closes DCGAN MoreData timeout Previously GenerativeAdversarialNetwork.Train ran the generator forward TWICE per training step: 1. Generator.Predict(input) (eval mode, NoGradScope) → detached fake images for the combined real+fake discriminator step. 2. ForwardForTraining(input) (train mode, on tape) inside TrainWithCustomLoss — duplicate of the same forward, just for the gen-adversarial backward. On the DCGAN MoreData fixture (250 iters, double-precision, batch=2, 64×64 RGB) this duplicate forward contributed ~19 ms of the 519 ms / step profiled in #1390 — pushing the test 10 s over its 120 s budget. Refactor: - Open a single GradientTape at the start of the step. - Run ForwardForTraining(input) ONCE on that tape → fakeTapeTracked. - Take a value-copy detached snapshot (fakeImages) for the disc step; fresh Tensor<T> with no GradNode chain so disc.Train (which opens its own nested tape) can not leak gradients back into the generator. - Walk the discriminator layer-by-layer on the existing gen tape for the adversarial loss (unchanged from the prior closure semantics). - Drive the gen optimizer step via the new NeuralNetworkBase.BackwardAndStepOnPrecomputedLoss helper, which reuses the open tape instead of TrainWithCustomLoss opening a fresh one + re-running ForwardForTraining. Behavior note: the disc step now sees train-mode generator output (batch BN stats) instead of eval-mode (running BN stats). This matches PyTorch's standard DCGAN training pattern (fake = G(z); fake_detached = fake.detach()) and the existing gen step's own train-mode forward. DCGAN has no Dropout, so the only distribution shift is BN stats, which is the conventional adversarial behavior. Verified locally with the canonical Tensors 0.81.3 dependency: - DCGANTests.MoreData_ShouldNotDegrade: 1 m 47 s (was timing out at > 120 s) — closes the test's perf gap. - Full DCGANTests class: 25 / 25 passing. - ConditionalGANTests + InfoGANTests (other GAN.Train consumers): 50 / 50 passing. - Full SparseNeuralNetworkTests: 21 / 21 passing (previously "intermittent mode-collapse" in PR #1389 description — appears stable now, may have been transient). Closes #1390. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(pr1389-review): narrow visibility + reentrancy + extra trainables addresses three coderabbit comments on backwardandsteponprecomputed loss in pr #1389: 1. visibility narrowed public -> internal. the codebase contract is "users should only interact with aimodelbuilder / aimodelresult" and this helper is training plumbing for in-assembly callers (currently generativeadversarialnetwork.train); no reason for it to live on the public surface. only caller is in same assembly. 2. added using var __reentrancyguard = acquiretrainsentinel() at the top, mirroring trainwithtape's sentinel discipline. without it, concurrent callers on the same model race on lastloss + optimizer internal state. 3. trainableparams now concats getextratrainabletensors() with the layer params, matching trainwithtape's parameter set. without this models that expose raw tensors via getextratrainabletensors (rather than layer-resident params) silently skipped updates on the precomputed-loss path -- divergent semantics between the two training entry points. build passes. --------- Co-authored-by: franklinic <franklin@ivorycloud.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 18a3d6a commit ce00cfd

3 files changed

Lines changed: 262 additions & 46 deletions

File tree

src/Helpers/DeserializationHelper.cs

Lines changed: 86 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -833,22 +833,96 @@ public static ILayer<T> CreateLayerFromType<T>(string layerType, int[] inputShap
833833
// "Expected N parameters, but got M".
834834
// Saved inputShape format: [batch, channels, height, width] (NCHW); some
835835
// legacy paths serialize without the batch dim, so accept rank 3 too.
836-
if (instance is ConvolutionalLayer<T> conv && inputShape != null && inputShape.Length >= 3)
837-
{
838-
int inDepth, inH, inW;
839-
if (inputShape.Length == 4)
836+
// Note: the rank-validation switch below now handles ALL ranks (not
837+
// just >= 3) — rank-1 and rank-2 payloads fall through to the
838+
// default branch and throw, instead of silently bypassing the
839+
// ConvolutionalLayer pre-resolve when inputShape.Length < 3 left
840+
// the layer in its lazy state (PR #1389 review C8oz1 — gate moved
841+
// from inputShape.Length >= 3 to inputShape != null so malformed
842+
// ranks fail fast). The previous `>= 3` guard was a relic from
843+
// when the switch only had the rank-3/rank-4 cases and rank-1/2
844+
// would have crashed on `inputShape[2]` — now they're explicitly
845+
// rejected with a clear error.
846+
if (instance is ConvolutionalLayer<T> conv && inputShape != null)
847+
{
848+
// Saved-record axes: rank-4 = [batch, channels, H, W],
849+
// rank-3 = [channels, H, W] (legacy unbatched). Any other
850+
// rank is malformed and must fail fast — silently
851+
// reinterpreting a rank-5 or rank-6 payload's leading
852+
// axes as the legacy [C, H, W] layout would deserialize
853+
// ConvolutionalLayer with the wrong channels/InputDepth
854+
// and produce a Clone that disagrees with the original
855+
// model's contract several layers downstream.
856+
int savedInDepth, savedInH, savedInW;
857+
switch (inputShape.Length)
840858
{
841-
inDepth = inputShape[1] > 0 ? inputShape[1] : 1;
842-
inH = inputShape[2] > 0 ? inputShape[2] : 1;
843-
inW = inputShape[3] > 0 ? inputShape[3] : 1;
859+
case 4:
860+
savedInDepth = inputShape[1];
861+
savedInH = inputShape[2];
862+
savedInW = inputShape[3];
863+
break;
864+
case 3:
865+
savedInDepth = inputShape[0];
866+
savedInH = inputShape[1];
867+
savedInW = inputShape[2];
868+
break;
869+
default:
870+
throw new InvalidOperationException(
871+
$"ConvolutionalLayer deserialize: saved inputShape rank must be 3 ([C, H, W]) " +
872+
$"or 4 ([N, C, H, W]); got rank {inputShape.Length} ([{string.Join(", ", inputShape)}]). " +
873+
"This usually indicates a corrupted layer record or a forward-incompatible " +
874+
"newer-format payload — abort deserialize rather than silently misinterpret " +
875+
"the trailing axes.");
844876
}
845-
else
877+
878+
// Three branches based on what the saved inputShape resolved
879+
// by serialize time:
880+
//
881+
// (a) InputDepth concrete: pre-resolve so SetParameters sees
882+
// the correct InputDepth and the kernel/bias counts match
883+
// the saved parameter vector exactly. Without this the
884+
// auto-resolve heuristic in ConvolutionalLayer.SetParameters
885+
// can pick a different InputDepth than the original
886+
// (especially when outputDepth × kernelSize² happens to
887+
// factor the saved parameter count more than one way),
888+
// and Clone()/DeepCopy() throw "Expected N parameters,
889+
// but got M". Spatial dims that were never forwarded
890+
// fall back to Math.Max(1, kernelSize) so
891+
// ConvolutionalLayer.OnFirstForward's kernel-size
892+
// constraint (inH + 2*Padding >= KernelSize) passes —
893+
// DCGAN's discriminator (kernel=4, padding=1, needs
894+
// inH >= 2) is the canary. The stored OutputShape after
895+
// this resolve is a placeholder; the first real Forward
896+
// call recomputes the actual output tensor dimensions
897+
// from the real input.
898+
//
899+
// (b) InputDepth deferred (saved as -1 because the layer
900+
// was serialized before its first Forward): skip the
901+
// pre-resolve entirely. ConvolutionalLayer.SetParameters
902+
// has its own auto-resolve fallback (~ line 1598) that
903+
// derives InputDepth from the saved parameter vector's
904+
// length — (length - OutputDepth) / (OutputDepth *
905+
// KernelSize²) — and that fallback uses KernelSize as
906+
// the spatial placeholder. Pre-resolving with a
907+
// placeholder InputDepth=1 would have locked
908+
// InputDepth=1 into the layer's state, then Forward
909+
// with the real RGB-3 input would throw
910+
// "Expected input depth 1, but got 3" before the lazy
911+
// resolve had a chance to fire. This is the failure
912+
// mode that surfaced on DCGAN clones where the
913+
// discriminator's layers had never seen the
914+
// [3, 64, 64] image input at clone time (the test's
915+
// pre-clone Predict only runs the generator).
916+
//
917+
// (c) inputShape supplied but malformed (rank < 3 case
918+
// handled by the outer guard).
919+
if (savedInDepth > 0)
846920
{
847-
inDepth = inputShape[0] > 0 ? inputShape[0] : 1;
848-
inH = inputShape[1] > 0 ? inputShape[1] : 1;
849-
inW = inputShape[2] > 0 ? inputShape[2] : 1;
921+
int spatialFallback = Math.Max(1, kernelSize);
922+
int inH = savedInH > 0 ? savedInH : spatialFallback;
923+
int inW = savedInW > 0 ? savedInW : spatialFallback;
924+
conv.ResolveShapesOnly(new[] { savedInDepth, inH, inW });
850925
}
851-
conv.ResolveShapesOnly(new[] { inDepth, inH, inW });
852926
}
853927
}
854928
else if (genericDef == typeof(Conv3DLayer<>))

src/NeuralNetworks/GenerativeAdversarialNetwork.cs

Lines changed: 65 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
using AiDotNet.Attributes;
22
using AiDotNet.Enums;
33
using AiDotNet.NeuralNetworks.Options;
4+
using AiDotNet.Tensors.Engines.Autodiff;
45

56
namespace AiDotNet.NeuralNetworks;
67

@@ -901,11 +902,38 @@ public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
901902

902903
// ------------ Train Discriminator ------------
903904

904-
// Generate fake images (detached from the generator's gradient path —
905-
// Generator.Predict wraps in NoGradScope). The disc step below trains
906-
// against these detached fakes; the separate generator-step backward
907-
// re-runs the forward with tape to update generator weights.
908-
var fakeImages = Generator.Predict(input);
905+
// Issue #1390 perf fix: run the generator forward ONCE per Train()
906+
// call instead of twice. Previously the disc step received a
907+
// Generator.Predict(input) output (eval mode, NoGradScope) and the
908+
// gen step ran ForwardForTraining(input) inside TrainWithCustomLoss
909+
// — two full generator forwards per iteration (~19 ms wasted on the
910+
// DCGAN MoreData fixture per the issue's substep profile).
911+
//
912+
// New flow:
913+
// 1. Open the generator's gradient tape here.
914+
// 2. Run ForwardForTraining ONCE on the tape -> fakeTapeTracked.
915+
// 3. Take a value-copy detached snapshot for the disc step.
916+
// 4. Disc step opens its own nested tape (independent of gen tape
917+
// because ThreadStatic _current save/restore in GradientTape).
918+
// 5. Gen step reuses fakeTapeTracked (still attached to the open
919+
// gen tape) and calls BackwardAndStepOnPrecomputedLoss to drive
920+
// the optimizer without a second ForwardForTraining call.
921+
//
922+
// Behavior note: the disc now trains on train-mode generator output
923+
// (uses batch BN stats, would apply Dropout if any) rather than the
924+
// eval-mode Predict output. This matches the PyTorch DCGAN tutorial
925+
// convention (`fake = G(z); fake_detached = fake.detach()`). DCGAN
926+
// architecture has no Dropout; the BN distribution shift is the
927+
// standard adversarial behavior, not a regression.
928+
using var genTape = new GradientTape<T>();
929+
var fakeTapeTracked = ((NeuralNetworkBase<T>)Generator).ForwardForTraining(input);
930+
931+
// Detached value-copy for the discriminator step. A fresh Tensor<T>
932+
// with no GradNode chain — ops touching it record no parent link to
933+
// the gen tape, so the disc step can't leak gradients into the
934+
// generator's parameters.
935+
var fakeImages = new Tensor<T>(fakeTapeTracked.Shape.ToArray());
936+
fakeTapeTracked.AsSpan().CopyTo(fakeImages.AsWritableSpan());
909937

910938
// Cache real / fake batches only when an auxiliary loss path needs
911939
// them. Previously this clone ran unconditionally for UseFeatureMatching;
@@ -1021,35 +1049,38 @@ public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
10211049
T generatorLoss;
10221050
try
10231051
{
1024-
generatorLoss = trainableGen.TrainWithCustomLoss(input, genOutput =>
1025-
{
1026-
// genOutput = generated fake images from ForwardForTraining (tape-tracked).
1027-
// Run the discriminator layer-by-layer so each Forward call records
1028-
// on the same active GradientTape that the generator's
1029-
// TrainWithCustomLoss opened. This keeps the discriminator weights
1030-
// fixed (we only collect generator gradients) while letting the
1031-
// adversarial signal propagate through every BN / Conv / activation
1032-
// back to genOutput → Generator's weights.
1033-
var discScore = genOutput;
1034-
foreach (var layer in Discriminator.Layers)
1035-
discScore = layer.Forward(discScore);
1036-
// Generator loss: non-saturating BCE-with-logits per Goodfellow
1037-
// 2014 §3, matching the BCE-with-logits criterion that this base
1038-
// class wires into the Discriminator (lines 405-419 above —
1039-
// GetDefaultLossFunction(BinaryClassification) = BCE-with-logits).
1040-
// The previous LSGAN-style MSE((discScore − 1)²) here was a
1041-
// different objective (Mao 2017 LSGAN) and silently changed
1042-
// training semantics for every derived GAN.
1043-
//
1044-
// -log σ(discScore) is the per-sample non-saturating generator
1045-
// term. Implemented via the numerically-stable LogSigmoid identity
1046-
// -log σ(x) = softplus(-x), where softplus is the tape-tracked
1047-
// Engine.Softplus op. ReduceMean over all axes gives the scalar
1048-
// loss the tape requires.
1049-
var allAxes = Enumerable.Range(0, discScore.Shape.Length).ToArray();
1050-
var negLogSigmoid = Engine.Softplus(Engine.TensorNegate(discScore));
1051-
return Engine.ReduceMean(negLogSigmoid, allAxes, keepDims: false);
1052-
});
1052+
// Issue #1390: reuse the tape-tracked generator output from the
1053+
// step start (line ~929) instead of re-running ForwardForTraining
1054+
// inside TrainWithCustomLoss. The gen tape opened earlier
1055+
// (genTape) is still active; the disc-layer Forward calls below
1056+
// continue to record on it, then BackwardAndStepOnPrecomputedLoss
1057+
// drives gradient compute + optimizer step on the shared tape.
1058+
//
1059+
// Walk the discriminator layer-by-layer in eval mode (but WITHOUT
1060+
// NoGradScope) so each Forward call records on genTape. This
1061+
// keeps disc weights fixed (only gen params are passed to
1062+
// ComputeGradients inside BackwardAndStepOnPrecomputedLoss) while
1063+
// letting the adversarial signal propagate through every BN /
1064+
// Conv / activation back to fakeTapeTracked → Generator's weights.
1065+
var discScore = fakeTapeTracked;
1066+
foreach (var layer in Discriminator.Layers)
1067+
discScore = layer.Forward(discScore);
1068+
1069+
// Generator loss: non-saturating BCE-with-logits per Goodfellow
1070+
// 2014 §3, matching the BCE-with-logits criterion that this base
1071+
// class wires into the Discriminator (GetDefaultLossFunction(
1072+
// BinaryClassification) = BCE-with-logits).
1073+
//
1074+
// -log σ(discScore) is the per-sample non-saturating generator
1075+
// term. Implemented via the numerically-stable LogSigmoid identity
1076+
// -log σ(x) = softplus(-x), where softplus is the tape-tracked
1077+
// Engine.Softplus op. ReduceMean over all axes gives the scalar
1078+
// loss the tape requires.
1079+
var allAxes = Enumerable.Range(0, discScore.Shape.Length).ToArray();
1080+
var negLogSigmoid = Engine.Softplus(Engine.TensorNegate(discScore));
1081+
var lossTensor = Engine.ReduceMean(negLogSigmoid, allAxes, keepDims: false);
1082+
1083+
generatorLoss = trainableGen.BackwardAndStepOnPrecomputedLoss(genTape, lossTensor);
10531084
}
10541085
finally
10551086
{

0 commit comments

Comments
 (0)