Commit ce00cfd
* fix(#1309): cluster-1 DCGAN — restore deferred-shape guard + lazy-conv deserialize fallback
PR #1290 CI Cluster 1: 25 of 25 DCGANTests failing post-master with one
of two errors:
1. Most (23 tests): "Invalid layer configuration: The last layer's
output shape [3, -1, -1] must match the architecture output size
(12288)."
2. Clone tests (2): "Input spatial dims after padding (1+2*1, 1+2*1)
must be >= kernelSize (4)" raised inside DeserializationHelper's
pre-resolve of the discriminator's first conv layer.
Plus 1 SparseNN test (intermittent mode-collapse) that passes on re-run
without code changes: flaky, not a regression target.
## Root causes
(1) NeuralNetworkBase.IsLastLayerShapeCompatible: PR #1329 (commit
969977d) added an `outputShape.Any(d => d < 0)` early return so the
validator defers the flat-OutputSize check when any output-shape dim
is deferred — DCGAN's last transposed-conv emits [3, -1, -1] until
its first Forward resolves H/W. That guard was inadvertently deleted
by the grafprint PR (c8cac23, May 16) one day later. Restoring it
unblocks all 23 validator-rejection cases at once.
(2) DeserializationHelper conv path: when the saved layer record's
inputShape carries -1 sentinels (a lazy conv layer serialized before
its first Forward — DCGAN's discriminator on a Predict-only probe
sees only the generator), the pre-existing code coerced all -1 dims
to 1 and called conv.ResolveShapesOnly(...). For DCGAN's first conv
(kernel=4, padding=1) this fails OnFirstForward's kernel-size check
(1 + 2 < 4). Coercing to Math.Max(1, KernelSize) fixes that
specific check, but locks InputDepth at 1 — then the real Forward
with the [3, 64, 64] RGB image throws "Expected input depth 1, but
got 3". The correct fix is to skip pre-resolve entirely when
InputDepth is deferred — ConvolutionalLayer.SetParameters has its
own auto-resolve fallback at line ~1598 that derives InputDepth from
the saved parameter vector's length, and uses KernelSize as the
spatial placeholder. Pre-resolve still runs (and uses
Math.Max(1, KernelSize) for any deferred spatial dim) when
InputDepth is concrete — that's the original PR #1329 contract for
the auto-resolve-disambiguation case.
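The two decisions described above can be sketched in a few lines. This is an illustrative sketch only, written in Python rather than the project's C#; the function names, the `[depth, height, width]` layout, and the `None` return for "skip pre-resolve" are assumptions made for the example, not the actual NeuralNetworkBase or DeserializationHelper code.

```python
from math import prod


def is_last_layer_shape_compatible(output_shape, expected_output_size):
    """Hypothetical stand-in for the validator's flat-size check."""
    # Deferred guard: any negative dim means the shape is not resolved yet,
    # so the flat-size comparison is postponed instead of rejecting the model.
    if any(d < 0 for d in output_shape):
        return True
    return prod(output_shape) == expected_output_size


def plan_conv_pre_resolve(input_shape, kernel_size):
    """Return the shape to pre-resolve with, or None to skip pre-resolve.

    input_shape is assumed to be [depth, height, width]; -1 marks a deferred dim.
    """
    depth, height, width = input_shape
    if depth < 0:
        # Depth unknown: skip pre-resolve and leave it to the parameter-loading
        # fallback, which can derive depth from the saved parameter vector.
        return None
    # Depth known: fill deferred spatial dims with a placeholder that is at
    # least as large as the kernel, so the kernel-size check cannot fail.
    placeholder = max(1, kernel_size)
    return [
        depth,
        height if height >= 0 else placeholder,
        width if width >= 0 else placeholder,
    ]
```

With the shapes quoted in this message, `[3, -1, -1]` passes the validator as deferred, `[3, 64, 64]` matches 12288 exactly, and a conv record whose depth is -1 is not pre-resolved at all.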
## Verification
$ dotnet test --filter "FullyQualifiedName~DCGANTests|FullyQualifiedName~SparseNeuralNetworkTests" --framework net10.0
Failed! - Failed: 2, Passed: 44, Skipped: 0, Total: 46
26 → 2 failures. The remaining two are NOT cluster-1 shape-contract
issues:
- DCGANTests.MoreData_ShouldNotDegrade — `Test execution timed
out after 120000 milliseconds`. Pre-existing GAN training-path
perf gap; the deep deconv+conv chain in tape mode is ~5-10×
slower than PyTorch CPU baseline. Substep profile (Release):
Generator.Predict 19 ms, Discriminator.Train 187 ms, Generator
adversarial 313 ms — 519 ms/step × 250 iters = 130 s vs 120 s
timeout. Filed separately so this PR ships the actual
cluster-1 root causes (validator + conv-deserialize) without
bundling a multi-week perf project.
- SparseNeuralNetworkTests.DifferentInputs_AfterTraining_ShouldProduceDifferentOutputs
— intermittent mode-collapse, passes on re-runs. Separate
flaky-test issue, not a shape-contract bug.
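The timeout arithmetic for MoreData_ShouldNotDegrade can be checked directly from the substep profile quoted above. The variable names are made up for this example; only the numbers come from the profile.

```python
# Per-step substep timings in milliseconds, as quoted in the profile above.
substep_ms = {
    "Generator.Predict": 19,
    "Discriminator.Train": 187,
    "Generator adversarial": 313,
}

step_ms = sum(substep_ms.values())       # cost of one training step
iterations = 250                         # iterations run by the test
total_s = step_ms * iterations / 1000    # projected wall time in seconds
timeout_s = 120                          # test timeout in seconds
overrun_s = total_s - timeout_s          # how far past the budget the test lands

print(f"{step_ms} ms/step, {total_s} s total, {overrun_s} s over budget")
```

The exact product is 129.75 s, which the message rounds to 130 s.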
Closes #1309 partially (cluster-1 shape-contract root causes).
The MoreData_ShouldNotDegrade timeout + SparseNN mode-collapse
flakiness are tracked separately.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* fix(PR #1389 review): document zero-dim wildcard semantics + reject malformed Conv inputShape rank
* fix(PR #1389 follow-up): widen rank check to reject rank-1/2 Conv inputShape too
* perf(#1390): eliminate duplicate generator forward in GAN.Train — closes DCGAN MoreData timeout
Previously GenerativeAdversarialNetwork.Train ran the generator forward TWICE
per training step:
1. Generator.Predict(input) (eval mode, NoGradScope) → detached fake images
for the combined real+fake discriminator step.
2. ForwardForTraining(input) (train mode, on tape) inside
TrainWithCustomLoss — duplicate of the same forward, just for the
gen-adversarial backward.
On the DCGAN MoreData fixture (250 iters, double-precision, batch=2, 64×64
RGB) this duplicate forward contributed ~19 ms of the 519 ms / step
profiled in #1390 — pushing the test 10 s over its 120 s budget.
Refactor:
- Open a single GradientTape at the start of the step.
- Run ForwardForTraining(input) ONCE on that tape → fakeTapeTracked.
- Take a value-copy detached snapshot (fakeImages) for the disc step;
fresh Tensor<T> with no GradNode chain so disc.Train (which opens
its own nested tape) cannot leak gradients back into the generator.
- Walk the discriminator layer-by-layer on the existing gen tape for
the adversarial loss (unchanged from the prior closure semantics).
- Drive the gen optimizer step via the new
NeuralNetworkBase.BackwardAndStepOnPrecomputedLoss helper, which
reuses the open tape instead of TrainWithCustomLoss opening a fresh
one + re-running ForwardForTraining.
Behavior note: the disc step now sees train-mode generator output
(batch BN stats) instead of eval-mode (running BN stats). This matches
PyTorch's standard DCGAN training pattern (fake = G(z); fake_detached =
fake.detach()) and the existing gen step's own train-mode forward.
DCGAN has no Dropout, so the only distribution shift is BN stats, which
is the conventional adversarial behavior.
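The call structure of the refactored step can be illustrated with a toy sketch. It is written in Python with no autograd at all, so it only demonstrates the "one forward, one detached copy" shape of the step; `CountingGenerator`, `detach`, and `train_step` are hypothetical names, not the GenerativeAdversarialNetwork API.

```python
class CountingGenerator:
    """Toy generator that records how many forward passes it ran."""

    def __init__(self):
        self.forward_calls = 0

    def forward_for_training(self, noise):
        self.forward_calls += 1
        return [2.0 * z for z in noise]


def detach(values):
    # Value copy: a fresh list that shares no state with the tracked output.
    return list(values)


def train_step(generator, noise, disc_step, gen_step):
    # Single forward pass, reused by both halves of the step.
    fake_tracked = generator.forward_for_training(noise)
    # The discriminator step only ever sees the detached snapshot.
    fake_images = detach(fake_tracked)
    disc_step(fake_images)
    # The generator step reuses the tracked output; no second forward.
    gen_step(fake_tracked)
    return fake_tracked, fake_images
```

Even if the discriminator step mutates its input, the tracked output is untouched, which mirrors the stated intent that the discriminator step cannot leak back into the generator.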
Verified locally with the canonical Tensors 0.81.3 dependency:
- DCGANTests.MoreData_ShouldNotDegrade: 1 m 47 s (was timing out at
> 120 s) — closes the test's perf gap.
- Full DCGANTests class: 25 / 25 passing.
- ConditionalGANTests + InfoGANTests (other GAN.Train consumers):
50 / 50 passing.
- Full SparseNeuralNetworkTests: 21 / 21 passing (previously
"intermittent mode-collapse" in PR #1389 description — appears
stable now, may have been transient).
Closes #1390.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(pr1389-review): narrow visibility + reentrancy + extra trainables
Addresses three CodeRabbit comments on BackwardAndStepOnPrecomputedLoss
in PR #1389:
1. Visibility narrowed from public to internal. The codebase contract is
"users should only interact with aimodelbuilder / aimodelresult",
and this helper is training plumbing for in-assembly callers
(currently GenerativeAdversarialNetwork.Train), so it has no reason
to live on the public surface. The only caller is in the same assembly.
2. Added `using var __reentrancyguard = acquiretrainsentinel()` at the
top, mirroring trainwithtape's sentinel discipline. Without it,
concurrent callers on the same model race on lastloss and the
optimizer's internal state.
3. trainableparams now concatenates getextratrainabletensors() with the
layer params, matching trainwithtape's parameter set. Without this,
models that expose raw tensors via getextratrainabletensors (rather
than layer-resident params) silently skipped updates on the
precomputed-loss path, so the two training entry points had
divergent semantics.
Build passes.
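Items 2 and 3 can be sketched together as follows. This is a minimal Python illustration under stated assumptions, not the project's C# helper: the `Model` class and its methods are hypothetical, and failing fast with an exception on re-entry is an assumption about what the sentinel does.

```python
import threading
from contextlib import contextmanager


class Model:
    """Hypothetical model holding layer params plus extra raw tensors."""

    def __init__(self, layer_params, extra_tensors=()):
        self._layer_params = list(layer_params)
        self._extra_tensors = list(extra_tensors)
        self._train_lock = threading.Lock()

    @contextmanager
    def acquire_train_sentinel(self):
        # Assumption: a second training call on the same model fails fast
        # instead of racing on shared loss and optimizer state.
        if not self._train_lock.acquire(blocking=False):
            raise RuntimeError("training step already in progress on this model")
        try:
            yield
        finally:
            self._train_lock.release()

    def trainable_params(self):
        # Layer-resident params plus extra tensors, so both training entry
        # points update the same parameter set.
        return self._layer_params + self._extra_tensors

    def backward_and_step(self, update):
        with self.acquire_train_sentinel():
            for param in self.trainable_params():
                update(param)
```

Calling `backward_and_step` from inside another `backward_and_step` on the same model raises, and the sentinel is released again once the outer call unwinds.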
---------
Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 18a3d6a commit ce00cfd
3 files changed
Lines changed: 262 additions & 46 deletions
File tree
- src
- Helpers
- NeuralNetworks