Commit 2ce0b30
* fix(#1309): cluster-1 DCGAN — restore deferred-shape guard + lazy-conv deserialize fallback
PR #1290 CI Cluster 1: 25 of 25 DCGANTests failing post-master with one
of two errors:
1. Most (23 tests): "Invalid layer configuration: The last layer's
output shape [3, -1, -1] must match the architecture output size
(12288)."
2. Clone tests (2): "Input spatial dims after padding (1+2*1, 1+2*1)
must be >= kernelSize (4)" raised inside DeserializationHelper's
pre-resolve of the discriminator's first conv layer.
Plus 1 SparseNN test (intermittent mode-collapse) that re-runs pass
without code change — flaky, not a regression target.
## Root causes
(1) NeuralNetworkBase.IsLastLayerShapeCompatible: PR #1329 (commit
969977d) added a `outputShape.Any(d => d < 0)` early-return so the
validator defers the flat-OutputSize check when any output-shape dim
is deferred — DCGAN's last transposed-conv emits [3, -1, -1] until
its first Forward resolves H/W. That guard was inadvertently deleted
by the grafprint PR (c8cac23, May 16) one day later. Restoring it
unblocks all 23 validator-rejection cases at once.
(2) DeserializationHelper conv path: when the saved layer record's
inputShape carries -1 sentinels (a lazy conv layer serialized before
its first Forward — DCGAN's discriminator on a Predict-only probe
sees only the generator), the pre-existing code coerced all -1 dims
to 1 and called conv.ResolveShapesOnly(...). For DCGAN's first conv
(kernel=4, padding=1) this fails OnFirstForward's kernel-size check
(1 + 2 < 4). Coercing to Math.Max(1, KernelSize) fixes that
specific check, but locks InputDepth at 1 — then the real Forward
with the [3, 64, 64] RGB image throws "Expected input depth 1, but
got 3". The correct fix is to skip pre-resolve entirely when
InputDepth is deferred — ConvolutionalLayer.SetParameters has its
own auto-resolve fallback at line ~1598 that derives InputDepth from
the saved parameter vector's length, and uses KernelSize as the
spatial placeholder. Pre-resolve still runs (and uses
Math.Max(1, KernelSize) for any deferred spatial dim) when
InputDepth is concrete — that's the original PR #1329 contract for
the auto-resolve-disambiguation case.
## Verification
$ dotnet test --filter "FullyQualifiedName~DCGANTests|FullyQualifiedName~SparseNeuralNetworkTests" --framework net10.0
Failed! - Failed: 2, Passed: 44, Skipped: 0, Total: 46
26 → 2 failures. The remaining two are NOT cluster-1 shape-contract
issues:
- DCGANTests.MoreData_ShouldNotDegrade — `Test execution timed
out after 120000 milliseconds`. Pre-existing GAN training-path
perf gap; the deep deconv+conv chain in tape mode is ~5-10×
slower than PyTorch CPU baseline. Substep profile (Release):
Generator.Predict 19 ms, Discriminator.Train 187 ms, Generator
adversarial 313 ms — 519 ms/step × 250 iters = 130 s vs 120 s
timeout. Filed separately so this PR ships the actual
cluster-1 root causes (validator + conv-deserialize) without
bundling a multi-week perf project.
- SparseNeuralNetworkTests.DifferentInputs_AfterTraining_ShouldProduceDifferentOutputs
— intermittent mode-collapse, passes on re-runs. Separate
flaky-test issue, not a shape-contract bug.
Closes #1309 partially (cluster-1 shape-contract root causes).
The MoreData_ShouldNotDegrade timeout + SparseNN mode-collapse
flakiness are tracked separately.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* fix(PR #1389 review): document zero-dim wildcard semantics + reject malformed Conv inputShape rank
* fix(PR #1389 follow-up): widen rank check to reject rank-1/2 Conv inputShape too
* perf(#1390): eliminate duplicate generator forward in GAN.Train — closes DCGAN MoreData timeout
Previously GenerativeAdversarialNetwork.Train ran the generator forward TWICE
per training step:
1. Generator.Predict(input) (eval mode, NoGradScope) → detached fake images
for the combined real+fake discriminator step.
2. ForwardForTraining(input) (train mode, on tape) inside
TrainWithCustomLoss — duplicate of the same forward, just for the
gen-adversarial backward.
On the DCGAN MoreData fixture (250 iters, double-precision, batch=2, 64×64
RGB) this duplicate forward contributed ~19 ms of the 519 ms / step
profiled in #1390 — pushing the test 10 s over its 120 s budget.
Refactor:
- Open a single GradientTape at the start of the step.
- Run ForwardForTraining(input) ONCE on that tape → fakeTapeTracked.
- Take a value-copy detached snapshot (fakeImages) for the disc step;
fresh Tensor<T> with no GradNode chain so disc.Train (which opens
its own nested tape) can not leak gradients back into the generator.
- Walk the discriminator layer-by-layer on the existing gen tape for
the adversarial loss (unchanged from the prior closure semantics).
- Drive the gen optimizer step via the new
NeuralNetworkBase.BackwardAndStepOnPrecomputedLoss helper, which
reuses the open tape instead of TrainWithCustomLoss opening a fresh
one + re-running ForwardForTraining.
Behavior note: the disc step now sees train-mode generator output
(batch BN stats) instead of eval-mode (running BN stats). This matches
PyTorch's standard DCGAN training pattern (fake = G(z); fake_detached =
fake.detach()) and the existing gen step's own train-mode forward.
DCGAN has no Dropout, so the only distribution shift is BN stats, which
is the conventional adversarial behavior.
Verified locally with the canonical Tensors 0.81.3 dependency:
- DCGANTests.MoreData_ShouldNotDegrade: 1 m 47 s (was timing out at
> 120 s) — closes the test's perf gap.
- Full DCGANTests class: 25 / 25 passing.
- ConditionalGANTests + InfoGANTests (other GAN.Train consumers):
50 / 50 passing.
- Full SparseNeuralNetworkTests: 21 / 21 passing (previously
"intermittent mode-collapse" in PR #1389 description — appears
stable now, may have been transient).
Closes #1390.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(pr1389-review): narrow visibility + reentrancy + extra trainables
addresses three coderabbit comments on backwardandsteponprecomputed
loss in pr #1389:
1. visibility narrowed public -> internal. the codebase contract is
"users should only interact with aimodelbuilder / aimodelresult"
and this helper is training plumbing for in-assembly callers
(currently generativeadversarialnetwork.train); no reason for it
to live on the public surface. only caller is in same assembly.
2. added using var __reentrancyguard = acquiretrainsentinel() at the
top, mirroring trainwithtape's sentinel discipline. without it,
concurrent callers on the same model race on lastloss + optimizer
internal state.
3. trainableparams now concats getextratrainabletensors() with the
layer params, matching trainwithtape's parameter set. without this
models that expose raw tensors via getextratrainabletensors (rather
than layer-resident params) silently skipped updates on the
precomputed-loss path -- divergent semantics between the two
training entry points.
build passes.
* fix(#1405): moe default optimizer overshoots — use amsgrad adam(1e-4)
vanilla adam default at the recommended learning rate diverges on the
moredata long-training regression for mixtureofexpertsneuralnetwork
(short=ok, long=overshoot) — same signature as the densenet fix in
#1393.
two root causes:
1. lr too aggressive for the moe gating + per-expert summed-gradient
path, where multiple experts touch each parameter per step.
2. vanilla adam's v denominator drifts late in training, so step size
grows even after loss has bottomed out.
switching the built-in default to amsgrad-mode adam at 1e-4 keeps the
running max of v (reddi et al. 2018), which removes the late-training
denominator decay and matches the recipe already proven on densenet.
users passing an explicit optimizer are unaffected.
Closes #1405
---------
Co-authored-by: franklinic <franklin@ivorycloud.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 7bbfcda commit 2ce0b30
1 file changed
Lines changed: 31 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
| 3 | + | |
| 4 | + | |
3 | 5 | | |
4 | 6 | | |
5 | 7 | | |
| |||
179 | 181 | | |
180 | 182 | | |
181 | 183 | | |
182 | | - | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
183 | 213 | | |
184 | 214 | | |
185 | 215 | | |
| |||
0 commit comments