
Commit 1cf430c

ooples, claude, and franklinic authored
fix(ci): resolve 6 real CI failures + DiT / weight-init vectorization (#1156)
* fix(stats): break BasicStats.CalculateStats recursion that crashed test host

BasicStats's lazy-stats accessors all read through property getters that call EnsureFullStatsComputed -> CalculateStats. When CalculateStats itself reads any of those properties (N, Mean, Variance, StandardDeviation, Median, FirstQuartile, ThirdQuartile), the getter re-enters EnsureFullStatsComputed, because _fullStatsComputed is still false during the body of CalculateStats; that flag is only set after CalculateStats returns. The result is unbounded recursion that crashes the xUnit test host with a StackOverflowException.

Stack from CI failures:

    BasicStats<double>.CalculateStats(Vector<double>)
    BasicStats<double>.EnsureFullStatsComputed()
    BasicStats<double>.get_N()   // <-- re-entry
    BasicStats<double>.CalculateStats(Vector<double>)
    ...

Reported as "Test Run Aborted — host process exited unexpectedly" on these CI jobs (PR #1154 / master):

- AiDotNet.Serving.Tests
- ModelFamily - Classification
- ModelFamily - Clustering/GP
- ModelFamily - Regression
- ModelFamily - TimeSeries/Activation/Loss
- Unit - 04 Feature/Fit/Fitness/Genetics

Fix: compute every intermediate value into a local variable and assign to the publicly observable properties only at the end. Property reads never happen inside CalculateStats, so the lazy getter never re-enters.

Verified locally: FederatedRun_Lifecycle_FedAvg_AggregatesAndAdvancesRound (which serializes a model and triggers the lazy stats path) now passes end-to-end instead of crashing the host.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* test(data): cross-platform retry trigger for RobustFileOps tests

Two RobustFileOps retry tests passed on Windows but failed on the Linux CI runner, because FileShare.None on a FileStream does not actually block File.Move on POSIX:

- Move_SucceedsAfter_TransientSharingViolation
- Move_Propagates_WhenLockNeverReleases

Both used a held FileStream with FileShare.None as the "failed-attempt" trigger.
On Linux that does not block rename(2), so File.Move succeeded on the first attempt: Move_Propagates' Assert.Throws failed ("No exception was thrown"), and Move_SucceedsAfter short-circuited without ever exercising the retry loop.

Replaced the lock-based simulation with a cross-platform missing-parent-directory trigger:

- Move_SucceedsAfter_TransientSharingViolation: the destination's parent directory does not exist when MoveWithRetryAsync runs, so File.Move throws DirectoryNotFoundException (an IOException subclass) on each attempt. A background task creates the parent ~250 ms in, so a subsequent attempt succeeds. The retry path is exercised on every platform.
- Move_Propagates_WhenLockNeverReleases: the parent directory is never created. Every attempt throws DirectoryNotFoundException, and the final attempt must propagate. The test now asserts the more specific DirectoryNotFoundException type for clarity, and checks that the source file is still in place after the failed move (the move never started, so src must remain).

Verified locally: all 5 RobustFileOpsMoveRetryTests pass on net10.0.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(serialization): match MultiHeadAttentionLayer 5-arg constructor in deserializer

DeserializationHelper.CreateMultiHeadAttentionLayer was looking up a 4-parameter constructor signature (int, int, int, IActivationFunction<T>), but MultiHeadAttentionLayer<T>'s constructor actually takes 5 parameters:

    (int, int, int, IActivationFunction<T>?, IInitializationStrategy<T>?)
Type.GetConstructor matches by exact parameter list, not by "first N plus defaults", so the lookup returned null and threw "Cannot find MultiHeadAttentionLayer constructor with (int, int, int, IActivationFunction<T>)".

Failure path observed in CI:

- InferenceOptimizer.OptimizeForInference(model, cloneModel: true)
  -> NeuralNetworkBase.Clone (serialization round-trip)
  -> DeserializationHelper.CreateMultiHeadAttentionLayer (throws)
  -> caught in OptimizeForInference, which returns (model, false)
- Test InferenceOptimizer_RewritesMultiHeadAttention_ToCachedAttention_ForTextGeneration_WhenKVCacheEnabled then sees anyApplied == false instead of the expected rewrite.

The fix mirrors how CreateDenseLayer already includes IInitializationStrategy<T> in its constructor lookup, and passes null for the strategy slot, matching the constructor's default-value semantics.

Verified locally: all 9 InferenceOptimizerTests pass on net10.0.

Wider impact: this also unblocks Clone-via-serialization for any model containing MHA layers. Previously every transformer-style model would silently skip inference optimizations after the clone failed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(optimizer): re-allocate Adam moments when cached shape mismatches param

AdamOptimizer.Step keyed its per-parameter moment tensors (_tapeM, _tapeV) by Tensor reference. If a parameter was first seen while a lazy-initialized layer (e.g. MultiHeadAttentionLayer with an IsLazy: true initialization strategy) still had its weights allocated as the placeholder [0, 0] tensor, the cached m / v captured shape [0, 0] and Length 0. Once the layer materialized real weights and real-shape gradients arrived, mScaled and gradScaled differed in shape; TensorAdd broadcast to the larger shape, and the result no longer matched m's underlying buffer.

Fix: at every Step, validate that the cached m and v match the parameter's current shape via SequenceEqual, and re-allocate them if not.
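A minimal sketch of an identity-keyed moment cache with the shape check, written in Python for brevity. `Param`, `MomentCache`, and `adam_step` are hypothetical stand-ins, not AiDotNet types. The parameter object keeps its identity when the lazy `[0, 0]` placeholder is materialized, so only the shape comparison notices that the cached moments are stale.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity hashing, like a C# reference key
class Param:
    shape: tuple
    data: list


class MomentCache:
    """Hypothetical Adam moment store keyed by parameter identity."""

    def __init__(self):
        self._m = {}
        self._v = {}

    def _get(self, store, param):
        entry = store.get(param)
        # Re-allocate when the entry is missing OR its cached shape is stale,
        # e.g. it was captured while the parameter was a [0, 0] placeholder.
        if entry is None or entry[0] != param.shape:
            entry = (param.shape, [0.0] * len(param.data))
            store[param] = entry
        return entry[1]

    def moments(self, param):
        return self._get(self._m, param), self._get(self._v, param)


def adam_step(cache, param, grad, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m, v = cache.moments(param)
    for i, g in enumerate(grad):
        m[i] = b1 * m[i] + (1 - b1) * g
        v[i] = b2 * v[i] + (1 - b2) * g * g
        m_hat = m[i] / (1 - b1 ** t)  # bias correction
        v_hat = v[i] / (1 - b2 ** t)
        param.data[i] -= lr * m_hat / (v_hat ** 0.5 + eps)


cache = MomentCache()
p = Param(shape=(0, 0), data=[])
cache.moments(p)                      # caches zero-length moments for the placeholder
p.shape, p.data = (2, 2), [1.0] * 4   # the lazy layer materializes real weights
adam_step(cache, p, [0.1] * 4, t=1)   # stale [0, 0] moments are re-allocated here
```

Stable parameters hit the cache on every step; only a shape change triggers a fresh allocation.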
Identity caching by reference still works for stable parameters; the explicit shape check covers the lazy-init case.

Note: this fix alone is not sufficient to make MobileNetV3_Train_CompletesWithoutError pass. That test also hits a separate bug in AiDotNet.Tensors (CpuEngine.TensorCopy uses sourceArray.Length instead of source.Length; see the follow-up PR on the Tensors repo). This commit fixes the lazy-init half of the issue, which would otherwise mask the Tensors bug behind a noisier symptom.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(serving): cross-platform sanitizer for AesGcm artifact filenames

Path.GetInvalidFileNameChars returns a platform-specific set:

- Windows: includes ':', '\', '/', '*', '?', '<', '>', '|', '"' plus control chars 0-31
- Linux / macOS: only '\0' and '/'

Encrypted model artifacts are designed to be portable across operating systems (an artifact written on a Linux training cluster might be loaded on a Windows inference host). Using the platform-specific set broke the AesGcmModelArtifactProtectorTests.ProtectToFile_WritesHeaderAndReturnsArtifact test on Linux CI:

    expected "my_model.aidn.enc"
    actual   "my:model.aidn.enc"   (':' isn't invalid on POSIX)

Fix: replace Path.GetInvalidFileNameChars with a hardcoded cross-platform invalid set that combines the Windows superset with POSIX. The sanitizer now produces identical output on every OS, so an artifact's filename no longer depends on the OS that wrote it.

Verified locally: ProtectToFile_WritesHeaderAndReturnsArtifact passes on net10.0.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(layers): sparselinearlayer reports supportstraining true

The layer's SupportsTraining property previously returned false, with a detailed comment explaining that sparse weight tensors don't fit the tape's dense ParameterBuffer<T> contract.
But returning false was incorrect: SupportsTraining gates the LEGACY non-tape training path (`if (layer.SupportsTraining) layer.UpdateParameters(lr)`), and the layer DOES have a working UpdateParameters that updates both the sparse weight tensor and the dense bias vector from gradients computed in Backward. Returning false prevented the layer from training in the legacy path even though the update mechanism existed.

Tape-mode discovery is unaffected by SupportsTraining; that path uses [TrainableParameter] / RegisterTrainableParameter discovery, not this property. The sparse weight tensor remains invisible to tape mode pending sparse-aware ParameterBuffer<T> support, which is a separate architectural follow-up.

Updated the docstring to describe the actual semantics (the legacy path trains the layer; the tape-mode caveat is documented inline).

Verified locally: SparseLinearLayer_SupportsTraining_IsTrue passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(dit): vectorize Patchify/Unpatchify/AdaLN via Engine reshape+permute

Replaces the scalar nested-loop implementations of Patchify, Unpatchify, ReshapeForHeads, ReshapeFromHeads, and the ExtractModulation/ApplyAdaLN/AddWithGate helpers with their Engine-op equivalents: reshape + permute + reshape pipelines, and zero-copy TensorSliceAxis views off the AdaLN modulation tensor.

Specific changes:

* Patchify/Unpatchify: replace the 6-deep scalar nested loop with Engine.Reshape → Engine.TensorPermute → Engine.Reshape. The permute runs through the engine's vectorized memcpy kernel (or stays as a view when the downstream consumer supports strided input) instead of a per-element C# scalar copy.
* ReshapeForHeads/FromHeads: same pattern (reshape + permute + reshape) instead of the original triple-nested scalar copy with span slices.
* ExtractModulation eliminated entirely. Previously ForwardBlock made 6 ExtractModulation calls per block (24 blocks × 50 inference steps × 6 = 7200 T[] allocations per Predict).
  Now ForwardBlock reshapes the AdaLN modulation output to [B, 6, 1, H] once and slices out each shift/scale/gate via Engine.TensorSliceAxis: zero allocations, zero scalar fill loops.
* ApplyAdaLN / AddWithGate rewritten to accept Tensor<T> broadcast views (from TensorSliceAxis) instead of T[] scalar arrays. The previous implementations built a [1,1,H] broadcast tensor via TensorAllocator.Rent plus a per-element scalar fill; the new ones use Engine.TensorAddScalar / Engine.TensorBroadcastMultiply / Engine.TensorBroadcastAdd directly on the sliced views.
* EmbedPatches / FinalLayerWithAdaLN: replaced the TensorAllocator.Rent + CopyTo scratch-buffer round trips with Engine.Reshape view chains (the downstream dense forward is contiguous-input-tolerant).

Every hot-path scalar copy in DiT forward is now either a view (zero-copy) or a SIMD-vectorized engine op.

Depends on the matching AiDotNet.Tensors PR #196 for the double-precision SIMD fallbacks in TensorMatMul / ScaledDotProductAttention / FusedLinear / broadcast ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(init): batched parallel Xavier normal weight initialization

Replaces the per-element SampleGaussian call loop (which ran a virtual-dispatch Box-Muller + rejection test for every element) with a tight specialized fill routine for double and float. One paired Box-Muller transform produces two samples per pair of uniform draws, halving the log/sqrt/sin/cos call count, and large layers (≥ 256K elements) are partitioned across the thread pool. The ~29 s of init cost per DiT-XL-sized Dense layer (hidden 8192 × out 12288 = 100M doubles per AdaLN modulation layer) is therefore parallelized instead of running single-threaded.

Context: even after the Tensors-side SIMD fixes on the forward matmul path, the first Pika21 Predict paid ~150 s of lazy-init overhead across the 24 block layers, because each first-call XavierNormalInitialize hit a scalar loop doing 100M virtual calls.
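A minimal Python sketch of a paired Box-Muller fill with per-chunk seeding, assuming a fixed chunk size so that each chunk's seed depends on the chunk index rather than on the worker count. `CHUNK`, `paired_box_muller`, and `xavier_normal` are illustrative names, not the AiDotNet implementation; the sketch omits the float/double specialization and the ≥ 256K-element threshold.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1000  # fixed chunk size (assumption); must not depend on thread count


def paired_box_muller(rng, out, start, stop, std):
    """Fill out[start:stop] with N(0, std^2) samples, two per pair of uniforms."""
    i = start
    while i < stop:
        u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
        u2 = rng.random()
        r = std * math.sqrt(-2.0 * math.log(u1))
        theta = 2.0 * math.pi * u2
        out[i] = r * math.cos(theta)
        if i + 1 < stop:
            out[i + 1] = r * math.sin(theta)  # second sample from the same pair
        i += 2


def xavier_normal(fan_in, fan_out, seed, workers=4):
    n = fan_in * fan_out
    std = math.sqrt(2.0 / (fan_in + fan_out))  # Glorot/Xavier normal std
    out = [0.0] * n
    master = random.Random(seed)
    # One seed per chunk, drawn in chunk order from the master RNG, so the
    # result is identical whichever worker ends up filling a given chunk.
    seeds = [master.getrandbits(64) for _ in range(0, n, CHUNK)]

    def fill(k):
        start = k * CHUNK
        stop = min(start + CHUNK, n)
        paired_box_muller(random.Random(seeds[k]), out, start, stop, std)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fill, range(len(seeds))))
    return out
```

In CPython the threads do not speed up this pure-Python loop (the GIL serializes it); the point of the sketch is that the output is the same for any `workers` value.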
The cost is one-time per layer, but it dominated the first forward pass and pushed Training_Should* tests that exercise a fresh model over the per-test xUnit budget.

Preserves reproducibility: per-chunk RNGs are seeded deterministically from the master Random instance, so for a given parent seed the output is stable across thread counts. The generic-T fallback stays on the old path, since only float/double are expected to be perf-critical.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(deps): bump aidotnet.tensors 0.46.0 -> 0.46.1

Pulls in the Tensors SIMD fallback fixes from Tensors PR #196:

- TensorMatMul double fallback routed through MultiplyBlocked
- ScaledDotProductAttention double SIMD fast path
- FusedGemmBiasActivation double fallback SIMD-routed
- TensorBroadcast{Multiply,Add} trailing-repeat fast path
- Odometer-based Contiguous() materialization
- LayerNorm generic fallback uses SIMD numOps.Sum

Unblocks the DiT vectorization work in this PR: every double-precision matmul / broadcast / attention op it relies on now hits a SIMD path instead of a scalar triple loop. Also unblocks MobileNetV3_Train_CompletesWithoutError, which hit the TensorCopy source.Length regression (Tensors PR #195, included in 0.46.1 via #194's follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(stats): break EnsureFullStatsComputed recursion in errorstats/modelstats/predictionstats

Same bug class as the earlier BasicStats fix: the Calculate* method was assigning to properties AND reading them back during its own body, but the property getters call EnsureFullStatsComputed, which is still running the Calculate* method. The _fullStatsComputed flag only flips after Calculate* returns, so any intra-method property read re-enters Calculate* without bound. The test host crashes with a StackOverflowException before the test framework can report anything except "host process exited unexpectedly."
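The re-entry pattern can be reproduced in a few lines of Python. `BuggyStats`, `FixedStats`, and `recurses` are hypothetical stand-ins for the Calculate*/EnsureFullStatsComputed pair. One difference from .NET: Python raises a catchable RecursionError, whereas a .NET StackOverflowException cannot be caught and terminates the process, which is why the test host died.

```python
class BuggyStats:
    """Lazy stats whose calculator reads its own lazy properties (pre-fix pattern)."""

    def __init__(self, errors):
        self._errors = errors
        self._computed = False
        self._mse = None
        self._rmse = None

    def _ensure_computed(self):
        if not self._computed:
            self._calculate()
            self._computed = True  # flag flips only AFTER _calculate returns

    @property
    def mse(self):
        self._ensure_computed()
        return self._mse

    @property
    def rmse(self):
        self._ensure_computed()
        return self._rmse

    def _calculate(self):
        self._mse = sum(e * e for e in self._errors) / len(self._errors)
        self._rmse = self.mse ** 0.5  # getter -> _ensure_computed -> _calculate -> ...


class FixedStats(BuggyStats):
    def _calculate(self):
        # Every intermediate lives in a local; fields are assigned only at the end,
        # so no lazy getter is touched while the calculation is running.
        mse = sum(e * e for e in self._errors) / len(self._errors)
        rmse = mse ** 0.5
        self._mse, self._rmse = mse, rmse


def recurses(stats_cls):
    """True if reading rmse on a fresh instance exceeds the recursion limit."""
    try:
        stats_cls([3.0, 4.0]).rmse
    except RecursionError:
        return True
    return False


print(recurses(BuggyStats), recurses(FixedStats))  # True False
```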
Specific re-entry points in the previous code:

* ErrorStats.CalculateErrorStats
  - RMSE = _numOps.Sqrt(MSE) ← re-enters via the MSE getter
  - AIC/BIC/AICAlt pass RSS ← re-enters via the RSS getter
* ModelStats.CalculateModelStats
  - VIFList = ... CalculateVIF(CorrelationMatrix, ...) ← CorrelationMatrix
  - Mahalanobis block reads CovarianceMatrix three times ← CovarianceMatrix
* PredictionStats.CalculatePredictionStats
  - AdjustedR2 = ... CalculateAdjustedR2(R2, ...) ← R2
  - PredictionIntervalCoverage = ... (PredictionInterval.Lower, PredictionInterval.Upper) ← PredictionInterval
  - ConfidenceInterval/CredibleInterval read BestDistributionFit.DistributionType ← BestDistributionFit

All three methods are rewritten to compute every intermediate into a local variable first; properties are only assigned once every dependency is a local. No property reads happen inside Calculate*, so the lazy getter never re-enters.

Observed failure path (Classification CI shard, PR #1156 run): AdaBoostClassifierTests.Predict_ShouldBeDeterministic trains the model, which computes ErrorStats, which stack-overflows the host. Other crashed tests in the same shard:

- ExtraTreesClassifierTests.Clone_ShouldProduceIdenticalPredictions
- CategoricalNaiveBayesTests.Builder_AccuracyShouldBeatChance
- OneVsRestClassifierTests.Builder_AccuracyShouldBeatChance

All 4 pass locally after this fix. Unblocks the host_crash jobs from the PR #1154 triage:

- ModelFamily - Classification
- ModelFamily - Clustering/GP
- ModelFamily - Regression
- ModelFamily - TimeSeries/Activation/Loss
- Unit - 04 Feature/Fit/Fitness/Genetics
- AiDotNet.Serving.Tests

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(networks): resnet/vgg train adds batch dim for 3d input

ResNet/VGG's Forward() explicitly accepts 3D [C,H,W] input and expands it to 4D [1,C,H,W] before running the layer stack.
Their Train() overrides, however, called TrainWithTape directly, which delegates to NeuralNetworkBase.ForwardForTraining; that method does NOT add a batch dim and just runs the raw tensor through every layer. For a 3D input [3, 32, 32], the conv/pool chain preserves the rank-3 shape, and the classifier's AdaptiveAveragePool + Flatten ends up producing [512, 1] (the final block's 512 channels get treated as a batch dim by FlattenLayer.Forward's "preserve first dim" rule). The final DenseLayer with inputSize=512 sees actualInputSize=1 via input.Shape[^1], calls EnsureWeightShapeForInput(1), which resizes the weights to [1, 10], and produces [512, 10]. That then fails the loss shape check in EnsureTargetMatchesPredicted, because the target is [10].

Fix: mirror Forward()'s expansion in Train(). When the input is 3D, add a leading batch dim to BOTH input and target before dispatching to TrainWithTape. Any 4D input is passed through untouched. The target expansion is guarded so a caller that already provided a batched target is not double-expanded.

Verified locally; all 4 previously failing tests now pass:

- ResNetNetwork_Train_CompletesWithoutError
- ResNetNetwork_Train_LossDecreases
- VGGNetwork_Train_CompletesWithoutError
- VGGNetwork_Train_LossDecreases

Closes the 08a NN-Classic (ResNet/VGG/DenseNet) CI shard failure from the PR #1154 triage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(networks): mobilenetv2 handles 3d input in forward/train/namedactivations

Same structural bug as ResNet/VGG: MobileNetV2's Forward / Train / GetNamedLayerActivations all iterated the layer stack with the raw input. For 3D [C, H, W] inputs, BatchNormalizationLayer's channel scale (shape [1, C, 1, 1]) cannot broadcast against the 3D layout, because dim 1 of the input (spatial H) doesn't match the BN's C channel count:

    "Tensors with shapes [16, 32, 32] and [1, 16, 1] cannot be broadcast: dimension 1 has sizes 32 and 16 (must be equal or one must be 1)."
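The error follows the right-aligned broadcasting rule quoted in the message (each dimension pair must be equal or one must be 1). A small Python sketch of that rule, assuming NumPy-style semantics rather than the AiDotNet.Tensors implementation, shows why the shapes from the error message fail and the batched 4D layout succeeds. `broadcast_shape` and `can_broadcast` are illustrative helpers.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Broadcast two shapes: right-align them; each pair must match or contain a 1."""
    out = []
    # Walk from the trailing dimension; missing leading dimensions count as 1.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da != db and 1 not in (da, db):
            raise ValueError(f"cannot broadcast {list(a)} and {list(b)}: {da} vs {db}")
        out.append(db if da == 1 else da)
    return tuple(reversed(out))


def can_broadcast(a, b):
    try:
        broadcast_shape(a, b)
        return True
    except ValueError:
        return False


# Unbatched [C, H, W] vs the rank-3 scale from the error: H lines up with C.
print(can_broadcast((16, 32, 32), (1, 16, 1)))           # False
# With a leading batch dim, C lines up with C and the spatial dims broadcast.
print(broadcast_shape((1, 16, 32, 32), (1, 16, 1, 1)))   # (1, 16, 32, 32)
```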
Fix: add a leading batch dimension when the caller passes a 3D input, so every BN in every InvertedResidualBlock sees the 4D layout it requires, and squeeze it back off at the end of Forward so the output shape matches the caller's 3D contract. Train() expands both input and target the same way, so ForwardForTraining (which iterates layers without adding a batch dim) also sees the correct shape. GetNamedLayerActivations is overridden with the same expansion, so the layer-by-layer probe used by NamedLayerActivations_ShouldBeNonEmpty doesn't hit the same BN broadcast error.

Also fixes the test: the parameterless MobileNetV2Network constructor defaults to 1000 ImageNet classes and 224x224 input, but the test probed with 3x64x64 input and a 10-class OutputShape. Swap in the architecture-aware overload so the classifier head matches the expected output dim.

Goes from 0/17 passing on the previous config to 14/17 passing. The three remaining failures are a deeper shape-collapse issue inside the InvertedResidualBlock chain for the NamedLayerActivations probe and a perf timeout on the training tests, both separate from this broadcast-shape root cause.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(networks): instructorembedding test shape matches 768-dim model

InstructorEmbedding's default ctor builds a 768-dim transformer (inputSize=768, outputSize=768), but the test inherited the base class's default InputShape=[1, 4] and OutputShape=[1, 1]. The training tests fed a [1, 4] input to a 768-dim model, and a [1, 1] target that the loss function then tried to subtract from the model's [1, 768] prediction, throwing "Tensor shapes must match. Got [1, 768] and [1, 1]." in MeanSquaredErrorLoss.ComputeTapeLoss.

Fix: override InputShape/OutputShape to the model's actual 768-dim embedding layout so input, prediction, and target all align.
Closes the InstructorEmbedding part of the "ModelFamily - NeuralNetworks" CI shard failure from the PR #1154 triage (the remaining failures in that shard are MobileNetV2 and are addressed in the previous commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(networks): convolutionalneuralnetwork train adds batch dim for 3d input

Same 3D-input bug as ResNet/VGG/MobileNetV2: CNN's Train() called TrainWithTape with the raw 3D [C, H, W] tensor. ForwardForTraining iterates layers without a shape-adjustment step, so the final FlattenLayer treats the 32-channel dimension as a batch (preserve-first-dim rule) and produces a [32, 10] prediction against a [10] one-hot target. That fails EnsureTargetMatchesPredicted with "Target shape dimension 0 (10) does not match predicted shape dimension 0 (32)."

Fix: expand 3D input to 4D before dispatching to TrainWithTape, and expand the target too when the caller provided it without a batch dim.

All 5 previously failing CNN tests pass locally:

- TrainingError_ShouldNotExceedTestError
- Training_ShouldReduceLoss
- Training_ShouldChangeParameters
- GradientFlow_ShouldBeNonZeroAndFinite
- ForwardPass_ShouldBeFinite_AfterTraining

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(networks): unet3d decoder channel count + test output shape

Two related problems surfaced by every UNet3D test:

1. LayerHelper.CreateDefaultUNet3DLayers: the decoder path declared the first Conv3D of each non-bottleneck-adjacent block with `inChannels = encoderFilters[block + 1] * 2`. The "*2" was there to account for a full U-Net concatenating skip connections from the encoder at each decoder level. This implementation does NOT actually perform the concatenation, so the preceding decoder block's second Conv3D emitted encoderFilters[block + 1] channels, not double that.
   Every CI call (and every local Predict) hit "Input channels (128) must match kernel in_channels (256)" in the first decoder block after the one adjacent to the bottleneck. Fix: drop the "*2" so the declared in_channels match the tensors that actually flow through. Concatenating real skip connections is a separate architectural improvement.

2. UNet3DTests: OutputShape was declared as [1], treating the network as a classifier, but UNet3D is a per-voxel segmentation model whose final 1x1x1 Conv3D emits [numClasses, D, H, W] per sample. With the default numClasses=1 and a 32³ voxel grid, every training test tried to subtract a [1, 32, 32, 32] prediction from a [1] target and threw "Tensor shapes must match. Got [1, 32, 32, 32] and [1]." Fix: OutputShape → [1, 32, 32, 32], so input, prediction, and target all line up.

Goes from 0/17 passing on UNet3D to 12/17. The five remaining failures are separate issues (NaN during training for this conv stack, metadata parity) independent of these two root causes. Closes 7 of the 8 UNet3D failures from the PR #1154 CI triage that were all attributed to the "Input channels (128) vs (256)" error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gp): escalating cholesky jitter for sparsegaussianprocess.fit

Ky = Kuu + D·Kuf·Kuf^T is only positive semi-definite in exact arithmetic, so floating-point roundoff on the combined matrix routinely pushes the smallest eigenvalue just below zero, and CholeskyDecomposition throws "Matrix is not positive definite" on every SparseGaussianProcess fit. Kuu already gets a constant 1e-4 jitter before its Cholesky, but the Ky path had none; that produced the six SparseGaussianProcessTests failures in the PR #1156 CI shard.

Add a PyTorch/GPyTorch-style escalating jitter schedule (1e-6 → 1e-4 → 1e-2 → 1e-1, scaled by the matrix trace so it's invariant to kernel amplitude) and retry the Cholesky after each increment.
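A minimal pure-Python sketch of the retry loop, assuming the jitter added to the diagonal is `rel * trace(K)` for each `rel` in the schedule, and that the bare matrix is tried first. `cholesky` and `cholesky_with_jitter` are illustrative helpers, not AiDotNet's CholeskyDecomposition.

```python
import math

JITTER_SCHEDULE = (1e-6, 1e-4, 1e-2, 1e-1)  # relative to trace(K)


def cholesky(a):
    """Lower-triangular Cholesky factor; raises ValueError if `a` is not PD."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("Matrix is not positive definite")
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return lower


def cholesky_with_jitter(a):
    """Try the bare matrix, then escalate trace-scaled diagonal jitter."""
    try:
        return cholesky(a), 0.0
    except ValueError:
        pass
    trace = sum(a[i][i] for i in range(len(a)))
    for rel in JITTER_SCHEDULE:
        jitter = rel * trace
        shifted = [row[:] for row in a]
        for i in range(len(a)):
            shifted[i][i] += jitter
        try:
            return cholesky(shifted), jitter
        except ValueError:
            continue  # not PD yet: try the next, larger jitter
    raise ValueError("Matrix is not positive definite even with maximum jitter")


# A rank-1 PSD matrix fails the bare Cholesky but succeeds at the first jitter level.
factor, used = cholesky_with_jitter([[1.0, 1.0], [1.0, 1.0]])
print(used)
```

Starting from the smallest level means a well-conditioned matrix is factored exactly, with no jitter added at all.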
Geometric escalation, instead of a single larger constant, keeps the numerical error introduced for already well-conditioned matrices minimal while still rescuing the borderline cases.

Goes from 7/16 passing to 14/16 on SparseGaussianProcessTests. The remaining two failures are separate bugs (the predictive mean is NaN, not a PD-matrix issue) tracked independently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(generators): correct audio/video modeldomain ordinal in testscaffoldgenerator

ModelDomain enum order is General=0, Vision=1, Language=2, Audio=3, Video=4, Multimodal=5. The scaffold generator had the Audio and Video ordinals swapped in three places:

1. Line 1495 treats Domain=3 as "temporal video" and emits `throw new NotImplementedException(...)` in the test's CreateNetwork. Audio is 3, not 4, so EVERY audio model (PlayHT, Bark, StableAudio, etc.) got a NotImplementedException factory instead of a working architecture. Ten PlayHTTests failures on PR #1156 traced back to this single line.
2. Line 1520: `isAudio = Domains.Contains(4)`. Should be 3.
3. Line 1633: `isVideoModel = Domains.Contains(3)`. Should be 4.

All three sites now use the correct ordinals (Audio=3, Video=4). This aligns the generator with the enum and with the facade/customization pattern the project prefers over hard-coded factories: every audio model's test can now construct a real Architecture and run the test body, which exposes the real model-specific failures downstream, where they can be fixed in the model code rather than hidden behind a runtime factory stub.

PlayHTTests go from 0/21 passing (all NotImplementedException) to 2/21 (the metadata/parameter-count tests now execute). The remaining 19 failures are a separate PlayHT LayerNorm shape-mismatch issue that can be addressed independently now that the tests actually run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(neuralnetworks): align word2vec test shapes with softmax vocab head

Word2Vec's default constructor uses vocabSize=10000. The final layer emits a 10000-dim softmax over the vocabulary, so the per-sample output is [1, 10000], not the [1, 1] implied by the base-class default. Align the input/output shapes so OutputDimension_ShouldMatchExpectedShape compares the right tensors.

* test(ner): emit 768-dim scaffolded shapes for transformer ner models

TransformerNerBase, SpanBasedNerBase, and the LSTM-CRF family all validate token embeddings against their options.HiddenDimension (768 by default, 100 for LSTM-CRF). The auto-scaffolded test base inherited [1, 4] as InputShape, so MultiHeadAttention threw "input embedding dimension (4) does not match weight dimension (768)" before any downstream logic could run. This is the reported SciBERT NER training-error regression on PR #1156.

In the test scaffolder, emit InputShape = [8, 768] for transformer-based and span-based NER models and [8, 100] for sequence-labeling NER models. Add a manual TinyBERT NER test class with [8, 312], so the one model that overrides HiddenDimension still gets covered.

* fix(layers): default rnn head should use identityactivation, not relu-via-null

RecurrentNeuralNetwork's default layer stack ended in a dense layer constructed with activationFunction: null, which the DenseLayer constructor replaces with ReLU. The preceding two tanh recurrent layers produce small mixed-sign activations (range ~[-0.16, 0.16] on random input), and ReLU then clips the single-output regression head to exactly 0 for essentially any input. That is why ScaledInput_ShouldChangeOutput and DifferentInputs_ShouldProduceDifferentOutputs saw identical zero outputs for distinct inputs in RecurrentNeuralNetworkTests.

Pass an explicit IdentityActivation so the dense head stays linear. The task-appropriate softmax/sigmoid activation layer emitted after it remains unchanged.
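A toy illustration of how a ReLU head collapses distinct inputs, using made-up weights and features (not values from RecurrentNeuralNetwork): whenever the single pre-activation is negative, ReLU maps it to exactly 0, while an identity head keeps the two inputs distinguishable.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense_head(x, w, b, activation):
    """Single-output dense head: activation(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)


# Hypothetical small, mixed-sign recurrent features and a fixed head.
w, b = [0.5, -0.3, 0.2], -0.05
x1 = [0.10, 0.12, -0.05]
x2 = [-0.08, 0.15, 0.02]

print(dense_head(x1, w, b, relu), dense_head(x2, w, b, relu))  # 0.0 0.0
print(dense_head(x1, w, b, identity), dense_head(x2, w, b, identity))
```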
* fix(memorynetwork): seed memory and wire training through the memory-aware flow

Two root causes made every MemoryNetwork prediction identical regardless of input, and made the training path diverge from the prediction path:

1. _memory was initialized as a zero matrix. MemoryReadLayer computes keys · memory^T, so with zero memory every attention score is zero, softmax produces a uniform distribution, and attentionWeights · memory reads back zero; every subsequent layer saw the same constant vector. ScaledInput_ShouldChangeOutput and DifferentInputs_ShouldProduceDifferentOutputs both reported that the network ignored its input. Seed _memory with small Xavier-scale random values, so there is something non-trivial to attend over on the very first forward pass.

2. Predict special-cased MemoryReadLayer/MemoryWriteLayer to pass the memory tensor and reshaped rank-1 input to [1, n], but Train went through the base TrainWithTape → ForwardForTraining path, which did neither. Training therefore crashed ("tensormatmul requires tensors of rank >= 2") or silently read from an identity-memory fallback. Factor the shared layer walk into RunLayers() and override ForwardForTraining, so Train and Predict share the same memory plumbing.

Locally, MemoryNetworkTests goes from 9 failing → 2 (the remaining two are the known MemoryReadLayer deserialization gap and NamedLayerActivations, tracked separately).

* fix(quantumnn): migrate training to trainwithtape and use identity on final dense

QuantumNeuralNetworkTests was failing 10/17 because Train called _trainOptimizer.UpdateParameters(layers) without first running a backward pass, tripping "backward pass must be called before updating parameters" inside each dense layer's legacy per-learning-rate update path. Switch Train to TrainWithTape, matching ResNet/VGG/MobileNetV2.
The quantum default layer stack also built its final dense layer in the generator with activationFunction: null (→ ReLU), so regression-task output got clipped at zero before the task-specific final activation layer could run. Switch that dense layer to IdentityActivation so the subsequent ActivationLayer owns the nonlinearity; this is the same fix pattern as the RNN regression head.

Locally, QNN goes from 10 failing → 5 (the remaining five look like a deeper issue with an input-independent forward pass, which is separate).

* fix(diffusion): upscaleavideo inputconv should match latent channels, not concat width

UpscaleAVideoModel set input_channels=8 to describe the "concat latent + low-res conditioning" path from the reference paper, but ForwardVideoUNet adds the image condition via the _imageCondProjection dense layer *after* _inputConv, not by concatenating before it. The first conv was therefore sized for 8 channels while only ever seeing 4, and the 14 UpscaleAVideoModelTests cases on the Diffusion A-I shard all failed with "expected input depth 8, but got 4". Pin input_channels to latent_channels so the conv weight shape matches what the forward pass feeds it.

This exposes a downstream FiLM projection width mismatch, tracked separately (VideoUNetPredictor.ApplyFilmConditioning); fixing that is the next step.

* fix(diffusion): videounet spatial resblock must mix channels, not width

CreateSpatialResBlock wrapped a LazyDense(inChannels, outChannels), but DenseLayer projects the *last* dimension of its input. For a 4D feature map [B, C, H, W] that is the width axis, not the channel axis, so the resblock silently scrambled width into outChannels while leaving the channel count untouched. The next timeCondProjection was sized for the planned outChannels, so ApplyFilmConditioning saw "expected 2*c, got 2*outc" and threw "film conditioning projection width mismatch: expected 640, got 1280" across the UpscaleAVideo and StreamingT2V tests.

Switch to a 1x1 LazyConv2D, the standard channel-mixing primitive.
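To see the axis difference concretely, here is a pure-Python sketch on a single [C, H, W] sample (batch dim omitted). `dense_last_axis` and `conv1x1` are illustrative helpers, assuming the "dense projects the last axis" semantics described above: the dense version leaves C unchanged and rewrites W, while the 1x1 convolution rewrites C and leaves H and W alone.

```python
def shape(t):
    """Shape of a nested [C][H][W] list."""
    return (len(t), len(t[0]), len(t[0][0]))


def dense_last_axis(x, w):
    """Dense-layer semantics: w is [out][in] and multiplies the LAST axis (width)."""
    return [[[sum(w[o][k] * row[k] for k in range(len(row))) for o in range(len(w))]
             for row in plane]
            for plane in x]


def conv1x1(x, w):
    """1x1 convolution: w is [out_channels][in_channels] and mixes the CHANNEL axis."""
    c, h, width = shape(x)
    return [[[sum(w[o][ci] * x[ci][i][j] for ci in range(c)) for j in range(width)]
             for i in range(h)]
            for o in range(len(w))]


# Hypothetical feature map: C=2 channels, H=3, W=4.
x = [[[float(c * 100 + i * 10 + j) for j in range(4)] for i in range(3)] for c in range(2)]

dense_w = [[1.0] * 4 for _ in range(5)]  # a lazy dense infers in=4 from W, not from C
conv_w = [[1.0, 1.0] for _ in range(5)]  # in_channels=2 -> out_channels=5

print(shape(dense_last_axis(x, dense_w)))  # (2, 3, 5): channels untouched, width rewritten
print(shape(conv1x1(x, conv_w)))           # (5, 3, 4): channels mixed, H and W preserved
```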
A 1x1 LazyConv2D consumes [B, inChannels, H, W] and produces [B, outChannels, H, W] without touching the spatial dims, so downstream FiLM projections receive a feature map with the channel count they were sized for.

Follow-ups (separate): the multi-head attention, temporal attention, and cross-attention layers still receive the 4D tensor directly, without a reshape, which surfaces as input-dim mismatches further down the forward pass.

* fix(serialization): register memoryread and memorywrite layers for deserialization

Clone()-style round trips on MemoryNetwork crashed with "layer type memoryreadlayer is not supported for deserialization (no known constructor found)", because DeserializationHelper.CreateLayerFromType had no explicit case for either MemoryReadLayer or MemoryWriteLayer, and the default fallback tries a ctor(int[]) that neither layer exposes.

Add cases for both. MemoryReadLayer uses an (inputDim, memoryDim, outputDim, activation) ctor, and MemoryWriteLayer uses (inputDim, memoryDim, activation). Take memoryDim from a "memorydimension" metadata key when present; otherwise reuse the output dim, which matches how MemoryNetwork wires its MemoryReadLayer (EmbeddingSize for all three dims).

* fix(gp): sparsegp ky solve falls back to svd pseudoinverse when cholesky gives up

SparseGaussianProcess.Fit builds Ky = Kuu + D·Kuf·Kuf^T and factors it via Cholesky. In exact arithmetic Ky is PSD (not PD) whenever rank(D·Kuf·Kuf^T) < m, the common regime where inducing points equal the data dimensionality, and floating-point roundoff then pushes the smallest eigenvalue just below zero, so CholeskyDecomposition throws "matrix is not positive definite". The earlier escalating jitter schedule (1e-6 → 1e-4 → 1e-2 → 1e-1 of the trace) was still failing on the CI shard, leaving 7 SparseGaussianProcessTests failing.

Keep the Cholesky + jitter escalation as the primary path for performance, then fall back to an SVD Moore-Penrose pseudoinverse when no jitter level makes Ky PD.
the pseudoinverse truncates singular values below max(rows, cols) · ε_machine · σ_max, which is numpy.linalg.pinv's default tolerance, and produces a well-defined α even when d·kuf·kuf^t has a near-null space. locally sparsegaussianprocesstests: 7 failing → 16/16 passing. * fix(regression): poisson irls must not overwrite coefficients with nan/inf predictions_shouldbefinite and collinearfeatures_shouldnotcrash both failed on net10 because the irls step in poissonregression.train can produce a newcoefficients vector with nan entries when x^t·w·x is numerically singular (the solve with qr/svd doesn't always refuse the factorization — it sometimes just hands back 1/0 or 0/0). the loop then assigned those nan values into coefficients and intercept, and every subsequent predictmean call propagated nan through the linear predictor. check for non-finite entries before accepting the step and halt iteration instead, preserving the last known-good coefficients. matches statsmodels glm's "linearalgerror" abort. locally poissonregressiontests: 20/22 → 21/22 (the remaining moredata_shouldnotdegrade_r2 is a separate convergence issue). * fix(regression): rbf solve via tikhonov-damped svd instead of normal-equations inverse rbf design matrices are often severely ill-conditioned — when a handful of centers end up far from every input, the corresponding columns go to near-zero and x^t·x has a huge condition number. the previous solve inverted x^t·x + λi directly via matrix.inverse(), which amplified roundoff into nan predictions (predictions_shouldbefinite, singlefeature_shouldwork, collinearfeatures_shouldnotcrash) and catastrophic negative r² (r2_shouldbepositive_onlineardata saw r² ≈ -10¹²). replace with a tikhonov-regularized svd solve on x directly: weights = v · diag(σ / (σ² + λ²)) · uᵀ · y with λ = 1e-6 · σ_max. 
this smoothly damps the ill-conditioned directions instead of zeroing them (which a hard-tolerance pseudoinverse would, dropping real signal along with roundoff) and avoids forming the normal-equations matrix that was the source of the explosion. locally rbfregression: nan predictions cleared, r² on linear data improved by 11+ orders of magnitude (from ~-10¹² to single-digit negative). a couple of r²-positivity tests still fail — likely center-placement / gamma choice, separate improvement — but the nan-poisoning is gone. * fix: address 10 CodeRabbit review comments on PR #1156 - AesGcmModelArtifactProtector.SanitizeFileName: reject Windows DOS reserved device names (CON/PRN/AUX/NUL/COM1-9/LPT1-9) and trim trailing dot/space characters. Previously portable-artifact guarantee failed on names like "CON.bin" or "model." — now prefixed with '_' and trimmed so artifacts created on POSIX hosts still mount on Windows. - DiTNoisePredictor.ForwardBlock + FinalLayerWithAdaLN: guard against misconfigured AdaLN modulation output sizes. If modulation.Length isn't divisible by 6 * _hiddenSize (or 2 * _hiddenSize for final layer), throw InvalidOperationException with a clear diagnostic rather than letting integer division truncate silently and Engine.Reshape throw a cryptic shape-mismatch error downstream. - RobustFileOpsMoveRetryTests: renamed Move_SucceedsAfter_TransientSharingViolation → ...TransientMissingParentDirectory and Move_Propagates_WhenLockNeverReleases → ...WhenParentDirectoryNeverCreated so the test names match the actual cross-platform retry trigger (missing destination parent directory, not lock/share violation which doesn't work on Linux). Fixed XML-doc reference from IOException → DirectoryNotFoundException. - PredictionStats.CalculatePredictionStats: reuse R2 + AdjustedR2 already computed eagerly in the constructor with identical inputs, instead of recalculating them in the lazy-compute path. Cuts two O(n) scans. 
- NeuralNetworkBase: new protected PromoteToBatchedTensor + EnsureBatchForCnnTraining helpers. Extracted from the duplicated 4-line rank-3 → rank-4 input expansion pattern that ResNet/VGG/MobileNetV2/ConvolutionalNeuralNetwork all carried individually. Subclasses' Train() now delegates to the base helper and removes their private AddBatchDimension copies. (Name differs from per-subclass AddBatchDimension to avoid CS0108 hides-inherited warnings on 10+ segmentation subclasses that keep their own local helpers for non-CNN-training paths.) Verify: - src build net10.0 — 0 errors - tests build net10.0 — 0 errors - Tensors 0.46.1 confirmed published on NuGet Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: franklinic <franklin@ivorycloud.com>
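The RBF fix scales each singular direction by σ / (σ² + λ²) instead of 1/σ. In weights = V · diag(·) · Uᵀ · y these multipliers are applied to each uᵢᵀy. The minimal C# sketch below only evaluates those per-singular-value multipliers for a few made-up singular values. It does not compute an SVD, and it is not AiDotNet's implementation. For comparison, the hard-cutoff rule is applied at the same threshold λ. That threshold is an assumption of this sketch: the GP pseudoinverse fallback uses a machine-precision tolerance instead.

```csharp
using System;
using System.Linq;

// Made-up singular values of a hypothetical RBF design matrix X, largest first.
double[] singularValues = { 10.0, 1.0, 1e-3, 2e-5, 5e-6, 1e-9 };

double sigmaMax = singularValues.Max();
double lam = 1e-6 * sigmaMax; // λ = 1e-6 · σ_max, the damping scale quoted in the RBF bullet

// Undamped multiplier 1/σ: grows without bound as σ → 0.
double Naive(double x) => 1.0 / x;

// Hard cutoff at λ (comparison only): keep 1/σ at or above λ, discard the direction below it.
double Truncated(double x) => x >= lam ? 1.0 / x : 0.0;

// Tikhonov filter σ / (σ² + λ²): about 1/σ when σ ≫ λ, about σ/λ² (→ 0) when σ ≪ λ.
double Tikhonov(double x) => x / (x * x + lam * lam);

foreach (double sv in singularValues)
{
    Console.WriteLine(
        $"sigma={sv:G3}  naive={Naive(sv):G3}  truncated={Truncated(sv):G3}  tikhonov={Tikhonov(sv):G3}");
}
```

Around σ = λ the hard cutoff jumps from roughly 1/λ straight to 0. The Tikhonov multiplier changes smoothly and never exceeds 1/(2λ), its maximum, which it reaches at σ = λ. That bound is why tiny singular values can no longer blow up the weights.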
1 parent 9b89378 commit 1cf430c

31 files changed

Lines changed: 1195 additions & 511 deletions

Directory.Packages.props

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
   <ItemGroup>
     <!-- AiDotNet ecosystem -->
     <PackageVersion Include="AiDotNet" Version="0.113.0" />
-    <PackageVersion Include="AiDotNet.Tensors" Version="0.46.0" />
+    <PackageVersion Include="AiDotNet.Tensors" Version="0.46.1" />
     <PackageVersion Include="AiDotNet.Native.OneDNN" Version="0.38.0" />
     <PackageVersion Include="AiDotNet.Native.OpenBLAS" Version="0.28.0" />
     <PackageVersion Include="AiDotNet.Native.CLBlast" Version="0.37.0" />

src/AiDotNet.Generators/TestScaffoldGenerator.cs

Lines changed: 29 additions & 4 deletions
@@ -1492,14 +1492,21 @@ private static void EmitGeneratedTestClass(
                 }
             }
         }
-        else if (model.Domains.Contains(3) && !model.Tasks.Contains(35))
+        else if (model.Domains.Contains(4) && !model.Tasks.Contains(35))
         {
             // Temporal video models (ActionRecognition=22, VideoGeneration=41, etc.)
             // need a 4D [frames, channels, height, width] input shape, but
             // NeuralNetworkArchitecture only expresses 3D (height/width/depth).
             // Rather than silently emit a mismatched 3D architecture alongside a
             // 4D InputShape, route these to a runtime placeholder until the
             // architecture type can represent a temporal dimension.
+            //
+            // The enum ordinal for ModelDomain.Video is 4
+            // (General=0, Vision=1, Language=2, Audio=3, Video=4, ...).
+            // This check previously used 3, which incorrectly flagged every
+            // *audio* model (PlayHT, Bark, etc.) as "temporal video" and
+            // emitted a NotImplementedException factory — ten PlayHTTests
+            // failures on PR #1156 traced to this off-by-one.
             constructorExpr = "throw new System.NotImplementedException(" +
                 $"\"'{GeneratorHelpers.StripGenericSuffix(model.ClassName)}' is a temporal video model; NeuralNetworkArchitecture<T> cannot express its 4D [frames, channels, height, width] input. Implement this factory manually.\")";
         }
@@ -1510,7 +1517,7 @@ private static void EmitGeneratedTestClass(
             // others default to OneDimensional. Temporal video is handled above.
             needsArchitectureUsing = true;
             bool isVision = model.Domains.Contains(1) || model.Domains.Contains(11); // Vision=1, ThreeD=11
-            bool isAudio = model.Domains.Contains(4); // Audio=4
+            bool isAudio = model.Domains.Contains(3); // Audio=3 (enum ordinal, not Video=4)
             bool isFrameInterp = model.Tasks.Contains(35); // FrameInterpolation → 3D input
 
             string inputTypeExpr;
@@ -1623,11 +1630,12 @@ private static void EmitGeneratedTestClass(
 
         // Override InputShape/OutputShape for domain-appropriate test data.
         // Vision/Video/3D models need [C, H, W]; default is [1, 4].
-        bool isVideoModel = model.Domains.Contains(3);
+        // Enum ordinals: General=0, Vision=1, Language=2, Audio=3, Video=4.
+        bool isVideoModel = model.Domains.Contains(4); // Video=4 (was incorrectly 3)
         bool isFrameInterpModel = model.Tasks.Contains(35); // FrameInterpolation
         bool isTemporalVideoModel = isVideoModel && !isFrameInterpModel;
         bool isVisionModel = model.Domains.Contains(1) || model.Domains.Contains(11);
-        bool isAudioModel = model.Domains.Contains(4);
+        bool isAudioModel = model.Domains.Contains(3); // Audio=3 (was incorrectly 4)
         if (isTemporalVideoModel)
         {
             // Temporal video: [frames, channels, height, width]
@@ -1660,6 +1668,23 @@ private static void EmitGeneratedTestClass(
             sb.AppendLine($" protected override int[] InputShape => new[] {{ {dim} }};");
             sb.AppendLine(" protected override int[] OutputShape => new[] { 4 };");
         }
+        else if (family == TestFamily.TransformerNER || family == TestFamily.SpanBasedNER)
+        {
+            // TransformerNERBase and SpanBasedNERBase both default to
+            // HiddenDimension=768 (BERT-base). Inputs are validated as
+            // [seqLen, 768], so the base-class default [1, 4] causes a
+            // hard "embedding dim mismatch" failure inside MultiHeadAttention
+            // before any downstream logic runs. Use a short sequence to
+            // keep the test fast while matching the model's expected
+            // embedding size. Models with non-default hidden dimensions
+            // (TinyBERT=312, etc.) need a manual test override.
+            sb.AppendLine(" protected override int[] InputShape => new[] { 8, 768 };");
+        }
+        else if (family == TestFamily.SequenceLabelingNER)
+        {
+            // LSTM-CRF family defaults to EmbeddingDimension=100.
+            sb.AppendLine(" protected override int[] InputShape => new[] { 8, 100 };");
+        }
 
         sb.AppendLine($" protected override {returnTypeCode} {factoryMethodName}()");
         sb.AppendLine(factoryBody);
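The off-by-one fixed in this file came from hard-coded enum ordinals: 3 (Audio) was tested where 4 (Video) was meant, and vice versa. The C# sketch below is a hypothetical illustration, not the generator's code. It assumes the domain names are available in declaration order and that the enum uses implicit, sequential values. Explicit enum values would break that assumption. A Roslyn generator could instead read each member's value from its field symbol via IFieldSymbol.ConstantValue. Resolving ordinals by name means a typo fails loudly instead of silently matching the wrong domain.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table: domain names in declaration order, copied from the
// generator comment "General=0, Vision=1, Language=2, Audio=3, Video=4".
// Only valid if the real enum uses implicit, sequential values.
string[] domainNames = { "General", "Vision", "Language", "Audio", "Video" };

// Resolve an ordinal by name; an unknown or misspelled name throws instead of
// silently matching some other domain.
int Ordinal(string name)
{
    int index = Array.IndexOf(domainNames, name);
    if (index < 0)
    {
        throw new ArgumentException($"Unknown domain '{name}'.", nameof(name));
    }
    return index;
}

// The domain set of an audio-only model (for example a text-to-speech model).
var audioModelDomains = new HashSet<int> { Ordinal("Audio") };

bool flaggedByOldLiteral = audioModelDomains.Contains(3);                 // pre-fix "video" check
bool flaggedByNamedLookup = audioModelDomains.Contains(Ordinal("Video")); // name-based check

Console.WriteLine($"literal 3 treats the audio model as video: {flaggedByOldLiteral}");
Console.WriteLine($"named lookup treats the audio model as video: {flaggedByNamedLookup}");
```

The literal check prints True and the named lookup prints False. This reproduces, in miniature, how audio models such as PlayHT were routed to the temporal-video placeholder before the fix.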

src/AiDotNet.Serving/Services/AesGcmModelArtifactProtector.cs

Lines changed: 59 additions & 3 deletions
@@ -72,11 +72,67 @@ public ProtectedModelArtifact ProtectToFile(string modelName, string sourcePath,
         }
     }
 
+    /// <summary>
+    /// Cross-platform-invalid filename characters. Combines the Windows
+    /// invalid set (most restrictive: ":" + "\\" + reserved punctuation +
+    /// control chars) with POSIX "/" and "\0". Used instead of
+    /// <see cref="Path.GetInvalidFileNameChars"/> because that method
+    /// returns a platform-specific set — on Linux it only contains '\0'
+    /// and '/', so a model name like "my:model" sanitizes to "my:model"
+    /// on Linux but "my_model" on Windows. Encrypted artifacts are
+    /// designed to be portable, so we apply the strict Windows superset
+    /// on every OS to guarantee the output is mountable everywhere.
+    /// </summary>
+    private static readonly HashSet<char> CrossPlatformInvalidFileNameChars =
+        new(new[]
+        {
+            '\0', '/', '\\', ':', '*', '?', '"', '<', '>', '|',
+        }
+        .Concat(Enumerable.Range(1, 31).Select(i => (char)i)));
+
+    /// <summary>
+    /// DOS reserved device names. Creating a file with any of these as the
+    /// base name (with or without extension) fails on Windows with
+    /// <c>PathTooLongException</c> / <c>IOException</c> because the kernel
+    /// still routes them to legacy character devices. Cross-platform
+    /// portability requires rejecting them even on POSIX hosts so an
+    /// artifact produced on Linux can't be loaded on Windows.
+    /// </summary>
+    private static readonly HashSet<string> WindowsReservedFileNames =
+        new(StringComparer.OrdinalIgnoreCase)
+        {
+            "CON", "PRN", "AUX", "NUL",
+            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
+            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9",
+        };
+
     private static string SanitizeFileName(string name)
     {
-        var invalid = Path.GetInvalidFileNameChars();
-        var chars = name.Select(c => invalid.Contains(c) ? '_' : c).ToArray();
-        return new string(chars);
+        // 1. Replace cross-platform-invalid characters.
+        var chars = name.Select(c => CrossPlatformInvalidFileNameChars.Contains(c) ? '_' : c).ToArray();
+        var sanitized = new string(chars);
+
+        // 2. Windows strips trailing dots and spaces from filenames at create-time
+        // (so "model." silently becomes "model", but "model." on some paths fails
+        // with PathNotFound). Trim on every platform to avoid the mismatch.
+        sanitized = sanitized.TrimEnd(' ', '.');
+
+        // 3. If the base (pre-extension) is a reserved DOS device name, prefix it
+        // so the artifact remains portable. Split on the first dot so "NUL.bin"
+        // also gets rewritten.
+        if (sanitized.Length == 0)
+        {
+            return "_";
+        }
+
+        var dotIndex = sanitized.IndexOf('.');
+        var baseName = dotIndex >= 0 ? sanitized.Substring(0, dotIndex) : sanitized;
+        if (WindowsReservedFileNames.Contains(baseName))
+        {
+            sanitized = "_" + sanitized;
+        }
+
+        return sanitized;
     }
 }
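To make the new sanitizer's behavior concrete, here is a standalone C# copy of the SanitizeFileName logic from this diff. The original is a private static method of AesGcmModelArtifactProtector, so this sketch reproduces it as a local function with local lookup tables. The sample names are illustrative and are not taken from the project's test suite.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same character set as CrossPlatformInvalidFileNameChars: Windows-invalid punctuation,
// '/', '\0', and control characters 1-31.
var invalidChars = new HashSet<char>(
    new[] { '\0', '/', '\\', ':', '*', '?', '"', '<', '>', '|' }
        .Concat(Enumerable.Range(1, 31).Select(i => (char)i)));

// Same names as WindowsReservedFileNames, matched case-insensitively.
var reservedNames = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "CON", "PRN", "AUX", "NUL",
    "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
    "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9",
};

string Sanitize(string name)
{
    // 1. Replace characters that are invalid on any supported OS with '_'.
    var sanitized = new string(name.Select(c => invalidChars.Contains(c) ? '_' : c).ToArray());

    // 2. Trim trailing spaces and dots, which Windows strips at create time.
    sanitized = sanitized.TrimEnd(' ', '.');
    if (sanitized.Length == 0)
    {
        return "_";
    }

    // 3. Prefix reserved device names, checking only the text before the first dot.
    int dot = sanitized.IndexOf('.');
    string baseName = dot >= 0 ? sanitized.Substring(0, dot) : sanitized;
    return reservedNames.Contains(baseName) ? "_" + sanitized : sanitized;
}

foreach (var input in new[] { "my:model", "CON.bin", "model.", "nul", "...", "console.bin" })
{
    Console.WriteLine($"\"{input}\" -> \"{Sanitize(input)}\"");
}
```

Because the reserved-name check runs after trimming, "CON " and "CON." both come out as "_CON". Names that merely start with a reserved word, such as "console.bin", pass through unchanged.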
