Commit 18a3d6a
* feat(#1370): shape oracle: TryDeclareShape skips LoRA warmup
Adds a layer-side shape declaration mechanism that lets AiModelBuilder skip its
LoRA-warmup forward pass when every layer can declare its parameter shapes from
constructor args alone. Matches PyTorch / HuggingFace PEFT's construction-time
shape model without giving up AiDotNet's lazy-shape flexibility (lazy convs
and inferred-shape layers still trigger the warmup fallback).
Phase 1 — foundation:
- Add `public virtual bool TryDeclareShape()` to LayerBase<T>. Default impl
returns `IsShapeResolved`. Layers whose ctor carries enough info to allocate
weights override this to materialise their state and return true.
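A minimal Python sketch of the Phase 1 contract, used here as a language-neutral stand-in for the C# code. `LayerBase`, `is_shape_resolved`, `try_declare_shape`, and the two example layers are hypothetical names, and a `-1` dimension is assumed to mark a shape that is only known after a forward pass.

```python
class LayerBase:
    """Hypothetical stand-in for LayerBase<T>; not AiDotNet's actual API."""

    def __init__(self, input_shape):
        # -1 marks a dimension that is only known after a forward pass.
        self.input_shape = list(input_shape)

    @property
    def is_shape_resolved(self):
        return all(d > 0 for d in self.input_shape)

    def try_declare_shape(self):
        # Default: a layer can declare its shape only if it is already resolved.
        return self.is_shape_resolved


class LazyDense(LayerBase):
    """Hypothetical lazy layer: input width arrives with the first forward pass."""

    def __init__(self, output_size):
        super().__init__([-1])
        self.output_size = output_size


class EagerNorm(LayerBase):
    """Hypothetical eager layer: feature size is known at construction."""

    def __init__(self, feature_size):
        super().__init__([feature_size])
        self.gamma = [1.0] * feature_size
```

Under this default, a layer opts in to skipping warmup simply by being fully resolved after construction; only layers whose allocation can happen before their shape is fully resolved need an override.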
Phase 2 — high-value overrides:
- LayerNormalizationLayer: new eager `(int featureSize, double epsilon)` ctor
that allocates gamma/beta immediately. Uses the default TryDeclareShape impl
(which returns IsShapeResolved=true on the eager path). The existing
parameter-less lazy ctor is unchanged.
- MultiHeadAttentionLayer: override TryDeclareShape to call EnsureWeightsAllocated
(now internal). The override allocates the Q/K/V/O matrices from the ctor-known
embeddingDim and returns true even though InputShape still contains a -1
sequence-length placeholder; LoRA wraps only the weight matrices, so the
unresolved sequence dimension does not matter. The resulting asymmetry with
IsShapeResolved is documented on the override.
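The Phase 2 split can be sketched in Python (hypothetical names throughout, not AiDotNet's API): the eager LayerNorm path relies on the default check, while the MHA-style layer overrides it, allocating Q/K/V/O from `embedding_dim` and returning True while the sequence dimension is still `-1`.

```python
class LayerBase:
    def __init__(self, input_shape):
        self.input_shape = list(input_shape)  # -1 = unknown until forward

    @property
    def is_shape_resolved(self):
        return all(d > 0 for d in self.input_shape)

    def try_declare_shape(self):
        return self.is_shape_resolved


class LayerNorm(LayerBase):
    def __init__(self, feature_size=None, epsilon=1e-5):
        self.epsilon = epsilon
        if feature_size is None:
            # Lazy path: feature size comes from the first forward pass.
            super().__init__([-1])
            self.gamma = self.beta = None
        else:
            # Eager path: gamma/beta exist immediately and the shape is resolved.
            super().__init__([feature_size])
            self.gamma = [1.0] * feature_size
            self.beta = [0.0] * feature_size


class MultiHeadAttention(LayerBase):
    def __init__(self, embedding_dim, num_heads):
        # Sequence length (-1) stays unknown until a forward pass.
        super().__init__([-1, embedding_dim])
        self.embedding_dim = embedding_dim
        self.num_heads = num_heads
        self.weights = None

    def _ensure_weights_allocated(self):
        if self.weights is None:  # idempotent: allocate only once
            d = self.embedding_dim
            self.weights = {name: [[0.0] * d for _ in range(d)] for name in "QKVO"}

    def try_declare_shape(self):
        # Q/K/V/O depend only on embedding_dim, so allocate them now and report
        # success even though is_shape_resolved is still False.
        self._ensure_weights_allocated()
        return True
```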
Phase 3 — rewire AiModelBuilder.BuildSupervisedInternalAsync:
- Before the warmup forward, loop over the layers and call TryDeclareShape on
each, counting how many declared versus how many still need warmup. When ALL
layers declare successfully, skip the warmup entirely (Trace.TraceInformation
surfaces the skip for observability). When any layer still needs warmup, fall
back to the existing warmup forward.
- Change the LoRA wrap gate from IsShapeResolved to TryDeclareShape so MHA
(whose seq stays -1 but weights are allocated) gets wrapped.
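A rough Python sketch of the Phase 3 builder flow, assuming a single warmup forward resolves every lazy layer. `StubLayer`, `warmup`, and `prepare_for_lora` are hypothetical names, and `logging.info` stands in for `Trace.TraceInformation`.

```python
import logging


class StubLayer:
    """Hypothetical layer: eager layers know their shapes at construction."""

    def __init__(self, name, eager):
        self.name = name
        self.resolved = eager

    def try_declare_shape(self):
        return self.resolved

    def warmup(self):
        self.resolved = True  # a forward pass resolves lazy shapes


def prepare_for_lora(layers):
    declared = sum(1 for layer in layers if layer.try_declare_shape())
    skipped = declared == len(layers)
    if skipped:
        logging.info("All %d layers declared shapes; skipping warmup forward", declared)
    else:
        for layer in layers:  # stand-in for one warmup forward over the model
            layer.warmup()
    # The wrap gate uses try_declare_shape (not is_shape_resolved), so an
    # MHA-style layer with an unresolved sequence dimension is still wrapped.
    wrapped = [layer.name for layer in layers if layer.try_declare_shape()]
    return skipped, wrapped
```

Because the wrap gate reuses the same probe, it has to be idempotent: it runs once in the pre-scan and again in the wrap loop.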
Tests:
- New ShapeOracleIssue1370Tests covers Phase 1 default impl, Phase 2 LayerNorm
eager ctor + MHA override (including idempotency + the documented asymmetry).
The test class joins LayerSerializationCollection so that MHA weight
initialisation does not shift RNG seeds for the auto-generated TapeGradient
tests running in parallel.
- One pre-existing failing test (LayerNorm_ParameterCount_IsTwiceFeatureSize)
migrated to the new eager (featureSize, ...) ctor; now passes.
- Bucket10 LoRA test still passes — verifies the rewire didn't change the
wrap-loop outcome on the canonical LoRA-bucket model.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* feat(#1370): phase 4 TryDeclareShape sweep across remaining lazy layers
Extends the shape oracle to four more layer types that were lazy on master
but carry enough info in their constructors to declare shape eagerly:
Eager-init constructors (default TryDeclareShape via IsShapeResolved):
- BatchNormalizationLayer: new (int numFeatures, double epsilon, double momentum)
ctor allocates gamma/beta/runningMean/runningVariance immediately. Existing
parameter-less lazy ctor unchanged.
- RMSNormalizationLayer: new (int featureSize, double epsilon) ctor allocates
gamma immediately. Existing parameter-less lazy ctor unchanged.
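A Python sketch of the two BatchNormalizationLayer construction paths (hypothetical class and field names; the default `epsilon`/`momentum` values are illustrative only, not AiDotNet's). The eager path needs no override because the default check already sees a resolved shape.

```python
class BatchNorm:
    """Hypothetical sketch mirroring an eager and a lazy constructor."""

    def __init__(self, num_features=None, epsilon=1e-5, momentum=0.9):
        self.epsilon = epsilon
        self.momentum = momentum
        if num_features is None:
            # Lazy path: all state is allocated on the first forward pass.
            self.num_features = -1
            self.gamma = self.beta = None
            self.running_mean = self.running_variance = None
        else:
            # Eager path: all four state vectors exist before any forward pass.
            self.num_features = num_features
            self.gamma = [1.0] * num_features
            self.beta = [0.0] * num_features
            self.running_mean = [0.0] * num_features
            self.running_variance = [1.0] * num_features

    @property
    def is_shape_resolved(self):
        return self.num_features > 0

    def try_declare_shape(self):
        # Default behaviour: no override is needed for the eager ctor.
        return self.is_shape_resolved
```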
TryDeclareShape overrides (allocate state from ctor-known dims, return true):
- PReLULayer: alpha is already allocated and registered in the existing ctor;
only the broadcast shape is deferred until the first forward pass. The override
returns true unconditionally so LoRA can wrap the layer without a warmup forward.
- TransformerEncoderLayer: the eager-dimension ctor (numHeads, feedForwardDim,
embeddingSize) constructs sublayers immediately. The override returns true when
isInitialized or IsShapeResolved is set; the lazy ctor (embeddingSize == -1)
returns false and still falls through to the warmup forward.
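A hedged Python sketch of the two Phase 4 overrides as described above. Class and attribute names are hypothetical, the lazy TransformerEncoder ctor is modelled as `embedding_size=-1`, and the sublayer list is a placeholder for real attention and feed-forward blocks.

```python
class PReLU:
    def __init__(self, alpha_init=0.25):
        # alpha is allocated at construction; only the broadcast shape waits
        # for the first forward pass.
        self.alpha = [alpha_init]
        self.broadcast_shape = None

    def try_declare_shape(self):
        return True  # nothing LoRA needs depends on the deferred shape


class TransformerEncoder:
    def __init__(self, num_heads, feed_forward_dim, embedding_size=-1):
        self.num_heads = num_heads
        self.feed_forward_dim = feed_forward_dim
        self.embedding_size = embedding_size
        self.is_initialized = False
        self.is_shape_resolved = False
        self.sublayers = None
        if embedding_size > 0:
            self._build_sublayers()  # eager ctor builds sublayers immediately

    def _build_sublayers(self):
        self.sublayers = ["attention", "feed_forward", "norm1", "norm2"]
        self.is_initialized = True

    def try_declare_shape(self):
        # Phase 4 gate, as described in this commit.
        return self.is_initialized or self.is_shape_resolved
```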
Tests:
- Extended ShapeOracleIssue1370Tests with 17 more tests covering all four
layers. 30/30 unit tests pass.
- Migrated three pre-existing failing tests to use the new eager ctors:
  - AdvancedLayersIntegrationTests.BatchNormalizationLayer_ParameterCount_IsPositive
  - AdvancedLayersIntegrationTests.TransformerEncoderLayer_ParameterCount_ReturnsPositiveValue
  - NormalizationLayersIntegrationTests.BatchNormalizationLayer_ParameterCount_IncludesGammaAndBeta
Coverage summary across the LayerBase<T> subclass sweep:
- Already-eager layers (default impl correct): GroupNormalizationLayer,
SelfAttentionLayer (default Eager init strategy)
- Now-eager layers (this PR): LayerNormalizationLayer, BatchNormalizationLayer,
RMSNormalizationLayer, MultiHeadAttentionLayer, PReLULayer,
TransformerEncoderLayer
- Cannot declare from ctor (input-dependent; the default impl is correct):
Dense/FullyConnected/FeedForward (LazyLinear analogs, need input width),
Conv variants (LazyConv2d analogs, need input channels), LSTM/GRU/RNN,
AttentionLayer (older API), TransformerDecoderLayer (lazy ctor only)
- Stateless layers (no LoRA-relevant weights; the default false is correct
because the LoRA wrap-type check skips them anyway): Activation, Pooling,
Reshape, Flatten, Transpose, Padding, Cropping, Slicing, Split, Upsample,
PixelShuffle, GaussianNoise, Masking
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(PR #1388 review): try/catch TryDeclareShape + LoRA-target pre-scan + private EnsureWeightsAllocated
* fix(PR #1388 follow-up): also gate the wrap-loop TryDeclareShape probe on IsLoRATarget
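One plausible reading of these two fixes, sketched in Python: each probe is wrapped in try/except so a throwing layer is treated as needing warmup, and both the pre-scan and the wrap loop probe only LoRA targets. `Layer`, `is_lora_target`, `safe_declare`, `needs_warmup`, and `wrap_candidates` are hypothetical names; the commit subjects do not show the real code.

```python
import logging


class Layer:
    """Hypothetical layer stub for the builder sketch."""

    def __init__(self, name, lora_target, declares=True, raises=False):
        self.name = name
        self.is_lora_target = lora_target
        self._declares = declares
        self._raises = raises

    def try_declare_shape(self):
        if self._raises:
            raise RuntimeError(f"{self.name}: cannot declare shape")
        return self._declares


def safe_declare(layer):
    # A throwing probe counts as "needs warmup" instead of failing the build.
    try:
        return layer.try_declare_shape()
    except Exception as exc:
        logging.info("TryDeclareShape failed for %s: %s", layer.name, exc)
        return False


def needs_warmup(layers):
    # Pre-scan only LoRA targets; probe every one (no short-circuit) because
    # the probe may allocate weights as a side effect.
    results = [safe_declare(layer) for layer in layers if layer.is_lora_target]
    return not all(results)


def wrap_candidates(layers):
    # The wrap loop applies the same IsLoRATarget gate before probing.
    return [layer.name for layer in layers
            if layer.is_lora_target and safe_declare(layer)]
```

With this gating, a stateless non-target layer whose default probe returns false no longer forces a warmup.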
* fix(PR #1388 review): narrow TryDeclareShape visibility + TransformerEncoder allocation gate
Addresses two CodeRabbit comments on PR #1388:
1. TryDeclareShape() narrowed from public to internal across LayerBase and
its 3 overrides (MHA, PReLU, TransformerEncoder). Same rationale as
the BackwardAndStepOnPrecomputedLoss fix in PR #1389: this hook
is shape-oracle orchestration plumbing for AiModelBuilder and
in-assembly callers (#1370); users only see the builder surface.
InternalsVisibleTo already grants the test assembly access for the
existing ShapeOracleIssue1370Tests.
2. TransformerEncoderLayer.TryDeclareShape no longer returns true
for IsShapeResolved without checking sublayer allocation. The
prior `_isInitialized || IsShapeResolved` gate could return true after a
non-allocating upstream shape declaration, skipping warmup while the
weight matrices were still missing; LoRA wrapping then operated
on empty sublayer state. New behavior:
- if `_isInitialized` -> true (already allocated)
- else if `_embeddingSize > 0` -> call EnsureInitialized(),
then return based on the result
- else -> false (genuinely lazy; fall through to warmup)
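The revised three-way gate can be sketched in Python as follows (hypothetical names; the constructor here deliberately defers allocation so the `ensure_initialized` branch is exercised). The key property: a lazy instance whose shape was resolved upstream without allocation now returns False and falls through to warmup.

```python
class TransformerEncoder:
    """Hypothetical sketch of the revised TransformerEncoderLayer gate."""

    def __init__(self, num_heads, feed_forward_dim, embedding_size=-1):
        self.num_heads = num_heads
        self.feed_forward_dim = feed_forward_dim
        self.embedding_size = embedding_size
        self.is_initialized = False
        self.is_shape_resolved = False  # no longer consulted by the gate
        self.sublayers = None

    def ensure_initialized(self):
        # Builds sublayers only when the embedding size is known.
        if not self.is_initialized and self.embedding_size > 0:
            self.sublayers = ["attention", "feed_forward", "norm1", "norm2"]
            self.is_initialized = True
        return self.is_initialized

    def try_declare_shape(self):
        if self.is_initialized:
            return True                       # already allocated
        if self.embedding_size > 0:
            return self.ensure_initialized()  # allocate now, report the result
        return False                          # genuinely lazy: needs warmup
```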
Verified locally: 30/30 ShapeOracleIssue1370Tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: franklinic <franklin@ivorycloud.com>
Parent: 36fb35f. 13 files changed: 782 additions, 105 deletions.
Directories touched:
- src
  - LoRA
  - NeuralNetworks/Layers
- tests/AiDotNet.Tests
  - IntegrationTests/NeuralNetworks
  - UnitTests/NeuralNetworks/Layers