Commit 18a3d6a

ooples, claude, and franklinic authored
feat(#1370): shape oracle TryDeclareShape() — eliminate LoRA warmup forward when ctor carries enough info (#1388)
* feat(#1370): shape oracle trydeclareshape skips lora warmup

Adds a layer-side shape declaration mechanism that lets AiModelBuilder skip its LoRA-warmup forward pass when every layer can declare its parameter shapes from constructor args alone. Matches PyTorch / HuggingFace PEFT's construction-time shape model without giving up AiDotNet's lazy-shape flexibility (lazy convs and inferred-shape layers still trigger the warmup fallback).

Phase 1 (foundation):
- Add `public virtual bool TryDeclareShape()` to LayerBase<T>. Default impl returns `IsShapeResolved`. Layers whose ctor carries enough info to allocate weights override this to materialise their state and return true.

Phase 2 (high-value overrides):
- LayerNormalizationLayer: new eager `(int featureSize, double epsilon)` ctor that allocates gamma/beta immediately. Uses the default TryDeclareShape impl (which returns IsShapeResolved=true on the eager path). The existing parameter-less lazy ctor is unchanged.
- MultiHeadAttentionLayer: override TryDeclareShape to call EnsureWeightsAllocated (now internal). Allocates Q/K/V/O matrices from ctor-known embeddingDim and returns true even though InputShape still has a -1 seq placeholder. LoRA wraps weight matrices, so the seq placeholder doesn't matter. Documented asymmetry with IsShapeResolved on the override.

Phase 3 (rewire AiModelBuilder.BuildSupervisedInternalAsync):
- Before the warmup forward, loop over layers and call TryDeclareShape on each. Count declared-vs-still-needs-warmup. When ALL declare successfully, skip the warmup entirely (Trace.TraceInformation surfaces the skip for observability). When any layer needs warmup, fall back to the existing warmup forward.
- Change the LoRA wrap gate from IsShapeResolved to TryDeclareShape so MHA (whose seq stays -1 but weights are allocated) gets wrapped.

Tests:
- New ShapeOracleIssue1370Tests covers Phase 1 default impl, Phase 2 LayerNorm eager ctor + MHA override (including idempotency + the documented asymmetry). Test class joined LayerSerializationCollection so MHA weight init does not shift seeds for the parallel auto-generated TapeGradient tests.
- One pre-existing failing test (LayerNorm_ParameterCount_IsTwiceFeatureSize) migrated to the new eager (featureSize, ...) ctor; now passes.
- Bucket10 LoRA test still passes: verifies the rewire didn't change the wrap-loop outcome on the canonical LoRA-bucket model.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(#1370): phase 4 trydeclareshape sweep across remaining lazy layers

Extends the shape oracle to four more layer types that were lazy on master but carry enough info in their constructors to declare shape eagerly.

Eager-init constructors (default TryDeclareShape via IsShapeResolved):
- BatchNormalizationLayer: new (int numFeatures, double epsilon, double momentum) ctor allocates gamma/beta/runningMean/runningVariance immediately. Existing parameter-less lazy ctor unchanged.
- RMSNormalizationLayer: new (int featureSize, double epsilon) ctor allocates gamma immediately. Existing parameter-less lazy ctor unchanged.

TryDeclareShape overrides (allocate state from ctor-known dims, return true):
- PReLULayer: alpha is already allocated + registered in the existing ctor; only the broadcast shape is forward-runtime-deferred. Override returns true unconditionally so LoRA can wrap the layer without a warmup forward.
- TransformerEncoderLayer: the eager-dimension ctor (numHeads, feedForwardDim, embeddingSize) constructs sublayers immediately. Override returns true when isInitialized or IsShapeResolved; lazy ctor (embeddingSize == -1) still falls through to the warmup forward via false.

Tests:
- Extended ShapeOracleIssue1370Tests with 17 more tests covering all four layers. 30/30 unit tests pass.
- Migrated three pre-existing failing tests to use the new eager ctors:
  - AdvancedLayersIntegrationTests.BatchNormalizationLayer_ParameterCount_IsPositive
  - AdvancedLayersIntegrationTests.TransformerEncoderLayer_ParameterCount_ReturnsPositiveValue
  - NormalizationLayersIntegrationTests.BatchNormalizationLayer_ParameterCount_IncludesGammaAndBeta

Coverage summary across the LayerBase<T> subclass sweep:
- Already-eager layers (default impl correct): GroupNormalizationLayer, SelfAttentionLayer (default Eager init strategy)
- Now-eager layers (this PR): LayerNormalizationLayer, BatchNormalizationLayer, RMSNormalizationLayer, MultiHeadAttentionLayer, PReLULayer, TransformerEncoderLayer
- Cannot declare from ctor (input dependent, default correct): Dense/FullyConnected/FeedForward (LazyLinear analogs, need input width), Conv variants (LazyConv2d analogs, need input channels), LSTM/GRU/RNN, AttentionLayer (older API), TransformerDecoderLayer (lazy ctor only)
- Stateless layers (no LoRA-relevant weights, default false correct since LoRA wrap-type check skips them anyway): Activation, Pooling, Reshape, Flatten, Transpose, Padding, Cropping, Slicing, Split, Upsample, PixelShuffle, GaussianNoise, Masking

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(PR #1388 review): try/catch TryDeclareShape + LoRA-target pre-scan + private EnsureWeightsAllocated

* fix(PR #1388 follow-up): also gate the wrap-loop TryDeclareShape probe on IsLoRATarget

* fix(pr1388 review): narrow trydeclareshape visibility + transformerencoder allocation gate

addresses two coderabbit comments on pr #1388:

1. trydeclareshape() narrowed public -> internal across layerbase + 3 overrides (mha, prelu, transformerencoder). same rationale as the backwardandsteponprecomputedloss fix in pr #1389: this hook is shape-oracle orchestration plumbing for aimodelbuilder and in-assembly callers (#1370), users only see the builder surface. internalsvisibleto already covers aidotnettests for the existing shapeoracleissue1370tests.

2. transformerencoderlayer.trydeclareshape no longer returns true for isshaperesolved without checking sublayer allocation. the prior `_isinitialized || isshaperesolved` could flip true via a non-allocating upstream shape declaration, skipping warmup while weight matrices were still missing; lora wrapping then operated on empty sublayer state. new behavior:
   - if `_isinitialized` -> true (allocated)
   - else if `_embeddingsize > 0` -> call ensureinitialized() then return based on result
   - else -> false (genuinely lazy, fall through to warmup)

verified locally: 30/30 shapeoracleissue1370tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: franklinic <franklin@ivorycloud.com>
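The contract described in the commit message can be illustrated with a minimal sketch. Only the names `TryDeclareShape` and `IsShapeResolved` come from the commit; `SketchLayer`, `EagerNormLayer`, `LazyDenseLayer`, and `GatedEncoderLayer` are hypothetical stand-ins, not AiDotNet types, and the allocation logic is an assumption made for illustration.

```csharp
using System;

// Hypothetical stand-in for LayerBase<T>; not the AiDotNet class.
public abstract class SketchLayer
{
    public bool IsShapeResolved { get; protected set; }

    // Default: a layer can declare its shape only if it is already resolved.
    public virtual bool TryDeclareShape() => IsShapeResolved;
}

// Eager ctor: the feature size is known up front, so state is allocated
// immediately and the default TryDeclareShape already returns true.
public sealed class EagerNormLayer : SketchLayer
{
    public double[] Gamma { get; }

    public EagerNormLayer(int featureSize)
    {
        if (featureSize <= 0) throw new ArgumentOutOfRangeException(nameof(featureSize));
        Gamma = new double[featureSize];
        Array.Fill(Gamma, 1.0);
        IsShapeResolved = true;
    }
}

// Input-dependent layer: nothing can be declared before a forward pass,
// so the inherited default (false) is the right answer.
public sealed class LazyDenseLayer : SketchLayer { }

// Three-way gate modelled on the TransformerEncoderLayer fix: return true
// only when weights are really allocated, never on IsShapeResolved alone.
public sealed class GatedEncoderLayer : SketchLayer
{
    private readonly int _embeddingSize;
    private bool _isInitialized;

    public double[] Weights { get; private set; } = Array.Empty<double>();

    public GatedEncoderLayer(int embeddingSize = -1) => _embeddingSize = embeddingSize;

    public override bool TryDeclareShape()
    {
        if (_isInitialized) return true;        // already allocated
        if (_embeddingSize <= 0) return false;  // genuinely lazy: needs warmup
        Weights = new double[_embeddingSize * _embeddingSize];
        _isInitialized = true;
        return true;
    }
}
```

Note that `GatedEncoderLayer` never sets `IsShapeResolved`, which mirrors the asymmetry the commit documents for MultiHeadAttentionLayer: weights can be allocated while the input shape still carries a placeholder.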
1 parent 36fb35f commit 18a3d6a

13 files changed

Lines changed: 782 additions & 105 deletions

src/AiModelBuilder.cs

Lines changed: 156 additions & 56 deletions
@@ -2813,73 +2813,159 @@ void OnAutoMLCandidate(IFullModel<T, TInput, TOutput> candidate)
             {
                 System.Diagnostics.Trace.TraceInformation("Applying LoRA adapters to neural network layers...");
 
-                // Warmup forward to materialise lazy-init layers BEFORE LoRA
-                // wrapping. LoRAAdapterBase.CreateLoRALayer needs the
-                // layer's input/output dimensions at adapter-construction
-                // time; lazy layers (LayerNorm gamma/beta, MultiHeadAttention
-                // lazy weight banks) report (0, …) until first Forward
-                // materialises the shape. Without the warmup, LoRALayer's
-                // ctor would throw ArgumentOutOfRangeException("Output size
-                // must be positive"). Best-effort: if the warmup throws
-                // (e.g. the user wired a forward path that requires training
-                // mode), the ApplyLoRA-side IsShapeResolved guard silently
-                // skips still-unresolved layers so the wrap loop succeeds on
-                // the materialised ones. Discovered by AiDotNet#1345 Bucket10
-                // ConfigureLoRA test.
-                try
+                // AiDotNet#1370 shape oracle: pre-loop asks every layer to declare its
+                // shape from constructor args alone (TryDeclareShape). Layers like
+                // MultiHeadAttentionLayer (knows embeddingDim from ctor) and any
+                // layer constructed with explicit shape (e.g. LayerNormalizationLayer
+                // with the featureSize ctor) return true without needing input.
+                // Lazy convs / inferred-shape layers still return false and trigger
+                // the existing warmup forward as a fallback.
+                // PR #1388 review C7iL5: TryDeclareShape() is a public virtual
+                // extension point — a custom layer override can throw arbitrary
+                // exceptions. Treat non-fatal failures as "shape not declared"
+                // (falls back to the warmup forward below), but let cancellation
+                // and OOM propagate so the host can still abort. Trace the
+                // failure with the layer type + full exception so the operator
+                // can diagnose silently-skipped declarations.
+                static bool TryDeclareShapeSafely(NeuralNetworks.Layers.LayerBase<T> layer)
                 {
-                    bool prevTrainingMode = neuralNetForLoRA.IsTrainingMode;
-                    neuralNetForLoRA.SetTrainingMode(false);
                     try
                     {
-                        // One sample is enough to resolve lazy-layer shapes;
-                        // a full-dataset forward would do O(N) work and
-                        // allocate a full pass of activation tensors just to
-                        // shape-resolve. Carve off a 1-row probe.
-                        var warmupProbe = TrySliceFirstSampleForLoRAWarmup(x);
-                        var warmupResult = _model.Predict(warmupProbe);
-                        System.GC.KeepAlive(warmupResult);
+                        return layer.TryDeclareShape();
                     }
-                    finally
+                    catch (Exception ex) when (
+                        ex is not OperationCanceledException
+                        && ex is not OutOfMemoryException
+                        && ex is not StackOverflowException)
                     {
-                        neuralNetForLoRA.SetTrainingMode(prevTrainingMode);
+                        System.Diagnostics.Trace.TraceWarning(
+                            $"TryDeclareShape failed for {layer.GetType().FullName} — " +
+                            $"treating as 'needs warmup': {ex}");
+                        return false;
                     }
                 }
-                catch (OperationCanceledException)
+
+                // PR #1388 review C8mvN: only let LoRA-targeted layers drive the
+                // warmup-skip decision. A non-target lazy layer (e.g. a lazy
+                // ActivationLayer or DropoutLayer) won't be wrapped by ApplyLoRA
+                // anyway — counting it as "needs warmup" forces the warmup
+                // forward needlessly on mixed networks. Use the configuration's
+                // own non-mutating eligibility predicate when available; for a
+                // custom ILoRAConfiguration implementation that doesn't expose
+                // one, fall back to "every LayerBase counts" (conservative —
+                // may force an unnecessary warmup, but never skips one
+                // incorrectly).
+                var loraTargetProbe = _loraConfiguration as LoRA.DefaultLoRAConfiguration<T>;
+
+                int declaredCount = 0;
+                int needsWarmupCount = 0;
+                for (int i = 0; i < neuralNetForLoRA.Layers.Count; i++)
                 {
-                    // Cancellation propagates — caller wants out, not a swallowed warmup.
-                    throw;
+                    var layer = neuralNetForLoRA.Layers[i];
+                    if (layer is not NeuralNetworks.Layers.LayerBase<T> declarable)
+                    {
+                        // Non-LayerBase<T> layers (rare, e.g. wrapper adapters from a
+                        // prior pass) bypass the oracle entirely — the ApplyLoRA call
+                        // handles its own shape probing.
+                        continue;
+                    }
+                    if (loraTargetProbe is not null && !loraTargetProbe.IsLoRATarget(declarable))
+                    {
+                        // Not a LoRA target — its shape doesn't gate the warmup-skip
+                        // decision. Skip without bumping either counter.
+                        continue;
+                    }
+                    if (TryDeclareShapeSafely(declarable))
+                        declaredCount++;
+                    else
+                        needsWarmupCount++;
                 }
-                catch (OutOfMemoryException)
+
+                // If every shape-aware layer declared successfully, skip the warmup
+                // forward entirely — this is the win that beats PyTorch / HuggingFace
+                // PEFT's construction-time shape requirement: we get the zero-warmup
+                // behavior when shapes are known, AND still support lazy layers via
+                // the warmup fallback below when needed.
+                bool skipWarmup = needsWarmupCount == 0;
+                if (skipWarmup)
                 {
-                    // Critical: don't mask. The host may need to abort.
-                    // StackOverflowException is intentionally NOT listed —
-                    // modern .NET terminates the process on SOE rather than
-                    // letting it propagate, so a catch clause for it is
-                    // unreachable (review #1368 C7mpq).
-                    throw;
+                    System.Diagnostics.Trace.TraceInformation(
+                        $"LoRA warmup forward SKIPPED — all {declaredCount} shape-aware layer(s) " +
+                        "declared shape from constructor args (AiDotNet#1370 shape oracle).");
                 }
-                catch (Exception ex)
+                else
                 {
-                    // Best-effort warmup: documented forward-mode requirements
-                    // (e.g. layers that need IsTrainingMode=true) can throw here.
-                    // The ApplyLoRA-side IsShapeResolved guard silently skips
-                    // still-unresolved layers so the wrap loop succeeds on
-                    // materialized ones (review #1368 C6WOG: narrowed to let
-                    // OperationCanceledException + OutOfMemoryException +
-                    // StackOverflowException propagate; everything else is
-                    // genuine warmup variance and stays as a Trace warning).
-                    // Include ex.ToString() so the trace carries the full
-                    // stack trace + inner exceptions, not just the top-frame
-                    // message. Trace.TraceWarning is the only signal an
-                    // operator has when the warmup fails silently (this PR's
-                    // review C88M6: ex.Message dropped the origin frame and
-                    // any chained inner exception, leaving a downstream
-                    // skipped-lazy-layer mystery if the warmup actually
-                    // failed inside an unrelated subsystem).
-                    System.Diagnostics.Trace.TraceWarning(
-                        $"LoRA warmup forward failed (proceeding — layers that materialised get wrapped; " +
-                        $"lazy ones skipped via IsShapeResolved guard): {ex}");
+                    System.Diagnostics.Trace.TraceInformation(
+                        $"LoRA warmup forward required — {needsWarmupCount} layer(s) still need a forward " +
+                        $"pass to resolve shape ({declaredCount} declared from ctor).");
+
+                    // Warmup forward to materialise lazy-init layers that didn't
+                    // self-declare. LoRAAdapterBase.CreateLoRALayer needs the
+                    // layer's input/output dimensions at adapter-construction
+                    // time; lazy layers that fall through TryDeclareShape report
+                    // (0, …) until first Forward materialises the shape.
+                    // Without the warmup, LoRALayer's ctor would throw
+                    // ArgumentOutOfRangeException("Output size must be positive").
+                    // Best-effort: if the warmup throws (e.g. the user wired a
+                    // forward path that requires training mode), the ApplyLoRA-side
+                    // IsShapeResolved guard silently skips still-unresolved layers
+                    // so the wrap loop succeeds on the materialised ones.
+                    // Discovered by AiDotNet#1345 Bucket10 ConfigureLoRA test.
+                    try
+                    {
+                        bool prevTrainingMode = neuralNetForLoRA.IsTrainingMode;
+                        neuralNetForLoRA.SetTrainingMode(false);
+                        try
+                        {
+                            // One sample is enough to resolve lazy-layer shapes;
+                            // a full-dataset forward would do O(N) work and
+                            // allocate a full pass of activation tensors just to
+                            // shape-resolve. Carve off a 1-row probe.
+                            var warmupProbe = TrySliceFirstSampleForLoRAWarmup(x);
+                            var warmupResult = _model.Predict(warmupProbe);
+                            System.GC.KeepAlive(warmupResult);
+                        }
+                        finally
+                        {
+                            neuralNetForLoRA.SetTrainingMode(prevTrainingMode);
+                        }
+                    }
+                    catch (OperationCanceledException)
+                    {
+                        // Cancellation propagates — caller wants out, not a swallowed warmup.
+                        throw;
+                    }
+                    catch (OutOfMemoryException)
+                    {
+                        // Critical: don't mask. The host may need to abort.
+                        // StackOverflowException is intentionally NOT listed —
+                        // modern .NET terminates the process on SOE rather than
+                        // letting it propagate, so a catch clause for it is
+                        // unreachable (review #1368 C7mpq).
+                        throw;
+                    }
+                    catch (Exception ex)
+                    {
+                        // Best-effort warmup: documented forward-mode requirements
+                        // (e.g. layers that need IsTrainingMode=true) can throw here.
+                        // The ApplyLoRA-side IsShapeResolved guard silently skips
+                        // still-unresolved layers so the wrap loop succeeds on
+                        // materialized ones (review #1368 C6WOG: narrowed to let
+                        // OperationCanceledException + OutOfMemoryException +
+                        // StackOverflowException propagate; everything else is
+                        // genuine warmup variance and stays as a Trace warning).
+                        // Include ex.ToString() so the trace carries the full
+                        // stack trace + inner exceptions, not just the top-frame
+                        // message. Trace.TraceWarning is the only signal an
+                        // operator has when the warmup fails silently (this PR's
+                        // review C88M6: ex.Message dropped the origin frame and
+                        // any chained inner exception, leaving a downstream
+                        // skipped-lazy-layer mystery if the warmup actually
+                        // failed inside an unrelated subsystem).
+                        System.Diagnostics.Trace.TraceWarning(
+                            $"LoRA warmup forward failed (proceeding — layers that materialised get wrapped; " +
+                            $"lazy ones skipped via IsShapeResolved guard): {ex}");
+                    }
                 }
 
                 int adaptedCount = 0;
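The pre-scan in this hunk boils down to two steps: probe each LoRA-target layer while swallowing only non-fatal exceptions, then skip the warmup when nothing still needs it. The following is a minimal sketch of that decision, assuming layers are represented by plain delegates; `WarmupPreScan`, `ProbeSafely`, and `Scan` are hypothetical names, not AiDotNet APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; it mirrors the shape of the pre-scan, not its real types.
public static class WarmupPreScan
{
    // Runs a declaration probe. Non-fatal exceptions count as "not declared";
    // cancellation and out-of-memory propagate to the caller.
    public static bool ProbeSafely(Func<bool> tryDeclareShape)
    {
        try
        {
            return tryDeclareShape();
        }
        catch (Exception ex) when (ex is not OperationCanceledException
                                   && ex is not OutOfMemoryException)
        {
            return false;
        }
    }

    // Counts declared vs. still-lazy layers, looking only at layers that
    // isTarget accepts. Non-target layers bump neither counter.
    public static (int Declared, int NeedsWarmup) Scan<TLayer>(
        IEnumerable<TLayer> layers,
        Func<TLayer, bool> isTarget,
        Func<TLayer, bool> tryDeclareShape)
    {
        int declared = 0, needsWarmup = 0;
        foreach (var layer in layers)
        {
            if (!isTarget(layer)) continue;
            if (ProbeSafely(() => tryDeclareShape(layer))) declared++;
            else needsWarmup++;
        }
        return (declared, needsWarmup);
    }
}
```

A caller would skip the warmup forward when `NeedsWarmup == 0`. Because non-target layers are never counted, a lazy layer that LoRA would not wrap cannot force the warmup on its own.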
@@ -2888,8 +2974,22 @@ void OnAutoMLCandidate(IFullModel<T, TInput, TOutput> candidate)
                 {
                     var originalLayer = neuralNetForLoRA.Layers[i];
 
+                    // AiDotNet#1370: gate on TryDeclareShape() rather than IsShapeResolved.
+                    // Layers like MHA that allocate weights from ctor-known dims return true
+                    // from TryDeclareShape even when InputShape still has a -1 seq placeholder
+                    // — LoRA wraps weight matrices, the seq placeholder doesn't matter.
+                    //
+                    // PR #1388 follow-up review C9PtZ: only probe TryDeclareShape on
+                    // layers that ApplyLoRA would actually wrap. A non-target lazy
+                    // layer (e.g. a lazy ActivationLayer or DropoutLayer) would get
+                    // its TryDeclareShape called, potentially allocating weights or
+                    // emitting a Trace warning, only for ApplyLoRA below to return
+                    // it unchanged. Gate on the same IsLoRATarget predicate the
+                    // pre-scan loop uses so the side effects of TryDeclareShape
+                    // only run for actual adaptation candidates.
                     if (originalLayer is NeuralNetworks.Layers.LayerBase<T> lazyCheck
-                        && !lazyCheck.IsShapeResolved)
+                        && (loraTargetProbe is null || loraTargetProbe.IsLoRATarget(lazyCheck))
+                        && !TryDeclareShapeSafely(lazyCheck))
                     {
                         skippedLazyCount++;
                         continue;
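The ordering of the operands in the new gate matters: `&&` short-circuits, so the shape probe and its side effects only run once the target check has passed. A minimal sketch of that condition follows, with delegates standing in for the real predicate and probe; `WrapGateSketch` and `ShouldSkipAsLazy` are hypothetical names.

```csharp
#nullable enable
using System;

// Hypothetical helper illustrating the gate's short-circuit order.
public static class WrapGateSketch
{
    // True when the wrap loop should skip the layer as still-lazy.
    // A null isLoRATarget models a custom configuration that exposes no
    // eligibility predicate, in which case every layer is probed.
    public static bool ShouldSkipAsLazy<TLayer>(
        TLayer layer,
        Func<TLayer, bool>? isLoRATarget,
        Func<TLayer, bool> tryDeclareShapeSafely)
    {
        // The probe sits last so it is never invoked for non-target layers.
        return (isLoRATarget is null || isLoRATarget(layer))
            && !tryDeclareShapeSafely(layer);
    }
}
```

A non-target layer makes the method return false without calling the probe, so it flows on to the adapter dispatch, which returns it unchanged.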

src/LoRA/DefaultLoRAConfiguration.cs

Lines changed: 68 additions & 38 deletions
@@ -276,77 +276,107 @@ public ILayer<T> ApplyLoRA(ILayer<T> layer)
             return layer;
         }
 
-        // Check if this is a layer type that benefits from LoRA adaptation
-        // (layers with trainable weight matrices)
+        // Graph convolutional layers - use specialized GraphConvolutionalLoRAAdapter
+        // which implements IGraphConvolutionLayer<T> and properly delegates graph methods.
+        // Kept separate from the IsLoRATargetType type-whitelist (which uses
+        // CreateAdapter for everything else) because the GraphConvolutionalLoRAAdapter
+        // ctor takes (layer, Rank, Alpha, FreezeBaseLayer) directly rather than going
+        // through the standard CreateAdapter dispatch.
+        if (layer is IGraphConvolutionLayer<T>)
+        {
+            return new GraphConvolutionalLoRAAdapter<T>(layer, Rank, Alpha, FreezeBaseLayer);
+        }
 
-        // Dense/Linear layers
-        if (layer is DenseLayer<T> || layer is FullyConnectedLayer<T> || layer is FeedForwardLayer<T>)
+        if (IsLoRATargetType(layer))
         {
             return CreateAdapter(layer);
         }
 
+        // Return layers without trainable weights unchanged
+        // (Activation, Pooling, Dropout, Flatten, Reshape, Normalization, etc.)
+        return layer;
+    }
+
+    /// <summary>
+    /// Non-mutating predicate: returns <c>true</c> when this configuration would
+    /// wrap <paramref name="layer"/> with a LoRA adapter (modulo the
+    /// shape-resolved guard, which is independent of the layer type).
+    /// </summary>
+    /// <remarks>
+    /// <para>
+    /// Shares the same layer-type whitelist as <see cref="ApplyLoRA"/> so a
+    /// caller (typically <see cref="AiModelBuilder{T,TInput,TOutput}"/>'s
+    /// pre-wrap warmup-skip decision) can probe which layers will actually
+    /// participate in the LoRA pass without paying for adapter construction.
+    /// Returns <c>true</c> for graph-convolutional layers too — they route
+    /// through <see cref="GraphConvolutionalLoRAAdapter{T}"/> in
+    /// <see cref="ApplyLoRA"/>, but the warmup-skip pre-scan only needs to
+    /// know "would I wrap this", not which adapter type.
+    /// </para>
+    /// <para>
+    /// AiDotNet#1370 PR #1388 review C7iL5 — the pre-scan that decides
+    /// <c>skipWarmup</c> was treating every <see cref="LayerBase{T}"/> as a
+    /// LoRA candidate, which forced the warmup forward whenever ANY lazy
+    /// layer (even a non-target like a lazy Activation) hadn't declared.
+    /// This predicate lets the pre-scan restrict its count to actual LoRA
+    /// targets so non-target lazy layers don't block the zero-warmup path.
+    /// </para>
+    /// </remarks>
+    public bool IsLoRATarget(ILayer<T> layer)
+    {
+        if (layer is null) return false;
+        return layer is IGraphConvolutionLayer<T> || IsLoRATargetType(layer);
+    }
+
+    /// <summary>
+    /// Whitelist of concrete layer types that <see cref="ApplyLoRA"/> wraps via
+    /// <see cref="CreateAdapter"/>. Kept as a single private method so the
+    /// public <see cref="IsLoRATarget"/> probe and <see cref="ApplyLoRA"/>'s
+    /// dispatch can't drift.
+    /// </summary>
+    private static bool IsLoRATargetType(ILayer<T> layer)
+    {
+        // Dense/Linear layers
+        if (layer is DenseLayer<T> || layer is FullyConnectedLayer<T> || layer is FeedForwardLayer<T>)
+            return true;
+
         // Convolutional layers
         if (layer is ConvolutionalLayer<T> || layer is DeconvolutionalLayer<T> ||
             layer is DepthwiseSeparableConvolutionalLayer<T> || layer is DilatedConvolutionalLayer<T> ||
             layer is SeparableConvolutionalLayer<T> || layer is SubpixelConvolutionalLayer<T>)
-        {
-            return CreateAdapter(layer);
-        }
+            return true;
 
         // Recurrent layers (LSTM, GRU, etc.)
         if (layer is LSTMLayer<T> || layer is GRULayer<T> || layer is RecurrentLayer<T> ||
             layer is ConvLSTMLayer<T> || layer is BidirectionalLayer<T>)
-        {
-            return CreateAdapter(layer);
-        }
+            return true;
 
         // Attention layers
         if (layer is AttentionLayer<T> || layer is MultiHeadAttentionLayer<T> || layer is SelfAttentionLayer<T>)
-        {
-            return CreateAdapter(layer);
-        }
+            return true;
 
         // Transformer layers
         if (layer is TransformerEncoderLayer<T> || layer is TransformerDecoderLayer<T>)
-        {
-            return CreateAdapter(layer);
-        }
+            return true;
 
         // Embedding layers
         if (layer is EmbeddingLayer<T> || layer is PatchEmbeddingLayer<T>)
-        {
-            return CreateAdapter(layer);
-        }
+            return true;
 
         // Specialized layers with trainable weights
         if (layer is LocallyConnectedLayer<T> || layer is HighwayLayer<T> ||
             layer is GatedLinearUnitLayer<T> || layer is SqueezeAndExcitationLayer<T>)
-        {
-            return CreateAdapter(layer);
-        }
-
-        // Graph convolutional layers - use specialized GraphConvolutionalLoRAAdapter
-        // which implements IGraphConvolutionLayer<T> and properly delegates graph methods
-        if (layer is IGraphConvolutionLayer<T>)
-        {
-            return new GraphConvolutionalLoRAAdapter<T>(layer, Rank, Alpha, FreezeBaseLayer);
-        }
+            return true;
 
         // Capsule layers
         if (layer is CapsuleLayer<T> || layer is PrimaryCapsuleLayer<T> || layer is DigitCapsuleLayer<T>)
-        {
-            return CreateAdapter(layer);
-        }
+            return true;
 
         // CRF and other advanced layers
         if (layer is ConditionalRandomFieldLayer<T>)
-        {
-            return CreateAdapter(layer);
-        }
+            return true;
 
-        // Return layers without trainable weights unchanged
-        // (Activation, Pooling, Dropout, Flatten, Reshape, Normalization, etc.)
-        return layer;
+        return false;
     }
 
     /// <summary>
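The refactor in this file routes both the non-mutating probe and the wrapping dispatch through one private whitelist, so the two cannot disagree about which layers get an adapter. A minimal sketch of that arrangement follows, using hypothetical stand-in types (`ISketchLayer`, `DenseSketch`, `GraphSketch`, `ActivationSketch`, `AdapterSketch`, `LoRADispatchSketch`) rather than the AiDotNet classes.

```csharp
#nullable enable

// Hypothetical layer hierarchy; none of these are AiDotNet types.
public interface ISketchLayer { }
public sealed class DenseSketch : ISketchLayer { }
public sealed class GraphSketch : ISketchLayer { }
public sealed class ActivationSketch : ISketchLayer { }

public sealed class AdapterSketch : ISketchLayer
{
    public ISketchLayer Inner { get; }
    public string Kind { get; }

    public AdapterSketch(ISketchLayer inner, string kind)
    {
        Inner = inner;
        Kind = kind;
    }
}

public static class LoRADispatchSketch
{
    // Single whitelist consulted by both the probe and the dispatch.
    private static bool IsTargetType(ISketchLayer layer) => layer is DenseSketch;

    // Non-mutating probe: answers "would Apply wrap this?" without building an adapter.
    public static bool IsTarget(ISketchLayer? layer)
        => layer is not null && (layer is GraphSketch || IsTargetType(layer));

    // Dispatch: graph layers get a specialised adapter, whitelisted layers the
    // standard one, and everything else is returned unchanged.
    public static ISketchLayer Apply(ISketchLayer layer)
    {
        if (layer is GraphSketch) return new AdapterSketch(layer, "graph");
        if (IsTargetType(layer)) return new AdapterSketch(layer, "standard");
        return layer;
    }
}
```

With this layout, adding a layer type to the whitelist changes the probe and the dispatch together, which is the drift protection the doc comment on `IsLoRATargetType` describes.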
