Commit dcc0aef
PR #1290 CI Cluster 3 #1311: 23 SmolVLM tests failing on master with the cluster's signature shape-mismatch:
    System.ArgumentException : Input embedding dimension (384) does not match
    weight dimension (378). Query shape: [1, 256, 384], Weights shape: [378, 378]
## Root cause
SmolVLM defaults: VisionDim=384, NumHeads=9. The vision-encoder MHA is constructed in `CreateDefaultPixelShuffleProjectorLayers` (and 9 other VLM factories) as:
    new MultiHeadAttentionLayer<T>(numHeads > 16 ? 16 : numHeads,
        (visionDim) / (numHeads > 16 ? 16 : numHeads))
C# integer division: `384 / 9 = 42`. Then `MultiHeadAttentionLayer._embeddingDimension = 9 * 42 = 378` (NOT 384). The QKV weight matrices end up sized `[378, 378]`, but `PatchEmbeddingLayer` upstream emits patch tokens at visionDim=384 — so `ForwardInternal` throws at the very first vision MHA call.
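The truncation can be reproduced without any AiDotNet types. The snippet below is a minimal sketch that only replays the arithmetic of the old call-site expression with the SmolVLM defaults quoted above; the variable names are illustrative and do not come from the library.

```csharp
using System;

// SmolVLM defaults quoted in the root-cause analysis.
int visionDim = 384;
int numHeads = 9;

// Old call-site expression: clamp the head count to 16, then integer-divide.
int clampedHeads = numHeads > 16 ? 16 : numHeads;
int headDim = visionDim / clampedHeads;   // C# int division drops the remainder

// The layer derives its embedding width from heads * headDim.
int embeddingDimension = clampedHeads * headDim;

Console.WriteLine($"headDim = {headDim}");
Console.WriteLine($"embeddingDimension = {embeddingDimension}");
Console.WriteLine($"lost to truncation = {visionDim - embeddingDimension}");
```

The six dimensions lost to truncation are exactly the gap between the 384-wide input and the 378-wide weights reported in the exception.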
The 9-head / 384-vision-dim pairing is paper-faithful on the decoder side (SmolVLM uses SmolLM's 9-head decoder config), but the vision encoder is SigLIP-Large at 16 heads × 64 head-dim = 1024 vision-dim, so the two subsystems use different head counts. AiDotNet's `SmolVLMOptions` collapses both into a single `NumHeads` knob, so the factory reuses the decoder's 9 for the vision MHA, where it does not divide visionDim evenly.
Adding per-subsystem head counts to the options class (`NumVisionHeads` vs `NumDecoderHeads`) is the paper-faithful long-term fix, but it is an API-surface change. The minimal, no-surface-change fix is to snap the vision MHA's head count downward to the largest divisor of visionDim that is ≤ numHeads.
## Fix
Add a `ChooseDivisibleHeadConfig(embedDim, requestedHeads, maxHeads = 16)` helper in `LayerHelper<T>` that returns `(heads, headDim)` with `heads * headDim == embedDim` exactly. It finds the largest `h ≤ min(requestedHeads, maxHeads)` such that `embedDim % h == 0`. For SmolVLM (visionDim=384, numHeads=9): start at 9, 384%9=6≠0, drop to 8, 384%8=0 ✓ → `(8, 48)`. The MHA then gets [384, 384] weights, matching the 384-dim input.
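The downward search can be sketched as a stand-alone function. This is not the committed code from `LayerHelper<T>` (the diff body is not reproduced here); it is a minimal illustration that assumes only the signature and behavior described above. The argument checks and the fallback to a single head are assumptions added for the sketch.

```csharp
using System;

// Hypothetical stand-alone version of the helper described above.
// Returns (heads, headDim) such that heads * headDim == embedDim.
static (int Heads, int HeadDim) ChooseDivisibleHeadConfig(
    int embedDim, int requestedHeads, int maxHeads = 16)
{
    // Assumption: non-positive arguments are rejected rather than clamped.
    if (embedDim < 1) throw new ArgumentOutOfRangeException(nameof(embedDim));
    if (requestedHeads < 1) throw new ArgumentOutOfRangeException(nameof(requestedHeads));
    if (maxHeads < 1) throw new ArgumentOutOfRangeException(nameof(maxHeads));

    // Walk downward from the capped request to the first exact divisor.
    for (int h = Math.Min(requestedHeads, maxHeads); h > 1; h--)
    {
        if (embedDim % h == 0)
        {
            return (h, embedDim / h);
        }
    }

    // One head always divides embedDim, so this is the final fallback.
    return (1, embedDim);
}

// SmolVLM defaults: 9 does not divide 384, so the search settles on 8.
var (heads, headDim) = ChooseDivisibleHeadConfig(384, 9);
Console.WriteLine($"{heads} heads x {headDim} head-dim = {heads * headDim}");
```

When the requested count already divides the embedding width, the function returns it unchanged (capped at `maxHeads`), so call sites whose configuration was already consistent should see no change in shape.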
Add a `CreateVisionMha(visionDim, numHeads, initializationStrategy?)` shim that applies the helper and returns the configured `MultiHeadAttentionLayer<T>`. Replace all 10 inline `new MultiHeadAttentionLayer<T>(numHeads > 16 ? 16 : numHeads, ...)` call sites across the VLM factories with it.
Snapping heads downward (vs upward / padding embedDim) keeps every other shape in the chain unchanged: FFN, LayerNorm, and downstream Dense layers all keep their visionDim-wide view. The trade-off is that the attention pattern uses slightly fewer heads than the upstream model card; that change is strictly more local than reshaping the entire residual stream.
## Verification
Pre-fix (current master):
    $ dotnet test --filter "FullyQualifiedName~EmotiVoiceTests|FullyQualifiedName~Phi3VisionTests|FullyQualifiedName~SmolVLMTests|FullyQualifiedName~RainbowDQNAgentTests"
    Failed: 47, Passed: 37
      EmotiVoiceTests: pass=26, fail=1 (timeout)
      Phi3VisionTests: pass=2, fail=23 (all OOM/timeout, foundation-scale)
      RainbowDQNAgentTests: pass=7, fail=0
      SmolVLMTests: pass=2, fail=23 (all shape-mismatch, THIS PR)
Post-fix:
    $ dotnet test --filter "FullyQualifiedName~SmolVLMTests"
    Failed: 14, Passed: 11
    Remaining 14 failures: 7 OutOfMemoryException + 6 timeout 120s + 1 timeout 180s
    NO MORE shape mismatch.
So this PR closes **23 of 23 SmolVLM shape-contract failures**. The remaining 14 SmolVLM failures (plus Phi3Vision's 23) are foundation-scale resource issues — same class as #1394 (ResNet/VGG ImageNet-scale perf). Different root cause, separate follow-up.
## Affected paths (10 sites)
All `(visionDim) / (numHeads > 16 ? 16 : numHeads)` patterns in VLM factories:
- CreateDefaultEncoderDecoderVLMLayers
- CreateDefaultVisualExpertVLMLayers
- CreateDefaultCrossAttentionResamplerVLMLayers
- CreateDefaultPixelShuffleProjectorLayers (SmolVLM — direct fix here)
- CreateDefaultVisionAdapterLayers (Phi3Vision)
- CreateDefaultTokenReductionVLMLayers (DeepSeek-VL)
- + 4 more
Closes #1311 partially (shape-contract root cause for SmolVLM; defensive fix applied to all 10 vision-encoder MHA sites). Foundation-scale resource residue tracked elsewhere.
Co-authored-by: franklinic <franklin@ivorycloud.com>
1 parent 7207983 commit dcc0aef
1 file changed
Lines changed: 77 additions & 10 deletions