Commit 390d7ed
fix(scaffold): add Gemma3 + DeepSeekVL/InternVL family to patch-vision list (#1420)
* fix(scaffold): add Gemma3 + DeepSeekVL/InternVL family to patch-vision list
PR #1408 Generated Layers shard (run 26254401589 job 77275610156) had
23 Gemma3 tests all failing at the same boundary:
System.ArgumentException : Image H/W (128/128) must be divisible by
patchSize (14)
at PatchEmbeddingLayer.OnFirstForward
at Gemma3.Train / Gemma3.Predict
Gemma3 (Google 2025) uses SigLIP-SO 14×14 patches per its paper
(ImageSize=896 / sqrt(MaxVisualTokens=4096) = 14). The auto-scaffold's
generic vision-model branch emitted a [3, 128, 128] input that's not
divisible by 14, so every test that calls Train or Predict hard-rejected
at the very first layer.
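The arithmetic above can be checked with a short sketch. This is Python for illustration only, not the project's code: `derive_patch_size` and `check_patch_divisible` are hypothetical names, and the second function only imitates the check that PatchEmbeddingLayer.OnFirstForward reports in the log above.

```python
import math


def derive_patch_size(image_size: int, max_visual_tokens: int) -> int:
    """Patch edge length when a square image is tiled into max_visual_tokens patches."""
    patches_per_side = math.isqrt(max_visual_tokens)
    if patches_per_side * patches_per_side != max_visual_tokens:
        raise ValueError("max_visual_tokens must be a perfect square")
    if image_size % patches_per_side != 0:
        raise ValueError("image_size must be divisible by patches per side")
    return image_size // patches_per_side


def check_patch_divisible(height: int, width: int, patch_size: int) -> None:
    """Hypothetical stand-in for the layer's divisibility guard."""
    if height % patch_size != 0 or width % patch_size != 0:
        raise ValueError(
            f"Image H/W ({height}/{width}) must be divisible by "
            f"patchSize ({patch_size})"
        )


# ImageSize=896 and MaxVisualTokens=4096 give 64 patches per side.
patch_size = derive_patch_size(896, 4096)

# The generic scaffold input is rejected; a 112x112 input is accepted.
try:
    check_patch_divisible(128, 128, patch_size)
    generic_input_ok = True
except ValueError:
    generic_input_ok = False

check_patch_divisible(112, 112, patch_size)
```

128 leaves a remainder of 2 when divided by 14, which is why every Train or Predict call failed before reaching any later layer.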
Add the missing prefixes to s_patchVisionFamilies so the helper returns
the patch-divisible 112 (= lcm(14, 16)) spatial size: Gemma, DeepSeekVL,
InternVL, Llama32Vision, Phi3Vision, Phi4Multimodal. All six use
ComputeVisualPatchSize → patchSize=14 via the LLaVA-MLP / SigLIP-SO
vision adapter path. The helper's existing 112 is divisible by both 14
(112 = 8 × 14) and 16 (112 = 7 × 16), so no other vision model regresses.
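The prefix lookup described above might look roughly like the following. This is a Python sketch under stated assumptions, not the scaffold's code: `PATCH_VISION_PREFIXES` imitates s_patchVisionFamilies but lists only the six prefixes named in this commit (the real list has earlier entries not shown here), and `scaffold_spatial_size` and the model name `SomeOtherModel` are hypothetical.

```python
import math

# Assumption: only the six prefixes added by this commit are listed here.
PATCH_VISION_PREFIXES = (
    "Gemma",
    "DeepSeekVL",
    "InternVL",
    "Llama32Vision",
    "Phi3Vision",
    "Phi4Multimodal",
)

GENERIC_SPATIAL_SIZE = 128                 # size emitted by the generic branch
PATCH_SAFE_SPATIAL_SIZE = math.lcm(14, 16)  # smallest size divisible by both


def scaffold_spatial_size(model_name: str) -> int:
    """Return the test-input H/W for a model class name (hypothetical helper)."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    if model_name.startswith(PATCH_VISION_PREFIXES):
        return PATCH_SAFE_SPATIAL_SIZE
    return GENERIC_SPATIAL_SIZE


gemma3_size = scaffold_spatial_size("Gemma3")
other_size = scaffold_spatial_size("SomeOtherModel")
```

Matching on prefixes rather than exact names is what lets a single "Gemma" entry cover Gemma3.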
Also mark Gemma3 as paper-scale (vision dim 1152, 27 vision layers,
3584 decoder dim, 36 decoder layers — true 3B-foundation scale)
so the iteration-count overrides for paper-scale models engage. The
warm-up Predict still OOMs on a standard CI runner because Gemma3's
default config materializes too many lazy DenseLayer weights at
construction time; the OOM constraint is independent of patch
divisibility and remains as follow-up (likely needs streaming-pool
engagement or a per-class scaffold override that constructs Gemma3
with reduced dims for testing).
* docs(#1420): document streaming-engagement blocker for Gemma3 OOM
The scaffold patch-divisibility fix lets Gemma3 reach the warm-up
Predict's first lazy weight materialization, where it then OOMs the
runner because Gemma3's paper-scale defaults (3B+ params via
VisionDim=1152, 27 vision + 36 decoder layers) overflow the GC heap
before any streaming-pool engagement.
The natural fix is to call ConfigureWeightLifetime(new GpuOffloadOptions())
in InitializeLayers after the layer list is populated but before any
weight materializes — exactly what the LayerBase.UseStreamingAllocator
flag was built for. I prototyped this locally and confirmed the lazy
DenseLayer's AllocateLazyWeight DID route through the streaming pool
(PredictEagerStreaming kicked in immediately).
But the streaming path then trips a deeper engine bug:
System.InvalidOperationException : Streaming drop requires sole storage
ownership; storage refcount is 2. Register the weight via
WeightRegistry before any RebindStorageFrom / view operation that
shares its storage.
at TensorBase.DropStorageForStreaming
at WeightRegistry.RegisterWeight
at PredictEagerStreaming:3775 (RegisterLayerTrainableTensorsWithWeightRegistry)
at Gemma3.Train
The lazy weight tensor that PatchEmbeddingLayer.OnFirstForward
materializes ends up with refcount=2 on its storage by the time
PredictEagerStreaming's post-forward re-registration runs. Some view /
init op (Xavier init, RegisterTrainableParameter, or
ResolveShapes) is producing a second reference to the underlying
storage that DropStorageForStreaming refuses to silently drop.
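The sole-ownership rule behind this exception can be illustrated with a toy model. This is a Python sketch, not the engine's implementation: `Storage`, `Tensor`, `view`, and `drop_storage_for_streaming` are hypothetical stand-ins, and which real operation creates the second reference is still unknown, as noted above.

```python
class Storage:
    """Toy backing buffer that counts how many tensors reference it."""

    def __init__(self, data):
        self.data = data
        self.refcount = 0


class Tensor:
    """Toy tensor; creating one takes a reference on its storage."""

    def __init__(self, storage: Storage):
        self.storage = storage
        storage.refcount += 1

    def view(self) -> "Tensor":
        # A view shares the same storage, so the refcount goes up.
        return Tensor(self.storage)

    def drop_storage_for_streaming(self) -> None:
        # Dropping a shared buffer would invalidate the other holder,
        # so refuse unless this tensor is the only owner.
        if self.storage.refcount != 1:
            raise RuntimeError(
                "Streaming drop requires sole storage ownership; "
                f"storage refcount is {self.storage.refcount}."
            )
        self.storage.data = None


# A sole owner can drop its storage.
solo = Tensor(Storage([0.0] * 4))
solo.drop_storage_for_streaming()

# Once any view exists, the same drop is rejected.
weight = Tensor(Storage([0.0] * 4))
alias = weight.view()
try:
    weight.drop_storage_for_streaming()
    drop_error = None
except RuntimeError as exc:
    drop_error = str(exc)
```

In this toy model the fix is one of ordering: the drop has to happen before any view is taken, which mirrors the exception's advice to register the weight before any operation that shares its storage.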
Streaming pre-engagement is deferred until the engine-side refcount
issue is fixed. The patch-divisibility scaffold portion of this PR
stays: it is the necessary first half, and it unblocks the same-shape
failures across DeepSeekVL / InternVL / Llama32Vision / Phi3Vision /
Phi4Multimodal even on smaller configs where OOM isn't an issue.
---------
Co-authored-by: franklinic <franklin@ivorycloud.com>
1 parent a801d11 commit 390d7ed
2 files changed
Lines changed: 35 additions & 0 deletions
File 1 (name not shown): 10 lines added at new lines 1242-1251; 6 lines added at new lines 4363-4368.
File 2 (name not shown): 1 line added at new line 9; 18 lines added at new lines 141-158.
[Diff bodies not captured.]