Fix Gemma 4 quantized per-layer projection loading #935

Merged: Blaizzy merged 4 commits into Blaizzy:main from spicyneuron:fix-gemma-4 on Apr 7, 2026

@spicyneuron
Contributor

While trying to run unsloth/gemma-4-E2B-it-UD-MLX-4bit, I hit a loading error:

ValueError: Unable to quantize model of type <class 'mlx_lm.models.gemma4_text.ScaledLinear'>

After cross-checking this against the Gemma 4 implementation in Transformers (constructor, projection path), I replaced the custom ScaledLinear wrapper used for per_layer_model_projection with a standard bias-free nn.Linear and moved the hidden_size**-0.5 scale into the projection path explicitly.

The math stays the same, but the layer now works with MLX's normal quantization and loading flow.
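
To illustrate why the math is unchanged, here is a minimal sketch in plain Python rather than MLX. It assumes the removed wrapper simply multiplied a bias-free linear output by a fixed scalar; the PR text does not show its internals, so `ScaledLinearSketch`, `linear`, and `project_with_explicit_scale` are hypothetical names used only for this comparison.

```python
import math
import random


def linear(x, weight):
    """Bias-free linear map: y[j] = sum_i x[i] * weight[j][i]."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in weight]


class ScaledLinearSketch:
    """Hypothetical stand-in for a wrapper that bakes the scale into the layer."""

    def __init__(self, weight, scalar):
        self.weight = weight
        self.scalar = scalar

    def __call__(self, x):
        return [y * self.scalar for y in linear(x, self.weight)]


def project_with_explicit_scale(x, weight, hidden_size):
    """Plain linear layer; the caller applies the hidden_size**-0.5 scale."""
    scale = hidden_size ** -0.5
    return [y * scale for y in linear(x, weight)]


random.seed(0)
hidden_size, out_features = 8, 4  # toy sizes, not Gemma 4's real dimensions
weight = [
    [random.gauss(0, 1) for _ in range(hidden_size)]
    for _ in range(out_features)
]
x = [random.gauss(0, 1) for _ in range(hidden_size)]

wrapped = ScaledLinearSketch(weight, hidden_size ** -0.5)(x)
explicit = project_with_explicit_scale(x, weight, hidden_size)

# Both paths compute (x @ W.T) * hidden_size**-0.5, so they should agree.
outputs_match = all(
    math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12)
    for a, b in zip(wrapped, explicit)
)
```

Because the scale sits outside the layer in the second form, the layer itself is an ordinary linear map, which is the kind of module a standard quantization pass expects to find.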

For reference, here's the same fix on mlx-lm: ml-explore/mlx-lm#1112

Owner

@Blaizzy Blaizzy left a comment

LGTM, thanks!

@Blaizzy Blaizzy merged commit b2cffea into Blaizzy:main Apr 7, 2026
1 check passed
@spicyneuron spicyneuron deleted the fix-gemma-4 branch April 30, 2026 14:01

2 participants