fix: resolve Gemma4 K-eq-V broadcast crash (closes #59) #60

Merged
solderzzc merged 1 commit into main from fix/gemma4-keqv-transpose-issue-59 on Apr 16, 2026

@solderzzc
Member

Summary

Bumps the mlx-swift-lm submodule to SharpAI/mlx-swift-lm#23, which fixes the root cause of the crash reported in #59.

Root Cause

All gemma-4-26b-a4b-it-* models set attention_k_eq_v: true for their full-attention layers, so those layers have no separate value projection and v is derived from k. On this path, the Swift port had a subtle double-transpose bug:

k → kNorm → transpose → [B, nKvH, L, D]
v = k  (already transposed)
v → vNorm → transpose → [B, L, nKvH, D]  ← WRONG!
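
To make the shape bookkeeping concrete, here is a minimal, dependency-free Swift sketch that tracks tensor shapes only, not values. It assumes the transpose is the (0, 2, 1, 3) permutation that swaps the sequence and head axes, and that kNorm/vNorm preserve shape because they normalize over the last axis. The sizes B = 1, L = 512, nKvH = 2, D = 512 are inferred from the error message, and the helpers `transposed` and `norm` are hypothetical stand-ins, not mlx-swift-lm API.

```swift
// Shape-only model of the buggy K-eq-V path.
// Hypothetical helpers; these are not mlx-swift-lm or MLX functions.
typealias Shape = [Int]

/// Reorders `shape` by the permutation `axes`, mirroring what a transpose does to dimensions.
func transposed(_ shape: Shape, _ axes: [Int]) -> Shape {
    precondition(axes.sorted() == Array(0..<shape.count), "axes must be a permutation")
    return axes.map { shape[$0] }
}

/// Assumed stand-in for kNorm / vNorm: a norm over the last axis keeps the shape.
func norm(_ shape: Shape) -> Shape { shape }

let (B, L, nKvH, D) = (1, 512, 2, 512)  // inferred from the broadcast_shapes error
let swapSeqAndHeads = [0, 2, 1, 3]

// k leaves the projection as [B, L, nKvH, D], then is normed and transposed.
var k: Shape = [B, L, nKvH, D]
k = transposed(norm(k), swapSeqAndHeads)           // [B, nKvH, L, D]

// Buggy path: v aliases the already-transposed k and is transposed a second time.
let vBuggy = transposed(norm(k), swapSeqAndHeads)  // [B, L, nKvH, D]

print("k:", k)              // k: [1, 2, 512, 512]
print("v (buggy):", vBuggy) // v (buggy): [1, 512, 2, 512]
```

Applying the same permutation twice is the identity, which is why the second transpose undoes the layout change instead of being a harmless no-op.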

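Because v ended up as [B, L, nKvH, D] while k was [B, nKvH, L, D], the two shapes could not be broadcast together and SDPA (scaled dot-product attention) crashed: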

[broadcast_shapes] (1,512,2,512) vs (1,2,512,512)
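
MLX uses NumPy-style broadcasting: shapes are aligned from the right, and each aligned pair of dimensions must be equal or contain a 1. The sketch below is a hypothetical Swift re-implementation of that rule (not MLX's own `broadcast_shapes` code); `broadcastShapes` and `BroadcastError` are illustrative names. It shows why the two shapes in the error can never be reconciled.

```swift
// Hypothetical re-implementation of NumPy-style broadcasting; not MLX source code.
typealias Shape = [Int]

struct BroadcastError: Error, CustomStringConvertible {
    let lhs: Shape
    let rhs: Shape
    let axis: Int  // axis index in the broadcast result shape

    var description: String {
        "cannot broadcast \(lhs) with \(rhs) at axis \(axis)"
    }
}

/// Aligns `a` and `b` from the right; each aligned pair must be equal or contain a 1.
func broadcastShapes(_ a: Shape, _ b: Shape) throws -> Shape {
    let rank = max(a.count, b.count)
    var result = [Int](repeating: 1, count: rank)
    for i in 0..<rank {
        // Missing leading axes behave like size 1.
        let da = i < a.count ? a[a.count - 1 - i] : 1
        let db = i < b.count ? b[b.count - 1 - i] : 1
        guard da == db || da == 1 || db == 1 else {
            throw BroadcastError(lhs: a, rhs: b, axis: rank - 1 - i)
        }
        result[rank - 1 - i] = da == 1 ? db : da
    }
    return result
}

let vShape: Shape = [1, 512, 2, 512]  // v after the extra transpose: [B, L, nKvH, D]
let kShape: Shape = [1, 2, 512, 512]  // k in the expected [B, nKvH, L, D] layout

do {
    let out = try broadcastShapes(vShape, kShape)
    print("broadcast to", out)
} catch {
    print(error)  // cannot broadcast [1, 512, 2, 512] with [1, 2, 512, 512] at axis 2
}
```

Scanning from the right, the check first fails at axis 2 (2 vs 512); axis 1 (512 vs 2) would fail as well. Neither mismatched pair contains a 1, so no broadcast can rescue the swapped sequence and head axes.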

Changes (in mlx-swift-lm)

  • MLXLLM/Models/Gemma4Text.swift: When vProj is nil (the K-eq-V path), skips the redundant transpose and sets v = vNorm(k) directly, since k is already in [B, nKvH, L, D] layout.
  • MLXVLM/Models/Gemma4.swift (stretch goal): Adds LayerPartitionable and StreamableMoE conformances to Gemma4TextBackbone and wires streamExperts through the public Gemma4 VLM class, enabling SSD expert streaming on the VLM path (mirroring Qwen35.swift).
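
A shape-only Swift sketch of the corrected control flow in Gemma4Text.swift follows. It is an illustration under stated assumptions, not the actual model code: `vProjOutput` is a hypothetical optional standing in for the output of `vProj` (nil meaning the K-eq-V path), `norm` assumes kNorm/vNorm preserve shape, and the transpose is assumed to be the (0, 2, 1, 3) permutation.

```swift
// Shape-only sketch of the fixed K-eq-V branch; all names are hypothetical.
typealias Shape = [Int]

/// Applies the axis permutation `axes` to `shape` (stand-in for a transpose).
func transposed(_ shape: Shape, _ axes: [Int]) -> Shape {
    precondition(axes.sorted() == Array(0..<shape.count), "axes must be a permutation")
    return axes.map { shape[$0] }
}

/// Stand-in for kNorm / vNorm: normalizing over the last axis keeps the shape.
func norm(_ shape: Shape) -> Shape { shape }

let swapSeqAndHeads = [0, 2, 1, 3]

/// Returns the (k, v) shapes in [B, nKvH, L, D] layout.
/// `vProjOutput == nil` models the K-eq-V path where vProj does not exist.
func keyValueShapes(kProjOutput: Shape, vProjOutput: Shape?) -> (k: Shape, v: Shape) {
    let k = transposed(norm(kProjOutput), swapSeqAndHeads)  // [B, nKvH, L, D]
    let v: Shape
    if let vRaw = vProjOutput {
        // Separate value projection: v arrives as [B, L, nKvH, D] and needs its own transpose.
        v = transposed(norm(vRaw), swapSeqAndHeads)
    } else {
        // K-eq-V path: k is already transposed, so only the norm is applied.
        v = norm(k)
    }
    return (k, v)
}

let projected: Shape = [1, 512, 2, 512]  // [B, L, nKvH, D] from the projection
let kv = keyValueShapes(kProjOutput: projected, vProjOutput: nil)
print(kv.k, kv.v)  // [1, 2, 512, 512] [1, 2, 512, 512]
```

Both branches now hand SDPA a v whose layout matches k, which is the property the broadcast check needs.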

Testing

  • swift build --target MLXLLM
  • swift build --target MLXVLM
  • swift test --filter Gemma4Tests

Closes #59

@solderzzc force-pushed the fix/gemma4-keqv-transpose-issue-59 branch from 0613b7c to 6d08be9 on April 16, 2026 22:08
@solderzzc force-pushed the fix/gemma4-keqv-transpose-issue-59 branch from 6d08be9 to 90e294b on April 16, 2026 22:46
@solderzzc merged commit ef2fca9 into main on Apr 16, 2026
8 checks passed
@solderzzc deleted the fix/gemma4-keqv-transpose-issue-59 branch on April 16, 2026 23:21
@notatestuser

Awesome and amazing. Thank you!

@solderzzc
Member Author

> Awesome and amazing. Thank you!

Thanks for your detailed bug report.

Development

Successfully merging this pull request may close these issues.

[Bug]: MLXLLM Gemma 4 MoE forward crashes with broadcast_shapes (1,512,2,512) vs (1,2,512,512)