fix: resolve Gemma4 K-eq-V broadcast crash (closes #59) #60
Merged
Awesome and amazing. Thank you!
Thanks for your detailed bug report.
Summary

Bumps the `mlx-swift-lm` submodule to SharpAI/mlx-swift-lm#23, which fixes the root cause of the crash reported in #59.

Root Cause

All `gemma-4-26b-a4b-it-*` models use `attention_k_eq_v: true` for full-attention layers. In this path, the Swift port had a subtle double-transpose bug. The wrongly shaped `v` caused the crash at SDPA.

Changes (in mlx-swift-lm)

- `MLXLLM/Models/Gemma4Text.swift`: When `vProj` is nil (K-eq-V path), skip the redundant transpose and just set `v = vNorm(k)`.
- `MLXVLM/Models/Gemma4.swift` (stretch): Added `LayerPartitionable` + `StreamableMoE` conformance to `Gemma4TextBackbone` and wired `streamExperts` through the public `Gemma4VLM` class, enabling SSD expert streaming on the VLM path (mirrors `Qwen35.swift`).

Testing

- `swift build --target MLXLLM` ✅
- `swift build --target MLXVLM` ✅
- `swift test --filter Gemma4Tests` ✅

Closes #59
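
To make the double-transpose failure mode from the root cause concrete, here is a small shape-only sketch in plain Swift. It is not the code from `Gemma4Text.swift` and does not use MLX. The helper names (`permuted`, `kvCompatible`), the tensor sizes, and the assumption that `k` has already been moved from `[B, L, H, D]` to `[B, H, L, D]` before `v` is derived from it are all illustrative guesses.

```swift
// Shape-only model of the K-eq-V path. Hypothetical helpers, no MLX dependency.
typealias Shape = [Int]

/// Returns the shape a tensor would have after permuting its axes.
func permuted(_ shape: Shape, axes: [Int]) -> Shape {
    precondition(axes.sorted() == Array(shape.indices), "axes must be a permutation")
    return axes.map { shape[$0] }
}

/// Minimal check on the K and V inputs to attention:
/// batch, head, and sequence dimensions must agree.
func kvCompatible(k: Shape, v: Shape) -> Bool {
    k.count == 4 && v.count == 4 && Array(k.prefix(3)) == Array(v.prefix(3))
}

let batch = 1, seqLen = 7, kvHeads = 2, headDim = 4  // illustrative sizes

// Projection output reshaped to [B, L, H, D], then moved to [B, H, L, D].
let kProjected: Shape = [batch, seqLen, kvHeads, headDim]
let k = permuted(kProjected, axes: [0, 2, 1, 3])

// Assumed buggy path: k is already [B, H, L, D], so applying the same
// transpose again swaps heads and sequence back to [B, L, H, D].
let vBuggy = permuted(k, axes: [0, 2, 1, 3])

// Assumed fixed path: reuse k's layout; a norm does not change the shape.
let vFixed = k

print("k:", k)
print("v (double transpose):", vBuggy, "compatible:", kvCompatible(k: k, v: vBuggy))
print("v (fixed):", vFixed, "compatible:", kvCompatible(k: k, v: vFixed))
```

Under these assumptions, the doubly transposed `v` has its head and sequence axes swapped relative to `k`. That is the kind of mismatch that would surface as a broadcast error at the attention call, while reusing the layout of `k` keeps the two inputs aligned.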