Commit 3472132
fix: Gemma 4 audio — mel preprocessing, weight loading, feature extractor (#931)
* fix: Gemma 4 audio — mel preprocessing, weight loading, feature extractor
Fix four bugs preventing Gemma 4 audio from working:
1. Missing semicausal left-padding in audio feature extractor.
The HF reference prepends frame_length//2 (160) zero samples before
the unfold, centering the first frame at t=0. Without this, the mel
spectrogram is misaligned and the frame count is wrong, which also
causes the broadcast shapes error (issue #923).
2. Wrong Hann window formula. The cosine term was cos(2*pi*(n+0.5)/N)
instead of the periodic Hann term cos(2*pi*n/N). The +0.5 phase shift
yields spectral values that differ from those the model was trained
on.
3. sanitize() double-nests language_model weights (issue #912).
HF keys like model.language_model.model.embed_tokens.weight become
language_model.model.embed_tokens.weight after stripping model.,
which already matches the MLX path. The unconditional insertion of
.model. created language_model.model.model.*, so all LM weights
loaded as zero.
4. Feature extractor not instantiated (issue #903). Only created when
processor_config.json contains a "feature_extractor" key, which
standard HF checkpoints don't include. Now instantiates with USM
defaults unconditionally.
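A minimal NumPy sketch of items 1 and 2, for illustration only. It is not the code from this commit: the function names are hypothetical, frame_length = 320 is inferred from "frame_length//2 (160)", and the hop of 160 samples is an assumption.

```python
import numpy as np

FRAME_LENGTH = 320  # inferred from frame_length // 2 == 160 in the message
HOP_LENGTH = 160    # assumed hop size; not stated in the commit message


def periodic_hann(n_points: int) -> np.ndarray:
    """Periodic Hann window: 0.5 - 0.5 * cos(2*pi*n/N)."""
    n = np.arange(n_points)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * n / n_points)


def shifted_hann(n_points: int) -> np.ndarray:
    """Variant with the +0.5 phase shift described in item 2 (for comparison)."""
    n = np.arange(n_points)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * (n + 0.5) / n_points)


def frame_signal(
    waveform: np.ndarray,
    frame_length: int = FRAME_LENGTH,
    hop_length: int = HOP_LENGTH,
    semicausal: bool = True,
) -> np.ndarray:
    """Split a 1-D waveform into overlapping frames.

    With semicausal=True, frame_length // 2 zero samples are prepended
    first, so the first frame is centered at t=0 (item 1).
    """
    if semicausal:
        pad = np.zeros(frame_length // 2, dtype=waveform.dtype)
        waveform = np.concatenate([pad, waveform])
    if waveform.shape[0] < frame_length:
        return np.zeros((0, frame_length), dtype=waveform.dtype)
    # Equivalent to an unfold: every window of frame_length, stepped by hop.
    windows = np.lib.stride_tricks.sliding_window_view(waveform, frame_length)
    return windows[::hop_length]
```

Under these assumptions, the padded and unpadded paths produce a different number of frames for the same input, which is consistent with the frame-count mismatch described in item 1.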
Fixes #903, #912, #923
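The key remapping from item 3 can be sketched as below. This is a hypothetical illustration, not the actual sanitize() from the repository: both function names are invented, and the conditional check is one assumed way to avoid the double nesting.

```python
def remap_key_unconditional(key: str) -> str:
    """Behavior described in item 3: always insert '.model.' (double-nests)."""
    if key.startswith("model."):
        key = key[len("model."):]
    prefix = "language_model."
    if key.startswith(prefix):
        key = prefix + "model." + key[len(prefix):]
    return key


def remap_key(key: str) -> str:
    """Assumed fix: insert '.model.' only when it is not already present."""
    if key.startswith("model."):
        key = key[len("model."):]
    prefix = "language_model."
    if key.startswith(prefix) and not key.startswith(prefix + "model."):
        key = prefix + "model." + key[len(prefix):]
    return key


hf_key = "model.language_model.model.embed_tokens.weight"
print(remap_key_unconditional(hf_key))  # double-nested path
print(remap_key(hf_key))                # path that matches the MLX module tree
```

Keys that do not start with language_model. after stripping the leading model. pass through both functions unchanged.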
* format
* format
* Update audio feature extractor in Gemma4 model to match hf
---------
Co-authored-by: Stephen Cox <stephencoxmail@gmail.com>
Co-authored-by: Prince Canuma <prince.gdt@gmail.com>
1 parent b2cffea commit 3472132
3 files changed
Lines changed: 46 additions & 21 deletions