Audio encoder fixes:
- Fix swapped conv norm weight mapping in tensor_mapping.py
(A_ENC_CONV_NORM and A_ENC_NORM_CONV had their gemma4 entries inverted,
so the conv pre-norm and conv-internal norm weights were swapped in the
GGUF file. This produced an encoder output cosine similarity of 0.67 vs
PyTorch; it is now 0.9999. See the mapping sketch after this list.)
- Fix causal mask off-by-one: add (gq - gk) < max_past to match PyTorch's
dist < left_window_size (the encoder was attending to 13 past tokens
instead of 12)
- Use -1e9 instead of -INFINITY for masked positions to match PyTorch's
attention_invalid_logits_value and to avoid NaN in padded attention
weights (both mask changes are sketched after this list)
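
To make the inversion concrete, here is a minimal sketch of the mapping
fix. The HF-side tensor names are hypothetical placeholders, and which of
the two keys carries the pre-norm is illustrative; only the
A_ENC_CONV_NORM / A_ENC_NORM_CONV keys come from tensor_mapping.py.

```python
# Conceptual sketch only; the source names on the right are placeholders,
# not the real HF checkpoint names.

# Before: the two gemma4 source names were attached to the wrong keys, so
# the conv pre-norm and the conv-internal norm swapped places in the GGUF.
broken = {
    "A_ENC_CONV_NORM": "audio_encoder.conv.norm",      # belongs under A_ENC_NORM_CONV
    "A_ENC_NORM_CONV": "audio_encoder.conv.pre_norm",  # belongs under A_ENC_CONV_NORM
}

# After: each GGUF tensor name points at its own source tensor again.
fixed = {
    "A_ENC_CONV_NORM": "audio_encoder.conv.pre_norm",
    "A_ENC_NORM_CONV": "audio_encoder.conv.norm",
}
```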
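The two mask changes, sketched together as a NumPy reference. gq, gk,
max_past and left_window_size are the names used above; everything else
is an illustrative assumption, not the llama.cpp implementation.

```python
import numpy as np

def build_local_causal_mask(n_tokens: int, left_window_size: int) -> np.ndarray:
    """Additive attention mask matching the PyTorch reference semantics."""
    gq = np.arange(n_tokens)[:, None]   # query positions
    gk = np.arange(n_tokens)[None, :]   # key positions
    dist = gq - gk
    max_past = left_window_size
    # Off-by-one fix: the upper bound is strict, dist < max_past, matching
    # PyTorch's dist < left_window_size; a non-strict bound lets one extra
    # past token through.
    allowed = (dist >= 0) & (dist < max_past)
    # -1e9 instead of -inf (PyTorch's attention_invalid_logits_value): with
    # the usual max-subtracted softmax, a fully padded row then softmaxes to
    # uniform weights instead of NaN.
    return np.where(allowed, 0.0, -1e9).astype(np.float32)
```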
LM fixes:
- Disable attention logit softcapping for Gemma4 (unlike Gemma2, Gemma4's
text model does not use attention softcapping; it was incorrectly
hardcoded on). See the softcapping sketch below.
- Use BF16-rounded embedding scale constants to match PyTorch's native
BF16 training precision (ref: PR ggml-org#21451). This fixes long-context
coherence on the CPU and Vulkan backends; see the rounding sketch below.
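
For context, a sketch of what "attention logit softcapping" means and why
it must be skipped for Gemma4; the function and its signature are
illustrative, not llama.cpp code.

```python
import numpy as np

def maybe_softcap(attn_logits: np.ndarray, softcap: float | None) -> np.ndarray:
    """Gemma2-style logit softcapping: cap * tanh(logits / cap) before softmax.
    Gemma4's text model does not use it, so the cap must be treated as unset
    rather than hardcoded to Gemma2's value."""
    if softcap is None or softcap <= 0.0:
        return attn_logits                      # Gemma4 text model: pass through
    return softcap * np.tanh(attn_logits / softcap)
```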
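And a sketch of the BF16 rounding, assuming the scale in question is
Gemma's usual sqrt(hidden_size) embedding multiplier; the hidden size
below is a made-up example.

```python
import math
import struct

def bf16_round(x: float) -> float:
    """Round an FP32 value to the nearest BF16 value (round-to-nearest-even)
    and return it as a Python float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1
    return struct.unpack("<f", struct.pack("<I", (bits + 0x7FFF + lsb) & 0xFFFF0000))[0]

hidden_size = 2560                       # hypothetical, for illustration only
scale_full = math.sqrt(hidden_size)      # full-precision value used before the fix
scale_bf16 = bf16_round(scale_full)      # matches the model's native BF16 precision
print(scale_full, scale_bf16)            # e.g. 50.596... vs 50.5
```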
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>