Fix speculator config for models with explicit head_dim #517
Models like Qwen3.6-27B have hidden_size (5120) not divisible by num_attention_heads (24) because they use an explicit head_dim (256). LlamaConfig's validate_architecture rejects this, so recompute num_attention_heads as hidden_size // head_dim for the speculator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Purpose
Models like Laguna XS and Qwen3.6-27B have a hidden_size (5120) that is not divisible by num_attention_heads (24), because they set an explicit head_dim (256) instead of deriving it as hidden_size / num_attention_heads. LlamaConfig's validate_architecture rejects this combination, so for the speculator we recompute num_attention_heads as hidden_size // head_dim (5120 // 256 = 20).
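A minimal sketch of the head-count recomputation, using the numbers above. The helper names `speculator_num_attention_heads` and `check_divisible` are hypothetical and are not the functions changed in this PR; `check_divisible` is only a stand-in for the divisibility check that the description attributes to LlamaConfig's validate_architecture, not the actual transformers code. The sketch also assumes the recomputation applies whenever an explicit head_dim is present.

```python
from typing import Optional


def check_divisible(hidden_size: int, num_attention_heads: int) -> None:
    # Hypothetical stand-in for the check LlamaConfig's validate_architecture
    # is described as performing; not the real transformers implementation.
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) must be divisible by "
            f"num_attention_heads ({num_attention_heads})"
        )


def speculator_num_attention_heads(
    hidden_size: int, num_attention_heads: int, head_dim: Optional[int] = None
) -> int:
    """Hypothetical helper: head count to use in the speculator's config."""
    if head_dim is None:
        # No explicit head_dim: keep the verifier's head count unchanged.
        return num_attention_heads
    if hidden_size % head_dim != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) must be divisible by head_dim ({head_dim})"
        )
    # Recompute so that num_attention_heads * head_dim == hidden_size.
    return hidden_size // head_dim


# Numbers from this PR: 5120 % 24 != 0, so the original head count would be rejected.
heads = speculator_num_attention_heads(hidden_size=5120, num_attention_heads=24, head_dim=256)
check_divisible(5120, heads)  # passes, since 5120 % 20 == 0
print(heads)
```

With these inputs the sketch prints 20, and the recomputed head count times head_dim equals hidden_size again, which is the invariant the Llama-style check relies on.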
Description
This PR changes how the speculator's number of attention heads is calculated, so that initialization no longer fails when the model's hidden state dimension isn't divisible by its number of attention heads. That combination is rejected by LlamaConfig, which the speculator config is built on.
Related Issue
NA
Tests
With this PR, Qwen3.6-27B runs successfully; without it, speculator initialization failed.
I have filled in: