Fix speculator config for models with explicit head_dim by MeganEFlynn · Pull Request #517 · vllm-project/speculators

MeganEFlynn · 2026-05-12T23:47:19Z

Purpose

Models like Laguna XS and Qwen3.6-27B have a hidden_size (5120) that is not divisible by num_attention_heads (24) because they use an explicit head_dim (256). LlamaConfig's validate_architecture rejects this, so this PR recomputes num_attention_heads as hidden_size // head_dim for the speculator.

Description

This PR changes how we calculate the number of attention heads so that initialization doesn't fail when the hidden state dimension isn't divisible by the model's number of attention heads, a combination the Llama config does not accept.
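As a rough illustration of the recomputation described above, here is a minimal sketch in plain Python. The helper name `resolve_num_attention_heads` is hypothetical and is not taken from the speculators codebase, and the choice to recompute only when the original head count fails the divisibility check is an assumption made for this example.

```python
from typing import Optional


def resolve_num_attention_heads(
    hidden_size: int,
    num_attention_heads: int,
    head_dim: Optional[int] = None,
) -> int:
    """Hypothetical helper: pick a head count that divides hidden_size evenly."""
    if hidden_size % num_attention_heads == 0:
        # The original head count already passes a divisibility check.
        return num_attention_heads
    if head_dim is None or hidden_size % head_dim != 0:
        # Assumption: without a usable explicit head_dim we cannot recompute.
        raise ValueError(
            f"hidden_size={hidden_size} is not divisible by "
            f"num_attention_heads={num_attention_heads} and no usable head_dim was given"
        )
    # Recompute the head count from the explicit head_dim.
    return hidden_size // head_dim


# Values quoted in the PR description: 5120 is not divisible by 24,
# so the head count is recomputed as 5120 // 256.
heads = resolve_num_attention_heads(hidden_size=5120, num_attention_heads=24, head_dim=256)
print(heads)
```

With the numbers from the description, the recomputed value divides 5120 evenly, which is the property the config validation requires.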

Related Issue

NA

Tests

With this PR, Qwen 3.6 27B runs, whereas it previously failed.

I have filled in:

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan/results, such as providing test command and pasting the results.
(Optional) The necessary documentation update.
I (a human) have written or reviewed the code in this pr to the best of my ability.

Models like Qwen3.6-27B have hidden_size (5120) not divisible by num_attention_heads (24) because they use an explicit head_dim (256). LlamaConfig's validate_architecture rejects this, so recompute num_attention_heads as hidden_size // head_dim for the speculator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-05-12T23:47:27Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


MeganEFlynn requested review from fynnsu and shanjiaz May 12, 2026 23:47

MeganEFlynn requested a review from dsikka May 12, 2026 23:47

fynnsu approved these changes May 13, 2026

MeganEFlynn wants to merge 1 commit into main from attention_head_dim_fix
