
[OpenVINO] Support eagle3 draft model for Qwen3-VL model #1679

Open

openvino-agent wants to merge 3 commits into huggingface:main from openvino-agent:qwen3_vl_eagle3

Conversation


@openvino-agent openvino-agent commented Apr 10, 2026

What does this PR do?

Command line for exporting the draft model:

optimum-cli export openvino -m AngelSlim/Qwen3-VL-4B-Instruct_eagle3 Qwen3-VL-4B-Instruct_eagle3 --trust-remote-code

OpenVINO GenAI code:

import openvino as ov
import openvino_genai
from huggingface_hub import hf_hub_download
from transformers.video_utils import load_video

# Download a sample video from the Hugging Face Hub
video_path = hf_hub_download(
    repo_id="raushan-testing-hf/videos-test",
    filename="sample_demo_1.mp4",
    repo_type="dataset",
)
input_video, _ = load_video(video_path, num_frames=10, backend="opencv")
input_video = ov.Tensor(input_video)
question = "Why is this video funny?"

draft_model_path = "./Qwen3-VL-4B-Instruct_eagle3"
main_model_path = "./Qwen3-VL-4B-Instruct"

# Build the VLM pipeline with the EAGLE3 draft model enabled for speculative decoding
ov_draft_model = openvino_genai.draft_model(draft_model_path, "CPU")
ov_eagle3_pipe = openvino_genai.VLMPipeline(main_model_path, "CPU", draft_model=ov_draft_model)
# Baseline without speculative decoding, for comparison:
# ov_eagle3_pipe = openvino_genai.VLMPipeline(main_model_path, "CPU")

genai_eagle3_output = ov_eagle3_pipe.generate(prompt=question, videos=[input_video], max_new_tokens=100)

print(genai_eagle3_output)
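For context, the draft model speeds things up only when the main model accepts its proposed tokens. A toy sketch of the greedy verification step used in speculative decoding (illustrative only; this is not the OpenVINO GenAI or EAGLE3 implementation, and the function name is made up):

```python
def accept_draft_tokens(draft_tokens, target_tokens):
    """Greedy speculative-decoding verification (toy version).

    Accept the longest prefix of the draft model's tokens that matches
    the target model's own choices; on the first mismatch, take the
    target's token instead. If every draft token matches, the target's
    one extra token is appended for free.

    `target_tokens` is assumed to hold len(draft_tokens) + 1 entries.
    """
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)       # draft guessed correctly
        else:
            accepted.append(t)       # target's correction; stop here
            return accepted
    accepted.append(target_tokens[len(draft_tokens)])  # bonus token
    return accepted


# Draft guesses 3 tokens; target disagrees on the third.
print(accept_draft_tokens([1, 2, 3], [1, 2, 9, 4]))  # [1, 2, 9]
# Draft guesses all 3 correctly; target's extra token is also kept.
print(accept_draft_tokens([1, 2, 3], [1, 2, 3, 4]))  # [1, 2, 3, 4]
```

The more often the draft's tokens match, the more target-model forward passes are amortized per step, which is where the speed-up comes from.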

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@peterchen-intel
Contributor

peterchen-intel commented Apr 27, 2026

@rkazants The AngelSlim page mentions that this model has been benchmarked with vLLM, and vLLM already ships an EAGLE3 modeling file. Would it be better to adopt the implementation from vLLM?
AngelSlim/Qwen3-VL-4B-Instruct_eagle3
On AngelSlim page: Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding on vLLM (v0.12.0) across language and multimodal tasks
vLLM EAGLE3 modeling file: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama_eagle3.py
