[OpenVINO] Support Qwen3.5, Qwen3.5-MoE and Qwen3.6 by rkazants · Pull Request #1689 · huggingface/optimum-intel

rkazants · 2026-04-15T19:44:39Z

What does this PR do?

This PR re-creates PR #1634.

Fixes 181271, 181280, 182003

Installation instructions:

pip install -U git+https://github.com/rkazants/optimum-intel.git@support_qwen3_5
pip install --pre -U openvino openvino-tokenizers nncf --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
pip install transformers==5.2.0
pip install requests torchvision opencv-python

Export command line:

optimum-cli export openvino -m Qwen/Qwen3.5-0.8B Qwen3.5-0.8B

Inference script:

from transformers import AutoProcessor
from transformers.video_utils import load_video
from huggingface_hub import hf_hub_download
from optimum.intel.openvino import OVModelForVisualCausalLM

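# Hub model ID. To reuse the CLI export above, set this to the local
# "Qwen3.5-0.8B" output directory instead.
model_dir = "Qwen/Qwen3.5-0.8B"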

processor = AutoProcessor.from_pretrained(model_dir)
model = OVModelForVisualCausalLM.from_pretrained(model_dir)

# Prepare video input
video_path = hf_hub_download(
    repo_id="raushan-testing-hf/videos-test",
    filename="sample_demo_1.mp4",
    repo_type="dataset",
)
input_video, _ = load_video(video_path, num_frames=10, backend="opencv")

messages = [
    {"role": "user", "content": [
        {"type": "video"},
        {"type": "text", "text": "Why is this video funny?"},
    ]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], videos=[input_video], return_tensors="pt")

# Run inference
output_ids = model.generate(**inputs, max_new_tokens=100)
output_text = processor.decode(output_ids[0], skip_special_tokens=True)

print(output_text)
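
With decoder-only models, `generate` in transformers usually returns the prompt tokens followed by the newly generated ones, so the decoded text above may repeat the prompt. The sketch below shows one way to keep only the new tokens. It uses plain lists instead of tensors, and `strip_prompt` is a hypothetical helper, not part of optimum-intel or transformers.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens generated after the prompt for each sequence."""
    return [sequence[prompt_length:] for sequence in output_ids]


# Illustrative token ids only; real ids come from the processor and model.
prompt = [101, 7, 8, 9]
generated = [prompt + [42, 43, 2]]

new_tokens = strip_prompt(generated, len(prompt))
print(new_tokens)
```

In the script above the equivalent slice would likely be `output_ids[:, inputs["input_ids"].shape[1]:]`, assuming the OpenVINO wrapper follows the same convention.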

Before submitting

[N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
[] Did you write any new necessary tests?

Copilot

Pull request overview

This PR adds OpenVINO export + runtime support for Qwen3.5-family visual-language models by introducing new OpenVINO configs/patchers and integrating them into the model loading/export paths.

Changes:

- Add Qwen3.5/Qwen3.5-MoE model types to OpenVINO exporter utilities, stateful patching logic, and model patchers.
- Implement an OpenVINO Visual Causal LM wrapper for Qwen3.5 (including position-id handling and vision embedding utilities).
- Update docs and tests metadata to include Qwen3.5.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/openvino/utils_tests.py | Adds tiny test model id and expected INT8 node counts for `qwen3_5`. |
| optimum/intel/openvino/modeling_visual_language.py | Adds Qwen3.5 VLM runtime class + position id handling + model-type mapping. |
| optimum/intel/openvino/modeling_decoder.py | Ensures full-context attention mask behavior for Qwen3.5 text model types during decoding. |
| optimum/exporters/openvino/utils.py | Registers Qwen3.5 model types for submodel detection and SSM model handling. |
| optimum/exporters/openvino/stateful.py | Extends stateful patching to handle VLMs whose `text_config.model_type` is SSM-based. |
| optimum/exporters/openvino/model_patcher.py | Adds Qwen3.5(+MoE) patchers and patched forward implementations for export. |
| optimum/exporters/openvino/model_configs.py | Registers new OpenVINO configs + dummy input generators for Qwen3.5 model types. |
| optimum/exporters/openvino/convert.py | Adds cleanup logic to undo certain 16-bit traceability patches post-conversion. |
| docs/source/openvino/models.mdx | Adds Qwen3.5 to the supported architectures list. |
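
The "full-context attention mask" entry for `modeling_decoder.py` can be read as follows: during step-by-step decoding only the new token is fed to the model, while the attention mask keeps one entry per token seen so far. The sketch below illustrates that general convention with plain lists. It is an assumption about what the summary means, not the code from this PR, and `next_step_inputs` is a hypothetical name.

```python
def next_step_inputs(attention_mask, next_token):
    """Build the inputs for one decoding step from the previous full mask."""
    # Only the newly generated token is passed as input for this step.
    input_ids = [next_token]
    # The mask still spans the whole context: all earlier tokens plus the new one.
    full_mask = attention_mask + [1]
    return input_ids, full_mask


mask = [1, 1, 1]  # prompt of three tokens
steps = []
for token in (11, 12):
    ids, mask = next_step_inputs(mask, token)
    steps.append((ids, list(mask)))

for ids, step_mask in steps:
    print(ids, step_mask)
```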

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

rkazants · 2026-05-04T09:49:24Z

@regisss, let's get the regular tests passing first. Then I will switch on the slow tests.
This is needed to avoid the HF download rate limit error that occurs when multiple tests download in parallel.

regisss

LGTM

rkazants · 2026-05-04T12:26:00Z

+        if: ${{ matrix.test-pattern == '*seq2seq*' }}
+        run: |
+          uv pip install "transformers>=5.2,<5.3"
+          uv pip install "openvino==2026.1.0" "openvino-tokenizers==2026.1.0.0" "openvino-genai==2026.1.0.0"


Suggested change

uv pip install "openvino==2026.1.0" "openvino-tokenizers==2026.1.0.0" "openvino-genai==2026.1.0.0"

Let's use the nightly build.

This needs to be checked manually, because there was a regression in a recent nightly build.

Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>

popovaan · 2026-05-04T15:05:09Z

+            block.attn.forward = block.attn._orig_forward
+
+
+def patched_qwen3_5_moe_sparse_moe_block(self, hidden_states: torch.Tensor) -> torch.Tensor:


The MoE block looks the same as the one in LFM2:
https://github.com/huggingface/transformers/blob/v5.0.0/src/transformers/models/lfm2_moe/modeling_lfm2_moe.py#L167
https://github.com/huggingface/transformers/blob/v5.2.0/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L823

Can we reuse the LFM2 MoE patching?
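
The reuse question can be illustrated with a minimal sketch: one patched forward function is bound to two structurally identical block types, and the original is restored afterwards in the same way as `block.attn.forward = block.attn._orig_forward` in the diff. All class and function names here are hypothetical stand-ins, not code from transformers or optimum-intel.

```python
import types


class Lfm2LikeBlock:
    """Stand-in for one MoE block type."""

    def forward(self, x):
        return x + 1


class Qwen35LikeBlock:
    """Stand-in for a second MoE block type with the same structure."""

    def forward(self, x):
        return x + 1


def shared_patched_forward(self, x):
    # Hypothetical export-friendly replacement shared by both block types.
    return x + 100


class ForwardPatcher:
    """Context manager that swaps `forward` and restores it on exit."""

    def __init__(self, module, new_forward):
        self.module = module
        self.new_forward = new_forward

    def __enter__(self):
        # Keep the original bound method so it can be restored later.
        self.module._orig_forward = self.module.forward
        self.module.forward = types.MethodType(self.new_forward, self.module)
        return self.module

    def __exit__(self, exc_type, exc, tb):
        # Restore the original forward and drop the temporary attribute.
        self.module.forward = self.module._orig_forward
        del self.module._orig_forward
        return False


results = {}
for block in (Lfm2LikeBlock(), Qwen35LikeBlock()):
    with ForwardPatcher(block, shared_patched_forward):
        patched = block.forward(1)
    restored = block.forward(1)
    results[type(block).__name__] = (patched, restored)

print(results)
```

Whether the real blocks are close enough for this kind of sharing depends on the two modeling files linked above.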

echarlaix added 30 commits January 19, 2026 10:39

Transformers v5

53d19b9

fix loading for llava_next_video

5205434

Remove deprecated transformers.onnx

e8feb0c

Merge branch 'main' into transformers-v5

55e4b3d

remove deprecated transformers.onnx from tests

bb54f64

remove huggingface_hub deprecated

71aa34e

relative to absolute import

0954015

update workflow to v5

1ba9789

remove redundant

f158656

update loading given transformers version

9345143

remove deprecated AutoModelForVision2Seq

b290ae3

update workflow

a4d1dc0

style

ac953ba

update setup

8001884

deprecated is_offline_mode

5f2a007

remove incompatible neural-compressor installation

ad477fe

remove documentation reference

42e98b8

add install transformers step

4ee3f51

Merge branch 'main' into transformers-v5

64c2022

transformers v5

8204264

install diffusers from source for v5

b319d19

remove deprecated CLIPFeatureExtractor

42300e4

openvino 2025.3.0

2a76102

add ov cache classes

f38703a

merge main in branch

46144d1

openvino nightly for modeling tests

2d3c734

openvino 2025.3 for modeling tests

b6dcefd

stop moving misplaced parameters from config to generation_config

ea24727

fix transformers version for doc building

07ff06b

fix transformers version for doc building

1270db0

urakozz mentioned this pull request Apr 29, 2026

[OpenVINO] Fix beam_idx wiring in patch_stateful_hybrid_ssm for hybrid SSM/attention models #1705

Closed

3 tasks

rkazants requested review from IlyasMoutawwakil, Copilot, echarlaix and popovaan April 30, 2026 06:30

Copilot started reviewing on behalf of rkazants April 30, 2026 06:31

rkazants requested a review from regisss April 30, 2026 06:31

Copilot AI reviewed Apr 30, 2026

rkazants added 7 commits April 30, 2026 16:37

Avoid changes in convert.py

b468f9c

Merge

6ca3d08

Document Qwen3.5MOE and Qwen3.6

f456deb

Simplify config

bfc09e8

Apply code-formatting

a051f9b

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Add a comment for patcher

7c91c1e

Merge remote-tracking branch 'upstream/main' into support_qwen3_5

f7be570

regisss added the openvino-slow label (Runs OpenVINO slow tests with different versions of transformers) May 4, 2026

Add tests

6f94681

rkazants removed the openvino-slow label (Runs OpenVINO slow tests with different versions of transformers) May 4, 2026

regisss approved these changes May 4, 2026

rkazants commented May 4, 2026

Comment thread optimum/intel/openvino/modeling_visual_language.py Outdated

Apply suggestion from @rkazants

e55dbe3

rkazants commented May 4, 2026

Comment thread optimum/intel/openvino/modeling_visual_language.py Outdated

Apply suggestion from @rkazants

9ac9d79

rkazants commented May 4, 2026

Comment thread .github/workflows/test_openvino_preview_models.yml Outdated

rkazants commented May 4, 2026

Comment thread .github/workflows/test_openvino_preview_models.yml Outdated

Apply suggestions from code review

6b05ccc

Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>

popovaan reviewed May 4, 2026

rkazants wants to merge 214 commits into huggingface:main from rkazants:support_qwen3_5

8 participants
