[OpenVINO] Support Qwen3.5, Qwen3.5-MoE and Qwen3.6#1689

Open
rkazants wants to merge 214 commits into huggingface:main from rkazants:support_qwen3_5

rkazants (Collaborator) commented Apr 15, 2026

What does this PR do?

Re-created PR #1634

Fixes 181271, 181280, 182003

Installation instructions:

pip install -U git+https://github.com/rkazants/optimum-intel.git@support_qwen3_5
pip install --pre -U openvino openvino-tokenizers nncf --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
pip install transformers==5.2.0
pip install requests torchvision opencv-python

Export command:

optimum-cli export openvino -m Qwen/Qwen3.5-0.8B Qwen3.5-0.8B
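
When the export has to be triggered from a script (for example in CI), the same command line can be assembled in Python. This is only a sketch: `build_export_command` is a hypothetical helper, not part of optimum-intel, and the optional `--weight-format` argument is passed on the assumption that the installed `optimum-cli` accepts it for this model.

```python
import shlex
from typing import List, Optional


def build_export_command(
    model_id: str, output_dir: str, weight_format: Optional[str] = None
) -> List[str]:
    """Assemble the argv for `optimum-cli export openvino` (hypothetical helper)."""
    cmd = ["optimum-cli", "export", "openvino", "-m", model_id]
    if weight_format is not None:
        # Assumption: the installed optimum-cli supports --weight-format (e.g. "int8").
        cmd += ["--weight-format", weight_format]
    # The output directory is the final positional argument, as in the command above.
    cmd.append(output_dir)
    return cmd


cmd = build_export_command("Qwen/Qwen3.5-0.8B", "Qwen3.5-0.8B")
print(shlex.join(cmd))
# To actually run the export: subprocess.run(cmd, check=True)
```

Returning a list instead of a shell string keeps quoting out of the picture when the command is later handed to `subprocess.run`.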

Inference script:

from transformers import AutoProcessor
from transformers.video_utils import load_video
from huggingface_hub import hf_hub_download
from optimum.intel.openvino import OVModelForVisualCausalLM

model_dir = "Qwen/Qwen3.5-0.8B"

processor = AutoProcessor.from_pretrained(model_dir)
model = OVModelForVisualCausalLM.from_pretrained(model_dir)

# Prepare video input
video_path = hf_hub_download(
    repo_id="raushan-testing-hf/videos-test",
    filename="sample_demo_1.mp4",
    repo_type="dataset",
)
input_video, _ = load_video(video_path, num_frames=10, backend="opencv")

messages = [
    {"role": "user", "content": [
        {"type": "video"},
        {"type": "text", "text": "Why is this video funny?"},
    ]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], videos=[input_video], return_tensors="pt")

# Run inference
output_ids = model.generate(**inputs, max_new_tokens=100)
output_text = processor.decode(output_ids[0], skip_special_tokens=True)

print(output_text)
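
The script passes `num_frames=10` to `load_video`, so the model sees a fixed number of frames rather than the whole clip. As a rough illustration of what evenly spaced sampling means, here is a minimal sketch; `uniform_frame_indices` is a hypothetical helper, and the exact sampling strategy inside transformers may differ.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Pick up to `num_frames` evenly spaced frame indices (illustrative only)."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    # Never request more frames than the clip contains.
    num_frames = min(num_frames, total_frames)
    step = total_frames / num_frames
    # Take the first frame of each of the `num_frames` equal-length segments.
    return [int(i * step) for i in range(num_frames)]


print(uniform_frame_indices(total_frames=300, num_frames=10))
```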

Before submitting

  • [N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • [] Did you write any new necessary tests?

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds OpenVINO export + runtime support for Qwen3.5-family visual-language models by introducing new OpenVINO configs/patchers and integrating them into the model loading/export paths.

Changes:

  • Add Qwen3.5/Qwen3.5-MoE model types to OpenVINO exporter utilities, stateful patching logic, and model patchers.
  • Implement an OpenVINO Visual Causal LM wrapper for Qwen3.5 (including position-id handling and vision embedding utilities).
  • Update docs and tests metadata to include Qwen3.5.
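
For context on the position-id handling mentioned in the list above: for text-only batches, position ids are commonly derived from the attention mask so that padding does not shift real tokens. The sketch below shows that generic technique in plain Python. It is an assumption that the PR follows the same idea for the text part, and it does not cover the multi-dimensional positions that image and video tokens need; `position_ids_from_mask` is a hypothetical name.

```python
from typing import List


def position_ids_from_mask(attention_mask: List[List[int]]) -> List[List[int]]:
    """Derive per-token positions from a 0/1 attention mask (text-only sketch)."""
    position_ids = []
    for row in attention_mask:
        next_position = 0
        ids = []
        for is_real_token in row:
            if is_real_token:
                # Real tokens are numbered 0, 1, 2, ... in order of appearance.
                ids.append(next_position)
                next_position += 1
            else:
                # Padding gets a dummy position; it is masked out anyway.
                ids.append(1)
        position_ids.append(ids)
    return position_ids


# First row is left-padded by two tokens, second row has no padding.
print(position_ids_from_mask([[0, 0, 1, 1, 1], [1, 1, 1, 1, 1]]))
```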

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/openvino/utils_tests.py | Adds tiny test model id and expected INT8 node counts for qwen3_5. |
| optimum/intel/openvino/modeling_visual_language.py | Adds Qwen3.5 VLM runtime class + position id handling + model-type mapping. |
| optimum/intel/openvino/modeling_decoder.py | Ensures full-context attention mask behavior for Qwen3.5 text model types during decoding. |
| optimum/exporters/openvino/utils.py | Registers Qwen3.5 model types for submodel detection and SSM model handling. |
| optimum/exporters/openvino/stateful.py | Extends stateful patching to handle VLMs whose text_config.model_type is SSM-based. |
| optimum/exporters/openvino/model_patcher.py | Adds Qwen3.5(+MoE) patchers and patched forward implementations for export. |
| optimum/exporters/openvino/model_configs.py | Registers new OpenVINO configs + dummy input generators for Qwen3.5 model types. |
| optimum/exporters/openvino/convert.py | Adds cleanup logic to undo certain 16-bit traceability patches post-conversion. |
| docs/source/openvino/models.mdx | Adds Qwen3.5 to the supported architectures list. |
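
The `stateful.py` entry describes checking the nested `text_config.model_type` of a VLM rather than only the top-level model type. A minimal sketch of that kind of lookup is shown below. The names are assumptions for illustration: `SSM_MODEL_TYPES` and `is_ssm_based` are hypothetical, and the model-type strings are placeholders rather than the values registered in the PR.

```python
from types import SimpleNamespace

# Hypothetical registry; the real list and its exact strings live in the
# exporter utilities and are not reproduced here.
SSM_MODEL_TYPES = {"mamba", "qwen3_5_text", "qwen3_5_moe_text"}


def is_ssm_based(config) -> bool:
    """Return True if the model, or its nested text model, is SSM-based."""
    if getattr(config, "model_type", None) in SSM_MODEL_TYPES:
        return True
    # VLM configs wrap the language model config in `text_config`.
    text_config = getattr(config, "text_config", None)
    return getattr(text_config, "model_type", None) in SSM_MODEL_TYPES


# Stand-ins for real config objects.
vlm_config = SimpleNamespace(
    model_type="qwen3_5",
    text_config=SimpleNamespace(model_type="qwen3_5_text"),
)
plain_config = SimpleNamespace(model_type="llama")

print(is_ssm_based(vlm_config), is_ssm_based(plain_config))
```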

Comment thread optimum/intel/openvino/modeling_visual_language.py Outdated
Comment thread optimum/exporters/openvino/model_configs.py Outdated
Comment thread optimum/exporters/openvino/model_patcher.py
Comment thread docs/source/openvino/models.mdx
Comment thread optimum/intel/openvino/modeling_visual_language.py
regisss added the openvino-slow label (runs OpenVINO slow tests with different versions of transformers) on May 4, 2026
rkazants removed the openvino-slow label on May 4, 2026
rkazants (Collaborator, Author) commented May 4, 2026

@regisss, let us get the regular tests passing first. Then I will switch on the slow tests.
This is needed to avoid hitting the HF download rate limit when multiple tests download in parallel.

regisss (Contributor) left a comment

LGTM

if: ${{ matrix.test-pattern == '*seq2seq*' }}
run: |
  uv pip install "transformers>=5.2,<5.3"
  uv pip install "openvino==2026.1.0" "openvino-tokenizers==2026.1.0.0" "openvino-genai==2026.1.0.0"

Collaborator Author

Suggested change
uv pip install "openvino==2026.1.0" "openvino-tokenizers==2026.1.0.0" "openvino-genai==2026.1.0.0"

Let us use the nightly build.

Collaborator Author

This needs to be checked manually, because there was a regression in a recent nightly build.

Comment thread optimum/intel/openvino/modeling_visual_language.py Outdated
Comment thread optimum/intel/openvino/modeling_visual_language.py Outdated
Comment thread .github/workflows/test_openvino_preview_models.yml Outdated
Comment thread .github/workflows/test_openvino_preview_models.yml Outdated
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
block.attn.forward = block.attn._orig_forward


def patched_qwen3_5_moe_sparse_moe_block(self, hidden_states: torch.Tensor) -> torch.Tensor:

Collaborator

8 participants