[OpenVINO] Support Qwen3.5, Qwen3.5-MoE and Qwen3.6#1689
rkazants wants to merge 214 commits into huggingface:main
Conversation
Pull request overview
This PR adds OpenVINO export + runtime support for Qwen3.5-family visual-language models by introducing new OpenVINO configs/patchers and integrating them into the model loading/export paths.
Changes:
- Add Qwen3.5/Qwen3.5-MoE model types to OpenVINO exporter utilities, stateful patching logic, and model patchers.
- Implement an OpenVINO Visual Causal LM wrapper for Qwen3.5 (including position-id handling and vision embedding utilities).
- Update docs and tests metadata to include Qwen3.5.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/openvino/utils_tests.py | Adds tiny test model id and expected INT8 node counts for qwen3_5. |
| optimum/intel/openvino/modeling_visual_language.py | Adds Qwen3.5 VLM runtime class + position id handling + model-type mapping. |
| optimum/intel/openvino/modeling_decoder.py | Ensures full-context attention mask behavior for Qwen3.5 text model types during decoding. |
| optimum/exporters/openvino/utils.py | Registers Qwen3.5 model types for submodel detection and SSM model handling. |
| optimum/exporters/openvino/stateful.py | Extends stateful patching to handle VLMs whose text_config.model_type is SSM-based. |
| optimum/exporters/openvino/model_patcher.py | Adds Qwen3.5(+MoE) patchers and patched forward implementations for export. |
| optimum/exporters/openvino/model_configs.py | Registers new OpenVINO configs + dummy input generators for Qwen3.5 model types. |
| optimum/exporters/openvino/convert.py | Adds cleanup logic to undo certain 16-bit traceability patches post-conversion. |
| docs/source/openvino/models.mdx | Adds Qwen3.5 to the supported architectures list. |
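The `modeling_decoder.py` row above concerns growing the attention mask over the full context during decoding. As a rough sketch of that behavior (generic numpy code with illustrative names, not the PR's actual implementation):

```python
import numpy as np

def extend_attention_mask(prompt_mask: np.ndarray, new_tokens: int) -> np.ndarray:
    """Grow an attention mask to cover the full sequence during decoding.

    Hybrid SSM/attention decoders need the mask over the entire context
    (prompt plus generated tokens), not just the newly generated token.
    `prompt_mask` has shape (batch, prompt_len), with 0 marking padding.
    """
    batch = prompt_mask.shape[0]
    ones = np.ones((batch, new_tokens), dtype=prompt_mask.dtype)
    return np.concatenate([prompt_mask, ones], axis=-1)

mask = np.array([[0, 1, 1], [1, 1, 1]], dtype=np.int64)  # left-padded batch
full = extend_attention_mask(mask, 2)
print(full.shape)  # (2, 5)
```

Padding positions from the original prompt stay masked out; only the generated positions are appended as ones.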
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
@regisss, let's make the regular tests pass first. Then I will switch on the slow tests.
if: ${{ matrix.test-pattern == '*seq2seq*' }}
run: |
  uv pip install "transformers>=5.2,<5.3"
  uv pip install "openvino==2026.1.0" "openvino-tokenizers==2026.1.0.0" "openvino-genai==2026.1.0.0"
uv pip install "openvino==2026.1.0" "openvino-tokenizers==2026.1.0.0" "openvino-genai==2026.1.0.0"
let us use nightly build
We need to check manually, because there was a regression in a recent nightly build.
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
block.attn.forward = block.attn._orig_forward

def patched_qwen3_5_moe_sparse_moe_block(self, hidden_states: torch.Tensor) -> torch.Tensor:
The MoE block looks the same as LFM2's:
https://github.com/huggingface/transformers/blob/v5.0.0/src/transformers/models/lfm2_moe/modeling_lfm2_moe.py#L167
https://github.com/huggingface/transformers/blob/v5.2.0/src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py#L823
Can we reuse the LFM2 MoE patching?
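For context, both blocks implement the same top-k sparse MoE dispatch. Below is a generic numpy sketch of that routine; the function name, shapes, and linear-only experts are illustrative and do not reproduce the transformers code:

```python
import numpy as np

def sparse_moe_forward(hidden, gate_w, expert_ws, top_k=2):
    """Generic top-k sparse MoE dispatch (illustrative sketch).

    hidden:    (tokens, dim) token representations
    gate_w:    (dim, n_experts) router weights
    expert_ws: list of (dim, dim) per-expert weight matrices
    """
    logits = hidden @ gate_w
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                  # softmax over experts
    topk_idx = np.argsort(probs, axis=-1)[:, -top_k:]      # chosen experts per token
    topk_w = np.take_along_axis(probs, topk_idx, axis=-1)
    topk_w /= topk_w.sum(-1, keepdims=True)                # renormalize kept weights
    out = np.zeros_like(hidden)
    for e, w in enumerate(expert_ws):                      # dispatch per expert
        tok, slot = np.nonzero(topk_idx == e)
        if tok.size:
            out[tok] += topk_w[tok, slot][:, None] * (hidden[tok] @ w)
    return out
```

With identity experts the renormalized weights sum to 1 per token, so the output equals the input, which is a handy sanity check when comparing a patched block against the original forward.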
What does this PR do?
Re-created PR #1634
Fixes 181271, 181280, 182003
Installation instructions:
Export command line:
optimum-cli export openvino -m Qwen/Qwen3.5-0.8B Qwen3.5-0.8B
Inference script:
Before submitting