[Bug] Unable to load Qwen3.5 Models to GPU

_Continuing discussion from #1689_ 

<details>
  <summary>Original Post</summary>

> I've been attempting to test this locally. However, there is an issue with the exported models and/or the OpenVINO implementation for Qwen.
> 
> I've exported copies of `qwen3.5-9b` along with the 27B and 35B-A3B versions of both Qwen3.5 and 3.6. These were all exported using the command `optimum-cli export openvino -m qwen/qwen3.5-XXX --weight-format int4 /models/qwen3.5-xxxx-int4`.
> 
> The 9B appears to work fine (apart from enabling/disabling thinking, but that's a different issue). The other two, however, are causing major problems.
> 
> First, neither of them will load to GPU. When I attempt to load `qwen3.6-27b`, I receive this error:
> 
> ```
> Failed to initialize VLMPipeline: Exception from src/inference/src/cpp/core.cpp:117:
> Exception from src/inference/src/dev/plugin.cpp:54:
> Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
> [GPU] ProgramBuilder build failed!
> Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
> [GPU] clWaitForEvents, error code: -14 CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
> Traceback (most recent call last):
>   File "/app/src/engine/ov_genai/vlm.py", line 278, in load_model
>     self.model_path = VLMPipeline(
>                       ^^^^^^^^^^^^
> RuntimeError: Exception from src/inference/src/cpp/core.cpp:117:
> Exception from src/inference/src/dev/plugin.cpp:54:
> Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
> [GPU] ProgramBuilder build failed!
> Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
> [GPU] clWaitForEvents, error code: -14 CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
> ```
> 
> When I attempt to do the same with `qwen3.6-35b-a3b`, I receive an error that originates from the same call but is slightly different:
> 
> ```
> Failed to initialize VLMPipeline: Exception from src/inference/src/cpp/core.cpp:117:
> Exception from src/inference/src/dev/plugin.cpp:54:
> Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
> [GPU] ProgramBuilder build failed!
> Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
> [GPU] clWaitForEvents, error code: -58 CL_INVALID_EVENT
> Traceback (most recent call last):
>   File "/app/src/engine/ov_genai/vlm.py", line 278, in load_model
>     self.model_path = VLMPipeline(
>                       ^^^^^^^^^^^^
> RuntimeError: Exception from src/inference/src/cpp/core.cpp:117:
> Exception from src/inference/src/dev/plugin.cpp:54:
> Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
> [GPU] ProgramBuilder build failed!
> Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
> [GPU] clWaitForEvents, error code: -58 CL_INVALID_EVENT
> ```
> 
> Secondly, the models are generating gibberish. I'm able to load them on my CPU, but the responses don't make any sense:
> 
> ```
> Prompt: Tell me a joke
> Response: <>D*!<*M@0IM#6/8=IQH4&Q.I"JG170O)JP$CLL8QH4Q&HB;2@>C&PF&;5Q=K$2%+.'?1!C!R!IPN(%E!MN(7G8B0B3+E'FA!)1.OM@,P;0+@3,0>-5QQ-@C1;B0A'$KP)BJ?7@JB788LQ:8/!%2?O%#8'E#9;(A65NG+(..L=:&N7".7"=A?B'0D*#=KI><O3J)?CB.=D2#8M-F.I>6.@R6P(%%8(;#45IB6->A68(>20:&PR0IM!!Q?QADB##I!FML';#7.E><.O0801MR6C)7M,6=4&D%;7NEDQ,*CNLO3:!3'*L45.'5OG,()%M2/L)8?*R?EP2F31&9/1K3?3HQRLEP"!D>OH3:@/.(R,"=!B&L>F28A<I++RK+P.2%98R(-7//M9A6(8)!3<O*GFBQ%B&!6D,Q+%BN='ORF61NI0>1J:@?LA=MH6KBP%(9+HPR26-A+P:QOD+CM2)0Q+=K>4LF;B:='3DDIQ.*K+C39'E2PO$FF7N+7F<2>/8+##-B;G?<"?BJH,#D>KG:9>8HL!1+8%RLP9PDH9@DL9#**K!?8ELCH,,QF9?@=)5'MJ#K5:'R,75L.8E8HN5$E1I7A5$R"'1/P2??&AC6G−7BB4−MMF<JL)27=− 
> ′
>  >/QA9=(−3)G>?=6) 
> ′
>  K%"/DH9GQ:6A9.5P30QOHM<:C&C(0=&'1ICPCK'N.<7%N67%42LI7NM(Q(>1DFO@2-$3?HAPR%:>P<%7P@.LJCN,47C7@@IPG"M%-2FOAP;E%EI?&K5&"(7!P7L7%/17RR!JO&?:M0G<-OR>-')?9;,!-8),/5A3J@490,)GAH4
> ```
</details>

Summary:

I've been attempting to test Qwen3.5/3.6 models in OpenVINO. However, I've been running into some issues. I used the command `optimum-cli export openvino -m qwen/qwen3.X-XXX --weight-format int4 /models/qwen3.X-XXX-int4`. 

I tried three different models:
* Qwen3.5-9B 
* Qwen3.6-27B
* Qwen3.6-35B-A3B

The 9B works fine for me, apart from enabling/disabling thinking, which is a separate issue. After updating to the latest OpenVINO nightly build, 27B and 35B-A3B also work fine when loaded on CPU. However, loading them on GPU fails with an error that differs between the two models.

Test scripts used:

**OpenVINO**

```python
import random

from openvino_genai import GenerationConfig, VLMPipeline
from transformers import AutoTokenizer

model_dir = "/models/qwen3.6-27b-int4"

# Compiling the pipeline on GPU is where the failure occurs
pipe = VLMPipeline(model_dir, device="GPU", ATTENTION_BACKEND="SDPA")

# Randomized sampling settings
config = GenerationConfig()
config.top_k = random.randint(10, 100)
config.top_p = round(random.random(), 2)
config.temperature = 1.5
config.max_new_tokens = 15
config.do_sample = True

prompt = "Generate a random number between 100,000 and 100,000,000,000."
tokenizer = AutoTokenizer.from_pretrained(model_dir)
messages = [{"role": "user", "content": prompt}]
# Render the chat template to a plain string with thinking disabled
text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
    enable_thinking=False,
)
for _ in range(3):
    print("---")
    result = pipe.generate(text, generation_config=config)
    print(result.texts[0])
```
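One detail worth ruling out in the GenAI script: `round(random.random(), 2)` returns `0.0` whenever the draw is below 0.005, which is not a meaningful `top_p` for nucleus sampling. A minimal sketch that keeps the randomized values in a conventional range, assuming `top_p` should be strictly positive, and that uses a seeded `random.Random` so a failing run can be replayed. The helper name `draw_sampling_params` is hypothetical.

```python
import random


def draw_sampling_params(seed: int) -> dict:
    """Hypothetical helper: draw the randomized sampling settings reproducibly.

    Assumes top_p must be strictly positive, so it is drawn from [0.05, 1.0]
    instead of round(random.random(), 2), which can produce 0.0.
    """
    rng = random.Random(seed)  # seeded so the same settings can be replayed
    return {
        "top_k": rng.randint(10, 100),
        "top_p": round(rng.uniform(0.05, 1.0), 2),
        "temperature": 1.5,
        "max_new_tokens": 15,
        "do_sample": True,
    }


if __name__ == "__main__":
    # Keys mirror the GenerationConfig attributes set in the script above,
    # e.g. config.top_k = params["top_k"]
    params = draw_sampling_params(seed=1234)
    print(params)
```

Logging the seed next to each failing generation makes it possible to rerun with identical sampling settings.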

**Optimum**

```python
from optimum.intel import OVModelForVisualCausalLM
from transformers import AutoTokenizer, pipeline

model_id = "/models/qwen3.6-35B-A3B-int4"
model = OVModelForVisualCausalLM.from_pretrained(model_id, device="GPU", ATTENTION_BACKEND="SDPA")
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

results = pipe("What is the capital of France?", max_new_tokens=10, do_sample=True, top_k=50, top_p=0.95, temperature=1.5)
```
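Both scripts assume the export directory is complete. OpenVINO IR models are stored as an `.xml` topology file plus a `.bin` weights file with the same stem, so listing those pairs is a cheap sanity check before digging into the GPU plugin. A minimal stdlib sketch; `list_ir_pairs` and the script name `check_ir.py` are hypothetical, and it deliberately makes no assumption about which `openvino_*_model.xml` files a Qwen3.5/3.6 export should contain.

```python
import sys
from pathlib import Path


def list_ir_pairs(model_dir: str) -> dict[str, bool]:
    """Map each IR topology file (*.xml) in model_dir to whether the
    weights file with the same stem (*.bin) is also present."""
    root = Path(model_dir)
    return {
        xml.name: xml.with_suffix(".bin").is_file()
        for xml in sorted(root.glob("*.xml"))
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_ir.py /models/qwen3.6-27b-int4
    for name, has_bin in list_ir_pairs(sys.argv[1]).items():
        print(f"{name}: {'ok' if has_bin else 'MISSING .bin'}")
```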

The error messages are below. I have only included the Optimum traces, since Optimum produces the same errors with more detail.

**Qwen3.6-27B:**

```
Traceback (most recent call last):
  File "/models/test_vlm_optimum.py", line 5, in <module>
    model = OVModelForVisualCausalLM.from_pretrained(model_id, device="GPU", ATTENTION_BACKEND="SDPA")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 617, in from_pretrained
    return super().from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/modeling_base.py", line 407, in from_pretrained
    return from_pretrained_method(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_visual_language.py", line 592, in _from_pretrained
    model = model_cls(
            ^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_visual_language.py", line 4984, in __init__
    super().__init__(
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_visual_language.py", line 428, in __init__
    self.language_model = OVModelWithEmbedForCausalLM(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_visual_language.py", line 96, in __init__
    super().__init__(
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_decoder.py", line 214, in __init__
    self.compile()
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_visual_language.py", line 114, in compile
    super().compile()
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_decoder.py", line 439, in compile
    super().compile()
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 914, in compile
    self.request = self._compile_model(self.model, self._device, ov_config, self.model_save_dir)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 420, in _compile_model
    compiled_model = core.compile_model(model, device.upper() if device is not None else device, config=ov_config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/openvino/_ov_api.py", line 646, in compile_model
    super().compile_model(model, device_name, {} if config is None else config),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:117:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:168:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_memory.cpp:569:
[GPU] clWaitForEvents, error code: -14 CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
```
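The 27B trace bottoms out in OpenCL status -14 (`CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST`), while the 35B-A3B run in the original post ended in -58 (`CL_INVALID_EVENT`); both come from `clWaitForEvents` during `ProgramBuilder`. Per the OpenCL specification, -14 means one of the awaited events finished with an error status, so the root failure likely happened in an earlier kernel or transfer rather than in the wait itself. When comparing runs across driver or nightly updates, a small parser that pulls out just that innermost status can make the signatures easier to diff. A minimal sketch using `re`, assuming the `[GPU] <call>, error code: <n> <NAME>` line format seen in these logs; `cl_errors` is a hypothetical helper.

```python
import re

# Matches lines such as:
# [GPU] clWaitForEvents, error code: -14 CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
CL_ERROR_RE = re.compile(
    r"\[GPU\]\s+(?P<call>\w+),\s+error code:\s+(?P<code>-?\d+)\s+(?P<name>CL_\w+)"
)


def cl_errors(log: str) -> list[tuple[str, int, str]]:
    """Return (OpenCL call, status code, status name) for each GPU runtime
    error in the log, de-duplicated in order (the exception chain repeats)."""
    seen: list[tuple[str, int, str]] = []
    for m in CL_ERROR_RE.finditer(log):
        item = (m["call"], int(m["code"]), m["name"])
        if item not in seen:
            seen.append(item)
    return seen
```

Running it over the saved output of each attempt yields one compact tuple per distinct OpenCL failure.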

**Qwen3.6-35B-A3B:**

```
Traceback (most recent call last):
  File "/models/test_vlm_optimum.py", line 5, in <module>
    model = OVModelForVisualCausalLM.from_pretrained(model_id, device="GPU", ATTENTION_BACKEND="SDPA")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 617, in from_pretrained
    return super().from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/modeling_base.py", line 407, in from_pretrained
    return from_pretrained_method(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_visual_language.py", line 592, in _from_pretrained
    model = model_cls(
            ^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_visual_language.py", line 4984, in __init__
    super().__init__(
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_visual_language.py", line 428, in __init__
    self.language_model = OVModelWithEmbedForCausalLM(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_visual_language.py", line 96, in __init__
    super().__init__(
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_decoder.py", line 214, in __init__
    self.compile()
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_visual_language.py", line 114, in compile
    super().compile()
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_decoder.py", line 439, in compile
    super().compile()
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 914, in compile
    self.request = self._compile_model(self.model, self._device, ov_config, self.model_save_dir)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 420, in _compile_model
    compiled_model = core.compile_model(model, device.upper() if device is not None else device, config=ov_config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/openvino/_ov_api.py", line 646, in compile_model
    super().compile_model(model, device_name, {} if config is None else config),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:117:
Exception from src/inference/src/dev/plugin.cpp:54:
Exception from src/plugins/intel_gpu/src/plugin/program_builder.cpp:268:
Input moecompressed:__module.model.model.language_model.layers.0.mlp/aten::sum/ReduceSum_Reshape/MOECompressed hasn't been found in primitive_ids map
```
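Unlike the 27B case, this 35B-A3B failure is a graph-construction error rather than an OpenCL runtime error: the GPU plugin could not find the `moecompressed` primitive for the MLP of layer 0. When checking whether a fresh export changes the failure, extracting the primitive type and layer index gives a compact signature. A minimal sketch, assuming the `Input <type>:<name> hasn't been found in primitive_ids map` format from this log; `missing_primitive` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches: Input <type>:<name> hasn't been found in primitive_ids map
MISSING_INPUT_RE = re.compile(r"Input (?P<pid>\S+) hasn't been found in primitive_ids map")
# Matches the decoder layer index inside a node name, e.g. ".layers.0."
LAYER_RE = re.compile(r"\.layers\.(?P<idx>\d+)\.")


def missing_primitive(log: str) -> Optional[dict]:
    """Split the missing primitive id into its type prefix, node name and
    layer index; returns None if the log has no such message."""
    match = MISSING_INPUT_RE.search(log)
    if match is None:
        return None
    # The primitive id is "<type>:<node name>"; split on the first colon only
    prim_type, _, name = match["pid"].partition(":")
    layer = LAYER_RE.search(name)
    return {
        "type": prim_type,
        "layer": int(layer["idx"]) if layer else None,
        "name": name,
    }
```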

Note: Before updating, I received a different error for Qwen3.6-35B-A3B. I assume the change comes from the new MoE implementation. Since this model was exported before yesterday's merge, I will re-export it later and check whether the error changes.
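Since a re-export is planned, scripting it keeps the exact command on record for the follow-up. A minimal sketch that only rebuilds the `optimum-cli export openvino` invocation already used in this issue (`-m`, `--weight-format int4`, output directory); `export_command` and `export` are hypothetical helpers, the model ID stays as the issue's `qwen/qwen3.X-XXX` placeholder, and the command is only printed unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def export_command(model_id: str, out_dir: str, weight_format: str = "int4") -> list[str]:
    """Build the optimum-cli argv used in this issue for a single model."""
    return [
        "optimum-cli", "export", "openvino",
        "-m", model_id,
        "--weight-format", weight_format,
        out_dir,
    ]


def export(model_id: str, out_dir: str, dry_run: bool = True) -> list[str]:
    """Print the export command and, unless dry_run, execute it."""
    cmd = export_command(model_id, out_dir)
    print(shlex.join(cmd))  # record the exact command for the issue report
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Placeholder IDs as written in this issue; substitute the real model ID
    export("qwen/qwen3.X-XXX", "/models/qwen3.X-XXX-int4")
```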

System information:

* **CPU**: AMD Ryzen 5600X
* **GPU**: Intel Arc B70 Pro
* **OS**: Ubuntu 24.04, Linux Kernel 6.17.0-22-generic

System packages:
* **libze-intel-gpu1**: 26.09.37435.12-1-24.04-ppa1
* **libze1**: 1.28.0-1.24.04-ppa1
* **intel-opencl-icd**: 26.09.37435.12-1-24.04-ppa1

Python packages:
* **optimum**: 2.1.0.dev0 
* **optimum-intel**: 1.27.0.dev0+8ec3275
* **optimum-onnx**: 0.1.0.dev0
* **openvino**: 2026.2.0 (built from c4b175d)
* **openvino-genai**: 2026.2.0.0 (built from yatarkan:yt/qwen3_5 @ 7c9eb93)
* **openvino-tokenizers**: 2026.2.0.0 (built from yatarkan:yt/qwen3_5 @ 7c9eb93)
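For follow-ups after re-exporting or updating nightlies, the Python package list can be regenerated rather than copied by hand. A minimal sketch using the standard-library `importlib.metadata`; it reports installed distribution versions only, so the source-build details noted above (e.g. `c4b175d`, `yatarkan:yt/qwen3_5 @ 7c9eb93`) still have to be added manually, and the system packages need to be queried separately (for example with `dpkg -l`). The helper name `collect_versions` is hypothetical.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in this issue
PACKAGES = (
    "optimum",
    "optimum-intel",
    "optimum-onnx",
    "openvino",
    "openvino-genai",
    "openvino-tokenizers",
)


def collect_versions(packages=PACKAGES) -> dict:
    """Return interpreter/platform info plus the installed version of each
    distribution, or 'not installed' if it cannot be found."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    # Print in the same bullet style used in this issue
    for key, value in collect_versions().items():
        print(f"* **{key}**: {value}")
```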

[Bug] Unable to load Qwen3.5 Models to GPU #1720
