feat: add Intel XPU transformers support by zy6p · Pull Request #4801 · opendatalab/MinerU

zy6p · 2026-04-16T06:31:59Z

Summary

This PR adds a minimal Intel XPU path for the transformers backend on Linux.

Changes:

detect xpu in get_device() when torch.xpu.is_available()
prefer transformers over vllm on Linux when the selected device is xpu
load Qwen2VLForConditionalGeneration on CPU first and then move it to xpu

Why

On Intel Arc A750 with the current oneAPI / torch xpu stack, the default Linux auto-engine path selects vllm, but the end-to-end MinerU VLM service is not stable through that route. The transformers backend can work, but it needs two adjustments:

MinerU must recognize xpu as a device type.
Qwen2VL should avoid the standard device_map={"": device} loading path on XPU and instead load on CPU first, then call .to("xpu").
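A minimal sketch of the second adjustment, assuming a Hugging Face style class that exposes `from_pretrained` and `.to()`. `load_model` and `FakeModel` are hypothetical names used for illustration; in the real code path `model_cls` would be `Qwen2VLForConditionalGeneration`.

```python
def load_model(model_cls, model_path: str, device: str):
    """Load a model, using a CPU-first path when the target device is XPU."""
    if device.startswith("xpu"):
        # No device_map is passed, so the weights are loaded on CPU first
        # and only then moved to the XPU.
        model = model_cls.from_pretrained(model_path)
        return model.to(device)
    # Other devices keep the direct device_map loading path.
    return model_cls.from_pretrained(model_path, device_map={"": device})


class FakeModel:
    """Stand-in used only to show the control flow without real weights."""

    def __init__(self, device: str):
        self.device = device
        self.moved = False

    @classmethod
    def from_pretrained(cls, path: str, device_map=None):
        return cls(device_map[""] if device_map else "cpu")

    def to(self, device: str):
        self.device = device
        self.moved = True
        return self
```

The stub makes the difference visible: on `xpu` the model is created on CPU and then moved, while on other devices it is placed directly through `device_map`.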

Validation

Validated on a private Intel Arc A750 deployment:

torch.xpu.is_available() == True
MinerU API service can parse PDF -> Markdown through the transformers backend on XPU
basic syntax check: python3 -m py_compile on the touched files
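For reference, the syntax check in the last item can also be driven from Python with the standard-library `py_compile` module. The sketch below is only an illustration: `syntax_check` and `demo` are hypothetical helpers, and the demo files are created in a temporary directory rather than being the files touched by this PR.

```python
import py_compile
import tempfile
from pathlib import Path


def syntax_check(paths):
    """Map each file name to None if it compiles, else to the error message."""
    results = {}
    for path in paths:
        try:
            # doraise=True turns compile failures into PyCompileError.
            py_compile.compile(str(path), doraise=True)
            results[Path(path).name] = None
        except py_compile.PyCompileError as exc:
            results[Path(path).name] = exc.msg
    return results


def demo():
    """Check one valid and one deliberately broken file in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.py"
        bad = Path(tmp) / "bad.py"
        good.write_text("x = 1\n")
        bad.write_text("def broken(:\n")
        return syntax_check([good, bad])
```

Results are keyed by file name for brevity, so files with the same name in different directories would collide.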

Scope

This PR intentionally stays small and does not add new deployment docs or packaging changes.

github-actions · 2026-04-16T06:32:12Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

zy6p · 2026-04-16T06:41:59Z

I have read the CLA Document and I hereby sign the CLA

myhloli · 2026-04-16T11:11:02Z

vLLM provides Docker images that support XPU. Why is it still necessary to explicitly select the transformers backend?

zy6p · 2026-04-17T03:23:13Z

Thanks. Here is the concrete reason I switched to the conservative settings, with the actual logs inline.

The short version is:

I did not switch away from vLLM on Intel XPU based on preference.
I switched because the actual MinerU + Qwen2VL multimodal startup path on Intel Arc A750 was not stable in my testing, even after isolating the failure surface with more conservative settings.
transformers + xpu did work on the same machine for the same MinerU model path.

Why I changed those parameters

I changed them as a failure-isolation sequence, not as random tuning:

mm_encoder_attn_backend=TORCH_SDPA
- to bypass the default XPU FLASH_ATTN path for the visual encoder
mm_encoder_attn_backend=TRITON_ATTN
- to test another supported non-default visual-attention backend
enforce_eager=True
- to remove compile / graph-capture variables
skip_mm_profiling=True
- to remove multimodal profiling variables

This was based on the XPU ViT attention backend logic in the vLLM image, which supports:

@classmethod
def get_supported_vit_attn_backends(cls):
    return [
        AttentionBackendEnum.FLASH_ATTN,
        AttentionBackendEnum.TRITON_ATTN,
        AttentionBackendEnum.TORCH_SDPA,
    ]

and defaults to FLASH_ATTN if nothing is specified.
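To make the isolation sequence concrete, the sketch below models the three runs and the visual-encoder backend each one resolves to. It is not vLLM code: `VitBackend`, `resolve_backend`, and `ISOLATION_RUNS` are hypothetical names, and the only behavior assumed is what is stated above (three supported backends, FLASH_ATTN when nothing is specified).

```python
from enum import Enum


class VitBackend(Enum):
    """Illustrative mirror of the three backend names listed above."""

    FLASH_ATTN = "FLASH_ATTN"
    TRITON_ATTN = "TRITON_ATTN"
    TORCH_SDPA = "TORCH_SDPA"


DEFAULT_BACKEND = VitBackend.FLASH_ATTN


def resolve_backend(overrides: dict) -> VitBackend:
    """Pick the requested visual-encoder backend, else use the default."""
    name = overrides.get("mm_encoder_attn_backend")
    if name is None:
        return DEFAULT_BACKEND
    return VitBackend[name]  # raises KeyError for a name outside the enum


# The three runs, from least to most conservative.
ISOLATION_RUNS = [
    {},
    {"mm_encoder_attn_backend": "TORCH_SDPA"},
    {
        "mm_encoder_attn_backend": "TRITON_ATTN",
        "enforce_eager": True,
        "skip_mm_profiling": True,
    },
]
```

Each later run changes one group of variables relative to the previous one, which is what makes the sequence useful for narrowing down the failure.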

What I observed on the real machine

The Intel GPU was actually visible and usable from the container. This was not a “device not passed through” problem:

[mineru-vllm] visible SYCL devices:
INFO: Output filtered by ONEAPI_DEVICE_SELECTOR environment variable, which is set to level_zero:gpu.

[level_zero:gpu] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) A750 Graphics 12.55.8 [1.14.36300+8]
[mineru-vllm] torch.xpu.is_available = True
[mineru-vllm] torch.xpu.device_count = 1
[mineru-vllm] device 0: name='Intel(R) Arc(TM) A750 Graphics' total_memory=8096681984
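A probe along these lines could produce the `torch.xpu` lines shown above. This is a sketch, not the script that was actually run: `describe_xpu` and `fake_torch` are hypothetical names, and the torch module is passed in so the function can be tried with a stub on machines without an Intel GPU.

```python
from types import SimpleNamespace


def describe_xpu(torch_module) -> list:
    """Return human-readable lines describing the visible XPU devices."""
    xpu = getattr(torch_module, "xpu", None)
    available = bool(xpu is not None and xpu.is_available())
    lines = [f"torch.xpu.is_available = {available}"]
    if not available:
        return lines
    count = xpu.device_count()
    lines.append(f"torch.xpu.device_count = {count}")
    for index in range(count):
        props = xpu.get_device_properties(index)
        lines.append(
            f"device {index}: name={props.name!r} total_memory={props.total_memory}"
        )
    return lines


def fake_torch(name: str, total_memory: int):
    """Stub exposing just the torch.xpu calls used by describe_xpu."""
    props = SimpleNamespace(name=name, total_memory=total_memory)
    return SimpleNamespace(
        xpu=SimpleNamespace(
            is_available=lambda: True,
            device_count=lambda: 1,
            get_device_properties=lambda index: props,
        )
    )
```

With real PyTorch, `describe_xpu(torch)` would be called inside the container after `import torch`.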

On the default run, vLLM selected FLASH_ATTN for the multimodal visual encoder path, loaded the model, and then failed during MM encoder startup:

(APIServer pid=1) INFO 04-15 15:54:13 [__init__.py:254] Automatically detected platform xpu.
(APIServer pid=1) INFO 04-15 15:54:14 [api_server.py:962] vLLM API server version 0.1.dev14456+gde3f7fe65
...
(EngineCore_DP0 pid=121) INFO 04-15 15:54:20 [loader.py:489] Starting to load model /data/models/mineru-vl...
(EngineCore_DP0 pid=121) INFO 04-15 15:54:20 [xpu.py:114] Using backend AttentionBackendEnum.FLASH_ATTN for vit attention
(EngineCore_DP0 pid=121) INFO 04-15 15:54:20 [mm_encoder_attention.py:215] Using AttentionBackendEnum.FLASH_ATTN for MMEncoderAttention.
...
(EngineCore_DP0 pid=121) INFO 04-15 15:54:24 [loader.py:542] Model loading took 2.16 GiB memory and 3.400012 seconds
(EngineCore_DP0 pid=121) INFO 04-15 15:54:24 [cache_utils.py:513] Encoder cache will be initialized with a budget of 8192 tokens, and profiled with 1 video items of the maximum feature size.
(EngineCore_DP0 pid=121) ERROR 04-15 15:54:25 [core.py:494] EngineCore failed to start.
...
(EngineCore_DP0 pid=121) ERROR 04-15 15:54:25 [core.py:494]   File "/opt/venv/lib/python3.12/site-packages/vllm/model_executor/layers/attention/mm_encoder_attention.py", line 423, in forward_xpu
(EngineCore_DP0 pid=121) ERROR 04-15 15:54:25 [core.py:494]     return self._forward_fa(query, key, value, cu_seqlens, max_seqlen)

In the same run, my diagnosis note recorded that this corresponded to the XPU FlashAttention kernel failure:

Only XE2 cutlass kernel is supported currently.

To be explicit: this line came from the same failing run, but the stderr capture in the session log was truncated, so I no longer have the complete raw block around it.

I then switched to TORCH_SDPA specifically to get off the default flash path. That did change the selected backend, but the engine still failed:

(APIServer pid=1) INFO 04-15 16:02:27 [api_server.py:969] non-default args: {'host': '0.0.0.0', 'port': 30000, 'model': '/data/models/mineru-vl', 'served_model_name': ['OpenDataLab/MinerU2.5-Pro-2604-1.2B'], 'logits_processors': ['mineru_vl_utils:MinerULogitsProcessor'], 'gpu_memory_utilization': 0.7, 'mm_encoder_attn_backend': 'TORCH_SDPA'}
...
(EngineCore_DP0 pid=121) INFO 04-15 16:02:34 [xpu.py:111] Using backend AttentionBackendEnum.TORCH_SDPA for vit attention
(EngineCore_DP0 pid=121) INFO 04-15 16:02:34 [mm_encoder_attention.py:215] Using AttentionBackendEnum.TORCH_SDPA for MMEncoderAttention.
...
terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  No device of requested type available. Please check https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-system-requirements.html
...
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

So this was already evidence that the issue was not just “default FLASH_ATTN is too aggressive”.

I then went even more conservative:

mm_encoder_attn_backend=TRITON_ATTN
enforce_eager=True
skip_mm_profiling=True

That run still failed:

(APIServer pid=1) INFO ... non-default args: {'model_tag': '/data/models/mineru-vl', 'host': '0.0.0.0', 'port': 30000, 'model': '/data/models/mineru-vl', 'enforce_eager': True, 'served_model_name': ['OpenDataLab/MinerU2.5-Pro-2604-1.2B'], 'logits_processors': ['mineru_vl_utils:MinerULogitsProcessor'], 'gpu_memory_utilization': 0.7, 'mm_encoder_attn_backend': 'TRITON_ATTN', 'skip_mm_profiling': True}
...
(EngineCore_DP0 pid=121) INFO ... [xpu.py:111] Using backend AttentionBackendEnum.TRITON_ATTN for vit attention
(EngineCore_DP0 pid=121) INFO ... [mm_encoder_attention.py:215] Using AttentionBackendEnum.TRITON_ATTN for MMEncoderAttention.
(EngineCore_DP0 pid=121) WARNING ... Enforce eager set, disabling torch.compile and CUDAGraphs.
...
(EngineCore_DP0 pid=121) INFO ... Model loading took 2.16 GiB memory and 2.215206 seconds
(EngineCore_DP0 pid=121) INFO ... Skipping memory profiling for multimodal encoder and encoder cache.
terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  No device of requested type available. Please check https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-system-requirements.html
...
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Why that matters

At that point the experiment had already shown:

the Arc A750 was visible to SYCL and torch.xpu
the model could start loading
the default MM encoder flash path failed
switching to TORCH_SDPA did not make the engine stable
switching to TRITON_ATTN and also removing compile/profiling variables still did not make the engine stable

So my conclusion was not “vLLM XPU never works”.

My conclusion was:

on Intel Arc A750
for the current MinerU + Qwen2VL multimodal startup path
the vLLM + XPU route was not stable enough in my testing to be used as the default backend
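The evidence above was read off the logs by hand. A small helper like the following could pull the same two facts (selected ViT backend, known failure markers) out of a captured log. It is a sketch: `summarize_log` is a hypothetical name, and the patterns are taken only from the log lines quoted in this comment.

```python
import re

# Matches the "[xpu.py:...] Using backend ... for vit attention" log line.
BACKEND_RE = re.compile(
    r"Using backend AttentionBackendEnum\.(\w+) for vit attention"
)

# Failure strings that appear in the quoted logs.
FAILURE_MARKERS = (
    "EngineCore failed to start",
    "No device of requested type available",
    "Engine core initialization failed",
)


def summarize_log(text: str) -> dict:
    """Extract the selected ViT backend and any known failure markers."""
    match = BACKEND_RE.search(text)
    return {
        "backend": match.group(1) if match else None,
        "failures": [marker for marker in FAILURE_MARKERS if marker in text],
    }
```

Running it over each of the three captures would give one row per run, which makes it easier to see that changing the backend did not change the outcome.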

Why transformers was preferred in the PR

Because transformers + xpu was the path that actually worked for the same hardware and model family.

I also found that the model loading path had to be conservative there as well. Direct XPU device_map loading was not reliable on Arc A750, while loading on CPU first and then moving the model to XPU did work:

loading processor
loading model on cpu
moving to xpu
model device xpu:0
done

So the current PR behavior is intended as a compatibility fallback:

prefer transformers on Intel XPU today
because that path was validated on real hardware
while the current vLLM multimodal path was not stable enough in the same testing

If the vLLM + XPU + Qwen2VL multimodal path becomes stable on Arc-class Intel GPUs later, I agree that this preference can be revisited.
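The preference described here amounts to a small branch in the Linux auto-engine selection. The sketch below is an assumption about its shape, not the PR's code: `choose_linux_engine` is a hypothetical name, the engine strings are simplified, and the `xpu_vllm_stable` flag is invented purely to show where the preference could be revisited later.

```python
def choose_linux_engine(device: str, xpu_vllm_stable: bool = False) -> str:
    """Pick a backend on Linux, preferring transformers on Intel XPU.

    `xpu_vllm_stable` is a hypothetical switch: flipping it to True models
    the point at which the vLLM path on XPU is considered reliable.
    """
    if device.startswith("xpu") and not xpu_vllm_stable:
        # Compatibility fallback validated on Arc A750 in this PR.
        return "transformers"
    # Default Linux behavior described in the summary: select vllm.
    return "vllm"
```

Keeping the XPU case as a single early return means the default path for other devices is left unchanged.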

zy6p · 2026-04-17T04:37:23Z

recheck

255doesnotexist · 2026-05-03T04:58:05Z

Any progress?

feat: add Intel XPU transformers support

582c287

dosubot Bot added the size:M label (This PR changes 30-99 lines, ignoring generated files.) Apr 16, 2026

dosubot Bot added the enhancement label (New feature or request) Apr 16, 2026

zy6p mentioned this pull request Apr 16, 2026

Intel XPU support request #675

Closed

github-actions Bot added a commit that referenced this pull request Apr 17, 2026

@zy6p has signed the CLA in #4801

2c9918f

zy6p wants to merge 1 commit into opendatalab:master from zy6p:feat/intel-xpu
