Commit 8575a7d
intel_gpu: fix unnecessary tmp_out buffer per-layer in paged_attention
paged_attention_opt__multi_tokens allocates a tmp_out scratch buffer sized
total_tokens * heads_num * v_head_size * num_of_partitions * sizeof(float).
For Qwen3-30B with chunk_size=4096 and 8K KV context this is 2 GB per layer.
With 48 layers all executing sequentially, this totalled 96 GB of demand-paged
USM device allocation. On Intel iGPU (ARLS, i915 driver), the driver pins the
entire allocation as Unevictable on first GPU access regardless of pages touched,
causing CL_OUT_OF_RESOURCES on a 31 GB machine.
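
The 2 GB and 96 GB figures above can be reproduced from the size formula. The sketch below is illustrative only: `tmp_out_bytes` is a hypothetical helper, and the head count (32), V head size (128) and partition size (256 tokens) are assumptions that the commit message does not state. They were chosen because they reproduce the quoted numbers.

```cpp
#include <cstdint>

// Hypothetical helper (not the plugin's code): evaluates the size formula
// quoted in the commit message.
constexpr std::uint64_t tmp_out_bytes(std::uint64_t total_tokens,
                                      std::uint64_t heads_num,
                                      std::uint64_t v_head_size,
                                      std::uint64_t num_of_partitions) {
    return total_tokens * heads_num * v_head_size * num_of_partitions * sizeof(float);
}

constexpr std::uint64_t kTotalTokens   = 4096;  // chunk_size from the commit message
constexpr std::uint64_t kKvContext     = 8192;  // "8K KV context" from the commit message
constexpr std::uint64_t kLayers        = 48;    // layer count from the commit message
constexpr std::uint64_t kHeadsNum      = 32;    // ASSUMPTION: not stated in the commit
constexpr std::uint64_t kVHeadSize     = 128;   // ASSUMPTION: not stated in the commit
constexpr std::uint64_t kPartitionSize = 256;   // ASSUMPTION: tokens per partition

// Number of partitions needed to cover the KV context (rounded up).
constexpr std::uint64_t kNumPartitions =
    (kKvContext + kPartitionSize - 1) / kPartitionSize;

constexpr std::uint64_t kPerLayerBytes =
    tmp_out_bytes(kTotalTokens, kHeadsNum, kVHeadSize, kNumPartitions);
constexpr std::uint64_t kAllLayersBytes = kPerLayerBytes * kLayers;

// Checked at compile time: 2 GiB per layer, 96 GiB across 48 unshared buffers.
static_assert(kNumPartitions == 32, "8192 / 256 partitions");
static_assert(kPerLayerBytes == (2ull << 30), "2 GiB per layer");
static_assert(kAllLayersBytes == (96ull << 30), "96 GiB for 48 layers");
```

Under these assumptions, a single shared buffer would need 2 GiB in total, whereas one buffer per layer needs 48 times that.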
Root cause: can_share_internal_buffer(false) in paged_attention_node unconditionally
blocked the memory pool for ALL internal buffers. It was added in PR openvinotoolkit#33204 to
prevent CPU/GPU races on the lockable buffers (blocks_indexes_start/end,
blocked_gws_subseq_mapping) written by prepare_internal_buffers(). However, it also
blocked pool reuse for the non-lockable, GPU-only buffers (exp_sums, max_logits, tmp_out),
which are safe to share across sequential layers.
Fix:
- Remove can_share_internal_buffer(false) from paged_attention_node. Per-buffer
  lockability is already tracked via BufferDescriptor::m_lockable, so CPU-written
  (lockable=true, usm_host) buffers remain non-shareable while GPU-only
  (lockable=false, usm_device) buffers can be reused from the pool.
- In allocate_internal_buffers(): pass buffer_descs[i].m_lockable to the call
  (it was previously dropped, causing the wrong alloc type on initial allocation).
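
The effect of switching from a node-wide "never share" flag to a per-buffer lockable check can be sketched with a toy pool. Everything here is hypothetical (`BufferDesc`, `ToyPool`, `total_allocated`); it is not the plugin's memory pool and only mirrors the idea that GPU-only buffers are reusable across sequential layers while lockable ones are not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor; only the lockable flag mirrors the
// BufferDescriptor::m_lockable field named in the commit message.
struct BufferDesc {
    std::uint64_t bytes;
    bool lockable;  // true: CPU-written (usm_host); false: GPU-only (usm_device)
};

// Toy pool: GPU-only buffers return to the pool after a layer and are reused;
// lockable buffers always get a fresh allocation.
class ToyPool {
    struct Slot {
        std::uint64_t bytes;
        bool free;
        bool shareable;
    };
    std::vector<Slot> slots_;
    std::uint64_t allocated_bytes_ = 0;
    bool allow_sharing_;

public:
    // allow_sharing=false models the old node-wide can_share_internal_buffer(false).
    explicit ToyPool(bool allow_sharing) : allow_sharing_(allow_sharing) {}

    // Returns the index of the slot backing this request.
    std::size_t acquire(const BufferDesc& desc) {
        assert(desc.bytes > 0);
        const bool shareable = allow_sharing_ && !desc.lockable;
        if (shareable) {
            for (std::size_t i = 0; i < slots_.size(); ++i) {
                if (slots_[i].free && slots_[i].bytes >= desc.bytes) {
                    slots_[i].free = false;
                    return i;  // reuse an existing allocation
                }
            }
        }
        slots_.push_back(Slot{desc.bytes, false, shareable});
        allocated_bytes_ += desc.bytes;
        return slots_.size() - 1;
    }

    // Called when a layer finishes; only shareable slots become reusable.
    void release(std::size_t id) {
        if (slots_[id].shareable) slots_[id].free = true;
    }

    std::uint64_t allocated_bytes() const { return allocated_bytes_; }
};

// Runs `layers` sequential layers, each requesting one GPU-only scratch
// buffer and one lockable index buffer, and returns total bytes allocated.
inline std::uint64_t total_allocated(bool allow_sharing, int layers,
                                     std::uint64_t scratch_bytes,
                                     std::uint64_t index_bytes) {
    ToyPool pool(allow_sharing);
    for (int l = 0; l < layers; ++l) {
        const std::size_t scratch = pool.acquire({scratch_bytes, false});
        const std::size_t index = pool.acquire({index_bytes, true});
        pool.release(scratch);
        pool.release(index);  // no effect: lockable slots are never shared
    }
    return pool.allocated_bytes();
}
```

Calling `total_allocated(true, 48, 2ull << 30, 4096)` allocates the 2 GiB scratch buffer once plus 48 small lockable buffers, while `total_allocated(false, ...)` allocates the scratch buffer 48 times. The 4096-byte index buffer size is an arbitrary illustrative value.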
Result: 48 layers share one 2 GB tmp_out buffer instead of allocating 48 separate
2 GB buffers. Peak Unevictable memory drops from ~28+ GB (ending in an OOM crash)
to ~18.9 GB on ARLS (Intel Arc 8086:7d67, Arrow Lake-S iGPU, 31 GB).
Verified: Qwen3-30B-A3B-Instruct-2507-int4-ov with chunk_size=4096, 8K prompt,
ContinuousBatching on ARLS completes successfully with exit code 0 and 20 coherent
output tokens. ARLH is not affected: with supports_immad=true it takes the micro_sdpa
path, which does not allocate tmp_out at all.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 8a030fd commit 8575a7d
2 files changed
Lines changed: 5 additions & 3 deletions
| File (name not preserved) | Original line(s) | New line(s) | Change |
|---|---|---|---|
| 1 | 16 | 16-18 | 1 line removed, 3 lines added |
| 2 | 2495 | 2495 | 1 line removed, 1 line added |
| 2 | 2516 | 2516 | 1 line removed, 1 line added |