[Opt] GsaOnDevice cuda bugfix & optimization #659

Merged
Infinite666 merged 9 commits into ModelEngine-Group:develop from wangwenxin0312:dev_gsa_device_pr on Jan 27, 2026
Conversation

wangwenxin0312 (Contributor) commented Jan 20, 2026

Purpose

What this PR does / why we need it?
1. GsaOnDevice bugfix & optimization
2. Update vllm-adapt-sparse.patch

Modifications

Does this PR introduce any user-facing change?

  • Decode metadata (decode_req_ids, block_table_decode, decode_seq_lens) is now constructed in build_sparse_meta
  • On external prefix-cache hits, prefix_slot_mapping and prefix_block_ids are rebuilt so that the k_hash computation covers the full required prefix
  • Decode-only batches are optimized via tensor slicing
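
The decode-only slicing and prefix-mapping rebuild described above can be illustrated with a minimal sketch. All names below (`build_decode_meta`, `rebuild_prefix_mapping`, and the assumption that decode requests sit at the front of the batch) are hypothetical stand-ins for illustration, not the PR's actual helpers; plain Python lists stand in for tensors, where the same slicing would produce cheap views instead of per-row copies.

```python
def build_decode_meta(req_ids, block_table, seq_lens, query_lens):
    """Sketch: derive decode-only metadata by slicing batch tensors.

    Assumes decode requests (query_len == 1) are grouped at the front
    of the batch -- a hypothetical layout, for illustration only.
    """
    num_decode = sum(1 for q in query_lens if q == 1)
    # One slice per field instead of gathering row by row.
    decode_req_ids = req_ids[:num_decode]
    block_table_decode = block_table[:num_decode]
    decode_seq_lens = seq_lens[:num_decode]
    return decode_req_ids, block_table_decode, decode_seq_lens


def rebuild_prefix_mapping(prefix_block_ids, block_size, prefix_len):
    """Sketch: rebuild a slot mapping over the full cached prefix after
    an external prefix-cache hit, so k_hash can cover every prefix token.
    """
    # Expand each block id into its contiguous range of slot indices,
    # then truncate to the exact prefix length.
    slot_mapping = [
        block_id * block_size + offset
        for block_id in prefix_block_ids
        for offset in range(block_size)
    ]
    return slot_mapping[:prefix_len]


# Usage: 3 decode requests followed by 1 prefill request.
req_ids = [7, 8, 9, 10]
block_table = [[0, 1], [2, 3], [4, 5], [6, 7]]
seq_lens = [33, 65, 17, 5]
query_lens = [1, 1, 1, 5]
ids, bt, lens = build_decode_meta(req_ids, block_table, seq_lens, query_lens)
# ids == [7, 8, 9]; bt == [[0, 1], [2, 3], [4, 5]]; lens == [33, 65, 17]

slots = rebuild_prefix_mapping([2, 5], block_size=4, prefix_len=6)
# slots == [8, 9, 10, 11, 20, 21]
```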

Test

How was this patch tested?
export MODEL_PATH="/home/models/Qwen3-Coder-30B-A3B-Instruct-FP8"
export VLLM_HASH_ATTENTION=1
python examples/offline_inference_gsaondevice.py
(screenshot of the inference output)

Review comment threads: ucm/sparse/gsa_on_device/gsa_on_device.py (several, most outdated), examples/offline_inference_gsaondevice.py, ucm/integration/vllm/patch/0.9.2/vllm-adapt-sparse.patch

Infinite666 changed the title from "[feat] GsaOnDevice cuda bugfix & optimization" to "[Opt] GsaOnDevice cuda bugfix & optimization" on Jan 21, 2026
Infinite666 previously approved these changes on Jan 27, 2026
Infinite666 merged commit 4adcfe7 into ModelEngine-Group:develop on Jan 27, 2026
6 checks passed

4 participants