
Commit bec53be

authored
feat(sparsity): add VecAttention sparse prefill for VLM (#320)
Integrate VecAttention into AngelSlim as a sparse attention method for Vision-Language Models (Qwen2.5-VL).

- Add vecattention subpackage under compressor/sparsity/
- Add vllm-flash-attention as a git submodule for the sparse_attn_func kernel
- Add Triton kernels for MinP threshold selection and query pooling
- Add run_vecattention.py tool for image/video inference
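The commit message names two Triton kernels, MinP threshold selection and query pooling, but does not define them. As a rough plain-Python illustration of how such steps commonly combine in sparse prefill: queries are mean-pooled in fixed-size blocks, each pooled query is scored against the keys it may causally see, and a key is kept when its softmax probability is at least `min_p` times the largest one (the min-p rule). The block size, the per-key granularity, the causal cut-off and all function names here are assumptions for illustration, not the VecAttention kernels.

```python
# Hypothetical sketch; not the VecAttention Triton kernels.
import math
from typing import List

Vector = List[float]


def pool_queries(queries: List[Vector], block_size: int) -> List[Vector]:
    """Mean-pool each run of `block_size` consecutive queries into one vector."""
    pooled = []
    for start in range(0, len(queries), block_size):
        block = queries[start:start + block_size]  # the last block may be shorter
        dim = len(block[0])
        pooled.append([sum(q[d] for q in block) / len(block) for d in range(dim)])
    return pooled


def minp_select(pooled_q: Vector, keys: List[Vector], num_visible: int,
                min_p: float) -> List[int]:
    """Indices of the first `num_visible` keys whose softmax probability
    is at least `min_p` times the largest probability."""
    if not 0.0 < min_p <= 1.0:
        raise ValueError("min_p must be in (0, 1]")
    scale = 1.0 / math.sqrt(len(pooled_q))
    scores = [sum(a * b for a, b in zip(pooled_q, k)) * scale
              for k in keys[:num_visible]]
    # Softmax is monotonic: p_j >= min_p * p_max  <=>  s_j - s_max >= log(min_p),
    # so the threshold can be applied to raw scores without normalizing.
    s_max = max(scores)
    log_min_p = math.log(min_p)
    return [j for j, s in enumerate(scores) if s - s_max >= log_min_p]


def select_keys(queries: List[Vector], keys: List[Vector], block_size: int = 2,
                min_p: float = 0.1) -> List[List[int]]:
    """For each query block, the keys kept under a causal limit: a block sees
    keys up to the position of its last query."""
    selection = []
    for b, pooled_q in enumerate(pool_queries(queries, block_size)):
        num_visible = min((b + 1) * block_size, len(keys))
        selection.append(minp_select(pooled_q, keys, num_visible, min_p))
    return selection
```

For instance, `select_keys([[1, 0], [1, 0], [0, 1], [0, 1]], [[10, 0], [0, 0], [0, 10], [0, 0]])` returns `[[0], [2]]`: each query block keeps only the one key that dominates its scores. Working in log space lets the threshold be applied to raw scores without normalizing the softmax.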
1 parent 3ff0717 commit bec53be

16 files changed

Lines changed: 1784 additions & 1 deletion

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[submodule "angelslim/compressor/sparsity/vecattention/ops/vllm-flash-attention"]
+path = angelslim/compressor/sparsity/vecattention/ops/vllm-flash-attention
+url = git@github.com:anminliu/vllm-flash-attention.git
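According to the commit message, this submodule supplies the `sparse_attn_func` kernel. Its signature and block layout are not shown in this diff, so the following is only a plain-Python reference for the math such a kernel computes under one assumption: each query attends to an explicit list of selected key indices, and the softmax is renormalized over that subset. `masked_attention` is a hypothetical name, not part of vllm-flash-attention.

```python
# Hypothetical reference; does not reproduce sparse_attn_func's API.
import math
from typing import List, Sequence

Vector = List[float]


def masked_attention(q: Vector, keys: Sequence[Vector], values: Sequence[Vector],
                     selected: Sequence[int]) -> Vector:
    """Scaled dot-product attention for one query, with the softmax taken
    only over the key indices in `selected` (all other keys get weight 0)."""
    if not selected:
        raise ValueError("at least one key must be selected")
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, keys[j])) * scale for j in selected]
    s_max = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - s_max) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, j in zip(weights, selected):
        for d, v in enumerate(values[j]):
            out[d] += (w / total) * v
    return out
```

Passing `selected=range(len(keys))` reduces this to ordinary dense attention for that query; a real sparse kernel saves work by never loading or scoring the unselected keys at all.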

README.md

Lines changed: 6 additions & 0 deletions
@@ -135,6 +135,12 @@ A more accessible, comprehensive, and efficient toolkit for large model compress
 </td>
 <td>
 <ul style="padding-left: 0; list-style-position: inside;">
+<li>
+<strong>Sparse Attention</strong>
+<ul style="padding-left: 1.5rem">
+<li><a href="https://github.com/anminliu/VecAttention">VecAttention</a></li>
+</ul>
+</li>
 <li>
 <strong>Token Pruning</strong>
 <ul style="padding-left: 1.5rem">

README_cn.md

Lines changed: 6 additions & 0 deletions
@@ -136,6 +136,12 @@
 </td>
 <td>
 <ul style="padding-left: 0; list-style-position: inside;">
+<li>
+<strong>Sparse Attention</strong>
+<ul style="padding-left: 1.5rem">
+<li><a href="https://github.com/anminliu/VecAttention">VecAttention</a></li>
+</ul>
+</li>
 <li>
 <strong>Token Pruning</strong>
 <ul style="padding-left: 1.5rem">

angelslim/compressor/sparsity/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -13,5 +13,6 @@
 # limitations under the License.
 
 from .stem import StemInference # noqa: F401
+from .vecattention import VecAttentionInference # noqa: F401
 
-__all__ = ["StemInference"]
+__all__ = ["StemInference", "VecAttentionInference"]
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Copyright 2025 Tencent Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .vecattention import VecAttentionInference # noqa: F401
+
+__all__ = ["VecAttentionInference"]
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# Copyright 2025 Tencent Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""VecAttention-patched attention forward methods for VLM."""
+
+from .forward import qwen_vl_attn_forward
+
+__all__ = ["qwen_vl_attn_forward"]
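This file only re-exports `qwen_vl_attn_forward` from `.forward`; the diff does not show how it is installed on the model. One common way to apply such a replacement is to rebind `forward` on each attention module instance and keep the originals so the patch can be undone. The sketch assumes that approach; `ToyAttention`, `toy_sparse_forward`, `patch_forward` and `unpatch_forward` are hypothetical names, not AngelSlim API.

```python
# Hypothetical sketch of instance-level forward patching; not AngelSlim code.
import types
from typing import Callable, Dict, Iterable


class ToyAttention:
    """Hypothetical stand-in for a VLM attention layer."""

    def __init__(self, layer_idx: int):
        self.layer_idx = layer_idx

    def forward(self, hidden_states):
        return ("dense", self.layer_idx, hidden_states)


def toy_sparse_forward(self, hidden_states):
    """Hypothetical replacement forward; a real one would call a sparse kernel."""
    return ("sparse", self.layer_idx, hidden_states)


def patch_forward(modules: Iterable, new_forward: Callable) -> Dict[int, Callable]:
    """Bind `new_forward` as an instance-level `forward` on every module and
    return the previous bound methods, keyed by id(), so they can be restored."""
    originals = {}
    for m in modules:
        originals[id(m)] = m.forward
        m.forward = types.MethodType(new_forward, m)
    return originals


def unpatch_forward(modules: Iterable, originals: Dict[int, Callable]) -> None:
    """Reinstall the forwards saved by patch_forward."""
    for m in modules:
        m.forward = originals[id(m)]
```

After `saved = patch_forward(layers, toy_sparse_forward)`, calling `layers[0].forward("h")` goes through the replacement; `unpatch_forward(layers, saved)` restores the dense path. Patching instances rather than the class leaves other models of the same class untouched.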
