Commit af1374b

add sparse method patches for vllm 0.11.0
1 parent 6f90147 commit af1374b

3 files changed

Lines changed: 877 additions & 2 deletions

docs/source/getting-started/quickstart_vllm.md

Lines changed: 11 additions & 1 deletion
@@ -2,7 +2,7 @@
 This document describes how to install unified-cache-management with vllm on cuda platform.

 ## Prerequisites
-- vllm >=0.9.1, device=cuda (vllm == 0.9.2 to use the Sparse Feature)
+- vllm >=0.9.1, device=cuda (Sparse Feature is supported in vllm 0.9.2 and v0.11.0)

 ## Step 1: UCM Installation

@@ -44,6 +44,7 @@ Download the pre-built `vllm/vllm-openai:v0.9.2` docker image and build unified-
 1. Prepare vLLM Environment

 For the sake of environment isolation and simplicity, we recommend preparing the vLLM environment by pulling the official, pre-built vLLM Docker image.
+> Note: v0.11.0 is newly supported (replace the tag with v0.11.0 if needed).

 ```bash
 docker pull vllm/vllm-openai:v0.9.2
@@ -87,6 +88,15 @@ Download the pre-built `vllm/vllm-openai:v0.9.2` docker image and build unified-
 ```
 Apply the patch that matches your development needs:

+#### vLLM 0.11.0
+
+Note: v0.11.0 only requires the sparse attention patch.
+
+```bash
+git apply unified-cache-management/ucm/integration/vllm/patch/0.11.0/vllm-adapt-sparse.patch
+```
+
+#### vLLM 0.9.2
 - Full UCM integration (recommended):
 ```bash
 git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
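
Reviewer note: the quickstart now branches on the vLLM version when choosing a patch. A minimal sketch of that selection follows, assuming only the two patch paths visible in this hunk. The `PATCHES` mapping and both function names are hypothetical and are not part of the commit, and the 0.9.2 entry covers only the "Full UCM integration" option shown above.

```python
from typing import Optional

# Patch paths copied from the quickstart diff above. The mapping and the
# helper names are illustrative assumptions, not part of the commit.
PATCHES = {
    "0.11.0": "unified-cache-management/ucm/integration/vllm/patch/0.11.0/vllm-adapt-sparse.patch",
    "0.9.2": "unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch",
}


def patch_for(vllm_version: str) -> Optional[str]:
    """Return the patch path for a vLLM version, or None if the diff lists none."""
    # Accept both "0.11.0" and the Docker-tag style "v0.11.0".
    normalized = vllm_version.strip().lstrip("v")
    return PATCHES.get(normalized)


def git_apply_command(vllm_version: str) -> str:
    """Build the `git apply` command line for the given vLLM version."""
    patch = patch_for(vllm_version)
    if patch is None:
        raise ValueError(f"no patch listed for vLLM {vllm_version}")
    return f"git apply {patch}"
```

Calling `git_apply_command("v0.11.0")` yields the same command the new `#### vLLM 0.11.0` section documents. Versions not listed in the hunk raise an error instead of guessing a path.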

examples/offline_inference_kvcomphbm.py renamed to examples/offline_inference_gsa_on_device.py

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ def build_llm_with_uc(module_path: str, name: str, model: str):
 },
 }
 ],
-"ucm_sparse_config": {"KvCompOnDevice": {}},
+"ucm_sparse_config": {"GSAOnDevice": {}},
 },
 )
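
Reviewer note: the example now selects the sparse method with the key `GSAOnDevice` instead of `KvCompOnDevice`. The diff does not say whether the old key is still accepted, so callers holding older configs may want to rename it. The sketch below is a hypothetical helper, not part of the commit. It assumes the config is a plain dict with an `ucm_sparse_config` entry, as in the hunk above.

```python
# Old and new method names are taken from the diff above; the helper itself
# and the `extra_config` parameter name are hypothetical.
RENAMED_SPARSE_METHODS = {"KvCompOnDevice": "GSAOnDevice"}


def migrate_sparse_config(extra_config: dict) -> dict:
    """Return a shallow copy of extra_config with renamed sparse-method keys."""
    migrated = dict(extra_config)
    sparse = migrated.get("ucm_sparse_config")
    if isinstance(sparse, dict):
        # Rebuild the inner dict so the caller's original is left untouched.
        migrated["ucm_sparse_config"] = {
            RENAMED_SPARSE_METHODS.get(name, name): options
            for name, options in sparse.items()
        }
    return migrated
```

The helper leaves unknown method names and unrelated keys as they are, so it can be applied to a config that already uses the new name.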
