Commit 8e2fc27

Merge branch 'develop' into dev_kvcomp_hbm_1223
2 parents 666d8fc + 0ab52c2 commit 8e2fc27

16 files changed

Lines changed: 2455 additions & 1311 deletions

docker/Dockerfile-NPU

Lines changed: 10 additions & 2 deletions
@@ -1,5 +1,5 @@
 # Set to other image if needed
-FROM quay.io/ascend/vllm-ascend:v0.9.2rc1
+FROM quay.io/ascend/vllm-ascend:v0.9.2rc1-openeuler

 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"

@@ -10,8 +10,16 @@ COPY . /workspace/unified-cache-management

 RUN pip config set global.index-url ${PIP_INDEX_URL}

-RUN export PLATFORM="ascend" && \
+RUN export PLATFORM="ascend" ENABLE_SPARSE=true && \
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib && \
 pip install -v -e /workspace/unified-cache-management --no-build-isolation

+# Apply patch for vLLM
+RUN cd /vllm-workspace/vllm \
+&& git apply /workspace/unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
+
+# Apply patch for vLLM-ascend
+RUN cd /vllm-workspace/vllm-ascend \
+&& git apply /workspace/unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-ascend-adapt.patch
+
 CMD ["/bin/bash"]

docs/source/getting-started/quickstart_vllm.md

Lines changed: 28 additions & 0 deletions
@@ -77,6 +77,33 @@ Download the pre-built `vllm/vllm-openai:v0.9.2` docker image and build unified-
 pip install -v -e . --no-build-isolation
 ```

+3. Apply vLLM Integration Patches (Required)
+
+To enable Unified Cache Management (UCM) integration with vLLM, you must **manually apply the corresponding vLLM patch**.
+
+You may directly navigate to the vLLM source directory:
+```bash
+cd <path_to_vllm>
+```
+Apply the patch that matches your development needs:
+
+- Full UCM integration (recommended):
+```bash
+git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
+```
+
+- Sparse attention only:
+```bash
+git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-sparse.patch
+```
+
+- ReRoPE support only:
+```bash
+git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt-rerope.patch
+```
+
+Choose the patch according to your development needs.
+If you are working on **sparse attention** or **ReRoPE** independently, applying only the corresponding patch is sufficient.


 ### Option 3: Install by pip
@@ -91,6 +118,7 @@ Download the pre-built `vllm/vllm-openai:v0.9.2` docker image and build unified-
 export PLATFORM=cuda
 pip install uc-manager
 ```
+> **Note:** If installing via `pip install`, you need to manually add the `config.yaml` file, similar to `unified-cache-management/examples/ucm_config_example.yaml`, because the PyPI package does not include YAML files.

 ## Step 2: Configuration
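
The patch selection that this quickstart change documents could be scripted roughly as follows. This is a minimal sketch and not part of the commit: the patch file names and the `ucm/integration/vllm/patch/0.9.2` directory come from the diff above, while the feature keys (`full`, `sparse`, `rerope`) and the helper names `patch_path` and `apply_patch` are hypothetical. It runs `git apply --check` first so that a patch that does not apply cleanly fails before the vLLM tree is modified.

```python
import subprocess
from pathlib import Path

# Patch file names as listed in the quickstart; the dictionary keys are
# hypothetical labels chosen for this sketch.
PATCHES = {
    "full": "vllm-adapt.patch",
    "sparse": "vllm-adapt-sparse.patch",
    "rerope": "vllm-adapt-rerope.patch",
}

# Patch directory relative to the unified-cache-management checkout.
PATCH_DIR = Path("ucm/integration/vllm/patch/0.9.2")


def patch_path(ucm_root, feature):
    """Return the path of the patch file for the requested feature."""
    try:
        name = PATCHES[feature]
    except KeyError:
        raise ValueError(f"unknown feature: {feature!r}") from None
    return Path(ucm_root) / PATCH_DIR / name


def apply_patch(vllm_dir, ucm_root, feature="full"):
    """Dry-run the patch with `git apply --check`, then apply it."""
    patch = str(patch_path(ucm_root, feature).resolve())
    # --check reports whether the patch applies without touching any file.
    subprocess.run(["git", "apply", "--check", patch], cwd=vllm_dir, check=True)
    subprocess.run(["git", "apply", patch], cwd=vllm_dir, check=True)
```

For example, `apply_patch("<path_to_vllm>", "unified-cache-management", "sparse")` would apply only the sparse attention patch, and raises `subprocess.CalledProcessError` if either `git` step fails.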

docs/source/getting-started/quickstart_vllm_ascend.md

Lines changed: 35 additions & 1 deletion
@@ -12,7 +12,7 @@ We offer 3 options to install UCM.

 ### Option 1: Build from source

-Follow commands below to install unified-cache-management from source code:
+1. Follow the commands below to install unified-cache-management from source code:
 **Note:** The sparse module is not compiled by default. To enable it, set the environment variable `export ENABLE_SPARSE=TRUE` before you build.
 ```bash
 # Replace <branch_or_tag_name> with the branch or tag name needed
@@ -23,13 +23,39 @@ pip install -v -e . --no-build-isolation
 cd ..
 ```

+2. Apply vLLM and vLLM-Ascend Integration Patches (Required)
+To enable Unified Cache Management (UCM) integration, you need to apply patches to both vLLM and vLLM-Ascend source trees.
+
+**Step 1:** Apply the vLLM Patch
+
+First, apply the standard vLLM integration patch in the vLLM source directory:
+
+```bash
+cd <path_to_vllm>
+git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
+```
+
+**Step 2:** Apply the vLLM-Ascend Patch
+
+Then, switch to the vLLM-Ascend source directory and apply the Ascend-specific patch:
+
+```bash
+cd <path_to_vllm_ascend>
+git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-ascend-adapt.patch
+```
+
+**Note:**
+The ReRoPE algorithm is not supported on Ascend at the moment.
+Only the standard UCM integration is applicable for vLLM-Ascend.
+

 ### Option 2: Install by pip
 Install by pip or find the pre-built wheels on [PyPI](https://pypi.org/project/uc-manager/).
 ```
 export PLATFORM=ascend
 pip install uc-manager
 ```
+> **Note:** If installing via `pip install`, you need to manually add the `config.yaml` file, similar to `unified-cache-management/examples/ucm_config_example.yaml`, because the PyPI package does not include YAML files.

 ### Option 3: Setup from docker
 Download the pre-built `vllm-ascend` docker image and build unified-cache-management docker image by commands below:
@@ -39,6 +65,14 @@ Download the pre-built `vllm-ascend` docker image and build unified-cache-manage
 cd unified-cache-management
 docker build -t ucm-vllm:latest -f ./docker/Dockerfile-NPU ./
 ```
+vllm-ascend provides two variants: **Ubuntu** and **openEuler**.
+The `Dockerfile-NPU` uses the **openEuler** variant by default.
+
+If you want to use the **Ubuntu** variant, please remove the `-openeuler` suffix and use the following image instead:
+
+```text
+quay.io/ascend/vllm-ascend:v0.9.2rc1
+```
 Then run your container using the following command. You can add or remove Docker parameters as needed.
 ```bash
 # Update DEVICE according to your device (/dev/davinci[0-7])

docs/source/user-guide/prefix-cache/index.md

Lines changed: 1 addition & 0 deletions
@@ -80,4 +80,5 @@ performance.
 :::{toctree}
 :maxdepth: 1
 nfs_store
+pipeline_store
 :::

docs/source/user-guide/prefix-cache/pipline_store.md renamed to docs/source/user-guide/prefix-cache/pipeline_store.md

Lines changed: 3 additions & 3 deletions
@@ -227,12 +227,12 @@ This log indicates that the **Cache Store** has received a **load or dump task**
 | `subtask_number` | Number of subtasks executed in this operation |
 | `size` | Total size of data transferred in bytes (across all tasks) |

-```test
+```text
 [UC][D] Cache task({task_id},{operation},{subtask_number},{size}) finished, cost {time}ms. [PID,TID]
 ```
 This log indicates that a load or dump task in the **Cache Store** has completed, along with its execution time **in ms**.

-```test
+```text
 [UC][D] Posix task({task_id},{operation},{subtask_number},{size}) dispatching. [PID,TID]
 ```
 This log indicates that the **Posix Store** has received a **load or dump task**
@@ -243,7 +243,7 @@ This log indicates that the **Posix Store** has received a **load or dump task**
 | `subtask_number` | Number of subtasks executed in this operation |
 | `size` | Total size of data transferred in bytes (across all tasks) |

-```test
+```text
 [UC][D] Posix task({task_id},{operation},{subtask_number},{size}) finished, cost {time}ms. [PID,TID]
 ```
 This log indicates that a load or dump task in the **Posix Store** has completed, along with its execution time **in ms**.
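
The log templates in this file are regular enough to parse mechanically. The sketch below is not part of the commit; it extracts the documented fields from a `dispatching` or `finished` line. The helper name `parse_task_log` is hypothetical, and the concrete value formats are assumptions, since the doc only gives the `{placeholder}` templates: here `subtask_number` and `size` are taken to be decimal integers and `time` a decimal number.

```python
import re

# Mirrors the documented templates for the Cache Store and Posix Store.
# Assumed value formats: integer subtask_number and size, numeric time.
TASK_LOG_RE = re.compile(
    r"\[UC\]\[D\] (?P<store>Cache|Posix) task\("
    r"(?P<task_id>[^,]+),(?P<operation>[^,]+),"
    r"(?P<subtask_number>\d+),(?P<size>\d+)\) "
    r"(?:dispatching|finished, cost (?P<cost_ms>\d+(?:\.\d+)?)ms)\."
)


def parse_task_log(line):
    """Return the task fields as a dict, or None if the line does not match."""
    match = TASK_LOG_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["subtask_number"] = int(record["subtask_number"])
    record["size"] = int(record["size"])
    # cost_ms is only present on "finished" lines; it stays None otherwise.
    if record["cost_ms"] is not None:
        record["cost_ms"] = float(record["cost_ms"])
    return record
```

A `finished` line yields a record with `cost_ms` set, while a `dispatching` line yields the same fields with `cost_ms` left as `None`, so pairing the two records by `task_id` gives the per-task latency.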

examples/offline_inference_kvcomphbm.py

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ def build_llm_with_uc(module_path: str, name: str, model: str):
 },
 }
 ],
-"ucm_sparse_config": {"GSA": {}},
+"ucm_sparse_config": {"KvCompOnDevice": {}},
 },
 )

examples/ucm_config_example.yaml

Lines changed: 1 addition & 2 deletions
@@ -31,8 +31,7 @@ load_only_first_rank: false
 # Or for GSA:
 # GSA: {}
 # Or for KvCompOnDevice:
-# KvCompOnDevice:
-# "kvcompOnDevice_config_path": "workspace/unified-cache-management/ucm/sparse/kvcomp/configs/kvcomp_qwen3_32B_config.json"
+# KvCompOnDevice: {}


 # Whether to use layerwise loading/saving (optional, default: True for UCMConnector)

setup.py

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ def build_cmake(self, ext: CMakeExtension):

 setup(
 name="uc-manager",
-version="0.2.0rc2",
+version="0.2.0",
 description="Unified Cache Management",
 author="Unified Cache Team",
 packages=find_packages(),
