Commit 054a3a8

[doc] Update quickstart and calculator config (#890)
1 parent d5520f6 commit 054a3a8

4 files changed: 25 additions & 25 deletions

docker/Dockerfile.vllm_npu

Lines changed: 0 additions & 3 deletions
```diff
@@ -5,9 +5,6 @@ FROM ${BASE_IMAGE}
 
 ARG PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
 
-# Apply the UCM monkey patch for vllm & vllm_ascend
-ENV ENABLE_UCM_PATCH=1
-
 WORKDIR /workspace
 
 # Install unified-cache-management
```

docs/source/_static/kv_cache_calculator.html

Lines changed: 21 additions & 17 deletions
```diff
@@ -1052,19 +1052,28 @@ <h2 class="section-title">
 "num_key_value_heads": 8
 },
 // GLM Models (Zhipu AI)
-"zai-org/GLM-4.6": {
+"zai-org/GLM-4.5":{
+"hidden_size": 6144,
+"num_attention_heads": 64,
+"num_hidden_layers": 78,
+"num_key_value_heads": 64,
+"kv_lora_rank": 512,
+"qk_rope_head_dim": 64
+},
+"zai-org/GLM-4.7":{
 "hidden_size": 5120,
 "num_attention_heads": 96,
-"num_hidden_layers": 62,
-"num_key_value_heads": 4
+"num_hidden_layers": 92,
+"num_key_value_heads": 8,
+"head_dim": 128
 },
-"zai-org/GLM-4.7": {
+"zai-org/GLM-4.6":{
 "hidden_size": 5120,
 "num_attention_heads": 96,
-"num_hidden_layers": 62,
-"num_key_value_heads": 4
+"num_hidden_layers": 92,
+"num_key_value_heads": 8,
+"head_dim": 128
 },
-
 // Kimi Models (Moonshot AI)
 "moonshotai/Kimi-K2-Instruct-0905": {
 "hidden_size": 7168,
@@ -1075,18 +1084,13 @@ <h2 class="section-title">
 "qk_rope_head_dim": 64
 },
 // MiniMax Models
-"MiniMaxAI/MiniMax-M2": {
-"hidden_size": 5632,
-"num_attention_heads": 44,
+"MiniMaxAI/MiniMax-M2.5": {
+"hidden_size": 3072,
+"num_attention_heads": 48,
 "num_hidden_layers": 62,
-"num_key_value_heads": 4
+"num_key_value_heads": 8,
+"head_dim": 128
 },
-"MiniMaxAI/MiniMax-M2.1": {
-"hidden_size": 5632,
-"num_attention_heads": 44,
-"num_hidden_layers": 62,
-"num_key_value_heads": 4
-}
 };
 }
```
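The fields in these model entries are exactly what a KV cache size calculator needs. As a rough sketch of the arithmetic involved (my own reconstruction, not the page's actual JavaScript): for standard GQA/MHA attention, the per-token KV footprint is `2 * layers * kv_heads * head_dim * bytes_per_element`, while entries carrying `kv_lora_rank` and `qk_rope_head_dim` (such as GLM-4.5 above) appear to be treated MLA-style, caching one compressed latent of `kv_lora_rank + qk_rope_head_dim` per layer.

```python
def kv_bytes_per_token(cfg, dtype_bytes=2):
    """Approximate KV cache bytes per token for one model config.

    Reconstruction of typical calculator arithmetic, not the file's
    actual code; dtype_bytes=2 assumes fp16/bf16 KV cache.
    """
    layers = cfg["num_hidden_layers"]
    if "kv_lora_rank" in cfg:
        # MLA-style: one compressed latent per layer, shared across heads
        return layers * (cfg["kv_lora_rank"] + cfg["qk_rope_head_dim"]) * dtype_bytes
    # GQA/MHA: separate K and V tensors for each KV head
    head_dim = cfg.get("head_dim", cfg["hidden_size"] // cfg["num_attention_heads"])
    return 2 * layers * cfg["num_key_value_heads"] * head_dim * dtype_bytes

# Values taken from the diff above
glm45 = {"num_hidden_layers": 78, "kv_lora_rank": 512, "qk_rope_head_dim": 64}
glm47 = {"hidden_size": 5120, "num_attention_heads": 96,
         "num_hidden_layers": 92, "num_key_value_heads": 8, "head_dim": 128}

print(kv_bytes_per_token(glm45))  # 78 * (512 + 64) * 2 = 89856
print(kv_bytes_per_token(glm47))  # 2 * 92 * 8 * 128 * 2 = 376832
```

This also shows why the corrected `num_key_value_heads` and explicit `head_dim` matter: with the old values (4 KV heads, 62 layers) the calculator would have underestimated the GLM-4.6/4.7 cache by several times.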

docs/source/getting-started/quickstart_vllm.md

Lines changed: 2 additions & 3 deletions
````diff
@@ -49,10 +49,9 @@ docker build -t ucm-vllm-sparse:latest -f ./docker/Dockerfile.vllm_gpu_v0110 ./
 1. Prepare vLLM Environment
 
 For the sake of environment isolation and simplicity, we recommend preparing the vLLM environment by pulling the official, pre-built vLLM Docker image.
-> Note: v0.11.0 is newly supported (replace the tag with v0.11.0 if needed).
 
 ```bash
-docker pull vllm/vllm-openai:v0.11.0
+docker pull vllm/vllm-openai:<vllm_version>
 ```
 Use the following command to run your own container:
 ```bash
@@ -65,7 +64,7 @@ docker build -t ucm-vllm-sparse:latest -f ./docker/Dockerfile.vllm_gpu_v0110 ./
 -v <path_to_your_storage>:/home/storage \
 --entrypoint /bin/bash \
 --name <name_of_your_container> \
--it vllm/vllm-openai:v0.9.2
+-it vllm/vllm-openai:<vllm_version>
 ```
 Refer to [Set up using docker](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#set-up-using-docker) for more information to run your own vLLM container.
````
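With the hard-coded tags gone, readers substitute their own `<vllm_version>`. A hypothetical substitution, pinning the tag in one variable so the pull and run commands cannot drift apart (v0.11.0 is the tag the removed note referenced):

```shell
# Hypothetical example: pin one tag for both docker pull and docker run.
VLLM_VERSION="v0.11.0"
IMAGE="vllm/vllm-openai:${VLLM_VERSION}"
echo "docker pull ${IMAGE}"
```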

docs/source/getting-started/quickstart_vllm_ascend.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -25,7 +25,7 @@ cd ..
 
 >**Note:** For the Atlas A3 series, the `PLATFORM` variable should be set to `ascend-a3`.
 
-2、Apply vLLM and vLLM-Ascend Integration Patches (Required)
+2、Apply vLLM and vLLM-Ascend Integration Patches (Not required for versions >= v0.17.0rc1)
 To enable Unified Cache Management (UCM) integration, you need to apply patches to both vLLM and vLLM-Ascend source trees.
 
 #### Option A: Monkey Patch (Recommended)
@@ -38,7 +38,7 @@ export ENABLE_UCM_PATCH=1
 ```
 >**Note:** Enabling ENABLE_UCM_PATCH is required to use the Prefix Caching feature with UCM.
 
-2. Enable Sparse Attention (Optional):
+2. Enable Sparse Attention (supported on v0.11.0):
 ```bash
 export ENABLE_SPARSE=1
 ```
````
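Taken together, a pre-v0.17.0rc1 monkey-patch setup on Ascend would export both variables before launching vLLM. A minimal sketch, assuming only the two variable names shown in the patched quickstart above:

```shell
# Sketch: enable the UCM monkey patch (required for Prefix Caching with UCM)
# and, on v0.11.0, the sparse-attention path.
export ENABLE_UCM_PATCH=1
export ENABLE_SPARSE=1
echo "UCM patch: ${ENABLE_UCM_PATCH}, sparse: ${ENABLE_SPARSE}"
```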
