Commit b2f18f4

committed
update doc
1 parent 3ccbd66 commit b2f18f4

3 files changed

Lines changed: 3 additions & 3 deletions

docs/source/getting-started/quickstart_vllm.md

Lines changed: 1 addition & 1 deletion
@@ -149,7 +149,7 @@ You may directly edit the example file at `unified-cache-management/examples/ucm
 
 ### Feature 2: Sparsity
 
-The sparse module was not compiled by default. To enable it, set the environment variable `export ENABLE_SPARSE=TRUE` and re-compile the code you built. And uncomment `ucm_sparse_config` code block in `unified-cache-management/examples/ucm_config_example.yaml`.
+The sparse module was not compiled by default. To enable it, set the environment variable `export ENABLE_SPARSE=TRUE` and re-compile the code you built. And uncomment `ucm_sparse_config` code block in `unified-cache-management/examples/ucm_config_example.yaml`. Additionally, if you want to run GSAOnDevice, you also need to set the environment variable `export VLLM_HASH_ATTENTION=1`.
 
 ## Step 3: Launching Inference
 
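
For reference, the environment setup described in the updated paragraph might look like the sketch below. Both variable names and values come from the changed line. The rebuild step is left as a comment because this commit does not show the build command.

```shell
# Enable the sparse module at build time (it is not compiled by default).
export ENABLE_SPARSE=TRUE

# Only needed when running GSAOnDevice.
export VLLM_HASH_ATTENTION=1

# Re-compile the project here so ENABLE_SPARSE takes effect.
# The exact build command is not part of this commit, so it is omitted.
```

The variables must be exported in the same shell session that runs the rebuild and later launches inference.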

File renamed without changes.

examples/ucm_config_example.yaml

Lines changed: 2 additions & 2 deletions
@@ -30,8 +30,8 @@ load_only_first_rank: false
 # retrieval_stride: 5
 # Or for GSA:
 # GSA: {}
-# Or for KvCompOnDevice:
-# KvCompOnDevice: {}
+# Or for GSAOnDevice:
+# GSAOnDevice: {}
 
 
 # Whether to use layerwise loading/saving (optional, default: True for UCMConnector)
