add sparse method patches for vllm 0.11.0

AooooooA-C · AooooooA-C · commit af1374b331fc · 2026-01-21T01:49:43.000-08:00
diff --git a/docs/source/getting-started/quickstart_vllm.md b/docs/source/getting-started/quickstart_vllm.md
@@ -2,7 +2,7 @@
 This document describes how to install unified-cache-management with vllm on cuda platform.
 
 ## Prerequisites
-- vllm >=0.9.1, device=cuda (vllm == 0.9.2 to use the Sparse Feature)
+- vllm >=0.9.1, device=cuda (Sparse Feature is supported in vllm 0.9.2 and v0.11.0)
 
 ## Step 1: UCM Installation
 
@@ -44,6 +44,7 @@ Download the pre-built `vllm/vllm-openai:v0.9.2` docker image and build unified-
 1. Prepare vLLM Environment
 
     For the sake of environment isolation and simplicity, we recommend preparing the vLLM environment by pulling the official, pre-built vLLM Docker image.
+    > Note: v0.11.0 is newly supported (replace the tag with v0.11.0 if needed).
 
     ```bash
     docker pull vllm/vllm-openai:v0.9.2
@@ -87,6 +88,15 @@ Download the pre-built `vllm/vllm-openai:v0.9.2` docker image and build unified-
     ```
     Apply the patch that matches your development needs:
 
+    #### vLLM 0.11.0 
+
+    Note: v0.11.0 only requires the sparse attention patch.
+
+    ```bash
+    git apply unified-cache-management/ucm/integration/vllm/patch/0.11.0/vllm-adapt-sparse.patch
+    ```
+
+    #### vLLM 0.9.2 
     - Full UCM integration (recommended):
     ```bash
     git apply unified-cache-management/ucm/integration/vllm/patch/0.9.2/vllm-adapt.patch
diff --git a/examples/offline_inference_gsa_on_device.py b/examples/offline_inference_gsa_on_device.py
@@ -77,7 +77,7 @@ def build_llm_with_uc(module_path: str, name: str, model: str):
                     },
                 }
             ],
-            "ucm_sparse_config": {"KvCompOnDevice": {}},
+            "ucm_sparse_config": {"GSAOnDevice": {}},
         },
     )
 
diff --git a/ucm/integration/vllm/patch/0.11.0/vllm-adapt-sparse.patch b/ucm/integration/vllm/patch/0.11.0/vllm-adapt-sparse.patch

`83`	`83`