
Commit 3b26ce4

[BugFix] Add patch for local_cache_hit calculation in vllm 0.18.0 met… (#941)
## Purpose

Fix a negative `local_cache_hit` calculation that can crash Prometheus counters. Related vLLM issue: vllm-project/vllm#36755

When preemption occurs with async scheduling, there is a race condition:

- `schedule(N+1)` can preempt a request and reset its state
- `update_from_output(N)` reads the already-mutated request state

As a result, `num_external_computed_tokens` can exceed `num_cached_tokens + recomputed`, making `local_cache_hit` negative. Prometheus counters then crash with:

    ValueError: Counters can only be incremented by non-negative amounts.

## Modifications

- Patch `vllm.v1.metrics.stats.PromptTokenStats.update_from_output` to wrap the `local_cache_hit` calculation with `max(0, ...)`
- Add `v0180/vllm/pc/metrics/stats.py` with the patched method
- Add `v0180/vllm/pc_patch.py` to register the patch
- Update `apply_patch.py` to support vLLM version 0.18.0
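The failure mode reduces to plain arithmetic. The token counts below are hypothetical values chosen to illustrate the race, not numbers taken from the issue:

```python
# Hypothetical state after the schedule(N+1) / update_from_output(N) race:
# the preempted request's state was mutated, so the external token count
# now exceeds what the local cache actually served.
num_cached_tokens = 100
recomputed = 0
num_external_computed_tokens = 120

raw_hit = num_cached_tokens + recomputed - num_external_computed_tokens
assert raw_hit == -20  # Counter.inc(raw_hit) would raise ValueError

patched_hit = max(0, raw_hit)  # the clamp this commit adds
assert patched_hit == 0
```

The clamp trades a small undercount in the rare race window for never feeding a negative delta to a monotonic Prometheus counter.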
1 parent f0323d4 commit 3b26ce4

6 files changed

Lines changed: 37 additions & 0 deletions

File tree

ucm/integration/vllm/patch/apply_patch.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -136,6 +136,9 @@ def apply_all_patches() -> None:
             if ENABLE_SPARSE:
                 logger.info("UCM patching vllm for sparse...")
                 import ucm.integration.vllm.patch.v0110.vllm.sparse_patch
+        case "0.18.0":
+            logger.info("UCM patching vllm for pc...")
+            import ucm.integration.vllm.patch.v0180.vllm.pc_patch
         case _:
             pass
```

File renamed without changes.

ucm/integration/vllm/patch/v0180/vllm/pc/__init__.py

Whitespace-only changes.

ucm/integration/vllm/patch/v0180/vllm/pc/metrics/__init__.py

Whitespace-only changes.
ucm/integration/vllm/patch/v0180/vllm/pc/metrics/stats.py

Lines changed: 17 additions & 0 deletions

```python
def update_from_output(
    self,
    num_cached_tokens: int,
    num_external_computed_tokens: int,
    prompt_len: int,
) -> None:
    """Update stats from a prefill output."""
    recomputed = 1 if (num_cached_tokens + 1 == prompt_len) else 0

    self.computed += prompt_len - num_cached_tokens
    self.external_kv_transfer += num_external_computed_tokens
    self.local_cache_hit += max(
        0, num_cached_tokens + recomputed - num_external_computed_tokens
    )
    self.cached_tokens += num_cached_tokens
    self.recomputed_tokens += recomputed
    self.total += prompt_len
```
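The clamp can be exercised in isolation. The stand-in class below is an assumption: its fields are inferred from the attribute names the patch touches, not from vLLM's actual `PromptTokenStats` definition:

```python
from dataclasses import dataclass


@dataclass
class PromptStats:
    """Stand-in for PromptTokenStats; fields inferred from the patched method."""
    computed: int = 0
    external_kv_transfer: int = 0
    local_cache_hit: int = 0
    cached_tokens: int = 0
    recomputed_tokens: int = 0
    total: int = 0

    def update_from_output(self, num_cached_tokens, num_external_computed_tokens, prompt_len):
        recomputed = 1 if (num_cached_tokens + 1 == prompt_len) else 0
        self.computed += prompt_len - num_cached_tokens
        self.external_kv_transfer += num_external_computed_tokens
        # The patched line: never record a negative cache hit.
        self.local_cache_hit += max(0, num_cached_tokens + recomputed - num_external_computed_tokens)
        self.cached_tokens += num_cached_tokens
        self.recomputed_tokens += recomputed
        self.total += prompt_len


# Race scenario: external tokens exceed cached + recomputed after preemption.
s = PromptStats()
s.update_from_output(num_cached_tokens=100, num_external_computed_tokens=120, prompt_len=512)
assert s.local_cache_hit == 0  # clamped; the unpatched code would add -20
assert s.computed == 412
```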
ucm/integration/vllm/patch/v0180/vllm/pc_patch.py

Lines changed: 17 additions & 0 deletions

```python
from ucm.integration.vllm.patch.utils import patch_or_inject, when_imported
from ucm.logger import init_logger

logger = init_logger(__name__)


@when_imported("vllm.v1.metrics.stats")
def patch_stats(mod):
    logger.debug(f"Patched {mod} called")

    from ucm.integration.vllm.patch.v0180.vllm.pc.metrics import stats

    patch_or_inject(
        mod.PromptTokenStats,
        "update_from_output",
        stats.update_from_output,
    )
```
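The registration module leans on UCM's `when_imported` and `patch_or_inject` helpers. A minimal sketch of how such helpers might behave is shown below; this is an assumption for illustration, as the real UCM utilities presumably hook Python's import machinery so the callback also fires when the target module is imported later:

```python
import sys
from collections import defaultdict

# Callbacks waiting for modules that have not been imported yet.
_pending_hooks = defaultdict(list)


def when_imported(name):
    """Run the decorated function against module `name` once it is available.

    Sketch only: fires immediately if the module is already in sys.modules,
    otherwise records the callback (a real implementation would also install
    an import hook to fire on a future import).
    """
    def decorator(fn):
        if name in sys.modules:
            fn(sys.modules[name])
        else:
            _pending_hooks[name].append(fn)
        return fn
    return decorator


def patch_or_inject(cls, attr, replacement):
    """Overwrite an existing method, or inject it if the class lacks one."""
    setattr(cls, attr, replacement)


# Demo on a throwaway class rather than vLLM's PromptTokenStats:
class Demo:
    def greet(self):
        return "original"


patch_or_inject(Demo, "greet", lambda self: "patched")
assert Demo().greet() == "patched"
```

Monkey-patching at import time like this keeps the fix out-of-tree: no vLLM source is modified, and removing the patch module restores upstream behavior.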
