[BugFix] Add patch for local_cache_hit calculation in vllm 0.18.0 met…#941

Merged
qyh111 merged 1 commit into ModelEngine-Group:develop from sumingZero:patch_018
Apr 29, 2026

Conversation

@sumingZero
Contributor

Purpose

Fix a negative local_cache_hit calculation that can crash Prometheus counters.
Related vLLM issue: vllm-project/vllm#36755

When preemption occurs with async scheduling, there is a race condition:

  • schedule(N+1) can preempt a request and reset its state
  • update_from_output(N) then reads the already-mutated request state

As a result, num_external_computed_tokens can exceed num_cached_tokens plus the recomputed tokens, making local_cache_hit negative. Prometheus counters then crash with:

    ValueError: Counters can only be incremented by non-negative amounts.
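A rough illustration of the failure mode, with simplified stand-in variables (the real values live on vLLM's request and stats objects; the exact formula is an assumption inferred from the description above):

```python
# Illustrative numbers only. The point is the sign of the subtraction
# when the race occurs: the external-computed-token count was read after
# schedule(N+1) already mutated the request state.
num_cached_tokens = 100              # snapshot taken during update_from_output(N)
num_recomputed_tokens = 0
num_external_computed_tokens = 120   # already reflects the preemption

local_cache_hit = (num_cached_tokens + num_recomputed_tokens
                   - num_external_computed_tokens)
print(local_cache_hit)  # -20; incrementing a Prometheus Counter by this raises ValueError
```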

Modifications

  • Patch vllm.v1.metrics.stats.PromptTokenStats.update_from_output to wrap local_cache_hit calculation with max(0, ...)
  • Add v0180/vllm/pc/metrics/stats.py with the patched method
  • Add v0180/vllm/pc_patch.py to register the patch
  • Update apply_patch.py to support vLLM 0.18.0
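A minimal sketch of the clamp described above, using a simplified stand-in class (the real patched method lives in v0180/vllm/pc/metrics/stats.py, and the method signature and field names here are assumptions, not vLLM's actual API):

```python
class PromptTokenStats:
    """Simplified stand-in for vllm.v1.metrics.stats.PromptTokenStats."""

    def __init__(self):
        self.local_cache_hit = 0

    def update_from_output(self, num_cached_tokens, num_recomputed_tokens,
                           num_external_computed_tokens):
        # max(0, ...) guards against the race where the external computed
        # token count exceeds cached + recomputed tokens, which would make
        # the delta negative and crash the Prometheus counter increment.
        self.local_cache_hit += max(
            0,
            num_cached_tokens + num_recomputed_tokens
            - num_external_computed_tokens,
        )

# The patch module would then rebind the real method, roughly:
#   import vllm.v1.metrics.stats as stats
#   stats.PromptTokenStats.update_from_output = update_from_output
```

With the clamp in place, the racy snapshot from the example above contributes 0 instead of a negative increment, while normal updates are unaffected.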

@qyh111 qyh111 merged commit 3b26ce4 into ModelEngine-Group:develop Apr 29, 2026
20 of 22 checks passed