Expose per-run cached prompt tokens metric by JannikSt · Pull Request #43 · PrimeIntellect-ai/router

JannikSt · 2026-06-11T21:14:34Z

Parses usage.prompt_tokens_details.cached_tokens from vLLM responses and emits it as a new per-run counter so the platform can price cached input separately.

New metric: vllm_router_run_cached_prompt_tokens_total{run_id}
Existing vllm_router_run_prompt_tokens_total is unchanged (still the total input, including cached tokens)
Field is optional in the parser, so non-vLLM upstreams continue to work

Note

Low Risk
Additive metrics and optional JSON parsing on existing usage extraction; no change to prompt-token totals or request counting semantics.

Overview
Adds per-run billing visibility for KV/prefix-cached prompt tokens from vLLM without changing how total prompt tokens are counted.

Upstream usage.prompt_tokens_details.cached_tokens is parsed (optional; defaults to 0 when missing) in usage_metrics and passed into RouterMetrics::record_run_usage. A new counter vllm_router_run_cached_prompt_tokens_total{run_id} records that subset only when it is > 0; vllm_router_run_prompt_tokens_total still reflects full input tokens. Tests cover present, absent, and empty prompt_tokens_details shapes.
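The optional-field handling and the "record only when > 0" rule described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the code in this PR: the `Usage`, `PromptTokensDetails`, and `RunCounters` types are hypothetical stand-ins, the `HashMap`s replace the real labeled Prometheus counters, and JSON deserialization is left out (the structs are assumed to be already parsed).

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the upstream `usage.prompt_tokens_details` object.
#[derive(Debug, Default, Clone, Copy)]
struct PromptTokensDetails {
    /// `None` when the upstream omits `cached_tokens`.
    cached_tokens: Option<u64>,
}

/// Hypothetical mirror of the upstream `usage` object (already parsed).
#[derive(Debug, Default, Clone, Copy)]
struct Usage {
    prompt_tokens: u64,
    /// `None` for upstreams that do not report token details at all.
    prompt_tokens_details: Option<PromptTokensDetails>,
}

impl Usage {
    /// Cached prompt tokens, defaulting to 0 when the field or its parent is missing.
    fn cached_tokens(&self) -> u64 {
        self.prompt_tokens_details
            .and_then(|d| d.cached_tokens)
            .unwrap_or(0)
    }
}

/// In-memory stand-in for the two per-run counters, keyed by `run_id`.
#[derive(Default)]
struct RunCounters {
    prompt_tokens_total: HashMap<String, u64>,
    cached_prompt_tokens_total: HashMap<String, u64>,
}

impl RunCounters {
    /// Always adds the full prompt token count; adds the cached subset only when it is > 0.
    fn record_run_usage(&mut self, run_id: &str, usage: &Usage) {
        *self
            .prompt_tokens_total
            .entry(run_id.to_string())
            .or_insert(0) += usage.prompt_tokens;

        let cached = usage.cached_tokens();
        if cached > 0 {
            *self
                .cached_prompt_tokens_total
                .entry(run_id.to_string())
                .or_insert(0) += cached;
        }
    }
}
```

Under these assumptions, a run whose responses never carry `cached_tokens` does not create a cached-token series at all, while its total prompt token count is still recorded as before.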

Reviewed by Cursor Bugbot for commit 33c2e41.

Note

Expose per-run cached prompt tokens metric from upstream KV/prefix cache

Adds a new vllm_router_run_cached_prompt_tokens_total counter in metrics.rs, incremented per run_id when cached tokens are present.
Parses prompt_tokens_details.cached_tokens from upstream usage responses in usage_metrics.rs and passes the value to RouterMetrics::record_run_usage.
Adds three unit tests covering presence, absence, and empty prompt_tokens_details cases.

Macroscope summarized 33c2e41.

Parse usage.prompt_tokens_details.cached_tokens from upstream vLLM responses and emit it as vllm_router_run_cached_prompt_tokens_total labeled by run_id. The existing prompt_tokens counter is unchanged (still reports total input tokens including cached); the new counter is purely additive so downstream billing can apply a separate price to the cached portion.
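Because the total counter still includes cached tokens, a downstream consumer would need to subtract the cached counter before applying two prices. A rough sketch of that arithmetic, assuming hypothetical per-token prices expressed in integer micro-units (the function name and the prices are illustrative and not part of this PR):

```rust
/// Input cost for one run, in hypothetical micro-units of currency.
///
/// `prompt_total` is the value of the total prompt token counter (cached included);
/// `cached_total` is the value of the new cached prompt token counter.
fn input_cost_micros(
    prompt_total: u64,
    cached_total: u64,
    full_price: u64,
    cached_price: u64,
) -> u64 {
    // Clamp defensively so a misreporting upstream cannot make the subtraction underflow.
    let cached = cached_total.min(prompt_total);
    let uncached = prompt_total - cached;
    uncached * full_price + cached * cached_price
}
```

For example, with 1000 total prompt tokens of which 400 were cached, a full price of 10 and a cached price of 2, the result is 600 * 10 + 400 * 2 = 6800 micro-units.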

macroscopeapp · 2026-06-11T21:16:15Z

Approvability

Verdict: Approved

Additive metrics change that exposes a new counter for cached prompt tokens. The implementation is backward-compatible (handles missing fields gracefully), well-tested, and doesn't alter existing billing or processing logic.

JannikSt · 2026-06-11T21:59:55Z

@codex review

chatgpt-codex-connector · 2026-06-11T22:02:24Z

Codex Review: Didn't find any major issues. 🎉

macroscopeapp Bot approved these changes Jun 11, 2026

JannikSt merged commit 045cba2 into main Jun 12, 2026
9 checks passed

JannikSt mentioned this pull request Jun 12, 2026

release: v0.1.28 #44

Merged

JannikSt merged 1 commit into main from improvement/cache-hit-metrics
