Log per-server inference metrics by samsja · Pull Request #2650 · PrimeIntellect-ai/prime-rl

samsja · 2026-05-27T03:09:25Z

Summary

add per-server inference metric scopes under inference/server/<server>/...
add direct server liveness metrics via .../up
add KV cache hit aliases and remaining KV cache metrics
keep aggregate and P/D role-aware metrics intact

Testing

/shared/research-prod/prime-rl/.venv/bin/ruff check src/prime_rl/orchestrator/inference_metrics.py tests/unit/orchestrator/test_inference_metrics.py
PYTHONPATH=src:packages/prime-rl-configs/src /shared/research-prod/prime-rl/.venv/bin/python -m pytest tests/unit/orchestrator/test_inference_metrics.py

Note: uv run was not usable in the standalone /tmp worktree because workspace submodules were not initialized there, so I used the existing repo venv against the /tmp worktree.

Note

Low Risk
Observability-only changes to metrics naming and W&B logging; no changes to training, auth, or request handling paths.

Overview
Extends the orchestrator’s W&B inference metrics so you can see each vLLM admin endpoint separately, without changing existing aggregate or prefill/decode role scopes.

Per-server scopes are emitted under inference_debug/server_XX_<host>_<port>/... (stable names from client URL + index). inference_debug/<server>/up tracks liveness for every configured server; failed polls still log up and drop smoothed per-server series so stale values don’t linger. build_scope_metrics now accepts a namespace (default inference) so the same metric builder drives both global and per-server keys.

Adds KV cache aliases on the logged dict: prefix hit rates mirror as kv_cache_hit_rate / cpu_kv_cache_hit_rate, and usage mean/max map to remaining cache metrics (kv_cache_left_perc_*, cpu_kv_cache_left_perc_*).

^{Reviewed by Cursor Bugbot for commit 62512a1. Bugbot is set up for automated code reviews on this repo. Configure here.}

S1ro1 · 2026-05-31T10:31:19Z

Can we move this to a separate wandb section? This pollutes the metrics seen in inference quite a lot, would rather make this a separate debug section or something like taht

Per S1ro1 review feedback: keeps the inference/ W&B section uncluttered by moving per-server scopes (and the .../up liveness gauge) under inference_debug/server/<name>/... instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 852015d. Configure here.}

cursor · 2026-06-01T22:07:17Z

+                metrics[f"{prefix}cpu_kv_cache_left_perc_mean"] = cls.cache_left(value)
+            elif key.endswith("/cpu_kv_cache_usage_max"):
+                prefix = key.removesuffix("cpu_kv_cache_usage_max")
+                metrics[f"{prefix}cpu_kv_cache_left_perc_min"] = cls.cache_left(value)


Unreachable elif branches in cache alias method

Medium Severity

Three elif branches in add_cache_alias_metrics are unreachable dead code. The cpu_-prefixed metric keys (e.g. inference/agg/cpu_prefix_cache_hit_rate) are always caught by earlier branches because "/cpu_prefix_cache_hit_rate".endswith("/prefix_cache_hit_rate") is True. Branches checking for /cpu_prefix_cache_hit_rate, /cpu_kv_cache_usage_mean, and /cpu_kv_cache_usage_max can never execute. The output happens to be correct today because removesuffix with the shorter suffix preserves the cpu_ prefix, but this is coincidental and fragile.

^{Reviewed by Cursor Bugbot for commit 852015d. Configure here.}

cursor · 2026-06-01T22:07:17Z

+        """Do not keep logging stale per-server metrics when a server fails to respond."""
+        for key in list(smoothed_metrics):
+            if key.startswith("inference_debug/server/") and key not in current_metrics:
+                del smoothed_metrics[key]


Stale history contaminates metrics after server recovery

Low Severity

drop_stale_server_metrics removes stale per-server keys from smoothed_metrics but does not clear the corresponding deques in self.metric_history. When a downed server recovers, smooth_metrics averages the first new value with up to 19 stale pre-outage values still sitting in the deque, producing inaccurate smoothed output for up to 100 seconds. For example, a server that had 100 running requests before an outage and 5 after recovery would initially report ~95.

Additional Locations (1)

src/prime_rl/orchestrator/inference_metrics.py#L474-L479

^{Reviewed by Cursor Bugbot for commit 852015d. Configure here.}

S1ro1

lgtm

Drop the redundant `server/` segment so per-server metrics are logged under `inference_debug/<server>/...` instead of `inference_debug/server/<server>/...`.

mikasenghaas · 2026-06-01T23:48:18Z


    async def start(self):
        wandb.define_metric("inference/*", step_metric="_timestamp")
+        wandb.define_metric("inference_debug/*", step_metric="_timestamp")


inference debug?

cursor Bot reviewed May 27, 2026

View reviewed changes

Comment thread src/prime_rl/orchestrator/inference_metrics.py

Log per-server inference metrics

6e1758e

samsja force-pushed the codex/per-server-inference-metrics branch from 78dd289 to 6e1758e Compare May 27, 2026 05:09

cursor Bot reviewed Jun 1, 2026

View reviewed changes

Merge branch 'main' into codex/per-server-inference-metrics

a91fa75

S1ro1 previously approved these changes Jun 1, 2026

View reviewed changes

Flatten per-server metrics path under inference_debug

62512a1

Drop the redundant `server/` segment so per-server metrics are logged under `inference_debug/<server>/...` instead of `inference_debug/server/<server>/...`.

samsja dismissed S1ro1’s stale review via 62512a1 June 1, 2026 23:06

mikasenghaas reviewed Jun 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Log per-server inference metrics#2650

Log per-server inference metrics#2650
samsja wants to merge 4 commits into
mainfrom
codex/per-server-inference-metrics

samsja commented May 27, 2026 •

edited by cursor Bot

Loading

Uh oh!

Uh oh!

S1ro1 commented May 31, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 1, 2026

Uh oh!

cursor Bot Jun 1, 2026

Uh oh!

S1ro1 left a comment

Uh oh!

mikasenghaas Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

samsja commented May 27, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Uh oh!

Uh oh!

S1ro1 commented May 31, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 1, 2026

Choose a reason for hiding this comment

Unreachable elif branches in cache alias method

Uh oh!

cursor Bot Jun 1, 2026

Choose a reason for hiding this comment

Stale history contaminates metrics after server recovery

Uh oh!

S1ro1 left a comment

Choose a reason for hiding this comment

Uh oh!

mikasenghaas Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

samsja commented May 27, 2026 •

edited by cursor Bot

Loading