
[Core][Metrics] Expose scheduler queue pressure by waiting reason #25546

Open
mukeshbaphna wants to merge 2 commits into sgl-project:main from mukeshbaphna:mukesh/queue-pressure



mukeshbaphna commented May 17, 2026

Add a Prometheus gauge for normalized scheduler queue pressure so control planes can compare replicas with different capacities.

  • sglang:scheduler_queue_pressure with reason=capacity|deferred
  • normalizes queue depth by max_running_requests
  • surfaces both the main waiting queue and deferred/disaggregation queues
  • adds metrics coverage for the new series
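A minimal sketch of the normalization described above follows. It assumes pressure is simply queue depth divided by max_running_requests. The PR calls a helper named compute_normalized_queue_pressure, but its body is not shown here, so this implementation, the zero-capacity guard, and the queue_pressure_by_reason helper are illustrative assumptions.

```python
from typing import Dict


def compute_normalized_queue_pressure(num_queued_reqs: int, max_running_requests: int) -> float:
    """Queue depth expressed in units of the replica's running capacity.

    Assumed semantics: 1.0 means one full batch's worth of requests is waiting.
    """
    if max_running_requests <= 0:
        # Assumed guard: report no pressure instead of dividing by zero
        # when capacity is unknown or zero.
        return 0.0
    return num_queued_reqs / max_running_requests


def queue_pressure_by_reason(
    waiting_queue_reqs: int, deferred_queue_reqs: int, max_running_requests: int
) -> Dict[str, float]:
    # Hypothetical helper: one value per `reason` label of
    # sglang:scheduler_queue_pressure.
    return {
        "capacity": compute_normalized_queue_pressure(waiting_queue_reqs, max_running_requests),
        "deferred": compute_normalized_queue_pressure(deferred_queue_reqs, max_running_requests),
    }


# Same raw depth, different capacities: the smaller replica is far more saturated.
print(compute_normalized_queue_pressure(64, 128))  # replica with capacity 128
print(compute_normalized_queue_pressure(64, 32))   # replica with capacity 32
```

This is why a control plane can compare replicas directly: 64 waiting requests is half a batch on one replica but two full batches on the other. In an exporter, these values would typically be set on a gauge with a `reason` label, for example via prometheus_client's `Gauge(..., labelnames=["reason"])` and `.labels(reason=...).set(...)`.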

CI States

Latest PR Test (Base): ❌ Missing run-ci label — add it to run CI tests.
Latest PR Test (Extra): ❌ Blocked: run-ci is required first.

Contributor

gemini-code-assist (bot) left a comment


Code Review

This pull request introduces normalized scheduler queue pressure metrics to track both main waiting queue capacity and secondary deferred queues. It adds a new Prometheus gauge sglang:scheduler_queue_pressure with the reason label values 'capacity' and 'deferred', along with helper functions to compute and log these values across the scheduler. Feedback indicates that these metrics should also be updated within report_decode_stats to avoid stale data during decode-only periods, and suggests refactoring the conditional logic used to calculate deferred request counts for improved readability.

Comment on lines +622 to +626
self.stats.scheduler_queue_pressure_deferred = (
compute_normalized_queue_pressure(
deferred_queue_reqs, self.max_running_requests
)
)
Contributor


high

The new scheduler queue pressure metrics (scheduler_queue_pressure_capacity and scheduler_queue_pressure_deferred) are updated in report_prefill_stats and _maybe_log_idle_metrics, but they are missing from report_decode_stats (which starts around line 637). This will result in these Prometheus metrics remaining stale during periods where only decode iterations are running. Please ensure these stats are also computed and updated in report_decode_stats to maintain accuracy under load.
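One way to avoid this class of staleness is to route every reporting path through a single refresh helper. The sketch below is hypothetical: ToyScheduler, _refresh_queue_pressure, and the plain-list queues are stand-ins, and only the method and stats field names are taken from the review comment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SchedulerStats:
    # Field names follow the review comment; everything else is illustrative.
    scheduler_queue_pressure_capacity: float = 0.0
    scheduler_queue_pressure_deferred: float = 0.0


class ToyScheduler:
    """Hypothetical stand-in for the scheduler's metrics reporting paths."""

    def __init__(self, max_running_requests: int) -> None:
        self.max_running_requests = max_running_requests
        self.waiting_queue: List[str] = []
        self.deferred_queue: List[str] = []
        self.stats = SchedulerStats()

    def _pressure(self, num_reqs: int) -> float:
        if self.max_running_requests <= 0:
            return 0.0
        return num_reqs / self.max_running_requests

    def _refresh_queue_pressure(self) -> None:
        # Single place that recomputes both series, so every caller
        # publishes current values instead of the last prefill snapshot.
        self.stats.scheduler_queue_pressure_capacity = self._pressure(len(self.waiting_queue))
        self.stats.scheduler_queue_pressure_deferred = self._pressure(len(self.deferred_queue))

    def report_prefill_stats(self) -> None:
        self._refresh_queue_pressure()

    def report_decode_stats(self) -> None:
        # The change requested in review: decode-only iterations refresh too.
        self._refresh_queue_pressure()

    def _maybe_log_idle_metrics(self) -> None:
        self._refresh_queue_pressure()
```

With this shape, a new reporting path only needs one call to stay consistent, rather than duplicating the pressure computation.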

Author


Fixed. Queue pressure now refreshes in report_decode_stats as well as the other scheduler paths, so the metric does not go stale during decode-only periods.

Comment on lines +558 to +562
if self.disaggregation_mode not in (
DisaggregationMode.PREFILL,
DisaggregationMode.DECODE,
):
deferred_queue_reqs = 0
Contributor


medium

The logic for determining deferred_queue_reqs uses multiple independent if blocks and a membership check. This can be refactored into an if/elif/else structure for better clarity and consistency with the implementation in report_prefill_stats. Additionally, initializing deferred_queue_reqs = 0 at the start of the block would simplify the logic further.
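The suggested shape might look like the following sketch. It is an assumption, not the PR's code: the DisaggregationMode enum here is a local stand-in for sglang's, and count_deferred_queue_reqs and its prefill_pending/decode_pending parameters are hypothetical, since the diff does not show which queues each mode counts as deferred.

```python
from enum import Enum


class DisaggregationMode(Enum):
    # Local stand-in for sglang's enum; only the modes referenced here.
    NULL = "null"
    PREFILL = "prefill"
    DECODE = "decode"


def count_deferred_queue_reqs(
    mode: DisaggregationMode,
    prefill_pending: int = 0,
    decode_pending: int = 0,
) -> int:
    """Hypothetical helper: which queue counts as 'deferred' per mode is assumed."""
    deferred_queue_reqs = 0  # initialized up front, as the review suggests
    if mode == DisaggregationMode.PREFILL:
        deferred_queue_reqs = prefill_pending
    elif mode == DisaggregationMode.DECODE:
        deferred_queue_reqs = decode_pending
    # Any other mode (non-disaggregated serving) keeps the default of 0.
    return deferred_queue_reqs
```

Initializing the count first makes the explicit `else` branch unnecessary, and the membership check on `(PREFILL, DECODE)` disappears.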

Author


Done. Deferred queue accounting was factored into a small helper for readability and reuse.


mukeshbaphna commented May 18, 2026

Addressed the review feedback in the latest push:

  • queue pressure now refreshes in report_decode_stats as well as the other scheduler paths, so the metric does not go stale during decode-only periods
  • deferred queue accounting was factored into a small helper for readability and reuse

Pushed in 645079e.
