[Core][Metrics] Expose scheduler queue pressure by waiting reason#25546
[Core][Metrics] Expose scheduler queue pressure by waiting reason#25546mukeshbaphna wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces normalized scheduler queue pressure metrics to track both main waiting queue capacity and secondary deferred queues. It adds a new Prometheus gauge sglang:scheduler_queue_pressure with 'capacity' and 'deferred' labels, along with helper functions to compute and log these values across the scheduler. Feedback indicates that these metrics should also be updated within report_decode_stats to avoid stale data during decode-only periods, and suggests refactoring the conditional logic used to calculate deferred request counts for improved readability.
| self.stats.scheduler_queue_pressure_deferred = ( | ||
| compute_normalized_queue_pressure( | ||
| deferred_queue_reqs, self.max_running_requests | ||
| ) | ||
| ) |
There was a problem hiding this comment.
The new scheduler queue pressure metrics (scheduler_queue_pressure_capacity and scheduler_queue_pressure_deferred) are updated in report_prefill_stats and _maybe_log_idle_metrics, but they are missing from report_decode_stats (which starts around line 637). This will result in these Prometheus metrics remaining stale during periods where only decode iterations are running. Please ensure these stats are also computed and updated in report_decode_stats to maintain accuracy under load.
There was a problem hiding this comment.
Fixed. Queue pressure now refreshes in report_decode_stats as well as the other scheduler paths, so the metric does not go stale during decode-only periods
| if self.disaggregation_mode not in ( | ||
| DisaggregationMode.PREFILL, | ||
| DisaggregationMode.DECODE, | ||
| ): | ||
| deferred_queue_reqs = 0 |
There was a problem hiding this comment.
The logic for determining deferred_queue_reqs uses multiple independent if blocks and a membership check. This can be refactored into an if/elif/else structure for better clarity and consistency with the implementation in report_prefill_stats. Additionally, initializing deferred_queue_reqs = 0 at the start of the block would simplify the logic further.
There was a problem hiding this comment.
Done, deferred queue accounting was factored into a small helper for readability and reuse.
|
Addressed the review feedback in the latest push:
Pushed in 645079e. |
Add a Prometheus gauge for normalized scheduler queue pressure so control planes can compare replicas with different capacities.
CI States
Latest PR Test (Base): ❌ Missing
run-cilabel — add it to run CI tests.Latest PR Test (Extra): ❌ Blocked —
run-ciis required first.