Problem Statement
Add a simple running-requests scorer to score models based on their current in-flight request count, ranking the least loaded model highest by assigning a normalized score in [0.0, 1.0]. The most loaded model receives a score of 0.0, and the least loaded receives a score of 1.0.
Proposed Solution
RunningRequestsScorer implements the Scorer interface embedding Plugin.
- The scorer reads the running-requests attribute from each model, populated by RunningRequestsExtractor in the datalayer.
- Models without a running-requests attribute are treated as idle (0 requests).
- If all models have the same count (including all zero), all receive a score of 1.0.
- If there is only one model, it receives a score of 1.0.
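
The rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: the Model type, its fields, and the ScoreByRunningRequests and intPtr helpers are hypothetical stand-ins, not the project's actual Scorer or Plugin API, and the linear min-max formula (max - count) / (max - min) is one way to satisfy the stated endpoints, not something the proposal specifies.

```go
package main

import "fmt"

// Model is a hypothetical stand-in for the project's model type.
type Model struct {
	Name string
	// RunningRequests is nil when the running-requests attribute is absent.
	RunningRequests *int
}

// ScoreByRunningRequests returns a score in [0.0, 1.0] keyed by model name.
// The least loaded model scores 1.0 and the most loaded scores 0.0.
func ScoreByRunningRequests(models []Model) map[string]float64 {
	scores := make(map[string]float64, len(models))
	if len(models) == 0 {
		return scores
	}

	// Collect counts, treating a missing attribute as idle (0 requests).
	counts := make([]int, len(models))
	minCount, maxCount := 0, 0
	for i, m := range models {
		if m.RunningRequests != nil {
			counts[i] = *m.RunningRequests
		}
		if i == 0 || counts[i] < minCount {
			minCount = counts[i]
		}
		if i == 0 || counts[i] > maxCount {
			maxCount = counts[i]
		}
	}

	for i, m := range models {
		// Equal counts (including a single model) all score 1.0.
		if maxCount == minCount {
			scores[m.Name] = 1.0
			continue
		}
		// Assumed linear normalization between the min and max counts.
		scores[m.Name] = float64(maxCount-counts[i]) / float64(maxCount-minCount)
	}
	return scores
}

// intPtr is a small helper for building example data.
func intPtr(v int) *int { return &v }

func main() {
	models := []Model{
		{Name: "a", RunningRequests: intPtr(8)},
		{Name: "b", RunningRequests: intPtr(2)},
		{Name: "c"}, // no attribute: treated as idle
	}
	fmt.Println(ScoreByRunningRequests(models))
}
```

With these inputs, model a (8 requests) scores 0.0, model b (2 requests) scores 0.75, and model c (no attribute, so 0 requests) scores 1.0.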
Alternatives Considered
No response
Willingness to Contribute
Yes, I can submit a PR
Additional Context
No response