### Exposition

When running `llama-server` in router mode with the `--models-preset` flag, the `/metrics` endpoint currently requires specifying a model via query parameter (e.g., `/metrics?model=my-model`). This means metrics can only be retrieved for one model at a time.

### Proposal
Provide an aggregated metrics endpoint at `/metrics` (without the `model` parameter) that behaves like the current single-model endpoint but exports Prometheus metrics from all currently loaded models, using a `model` label to differentiate them.

Example output:
### Motivation

An aggregated endpoint enables standard Prometheus queries across models, for example:

- `sum(rate(llamacpp_tokens_predicted_total[5m])) by (model)` - throughput per model
- `sum(llamacpp_requests_processing)` - total active requests across all models

### Possible Implementation
The router already tracks loaded models and their ports in `server_models`. A possible implementation (see the sketch after this list):

- When `/metrics` is called without a `model` parameter in router mode, the router iterates over all loaded model instances
- For each instance, it collects that instance's metrics and appends a `model="<model-name>"` label to each metric
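A minimal sketch of that aggregation step, assuming the router reuses cpp-httplib (which `llama-server` already uses) to reach each instance. `server_models` is modelled here as a simple name-to-port map, and `add_model_label` / `aggregate_metrics` are hypothetical names for illustration, not functions from the llama.cpp codebase:

```cpp
#include <map>
#include <sstream>
#include <string>

#include "httplib.h"

// Insert a model="<name>" label into one line of Prometheus text exposition.
// Handles both unlabelled ("metric 42") and labelled ("metric{a="b"} 42") samples.
static std::string add_model_label(const std::string & line, const std::string & model) {
    if (line.empty() || line[0] == '#') {
        return line; // pass HELP/TYPE comment lines through unchanged
    }
    const std::string label = "model=\"" + model + "\"";
    const size_t brace = line.find('{');
    if (brace != std::string::npos) {
        // existing label set: metric{a="b"} ... -> metric{model="...",a="b"} ...
        const std::string sep = (brace + 1 < line.size() && line[brace + 1] == '}') ? "" : ",";
        return line.substr(0, brace + 1) + label + sep + line.substr(brace + 1);
    }
    const size_t space = line.find(' ');
    if (space == std::string::npos) {
        return line; // unexpected shape, pass through rather than corrupt it
    }
    // no label set: metric 42 -> metric{model="..."} 42
    return line.substr(0, space) + "{" + label + "}" + line.substr(space);
}

// Scrape /metrics from every loaded instance and concatenate the labelled output.
std::string aggregate_metrics(const std::map<std::string, int> & server_models) {
    std::string out;
    for (const auto & [model, port] : server_models) {
        httplib::Client cli("localhost", port);
        auto res = cli.Get("/metrics");
        if (!res || res->status != 200) {
            continue; // skip unreachable instances instead of failing the whole scrape
        }
        std::istringstream body(res->body);
        std::string line;
        while (std::getline(body, line)) {
            out += add_model_label(line, model);
            out += '\n';
        }
    }
    return out;
}
```

One wrinkle this sketch ignores: concatenating per-instance output repeats the `# HELP`/`# TYPE` comment lines for each model, so a real implementation would likely merge samples per metric family before emitting them.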
### Considerations

- The router could additionally expose its own router-level metrics (e.g., `llamacpp_router_models_loaded`, `llamacpp_router_requests_total`)

If the proposal is acceptable, I could take a stab at the implementation.