### Exposition

When running `llama-server` in router mode with the `--models-preset` flag, the `/metrics` endpoint currently requires specifying a model via query parameter (e.g., `/metrics?model=my-model`). This means metrics can only be retrieved for one model at a time.

### Proposal
Provide an aggregated metrics endpoint at `/metrics` (without the `model` parameter) that behaves like the current single-model endpoint but exports Prometheus metrics from all currently loaded models, using a `model` label to differentiate them.

Example output:
### Motivation

An aggregated endpoint enables standard Prometheus queries across models, for example:

- `sum(rate(llamacpp_tokens_predicted_total[5m])) by (model)` - throughput per model
- `sum(llamacpp_requests_processing)` - total active requests across all models

### Possible Implementation
The router already tracks loaded models and their ports in `server_models`. A possible implementation (see the sketch after this list):

- When `/metrics` is called without a `model` parameter in router mode, the router iterates over all loaded model instances
- For each instance, it collects that instance's metrics and appends a `model="<model-name>"` label to each metric
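A minimal sketch of that aggregation step, assuming the router reuses cpp-httplib (which `llama-server` already uses) to reach each instance. `server_models` is modelled here as a simple name-to-port map, and `add_model_label` / `aggregate_metrics` are hypothetical names for illustration, not functions from the llama.cpp codebase:

```cpp
#include <map>
#include <sstream>
#include <string>

#include "httplib.h"

// Insert a model="<name>" label into one line of Prometheus text exposition.
// Handles both unlabelled ("metric 42") and labelled ("metric{a="b"} 42") samples.
static std::string add_model_label(const std::string & line, const std::string & model) {
    if (line.empty() || line[0] == '#') {
        return line; // pass HELP/TYPE comment lines through unchanged
    }
    const std::string label = "model=\"" + model + "\"";
    const size_t brace = line.find('{');
    if (brace != std::string::npos) {
        // existing label set: metric{a="b"} ... -> metric{model="...",a="b"} ...
        const std::string sep = (brace + 1 < line.size() && line[brace + 1] == '}') ? "" : ",";
        return line.substr(0, brace + 1) + label + sep + line.substr(brace + 1);
    }
    const size_t space = line.find(' ');
    if (space == std::string::npos) {
        return line; // unexpected shape, pass through rather than corrupt it
    }
    // no label set: metric 42 -> metric{model="..."} 42
    return line.substr(0, space) + "{" + label + "}" + line.substr(space);
}

// Scrape /metrics from every loaded instance and concatenate the labelled output.
std::string aggregate_metrics(const std::map<std::string, int> & server_models) {
    std::string out;
    for (const auto & [model, port] : server_models) {
        httplib::Client cli("localhost", port);
        auto res = cli.Get("/metrics");
        if (!res || res->status != 200) {
            continue; // skip unreachable instances instead of failing the whole scrape
        }
        std::istringstream body(res->body);
        std::string line;
        while (std::getline(body, line)) {
            out += add_model_label(line, model);
            out += '\n';
        }
    }
    return out;
}
```

One wrinkle this sketch ignores: concatenating per-instance output repeats the `# HELP`/`# TYPE` comment lines for each model, so a real implementation would likely merge samples per metric family before emitting them.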
### Considerations

- The router could additionally expose its own router-level metrics (e.g., `llamacpp_router_models_loaded`, `llamacpp_router_requests_total`)

If the proposal is acceptable, I could take a stab at the implementation.