2 | 2 |
3 | 3 | ## Prometheus |
4 | 4 |
5 | | -To collect and export fleet and run metrics to Prometheus, enable the |
| 5 | +To collect and export fleet, run, and server health metrics to Prometheus, enable the
6 | 6 | `DSTACK_ENABLE_PROMETHEUS_METRICS` environment variable and configure Prometheus to fetch metrics from |
7 | 7 | `<dstack server URL>/metrics`. |
8 | 8 |
@@ -140,3 +140,21 @@ telemetry, and more. |
140 | 140 | | `dstack_run_type` | *string* | Run configuration type | `task`, `dev-environment` | |
141 | 141 | | `dstack_backend` | *string* | Backend | `aws`, `runpod` | |
142 | 142 | | `dstack_gpu` | *string?* | GPU name | `H100` | |
| 143 | + |
| 144 | +### Server health metrics |
| 145 | + |
| 146 | +These operational metrics help monitor the health of the dstack server. Currently they cover only HTTP requests; more will be added later. |
| 147 | + |
| 148 | +=== "Metrics" |
| 149 | + | Name                                     | Type        | Description                      | Examples |
| 150 | + |------------------------------------------|-------------|----------------------------------|----------|
| 151 | + | `dstack_server_requests_total`           | *counter*   | Total number of HTTP requests    | `100.0`  |
| 152 | + | `dstack_server_request_duration_seconds` | *histogram* | HTTP request duration in seconds | `1.0`    |
| 153 | + |
| 154 | +=== "Labels" |
| 155 | + | Name           | Type      | Description      | Examples                      |
| 156 | + |----------------|-----------|------------------|-------------------------------|
| 157 | + | `method`       | *string*  | HTTP method      | `POST`                        |
| 158 | + | `endpoint`     | *string*  | Endpoint path    | `/api/project/main/repos/get` |
| 159 | + | `http_status`  | *string*  | HTTP status code | `200`                         |
| 160 | + | `project_name` | *string?* | Project name     | `main`                        |
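
To illustrate how the server health metrics and labels above could be consumed, here is a minimal Python sketch that parses Prometheus text-format output and sums `dstack_server_requests_total` by `http_status`. The metric and label names come from the tables in this diff; the sample payload, the `/api/example` path, and the helper names `parse_samples` and `requests_by_status` are hypothetical and only serve the illustration. It is a simplified reader, not a full implementation of the exposition format.

```python
import re
from collections import defaultdict

# Hypothetical sample of what `<dstack server URL>/metrics` might return.
# Label names follow the tables above; the values and the /api/example
# path are made up for this sketch.
SAMPLE = '''\
# TYPE dstack_server_requests_total counter
dstack_server_requests_total{method="POST",endpoint="/api/project/main/repos/get",http_status="200",project_name="main"} 100.0
dstack_server_requests_total{method="POST",endpoint="/api/project/main/repos/get",http_status="500",project_name="main"} 4.0
dstack_server_requests_total{method="GET",endpoint="/api/example",http_status="200"} 12.0
'''

# Simplified grammar: metric name, optional {labels}, then a single value.
# Lines carrying an optional trailing timestamp are not handled and are skipped.
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def requests_by_status(text):
    """Sum dstack_server_requests_total across all series, grouped by http_status."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "dstack_server_requests_total":
            totals[labels.get("http_status", "unknown")] += value
    return dict(totals)


totals = requests_by_status(SAMPLE)
print(totals)
```

The third sample line omits `project_name` because that label is optional (`string?`), so the sketch reads labels with `dict.get` instead of assuming every label is present. In a real deployment the same aggregation would more likely be done in Prometheus itself; the script only shows how the labels partition the counter.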