diff --git a/docs/source/user-guide/metrics/metrics.md b/docs/source/user-guide/metrics/metrics.md index 1784bae82..16a3d0b98 100644 --- a/docs/source/user-guide/metrics/metrics.md +++ b/docs/source/user-guide/metrics/metrics.md @@ -2,21 +2,24 @@ UCM exports metrics through the vLLM `/metrics` endpoint. The metrics are registered from `examples/metrics/metrics_configs.yaml`, accumulated inside UCM, -and exposed through `prometheus_client` in Prometheus multiprocess mode. +and fanned out to the enabled Python-side consumers. ## How Metrics Flow 1. `metrics_configs.yaml` defines counters, gauges, and histograms. -2. `PrometheusStatsLogger` creates matching `prometheus_client` metrics with - `model_name` and `worker_id` labels. -3. Histogram bucket boundaries are taken from the Python Prometheus histogram +2. The Python metrics dispatcher drains the C++ metrics snapshot once and fans + it out to the enabled `multiproc` and `vllm_connector` consumers. +3. `multiproc` creates `prometheus_client` metrics with `model_name` and + `worker_id` labels. `vllm_connector` creates vLLM KV connector metrics with + `model_name`, `engine`, and `worker_rank` labels. +4. Histogram bucket boundaries are taken from the Python Prometheus histogram and registered into the C++ metrics library. -4. UCM code calls `UpdateStats()` on the hot path. -5. The C++ metrics library records counter, gauge, and histogram bucket deltas in +5. UCM code calls `UpdateStats()` on the hot path. +6. The C++ metrics library records counter, gauge, and histogram bucket deltas in per-thread double buffers. -6. Every `log_interval` seconds, the observability thread calls - `get_all_stats_and_clear()` and applies the deltas to `prometheus_client`. -7. vLLM exposes the resulting cumulative Prometheus series through `/metrics`. +7. The dispatcher applies deltas to each enabled Python consumer without one + consumer clearing the other's accumulated snapshot. +8. 
vLLM exposes the resulting cumulative Prometheus series through `/metrics`. Histograms are bucketed at update time. UCM no longer stores raw histogram sample vectors, so there is no `histogram_max_length` setting and no histogram @@ -83,14 +86,14 @@ vllm bench serve \ --ignore-eos ``` -Check that UCM metrics are present: +Check that UCM vLLM connector metrics are present: ```bash -curl http://:8000/metrics | grep ucm: +curl http://:8000/metrics | grep 'ucm:' ``` -Prometheus multiprocess `.db` files should also appear in -`$PROMETHEUS_MULTIPROC_DIR`. +If the `multiproc` consumer is enabled, Prometheus multiprocess `.db` files +should also appear in `$PROMETHEUS_MULTIPROC_DIR`. ### 2. Start Prometheus and Grafana @@ -158,16 +161,16 @@ dashboards while preserving the time range and `model_name` value. Each dashboard has a `job` selector. It defaults to **All** and uses regex matching, so dashboards also work for metrics that do not carry a `job` label. -The UCM dashboards also have a `View` selector and a `worker_id` selector: +The UCM dashboards also have a `View` selector and a `worker_rank` selector: - **Aggregated**: default service-level view. Worker labels are collapsed. -- **Per Worker**: split panels by `worker_id` for worker-specific diagnosis. -- **worker_id**: defaults to **All**. Select a specific worker ID to filter all +- **Per Worker**: split panels by `worker_rank` for worker-specific diagnosis. +- **worker_rank**: defaults to **All**. Select a specific worker rank to filter all UCM panels to that worker only. Heatmap panels and panels grouped by another dimension may ignore the `View` selector because their grouping is already defined by the panel. They still use -the `worker_id` filter. +the `worker_rank` filter. 
## Metrics Configuration @@ -175,8 +178,13 @@ Metrics are configured in `examples/metrics/metrics_configs.yaml`: ```yaml log_interval: 5 -multiproc_dir: "/vllm-workspace" -metric_prefix: "ucm:" +# multiproc_dir: "/vllm-workspace" +# multiproc_prefix: "ucm_multiproc:" +vllm_connector_prefix: "ucm:" + +consumers: + # multiproc: true + vllm_connector: true counter: - name: "cache_load_bytes_total" @@ -193,9 +201,12 @@ histogram: buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000] ``` -Metric names are exported with the configured prefix. For example, -`cache_load_duration_ms` becomes `ucm:cache_load_duration_ms`. Prometheus also -exports histogram helper series such as `_bucket`, `_sum`, and `_count`. +Metric names are exported per consumer. For example, `cache_load_duration_ms` +is exported as `ucm:cache_load_duration_ms` by the default `vllm_connector` +consumer. If `multiproc` is also enabled, use a separate prefix such as +`ucm_multiproc:` so both consumers do not register the same Prometheus metric. +Prometheus also exports histogram helper series such as `_bucket`, `_sum`, +and `_count`. Counter values are increments. Gauge values replace the current value. Histogram values are observations that are immediately assigned to configured @@ -203,7 +214,9 @@ buckets in the C++ metrics library. ## Available Metrics -The default metrics configuration contains the following UCM metrics. +The default metrics configuration contains the following UCM metric names. The +table uses the default `vllm_connector_prefix`. UCM duration metrics are +exported in milliseconds. ### Counters @@ -240,23 +253,28 @@ The default metrics configuration contains the following UCM metrics. | `ucm:load_speed` | Speed of loading from UCM in GB/s. | | `ucm:save_requests_num` | Number of requests saved to UCM. | | `ucm:save_blocks_num` | Number of blocks saved to UCM. | -| `ucm:save_duration` | Time to save to UCM in milliseconds. 
| -| `ucm:save_speed` | Speed of saving to UCM in GB/s. | +| `ucm:save_duration` | Time from UCM connector `wait_for_save` entry to async dump task completion in milliseconds. | +| `ucm:save_completion_wait_duration` | Time spent blocked while confirming async UCM connector dump completion in milliseconds. | | `ucm:interval_lookup_hit_rates` | Hit rates of UCM lookup requests. | | `ucm:cache_lookup_duration_ms` | Cache buffer lookup wall-clock time. | | `ucm:cache_lookup_backend_duration_ms` | Backend lookup wall-clock time when descending due to no buffer or buffer miss. | | `ucm:cache_load_duration_ms` | End-to-end Cache stage load task duration in milliseconds. | | `ucm:cache_dump_duration_ms` | End-to-end Cache stage dump task duration in milliseconds. | -| `ucm:cache_load_bandwidth_gbps` | Cache stage effective load bandwidth in GB/s. | -| `ucm:cache_dump_bandwidth_gbps` | Cache stage effective dump bandwidth in GB/s. | +| `ucm:cache_load_bandwidth_gbps` | Cache stage effective load throughput in GB/s over the whole task lifetime, including queue/backend waits. Not a DMA bandwidth (see `cache_h2d_bandwidth_gbps`). | +| `ucm:cache_dump_bandwidth_gbps` | Cache stage effective dump throughput in GB/s over the whole task lifetime, including queue and compute-event waits. Not a DMA bandwidth (see `cache_d2h_bandwidth_gbps`). | | `ucm:cache_load_queue_wait_duration_ms` | Time a Cache load task spent queued before dispatch worker pickup. | | `ucm:cache_dump_queue_wait_duration_ms` | Time a Cache dump task spent queued before dispatch worker pickup. | -| `ucm:cache_load_dispatch_duration_ms` | Cache load dispatch cost: buffer allocation plus backend submission. | +| `ucm:cache_load_backend_submit_duration_ms` | Cache load backend submit duration: buffer allocation plus backend load submission. | | `ucm:cache_shard_backend_wait_ms` | Cache load per-shard `WaitBackendTaskReady()` duration. | -| `ucm:cache_shard_h2d_ms` | Cache load per-shard H2D async submit duration. 
| -| `ucm:cache_dump_mkbuf_duration_ms` | Cache dump mk_buf phase duration. | -| `ucm:cache_d2h_duration_ms` | Cache dump D2H stream sync phase duration. | +| `ucm:cache_h2d_submit_ms` | Cache load per-shard H2D async submit CPU cost. Submission only, not the transfer. | +| `ucm:cache_h2d_sync_ms` | Cache load residual H2D stream drain after the last shard submit. Large values mean H2D copy is the bottleneck. | +| `ucm:cache_h2d_bandwidth_gbps` | Cache load pure H2D copy bandwidth, directly comparable to memcpy microbenchmarks. | +| `ucm:cache_dump_mkbuf_duration_ms` | Cache dump mk_buf phase duration (buffer allocation/reuse plus D2H async submit). | +| `ucm:cache_dump_prereq_wait_ms` | Cache dump wait for the prerequisite compute event before D2H can start. Large values mean dump is compute-gated. | +| `ucm:cache_d2h_duration_ms` | Cache dump pure D2H copy drain, compute-event wait excluded. | +| `ucm:cache_d2h_bandwidth_gbps` | Cache dump pure D2H copy bandwidth, directly comparable to memcpy microbenchmarks. | | `ucm:cache_dump_backend_submit_duration_ms` | Cache dump synchronous backend submit duration. | +| `ucm:cache_dump_backend_wait_duration_ms` | Cache dump wait for the lower tier to finish writing. Large values mean storage write is the bottleneck. | | `ucm:posix_load_task_duration_ms` | End-to-end Posix load task duration. | | `ucm:posix_dump_task_duration_ms` | End-to-end Posix dump task duration. | | `ucm:posix_s2h_bandwidth_gbps` | Posix stage read bandwidth per task in GB/s. | @@ -264,6 +282,14 @@ The default metrics configuration contains the following UCM metrics. | `ucm:posix_load_queue_wait_duration_ms` | Time a Posix load task spent queued before first worker pickup. | | `ucm:posix_dump_queue_wait_duration_ms` | Time a Posix dump task spent queued before first worker pickup. | | `ucm:layerwise_batch_total_ms` | Layerwise batch wall-clock time from `start_load_kv()` entry to `wait_for_save()` return. 
| +| `ucm:layerwise_batch_total_load_only_ms` | Layerwise load-only batch wall-clock time. | +| `ucm:layerwise_batch_total_save_only_ms` | Layerwise save-only batch wall-clock time. | +| `ucm:layerwise_batch_total_load_save_ms` | Layerwise load-and-save batch wall-clock time. | +| `ucm:layerwise_batch_total_no_transfer_ms` | Layerwise batch wall-clock time with neither load nor save work. | +| `ucm:layerwise_batch_load_wait_total_load_only_ms` | Total `wait_for_layer_load()` blocking time accumulated within one load-only layerwise batch. | +| `ucm:layerwise_batch_load_wait_total_load_save_ms` | Total `wait_for_layer_load()` blocking time accumulated within one load-and-save layerwise batch. | +| `ucm:layerwise_batch_save_tail_save_only_ms` | `wait_for_save()` tail duration within one save-only layerwise batch. | +| `ucm:layerwise_batch_save_tail_load_save_ms` | `wait_for_save()` tail duration within one load-and-save layerwise batch. | | `ucm:layerwise_wait_blocking_ms` | Time `wait_for_layer_load()` blocked before returning. | | `ucm:layerwise_wait_tasks_count` | Number of per-request load tasks awaited in a single layer wait. | | `ucm:layerwise_inter_wait_interval_ms` | Interval between consecutive `wait_for_layer_load()` calls. 
| diff --git a/examples/metrics/grafana_connector.json b/examples/metrics/grafana_connector.json index 6fec9b972..a79441a37 100644 --- a/examples/metrics/grafana_connector.json +++ b/examples/metrics/grafana_connector.json @@ -109,7 +109,6 @@ "value": "model_name" }, "hide": 0, - "includeAll": false, "label": "View", "multi": false, "name": "perWorker", @@ -122,10 +121,10 @@ { "selected": false, "text": "Per Worker", - "value": "model_name, worker_id" + "value": "model_name, engine, worker_rank" } ], - "query": "Aggregated : model_name,Per Worker : model_name\\, worker_id", + "query": "Aggregated : model_name,Per Worker : model_name\\, engine\\, worker_rank", "queryValue": "", "skipUrlSync": false, "type": "custom" @@ -141,15 +140,43 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "definition": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, worker_id)", + "definition": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, engine)", + "hide": 0, + "includeAll": true, + "label": "engine", + "multi": false, + "name": "engine", + "options": [], + "query": { + "query": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, engine)", + "refId": "StandardVariableQuery" + }, + "refresh": 1, + "regex": "", + "skipUrlSync": false, + "sort": 2, + "type": "query" + }, + { + "allValue": ".*", + "current": { + "selected": true, + "text": "All", + "value": "$__all" + }, + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "definition": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\", engine=~\"$engine\"}, worker_rank)", "hide": 0, "includeAll": true, - "label": "worker_id", + "label": "worker_rank", "multi": false, - "name": "worker_id", + "name": "worker_rank", "options": [], "query": { - "query": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, worker_id)", + "query": 
"label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\", engine=~\"$engine\"}, worker_rank)", "refId": "StandardVariableQuery" }, "refresh": 1, @@ -168,21 +195,21 @@ "timezone": "", "weekStart": "", "refresh": "", - "title": "vLLM - UCM Connector", - "uid": "ucm-connector-overview", + "title": "vLLM - UCM Connector (vLLM Metrics)", + "uid": "ucm-vllm-connector-overview", "version": 1, "id": null, "tags": [ - "ucm", + "ucm-vllm-connector-metrics", "connector" ], - "description": "UCM observability dashboard - connector module. Part of a set; the other UCM dashboards (tag: ucm) are linked from the header.", + "description": "UCM observability dashboard - connector module. Part of a set; the other UCM dashboards (tag: ucm-vllm-connector-metrics) are linked from the header.", "links": [ { "type": "dashboards", - "title": "Other UCM dashboards", + "title": "Other UCM vLLM metrics dashboards", "tags": [ - "ucm" + "ucm-vllm-connector-metrics" ], "asDropdown": true, "includeVars": true, @@ -197,7 +224,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Hit rates of ucm lookup requests", + "description": "External KV connector prefix-cache hit rate from vLLM built-in counters.", "fieldConfig": { "defaults": { "color": { @@ -285,23 +312,36 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:interval_lookup_hit_rates_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:interval_lookup_hit_rates_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (model_name) (rate(vllm:external_prefix_cache_hits_total{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\"}[$__rate_interval]))\n/\nclamp_min(sum by (model_name) (rate(vllm:external_prefix_cache_queries_total{model_name=\"$model_name\", job=~\"$job\", 
engine=~\"$engine\"}[$__rate_interval])), 1)", "hide": false, "instant": false, - "legendFormat": "{{worker_id}}", + "legendFormat": "Connector Prefix Cache", "range": true, "refId": "A" } ], - "title": "Connector Interval Lookup Hit Rates", + "title": "Connector Prefix Cache Hit Rate", "type": "timeseries" }, + { + "collapsed": false, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 8 + }, + "id": 14, + "panels": [], + "title": "Direct Connector", + "type": "row" + }, { "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "How many start_load_kv calls have data to load, per second. Always defined when loads are happening; no 0/0 NaN gaps.", + "description": "Wall-clock per-worker load throughput in GB/s, summed from load_bytes_total over time. Use engine and worker_rank filters to inspect specific DP engines or workers. UNLIKE the (per task) panel above, this view does correctly sum across workers.", "fieldConfig": { "defaults": { "color": { @@ -345,20 +385,20 @@ "mode": "absolute", "steps": [ { - "color": "green", + "color": "red", "value": null }, { "color": "yellow", - "value": 5000 + "value": 0.005 }, { - "color": "red", - "value": 10000 + "color": "green", + "value": 0.01 } ] }, - "unit": "ops" + "unit": "gbytes" }, "overrides": [] }, @@ -366,15 +406,12 @@ "h": 8, "w": 12, "x": 0, - "y": 8 + "y": 9 }, - "id": 2, + "id": 12, "options": { "legend": { - "calcs": [ - "lastNotNull", - "max" - ], + "calcs": [], "displayMode": "list", "maxHeight": "50%", "placement": "bottom", @@ -392,13 +429,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:load_requests_num_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "{{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:load_bytes_total{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])) / 1e9", + 
"legendFormat": "{{engine}} {{worker_rank}}", "range": true, "refId": "A" } ], - "title": "Connector Load Requests Rate", + "title": "Connector Load Bandwidth (aggregated)", "type": "timeseries" }, { @@ -406,7 +443,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "How many wait_for_save calls have data to save, per second.", + "description": "Wall-clock per-worker dump throughput in GB/s, summed from dump_bytes_total over time. Use engine and worker_rank filters to inspect specific DP engines or workers. UNLIKE the (per task) panel above, this view does correctly sum across workers.", "fieldConfig": { "defaults": { "color": { @@ -450,20 +487,20 @@ "mode": "absolute", "steps": [ { - "color": "green", + "color": "red", "value": null }, { "color": "yellow", - "value": 5000 + "value": 0.005 }, { - "color": "red", - "value": 10000 + "color": "green", + "value": 0.01 } ] }, - "unit": "ops" + "unit": "gbytes" }, "overrides": [] }, @@ -471,15 +508,12 @@ "h": 8, "w": 12, "x": 12, - "y": 8 + "y": 9 }, - "id": 3, + "id": 13, "options": { "legend": { - "calcs": [ - "lastNotNull", - "max" - ], + "calcs": [], "displayMode": "list", "maxHeight": "50%", "placement": "bottom", @@ -492,847 +526,19 @@ }, "targets": [ { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:save_requests_num_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "{{worker_id}}", - "range": true, - "refId": "A" - } - ], - "title": "Connector Save Requests Rate", - "type": "timeseries" - }, - { - "type": "timeseries", - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "id": 4, - "title": "Connector Load Requests Size Distribution", - "description": "p50 / p90 / p99 / avg number of requests per start_load_kv batch. 
Quiet windows naturally show no data.", - "gridPos": { - "h": 8, - "w": 12, - "x": 0, - "y": 16 - }, - "targets": [ - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:load_requests_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", - "range": true, - "refId": "A" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:load_requests_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", - "range": true, - "refId": "B" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:load_requests_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", - "range": true, - "refId": "C" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:load_requests_num_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:load_requests_num_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", - "range": true, - "refId": "D" - } - ], - "options": { - "legend": { - "calcs": [ - "lastNotNull", - "max" - ], - "displayMode": "table", - "maxHeight": "50%", - "placement": "bottom", - "showLegend": true - }, - "tooltip": { - "mode": "multi", - "sort": "desc" - } - }, - "fieldConfig": { - "defaults": { - 
"color": { - "mode": "palette-classic" - }, - "custom": { - "axisBorderShow": false, - "axisCenteredZero": false, - "axisColorMode": "text", - "axisLabel": "", - "axisPlacement": "auto", - "barAlignment": 0, - "barWidthFactor": 0.6, - "drawStyle": "line", - "fillOpacity": 0, - "gradientMode": "none", - "hideFrom": { - "legend": false, - "tooltip": false, - "viz": false - }, - "insertNulls": false, - "lineInterpolation": "linear", - "lineWidth": 1, - "pointSize": 5, - "scaleDistribution": { - "type": "linear" - }, - "showPoints": "auto", - "spanNulls": 60000, - "stacking": { - "group": "A", - "mode": "none" - }, - "thresholdsStyle": { - "mode": "off" - } - }, - "mappings": [], - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "green", - "value": null - } - ] - }, - "unit": "" - }, - "overrides": [ - { - "matcher": { - "id": "byRegexp", - "options": "^p50.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "solid" - } - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^p90.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "dash", - "dash": [ - 10, - 6 - ] - } - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^p99.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "dash", - "dash": [ - 4, - 4 - ] - } - }, - { - "id": "custom.lineWidth", - "value": 2 - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^avg.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "solid" - } - }, - { - "id": "custom.lineWidth", - "value": 1 - }, - { - "id": "custom.fillOpacity", - "value": 5 - } - ] - } - ] - } - }, - { - "type": "timeseries", - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "id": 5, - "title": "Connector Load Blocks Size Distribution", - "description": "p50 / p90 / p99 / avg number of blocks per start_load_kv batch.", - "gridPos": { - "h": 8, - "w": 12, - "x": 12, - 
"y": 16 - }, - "targets": [ - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:load_blocks_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", - "range": true, - "refId": "A" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:load_blocks_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", - "range": true, - "refId": "B" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:load_blocks_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", - "range": true, - "refId": "C" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:load_blocks_num_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:load_blocks_num_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", - "range": true, - "refId": "D" - } - ], - "options": { - "legend": { - "calcs": [ - "lastNotNull", - "max" - ], - "displayMode": "table", - "maxHeight": "50%", - "placement": "bottom", - "showLegend": true - }, - "tooltip": { - "mode": "multi", - "sort": "desc" - } - }, - "fieldConfig": { - "defaults": { - "color": { - "mode": "palette-classic" - }, - "custom": { - "axisBorderShow": false, - 
"axisCenteredZero": false, - "axisColorMode": "text", - "axisLabel": "", - "axisPlacement": "auto", - "barAlignment": 0, - "barWidthFactor": 0.6, - "drawStyle": "line", - "fillOpacity": 0, - "gradientMode": "none", - "hideFrom": { - "legend": false, - "tooltip": false, - "viz": false - }, - "insertNulls": false, - "lineInterpolation": "linear", - "lineWidth": 1, - "pointSize": 5, - "scaleDistribution": { - "type": "linear" - }, - "showPoints": "auto", - "spanNulls": 60000, - "stacking": { - "group": "A", - "mode": "none" - }, - "thresholdsStyle": { - "mode": "off" - } - }, - "mappings": [], - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "green", - "value": null - } - ] - }, - "unit": "" - }, - "overrides": [ - { - "matcher": { - "id": "byRegexp", - "options": "^p50.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "solid" - } - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^p90.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "dash", - "dash": [ - 10, - 6 - ] - } - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^p99.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "dash", - "dash": [ - 4, - 4 - ] - } - }, - { - "id": "custom.lineWidth", - "value": 2 - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^avg.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "solid" - } - }, - { - "id": "custom.lineWidth", - "value": 1 - }, - { - "id": "custom.fillOpacity", - "value": 5 - } - ] - } - ] - } - }, - { - "type": "timeseries", - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "id": 6, - "title": "Connector Save Requests Size Distribution", - "description": "p50 / p90 / p99 / avg number of requests per wait_for_save batch.", - "gridPos": { - "h": 8, - "w": 12, - "x": 0, - "y": 24 - }, - "targets": [ - { - "datasource": { - "type": "prometheus", - "uid": 
"${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:save_requests_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", - "range": true, - "refId": "A" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:save_requests_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", - "range": true, - "refId": "B" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:save_requests_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", - "range": true, - "refId": "C" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:save_requests_num_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:save_requests_num_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", - "range": true, - "refId": "D" - } - ], - "options": { - "legend": { - "calcs": [ - "lastNotNull", - "max" - ], - "displayMode": "table", - "maxHeight": "50%", - "placement": "bottom", - "showLegend": true - }, - "tooltip": { - "mode": "multi", - "sort": "desc" - } - }, - "fieldConfig": { - "defaults": { - "color": { - "mode": "palette-classic" - }, - "custom": { - "axisBorderShow": false, - "axisCenteredZero": false, - "axisColorMode": "text", - "axisLabel": "", - 
"axisPlacement": "auto", - "barAlignment": 0, - "barWidthFactor": 0.6, - "drawStyle": "line", - "fillOpacity": 0, - "gradientMode": "none", - "hideFrom": { - "legend": false, - "tooltip": false, - "viz": false - }, - "insertNulls": false, - "lineInterpolation": "linear", - "lineWidth": 1, - "pointSize": 5, - "scaleDistribution": { - "type": "linear" - }, - "showPoints": "auto", - "spanNulls": 60000, - "stacking": { - "group": "A", - "mode": "none" - }, - "thresholdsStyle": { - "mode": "off" - } - }, - "mappings": [], - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "green", - "value": null - } - ] - }, - "unit": "" - }, - "overrides": [ - { - "matcher": { - "id": "byRegexp", - "options": "^p50.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "solid" - } - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^p90.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "dash", - "dash": [ - 10, - 6 - ] - } - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^p99.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "dash", - "dash": [ - 4, - 4 - ] - } - }, - { - "id": "custom.lineWidth", - "value": 2 - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^avg.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "solid" - } - }, - { - "id": "custom.lineWidth", - "value": 1 - }, - { - "id": "custom.fillOpacity", - "value": 5 - } - ] - } - ] - } - }, - { - "type": "timeseries", - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "id": 7, - "title": "Connector Save Blocks Size Distribution", - "description": "p50 / p90 / p99 / avg number of blocks per wait_for_save batch.", - "gridPos": { - "h": 8, - "w": 12, - "x": 12, - "y": 24 - }, - "targets": [ - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": 
"histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:save_blocks_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", - "range": true, - "refId": "A" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:save_blocks_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", - "range": true, - "refId": "B" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:save_blocks_num_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", - "range": true, - "refId": "C" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:save_blocks_num_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:save_blocks_num_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", - "range": true, - "refId": "D" - } - ], - "options": { - "legend": { - "calcs": [ - "lastNotNull", - "max" - ], - "displayMode": "table", - "maxHeight": "50%", - "placement": "bottom", - "showLegend": true - }, - "tooltip": { - "mode": "multi", - "sort": "desc" - } - }, - "fieldConfig": { - "defaults": { - "color": { - "mode": "palette-classic" - }, - "custom": { - "axisBorderShow": false, - "axisCenteredZero": false, - "axisColorMode": "text", - "axisLabel": "", - "axisPlacement": "auto", - "barAlignment": 0, - "barWidthFactor": 0.6, - 
"drawStyle": "line", - "fillOpacity": 0, - "gradientMode": "none", - "hideFrom": { - "legend": false, - "tooltip": false, - "viz": false - }, - "insertNulls": false, - "lineInterpolation": "linear", - "lineWidth": 1, - "pointSize": 5, - "scaleDistribution": { - "type": "linear" - }, - "showPoints": "auto", - "spanNulls": 60000, - "stacking": { - "group": "A", - "mode": "none" - }, - "thresholdsStyle": { - "mode": "off" - } - }, - "mappings": [], - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "green", - "value": null - } - ] - }, - "unit": "" - }, - "overrides": [ - { - "matcher": { - "id": "byRegexp", - "options": "^p50.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "solid" - } - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^p90.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "dash", - "dash": [ - 10, - 6 - ] - } - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^p99.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "dash", - "dash": [ - 4, - 4 - ] - } - }, - { - "id": "custom.lineWidth", - "value": 2 - } - ] + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" }, - { - "matcher": { - "id": "byRegexp", - "options": "^avg.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "solid" - } - }, - { - "id": "custom.lineWidth", - "value": 1 - }, - { - "id": "custom.fillOpacity", - "value": 5 - } - ] - } - ] - } + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:save_bytes_total{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])) / 1e9", + "legendFormat": "{{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + } + ], + "title": "Connector Dump Bandwidth (aggregated)", + "type": "timeseries" }, { "datasource": { @@ -1499,7 +705,7 @@ "h": 8, "w": 12, "x": 0, - "y": 32 + "y": 17 }, "id": 8, 
"options": { @@ -1525,8 +731,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:load_duration_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:load_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -1536,8 +742,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:load_duration_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:load_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -1547,8 +753,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.95, sum by (le, ${perWorker:raw}) (rate(ucm:load_duration_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p95 {{worker_id}}", + "expr": "histogram_quantile(0.95, sum by (le, ${perWorker:raw}) (rate(ucm:load_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p95 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -1558,8 +764,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) 
(rate(ucm:load_duration_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:load_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "D" }, @@ -1569,8 +775,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:load_duration_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:load_duration_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:load_duration_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:load_duration_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "E" } @@ -1583,7 +789,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "P50, P90, P95, P99 and Average load speed in GB/s for each start_load_kv. Per-call observed load speed in GB/s, sampled once per start_load_kv invocation (size_in_call / duration_of_call). Toggle View=Aggregated pools observations across workers into one distribution; the quantile is still 'typical single-call speed', NOT a sum. 
Use the (aggregated) panel below for true summed throughput across workers.", + "description": "P50, P90, P95, P99 and Average dump duration in milliseconds for each wait_for_save.", "fieldConfig": { "defaults": { "color": { @@ -1627,20 +833,20 @@ "mode": "absolute", "steps": [ { - "color": "red", + "color": "green", "value": null }, { "color": "yellow", - "value": 0.005 + "value": 8000 }, { - "color": "green", - "value": 0.01 + "color": "red", + "value": 15000 } ] }, - "unit": "gb/s" + "unit": "ms" }, "overrides": [ { @@ -1743,9 +949,9 @@ "h": 8, "w": 12, "x": 12, - "y": 32 + "y": 17 }, - "id": 9, + "id": 10, "options": { "legend": { "calcs": [ @@ -1769,8 +975,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:load_speed_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:save_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -1780,8 +986,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:load_speed_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:save_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -1791,8 +997,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.95, sum by (le, ${perWorker:raw}) 
(rate(ucm:load_speed_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p95 {{worker_id}}", + "expr": "histogram_quantile(0.95, sum by (le, ${perWorker:raw}) (rate(ucm:save_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p95 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -1802,8 +1008,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:load_speed_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:save_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "D" }, @@ -1813,13 +1019,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:load_speed_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:load_speed_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:save_duration_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:save_duration_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "E" } ], - "title": "Connector Load Speed (per task)", + "title": "Connector Dump Duration", 
"type": "timeseries" }, { @@ -1827,7 +1033,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "P50, P90, P95, P99 and Average save duration in milliseconds for each save_kv.", + "description": "P50, P90, P95, P99 and Average load speed in GB/s for each start_load_kv. Per-call observed load speed in GB/s, sampled once per start_load_kv invocation (size_in_call / duration_of_call). Use engine and worker_rank filters to inspect specific DP engines or workers.", "fieldConfig": { "defaults": { "color": { @@ -1871,20 +1077,20 @@ "mode": "absolute", "steps": [ { - "color": "green", + "color": "red", "value": null }, { "color": "yellow", - "value": 8000 + "value": 0.005 }, { - "color": "red", - "value": 15000 + "color": "green", + "value": 0.01 } ] }, - "unit": "ms" + "unit": "gb/s" }, "overrides": [ { @@ -1987,9 +1193,9 @@ "h": 8, "w": 12, "x": 0, - "y": 40 + "y": 25 }, - "id": 10, + "id": 9, "options": { "legend": { "calcs": [ @@ -2013,8 +1219,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:save_duration_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:load_speed_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -2024,8 +1230,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:save_duration_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:load_speed_bucket{model_name=\"$model_name\", job=~\"$job\", 
engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -2035,8 +1241,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.95, sum by (le, ${perWorker:raw}) (rate(ucm:save_duration_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p95 {{worker_id}}", + "expr": "histogram_quantile(0.95, sum by (le, ${perWorker:raw}) (rate(ucm:load_speed_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p95 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -2046,8 +1252,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:save_duration_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:load_speed_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "D" }, @@ -2057,13 +1263,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:save_duration_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:save_duration_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:load_speed_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) 
(rate(ucm:load_speed_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "E" } ], - "title": "Connector Save Duration", + "title": "Connector Load Speed (per task)", "type": "timeseries" }, { @@ -2071,7 +1277,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "P50, P90, P95, P99 and Average save speed in GB/s for each save_kv. Per-call observed save speed in GB/s, sampled once per wait_for_save invocation (size_in_call / duration_of_call). Toggle View=Aggregated pools observations across workers into one distribution; the quantile is still 'typical single-call speed', NOT a sum. Use the (aggregated) panel below for true summed throughput across workers.", + "description": "P50, P90, P95, P99 and Average blocking wait duration in milliseconds while confirming async UCM connector dump completion.", "fieldConfig": { "defaults": { "color": { @@ -2120,15 +1326,15 @@ }, { "color": "yellow", - "value": 0.004 + "value": 50 }, { "color": "green", - "value": 0.008 + "value": 100 } ] }, - "unit": "gb/s" + "unit": "ms" }, "overrides": [ { @@ -2231,7 +1437,7 @@ "h": 8, "w": 12, "x": 12, - "y": 40 + "y": 25 }, "id": 11, "options": { @@ -2257,8 +1463,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:save_speed_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:save_completion_wait_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -2268,8 +1474,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": 
"histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:save_speed_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:save_completion_wait_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -2279,8 +1485,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.95, sum by (le, ${perWorker:raw}) (rate(ucm:save_speed_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p95 {{worker_id}}", + "expr": "histogram_quantile(0.95, sum by (le, ${perWorker:raw}) (rate(ucm:save_completion_wait_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p95 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -2290,8 +1496,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:save_speed_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:save_completion_wait_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "D" }, @@ -2301,217 +1507,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:save_speed_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by 
(${perWorker:raw}) (rate(ucm:save_speed_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:save_completion_wait_duration_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:save_completion_wait_duration_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "E" } ], - "title": "Connector Save Speed (per task)", - "type": "timeseries" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "description": "Wall-clock per-worker load throughput in GB/s, summed from load_bytes_total over time. Toggle View=Aggregated to see the service-level sum; toggle View=Per Worker to break it down by worker_id. 
UNLIKE the (per task) panel above, this view does correctly sum across workers when Aggregated.", - "fieldConfig": { - "defaults": { - "color": { - "mode": "palette-classic" - }, - "custom": { - "axisBorderShow": false, - "axisCenteredZero": false, - "axisColorMode": "text", - "axisLabel": "", - "axisPlacement": "auto", - "barAlignment": 0, - "barWidthFactor": 0.6, - "drawStyle": "line", - "fillOpacity": 0, - "gradientMode": "none", - "hideFrom": { - "legend": false, - "tooltip": false, - "viz": false - }, - "insertNulls": false, - "lineInterpolation": "linear", - "lineWidth": 1, - "pointSize": 5, - "scaleDistribution": { - "type": "linear" - }, - "showPoints": "auto", - "spanNulls": 60000, - "stacking": { - "group": "A", - "mode": "none" - }, - "thresholdsStyle": { - "mode": "off" - } - }, - "mappings": [], - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "red", - "value": null - }, - { - "color": "yellow", - "value": 0.005 - }, - { - "color": "green", - "value": 0.01 - } - ] - }, - "unit": "gbytes" - }, - "overrides": [] - }, - "gridPos": { - "h": 8, - "w": 12, - "x": 0, - "y": 48 - }, - "id": 12, - "options": { - "legend": { - "calcs": [], - "displayMode": "list", - "maxHeight": "50%", - "placement": "bottom", - "showLegend": true - }, - "tooltip": { - "mode": "multi", - "sort": "desc" - } - }, - "targets": [ - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:load_bytes_total{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])) / 1e9", - "legendFormat": "{{worker_id}}", - "range": true, - "refId": "A" - } - ], - "title": "Connector Load Bandwidth (aggregated)", - "type": "timeseries" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "description": "Wall-clock per-worker save throughput in GB/s, summed from save_bytes_total over time. 
Toggle View=Aggregated to see the service-level sum; toggle View=Per Worker to break it down by worker_id. UNLIKE the (per task) panel above, this view does correctly sum across workers when Aggregated.", - "fieldConfig": { - "defaults": { - "color": { - "mode": "palette-classic" - }, - "custom": { - "axisBorderShow": false, - "axisCenteredZero": false, - "axisColorMode": "text", - "axisLabel": "", - "axisPlacement": "auto", - "barAlignment": 0, - "barWidthFactor": 0.6, - "drawStyle": "line", - "fillOpacity": 0, - "gradientMode": "none", - "hideFrom": { - "legend": false, - "tooltip": false, - "viz": false - }, - "insertNulls": false, - "lineInterpolation": "linear", - "lineWidth": 1, - "pointSize": 5, - "scaleDistribution": { - "type": "linear" - }, - "showPoints": "auto", - "spanNulls": 60000, - "stacking": { - "group": "A", - "mode": "none" - }, - "thresholdsStyle": { - "mode": "off" - } - }, - "mappings": [], - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "red", - "value": null - }, - { - "color": "yellow", - "value": 0.005 - }, - { - "color": "green", - "value": 0.01 - } - ] - }, - "unit": "gbytes" - }, - "overrides": [] - }, - "gridPos": { - "h": 8, - "w": 12, - "x": 12, - "y": 48 - }, - "id": 13, - "options": { - "legend": { - "calcs": [], - "displayMode": "list", - "maxHeight": "50%", - "placement": "bottom", - "showLegend": true - }, - "tooltip": { - "mode": "multi", - "sort": "desc" - } - }, - "targets": [ - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:save_bytes_total{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])) / 1e9", - "legendFormat": "{{worker_id}}", - "range": true, - "refId": "A" - } - ], - "title": "Connector Save Bandwidth (aggregated)", + "title": "Connector Dump Completion Wait Duration", "type": "timeseries" } ] diff --git a/examples/metrics/grafana_layerwise.json 
b/examples/metrics/grafana_layerwise.json index 12e91098f..b80164d17 100644 --- a/examples/metrics/grafana_layerwise.json +++ b/examples/metrics/grafana_layerwise.json @@ -109,7 +109,6 @@ "value": "model_name" }, "hide": 0, - "includeAll": false, "label": "View", "multi": false, "name": "perWorker", @@ -122,10 +121,10 @@ { "selected": false, "text": "Per Worker", - "value": "model_name, worker_id" + "value": "model_name, engine, worker_rank" } ], - "query": "Aggregated : model_name,Per Worker : model_name\\, worker_id", + "query": "Aggregated : model_name,Per Worker : model_name\\, engine\\, worker_rank", "queryValue": "", "skipUrlSync": false, "type": "custom" @@ -141,15 +140,43 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "definition": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, worker_id)", + "definition": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, engine)", + "hide": 0, + "includeAll": true, + "label": "engine", + "multi": false, + "name": "engine", + "options": [], + "query": { + "query": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, engine)", + "refId": "StandardVariableQuery" + }, + "refresh": 1, + "regex": "", + "skipUrlSync": false, + "sort": 2, + "type": "query" + }, + { + "allValue": ".*", + "current": { + "selected": true, + "text": "All", + "value": "$__all" + }, + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "definition": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\", engine=~\"$engine\"}, worker_rank)", "hide": 0, "includeAll": true, - "label": "worker_id", + "label": "worker_rank", "multi": false, - "name": "worker_id", + "name": "worker_rank", "options": [], "query": { - "query": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, worker_id)", + "query": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\", 
engine=~\"$engine\"}, worker_rank)", "refId": "StandardVariableQuery" }, "refresh": 1, @@ -168,21 +195,21 @@ "timezone": "", "weekStart": "", "refresh": "", - "title": "vLLM - UCM Layerwise", - "uid": "ucm-layerwise", + "title": "vLLM - UCM Layerwise (vLLM Metrics)", + "uid": "ucm-vllm-layerwise", "version": 1, "id": null, "tags": [ - "ucm", + "ucm-vllm-connector-metrics", "layerwise" ], - "description": "UCM observability dashboard - layerwise module. Part of a set; the other UCM dashboards (tag: ucm) are linked from the header.", + "description": "UCM observability dashboard - layerwise module. Part of a set; the other UCM dashboards (tag: ucm-vllm-connector-metrics) are linked from the header.", "links": [ { "type": "dashboards", - "title": "Other UCM dashboards", + "title": "Other UCM vLLM metrics dashboards", "tags": [ - "ucm" + "ucm-vllm-connector-metrics" ], "asDropdown": true, "includeVars": true, @@ -197,7 +224,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Wall-clock time from start_load_kv() entry to wait_for_save() return for one layerwise batch. Includes forward intervals, load waits, and save tail; not pure storage latency. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Per-second rate of layerwise batches split by whether the batch had load work, save work, or both. 
Use this before comparing averages across batch categories.", "fieldConfig": { "defaults": { "color": { @@ -212,7 +239,7 @@ "barAlignment": 0, "barWidthFactor": 0.6, "drawStyle": "line", - "fillOpacity": 0, + "fillOpacity": 20, "gradientMode": "none", "hideFrom": { "legend": false, @@ -230,7 +257,7 @@ "spanNulls": 60000, "stacking": { "group": "A", - "mode": "none" + "mode": "normal" }, "thresholdsStyle": { "mode": "off" @@ -246,7 +273,7 @@ } ] }, - "unit": "ms" + "unit": "ops" }, "overrides": [ { @@ -333,7 +360,7 @@ "x": 0, "y": 0 }, - "id": 11, + "id": 12, "options": { "legend": { "calcs": [ @@ -357,8 +384,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_batch_total_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_only_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "load only {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -368,8 +395,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_batch_total_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_save_only_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "save only {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -379,24 +406,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) 
(rate(ucm:layerwise_batch_total_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_save_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "load + save {{engine}} {{worker_rank}}", "range": true, "refId": "C" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", - "range": true, - "refId": "D" } ], - "title": "Layerwise / Batch Total Duration", + "title": "Layerwise / Batch Mix Rate", "type": "timeseries" }, { @@ -404,7 +420,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Time wait_for_layer_load() actually blocks while waiting for the current layer load tasks to finish. Near 0 means the forward pass successfully hid load latency; large values mean layerwise loading is visible on the critical path. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Total batch duration for batches with load work and no save work. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -450,14 +466,6 @@ { "color": "green", "value": null - }, - { - "color": "yellow", - "value": 5 - }, - { - "color": "red", - "value": 20 } ] }, @@ -544,11 +552,11 @@ }, "gridPos": { "h": 8, - "w": 24, + "w": 12, "x": 0, "y": 8 }, - "id": 1, + "id": 13, "options": { "legend": { "calcs": [ @@ -572,8 +580,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_wait_blocking_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_only_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -583,8 +591,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_wait_blocking_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_only_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -594,8 +602,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_wait_blocking_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": 
"histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_only_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -605,13 +613,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_wait_blocking_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_wait_blocking_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_only_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_only_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Layerwise / wait_for_layer_load Blocking", + "title": "Layerwise / Batch Duration - Load Only", "type": "timeseries" }, { @@ -619,7 +627,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Time from the previous wait_for_layer_load() return to the current wait_for_layer_load() entry. This approximates the forward/save window available to hide the next asynchronous load. Compare it with wait_for_layer_load blocking time. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Total batch duration for batches with save work and no load work. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -751,11 +759,11 @@ }, "gridPos": { "h": 8, - "w": 24, - "x": 0, - "y": 16 + "w": 12, + "x": 12, + "y": 8 }, - "id": 3, + "id": 14, "options": { "legend": { "calcs": [ @@ -779,8 +787,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_inter_wait_interval_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_batch_total_save_only_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -790,8 +798,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_inter_wait_interval_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_batch_total_save_only_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -801,8 +809,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_inter_wait_interval_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) 
(rate(ucm:layerwise_batch_total_save_only_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -812,13 +820,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_inter_wait_interval_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_inter_wait_interval_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_save_only_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_save_only_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Layerwise / wait_for_layer_load Inter-Call Interval", + "title": "Layerwise / Batch Duration - Save Only", "type": "timeseries" }, { @@ -826,7 +834,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Total wait_for_save() wall-clock duration at the end of forward. This is the final save wait that cannot be overlapped with the current forward pass. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Total batch duration for batches that include both load and save work. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -872,14 +880,6 @@ { "color": "green", "value": null - }, - { - "color": "yellow", - "value": 10 - }, - { - "color": "red", - "value": 50 } ] }, @@ -968,9 +968,9 @@ "h": 8, "w": 12, "x": 0, - "y": 24 + "y": 16 }, - "id": 6, + "id": 15, "options": { "legend": { "calcs": [ @@ -994,8 +994,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_save_tail_total_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_save_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -1005,8 +1005,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_save_tail_total_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_save_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -1016,8 +1016,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_save_tail_total_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum 
by (le, ${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_save_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -1027,82 +1027,1272 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_save_tail_total_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_save_tail_total_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_save_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_save_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Layerwise / wait_for_save Total", + "title": "Layerwise / Batch Duration - Load + Save", "type": "timeseries" }, { - "collapsed": true, + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Stacked average batch duration for load-only batches. 
Load wait is the batch-level sum of wait_for_layer_load() blocking; other is the remaining batch time.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 25, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "normal" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [] + }, "gridPos": { - "h": 1, - "w": 24, + "h": 8, + "w": 12, "x": 0, - "y": 32 + "y": 24 }, - "id": 10, - "panels": [ + "id": 17, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ { "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Time spent in start_load_kv() submitting first-layer asynchronous load tasks. This is a submit-path diagnostic, not storage or transfer latency; high values indicate many requests/blocks, synchronous submit overhead, or submit-side contention. 
Series: p50 / p90 / p99 / avg by selected view.", - "fieldConfig": { - "defaults": { - "color": { - "mode": "palette-classic" - }, - "custom": { - "axisBorderShow": false, - "axisCenteredZero": false, - "axisColorMode": "text", - "axisLabel": "", - "axisPlacement": "auto", - "barAlignment": 0, - "barWidthFactor": 0.6, - "drawStyle": "line", - "fillOpacity": 0, - "gradientMode": "none", - "hideFrom": { - "legend": false, - "tooltip": false, - "viz": false - }, - "insertNulls": false, - "lineInterpolation": "linear", - "lineWidth": 1, - "pointSize": 5, - "scaleDistribution": { - "type": "linear" - }, - "showPoints": "auto", - "spanNulls": 60000, - "stacking": { - "group": "A", - "mode": "none" - }, - "thresholdsStyle": { - "mode": "off" - } - }, - "mappings": [], - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "green", - "value": null - } - ] - }, - "unit": "ms" + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_load_wait_total_load_only_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_load_wait_total_load_only_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "load wait {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "clamp_min((sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_only_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_only_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))) - (sum by (${perWorker:raw}) 
(rate(ucm:layerwise_batch_load_wait_total_load_only_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_load_wait_total_load_only_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))), 0)", + "legendFormat": "other {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + } + ], + "title": "Layerwise / Load Only Batch Avg Breakdown", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Stacked average batch duration for save-only batches. Save tail is wait_for_save() duration from the same save-only batch population; other is the remaining batch time.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 25, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false }, - "overrides": [ + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "normal" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 24 + }, + "id": 18, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": 
"multi", + "sort": "desc" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_save_tail_save_only_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_save_tail_save_only_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "wait_for_save {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "clamp_min((sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_save_only_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_save_only_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))) - (sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_save_tail_save_only_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_save_tail_save_only_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))), 0)", + "legendFormat": "other {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + } + ], + "title": "Layerwise / Save Only Batch Avg Breakdown", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Stacked average batch duration for batches with both load and save work. 
All stack components are batch-level averages over the same batch category.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 25, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "normal" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 32 + }, + "id": 19, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_load_wait_total_load_save_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_load_wait_total_load_save_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "load wait {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + 
"expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_save_tail_load_save_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_save_tail_load_save_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "wait_for_save {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "clamp_min((sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_save_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_total_load_save_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))) - (sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_load_wait_total_load_save_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_load_wait_total_load_save_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))) - (sum by (${perWorker:raw}) (rate(ucm:layerwise_batch_save_tail_load_save_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_batch_save_tail_load_save_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))), 0)", + "legendFormat": "other {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + } + ], + "title": "Layerwise / Load + Save Batch Avg Breakdown", + "type": 
"timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Time wait_for_layer_load() actually blocks while waiting for the current layer load tasks to finish. Near 0 means the forward pass successfully hid load latency; large values mean layerwise loading is visible on the critical path. Series: p50 / p90 / p99 / avg by engine and worker_rank.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "yellow", + "value": 5 + }, + { + "color": "red", + "value": 20 + } + ] + }, + "unit": "ms" + }, + "overrides": [ + { + "matcher": { + "id": "byRegexp", + "options": "^p50.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p90.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 10, + 6 + ] + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p99.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 4, + 4 + ] + } + }, + { + "id": "custom.lineWidth", + "value": 2 + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": 
"^avg.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + }, + { + "id": "custom.lineWidth", + "value": 1 + }, + { + "id": "custom.fillOpacity", + "value": 5 + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 48 + }, + "id": 1, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_wait_blocking_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_wait_blocking_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_wait_blocking_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) 
(rate(ucm:layerwise_wait_blocking_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_wait_blocking_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" + } + ], + "title": "Layerwise / wait_for_layer_load Blocking", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Time from the previous wait_for_layer_load() return to the current wait_for_layer_load() entry. This approximates the forward/save window available to hide the next asynchronous load. Compare it with wait_for_layer_load blocking time. Series: p50 / p90 / p99 / avg by engine and worker_rank.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [ + { + "matcher": { + "id": "byRegexp", + "options": "^p50.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": 
"^p90.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 10, + 6 + ] + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p99.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 4, + 4 + ] + } + }, + { + "id": "custom.lineWidth", + "value": 2 + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^avg.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + }, + { + "id": "custom.lineWidth", + "value": 1 + }, + { + "id": "custom.fillOpacity", + "value": 5 + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 56 + }, + "id": 3, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_inter_wait_interval_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_inter_wait_interval_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) 
(rate(ucm:layerwise_inter_wait_interval_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_inter_wait_interval_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_inter_wait_interval_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" + } + ], + "title": "Layerwise / wait_for_layer_load Inter-Call Interval", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Time get_finished(), preemption flush, or the layerwise polling path actually blocks while waiting for async dump completion. 
Uses the same metric as the connector dump completion wait panel.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "yellow", + "value": 10 + }, + { + "color": "red", + "value": 50 + } + ] + }, + "unit": "ms" + }, + "overrides": [ + { + "matcher": { + "id": "byRegexp", + "options": "^p50.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p90.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 10, + 6 + ] + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p99.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 4, + 4 + ] + } + }, + { + "id": "custom.lineWidth", + "value": 2 + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^avg.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + }, + { + "id": "custom.lineWidth", + "value": 1 + }, + { + "id": "custom.fillOpacity", + "value": 5 + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 64 + }, + "id": 6, + "options": { + "legend": { + "calcs": [ 
+ "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:save_completion_wait_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:save_completion_wait_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, sum by (le, ${perWorker:raw}) (rate(ucm:save_completion_wait_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p95 {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:save_completion_wait_duration_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + 
"expr": "sum by (${perWorker:raw}) (rate(ucm:save_completion_wait_duration_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:save_completion_wait_duration_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", + "range": true, + "refId": "E" + } + ], + "title": "Layerwise / Dump Completion Wait Duration", + "type": "timeseries" + }, + { + "collapsed": true, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 72 + }, + "id": 10, + "panels": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Time spent in start_load_kv() submitting first-layer asynchronous load tasks. This is a submit-path diagnostic, not storage or transfer latency; high values indicate many requests/blocks, synchronous submit overhead, or submit-side contention. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [ + { + "matcher": { + "id": "byRegexp", + "options": "^p50.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p90.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 10, + 6 + ] + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p99.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 4, + 4 + ] + } + }, + { + "id": "custom.lineWidth", + "value": 2 + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^avg.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + }, + { + "id": "custom.lineWidth", + "value": 1 + }, + { + "id": "custom.fillOpacity", + "value": 5 + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 73 + }, + "id": 4, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": 
"bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_first_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_first_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_first_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_first_layer_submit_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_first_layer_submit_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" + } + ], + "title": "Layerwise / 
start_load_kv First Layer Submit", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Time spent in wait_for_layer_load() submitting the next layer asynchronous load tasks after the current layer wait. This is a submit-path diagnostic, not storage or transfer latency, and should normally stay small. Series: p50 / p90 / p99 / avg by engine and worker_rank.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [ { "matcher": { "id": "byRegexp", @@ -1184,10 +2374,10 @@ "gridPos": { "h": 8, "w": 12, - "x": 0, - "y": 33 + "x": 12, + "y": 73 }, - "id": 4, + "id": 5, "options": { "legend": { "calcs": [ @@ -1211,205 +2401,422 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_first_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_next_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_next_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_next_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_next_layer_submit_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_next_layer_submit_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" + } + ], + "title": "Layerwise / wait_for_layer_load Next Layer Submit", + "type": "timeseries" + } + ], + "title": "Layerwise -- Submit Diagnostics", + "type": "row" + }, + { + "type": "row", + "collapsed": true, + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "id": 8, + "title": "Layerwise -- Distributions", + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 81 + }, + "panels": [ + { + "type": 
"heatmap", + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "id": 20, + "title": "Layerwise / Batch Duration - Load Only Distribution", + "description": "Distribution heatmap for load-only batch total duration. Cell color/value is request rate from rate(bucket), not raw bucket count.", + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 82 + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (le) (rate(ucm:layerwise_batch_total_load_only_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "format": "heatmap", + "legendFormat": "{{le}}", + "range": true, + "refId": "A" + } + ], + "options": { + "calculate": false, + "yAxis": { + "axisPlacement": "left", + "unit": "s", + "decimals": 0, + "reverse": false + }, + "rowsFrame": { + "layout": "auto", + "value": "Request rate" + }, + "color": { + "mode": "scheme", + "scheme": "Spectral", + "fill": "dark-orange", + "exponent": 0.5, + "steps": 64, + "reverse": false + }, + "cellGap": 1, + "cellValues": { + "unit": "suffix: req/s" + }, + "filterValues": { + "le": 1e-09 + }, + "tooltip": { + "show": true, + "yHistogram": true, + "mode": "single" + }, + "legend": { + "show": true + }, + "exemplars": { + "color": "rgba(255,0,255,0.7)" + } + }, + "fieldConfig": { + "defaults": { + "custom": { + "scaleDistribution": { + "type": "linear" + }, + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + } + } + }, + "overrides": [] + }, + "pluginVersion": "10.4.0" + }, + { + "type": "heatmap", + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "id": 21, + "title": "Layerwise / Batch Duration - Save Only Distribution", + "description": "Distribution heatmap for save-only batch total duration. 
Cell color/value is request rate from rate(bucket), not raw bucket count.", + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 90 + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (le) (rate(ucm:layerwise_batch_total_save_only_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "format": "heatmap", + "legendFormat": "{{le}}", + "range": true, + "refId": "A" + } + ], + "options": { + "calculate": false, + "yAxis": { + "axisPlacement": "left", + "unit": "s", + "decimals": 0, + "reverse": false + }, + "rowsFrame": { + "layout": "auto", + "value": "Request rate" + }, + "color": { + "mode": "scheme", + "scheme": "Spectral", + "fill": "dark-orange", + "exponent": 0.5, + "steps": 64, + "reverse": false + }, + "cellGap": 1, + "cellValues": { + "unit": "suffix: req/s" + }, + "filterValues": { + "le": 1e-09 + }, + "tooltip": { + "show": true, + "yHistogram": true, + "mode": "single" + }, + "legend": { + "show": true + }, + "exemplars": { + "color": "rgba(255,0,255,0.7)" + } + }, + "fieldConfig": { + "defaults": { + "custom": { + "scaleDistribution": { + "type": "linear" + }, + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + } + } + }, + "overrides": [] + }, + "pluginVersion": "10.4.0" + }, + { + "type": "heatmap", + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "id": 22, + "title": "Layerwise / Batch Duration - Load + Save Distribution", + "description": "Distribution heatmap for load-and-save batch total duration. 
Cell color/value is request rate from rate(bucket), not raw bucket count.", + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 98 + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (le) (rate(ucm:layerwise_batch_total_load_save_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "format": "heatmap", + "legendFormat": "{{le}}", "range": true, "refId": "A" + } + ], + "options": { + "calculate": false, + "yAxis": { + "axisPlacement": "left", + "unit": "s", + "decimals": 0, + "reverse": false }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_first_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", - "range": true, - "refId": "B" + "rowsFrame": { + "layout": "auto", + "value": "Request rate" }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_first_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", - "range": true, - "refId": "C" + "color": { + "mode": "scheme", + "scheme": "Spectral", + "fill": "dark-orange", + "exponent": 0.5, + "steps": 64, + "reverse": false + }, + "cellGap": 1, + "cellValues": { + "unit": "suffix: req/s" + }, + "filterValues": { + "le": 1e-09 + }, + "tooltip": { + "show": true, + "yHistogram": true, + "mode": "single" + }, + "legend": { + "show": true + }, + "exemplars": { + "color": "rgba(255,0,255,0.7)" + } + }, + "fieldConfig": { + "defaults": { + "custom": { + "scaleDistribution": { + "type": 
"linear" + }, + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + } + } }, + "overrides": [] + }, + "pluginVersion": "10.4.0" + }, + { + "type": "heatmap", + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "id": 24, + "title": "Layerwise / Load Only Load Wait Total Distribution", + "description": "Distribution heatmap for batch-level total wait_for_layer_load() blocking time in load-only batches. Cell color/value is request rate from rate(bucket), not raw bucket count.", + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 114 + }, + "targets": [ { "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_first_layer_submit_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_first_layer_submit_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (le) (rate(ucm:layerwise_batch_load_wait_total_load_only_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "format": "heatmap", + "legendFormat": "{{le}}", "range": true, - "refId": "D" + "refId": "A" } ], - "title": "Layerwise / start_load_kv First Layer Submit", - "type": "timeseries" - }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" + "options": { + "calculate": false, + "yAxis": { + "axisPlacement": "left", + "unit": "s", + "decimals": 0, + "reverse": false + }, + "rowsFrame": { + "layout": "auto", + "value": "Request rate" + }, + "color": { + "mode": "scheme", + "scheme": "Spectral", + "fill": "dark-orange", + "exponent": 0.5, + "steps": 64, + "reverse": false + }, + "cellGap": 1, + "cellValues": { + "unit": "suffix: req/s" + }, + "filterValues": { + "le": 1e-09 + }, + "tooltip": { + 
"show": true, + "yHistogram": true, + "mode": "single" + }, + "legend": { + "show": true + }, + "exemplars": { + "color": "rgba(255,0,255,0.7)" + } }, - "description": "Time spent in wait_for_layer_load() submitting the next layer asynchronous load tasks after the current layer wait. This is a submit-path diagnostic, not storage or transfer latency, and should normally stay small. Series: p50 / p90 / p99 / avg by selected view.", "fieldConfig": { "defaults": { - "color": { - "mode": "palette-classic" - }, "custom": { - "axisBorderShow": false, - "axisCenteredZero": false, - "axisColorMode": "text", - "axisLabel": "", - "axisPlacement": "auto", - "barAlignment": 0, - "barWidthFactor": 0.6, - "drawStyle": "line", - "fillOpacity": 0, - "gradientMode": "none", + "scaleDistribution": { + "type": "linear" + }, "hideFrom": { "legend": false, "tooltip": false, "viz": false - }, - "insertNulls": false, - "lineInterpolation": "linear", - "lineWidth": 1, - "pointSize": 5, - "scaleDistribution": { - "type": "linear" - }, - "showPoints": "auto", - "spanNulls": 60000, - "stacking": { - "group": "A", - "mode": "none" - }, - "thresholdsStyle": { - "mode": "off" } - }, - "mappings": [], - "thresholds": { - "mode": "absolute", - "steps": [ - { - "color": "green", - "value": null - } - ] - }, - "unit": "ms" - }, - "overrides": [ - { - "matcher": { - "id": "byRegexp", - "options": "^p50.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "solid" - } - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^p90.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "dash", - "dash": [ - 10, - 6 - ] - } - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^p99.*" - }, - "properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "dash", - "dash": [ - 4, - 4 - ] - } - }, - { - "id": "custom.lineWidth", - "value": 2 - } - ] - }, - { - "matcher": { - "id": "byRegexp", - "options": "^avg.*" - }, - 
"properties": [ - { - "id": "custom.lineStyle", - "value": { - "fill": "solid" - } - }, - { - "id": "custom.lineWidth", - "value": 1 - }, - { - "id": "custom.fillOpacity", - "value": 5 - } - ] } - ] + }, + "overrides": [] }, + "pluginVersion": "10.4.0" + }, + { + "type": "heatmap", + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "id": 25, + "title": "Layerwise / Load + Save Load Wait Total Distribution", + "description": "Distribution heatmap for batch-level total wait_for_layer_load() blocking time in load-and-save batches. Cell color/value is request rate from rate(bucket), not raw bucket count.", "gridPos": { "h": 8, - "w": 12, - "x": 12, - "y": 33 - }, - "id": 5, - "options": { - "legend": { - "calcs": [ - "lastNotNull", - "max" - ], - "displayMode": "table", - "maxHeight": "50%", - "placement": "bottom", - "showLegend": true - }, - "tooltip": { - "mode": "multi", - "sort": "desc" - } + "w": 24, + "x": 0, + "y": 122 }, "targets": [ { @@ -1418,68 +2825,239 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_next_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "sum by (le) (rate(ucm:layerwise_batch_load_wait_total_load_save_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "format": "heatmap", + "legendFormat": "{{le}}", "range": true, "refId": "A" + } + ], + "options": { + "calculate": false, + "yAxis": { + "axisPlacement": "left", + "unit": "s", + "decimals": 0, + "reverse": false }, - { - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_next_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", 
worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", - "range": true, - "refId": "B" + "rowsFrame": { + "layout": "auto", + "value": "Request rate" + }, + "color": { + "mode": "scheme", + "scheme": "Spectral", + "fill": "dark-orange", + "exponent": 0.5, + "steps": 64, + "reverse": false + }, + "cellGap": 1, + "cellValues": { + "unit": "suffix: req/s" + }, + "filterValues": { + "le": 1e-09 }, + "tooltip": { + "show": true, + "yHistogram": true, + "mode": "single" + }, + "legend": { + "show": true + }, + "exemplars": { + "color": "rgba(255,0,255,0.7)" + } + }, + "fieldConfig": { + "defaults": { + "custom": { + "scaleDistribution": { + "type": "linear" + }, + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + } + } + }, + "overrides": [] + }, + "pluginVersion": "10.4.0" + }, + { + "type": "heatmap", + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "id": 26, + "title": "Layerwise / Save Only wait_for_save Distribution", + "description": "Distribution heatmap for wait_for_save() tail duration in save-only batches. 
Cell color/value is request rate from rate(bucket), not raw bucket count.", + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 130 + }, + "targets": [ { "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:layerwise_next_layer_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "sum by (le) (rate(ucm:layerwise_batch_save_tail_save_only_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "format": "heatmap", + "legendFormat": "{{le}}", "range": true, - "refId": "C" + "refId": "A" + } + ], + "options": { + "calculate": false, + "yAxis": { + "axisPlacement": "left", + "unit": "s", + "decimals": 0, + "reverse": false + }, + "rowsFrame": { + "layout": "auto", + "value": "Request rate" + }, + "color": { + "mode": "scheme", + "scheme": "Spectral", + "fill": "dark-orange", + "exponent": 0.5, + "steps": 64, + "reverse": false + }, + "cellGap": 1, + "cellValues": { + "unit": "suffix: req/s" + }, + "filterValues": { + "le": 1e-09 + }, + "tooltip": { + "show": true, + "yHistogram": true, + "mode": "single" + }, + "legend": { + "show": true + }, + "exemplars": { + "color": "rgba(255,0,255,0.7)" + } + }, + "fieldConfig": { + "defaults": { + "custom": { + "scaleDistribution": { + "type": "linear" + }, + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + } + } }, + "overrides": [] + }, + "pluginVersion": "10.4.0" + }, + { + "type": "heatmap", + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "id": 27, + "title": "Layerwise / Load + Save wait_for_save Distribution", + "description": "Distribution heatmap for wait_for_save() tail duration in load-and-save batches. 
Cell color/value is request rate from rate(bucket), not raw bucket count.", + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 138 + }, + "targets": [ { "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:layerwise_next_layer_submit_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:layerwise_next_layer_submit_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (le) (rate(ucm:layerwise_batch_save_tail_load_save_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "format": "heatmap", + "legendFormat": "{{le}}", "range": true, - "refId": "D" + "refId": "A" } ], - "title": "Layerwise / wait_for_layer_load Next Layer Submit", - "type": "timeseries" - } - ], - "title": "Layerwise -- Submit Diagnostics", - "type": "row" - }, - { - "type": "row", - "collapsed": true, - "datasource": { - "type": "prometheus", - "uid": "${DS_PROMETHEUS}" - }, - "id": 8, - "title": "Layerwise -- Distributions", - "gridPos": { - "h": 1, - "w": 24, - "x": 0, - "y": 41 - }, - "panels": [ + "options": { + "calculate": false, + "yAxis": { + "axisPlacement": "left", + "unit": "s", + "decimals": 0, + "reverse": false + }, + "rowsFrame": { + "layout": "auto", + "value": "Request rate" + }, + "color": { + "mode": "scheme", + "scheme": "Spectral", + "fill": "dark-orange", + "exponent": 0.5, + "steps": 64, + "reverse": false + }, + "cellGap": 1, + "cellValues": { + "unit": "suffix: req/s" + }, + "filterValues": { + "le": 1e-09 + }, + "tooltip": { + "show": true, + "yHistogram": true, + "mode": "single" + }, + "legend": { + "show": true + }, + "exemplars": { + "color": "rgba(255,0,255,0.7)" + } + }, + "fieldConfig": { + "defaults": { + "custom": { + 
"scaleDistribution": { + "type": "linear" + }, + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + } + } + }, + "overrides": [] + }, + "pluginVersion": "10.4.0" + }, { "type": "heatmap", "datasource": { @@ -1493,7 +3071,7 @@ "h": 8, "w": 24, "x": 0, - "y": 42 + "y": 146 }, "targets": [ { @@ -1502,7 +3080,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (le) (rate(ucm:layerwise_wait_blocking_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (le) (rate(ucm:layerwise_wait_blocking_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "format": "heatmap", "legendFormat": "{{le}}", "range": true, @@ -1534,7 +3112,7 @@ "unit": "suffix: req/s" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "tooltip": { "show": true, diff --git a/examples/metrics/grafana_pipeline_store.json b/examples/metrics/grafana_pipeline_store.json index ea1f62495..e9b84c9cc 100644 --- a/examples/metrics/grafana_pipeline_store.json +++ b/examples/metrics/grafana_pipeline_store.json @@ -109,7 +109,6 @@ "value": "model_name" }, "hide": 0, - "includeAll": false, "label": "View", "multi": false, "name": "perWorker", @@ -122,10 +121,10 @@ { "selected": false, "text": "Per Worker", - "value": "model_name, worker_id" + "value": "model_name, engine, worker_rank" } ], - "query": "Aggregated : model_name,Per Worker : model_name\\, worker_id", + "query": "Aggregated : model_name,Per Worker : model_name\\, engine\\, worker_rank", "queryValue": "", "skipUrlSync": false, "type": "custom" @@ -141,15 +140,43 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "definition": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, worker_id)", + "definition": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, engine)", + "hide": 0, + "includeAll": true, + "label": "engine", 
+ "multi": false, + "name": "engine", + "options": [], + "query": { + "query": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, engine)", + "refId": "StandardVariableQuery" + }, + "refresh": 1, + "regex": "", + "skipUrlSync": false, + "sort": 2, + "type": "query" + }, + { + "allValue": ".*", + "current": { + "selected": true, + "text": "All", + "value": "$__all" + }, + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "definition": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\", engine=~\"$engine\"}, worker_rank)", "hide": 0, "includeAll": true, - "label": "worker_id", + "label": "worker_rank", "multi": false, - "name": "worker_id", + "name": "worker_rank", "options": [], "query": { - "query": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\"}, worker_id)", + "query": "label_values({__name__=~\"ucm:.*\", job=~\"$job\", model_name=\"$model_name\", engine=~\"$engine\"}, worker_rank)", "refId": "StandardVariableQuery" }, "refresh": 1, @@ -168,21 +195,21 @@ "timezone": "", "weekStart": "", "refresh": "", - "title": "vLLM - UCM Cache / Posix Store", - "uid": "ucm-pipeline-store", + "title": "vLLM - UCM Cache / Posix Store (vLLM Metrics)", + "uid": "ucm-vllm-pipeline-store", "version": 3, "id": null, "tags": [ - "ucm", + "ucm-vllm-connector-metrics", "pipeline-store" ], - "description": "UCM observability dashboard - cache/posix store module. Part of a set; the other UCM dashboards (tag: ucm) are linked from the header.", + "description": "UCM observability dashboard - cache/posix store module. 
Part of a set; the other UCM dashboards (tag: ucm-vllm-connector-metrics) are linked from the header.", "links": [ { "type": "dashboards", - "title": "Other UCM dashboards", + "title": "Other UCM vLLM metrics dashboards", "tags": [ - "ucm" + "ucm-vllm-connector-metrics" ], "asDropdown": true, "includeVars": true, @@ -298,10 +325,10 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_backend_shards_total{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/ clamp_min(\n sum by (${perWorker:raw}) (rate(ucm:cache_load_shards_total{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])), 1\n)", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_backend_shards_total{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/ clamp_min(\n sum by (${perWorker:raw}) (rate(ucm:cache_load_shards_total{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])), 1\n)", "hide": false, "instant": false, - "legendFormat": "{{worker_id}}", + "legendFormat": "{{engine}} {{worker_rank}}", "range": true, "refId": "A" } @@ -314,7 +341,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Cache load task rate derived from rate(cache_load_duration_ms_count). One count is one Cache load task. Series by selected view.", + "description": "Cache load task rate derived from rate(cache_load_duration_ms_count). One count is one Cache load task. 
Series split by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -394,10 +421,10 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "hide": false, "instant": false, - "legendFormat": "load {{worker_id}}", + "legendFormat": "load {{engine}} {{worker_rank}}", "range": true, "refId": "A" } @@ -410,7 +437,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Cache dump task rate derived from rate(cache_dump_duration_ms_count). One count is one Cache dump task. Series by selected view.", + "description": "Cache dump task rate derived from rate(cache_dump_duration_ms_count). One count is one Cache dump task. 
Series split by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -490,10 +517,10 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "hide": false, "instant": false, - "legendFormat": "dump {{worker_id}}", + "legendFormat": "dump {{engine}} {{worker_rank}}", "range": true, "refId": "A" } @@ -666,8 +693,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -677,8 +704,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -688,8 +715,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": 
"histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -699,8 +726,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } @@ -873,8 +900,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} 
{{worker_rank}}", "range": true, "refId": "A" }, @@ -884,8 +911,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -895,8 +922,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -906,8 +933,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_count{model_name=\"$model_name\", 
job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } @@ -920,7 +947,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Stacked average Cache load task duration. Uses task-level queue wait and dispatch metrics; WaitBackendTask + H2D is calculated as cache_load_duration_ms avg minus queue wait avg minus dispatch avg. Per-shard detail metrics are shown in detail panels and are intentionally not stacked here. Uses a fixed 40s rate window for stability.", + "description": "Stacked average Cache load task duration. Queue wait, backend dispatch, and H2D sync are task-level phase metrics; backend wait & other is calculated as cache_load_duration_ms avg minus those phase averages. Per-shard backend wait and H2D submit cost are detail metrics and are intentionally not stacked here. Uses a fixed 40s rate window for stability.", "fieldConfig": { "defaults": { "color": { @@ -1003,8 +1030,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))", - "legendFormat": "queue wait {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))", + "legendFormat": "queue wait {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -1014,8 +1041,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": 
"sum by (${perWorker:raw}) (rate(ucm:cache_load_dispatch_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_dispatch_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))", - "legendFormat": "dispatch {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_backend_submit_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_backend_submit_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))", + "legendFormat": "backend dispatch {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -1025,10 +1052,21 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "clamp_min(\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_load_dispatch_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_load_dispatch_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n ),\n 0\n)", - "legendFormat": "WaitBackendTask + H2D {{worker_id}}", + "expr": "clamp_min(\n (\n sum by (${perWorker:raw}) 
(rate(ucm:cache_load_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_load_backend_submit_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_load_backend_submit_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n -\n (\n(\n sum by (${perWorker:raw}) (rate(ucm:cache_h2d_sync_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_h2d_sync_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n or\n (\n 0 * sum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n ),\n 0\n)", + "legendFormat": "backend wait & other {{engine}} {{worker_rank}}", "range": true, "refId": "C" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "(\n sum by (${perWorker:raw}) (rate(ucm:cache_h2d_sync_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_h2d_sync_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n)\nor\n(\n 0 * sum by (${perWorker:raw}) (rate(ucm:cache_load_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n)", + "legendFormat": "H2D sync {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" } ], "title": "Cache Load Avg Breakdown", @@ -1039,7 +1077,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Stacked average Cache dump task duration. Queue wait, mkbuf, D2H, and backend submit are task-level phase metrics; Other / epilog is calculated as cache_dump_duration_ms avg minus those phase averages. Uses a fixed 40s rate window for stability.", + "description": "Stacked average Cache dump task duration. Queue wait, mkbuf, D2H sync including compute wait, and backend submit are task-level phase metrics; other is calculated as cache_dump_duration_ms avg minus those phase averages. 
Uses a fixed 40s rate window for stability.", "fieldConfig": { "defaults": { "color": { @@ -1122,8 +1160,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))", - "legendFormat": "queue wait {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))", + "legendFormat": "queue wait {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -1133,8 +1171,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))", - "legendFormat": "mkbuf {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))", + "legendFormat": "mkbuf {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -1144,10 +1182,10 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) 
(rate(ucm:cache_d2h_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))", - "legendFormat": "D2H {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))", + "legendFormat": "D2H sync (include wait compute) {{engine}} {{worker_rank}}", "range": true, - "refId": "C" + "refId": "D" }, { "datasource": { @@ -1155,10 +1193,10 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))", - "legendFormat": "backend submit {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))", + "legendFormat": "backend submit {{engine}} {{worker_rank}}", "range": true, - "refId": "D" + "refId": "E" }, { "datasource": { @@ -1166,10 +1204,10 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "clamp_min(\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n 
/\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n ),\n 0\n)", - "legendFormat": "Other / epilog {{worker_id}}", + "expr": "clamp_min(\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n ),\n 0\n)", + "legendFormat": "other {{engine}} {{worker_rank}}", "range": true, - "refId": "E" + "refId": "F" } ], "title": "Cache Dump Avg Breakdown", @@ -1229,17 +1267,909 @@ } ] }, - "unit": "gbytes" - }, - "overrides": [] + "unit": "gbytes" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 33 + }, + "id": 27, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + 
"editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_bytes_total{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])) / 1e9", + "legendFormat": "{{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + } + ], + "title": "Cache Load Bandwidth (aggregated)", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Wall-clock per-worker Cache dump throughput in GB/s. Aggregates all concurrent Cache dump tasks inside one worker process and includes idle gaps, so this is the real bytes/sec the worker is pushing through the Cache stage. Pair with (per task) above: high per-task with low per-worker means tasks are fast but sparse.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "gbytes" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 33 + }, + "id": 28, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ + { + 
"datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_bytes_total{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])) / 1e9", + "legendFormat": "{{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + } + ], + "title": "Cache Dump Bandwidth (aggregated)", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Pure Cache load H2D copy bandwidth in GB/s, calculated from copied bytes divided by cache_h2d_sync_ms. This excludes queue time and backend readiness wait. Series: p50 / p90 / p99 / avg by engine and worker_rank.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "gbytes" + }, + "overrides": [ + { + "matcher": { + "id": "byRegexp", + "options": "^p50.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p90.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 10, + 6 + ] + } + } + ] + }, + { + 
"matcher": { + "id": "byRegexp", + "options": "^p99.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 4, + 4 + ] + } + }, + { + "id": "custom.lineWidth", + "value": 2 + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^avg.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + }, + { + "id": "custom.lineWidth", + "value": 1 + }, + { + "id": "custom.fillOpacity", + "value": 5 + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 49 + }, + "id": 57, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_h2d_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_h2d_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_h2d_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + 
"legendFormat": "p99 {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_h2d_bandwidth_gbps_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_h2d_bandwidth_gbps_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" + } + ], + "title": "Cache Load H2D Bandwidth (per task)", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Effective Cache dump D2H bandwidth in GB/s, calculated from copied bytes divided by cache_d2h_duration_ms. The duration includes stream synchronize wait, including prerequisite compute wait. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "gbytes" + }, + "overrides": [ + { + "matcher": { + "id": "byRegexp", + "options": "^p50.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p90.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 10, + 6 + ] + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p99.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 4, + 4 + ] + } + }, + { + "id": "custom.lineWidth", + "value": 2 + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^avg.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + }, + { + "id": "custom.lineWidth", + "value": 1 + }, + { + "id": "custom.fillOpacity", + "value": 5 + } + ] + } + ] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 49 + }, + "id": 55, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + 
"placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_d2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_d2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_d2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_d2h_bandwidth_gbps_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_d2h_bandwidth_gbps_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" + } + ], + "title": "Cache Dump D2H Bandwidth (per 
task)", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Time Cache load tasks spend queued before dispatch worker pickup. Series: p50 / p90 / p99 / avg by engine and worker_rank.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [ + { + "matcher": { + "id": "byRegexp", + "options": "^p50.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p90.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 10, + 6 + ] + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p99.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 4, + 4 + ] + } + }, + { + "id": "custom.lineWidth", + "value": 2 + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^avg.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + }, + { + "id": "custom.lineWidth", + "value": 1 + }, + { + "id": "custom.fillOpacity", + "value": 5 + } + ] + } + ] + }, + 
"gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 57 + }, + "id": 35, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "desc" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) 
(rate(ucm:cache_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" + } + ], + "title": "Cache Load Queue Wait Duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Time Cache dump tasks spend queued before dispatch worker pickup. This is the main signal for explaining gaps between cache_dump_duration_ms and the mkbuf/D2H/backend-submit phase metrics. Series: p50 / p90 / p99 / avg by engine and worker_rank.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [ + { + "matcher": { + "id": "byRegexp", + "options": "^p50.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p90.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 10, + 6 + ] + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p99.*" + }, + "properties": [ + { + "id": 
"custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 4, + 4 + ] + } + }, + { + "id": "custom.lineWidth", + "value": 2 + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^avg.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + }, + { + "id": "custom.lineWidth", + "value": 1 + }, + { + "id": "custom.fillOpacity", + "value": 5 + } + ] + } + ] }, "gridPos": { "h": 8, "w": 12, - "x": 0, - "y": 33 + "x": 12, + "y": 57 }, - "id": 27, + "id": 45, "options": { "legend": { "calcs": [ @@ -1263,13 +2193,46 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_bytes_total{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])) / 1e9", - "legendFormat": "{{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + }, 
+ { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" } ], - "title": "Cache Load Bandwidth (aggregated)", + "title": "Cache Dump Queue Wait Duration", "type": "timeseries" }, { @@ -1277,7 +2240,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Wall-clock per-worker Cache dump throughput in GB/s. Aggregates all concurrent Cache dump tasks inside one worker process and includes idle gaps, so this is the real bytes/sec the worker is pushing through the Cache stage. Pair with (per task) above: high per-task with low per-worker means tasks are fast but sparse.", + "description": "Cache dump time waiting for the prerequisite compute event to fire before D2H can start. Large values indicate dump is compute-gated rather than copy-gated. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -1326,17 +2289,94 @@ } ] }, - "unit": "gbytes" + "unit": "ms" }, - "overrides": [] + "overrides": [ + { + "matcher": { + "id": "byRegexp", + "options": "^p50.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p90.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 10, + 6 + ] + } + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^p99.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "dash", + "dash": [ + 4, + 4 + ] + } + }, + { + "id": "custom.lineWidth", + "value": 2 + } + ] + }, + { + "matcher": { + "id": "byRegexp", + "options": "^avg.*" + }, + "properties": [ + { + "id": "custom.lineStyle", + "value": { + "fill": "solid" + } + }, + { + "id": "custom.lineWidth", + "value": 1 + }, + { + "id": "custom.fillOpacity", + "value": 5 + } + ] + } + ] }, "gridPos": { "h": 8, "w": 12, "x": 12, - "y": 33 + "y": 65 }, - "id": 28, + "id": 54, "options": { "legend": { "calcs": [ @@ -1360,13 +2400,46 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_bytes_total{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])) / 1e9", - "legendFormat": "{{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_prereq_wait_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) 
(rate(ucm:cache_dump_prereq_wait_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_prereq_wait_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", + "range": true, + "refId": "C" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_prereq_wait_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_prereq_wait_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", + "range": true, + "refId": "D" } ], - "title": "Cache Dump Bandwidth (aggregated)", + "title": "Cache Dump Wait Compute Duration", "type": "timeseries" }, { @@ -1374,7 +2447,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Time Cache load tasks spend queued before dispatch worker pickup. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Cache load residual H2D stream drain after the last shard submit. Large values indicate H2D copy is the bottleneck; near-zero values with large shard backend wait indicate storage/backend readiness is the bottleneck. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -1508,9 +2581,9 @@ "h": 8, "w": 12, "x": 0, - "y": 41 + "y": 73 }, - "id": 35, + "id": 53, "options": { "legend": { "calcs": [ @@ -1534,8 +2607,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_h2d_sync_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -1545,8 +2618,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_h2d_sync_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -1556,8 +2629,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_h2d_sync_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -1567,13 +2640,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_h2d_sync_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_h2d_sync_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Load Queue Wait Duration", + "title": "Cache Load H2D Duration", "type": "timeseries" }, { @@ -1581,7 +2654,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Time Cache dump tasks spend queued before dispatch worker pickup. This is the main signal for explaining gaps between cache_dump_duration_ms and the mkbuf/D2H/backend-submit phase metrics. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Cache dump D2H stream synchronize duration, including prerequisite compute wait. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -1715,9 +2788,9 @@ "h": 8, "w": 12, "x": 12, - "y": 41 + "y": 73 }, - "id": 45, + "id": 58, "options": { "legend": { "calcs": [ @@ -1741,8 +2814,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -1752,8 +2825,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -1763,8 +2836,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -1774,13 +2847,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Dump Queue Wait Duration", + "title": "Cache Dump D2H Duration (include wait compute)", "type": "timeseries" }, { @@ -1788,7 +2861,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Cache load dispatch cost after worker pickup: buffer allocation/reuse plus backend submission for cache misses. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Cache load backend submit duration: buffer allocation plus synchronous backend load submission. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -1922,7 +2995,7 @@ "h": 8, "w": 12, "x": 0, - "y": 57 + "y": 81 }, "id": 43, "options": { @@ -1948,8 +3021,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_dispatch_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_backend_submit_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -1959,8 +3032,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_dispatch_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_backend_submit_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -1970,8 +3043,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_dispatch_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_backend_submit_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -1981,13 +3054,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_dispatch_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_dispatch_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_backend_submit_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_backend_submit_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Load Dispatch Duration", + "title": "Cache Load Backend Submit Duration", "type": "timeseries" }, { @@ -1995,7 +3068,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Cache dump mkbuf phase: prerequisite wait + buffer allocation/reuse + D2H async submit before stream sync. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Cache dump mkbuf phase: buffer allocation/reuse plus D2H async submit before stream sync. Prerequisite compute-event wait is split into Cache Dump Wait Compute Duration. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -2129,7 +3202,7 @@ "h": 8, "w": 12, "x": 12, - "y": 57 + "y": 81 }, "id": 9, "options": { @@ -2155,8 +3228,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -2166,8 +3239,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -2177,8 +3250,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + 
"legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -2188,8 +3261,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_mkbuf_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } @@ -2202,7 +3275,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Per-shard Cache load H2D async submit duration after WaitBackendTaskReady. This is recorded for each shard; it is not a task-level duration, does not include full task stream synchronization, and should not be stacked with total load, queue wait, or dispatch. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Per-shard Cache load H2D async submit CPU cost after WaitBackendTaskReady. Submission only; this is not the actual H2D transfer/drain time. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -2336,7 +3409,7 @@ "h": 8, "w": 12, "x": 0, - "y": 65 + "y": 89 }, "id": 6, "options": { @@ -2362,8 +3435,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_shard_h2d_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_h2d_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -2373,8 +3446,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_shard_h2d_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_h2d_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -2384,8 +3457,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_shard_h2d_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_h2d_submit_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": 
true, "refId": "C" }, @@ -2395,13 +3468,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_shard_h2d_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_shard_h2d_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_h2d_submit_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_h2d_submit_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Load Shard H2D Duration", + "title": "Cache Load H2D Submit Cost", "type": "timeseries" }, { @@ -2409,7 +3482,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Cache dump D2H stream sync phase after mkbuf. Excludes buffer allocation and async submit. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Synchronous Cache dump backend submit duration around backend_->Dump(...). This is a submit cost only and does not include lower-tier disk write completion. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -2543,9 +3616,9 @@ "h": 8, "w": 12, "x": 12, - "y": 65 + "y": 89 }, - "id": 47, + "id": 44, "options": { "legend": { "calcs": [ @@ -2569,8 +3642,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -2580,8 +3653,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -2591,8 +3664,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -2602,13 +3675,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_d2h_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Dump D2H Duration", + "title": "Cache Dump Backend Submit Duration", "type": "timeseries" }, { @@ -2616,7 +3689,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Per-shard Cache load WaitBackendTaskReady duration. For a miss this includes waiting for the backend load task; for a hit or shared buffer it captures readiness wait for that shard. This is not a task-level duration and should not be stacked with total load, queue wait, or dispatch. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Average Cache load bandwidth. Series: p50 / p90 / p99 / avg. Per-task observed Cache load throughput, sampled once per Cache load task as total task bytes divided by task wall-clock duration. 
This is not H2D-only bandwidth.", "fieldConfig": { "defaults": { "color": { @@ -2665,7 +3738,7 @@ } ] }, - "unit": "ms" + "unit": "gbytes" }, "overrides": [ { @@ -2750,9 +3823,9 @@ "h": 8, "w": 12, "x": 0, - "y": 81 + "y": 41 }, - "id": 48, + "id": 4, "options": { "legend": { "calcs": [ @@ -2776,8 +3849,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_shard_backend_wait_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -2787,8 +3860,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_shard_backend_wait_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -2798,8 +3871,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_shard_backend_wait_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_bandwidth_gbps_bucket{model_name=\"$model_name\", 
job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -2809,13 +3882,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_shard_backend_wait_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_shard_backend_wait_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_bandwidth_gbps_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_bandwidth_gbps_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Load Shard Backend Wait Duration", + "title": "Cache Load Bandwidth (per task)", "type": "timeseries" }, { @@ -2823,7 +3896,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Synchronous Cache dump backend submit duration around backend_->Dump(...). This is a submit cost only and does not include lower-tier disk write completion. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Average Cache dump bandwidth. Series: p50 / p90 / p99 / avg. Per-task observed Cache dump throughput, sampled once per Cache dump task as total task bytes divided by task wall-clock duration. 
This is not D2H-only bandwidth.", "fieldConfig": { "defaults": { "color": { @@ -2872,7 +3945,7 @@ } ] }, - "unit": "ms" + "unit": "gbytes" }, "overrides": [ { @@ -2957,9 +4030,9 @@ "h": 8, "w": 12, "x": 12, - "y": 81 + "y": 41 }, - "id": 44, + "id": 8, "options": { "legend": { "calcs": [ @@ -2983,8 +4056,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -2994,8 +4067,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -3005,8 +4078,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) 
(rate(ucm:cache_dump_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -3016,13 +4089,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_submit_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_bandwidth_gbps_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_bandwidth_gbps_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Dump Backend Submit Duration", + "title": "Cache Dump Bandwidth (per task)", "type": "timeseries" }, { @@ -3030,7 +4103,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Average Cache load bandwidth. Series: p50 / p90 / p99 / avg. Per-task observed Cache load throughput, sampled once per Cache load task as total task bytes divided by task wall-clock duration. This is not H2D-only bandwidth.", + "description": "Per-shard Cache load WaitBackendTaskReady duration. For a miss this includes waiting for the backend load task; for a hit or shared buffer it captures readiness wait for that shard. This is not a task-level duration and should not be stacked with total load, queue wait, or dispatch. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -3079,7 +4152,7 @@ } ] }, - "unit": "gbytes" + "unit": "ms" }, "overrides": [ { @@ -3164,9 +4237,9 @@ "h": 8, "w": 12, "x": 0, - "y": 73 + "y": 65 }, - "id": 4, + "id": 48, "options": { "legend": { "calcs": [ @@ -3190,8 +4263,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_shard_backend_wait_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -3201,8 +4274,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_shard_backend_wait_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -3212,8 +4285,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_load_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) 
(rate(ucm:cache_shard_backend_wait_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -3223,13 +4296,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_load_bandwidth_gbps_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_load_bandwidth_gbps_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_shard_backend_wait_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_shard_backend_wait_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Load Bandwidth (per task)", + "title": "Cache Load Backend Wait Duration", "type": "timeseries" }, { @@ -3237,7 +4310,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Average Cache dump bandwidth. Series: p50 / p90 / p99 / avg. Per-task observed Cache dump throughput, sampled once per Cache dump task as total task bytes divided by task wall-clock duration. This is not D2H-only bandwidth.", + "description": "Cache buffer lookup wall-clock time per `Lookup` / `LookupOnPrefix` call (fast path, in-memory hit/miss scan). Sub-millisecond is healthy; multi-millisecond means the scheduler is paying per-decision lookup overhead. 
Series: p50 / p90 / p99 / avg.", "fieldConfig": { "defaults": { "color": { @@ -3286,7 +4359,7 @@ } ] }, - "unit": "gbytes" + "unit": "ms" }, "overrides": [ { @@ -3370,10 +4443,10 @@ "gridPos": { "h": 8, "w": 12, - "x": 12, - "y": 73 + "x": 0, + "y": 105 }, - "id": 8, + "id": 29, "options": { "legend": { "calcs": [ @@ -3397,8 +4470,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -3408,8 +4481,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -3419,8 +4492,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) 
(rate(ucm:cache_lookup_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -3430,13 +4503,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_bandwidth_gbps_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_bandwidth_gbps_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_lookup_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_lookup_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Dump Bandwidth (per task)", + "title": "Cache Lookup Duration", "type": "timeseries" }, { @@ -3444,7 +4517,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Cache buffer lookup wall-clock time per `Lookup` / `LookupOnPrefix` call (fast path, in-memory hit/miss scan). Sub-millisecond is healthy; multi-millisecond means the scheduler is paying per-decision lookup overhead. Series: p50 / p90 / p99 / avg.", + "description": "Backend lookup time when descending due to no Cache buffer or buffer miss. Reflects the metadata-query latency of the lower tier (Posix / Ds3fs). 
Series: p50 / p90 / p99 / avg.", "fieldConfig": { "defaults": { "color": { @@ -3577,10 +4650,10 @@ "gridPos": { "h": 8, "w": 12, - "x": 0, - "y": 89 + "x": 12, + "y": 105 }, - "id": 29, + "id": 30, "options": { "legend": { "calcs": [ @@ -3604,8 +4677,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_backend_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -3615,8 +4688,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_backend_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -3626,8 +4699,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_backend_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -3637,13 +4710,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_lookup_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_lookup_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_lookup_backend_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_lookup_backend_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Lookup Duration", + "title": "Cache Lookup Backend Duration", "type": "timeseries" }, { @@ -3651,7 +4724,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Backend lookup time when descending due to no Cache buffer or buffer miss. Reflects the metadata-query latency of the lower tier (Posix / Ds3fs). Series: p50 / p90 / p99 / avg.", + "description": "Time spent waiting for the lower tier to finish writing a dumped Cache task. Large values indicate the storage write path is the bottleneck after backend submit. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -3785,9 +4858,9 @@ "h": 8, "w": 12, "x": 12, - "y": 89 + "y": 113 }, - "id": 30, + "id": 56, "options": { "legend": { "calcs": [ @@ -3811,8 +4884,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_backend_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_backend_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -3822,8 +4895,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_backend_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_backend_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -3833,8 +4906,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_lookup_backend_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:cache_dump_backend_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", 
engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -3844,13 +4917,13 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_lookup_backend_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_lookup_backend_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:cache_dump_backend_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } ], - "title": "Cache Lookup Backend Duration", + "title": "Cache Dump Backend Wait Duration", "type": "timeseries" }, { @@ -3866,7 +4939,7 @@ "h": 1, "w": 24, "x": 0, - "y": 97 + "y": 121 }, "panels": [ { @@ -3891,7 +4964,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (le) (rate(ucm:cache_load_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (le) (rate(ucm:cache_load_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "format": "heatmap", "legendFormat": "{{le}}", "range": true, @@ -3923,7 +4996,7 @@ "unit": "suffix: req/s" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "tooltip": { "show": true, @@ -3976,7 +5049,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (le) 
(rate(ucm:cache_load_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (le) (rate(ucm:cache_load_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "format": "heatmap", "legendFormat": "{{le}}", "range": true, @@ -4008,7 +5081,7 @@ "unit": "suffix: req/s" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "tooltip": { "show": true, @@ -4061,7 +5134,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (le) (rate(ucm:cache_dump_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (le) (rate(ucm:cache_dump_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "format": "heatmap", "legendFormat": "{{le}}", "range": true, @@ -4093,7 +5166,7 @@ "unit": "suffix: req/s" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "tooltip": { "show": true, @@ -4146,7 +5219,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (le) (rate(ucm:cache_dump_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (le) (rate(ucm:cache_dump_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "format": "heatmap", "legendFormat": "{{le}}", "range": true, @@ -4178,7 +5251,7 @@ "unit": "suffix: req/s" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "tooltip": { "show": true, @@ -4217,7 +5290,7 @@ "h": 1, "w": 24, "x": 0, - "y": 122 + "y": 146 }, "id": 38, "panels": [], @@ -4229,7 +5302,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Posix load task rate derived from rate(posix_load_task_duration_ms_count). One count is one Posix load task. 
Series by selected view.", + "description": "Posix load task rate derived from rate(posix_load_task_duration_ms_count). One count is one Posix load task. Series split by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -4286,7 +5359,7 @@ "h": 8, "w": 12, "x": 0, - "y": 123 + "y": 147 }, "id": 51, "options": { @@ -4309,10 +5382,10 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "hide": false, "instant": false, - "legendFormat": "load {{worker_id}}", + "legendFormat": "load {{engine}} {{worker_rank}}", "range": true, "refId": "A" } @@ -4325,7 +5398,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Posix dump task rate derived from rate(posix_dump_task_duration_ms_count). One count is one Posix dump task. Series by selected view.", + "description": "Posix dump task rate derived from rate(posix_dump_task_duration_ms_count). One count is one Posix dump task. 
Series split by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -4382,7 +5455,7 @@ "h": 8, "w": 12, "x": 12, - "y": 123 + "y": 147 }, "id": 52, "options": { @@ -4405,10 +5478,10 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "hide": false, "instant": false, - "legendFormat": "dump {{worker_id}}", + "legendFormat": "dump {{engine}} {{worker_rank}}", "range": true, "refId": "A" } @@ -4555,7 +5628,7 @@ "h": 8, "w": 12, "x": 0, - "y": 131 + "y": 155 }, "id": 31, "options": { @@ -4581,8 +5654,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -4592,8 +5665,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -4603,8 +5676,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -4614,8 +5687,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } @@ -4762,7 +5835,7 @@ "h": 8, "w": 12, "x": 12, - "y": 131 + "y": 155 }, "id": 32, "options": { @@ -4788,8 +5861,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_bucket{model_name=\"$model_name\", 
job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -4799,8 +5872,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -4810,8 +5883,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -4821,8 +5894,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) 
(rate(ucm:posix_dump_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } @@ -4892,7 +5965,7 @@ "h": 8, "w": 12, "x": 0, - "y": 139 + "y": 163 }, "id": 41, "options": { @@ -4918,8 +5991,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))", - "legendFormat": "queue wait {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))", + "legendFormat": "queue wait {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -4929,8 +6002,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "clamp_min(\n (\n sum by (${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_count{model_name=\"$model_name\", 
job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n ),\n 0\n)", - "legendFormat": "task after queue {{worker_id}}", + "expr": "clamp_min(\n (\n sum by (${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:posix_load_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n ),\n 0\n)", + "legendFormat": "task after queue {{engine}} {{worker_rank}}", "range": true, "refId": "B" } @@ -5000,7 +6073,7 @@ "h": 8, "w": 12, "x": 12, - "y": 139 + "y": 163 }, "id": 42, "options": { @@ -5026,8 +6099,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))", - "legendFormat": "queue wait {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[40s]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))", + "legendFormat": "queue wait {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -5037,8 +6110,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "clamp_min(\n (\n sum by (${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[40s]))\n ),\n 0\n)", - "legendFormat": "task after queue {{worker_id}}", + "expr": "clamp_min(\n (\n sum by (${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:posix_dump_task_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n )\n -\n (\n sum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n /\n sum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[40s]))\n ),\n 0\n)", + "legendFormat": "task after queue {{engine}} {{worker_rank}}", "range": true, "refId": "B" } @@ -5108,7 +6181,7 @@ "h": 8, "w": 12, 
"x": 0, - "y": 147 + "y": 171 }, "id": 25, "options": { @@ -5134,8 +6207,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_s2h_bytes_total{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])) / 1e9", - "legendFormat": "{{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_s2h_bytes_total{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])) / 1e9", + "legendFormat": "{{engine}} {{worker_rank}}", "range": true, "refId": "A" } @@ -5205,7 +6278,7 @@ "h": 8, "w": 12, "x": 12, - "y": 147 + "y": 171 }, "id": 26, "options": { @@ -5231,8 +6304,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_h2s_bytes_total{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])) / 1e9", - "legendFormat": "{{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_h2s_bytes_total{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])) / 1e9", + "legendFormat": "{{engine}} {{worker_rank}}", "range": true, "refId": "A" } @@ -5245,7 +6318,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Time Posix load IO spends queued before worker pickup. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Time Posix load IO spends queued before worker pickup. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -5379,7 +6452,7 @@ "h": 8, "w": 12, "x": 0, - "y": 155 + "y": 179 }, "id": 36, "options": { @@ -5405,8 +6478,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -5416,8 +6489,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -5427,8 +6500,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -5438,8 +6511,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_load_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } @@ -5452,7 +6525,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Time Posix dump IO spends queued before worker pickup. Series: p50 / p90 / p99 / avg by selected view.", + "description": "Time Posix dump IO spends queued before worker pickup. 
Series: p50 / p90 / p99 / avg by engine and worker_rank.", "fieldConfig": { "defaults": { "color": { @@ -5586,7 +6659,7 @@ "h": 8, "w": 12, "x": 12, - "y": 155 + "y": 179 }, "id": 46, "options": { @@ -5612,8 +6685,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -5623,8 +6696,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -5634,8 +6707,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -5645,8 +6718,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_dump_queue_wait_duration_ms_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } @@ -5659,7 +6732,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Posix-stage storage-to-host bandwidth (load direction). Series: p50 / p90 / p99 / avg by selected view. Task-level observed throughput: total task bytes divided by task wall-clock duration.", + "description": "Posix-stage storage-to-host bandwidth (load direction). Series: p50 / p90 / p99 / avg by engine and worker_rank. 
Task-level observed throughput: total task bytes divided by task wall-clock duration.", "fieldConfig": { "defaults": { "color": { @@ -5793,7 +6866,7 @@ "h": 8, "w": 12, "x": 0, - "y": 163 + "y": 187 }, "id": 10, "options": { @@ -5819,8 +6892,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_s2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_s2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -5830,8 +6903,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_s2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_s2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -5841,8 +6914,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_s2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_s2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -5852,8 +6925,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_s2h_bandwidth_gbps_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_s2h_bandwidth_gbps_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_s2h_bandwidth_gbps_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_s2h_bandwidth_gbps_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } @@ -5866,7 +6939,7 @@ "type": "prometheus", "uid": "${DS_PROMETHEUS}" }, - "description": "Posix-stage host-to-storage bandwidth (dump direction). Series: p50 / p90 / p99 / avg by selected view. Task-level observed throughput: total task bytes divided by task wall-clock duration.", + "description": "Posix-stage host-to-storage bandwidth (dump direction). Series: p50 / p90 / p99 / avg by engine and worker_rank. 
Task-level observed throughput: total task bytes divided by task wall-clock duration.", "fieldConfig": { "defaults": { "color": { @@ -6000,7 +7073,7 @@ "h": 8, "w": 12, "x": 12, - "y": 163 + "y": 187 }, "id": 11, "options": { @@ -6026,8 +7099,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_h2s_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p50 {{worker_id}}", + "expr": "histogram_quantile(0.5, sum by (le, ${perWorker:raw}) (rate(ucm:posix_h2s_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p50 {{engine}} {{worker_rank}}", "range": true, "refId": "A" }, @@ -6037,8 +7110,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_h2s_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p90 {{worker_id}}", + "expr": "histogram_quantile(0.9, sum by (le, ${perWorker:raw}) (rate(ucm:posix_h2s_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p90 {{engine}} {{worker_rank}}", "range": true, "refId": "B" }, @@ -6048,8 +7121,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_h2s_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval])))", - "legendFormat": "p99 {{worker_id}}", + "expr": "histogram_quantile(0.99, sum by (le, ${perWorker:raw}) (rate(ucm:posix_h2s_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", 
worker_rank=~\"$worker_rank\"}[$__rate_interval])))", + "legendFormat": "p99 {{engine}} {{worker_rank}}", "range": true, "refId": "C" }, @@ -6059,8 +7132,8 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_h2s_bandwidth_gbps_sum{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_h2s_bandwidth_gbps_count{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", - "legendFormat": "avg {{worker_id}}", + "expr": "sum by (${perWorker:raw}) (rate(ucm:posix_h2s_bandwidth_gbps_sum{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))\n/\nsum by (${perWorker:raw}) (rate(ucm:posix_h2s_bandwidth_gbps_count{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", + "legendFormat": "avg {{engine}} {{worker_rank}}", "range": true, "refId": "D" } @@ -6081,7 +7154,7 @@ "h": 1, "w": 24, "x": 0, - "y": 171 + "y": 195 }, "panels": [ { @@ -6106,7 +7179,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (le) (rate(ucm:posix_load_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (le) (rate(ucm:posix_load_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "format": "heatmap", "legendFormat": "{{le}}", "range": true, @@ -6138,7 +7211,7 @@ "unit": "suffix: req/s" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "tooltip": { "show": true, @@ -6191,7 +7264,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (le) (rate(ucm:posix_dump_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (le) 
(rate(ucm:posix_dump_task_duration_ms_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "format": "heatmap", "legendFormat": "{{le}}", "range": true, @@ -6223,7 +7296,7 @@ "unit": "suffix: req/s" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "tooltip": { "show": true, @@ -6276,7 +7349,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (le) (rate(ucm:posix_s2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (le) (rate(ucm:posix_s2h_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "format": "heatmap", "legendFormat": "{{le}}", "range": true, @@ -6308,7 +7381,7 @@ "unit": "suffix: req/s" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "tooltip": { "show": true, @@ -6361,7 +7434,7 @@ "uid": "${DS_PROMETHEUS}" }, "editorMode": "code", - "expr": "sum by (le) (rate(ucm:posix_h2s_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", worker_id=~\"$worker_id\"}[$__rate_interval]))", + "expr": "sum by (le) (rate(ucm:posix_h2s_bandwidth_gbps_bucket{model_name=\"$model_name\", job=~\"$job\", engine=~\"$engine\", worker_rank=~\"$worker_rank\"}[$__rate_interval]))", "format": "heatmap", "legendFormat": "{{le}}", "range": true, @@ -6393,7 +7466,7 @@ "unit": "suffix: req/s" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "tooltip": { "show": true, diff --git a/examples/metrics/grafana_vllm.json b/examples/metrics/grafana_vllm.json index ee9d0e8f6..0f9563a3d 100644 --- a/examples/metrics/grafana_vllm.json +++ b/examples/metrics/grafana_vllm.json @@ -29,9 +29,9 @@ "links": [ { "type": "dashboards", - "title": "Other UCM dashboards", + "title": "Other UCM vLLM metrics dashboards", "tags": [ - "ucm" + "ucm-vllm-connector-metrics" ], "asDropdown": true, "includeVars": true, @@ -881,6 +881,123 @@ 
"title": "Cache Utilization", "type": "timeseries" }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "description": "Breakdown of prefix-cache hit rate from vLLM local GPU prefix cache and external KV connector cache.", + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 60000, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "red", + "value": null + }, + { + "color": "yellow", + "value": 0.5 + }, + { + "color": "green", + "value": 0.8 + } + ] + }, + "unit": "percentunit", + "min": 0, + "max": 1 + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 24 + }, + "id": 17, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "maxHeight": "50%", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "multi", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum(rate(vllm:prefix_cache_hits_total{model_name=\"$model_name\", job=~\"$job\", instance=\"$instance\"}[$__rate_interval]))\n/\nclamp_min(sum(rate(vllm:prefix_cache_queries_total{model_name=\"$model_name\", job=~\"$job\", instance=\"$instance\"}[$__rate_interval])), 1)", + "instant": false, + "legendFormat": "GPU Prefix Cache", + "range": true, + 
"refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "${DS_PROMETHEUS}" + }, + "editorMode": "code", + "expr": "sum(rate(vllm:external_prefix_cache_hits_total{model_name=\"$model_name\", job=~\"$job\", instance=\"$instance\"}[$__rate_interval]))\n/\nclamp_min(sum(rate(vllm:external_prefix_cache_queries_total{model_name=\"$model_name\", job=~\"$job\", instance=\"$instance\"}[$__rate_interval])), 1)", + "instant": false, + "legendFormat": "Connector Prefix Cache", + "range": true, + "refId": "B" + } + ], + "title": "KV Cache Hit Rate Breakdown", + "type": "timeseries" + }, { "datasource": { "type": "prometheus", @@ -906,7 +1023,7 @@ "h": 8, "w": 12, "x": 0, - "y": 24 + "y": 32 }, "id": 12, "options": { @@ -929,7 +1046,7 @@ "color": "rgba(255,0,255,0.7)" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "legend": { "show": true @@ -998,7 +1115,7 @@ "h": 8, "w": 12, "x": 12, - "y": 24 + "y": 32 }, "id": 13, "options": { @@ -1021,7 +1138,7 @@ "color": "rgba(255,0,255,0.7)" }, "filterValues": { - "le": 1e-9 + "le": 1e-09 }, "legend": { "show": true @@ -1129,7 +1246,7 @@ "h": 8, "w": 12, "x": 0, - "y": 32 + "y": 40 }, "id": 11, "options": { @@ -1231,7 +1348,7 @@ "h": 8, "w": 12, "x": 12, - "y": 32 + "y": 40 }, "id": 14, "options": { @@ -1332,7 +1449,7 @@ "h": 8, "w": 12, "x": 0, - "y": 40 + "y": 48 }, "id": 15, "options": { @@ -1446,7 +1563,7 @@ "h": 8, "w": 12, "x": 12, - "y": 40 + "y": 48 }, "id": 16, "options": { @@ -1487,7 +1604,7 @@ "refresh": "", "schemaVersion": 39, "tags": [ - "ucm", + "ucm-vllm-connector-metrics", "vllm" ], "templating": { @@ -1601,8 +1718,8 @@ }, "timepicker": {}, "timezone": "", - "title": "vLLM", - "uid": "vllm-overview", + "title": "vLLM (UCM Metrics)", + "uid": "ucm-vllm-overview", "version": 1, "weekStart": "" } diff --git a/examples/metrics/metrics_configs.yaml b/examples/metrics/metrics_configs.yaml index 8c4b28fac..6c6074634 100644 --- a/examples/metrics/metrics_configs.yaml +++ 
b/examples/metrics/metrics_configs.yaml @@ -2,9 +2,14 @@ # This file defines which metrics should be enabled and their configurations log_interval: 5 # Interval in seconds for logging metrics -multiproc_dir: "/vllm-workspace" # Directory for Prometheus multiprocess mode +# multiproc_dir: "/vllm-workspace" # Directory for Prometheus multiprocess mode -metric_prefix: "ucm:" +# multiproc_prefix: "ucm_multiproc:" +vllm_connector_prefix: "ucm:" + +consumers: + # multiproc: true + vllm_connector: true # ============================================================================ # Common bucket aliases (for readability only - YAML doesn't inline these). @@ -79,7 +84,7 @@ counter: documentation: "Number of Posix read, write, or AIO completion failures" # -- Connector layer: per-worker real throughput counters ---------------- - # The legacy load_speed / save_speed histograms sample per-call speed + # The connector load_speed histogram samples per-call speed # (a statistical distribution), so an aggregated quantile across workers # is still ~per-call speed, not a sum. Use rate(*_bytes_total) / 1e9 to # get the actual GB/s the whole vLLM service is moving through UCM. 
@@ -130,11 +135,11 @@ histogram: documentation: "Number of blocks saved to ucm" buckets: [0, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000] - name: "save_duration" - documentation: "Time to save to ucm (ms)" + documentation: "Time from UCM connector wait_for_save entry to async dump task completion (ms)" buckets: [0, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000] - - name: "save_speed" - documentation: "Speed of saving to ucm (GB/s)" - buckets: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100] + - name: "save_completion_wait_duration" + documentation: "Time spent blocked while confirming async UCM connector dump completion (ms)" + buckets: [0, 1, 2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000] - name: "interval_lookup_hit_rates" documentation: "Hit rates of ucm lookup requests" buckets: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] @@ -169,24 +174,39 @@ histogram: - name: "cache_dump_queue_wait_duration_ms" documentation: "Time a Cache dump task spent queued before dispatch worker pickup (ms)" buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 500] - - name: "cache_load_dispatch_duration_ms" - documentation: "Cache load dispatch cost: buffer allocation + backend submission (ms)" + - name: "cache_load_backend_submit_duration_ms" + documentation: "Cache load backend submit duration: buffer allocation plus synchronous backend load submission (ms)." buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 500] - name: "cache_shard_backend_wait_ms" documentation: "Cache load per-shard time spent in WaitBackendTaskReady before H2D submit (ms). This is not a task-level duration." buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500] - - name: "cache_shard_h2d_ms" - documentation: "Cache load per-shard H2D async submit duration after backend wait (ms). 
This is not a task-level duration and does not include full task stream synchronization."
+  - name: "cache_h2d_submit_ms"
+    documentation: "Cache load per-shard H2D async submit CPU cost after backend wait (ms). Submission only; NOT the actual transfer time (see cache_h2d_sync_ms)."
     buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500]
+  - name: "cache_h2d_sync_ms"
+    documentation: "Cache load residual H2D stream drain after the last shard submit (ms). Large => H2D copy is the bottleneck; ~0 with large cache_shard_backend_wait_ms => storage read is the bottleneck."
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500]
+  - name: "cache_h2d_bandwidth_gbps"
+    documentation: "Cache load pure H2D copy bandwidth (GB/s): copied bytes / cache_h2d_sync_ms. Directly comparable to memcpy microbenchmarks; far below them => host memory/DMA path issue."
+    buckets: [0.5, 1, 2, 4, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192, 256]
   - name: "cache_dump_mkbuf_duration_ms"
-    documentation: "Cache dump mk_buf phase: prerequisite wait + buffer allocation/reuse + D2H async submit before stream sync (ms)"
+    documentation: "Cache dump mk_buf phase: buffer allocation/reuse + D2H async submit before stream sync (ms)"
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500]
+  - name: "cache_dump_prereq_wait_ms"
+    documentation: "Cache dump time waiting for the prerequisite compute event (layer KV ready) to fire before D2H can start (ms). Large => dump is compute-gated, not copy-gated."
     buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500]
   - name: "cache_d2h_duration_ms"
-    documentation: "Cache dump D2H stream sync phase after mk_buf; excludes buffer allocation and async submit (ms)"
+    documentation: "Cache dump stream synchronize duration including prerequisite compute wait and D2H copy (ms). Use cache_dump_prereq_wait_ms to estimate the compute-gated portion."
     buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500]
+  - name: "cache_d2h_bandwidth_gbps"
+    documentation: "Cache dump effective D2H bandwidth (GB/s): copied bytes / cache_d2h_duration_ms. Duration includes prerequisite compute wait."
+    buckets: [0.5, 1, 2, 4, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192, 256]
   - name: "cache_dump_backend_submit_duration_ms"
     documentation: "Cache dump backend submit duration: synchronous time to pass buffers to the lower tier (ms). Does NOT include the lower tier's actual write time."
     buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 500]
+  - name: "cache_dump_backend_wait_duration_ms"
+    documentation: "Cache dump time waiting for the lower tier to finish writing a dumped task (ms). Large => storage write is the bottleneck."
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000]
 
   # -- Posix stage (disk tier) ---------------------------------------------
   - name: "posix_load_task_duration_ms"
@@ -227,6 +247,30 @@ histogram:
   - name: "layerwise_batch_total_ms"
     documentation: "Layerwise batch wall-clock time from start_load_kv entry to wait_for_save return (ms)"
     buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
+  - name: "layerwise_batch_total_load_only_ms"
+    documentation: "Layerwise load-only batch wall-clock time from start_load_kv entry to wait_for_save return (ms)"
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
+  - name: "layerwise_batch_total_save_only_ms"
+    documentation: "Layerwise save-only batch wall-clock time from start_load_kv entry to wait_for_save return (ms)"
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
+  - name: "layerwise_batch_total_load_save_ms"
+    documentation: "Layerwise load-and-save batch wall-clock time from start_load_kv entry to wait_for_save return (ms)"
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
+  - name: "layerwise_batch_total_no_transfer_ms"
+    documentation: "Layerwise batch wall-clock time with neither load nor save work (ms)"
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
+  - name: "layerwise_batch_load_wait_total_load_only_ms"
+    documentation: "Total wait_for_layer_load blocking time accumulated within one load-only layerwise batch (ms)"
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
+  - name: "layerwise_batch_load_wait_total_load_save_ms"
+    documentation: "Total wait_for_layer_load blocking time accumulated within one load-and-save layerwise batch (ms)"
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
+  - name: "layerwise_batch_save_tail_save_only_ms"
+    documentation: "wait_for_save tail duration within one save-only layerwise batch (ms)"
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000]
+  - name: "layerwise_batch_save_tail_load_save_ms"
+    documentation: "wait_for_save tail duration within one load-and-save layerwise batch (ms)"
+    buckets: [0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000]
   - name: "layerwise_wait_blocking_ms"
     documentation: "Time wait_for_layer_load blocked before returning (ms). Near 0 = good overlap."
     buckets: [0, 0.1, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500]
diff --git a/test/test_ucm_connector_metrics.py b/test/test_ucm_connector_metrics.py
new file mode 100644
index 000000000..2e28b3231
--- /dev/null
+++ b/test/test_ucm_connector_metrics.py
@@ -0,0 +1,1466 @@
+import importlib
+import json
+import math
+import re
+import sys
+import threading
+from dataclasses import dataclass, field
+from enum import Enum
+from pathlib import Path
+from types import ModuleType, SimpleNamespace
+
+import pytest
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(REPO_ROOT))
+
+
+class FakeValue:
+    def __init__(self):
+        self.value = 0
+
+    def inc(self, value):
+        self.value += value
+
+
+class FakeMetric:
+    created = {}
+
+    def __init__(self, name, documentation, labelnames=None, buckets=None, **kwargs):
+        self.name = name
+        self.documentation = documentation
+        self.labelnames = list(labelnames or [])
+        self.buckets = list(buckets or [])
+        self.children = {}
+        self.observations = []
+        self.increments = []
+        self.set_values = []
+        self.labelvalues = ()
+        self._init_storage()
+        self.__class__.created[name] = self
+
+    def _init_storage(self):
+        pass
+
+    def labels(self, *labelvalues, **labelkwargs):
+        if labelkwargs:
+            labelvalues = tuple(labelkwargs[name] for name in self.labelnames)
+        child = self.__class__.__new__(self.__class__)
+        child.name = self.name
+        child.documentation = self.documentation
+        child.labelnames = self.labelnames
+        child.buckets = self.buckets
+        child.children = {}
+        child.observations = []
+        child.increments = []
+        child.set_values = []
+        child.labelvalues = tuple(labelvalues)
+        child._init_storage()
+        self.children[child.labelvalues] = child
+        return child
+
+    def observe(self, value):
+        self.observations.append(value)
+
+    def inc(self, value):
+        self.increments.append(value)
+
+    def set(self, value):
+        self.set_values.append(value)
+
+
+class FakeGauge(FakeMetric):
+    created = {}
+
+
+class FakeCounter(FakeMetric):
+    created = {}
+
+
+class FakeHistogram(FakeMetric):
+    created = {}
+
+    def _init_storage(self):
+        self._upper_bounds = list(self.buckets)
+        if not self._upper_bounds or not math.isinf(self._upper_bounds[-1]):
+            self._upper_bounds.append(math.inf)
+        self._buckets = [FakeValue() for _ in self._upper_bounds]
+        self._sum = FakeValue()
+
+
+class FakeThread:
+    def __init__(self, target):
+        self.target = target
+        self.started = False
+        self.joined = False
+
+    def start(self):
+        self.started = True
+
+    def join(self):
+        self.joined = True
+
+
+def _install_package(name, path):
+    parent_name, _, child_name = name.rpartition(".")
+    if parent_name and parent_name not in sys.modules:
+        _install_package(parent_name, path.parent)
+    module = ModuleType(name)
+    module.__path__ = [str(path)]
+    sys.modules[name] = module
+    if parent_name:
+        setattr(sys.modules[parent_name], child_name, module)
+    return module
+
+
+def _install_module(name, **attrs):
+    parent_name, _, child_name = name.rpartition(".")
+    if parent_name and parent_name not in sys.modules:
+        try:
+            importlib.import_module(parent_name)
+        except ModuleNotFoundError:
+            _install_module(parent_name)
+    module = ModuleType(name)
+    for attr, value in attrs.items():
+        setattr(module, attr, value)
+    sys.modules[name] = module
+    if parent_name:
+        setattr(sys.modules[parent_name], child_name, module)
+    return module
+
+
+class KVConnectorRole(Enum):
+    WORKER = "worker"
+    SCHEDULER = "scheduler"
+
+
+class KVConnectorBase_V1:
+    def __init__(self, vllm_config=None, role=None, kv_cache_config=None):
+        self._vllm_config = vllm_config
+        self._role = role
+        self._kv_cache_config = kv_cache_config
+
+    def clear_connector_metadata(self):
+        pass
+
+    def get_kv_connector_stats(self):
+        return None
+
+
+class SupportsHMA:
+    pass
+
+
+class KVConnectorMetadata:
+    pass
+
+
+@dataclass
+class KVConnectorStats:
+    data: dict = field(default_factory=dict)
+
+    def is_empty(self):
+        return not self.data
+
+
+class KVConnectorPromMetrics:
+    def __init__(self, vllm_config, metric_types, labelnames, per_engine_labelvalues):
+        self._kv_transfer_config = vllm_config.kv_transfer_config
+        self._gauge_cls = metric_types[FakeGauge]
+        self._counter_cls = metric_types[FakeCounter]
+        self._histogram_cls = metric_types[FakeHistogram]
+        self._labelnames = labelnames
+        self.per_engine_labelvalues = per_engine_labelvalues
+
+
+class FakeUcmMetrics:
+    def __init__(self):
+        self.created = []
+        self.updated = []
+        self.setup_calls = 0
+        self.drained = []
+        self.snapshot = ({}, {}, {})
+        self.on_drain = None
+
+    def set_up(self, *args, **kwargs):
+        self.setup_calls += 1
+
+    def create_stats(self, name, metric_type, buckets=None):
+        self.created.append((name, metric_type, tuple(buckets or ())))
+
+    def update_stats(self, stats):
+        self.updated.append(stats)
+
+    def get_all_stats_and_clear(self):
+        self.drained.append("all")
+        if self.on_drain is not None:
+            self.on_drain()
+        snapshot = self.snapshot
+        self.snapshot = ({}, {}, {})
+        return snapshot
+
+
+class _Logger:
+    def info(self, *args, **kwargs):
+        pass
+
+    def debug(self, *args, **kwargs):
+        pass
+
+    def error(self, *args, **kwargs):
+        pass
+
+    def warning(self, *args, **kwargs):
+        pass
+
+    def warning_once(self, *args, **kwargs):
+        pass
+
+
+class CaptureLogger:
+    def __init__(self):
+        self.infos = []
+
+    def info(self, message, *args, **kwargs):
+        if args:
+            message = message % args
+        self.infos.append(str(message))
+
+    def debug(self, *args, **kwargs):
+        pass
+
+    def error(self, *args, **kwargs):
+        pass
+
+    def warning(self, *args, **kwargs):
+        pass
+
+    def warning_once(self, *args, **kwargs):
+        pass
+
+
+class FakeConfig:
+    def __init__(self, kv_transfer_config):
+        self._config = getattr(kv_transfer_config, "launch_config", {})
+
+    def get_config(self):
+        return self._config
+
+
+fake_ucmmetrics = FakeUcmMetrics()
+
+GRAFANA_VLLM_UCM_TAG = "ucm-vllm-connector-metrics"
+GRAFANA_UCM_DASHBOARDS = [
+    "grafana_connector.json",
+    "grafana_layerwise.json",
+    "grafana_pipeline_store.json",
+]
+
+
+def _install_stubs():
+    _install_module(
+        "numpy",
+        ndarray=object,
+        bool_=bool,
+        uint64="uint64",
+        int64="int64",
+        zeros=lambda shape, *args, **kwargs: [
+            [0 for _ in range(shape[1])] for _ in range(shape[0])
+        ],
+        vstack=lambda rows: list(rows),
+        concatenate=lambda arrays, axis=0: [
+            item for array in arrays for item in list(array)
+        ],
+        asarray=lambda values, dtype=None: values,
+        arange=lambda *args, **kwargs: list(range(*args)),
+        isscalar=lambda value: isinstance(value, (int, float, bool, str, bytes)),
+    )
+    _install_package("ucm", REPO_ROOT / "ucm")
+    _install_package("ucm.integration", REPO_ROOT / "ucm" / "integration")
+    _install_package("ucm.integration.vllm", REPO_ROOT / "ucm" / "integration" / "vllm")
+    _install_module("torch", Tensor=type("Tensor", (), {}))
+    _install_module(
+        "prometheus_client",
+        Counter=FakeCounter,
+        Gauge=FakeGauge,
+        Histogram=FakeHistogram,
+    )
+    _install_module("vllm.config", VllmConfig=type("VllmConfig", (), {}))
+    _install_module(
+        "vllm.distributed.kv_transfer.kv_connector.v1.base",
+        KVConnectorBase_V1=KVConnectorBase_V1,
+        KVConnectorMetadata=KVConnectorMetadata,
+        KVConnectorRole=KVConnectorRole,
+        SupportsHMA=SupportsHMA,
+    )
+    _install_module(
+        "vllm.distributed.kv_transfer.kv_connector.v1.metrics",
+        KVConnectorPromMetrics=KVConnectorPromMetrics,
+        KVConnectorStats=KVConnectorStats,
+        PromMetric=object,
+        PromMetricT=object,
+    )
+    _install_module(
+        "vllm.distributed.parallel_state",
+        get_world_group=lambda: SimpleNamespace(local_rank=0, rank=0),
+    )
+    _install_module(
+        "vllm.model_executor.models.utils",
+        extract_layer_index=lambda layer_name: 0,
+    )
+    _install_module(
+        "vllm.platforms",
+        current_platform=SimpleNamespace(
+            is_cuda_alike=lambda: True,
+            device_type="cuda",
+        ),
+    )
+    _install_module(
+        "vllm.v1.core.sched.output",
+        SchedulerOutput=type("SchedulerOutput", (), {}),
+    )
+    _install_module(
+        "vllm.v1.kv_cache_interface",
+        FullAttentionSpec=type("FullAttentionSpec", (), {}),
+        KVCacheConfig=type("KVCacheConfig", (), {}),
+        KVCacheSpec=type("KVCacheSpec", (), {}),
+        MambaSpec=type("MambaSpec", (), {}),
+        SlidingWindowSpec=type("SlidingWindowSpec", (), {}),
+        UniformTypeKVCacheSpecs=type("UniformTypeKVCacheSpecs", (), {}),
+    )
+    _install_module(
+        "vllm.v1.outputs",
+        KVConnectorOutput=type("KVConnectorOutput", (), {}),
+    )
+    _install_module(
+        "ucm.integration.vllm.device",
+        create_device=lambda *args, **kwargs: None,
+    )
+    _install_module("ucm.logger", init_logger=lambda name: _Logger())
+    _install_module("ucm.shared.metrics", ucmmetrics=fake_ucmmetrics)
+    _install_module(
+        "ucm.store.factory_v1",
+        UcmConnectorFactoryV1=type("UcmConnectorFactoryV1", (), {}),
+    )
+    _install_module(
+        "ucm.store.ucmstore_v1",
+        Task=type("Task", (), {}),
+        UcmKVStoreBaseV1=type("UcmKVStoreBaseV1", (), {}),
+    )
+    _install_module("ucm.utils", Config=FakeConfig)
+    _install_module("ucm.sparse.state", has_ucm_sparse=lambda *args, **kwargs: False)
+
+
+_install_stubs()
+
+import ucm.integration.vllm.ucm_connector as ucm_connector_module
+from ucm.integration.vllm.metrics import UCMConnectorStats, UCMPromMetrics
+from ucm.integration.vllm.ucm_connector import (
+    PendingDumpTask,
+    UCMConnector,
+    UCMDirectConnector,
+)
+from ucm.metrics_config import (
+    consumer_enabled,
+    get_metric_definitions,
+    get_vllm_connector_metric_definitions,
+    multiproc_metric_name,
+    setup_ucm_metrics,
+    vllm_connector_prefix,
+)
+from ucm.metrics_dispatcher import get_metrics_dispatcher
+
+
+def _metric_types():
+    return {
+        FakeGauge: FakeGauge,
+        FakeCounter: FakeCounter,
+        FakeHistogram: FakeHistogram,
+    }
+
+
+def _metrics_config(consumers=None):
+    return {
+        "multiproc_prefix": "ucm_multiproc:",
+        "vllm_connector_prefix": "ucm:",
+        "consumers": consumers or {"multiproc": True, "vllm_connector": True},
+        "counter": [
+            {
+                "name": "load_bytes_total",
+                "documentation": "Total load bytes.",
+            }
+        ],
+        "gauge": [
+            {
+                "name": "cache_lookup_hit_rate",
+                "documentation": "Latest cache lookup hit rate.",
+            }
+        ],
+        "histogram": [
+            {
+                "name": "load_duration",
+                "documentation": "Load duration in ms.",
+                "buckets": [50, 100],
+            },
+            {
+                "name": "cache_load_duration_ms",
+                "documentation": "Cache load duration in ms.",
+                "buckets": [1, 5],
+            },
+            {
+                "name": "interval_lookup_hit_rates",
+                "documentation": "Prefer vLLM external prefix cache metrics.",
+                "buckets": [0.1, 0.5, 1.0],
+            },
+        ],
+    }
+
+
+def _vllm_config(config=None):
+    launch_config = {}
+    if config is not None:
+        launch_config["metrics_config"] = config
+    return SimpleNamespace(
+        kv_transfer_config=SimpleNamespace(
+            kv_connector="UCM",
+            launch_config=launch_config,
+        )
+    )
+
+
+def _reset_fakes():
+    fake_ucmmetrics.created.clear()
+    fake_ucmmetrics.updated.clear()
+    fake_ucmmetrics.setup_calls = 0
+    fake_ucmmetrics.drained.clear()
+    fake_ucmmetrics.snapshot = ({}, {}, {})
+    fake_ucmmetrics.on_drain = None
+    FakeGauge.created = {}
+    FakeCounter.created = {}
+    FakeHistogram.created = {}
+    import ucm.metrics_dispatcher as dispatcher_module
+
+    dispatcher_module._DISPATCHER = None
+
+
+def test_config_definitions_register_enable_list_and_metric_names():
+    _reset_fakes()
+    config = _metrics_config()
+
+    definitions = get_metric_definitions(config)
+    by_name = {definition.name: definition for definition in definitions}
+    setup_ucm_metrics(config)
+
+    assert consumer_enabled(config, "multiproc")
+    assert consumer_enabled(config, "vllm_connector")
+    assert not consumer_enabled({"consumers": {"vllm_connector": True}}, "multiproc")
+    assert consumer_enabled({"consumers": {"vllm_connector": True}}, "vllm_connector")
+    assert (
+        multiproc_metric_name(config, "load_bytes_total")
+        == "ucm_multiproc:load_bytes_total"
+    )
+    assert vllm_connector_prefix({}) == "ucm:"
+    assert list(by_name) == [
+        "load_bytes_total",
+        "cache_lookup_hit_rate",
+        "load_duration",
+        "cache_load_duration_ms",
+        "interval_lookup_hit_rates",
+    ]
+    assert by_name["load_duration"].vllm_connector_name == "ucm:load_duration"
+    assert by_name["load_duration"].vllm_connector_buckets == (50, 100)
+    assert by_name["load_duration"].vllm_connector_value_scale == 1.0
+    assert (
+        by_name["cache_load_duration_ms"].vllm_connector_name
+        == "ucm:cache_load_duration_ms"
+    )
+    assert by_name["interval_lookup_hit_rates"].vllm_connector_enabled is False
+    assert fake_ucmmetrics.created == [
+        ("load_bytes_total", "counter", ()),
+        ("cache_lookup_hit_rate", "gauge", ()),
+        ("load_duration", "histogram", (50, 100)),
+        ("cache_load_duration_ms", "histogram", (1, 5)),
+        ("interval_lookup_hit_rates", "histogram", (0.1, 0.5, 1.0)),
+    ]
+
+
+def test_setup_ucm_metrics_logs_registered_metrics(monkeypatch):
+    _reset_fakes()
+    import ucm.metrics_config as metrics_config
+
+    capture_logger = CaptureLogger()
+    monkeypatch.setattr(metrics_config, "logger", capture_logger)
+
+    setup_ucm_metrics(_metrics_config())
+
+    assert capture_logger.infos == [
+        "UCM metrics enabled for multiproc, vllm_connector: "
+        "total=5, counters=1, gauges=1, histograms=3"
+    ]
+
+
+def test_dispatcher_fans_out_single_core_drain_to_independent_consumers():
+    _reset_fakes()
+    config = _metrics_config()
+    fake_ucmmetrics.snapshot = (
+        {"load_bytes_total": 4096.0, "not_configured": 99.0},
+        {"cache_lookup_hit_rate": 0.5},
+        {
+            "load_duration": ([0, 1, 0], 75.0),
+            "interval_lookup_hit_rates": ([1, 0, 0, 0], 0.1),
+        },
+    )
+    dispatcher = get_metrics_dispatcher(config)
+
+    dispatcher.drain_to_consumers()
+    multiproc_stats = dispatcher.get_stats_and_clear("multiproc")
+    vllm_stats = dispatcher.get_stats_and_clear("vllm_connector")
+
+    assert fake_ucmmetrics.drained == ["all"]
+    assert multiproc_stats[0] == {"load_bytes_total": 4096.0}
+    assert multiproc_stats[1] == {"cache_lookup_hit_rate": 0.5}
+    assert multiproc_stats[2]["load_duration"] == ([0, 1, 0], 75.0)
+    assert multiproc_stats[2]["interval_lookup_hit_rates"] == ([1, 0, 0, 0], 0.1)
+    assert vllm_stats[0] == {"load_bytes_total": 4096.0}
+    assert vllm_stats[1] == {"cache_lookup_hit_rate": 0.5}
+    assert vllm_stats[2] == {"load_duration": ([0, 1, 0], 75.0)}
+    assert dispatcher.get_stats_and_clear("multiproc") == ({}, {}, {})
+
+
+def test_dispatcher_accumulates_deltas_and_keeps_gauge_latest():
+    _reset_fakes()
+    dispatcher = get_metrics_dispatcher(_metrics_config())
+
+    fake_ucmmetrics.snapshot = (
+        {"load_bytes_total": 100.0},
+        {"cache_lookup_hit_rate": 0.25},
+        {"load_duration": ([1, 0, 0], 50.0)},
+    )
+    dispatcher.drain_to_consumers()
+    fake_ucmmetrics.snapshot = (
+        {"load_bytes_total": 50.0},
+        {"cache_lookup_hit_rate": 0.75},
+        {"load_duration": ([0, 2, 0], 125.0)},
+    )
+    dispatcher.drain_to_consumers()
+
+    counters, gauges, histograms = dispatcher.get_stats_and_clear("vllm_connector")
+
+    assert counters == {"load_bytes_total": 150.0}
+    assert gauges == {"cache_lookup_hit_rate": 0.75}
+    assert histograms == {"load_duration": ([1, 2, 0], 175.0)}
+
+
+def test_dispatcher_lock_covers_core_drain_and_fanout():
+    _reset_fakes()
+    dispatcher = get_metrics_dispatcher(_metrics_config())
+    fake_ucmmetrics.snapshot = (
+        {"load_bytes_total": 100.0},
+        {},
+        {"load_duration": ([1, 0, 0], 50.0)},
+    )
+    reader_snapshot = None
+    reader_started = threading.Event()
+    reader_thread = None
+
+    def read_consumer():
+        nonlocal reader_snapshot
+        reader_started.set()
+        reader_snapshot = dispatcher.get_stats_and_clear("vllm_connector")
+
+    def read_while_core_drain_is_active():
+        nonlocal reader_thread
+        reader_thread = threading.Thread(target=read_consumer)
+        reader_thread.start()
+        reader_started.wait(timeout=1)
+
+    fake_ucmmetrics.on_drain = read_while_core_drain_is_active
+
+    dispatcher.drain_to_consumers()
+    reader_thread.join(timeout=1)
+
+    assert reader_snapshot == (
+        {"load_bytes_total": 100.0},
+        {},
+        {"load_duration": ([1, 0, 0], 50.0)},
+    )
+    assert dispatcher.get_stats_and_clear("vllm_connector") == ({}, {}, {})
+
+
+def test_dispatcher_disabled_consumer_does_not_store_snapshot():
+    _reset_fakes()
+    config = _metrics_config({"multiproc": False, "vllm_connector": True})
+    fake_ucmmetrics.snapshot = (
+        {"load_bytes_total": 10.0},
+        {},
+        {"load_duration": ([1, 0, 0], 50.0)},
+    )
+    dispatcher = get_metrics_dispatcher(config)
+
+    dispatcher.drain_to_consumers()
+
+    assert dispatcher.get_stats_and_clear("multiproc") == ({}, {}, {})
+    assert dispatcher.get_stats_and_clear("vllm_connector")[0] == {
+        "load_bytes_total": 10.0
+    }
+
+
+def test_dispatcher_rejects_invalid_consumer_name():
+    _reset_fakes()
+    dispatcher = get_metrics_dispatcher(_metrics_config())
+
+    with pytest.raises(ValueError):
+        dispatcher.get_stats_and_clear("legacy")
+
+
+def test_stats_from_ucm_snapshot_preserves_metric_types_and_worker_rank():
+    definitions = get_vllm_connector_metric_definitions(_metrics_config())
+    stats = UCMConnectorStats.from_ucm_snapshot(
+        counter_stats={"load_bytes_total": 2048.0, "not_configured": 99.0},
+        gauge_stats={"cache_lookup_hit_rate": 0.75},
+        histogram_stats={
+            "load_duration": ([1, 2, 0], 150.0),
+            "interval_lookup_hit_rates": ([1, 0, 0, 0], 0.1),
+        },
+        worker_rank=7,
+        metric_definitions=definitions,
+    )
+
+    assert stats.worker_rank == "7"
+    assert stats.data["counters_by_rank"]["7"]["load_bytes_total"] == 2048.0
+    assert stats.data["gauges_by_rank"]["7"]["cache_lookup_hit_rate"] == 0.75
+    assert stats.data["histograms_by_rank"]["7"]["load_duration"] == {
+        "bucket_counts": [1, 2, 0],
+        "sum": 150.0,
+    }
+    assert "not_configured" not in stats.data["counters_by_rank"]["7"]
+    assert "interval_lookup_hit_rates" not in stats.data["histograms_by_rank"]["7"]
+
+
+def test_stats_record_aggregate_clone_and_reset_preserve_worker_rank():
+    rank0 = UCMConnectorStats(worker_rank=0)
+    rank1 = UCMConnectorStats(worker_rank=1)
+
+    rank0.record({"load_duration": 50.0}, {"load_duration": "histogram"})
+    rank1.aggregate(
+        UCMConnectorStats(
+            data={
+                "counters_by_rank": {"1": {"load_bytes_total": 10.0}},
+                "gauges_by_rank": {"1": {"cache_lookup_hit_rate": 0.5}},
+                "histograms_by_rank": {
+                    "1": {"load_duration": {"bucket_counts": [0, 1], "sum": 75.0}}
+                },
+            }
+        )
+    )
+
+    rank0.aggregate(rank1)
+    snapshot = rank0.clone_and_reset()
+
+    assert rank0.is_empty()
+    assert snapshot.data["histograms_by_rank"]["0"]["load_duration"] == [50.0]
+    assert snapshot.data["counters_by_rank"]["1"]["load_bytes_total"] == 10.0
+    assert snapshot.data["gauges_by_rank"]["1"]["cache_lookup_hit_rate"] == 0.5
+    assert snapshot.data["histograms_by_rank"]["1"]["load_duration"] == {
+        "bucket_counts": [0, 1],
+        "sum": 75.0,
+    }
+    assert snapshot.worker_rank == "0"
+
+
+def test_stats_reduce_skips_ucm_cli_summary():
+    stats = UCMConnectorStats(
+        data={
+            "counters_by_rank": {"0": {"load_bytes_total": 10.0}},
+            "gauges_by_rank": {"0": {"cache_lookup_hit_rate": 0.5}},
+            "histograms_by_rank": {
+                "0": {
+                    "load_duration": {"bucket_counts": [1, 0], "sum": 50.0},
+                    "cache_load_duration_ms": {
+                        "bucket_counts": [0, 1],
+                        "sum": 75.0,
+                    },
+                }
+            },
+        }
+    )
+
+    assert stats.reduce() == {}
+
+
+def test_prom_metrics_register_vllm_connector_prefixed_metrics():
+    _reset_fakes()
+    import ucm.integration.vllm.metrics as metrics_module
+
+    capture_logger = CaptureLogger()
+    metrics_module.logger = capture_logger
+    prom = UCMPromMetrics(
+        _vllm_config(_metrics_config()),
+        _metric_types(),
+        ["model_name", "engine"],
+        {0: ["model-a", "0"]},
+    )
+
+    prom.observe(
+        {
+            "counters_by_rank": {"7": {"load_bytes_total": 2048.0}},
+            "gauges_by_rank": {"7": {"cache_lookup_hit_rate": 0.75}},
+            "histograms_by_rank": {
+                "7": {"load_duration": {"bucket_counts": [1, 2, 0], "sum": 150.0}}
+            },
+        },
+        engine_idx=0,
+    )
+
+    assert all(name.startswith("ucm:") for name in FakeCounter.created)
+    assert all(name.startswith("ucm:") for name in FakeGauge.created)
+    assert all(name.startswith("ucm:") for name in FakeHistogram.created)
+    assert capture_logger.infos == [
+        "UCM metrics vllm_connector path enabled: "
+        "total=4, counters=1, gauges=1, histograms=2, "
+        "labels=['model_name', 'engine', 'worker_rank']"
+    ]
+
+    counter = FakeCounter.created["ucm:load_bytes_total"]
+    gauge = FakeGauge.created["ucm:cache_lookup_hit_rate"]
+    histogram = FakeHistogram.created["ucm:load_duration"]
+
+    assert counter.labelnames == ["model_name", "engine", "worker_rank"]
+    assert counter.children[("model-a", "0", "7")].increments == [2048.0]
+    assert gauge.children[("model-a", "0", "7")].set_values == [0.75]
+    histogram_child = histogram.children[("model-a", "0", "7")]
+    assert [bucket.value for bucket in histogram_child._buckets] == [1, 2, 0]
+    assert histogram_child._sum.value == 150.0
+
+
+def test_prom_metrics_keeps_ucm_duration_observations_in_ms():
+    _reset_fakes()
+    prom = UCMPromMetrics(
+        _vllm_config(_metrics_config()),
+        _metric_types(),
+        ["model_name", "engine"],
+        {0: ["model-a", "0"]},
+    )
+
+    prom.observe({"histograms_by_rank": {"3": {"load_duration": [50.0]}}})
+
+    histogram = FakeHistogram.created["ucm:load_duration"]
+    assert histogram.children[("model-a", "0", "3")].observations == [50.0]
+
+
+def test_ucm_connector_builds_stats_and_respects_vllm_connector_switch():
+    data = {"histograms_by_rank": {"3": {"load_duration": [2.0]}}}
+
+    stats = UCMConnector.build_kv_connector_stats(data)
+    prom = UCMConnector.build_prom_metrics(
+        _vllm_config(_metrics_config({"multiproc": True, "vllm_connector": True})),
+        _metric_types(),
+        ["model_name", "engine"],
+        {0: ["model-a", "0"]},
+    )
+    disabled = UCMConnector.build_prom_metrics(
+        _vllm_config(_metrics_config({"multiproc": True, "vllm_connector": False})),
+        _metric_types(),
+        ["model_name", "engine"],
+        {0: ["model-a", "0"]},
+    )
+    missing_config = UCMConnector.build_prom_metrics(
+        _vllm_config(),
+        _metric_types(),
+        ["model_name", "engine"],
+        {0: ["model-a", "0"]},
+    )
+
+    assert isinstance(stats, UCMConnectorStats)
+    assert stats.data == data
+    assert isinstance(prom, UCMPromMetrics)
+    assert disabled is None
+    assert missing_config is None
+
+
+def test_direct_connector_drains_dispatcher_vllm_connector_snapshot():
+    _reset_fakes()
+    config = _metrics_config()
+    dispatcher = get_metrics_dispatcher(config)
+    fake_ucmmetrics.snapshot = (
+        {"load_bytes_total": 4096.0},
+        {"cache_lookup_hit_rate": 0.5},
+        {"load_duration": ([0, 1, 0], 75.0)},
+    )
+    connector = object.__new__(UCMDirectConnector)
+    connector._vllm_metrics_enabled = True
+    connector._vllm_metric_definitions = get_vllm_connector_metric_definitions(config)
+    connector._metrics_dispatcher = dispatcher
+    connector._worker_rank = 5
+
+    stats = connector.get_kv_connector_stats()
+
+    assert fake_ucmmetrics.drained == ["all"]
+    assert stats.data["counters_by_rank"]["5"]["load_bytes_total"] == 4096.0
+    assert stats.data["gauges_by_rank"]["5"]["cache_lookup_hit_rate"] == 0.5
+    assert stats.data["histograms_by_rank"]["5"]["load_duration"] == {
+        "bucket_counts": [0, 1, 0],
+        "sum": 75.0,
+    }
+    assert connector.get_kv_connector_stats() is None
+
+
+def test_direct_connector_get_finished_records_async_durations():
+    _reset_fakes()
+    import ucm.integration.vllm.ucm_connector as ucm_connector_module
+
+    class Store:
+        def __init__(self):
+            self.waited = []
+
+        def wait(self, task):
+            self.waited.append(task)
+
+    class Device:
+        def __init__(self):
+            self.destroyed = []
+
+        def destroy_event_handle(self, event_handle):
+            self.destroyed.append(event_handle)
+
+    connector = object.__new__(UCMDirectConnector)
+    connector.store = Store()
+    connector.enable_event_sync = True
+    connector.device = Device()
+    task = object()
+    pending = PendingDumpTask(
+        task=task,
+        request_ids={"req-1"},
+        event_handle=7,
+        wait_for_save_start_ms=900.0,
+    )
+    connector._pending_dump_tasks = [pending]
+    connector._async_dump_req_ids = {"req-1"}
+
+    times = iter([1.0, 1.025])
+    original_perf_counter = ucm_connector_module.time.perf_counter
+    ucm_connector_module.time.perf_counter = lambda: next(times)
+    try:
+        finished, skipped = connector.get_finished({"req-1"})
+    finally:
+        ucm_connector_module.time.perf_counter = original_perf_counter
+
+    assert finished == {"req-1"}
+    assert skipped is None
+    assert connector._pending_dump_tasks == []
+    assert connector._async_dump_req_ids == set()
+    assert connector.store.waited == [task]
+    assert connector.device.destroyed == [7]
+    assert pending.event_handle == 0
+    assert fake_ucmmetrics.updated == [
+        {
+            "save_duration": 125.0,
+            "save_completion_wait_duration": 25.0,
+        }
+    ]
+
+
+def test_direct_connector_poll_records_zero_completion_wait_duration():
+    _reset_fakes()
+    import ucm.integration.vllm.ucm_connector as ucm_connector_module
+
+    class Store:
+        def __init__(self):
+            self.checked = []
+            self.waited = []
+
+        def check(self, task):
+            self.checked.append(task)
+            return True
+
+        def wait(self, task):
+            self.waited.append(task)
+
+    connector = object.__new__(UCMDirectConnector)
+    connector.store = Store()
+    connector.enable_event_sync = False
+    connector.device = None
+    task = object()
+    connector._pending_dump_tasks = [
+        PendingDumpTask(
+            task=task,
+            request_ids={"req-1"},
+            wait_for_save_start_ms=900.0,
+        )
+    ]
+
+    times = iter([1.0, 1.0])
+    original_perf_counter = ucm_connector_module.time.perf_counter
+    ucm_connector_module.time.perf_counter = lambda: next(times)
+    try:
+        connector._poll_pending_dump_tasks()
+    finally:
+        ucm_connector_module.time.perf_counter = original_perf_counter
+
+    assert connector._pending_dump_tasks == []
+    assert connector.store.checked == [task]
+    assert connector.store.waited == [task]
+    assert fake_ucmmetrics.updated == [
+        {
+            "save_duration": 100.0,
+            "save_completion_wait_duration": 0.0,
+        }
+    ]
+
+
+def test_multiproc_logger_uses_prefix_and_dispatcher_snapshot(tmp_path):
+    _reset_fakes()
+    import ucm.observability as observability
+
+    config = _metrics_config()
+    config["multiproc_dir"] = str(tmp_path)
+    capture_logger = CaptureLogger()
+    observability._metric_mappings.clear()
+    observability.load_metrics_config = lambda _: config
+    observability.logger = capture_logger
+    observability.threading.Thread = FakeThread
+    original_sleep = observability.time.sleep
+    logger = observability.PrometheusStatsLogger("model-a", "worker-0", "unused.yaml")
+    fake_ucmmetrics.snapshot = (
+        {"load_bytes_total": 1024.0},
+        {"cache_lookup_hit_rate": 0.25},
+        {"load_duration": ([1, 0, 0], 50.0)},
+    )
+
+    observability.time.sleep = lambda _: setattr(logger, "is_running", False)
+    try:
+        logger.update_stats_loop()
+    finally:
+        observability.time.sleep = original_sleep
+
+    counter = FakeCounter.created["ucm_multiproc:load_bytes_total"]
+    gauge = FakeGauge.created["ucm_multiproc:cache_lookup_hit_rate"]
+    histogram = FakeHistogram.created["ucm_multiproc:load_duration"]
+
+    assert logger.thread.started
+    assert any(
+        "UCM metrics multiproc path enabled: total=5, counters=1, gauges=1, "
+        "histograms=3, prefix=ucm_multiproc:, labels=['model_name', 'worker_id']"
+        in message
+        for message in capture_logger.infos
+    )
+    assert counter.children[("model-a", "worker-0")].increments == [1024.0]
+    assert gauge.children[("model-a", "worker-0")].set_values == [0.25]
+    histogram_child = histogram.children[("model-a", "worker-0")]
+    assert [bucket.value for bucket in histogram_child._buckets] == [1, 0, 0]
+    assert histogram_child._sum.value == 50.0
+
+
+def test_multiproc_logger_respects_consumer_switch():
+    _reset_fakes()
+    import ucm.observability as observability
+
+    observability._metric_mappings.clear()
+    observability.load_metrics_config = lambda _: _metrics_config(
+        {"multiproc": False, "vllm_connector": True}
+    )
+    logger = observability.PrometheusStatsLogger("model-a", "worker-0", "unused.yaml")
+
+    assert not hasattr(logger, "thread")
+    assert FakeCounter.created == {}
+
+
+def test_ucm_connector_get_kv_connector_stats_forwards_to_inner_connector():
+    expected = UCMConnectorStats(worker_rank=2)
+    inner = SimpleNamespace(get_kv_connector_stats=lambda: expected)
connector = object.__new__(UCMConnector) + connector.connector = inner + + assert connector.get_kv_connector_stats() is expected + + +def test_example_metrics_config_defaults_to_vllm_connector_metrics(): + text = (REPO_ROOT / "examples" / "metrics" / "metrics_configs.yaml").read_text( + encoding="utf-8" + ) + assert "connector_task_" not in text + assert 'name: "save_completion_wait_duration"' in text + assert 'name: "cache_d2h_callback_wait_ms"' not in text + assert 'name: "save_speed"' not in text + assert '# multiproc_dir: "/vllm-workspace"' in text + assert '# multiproc_prefix: "ucm_multiproc:"' in text + assert 'vllm_connector_prefix: "ucm:"' in text + assert "# multiproc: true" in text + assert re.search(r"^\s+vllm_connector:\s+true$", text, re.MULTILINE) + assert not re.search(r"^\s+multiproc:\s+true$", text, re.MULTILINE) + + +def test_connector_dashboard_direct_connector_layout_and_metrics(): + dashboard_path = REPO_ROOT / "examples" / "metrics" / "grafana_connector.json" + text = dashboard_path.read_text(encoding="utf-8") + dashboard = json.loads(text) + panels = dashboard["panels"] + titles = [panel.get("title", "") for panel in panels] + + assert "_seconds" not in text + assert "1000 *" not in text + assert "save_speed" not in text + assert panels[0]["title"] == "Connector Prefix Cache Hit Rate" + assert panels[0]["gridPos"] == {"h": 8, "w": 24, "x": 0, "y": 0} + assert "Direct Connector" in titles + assert not any("Requests Rate" in title for title in titles) + assert not any("Size Distribution" in title for title in titles) + + expected_direct_titles = [ + "Direct Connector", + "Connector Load Bandwidth (aggregated)", + "Connector Dump Bandwidth (aggregated)", + "Connector Load Duration", + "Connector Dump Duration", + "Connector Load Speed (per task)", + "Connector Dump Completion Wait Duration", + ] + direct_start = titles.index("Direct Connector") + assert titles[direct_start:] == expected_direct_titles + + by_title = {panel["title"]: panel for 
panel in panels}
+    assert by_title["Direct Connector"]["type"] == "row"
+    assert by_title["Direct Connector"]["gridPos"] == {
+        "h": 1,
+        "w": 24,
+        "x": 0,
+        "y": 8,
+    }
+
+    occupied = set()
+    for panel in panels:
+        grid = panel["gridPos"]
+        for x in range(grid["x"], grid["x"] + grid["w"]):
+            for y in range(grid["y"], grid["y"] + grid["h"]):
+                cell = (x, y)
+                assert cell not in occupied
+                occupied.add(cell)
+
+
+def test_ucm_dashboards_reference_configured_vllm_connector_metrics():
+    metrics_text = (
+        REPO_ROOT / "examples" / "metrics" / "metrics_configs.yaml"
+    ).read_text(encoding="utf-8")
+    configured_names = set(
+        re.findall(r'^\s*-\s+name:\s+"([^"]+)"', metrics_text, re.MULTILINE)
+    )
+    expected_vllm_names = set()
+    for name in configured_names:
+        if name == "interval_lookup_hit_rates":
+            continue
+        expected_vllm_names.add(f"ucm:{name}")
+
+    for filename in GRAFANA_UCM_DASHBOARDS:
+        dashboard_text = (REPO_ROOT / "examples" / "metrics" / filename).read_text(
+            encoding="utf-8"
+        )
+        assert "vllm:ucm_" not in dashboard_text
+        referenced = set(re.findall(r"ucm:[A-Za-z0-9_]+", dashboard_text))
+        referenced = {
+            re.sub(r"_(bucket|sum|count)$", "", metric) for metric in referenced
+        }
+
+        assert referenced <= expected_vllm_names
+        assert not any(
+            metric.startswith("ucm:connector_task_") for metric in referenced
+        )
+
+
+def test_vllm_dashboard_uses_combined_prefix_cache_hit_rate_breakdown():
+    dashboard = json.loads(
+        (REPO_ROOT / "examples" / "metrics" / "grafana_vllm.json").read_text(
+            encoding="utf-8"
+        )
+    )
+    panels = {panel["title"]: panel for panel in dashboard["panels"]}
+
+    assert "GPU Prefix Cache Hit Rate" not in panels
+    assert "External Connector Hit Rate" not in panels
+
+    panel = panels["KV Cache Hit Rate Breakdown"]
+    assert panel["fieldConfig"]["defaults"]["unit"] == "percentunit"
+    assert panel["fieldConfig"]["defaults"]["min"] == 0
+    assert panel["fieldConfig"]["defaults"]["max"] == 1
+    assert panel["gridPos"] == {"h": 8, "w": 24, "x": 0, "y": 24}
+    assert len(panel["targets"]) == 2
+
+    expected = [
+        (
+            "GPU Prefix Cache",
+            "vllm:prefix_cache_hits_total",
+            "vllm:prefix_cache_queries_total",
+        ),
+        (
+            "Connector Prefix Cache",
+            "vllm:external_prefix_cache_hits_total",
+            "vllm:external_prefix_cache_queries_total",
+        ),
+    ]
+    for target, (legend, hits, queries) in zip(panel["targets"], expected):
+        assert target["legendFormat"] == legend
+        expr = target["expr"]
+        assert hits in expr
+        assert queries in expr
+        assert 'model_name="$model_name"' in expr
+        assert 'job=~"$job"' in expr
+        assert 'instance="$instance"' in expr
+        assert "clamp_min" in expr
+
+
+def test_grafana_dashboards_use_isolated_vllm_ucm_identity():
+    expected = {
+        "grafana_connector.json": (
+            "vLLM - UCM Connector (vLLM Metrics)",
+            "ucm-vllm-connector-overview",
+        ),
+        "grafana_layerwise.json": (
+            "vLLM - UCM Layerwise (vLLM Metrics)",
+            "ucm-vllm-layerwise",
+        ),
+        "grafana_pipeline_store.json": (
+            "vLLM - UCM Cache / Posix Store (vLLM Metrics)",
+            "ucm-vllm-pipeline-store",
+        ),
+        "grafana_vllm.json": (
+            "vLLM (UCM Metrics)",
+            "ucm-vllm-overview",
+        ),
+    }
+    old_uids = {
+        "ucm-connector-overview",
+        "ucm-layerwise",
+        "ucm-pipeline-store",
+        "vllm-overview",
+    }
+
+    for filename, (title, uid) in expected.items():
+        dashboard = json.loads(
+            (REPO_ROOT / "examples" / "metrics" / filename).read_text(encoding="utf-8")
+        )
+
+        assert dashboard["title"] == title
+        assert dashboard["uid"] == uid
+        assert dashboard["id"] is None
+        assert dashboard["uid"] not in old_uids
+        assert GRAFANA_VLLM_UCM_TAG in dashboard["tags"]
+        assert "ucm" not in dashboard["tags"]
+        assert "(tag: ucm)" not in dashboard.get("description", "")
+        if "other UCM dashboards" in dashboard.get("description", ""):
+            assert f"(tag: {GRAFANA_VLLM_UCM_TAG})" in dashboard["description"]
+        for link in dashboard.get("links", []):
+            if link.get("type") == "dashboards":
+                assert link["title"] == "Other UCM vLLM metrics dashboards"
+                assert link["tags"] ==
[GRAFANA_VLLM_UCM_TAG]
+
+
+def test_ucm_dashboards_use_engine_and_worker_rank_filters():
+    for filename in GRAFANA_UCM_DASHBOARDS:
+        dashboard_text = (REPO_ROOT / "examples" / "metrics" / filename).read_text(
+            encoding="utf-8"
+        )
+        dashboard = json.loads(dashboard_text)
+        variables = {item["name"]: item for item in dashboard["templating"]["list"]}
+        variable_names = [item["name"] for item in dashboard["templating"]["list"]]
+
+        assert variable_names.index("perWorker") < variable_names.index("engine")
+        assert variables["perWorker"]["current"]["text"] == "Aggregated"
+        assert variables["perWorker"]["current"]["value"] == "model_name"
+        assert variables["perWorker"]["options"] == [
+            {
+                "selected": True,
+                "text": "Aggregated",
+                "value": "model_name",
+            },
+            {
+                "selected": False,
+                "text": "Per Worker",
+                "value": "model_name, engine, worker_rank",
+            },
+        ]
+        assert "${perWorker:raw}" in dashboard_text
+        assert variables["engine"]["includeAll"] is True
+        assert variables["engine"]["allValue"] == ".*"
+        assert "label_values" in variables["engine"]["definition"]
+        assert "engine" in variables["engine"]["definition"]
+        assert variables["worker_rank"]["includeAll"] is True
+        assert variables["worker_rank"]["allValue"] == ".*"
+        assert 'engine=~"$engine"' in variables["worker_rank"]["definition"]
+
+        exprs = [
+            target["expr"]
+            for panel in dashboard["panels"]
+            for target in panel.get("targets", [])
+            if "expr" in target
+        ]
+        legends = [
+            target["legendFormat"]
+            for panel in dashboard["panels"]
+            for target in panel.get("targets", [])
+            if "legendFormat" in target and "ucm:" in target.get("expr", "")
+        ]
+        assert legends
+        for legend in legends:
+            if legend == "{{le}}":
+                continue
+            assert "engine={{engine}}" not in legend
+            assert "worker={{worker_rank}}" not in legend
+            assert "Aggregated" not in legend
+            assert "${perWorker:text}" not in legend
+            assert "{{engine}}" in legend
+            assert "{{worker_rank}}" in legend
+
+        for expr in exprs:
+            if "ucm:" not in
expr:
+                continue
+            assert 'engine=~"$engine"' in expr
+            assert 'worker_rank=~"$worker_rank"' in expr
+            if "sum by (" in expr:
+                assert (
+                    "sum by (${perWorker:raw})" in expr
+                    or "sum by (le, ${perWorker:raw})" in expr
+                    or "sum by (le)" in expr
+                )
+
+        if filename == "grafana_connector.json":
+            external_prefix_exprs = [
+                expr for expr in exprs if "vllm:external_prefix_cache_" in expr
+            ]
+            assert external_prefix_exprs
+            for expr in external_prefix_exprs:
+                assert 'engine=~"$engine"' in expr
+                assert 'worker_rank=~"$worker_rank"' not in expr
+                assert "sum by (model_name)" in expr
+
+
+def test_layerwise_dashboard_hides_no_transfer_and_uses_rate_interval_for_breakdown():
+    dashboard = json.loads(
+        (REPO_ROOT / "examples" / "metrics" / "grafana_layerwise.json").read_text(
+            encoding="utf-8"
+        )
+    )
+    panels = {panel["title"]: panel for panel in dashboard["panels"]}
+    all_panels = []
+    for panel in dashboard["panels"]:
+        all_panels.append(panel)
+        all_panels.extend(panel.get("panels", []))
+
+    assert "Layerwise / Batch Total - No Transfer" not in panels
+    assert all("No Transfer" not in panel["title"] for panel in all_panels)
+
+    batch_mix = panels["Layerwise / Batch Mix Rate"]
+    assert all(
+        "layerwise_batch_total_no_transfer_ms" not in target["expr"]
+        for target in batch_mix["targets"]
+    )
+    assert all(
+        "no transfer" not in target.get("legendFormat", "")
+        for target in batch_mix["targets"]
+    )
+
+    load_only = panels["Layerwise / Load Only Batch Avg Breakdown"]
+    save_only = panels["Layerwise / Save Only Batch Avg Breakdown"]
+    load_save = panels["Layerwise / Load + Save Batch Avg Breakdown"]
+    for panel in (load_only, save_only, load_save):
+        for target in panel["targets"]:
+            assert "[40s]" not in target["expr"]
+            assert "[$__rate_interval]" in target["expr"]
+
+
+def test_layerwise_dashboard_uses_batch_duration_and_dump_completion_wait():
+    dashboard = json.loads(
+        (REPO_ROOT / "examples" / "metrics" / "grafana_layerwise.json").read_text( +
encoding="utf-8"
+        )
+    )
+    all_panels = []
+    for panel in dashboard["panels"]:
+        all_panels.append(panel)
+        all_panels.extend(panel.get("panels", []))
+    panels = {panel["title"]: panel for panel in all_panels}
+
+    assert "Layerwise / Batch Total Duration (all batches)" not in panels
+    assert "Layerwise / wait_for_save Total (save batches only)" not in panels
+    assert all("Batch Total" not in panel["title"] for panel in all_panels)
+
+    assert "Layerwise / Batch Duration - Load Only" in panels
+    assert "Layerwise / Batch Duration - Save Only" in panels
+    assert "Layerwise / Batch Duration - Load + Save" in panels
+
+    panel = panels["Layerwise / Dump Completion Wait Duration"]
+    assert panel["gridPos"] == {"h": 8, "w": 12, "x": 0, "y": 64}
+    assert all(
+        "ucm:save_completion_wait_duration" in target["expr"]
+        for target in panel["targets"]
+    )
+    assert all(
+        "ucm:layerwise_save_tail_total_ms" not in target["expr"]
+        for target in panel["targets"]
+    )
+
+
+def test_layerwise_wait_for_save_records_save_tail_and_completion_start():
+    source = (
+        REPO_ROOT / "ucm" / "integration" / "vllm" / "ucm_connector.py"
+    ).read_text(encoding="utf-8")
+
+    assert "save_tail_start = time.perf_counter()" in source
+    assert "wait_for_save_start_ms = save_tail_start * 1000" in source
+    assert "pending_dump_task.wait_for_save_start_ms = wait_for_save_start_ms" in source
+    assert "save_tail_ms = (total_end - save_tail_start) * 1000" in source
+    assert "self._layerwise_batch_stats(total_end, save_tail_ms)" in source
+    assert 'stats["layerwise_save_tail_total_ms"] = save_tail_ms' in source
+
+
+def test_cache_load_h2d_duration_uses_first_backend_ready_time():
+    header = (REPO_ROOT / "ucm" / "store" / "cache" / "cc" / "load_queue.h").read_text(
+        encoding="utf-8"
+    )
+    source = (REPO_ROOT / "ucm" / "store" / "cache" / "cc" / "load_queue.cc").read_text(
+        encoding="utf-8"
+    )
+
+    assert "double h2dBatchStartTp_{0.0};" in header
+    assert "firstH2dReadyTp" not in header
+    assert "firstH2dReadyTp" not in source
+    assert "double h2dBatchStartTp = 0.0;" not in source
+    assert "double& h2dBatchStartTp" not in header
+    assert "double& h2dBatchStartTp" not in source
+    assert "if (holder_.empty()) { h2dBatchStartTp_ = tpBackendReady; }" in source
+    assert "auto h2dSyncMs = (NowTime::Now() - h2dBatchStartTp_) * 1e3;" in source
+    assert "h2dBatchStartTp_ = 0.0;" in source
+    assert "auto h2dSyncMs = (NowTime::Now() - tpH2dSubmitted) * 1e3;" not in source
+
+
+def test_cache_dump_d2h_metrics_require_event_ready_timestamp():
+    source = (REPO_ROOT / "ucm" / "store" / "cache" / "cc" / "dump_queue.cc").read_text(
+        encoding="utf-8"
+    )
+
+    stream_header = (REPO_ROOT / "ucm" / "shared" / "trans" / "stream.h").read_text(
+        encoding="utf-8"
+    )
+    cuda_header = (
+        REPO_ROOT / "ucm" / "shared" / "trans" / "cuda" / "cuda_stream.h"
+    ).read_text(encoding="utf-8")
+    cuda_source = (
+        REPO_ROOT / "ucm" / "shared" / "trans" / "cuda" / "cuda_stream.cc"
+    ).read_text(encoding="utf-8")
+    ascend_header = (
+        REPO_ROOT / "ucm" / "shared" / "trans" / "ascend" / "ascend_stream.h"
+    ).read_text(encoding="utf-8")
+    ascend_source = (
+        REPO_ROOT / "ucm" / "shared" / "trans" / "ascend" / "ascend_stream.cc"
+    ).read_text(encoding="utf-8")
+
+    assert "struct StreamEventTimer" not in stream_header
+    assert "RecordEventTimerStart" not in stream_header
+    assert "RecordEventTimerEnd" not in stream_header
+    assert "EventElapsedTimeMs" not in stream_header
+    assert "DestroyEventTimer" not in stream_header
+
+    assert "RecordEventTimerStart" not in cuda_header
+    assert "RecordEventTimerEnd" not in cuda_header
+    assert "EventElapsedTimeMs" not in cuda_header
+    assert "DestroyEventTimer" not in cuda_header
+    assert "CudaEventTimer" not in cuda_source
+    assert "cudaEventCreate(&eventTimer->start)" not in cuda_source
+    assert "cudaEventRecord(eventTimer->start, stream_)" not in cuda_source
+    assert "cudaEventRecord(eventTimer->end, stream_)" not in cuda_source
+    assert "cudaEventSynchronize(eventTimer->end)" not in cuda_source
+    assert (
+        "cudaEventElapsedTime(&elapsedMs, eventTimer->start, eventTimer->end)"
+        not in cuda_source
+    )
+
+    assert "RecordEventTimerStart" not in ascend_header
+    assert "RecordEventTimerEnd" not in ascend_header
+    assert "EventElapsedTimeMs" not in ascend_header
+    assert "DestroyEventTimer" not in ascend_header
+    assert "AscendEventTimer" not in ascend_source
+    assert "aclrtCreateEvent(&eventTimer->start)" not in ascend_source
+    assert "aclrtRecordEvent(eventTimer->start, stream_)" not in ascend_source
+    assert "aclrtRecordEvent(eventTimer->end, stream_)" not in ascend_source
+    assert "aclrtSynchronizeEvent(eventTimer->end)" not in ascend_source
+    assert (
+        "aclrtEventElapsedTime(&elapsedMs, eventTimer->start, eventTimer->end)"
+        not in ascend_source
+    )
+
+    assert "eventReadyTp->store(NowTime::Now(), std::memory_order_release);" in source
+    assert "auto ready = eventReadyTp->load(std::memory_order_acquire);" in source
+    assert "auto tpSyncStart = NowTime::Now();" in source
+    assert "auto tpSyncStream = NowTime::Now();" in source
+    assert "auto d2hMs = std::max(0.0, tpSyncStream - tpSyncStart) * 1e3;" in source
+    assert "auto tpBackendSubmitStart = NowTime::Now();" in source
+    assert "(tpEnd - tpBackendSubmitStart) * 1e3" in source
+    assert 'NAME_TO_METRIC_ID("cache_d2h_callback_wait_ms")' not in source
+    assert "auto dumpStartTp = NowTime::Now();" in source
+    assert "std::shared_ptr> d2hEndTp" not in source
+    assert "auto endCbStatus = stream.AppendCallback" not in source
+    assert "d2hEndTp" not in source
+    assert "auto tp = NowTime::Now();" not in source
+    assert "auto d2hStartTp = tpMakeBuffer;" not in source
+    assert "d2hStartTp = std::max(d2hStartTp, ready);" not in source
+    assert "tpSyncStream - ready" not in source
+    assert "end - ready" not in source
+    assert "Trans::StreamEventTimer d2hEventTimer" not in source
+    assert "RecordEventTimerStart" not in source
+    assert "RecordEventTimerEnd" not
in source
+    assert "EventElapsedTimeMs" not in source
+    assert "DestroyEventTimer" not in source
+    assert "auto copyStream = stream.NextStream();" not in source
+    assert "DeviceToHostGatherAsync(stream.NextStream()" in source
+    assert "if (eventReadyTp && copiedShards > 0)" not in source
+
+    d2h_duration_pos = source.index('NAME_TO_METRIC_ID("cache_d2h_duration_ms")')
+    d2h_bandwidth_pos = source.index('NAME_TO_METRIC_ID("cache_d2h_bandwidth_gbps")')
+    sync_start_pos = source.index("auto tpSyncStart = NowTime::Now()")
+    sync_pos = source.index("auto s = stream.Synchronize()")
+    sync_end_pos = source.index("auto tpSyncStream = NowTime::Now()")
+    d2h_ms_pos = source.index(
+        "auto d2hMs = std::max(0.0, tpSyncStream - tpSyncStart) * 1e3;"
+    )
+    backend_submit_start_pos = source.index(
+        "auto tpBackendSubmitStart = NowTime::Now()"
+    )
+    prereq_block_pos = source.index("if (eventReadyTp)")
+    ready_load_pos = source.index("auto ready = eventReadyTp->load", prereq_block_pos)
+    backend_submit_pos = source.index(
+        'NAME_TO_METRIC_ID("cache_dump_backend_submit_duration_ms")'
+    )
+
+    assert sync_start_pos < sync_pos < sync_end_pos < backend_submit_start_pos
+    assert sync_end_pos < d2h_ms_pos < d2h_duration_pos < backend_submit_pos
+    assert prereq_block_pos < ready_load_pos < backend_submit_pos
+    assert d2h_ms_pos < d2h_bandwidth_pos < backend_submit_pos
+
+
+def test_pipeline_dashboard_orders_cache_bandwidth_rows():
+    dashboard = json.loads(
+        (REPO_ROOT / "examples" / "metrics" / "grafana_pipeline_store.json").read_text(
+            encoding="utf-8"
+        )
+    )
+    panels = {panel["title"]: panel for panel in dashboard["panels"]}
+
+    assert panels["Cache Load Bandwidth (aggregated)"]["gridPos"]["y"] == 33
+    assert panels["Cache Dump Bandwidth (aggregated)"]["gridPos"]["y"] == 33
+    assert panels["Cache Load Bandwidth (per task)"]["gridPos"]["y"] == 41
+    assert panels["Cache Dump Bandwidth (per task)"]["gridPos"]["y"] == 41
+    assert panels["Cache Load H2D Bandwidth (per task)"]["gridPos"]["y"] == 49
+    assert panels["Cache Dump D2H Bandwidth (per task)"]["gridPos"]["y"] == 49
+    assert panels["Cache Load Backend Wait Duration"]["gridPos"] == {
+        "h": 8,
+        "w": 12,
+        "x": 0,
+        "y": 65,
+    }
+    assert panels["Cache Dump Wait Compute Duration"]["gridPos"] == {
+        "h": 8,
+        "w": 12,
+        "x": 12,
+        "y": 65,
+    }
+    assert panels["Cache Load H2D Duration"]["gridPos"]["y"] == 73
+    assert "Cache Dump D2H Duration (include wait compute)" in panels
+    assert "Cache Dump D2H Duration" not in panels
diff --git a/ucm/integration/vllm/metrics.py b/ucm/integration/vllm/metrics.py
new file mode 100644
index 000000000..50e1101f9
--- /dev/null
+++ b/ucm/integration/vllm/metrics.py
@@ -0,0 +1,414 @@
+import copy
+import math
+from dataclasses import dataclass, field
+from typing import Any
+
+from vllm.config import VllmConfig
+
+from ucm.logger import init_logger
+from ucm.metrics_config import (
+    MetricDefinition,
+    get_vllm_connector_metric_definitions,
+    load_launch_metrics_config,
+)
+
+try:
+    from vllm.distributed.kv_transfer.kv_connector.v1.metrics import (
+        KVConnectorPromMetrics,
+        KVConnectorStats,
+        PromMetric,
+        PromMetricT,
+    )
+
+    UCM_HAS_PROM_METRICS = True
+except ImportError:
+    try:
+        from vllm.distributed.kv_transfer.kv_connector.v1.metrics import (
+            KVConnectorStats,
+        )
+    except ImportError:
+
+        @dataclass
+        class KVConnectorStats:
+            data: dict[str, Any] = field(default_factory=dict)
+
+            def is_empty(self) -> bool:
+                return not self.data
+
+    KVConnectorPromMetrics = object
+    PromMetric = Any
+    PromMetricT = Any
+    UCM_HAS_PROM_METRICS = False
+
+logger = init_logger(__name__)
+
+
+@dataclass
+class UCMConnectorStats(KVConnectorStats):
+    worker_rank: int | str | None = None
+
+    def __post_init__(self):
+        if self.worker_rank is not None:
+            self.worker_rank = str(self.worker_rank)
+        if not self.data:
+            self.reset()
+
+    @classmethod
+    def from_ucm_snapshot(
+        cls,
+        counter_stats: dict[str, int | float],
+        gauge_stats: dict[str, int |
float],
+        histogram_stats: dict[str, Any],
+        worker_rank: int | str | None,
+        metric_definitions: list[MetricDefinition],
+    ) -> "UCMConnectorStats":
+        stats = cls(worker_rank=worker_rank)
+        rank = stats._rank_key()
+        definitions_by_name = {
+            definition.name: definition
+            for definition in metric_definitions
+            if definition.vllm_connector_enabled
+        }
+
+        for metric_name, value in counter_stats.items():
+            definition = definitions_by_name.get(metric_name)
+            if definition is None or definition.metric_type != "counter":
+                continue
+            value_float = stats._finite(value)
+            if value_float is None:
+                continue
+            stats.data["counters_by_rank"].setdefault(rank, {})[
+                metric_name
+            ] = value_float
+
+        for metric_name, value in gauge_stats.items():
+            definition = definitions_by_name.get(metric_name)
+            if definition is None or definition.metric_type != "gauge":
+                continue
+            value_float = stats._finite(value)
+            if value_float is None:
+                continue
+            stats.data["gauges_by_rank"].setdefault(rank, {})[metric_name] = value_float
+
+        for metric_name, value in histogram_stats.items():
+            definition = definitions_by_name.get(metric_name)
+            if definition is None or definition.metric_type != "histogram":
+                continue
+            histogram = _histogram_snapshot(value)
+            if histogram is None:
+                continue
+            stats.data["histograms_by_rank"].setdefault(rank, {})[
+                metric_name
+            ] = histogram
+
+        return stats
+
+    def reset(self):
+        self.data: dict[str, Any] = {
+            "counters_by_rank": {},
+            "gauges_by_rank": {},
+            "histograms_by_rank": {},
+        }
+
+    def record(
+        self,
+        observations: dict[str, int | float],
+        metric_types: dict[str, str] | None = None,
+        worker_rank: int | str | None = None,
+    ) -> None:
+        rank = self._rank_key(worker_rank)
+        metric_types = metric_types or {}
+        for metric_name, value in observations.items():
+            value_float = self._finite(value)
+            if value_float is None:
+                continue
+            metric_type = metric_types.get(metric_name, "histogram")
+            if metric_type == "counter": +
self.data["counters_by_rank"].setdefault(rank, {})[
+                    metric_name
+                ] = value_float
+            elif metric_type == "gauge":
+                self.data["gauges_by_rank"].setdefault(rank, {})[
+                    metric_name
+                ] = value_float
+            else:
+                rank_data = self.data["histograms_by_rank"].setdefault(rank, {})
+                rank_data.setdefault(metric_name, []).append(value_float)
+
+    def clone_and_reset(self) -> "UCMConnectorStats":
+        snapshot = copy.deepcopy(self.data)
+        self.reset()
+        return UCMConnectorStats(data=snapshot, worker_rank=self.worker_rank)
+
+    def aggregate(self, other: KVConnectorStats) -> KVConnectorStats:
+        if other.is_empty():
+            return self
+
+        for section in ("counters_by_rank", "gauges_by_rank"):
+            target = self.data.setdefault(section, {})
+            for rank, rank_data in other.data.get(section, {}).items():
+                target.setdefault(str(rank), {}).update(rank_data)
+
+        histograms_by_rank = self.data.setdefault("histograms_by_rank", {})
+        other_histograms = other.data.get("histograms_by_rank", {})
+        for rank, rank_data in other_histograms.items():
+            accumulator = histograms_by_rank.setdefault(str(rank), {})
+            for metric_name, value in rank_data.items():
+                _merge_histogram_value(accumulator, metric_name, value)
+        return self
+
+    def reduce(self) -> dict[str, int | float]:
+        return {}
+
+    def is_empty(self) -> bool:
+        for section in (
+            "counters_by_rank",
+            "gauges_by_rank",
+            "histograms_by_rank",
+        ):
+            for rank_data in self.data.get(section, {}).values():
+                if rank_data:
+                    return False
+        return True
+
+    def _rank_key(self, worker_rank: int | str | None = None) -> str:
+        if worker_rank is not None:
+            return str(worker_rank)
+        if self.worker_rank is not None:
+            return str(self.worker_rank)
+        return "unknown"
+
+    def _finite(self, value: int | float) -> float | None:
+        value_float = float(value)
+        if not math.isfinite(value_float):
+            return None
+        return value_float
+
+
+if UCM_HAS_PROM_METRICS:
+
+    class UCMPromMetrics(KVConnectorPromMetrics):
+        def __init__(
+            self,
+            vllm_config: VllmConfig, +
metric_types: dict[type[PromMetric], type[PromMetricT]],
+            labelnames: list[str],
+            per_engine_labelvalues: dict[int, list[object]],
+        ):
+            super().__init__(
+                vllm_config, metric_types, labelnames, per_engine_labelvalues
+            )
+            config = _metrics_config_from_vllm_config(vllm_config)
+            definitions = get_vllm_connector_metric_definitions(config)
+            self._definitions = {
+                definition.name: definition for definition in definitions
+            }
+            self._metrics_by_name: dict[str, PromMetricT] = {}
+            self._labeled_metrics: dict[tuple[int, str, str], PromMetricT] = {}
+            counts = {"counter": 0, "gauge": 0, "histogram": 0}
+
+            for definition in definitions:
+                self._metrics_by_name[definition.name] = self._create_metric(
+                    definition,
+                    labelnames + ["worker_rank"],
+                )
+                counts[definition.metric_type] += 1
+            logger.info(
+                f"UCM metrics vllm_connector path enabled: "
+                f"total={len(definitions)}, counters={counts['counter']}, "
+                f"gauges={counts['gauge']}, histograms={counts['histogram']}, "
+                f"labels={labelnames + ['worker_rank']}"
+            )
+
+        def observe(self, transfer_stats_data: dict[str, Any], engine_idx: int = 0):
+            self._observe_counters(transfer_stats_data, engine_idx)
+            self._observe_gauges(transfer_stats_data, engine_idx)
+            self._observe_histograms(transfer_stats_data, engine_idx)
+
+        def _create_metric(
+            self,
+            definition: MetricDefinition,
+            labelnames: list[str],
+        ) -> PromMetricT:
+            metric_kwargs: dict[str, Any] = {
+                "name": definition.vllm_connector_name,
+                "documentation": definition.documentation,
+                "labelnames": labelnames,
+            }
+            if definition.metric_type == "histogram":
+                metric_kwargs["buckets"] = list(definition.vllm_connector_buckets)
+                return self._histogram_cls(**metric_kwargs)
+            if definition.metric_type == "counter":
+                return self._counter_cls(**metric_kwargs)
+            return self._gauge_cls(**metric_kwargs)
+
+        def _observe_counters(
+            self, transfer_stats_data: dict[str, Any], engine_idx: int
+        ) -> None:
+            for worker_rank, rank_data in transfer_stats_data.get(
+                "counters_by_rank", {}
+            ).items():
+                for metric_name, value in rank_data.items():
+                    definition = self._definition(metric_name, "counter")
+                    if definition is None:
+                        continue
+                    value_float = _finite(value)
+                    if value_float is None or value_float < 0:
+                        continue
+                    self._metric(engine_idx, worker_rank, metric_name).inc(value_float)
+
+        def _observe_gauges(
+            self, transfer_stats_data: dict[str, Any], engine_idx: int
+        ) -> None:
+            for worker_rank, rank_data in transfer_stats_data.get(
+                "gauges_by_rank", {}
+            ).items():
+                for metric_name, value in rank_data.items():
+                    definition = self._definition(metric_name, "gauge")
+                    if definition is None:
+                        continue
+                    value_float = _finite(value)
+                    if value_float is None:
+                        continue
+                    self._metric(engine_idx, worker_rank, metric_name).set(value_float)
+
+        def _observe_histograms(
+            self, transfer_stats_data: dict[str, Any], engine_idx: int
+        ) -> None:
+            for worker_rank, rank_data in transfer_stats_data.get(
+                "histograms_by_rank", {}
+            ).items():
+                for metric_name, value in rank_data.items():
+                    definition = self._definition(metric_name, "histogram")
+                    if definition is None:
+                        continue
+                    histogram = self._metric(engine_idx, worker_rank, metric_name)
+                    if isinstance(value, list):
+                        for observation in value:
+                            value_float = _finite(observation)
+                            if value_float is not None:
+                                histogram.observe(
+                                    value_float * definition.vllm_connector_value_scale
+                                )
+                    elif isinstance(value, dict):
+                        self._update_histogram_snapshot(histogram, value, definition)
+
+        def _update_histogram_snapshot(
+            self,
+            histogram: PromMetricT,
+            value: dict[str, Any],
+            definition: MetricDefinition,
+        ) -> None:
+            bucket_counts = list(value.get("bucket_counts", []))
+            sum_delta = (
+                float(value.get("sum", 0.0)) * definition.vllm_connector_value_scale
+            )
+            buckets = getattr(histogram, "_buckets", None)
+            metric_sum = getattr(histogram, "_sum", None)
+            if (
+                buckets is None
+                or metric_sum is None
+                or len(bucket_counts) != len(buckets)
+            ):
+                return
+            for
bucket, count in zip(buckets, bucket_counts):
+                if count:
+                    bucket.inc(count)
+            if sum_delta:
+                metric_sum.inc(sum_delta)
+
+        def _definition(
+            self, metric_name: str, metric_type: str
+        ) -> MetricDefinition | None:
+            definition = self._definitions.get(metric_name)
+            if definition is None or definition.metric_type != metric_type:
+                return None
+            if metric_name not in self._metrics_by_name:
+                return None
+            return definition
+
+        def _metric(
+            self, engine_idx: int, worker_rank: int | str, metric_name: str
+        ) -> PromMetricT:
+            worker_rank = str(worker_rank)
+            key = (engine_idx, worker_rank, metric_name)
+            if key not in self._labeled_metrics:
+                self._labeled_metrics[key] = self._metrics_by_name[metric_name].labels(
+                    *(self._engine_labelvalues[engine_idx] + [worker_rank])
+                )
+            return self._labeled_metrics[key]
+
+        @property
+        def _engine_labelvalues(self) -> dict[int, list[object]]:
+            if hasattr(self, "per_engine_labelvalues"):
+                return self.per_engine_labelvalues
+            return self._per_engine_labelvalues
+
+else:
+
+    class UCMPromMetrics:
+        def __init__(self, *args: Any, **kwargs: Any):
+            raise RuntimeError("vLLM connector Prometheus metrics are unavailable.")
+
+
+def _metrics_config_from_vllm_config(vllm_config: VllmConfig) -> dict[str, Any]:
+    kv_transfer_config = getattr(vllm_config, "kv_transfer_config", None)
+    launch_config = getattr(kv_transfer_config, "launch_config", None)
+    if launch_config is not None:
+        return load_launch_metrics_config(launch_config)
+    try:
+        from ucm.utils import Config
+
+        return load_launch_metrics_config(Config(kv_transfer_config).get_config())
+    except Exception:
+        return {}
+
+
+def _finite(value: Any) -> float | None:
+    value_float = float(value)
+    if not math.isfinite(value_float):
+        return None
+    return value_float
+
+
+def _histogram_snapshot(value: Any) -> dict[str, Any] | None:
+    if isinstance(value, dict):
+        bucket_counts = value.get("bucket_counts", [])
+        sum_delta = value.get("sum", 0.0)
+    elif isinstance(value, (tuple,
list)) and len(value) == 2:
+        bucket_counts, sum_delta = value
+    else:
+        bucket_counts = getattr(value, "bucketCounts", None)
+        sum_delta = getattr(value, "sum", None)
+    if bucket_counts is None or sum_delta is None:
+        return None
+    return {
+        "bucket_counts": [int(count) for count in bucket_counts],
+        "sum": float(sum_delta),
+    }
+
+
+def _merge_histogram_value(
+    accumulator: dict[str, Any], metric_name: str, value: Any
+) -> None:
+    if isinstance(value, list):
+        accumulator.setdefault(metric_name, []).extend(value)
+        return
+    if not isinstance(value, dict):
+        return
+    target = accumulator.setdefault(
+        metric_name,
+        {"bucket_counts": [0] * len(value.get("bucket_counts", [])), "sum": 0.0},
+    )
+    if isinstance(target, list):
+        target.append(float(value.get("sum", 0.0)))
+        return
+    target_counts = target.setdefault(
+        "bucket_counts", [0] * len(value.get("bucket_counts", []))
+    )
+    value_counts = list(value.get("bucket_counts", []))
+    if len(target_counts) < len(value_counts):
+        target_counts.extend([0] * (len(value_counts) - len(target_counts)))
+    for index, count in enumerate(value_counts):
+        target_counts[index] += int(count)
+    target["sum"] = float(target.get("sum", 0.0)) + float(value.get("sum", 0.0))
diff --git a/ucm/integration/vllm/ucm_connector.py b/ucm/integration/vllm/ucm_connector.py
index 207a66f62..7095ae411 100644
--- a/ucm/integration/vllm/ucm_connector.py
+++ b/ucm/integration/vllm/ucm_connector.py
@@ -40,7 +40,21 @@ from vllm.v1.outputs import KVConnectorOutput
 from ucm.integration.vllm.device import create_device
+from ucm.integration.vllm.metrics import (
+    UCM_HAS_PROM_METRICS,
+    UCMConnectorStats,
+    UCMPromMetrics,
+)
 from ucm.logger import init_logger
+from ucm.metrics_config import (
+    MULTIPROC_CONSUMER,
+    VLLM_CONNECTOR_CONSUMER,
+    consumer_enabled,
+    get_vllm_connector_metric_definitions,
+    load_launch_metrics_config,
+    setup_ucm_metrics,
+)
+from ucm.metrics_dispatcher import get_metrics_dispatcher
 from ucm.observability import
PrometheusStatsLogger
 from ucm.shared.metrics import ucmmetrics
 from ucm.store.factory_v1 import UcmConnectorFactoryV1
@@ -49,6 +63,12 @@ if TYPE_CHECKING:
     from vllm.attention.backends.abstract import AttentionMetadata
+    from vllm.distributed.kv_transfer.kv_connector.v1.metrics import (
+        KVConnectorPromMetrics,
+        KVConnectorStats,
+        PromMetric,
+        PromMetricT,
+    )
     from vllm.forward_context import ForwardContext
     from vllm.v1.core.kv_cache_manager import KVCacheBlocks
     from vllm.v1.kv_cache_interface import KVCacheConfig
@@ -271,6 +291,7 @@ class PendingDumpTask:
     task: Task
     request_ids: set[str]
     event_handle: int = 0
+    wait_for_save_start_ms: float = 0.0
 class RequestHasher:
@@ -332,6 +353,12 @@ def __init__(
         self.local_rank = (
             -1 if role == KVConnectorRole.SCHEDULER else get_world_group().local_rank
         )
+        self._worker_rank = (
+            get_world_group().rank if role == KVConnectorRole.WORKER else None
+        )
+        self._vllm_metrics_enabled = False
+        self._vllm_metric_definitions = []
+        self._metrics_dispatcher = None
         self.tp_rank = self._vllm_config.parallel_config.rank
         self.block_size = self._vllm_config.cache_config.block_size
         self.is_mla = self._vllm_config.model_config.is_deepseek_mla
@@ -397,8 +424,19 @@ def __init__(
         )
         self._connector_worker_meta = UCMWorkerMetadata()
-        metrics_config = self.launch_config.get("metrics_config_path", "")
-        if metrics_config:
+        metrics_config_path = self.launch_config.get("metrics_config_path", "")
+        self.metrics_config = load_launch_metrics_config(self.launch_config)
+        if self.metrics_config and (
+            consumer_enabled(self.metrics_config, MULTIPROC_CONSUMER)
+            or consumer_enabled(self.metrics_config, VLLM_CONNECTOR_CONSUMER)
+        ):
+            setup_ucm_metrics(self.metrics_config)
+            self._metrics_dispatcher = get_metrics_dispatcher(self.metrics_config)
+        if (
+            metrics_config_path
+            and self.metrics_config
+            and consumer_enabled(self.metrics_config, MULTIPROC_CONSUMER)
+        ):
             worker_id = (
                 f"{self.engine_id}_{get_world_group().rank}"
                 if role ==
KVConnectorRole.WORKER
@@ -407,11 +445,20 @@ def __init__(
             self.stats_logger = PrometheusStatsLogger(
                 vllm_config.model_config.served_model_name,
                 worker_id,
-                metrics_config,
+                metrics_config_path,
             )
             logger.info(
-                f"metrics_config_path: {metrics_config}, set worker_id: {worker_id}"
+                f"metrics_config_path: {metrics_config_path}, set worker_id: {worker_id}"
+            )
+        if (
+            role == KVConnectorRole.WORKER
+            and self.metrics_config
+            and consumer_enabled(self.metrics_config, VLLM_CONNECTOR_CONSUMER)
+        ):
+            self._vllm_metric_definitions = get_vllm_connector_metric_definitions(
+                self.metrics_config
             )
+            self._vllm_metrics_enabled = bool(self._vllm_metric_definitions)
         self.persist_token_threshold = self.launch_config.get(
             "persist_token_threshold", 0
@@ -858,20 +905,18 @@ def start_load_kv(self, forward_context: "ForwardContext", **kwargs) -> None:
                 num_loaded_block -= request_to_load_blocks.get(request_id, 0)
         load_end_time = time.perf_counter() * 1000
+        load_duration_ms = load_end_time - load_start_time
         load_bytes = num_loaded_block * self.block_data_size
-        load_speed = (
-            load_bytes / (load_end_time - load_start_time) / 1024 / 1024
-        )  # GB/s
+        load_speed = load_bytes / load_duration_ms / 1024 / 1024  # GB/s
         if is_load:
-            ucmmetrics.update_stats(
-                {
-                    "load_requests_num": num_loaded_request,
-                    "load_blocks_num": num_loaded_block,
-                    "load_duration": load_end_time - load_start_time,
-                    "load_speed": load_speed,
-                    "load_bytes_total": load_bytes,
-                }
-            )
+            load_stats = {
+                "load_requests_num": num_loaded_request,
+                "load_blocks_num": num_loaded_block,
+                "load_duration": load_duration_ms,
+                "load_speed": load_speed,
+                "load_bytes_total": load_bytes,
+            }
+            ucmmetrics.update_stats(load_stats)
     def wait_for_layer_load(self, layer_name: str) -> None:
         pass
@@ -896,8 +941,23 @@ def save_kv_layer(
         pass
     def _wait_pending_dump_task(self, pending_dump_task: PendingDumpTask) -> None:
+        wait_start_ms = time.perf_counter() * 1000
         try:
             self.store.wait(pending_dump_task.task)
+        except
Exception:
+            raise
+        else:
+            wait_end_ms = time.perf_counter() * 1000
+            stats = {}
+            if pending_dump_task.wait_for_save_start_ms > 0:
+                stats["save_duration"] = self._non_negative_ms(
+                    wait_end_ms - pending_dump_task.wait_for_save_start_ms
+                )
+            stats["save_completion_wait_duration"] = self._non_negative_ms(
+                wait_end_ms - wait_start_ms
+            )
+            if stats:
+                ucmmetrics.update_stats(stats)
         finally:
             self._release_dump_event_handle(pending_dump_task)
@@ -961,6 +1021,7 @@ def handle_preemptions(
     def wait_for_save(self) -> None:
         # TODO support PP
+        wait_for_save_start_ms = time.perf_counter() * 1000
         self._poll_pending_dump_tasks()
         metadata = self._get_connector_metadata()
@@ -975,8 +1036,9 @@ def wait_for_save(self) -> None:
         if self.is_mla and self.tp_rank != 0:
             return
-        dump_tasks: List[Task] = []
         is_save = False
+        num_saved_block = 0
+        num_saved_request = 0
         total_ucm_block_ids, total_vllm_block_ids = [], []
         dump_request_ids: set[str] = set()
         for request_id, request in metadata.request_meta.items():
@@ -994,6 +1056,8 @@ def wait_for_save(self) -> None:
                 continue
             is_save = True
             dump_request_ids.add(request_id)
+            num_saved_block += len(ucm_block_ids)
+            num_saved_request += 1
             if self.tp_rank != 0:
                 for i, ucm_block_id in enumerate(ucm_block_ids):
                     ucm_block_ids[i] = self.request_hasher(ucm_block_id)
@@ -1012,20 +1076,54 @@ def wait_for_save(self) -> None:
             task = self.store.dump_data(
                 total_ucm_block_ids, shard_indexs, total_ptrs, event_handle
             )
-            dump_tasks.append(task)
         except Exception as e:
             logger.error(f"dump kv cache failed. {type(e).__name__}: {e}")
             if self.enable_event_sync and event_handle and self.device is not None:
                 self.device.destroy_event_handle(event_handle)
             return
-        for task in dump_tasks:
-            pending_dump_task = PendingDumpTask(
+        save_bytes = num_saved_block * self.block_data_size
+        save_stats = {
+            "save_requests_num": num_saved_request,
+            "save_blocks_num": num_saved_block,
+            "save_bytes_total": save_bytes,
+        }
+        ucmmetrics.update_stats(save_stats)
+        self._pending_dump_tasks.append(
+            PendingDumpTask(
                 task=task,
                 request_ids=set(dump_request_ids),
                 event_handle=event_handle,
+                wait_for_save_start_ms=wait_for_save_start_ms,
             )
-            self._pending_dump_tasks.append(pending_dump_task)
+        )
+
+    @staticmethod
+    def _non_negative_ms(value: float) -> float:
+        value = float(value)
+        if not math.isfinite(value):
+            return 0.0
+        return max(value, 0.0)
+
+    def get_kv_connector_stats(self) -> Optional["KVConnectorStats"]:
+        if not self._vllm_metrics_enabled:
+            return None
+        if self._metrics_dispatcher is None:
+            return None
+        self._metrics_dispatcher.drain_to_consumers()
+        counter_stats, gauge_stats, histogram_stats = (
+            self._metrics_dispatcher.get_stats_and_clear(VLLM_CONNECTOR_CONSUMER)
+        )
+        stats = UCMConnectorStats.from_ucm_snapshot(
+            counter_stats=counter_stats,
+            gauge_stats=gauge_stats,
+            histogram_stats=histogram_stats,
+            worker_rank=self._worker_rank,
+            metric_definitions=self._vllm_metric_definitions,
+        )
+        if stats.is_empty():
+            return None
+        return stats
     def clear_connector_metadata(self) -> None:
         super().clear_connector_metadata()
@@ -1112,6 +1210,21 @@ class UCMLayerWiseConnector(UCMDirectConnector):
     load l2 -> forward l2 -> save l2
     """
+    _BATCH_TOTAL_METRICS = {
+        (False, False): "layerwise_batch_total_no_transfer_ms",
+        (True, False): "layerwise_batch_total_load_only_ms",
+        (False, True): "layerwise_batch_total_save_only_ms",
+        (True, True): "layerwise_batch_total_load_save_ms",
+    }
+    _BATCH_LOAD_WAIT_METRICS = {
+        (True, False): "layerwise_batch_load_wait_total_load_only_ms",
+        (True, True): "layerwise_batch_load_wait_total_load_save_ms",
+    }
+    _BATCH_SAVE_TAIL_METRICS = {
+        (False, True): "layerwise_batch_save_tail_save_only_ms",
+        (True, True): "layerwise_batch_save_tail_load_save_ms",
+    }
+
     def __init__(
         self,
         vllm_config: "VllmConfig",
@@ -1129,6 +1242,7 @@ def __init__(
         self._failure_req_ids: set[str] = set()
         self._layerwise_prev_wait_end: Optional[float] = None
         self._layerwise_batch_start: Optional[float] = None
+        self._layerwise_batch_wait_blocking_total_ms = 0.0
         # MTP layers can be revisited several times in one speculative decode
         # batch. Keep only the last metadata snapshot for each MTP layer and
         # submit its dump after the layerwise forward finishes.
@@ -1142,6 +1256,32 @@ def __init__(
         self._init_mtp_layerwise_dump_state()
         logger.info("Init UCMLayerWiseConnector.")
+    def _layerwise_batch_stats(
+        self, total_end: float, save_tail_ms: Optional[float] = None
+    ) -> dict[str, float]:
+        if self._layerwise_batch_start is None:
+            return {}
+
+        batch_type = (self.need_load, self.is_save)
+        batch_total_ms = (total_end - self._layerwise_batch_start) * 1000
+        stats = {
+            "layerwise_batch_total_ms": batch_total_ms,
+            self._BATCH_TOTAL_METRICS[batch_type]: batch_total_ms,
+        }
+
+        load_wait_metric = self._BATCH_LOAD_WAIT_METRICS.get(batch_type)
+        if load_wait_metric:
+            stats[load_wait_metric] = self._layerwise_batch_wait_blocking_total_ms
+
+        save_tail_metric = self._BATCH_SAVE_TAIL_METRICS.get(batch_type)
+        if save_tail_metric and save_tail_ms is not None:
+            stats["layerwise_save_tail_total_ms"] = save_tail_ms
+            stats[save_tail_metric] = save_tail_ms
+
+        self._layerwise_batch_start = None
+        self._layerwise_batch_wait_blocking_total_ms = 0.0
+        return stats
+
     def _init_mtp_layerwise_dump_state(self) -> None:
         """Cache whether MTP is enabled and how many MTP layers it has."""
         speculative_config = getattr(self._vllm_config, "speculative_config", None)
@@ -1293,6 +1433,7 @@ def
start_load_kv(self, forward_context: "ForwardContext", **kwargs) -> None: self._failure_req_ids.clear() self.need_load = False self._layerwise_prev_wait_end = None + self._layerwise_batch_wait_blocking_total_ms = 0.0 for request_id, request in metadata.request_meta.items(): if len(request.load_block_ids[0]) == 0: @@ -1363,6 +1504,7 @@ def wait_for_layer_load(self, layer_name: str) -> None: ) blocking_ms = (wait_end - wait_start) * 1000 + self._layerwise_batch_wait_blocking_total_ms += blocking_ms stats = { "layerwise_wait_blocking_ms": blocking_ms, "layerwise_wait_tasks_count": float(n_tasks), @@ -1434,9 +1576,12 @@ def save_kv_layer( ) def wait_for_save(self) -> None: + save_tail_start = time.perf_counter() + wait_for_save_start_ms = save_tail_start * 1000 self._flush_deferred_mtp_dumps() - # Only reap completed tasks here. Unfinished dumps are waited when the - # request finishes or is preempted. + for pending_dump_task in self._pending_dump_tasks: + if pending_dump_task.wait_for_save_start_ms <= 0: + pending_dump_task.wait_for_save_start_ms = wait_for_save_start_ms self._poll_pending_dump_tasks() if self._connector_metadata: metadata = self._get_connector_metadata() @@ -1447,11 +1592,10 @@ def wait_for_save(self) -> None: ) total_end = time.perf_counter() - if self._layerwise_batch_start is not None: - batch_total_ms = (total_end - self._layerwise_batch_start) * 1000 - ucmmetrics.update_stats({"layerwise_batch_total_ms": batch_total_ms}) - self._layerwise_batch_start = None - + save_tail_ms = (total_end - save_tail_start) * 1000 + stats = self._layerwise_batch_stats(total_end, save_tail_ms) + if stats: + ucmmetrics.update_stats(stats) self.is_save = False self.dump_total_ptrs = None @@ -3193,6 +3337,39 @@ def wait_for_save(self) -> None: """ self.connector.wait_for_save() + def get_kv_connector_stats(self) -> Optional["KVConnectorStats"]: + return self.connector.get_kv_connector_stats() + + @classmethod + def build_kv_connector_stats( + cls, data: dict[str, Any] 
| None = None + ) -> Optional["KVConnectorStats"]: + return UCMConnectorStats(data=data) if data is not None else UCMConnectorStats() + + @classmethod + def build_prom_metrics( + cls, + vllm_config: "VllmConfig", + metric_types: dict[type["PromMetric"], type["PromMetricT"]], + labelnames: list[str], + per_engine_labelvalues: dict[int, list[object]], + ) -> Optional["KVConnectorPromMetrics"]: + if not UCM_HAS_PROM_METRICS: + return None + config = load_launch_metrics_config( + Config(vllm_config.kv_transfer_config).get_config() + ) + if not config or not consumer_enabled(config, VLLM_CONNECTOR_CONSUMER): + return None + if not get_vllm_connector_metric_definitions(config): + return None + return UCMPromMetrics( + vllm_config, + metric_types, + labelnames, + per_engine_labelvalues, + ) + def request_finished_all_groups( self, request: "Request", diff --git a/ucm/metrics_config.py b/ucm/metrics_config.py new file mode 100644 index 000000000..1d7bb8289 --- /dev/null +++ b/ucm/metrics_config.py @@ -0,0 +1,237 @@ +# +# MIT License +# +# Copyright (c) 2025 Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# + +from dataclasses import dataclass +from typing import Any + +from ucm.logger import init_logger +from ucm.shared.metrics import ucmmetrics + +logger = init_logger(__name__) + +METRIC_TYPES = ("counter", "gauge", "histogram") +VLLM_EXCLUDED_METRICS = {"interval_lookup_hit_rates"} +MULTIPROC_CONSUMER = "multiproc" +VLLM_CONNECTOR_CONSUMER = "vllm_connector" + + +@dataclass(frozen=True) +class MetricDefinition: + name: str + metric_type: str + documentation: str = "" + buckets: tuple[float, ...] = () + vllm_connector_name: str = "" + vllm_connector_buckets: tuple[float, ...] = () + vllm_connector_value_scale: float = 1.0 + vllm_connector_enabled: bool = True + + +def load_metrics_config(config_path: str) -> dict[str, Any]: + if not config_path: + return {} + try: + import yaml + + with open(config_path, "r") as f: + config = yaml.safe_load(f) + except FileNotFoundError: + logger.warning(f"Config file {config_path} not found") + return {} + except ImportError as e: + logger.error(f"PyYAML is required to read metrics config {config_path}: {e}") + return {} + except Exception as e: + logger.error(f"Error loading metrics config file {config_path}: {e}") + return {} + if config is None: + return {} + if not isinstance(config, dict): + logger.error(f"Metrics config {config_path} must be a YAML mapping") + return {} + return config + + +def load_launch_metrics_config(launch_config: dict[str, Any] | None) -> dict[str, Any]: + if not launch_config: + return {} + inline_config = launch_config.get("metrics_config") + if isinstance(inline_config, dict): + return inline_config + return load_metrics_config(launch_config.get("metrics_config_path", "")) + + +def consumer_enabled( + config: dict[str, Any] | None, 
consumer: str, default: bool = True +) -> bool: + consumers = (config or {}).get("consumers") + if not isinstance(consumers, dict): + return default + return _as_bool(consumers.get(consumer), False) + + +def get_metric_definitions(config: dict[str, Any] | None) -> list[MetricDefinition]: + if not config: + return [] + + definitions: list[MetricDefinition] = [] + for metric_type in METRIC_TYPES: + for item in config.get(metric_type, []) or []: + if not isinstance(item, dict): + continue + name = item.get("name") + if not name: + continue + buckets = tuple(float(bucket) for bucket in item.get("buckets", []) or []) + vllm_connector_scale = _vllm_connector_value_scale(name, item) + vllm_connector_buckets = item.get("vllm_connector_buckets") + if vllm_connector_buckets is None: + vllm_connector_buckets = tuple( + bucket * vllm_connector_scale for bucket in buckets + ) + else: + vllm_connector_buckets = tuple( + float(bucket) for bucket in vllm_connector_buckets + ) + vllm_connector_enabled = _metric_vllm_connector_enabled(name, item) + definitions.append( + MetricDefinition( + name=name, + metric_type=metric_type, + documentation=item.get("documentation", ""), + buckets=buckets, + vllm_connector_name=_vllm_connector_metric_name( + name, + item, + vllm_connector_prefix(config), + ), + vllm_connector_buckets=tuple(vllm_connector_buckets), + vllm_connector_value_scale=vllm_connector_scale, + vllm_connector_enabled=vllm_connector_enabled, + ) + ) + return definitions + + +def get_vllm_connector_metric_definitions( + config: dict[str, Any] | None, +) -> list[MetricDefinition]: + return [ + definition + for definition in get_metric_definitions(config) + if definition.vllm_connector_enabled + ] + + +def setup_ucm_metrics(config: dict[str, Any] | None) -> list[MetricDefinition]: + definitions = get_metric_definitions(config) + if not definitions: + return [] + ucmmetrics.set_up() + for definition in definitions: + ucmmetrics.create_stats( + definition.name, + 
definition.metric_type, + list(definition.buckets), + ) + enabled_consumers = [ + consumer + for consumer in (MULTIPROC_CONSUMER, VLLM_CONNECTOR_CONSUMER) + if consumer_enabled(config, consumer) + ] + counts = _metric_definition_counts(definitions) + logger.info( + f"UCM metrics enabled for " + f"{', '.join(enabled_consumers) if enabled_consumers else 'no consumers'}: " + f"total={len(definitions)}, counters={counts['counter']}, " + f"gauges={counts['gauge']}, histograms={counts['histogram']}" + ) + return definitions + + +def multiproc_metric_name( + config: dict[str, Any] | None, metric_name: str, default_prefix: str = "ucm:" +) -> str: + return f"{multiproc_prefix(config, default_prefix)}{metric_name}" + + +def multiproc_prefix( + config: dict[str, Any] | None, default_prefix: str = "ucm:" +) -> str: + config = config or {} + return config.get("multiproc_prefix", config.get("metric_prefix", default_prefix)) + + +def vllm_connector_prefix( + config: dict[str, Any] | None, default_prefix: str = "ucm:" +) -> str: + return (config or {}).get("vllm_connector_prefix", default_prefix) + + +def _metric_definition_counts( + definitions: list[MetricDefinition], +) -> dict[str, int]: + return { + metric_type: sum( + 1 for definition in definitions if definition.metric_type == metric_type + ) + for metric_type in METRIC_TYPES + } + + +def _as_bool(value: Any, default: bool) -> bool: + if value is None: + return default + if isinstance(value, bool): + return value + if isinstance(value, str): + return value.lower() in {"1", "true", "yes", "on"} + return bool(value) + + +def _metric_vllm_connector_enabled(name: str, item: dict[str, Any]) -> bool: + if "vllm_connector_enabled" in item: + return _as_bool(item.get("vllm_connector_enabled"), True) + return name not in VLLM_EXCLUDED_METRICS + + +def _vllm_connector_value_scale(name: str, item: dict[str, Any]) -> float: + if "vllm_connector_value_scale" in item: + return float(item["vllm_connector_value_scale"]) + return 1.0 + + 
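Review note on `consumer_enabled` and `_as_bool` above: the `default` argument appears to apply only when the config has no `consumers` mapping at all. Once a `consumers` mapping exists, a consumer that is not listed in it is treated as disabled. The sketch below copies the two helpers into a standalone script (no `ucm` imports) to make that behaviour visible; the sample config dictionaries are hypothetical and not taken from `metrics_configs.yaml`.

```python
from typing import Any, Optional


def _as_bool(value: Any, default: bool) -> bool:
    # Same coercion rules as the helper in the hunk above.
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.lower() in {"1", "true", "yes", "on"}
    return bool(value)


def consumer_enabled(
    config: Optional[dict], consumer: str, default: bool = True
) -> bool:
    consumers = (config or {}).get("consumers")
    if not isinstance(consumers, dict):
        # No usable `consumers` mapping: fall back to the caller's default.
        return default
    # A mapping is present: anything missing from it counts as disabled.
    return _as_bool(consumers.get(consumer), False)


if __name__ == "__main__":
    # Hypothetical configs, for illustration only.
    samples = [
        {},
        {"consumers": {"multiproc": True}},
        {"consumers": {"vllm_connector": "on"}},
    ]
    for cfg in samples:
        enabled = {
            name: consumer_enabled(cfg, name)
            for name in ("multiproc", "vllm_connector")
        }
        print(cfg, enabled)
```

If this asymmetry is intended, it may be worth stating in the docs that adding a `consumers` section with only one key switches the other consumer off.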
+def _vllm_connector_metric_name(name: str, item: dict[str, Any], prefix: str) -> str: + configured = item.get("vllm_connector_name") + if configured: + return _ensure_prefix(str(configured), prefix) + return f"{prefix}{name}" + + +def _ensure_prefix(name: str, prefix: str) -> str: + if name.startswith(prefix): + return name + if ":" in name: + name = name.split(":", 1)[1] + return f"{prefix}{name}" diff --git a/ucm/metrics_dispatcher.py b/ucm/metrics_dispatcher.py new file mode 100644 index 000000000..5b9ad4bd0 --- /dev/null +++ b/ucm/metrics_dispatcher.py @@ -0,0 +1,150 @@ +# +# MIT License +# +# Copyright (c) 2025 Huawei Technologies Co., Ltd. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# + +import threading +from copy import deepcopy +from typing import Any + +from ucm.metrics_config import ( + MULTIPROC_CONSUMER, + VLLM_CONNECTOR_CONSUMER, + MetricDefinition, + consumer_enabled, + get_metric_definitions, +) +from ucm.shared.metrics import ucmmetrics + +CONSUMERS = (MULTIPROC_CONSUMER, VLLM_CONNECTOR_CONSUMER) +_DISPATCHER: "MetricsDispatcher | None" = None +_DISPATCHER_LOCK = threading.Lock() + + +class MetricsDispatcher: + def __init__(self, config: dict[str, Any] | None): + self.config = config or {} + self._definitions = get_metric_definitions(self.config) + self._definitions_by_name = { + definition.name: definition for definition in self._definitions + } + self._enabled = { + consumer: consumer_enabled(self.config, consumer) for consumer in CONSUMERS + } + self._buffers = {consumer: self._empty_stats() for consumer in CONSUMERS} + self._lock = threading.Lock() + + def drain_to_consumers(self) -> None: + with self._lock: + counter_stats, gauge_stats, histogram_stats = ( + ucmmetrics.get_all_stats_and_clear() + ) + if not counter_stats and not gauge_stats and not histogram_stats: + return + + for consumer in CONSUMERS: + if not self._enabled[consumer]: + continue + self._merge_stats( + self._buffers[consumer], + counter_stats, + gauge_stats, + histogram_stats, + consumer, + ) + + def get_stats_and_clear(self, consumer: str): + if consumer not in CONSUMERS: + raise ValueError(f"Unsupported metrics consumer: {consumer}") + with self._lock: + snapshot = deepcopy(self._buffers[consumer]) + self._buffers[consumer] = self._empty_stats() + return snapshot + + def _merge_stats( + self, + buffer, + counter_stats: dict[str, float], + gauge_stats: dict[str, float], + histogram_stats: dict[str, Any], + consumer: str, + ) -> None: + counters, gauges, histograms = buffer + for metric_name, value in counter_stats.items(): + definition = self._consumer_definition(metric_name, "counter", consumer) + if definition is not None: + counters[metric_name] = 
counters.get(metric_name, 0.0) + float(value) + + for metric_name, value in gauge_stats.items(): + definition = self._consumer_definition(metric_name, "gauge", consumer) + if definition is not None: + gauges[metric_name] = float(value) + + for metric_name, value in histogram_stats.items(): + definition = self._consumer_definition(metric_name, "histogram", consumer) + if definition is not None: + self._merge_histogram(histograms, metric_name, value) + + def _consumer_definition( + self, metric_name: str, metric_type: str, consumer: str + ) -> MetricDefinition | None: + definition = self._definitions_by_name.get(metric_name) + if definition is None or definition.metric_type != metric_type: + return None + if ( + consumer == VLLM_CONNECTOR_CONSUMER + and not definition.vllm_connector_enabled + ): + return None + return definition + + def _merge_histogram(self, histograms: dict[str, Any], metric_name: str, value): + bucket_counts, sum_delta = self._histogram_tuple(value) + current_counts, current_sum = histograms.get( + metric_name, ([0] * len(bucket_counts), 0.0) + ) + if len(current_counts) < len(bucket_counts): + current_counts.extend([0] * (len(bucket_counts) - len(current_counts))) + for index, count in enumerate(bucket_counts): + current_counts[index] += int(count) + histograms[metric_name] = (current_counts, current_sum + float(sum_delta)) + + def _histogram_tuple(self, value): + if isinstance(value, dict): + return list(value.get("bucket_counts", [])), float(value.get("sum", 0.0)) + if isinstance(value, (tuple, list)) and len(value) == 2: + bucket_counts, sum_delta = value + return list(bucket_counts), float(sum_delta) + return list(getattr(value, "bucketCounts", [])), float( + getattr(value, "sum", 0.0) + ) + + def _empty_stats(self): + return ({}, {}, {}) + + +def get_metrics_dispatcher(config: dict[str, Any] | None) -> MetricsDispatcher: + global _DISPATCHER + with _DISPATCHER_LOCK: + if _DISPATCHER is None: + _DISPATCHER = MetricsDispatcher(config) + return 
_DISPATCHER diff --git a/ucm/observability.py b/ucm/observability.py index 5ca5f66cd..a8a8a5c2b 100644 --- a/ucm/observability.py +++ b/ucm/observability.py @@ -28,11 +28,18 @@ import time from typing import Any, Union -import yaml from prometheus_client import Counter, Gauge, Histogram from ucm.logger import init_logger -from ucm.shared.metrics import ucmmetrics +from ucm.metrics_config import ( + MULTIPROC_CONSUMER, + consumer_enabled, + get_metric_definitions, + load_metrics_config, + multiproc_metric_name, + setup_ucm_metrics, +) +from ucm.metrics_dispatcher import get_metrics_dispatcher logger = init_logger(__name__) @@ -43,24 +50,6 @@ class PrometheusStatsLogger: - def _load_config(self, config_path: str) -> dict[str, Any]: - """Load configuration from YAML file""" - try: - with open(config_path, "r") as f: - config = yaml.safe_load(f) - if config is None: - logger.warning( - f"Config file {config_path} is empty, using defaults" - ) - return {} - return config - except FileNotFoundError: - logger.warning(f"Config file {config_path} not found, using defaults") - return {} - except yaml.YAMLError as e: - logger.error(f"Error parsing YAML config file {config_path}: {e}") - return {} - def __init__(self, model_name, worker_id, config_path): """ Load metrics config from YAML file (config_path), @@ -69,11 +58,16 @@ def __init__(self, model_name, worker_id, config_path): if _metric_mappings: logger.warning("Metrics are already registered, skipping re-registration.") return - # Load metrics config - self.config = self._load_config(config_path) + self.config = load_metrics_config(config_path) + self.metric_definitions = get_metric_definitions(self.config) + if not self.metric_definitions: + return + if not consumer_enabled(self.config, MULTIPROC_CONSUMER): + return self.log_interval = self.config.get("log_interval", 10) - ucmmetrics.set_up() + setup_ucm_metrics(self.config) + self.metrics_dispatcher = get_metrics_dispatcher(self.config) multiproc_dir = 
self.config.get("multiproc_dir", "/vllm-workspace") if "PROMETHEUS_MULTIPROC_DIR" not in os.environ: @@ -106,11 +100,14 @@ def _register_metrics_by_type(self, metric_type): """ metric_cls, default_kwargs = self.metric_type_config[metric_type] cfg_list = self.config.get(metric_type, []) + registered_count = 0 for cfg in cfg_list: name = cfg.get("name") + if not name: + continue doc = cfg.get("documentation", "") - prometheus_name = f"{self.metric_prefix}{name}" + prometheus_name = multiproc_metric_name(self.config, name) metric_kwargs = { "name": prometheus_name, @@ -122,16 +119,22 @@ def _register_metrics_by_type(self, metric_type): metric = metric_cls(**metric_kwargs) _metric_mappings[name] = metric - buckets = list(getattr(metric, "_upper_bounds", [])) - ucmmetrics.create_stats(name, metric_type, buckets) + registered_count += 1 + return registered_count def _init_metrics_from_config(self): """Initialize metrics based on config""" - # Get metric name prefix from config (e.g., "ucm:") - self.metric_prefix = self.config.get("metric_prefix", "ucm:") - - for metric_type in self.metric_type_config.keys(): - self._register_metrics_by_type(metric_type) + counts = { + metric_type: self._register_metrics_by_type(metric_type) + for metric_type in self.metric_type_config.keys() + } + logger.info( + f"UCM metrics multiproc path enabled: total={sum(counts.values())}, " + f"counters={counts['counter']}, gauges={counts['gauge']}, " + f"histograms={counts['histogram']}, prefix=" + f"{self.config.get('multiproc_prefix', self.config.get('metric_prefix', 'ucm:'))}, " + f"labels={self.labelnames}" + ) def _update_counter(self, metric, value): if value < 0: @@ -197,8 +200,9 @@ def update_stats_loop(self): Periodically update Prometheus metrics in a loop until stopped. 
""" while self.is_running: + self.metrics_dispatcher.drain_to_consumers() counter_stats, gauge_stats, histogram_stats = ( - ucmmetrics.get_all_stats_and_clear() + self.metrics_dispatcher.get_stats_and_clear(MULTIPROC_CONSUMER) ) self.update_stats(counter_stats, gauge_stats, histogram_stats) time.sleep(self.log_interval) diff --git a/ucm/store/cache/cc/copy_stream.h b/ucm/store/cache/cc/copy_stream.h index 85e16f961..f661b508f 100644 --- a/ucm/store/cache/cc/copy_stream.h +++ b/ucm/store/cache/cc/copy_stream.h @@ -66,6 +66,11 @@ class CopyStream { streamIndex_ = (streamIndex_ + 1) % streamNumber_; return stream; } + Status AppendCallback(std::function cb) noexcept + { + if (streams_.empty()) [[unlikely]] { return Status::Error(); } + return streams_.front()->AppendCallback(std::move(cb)); + } Status WaitEvent(void* event) noexcept { auto status = Status::OK(); diff --git a/ucm/store/cache/cc/dump_queue.cc b/ucm/store/cache/cc/dump_queue.cc index eb5c84def..cbdf744a1 100644 --- a/ucm/store/cache/cc/dump_queue.cc +++ b/ucm/store/cache/cc/dump_queue.cc @@ -22,6 +22,9 @@ * SOFTWARE. 
* */ #include "dump_queue.h" +#include +#include +#include #include "logger/logger.h" #include "metrics_api.h" #include "thread/cpu_affinity.h" @@ -42,6 +45,8 @@ Status DumpQueue::Setup(const Config& config, TaskIdSet* failureSet, TransBuffer backend_ = config.storeBackend; deviceId_ = config.deviceId; tensorSizes_ = config.tensorSizes; + shardBytes_ = 0; + for (const auto size : tensorSizes_) { shardBytes_ += size; } streamNumber_ = config.streamNumber; useGdr_ = config.useGdr; cpuAffinityCores_ = config.cpuAffinityCores; @@ -94,13 +99,14 @@ void DumpQueue::DispatchOneTask(CopyStream& stream, TaskPair&& pair) Status DumpQueue::DumpOneTask(CopyStream& stream, TaskPtr task) { - auto tp = NowTime::Now(); + auto dumpStartTp = NowTime::Now(); Detail::TaskDesc backendTaskDesc; backendTaskDesc.brief = "Cache2Backend"; const auto nShard = task->desc.size(); UC_DEBUG("Try to dump ({}) shards.", nShard); DumpCtx dumpCtx; dumpCtx.taskHandle = task->id; + std::shared_ptr<std::atomic<double>> eventReadyTp; if (task->desc.prerequisiteHandle != 0) { auto s = stream.WaitEvent(reinterpret_cast<void*>(task->desc.prerequisiteHandle)); if (s.Failure()) [[unlikely]] { @@ -108,7 +114,13 @@ Status DumpQueue::DumpOneTask(CopyStream& stream, TaskPtr task) UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_d2h_errors_total"), 1.0); return s; } + eventReadyTp = std::make_shared<std::atomic<double>>(0.0); + auto cbStatus = stream.AppendCallback([eventReadyTp](bool) { + eventReadyTp->store(NowTime::Now(), std::memory_order_release); + }); + if (cbStatus.Failure()) [[unlikely]] { eventReadyTp.reset(); } } + size_t copiedShards = 0; for (size_t i = 0; i < nShard; i++) { auto& shard = task->desc[i]; auto handle = buffer_->Get(shard.owner, shard.index); @@ -121,6 +133,7 @@ Status DumpQueue::DumpOneTask(CopyStream& stream, TaskPtr task) UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_d2h_errors_total"), 1.0); return s; } + copiedShards++; } backendTaskDesc.push_back(Detail::Shard{shard.owner, shard.index, {handle.Data()}});
dumpCtx.bufferHandles.push_back(std::move(handle)); @@ -131,6 +144,7 @@ Status DumpQueue::DumpOneTask(CopyStream& stream, TaskPtr task) UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_dump_backend_shards_total"), static_cast<double>(backendTaskDesc.size())); if (backendTaskDesc.empty()) { return Status::OK(); } + auto tpSyncStart = NowTime::Now(); auto s = stream.Synchronize(); if (s.Failure()) [[unlikely]] { UC_ERROR("Failed({}) to sync on stream for task({}).", s, task->id); @@ -138,6 +152,7 @@ Status DumpQueue::DumpOneTask(CopyStream& stream, TaskPtr task) return s; } auto tpSyncStream = NowTime::Now(); + auto tpBackendSubmitStart = NowTime::Now(); for (auto& handle : dumpCtx.bufferHandles) { handle.MarkReady(); } auto res = backend_->Dump(std::move(backendTaskDesc)); if (!res) [[unlikely]] { @@ -148,15 +163,28 @@ Status DumpQueue::DumpOneTask(CopyStream& stream, TaskPtr task) dumpCtx.backendTaskHandle = res.Value(); dumping_.Push(std::move(dumpCtx)); auto tpEnd = NowTime::Now(); - UC_DEBUG("Cache task({}) mk_buf={:.3f}ms, sync={:.3f}ms, back={:.3f}ms.", task->id, - (tpMakeBuffer - tp) * 1e3, (tpSyncStream - tpMakeBuffer) * 1e3, - (tpEnd - tpSyncStream) * 1e3); + auto prereqWaitMs = 0.0; + auto d2hMs = std::max(0.0, tpSyncStream - tpSyncStart) * 1e3; + if (eventReadyTp) { + auto ready = eventReadyTp->load(std::memory_order_acquire); + if (ready > 0.0) { + prereqWaitMs = std::max(0.0, ready - dumpStartTp) * 1e3; + UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_dump_prereq_wait_ms"), prereqWaitMs); + } + } + if (copiedShards > 0 && d2hMs > 0.0) { + auto copiedBytes = static_cast<double>(copiedShards) * static_cast<double>(shardBytes_); + UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_d2h_duration_ms"), d2hMs); + UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_d2h_bandwidth_gbps"), + copiedBytes / (d2hMs * 1e-3) / 1e9); + } + UC_DEBUG("Cache task({}) mk_buf={:.3f}ms, prereq={:.3f}ms, d2h={:.3f}ms, back={:.3f}ms.", + task->id, (tpMakeBuffer - dumpStartTp) * 1e3, prereqWaitMs,
d2hMs, + (tpEnd - tpBackendSubmitStart) * 1e3); UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_dump_mkbuf_duration_ms"), - (tpMakeBuffer - tp) * 1e3); - UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_d2h_duration_ms"), - (tpSyncStream - tpMakeBuffer) * 1e3); + (tpMakeBuffer - dumpStartTp) * 1e3); UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_dump_backend_submit_duration_ms"), - (tpEnd - tpSyncStream) * 1e3); + (tpEnd - tpBackendSubmitStart) * 1e3); return Status::OK(); } @@ -186,7 +214,10 @@ void DumpQueue::BackendDumpStage() } dumping_.ConsumerLoop(stop_, [this](auto&& task) { if (task.backendTaskHandle > finishedBackendTaskHandle_) { + auto tpWait = NowTime::Now(); auto s = backend_->Wait(task.backendTaskHandle); + UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_dump_backend_wait_duration_ms"), + (NowTime::Now() - tpWait) * 1e3); finishedBackendTaskHandle_ = task.backendTaskHandle; if (s.Failure()) { UC_ERROR("Failed({}) to wait backend({}) for task({}).", s, task.backendTaskHandle, diff --git a/ucm/store/cache/cc/dump_queue.h b/ucm/store/cache/cc/dump_queue.h index 857e00906..ff85c7362 100644 --- a/ucm/store/cache/cc/dump_queue.h +++ b/ucm/store/cache/cc/dump_queue.h @@ -55,6 +55,7 @@ class DumpQueue { StoreV1* backend_{nullptr}; int32_t deviceId_{-1}; std::vector tensorSizes_{}; + size_t shardBytes_{0}; size_t streamNumber_{1}; bool useGdr_{false}; std::vector cpuAffinityCores_{}; diff --git a/ucm/store/cache/cc/load_queue.cc b/ucm/store/cache/cc/load_queue.cc index 54c9ef8ac..e72a313d2 100644 --- a/ucm/store/cache/cc/load_queue.cc +++ b/ucm/store/cache/cc/load_queue.cc @@ -42,6 +42,8 @@ Status LoadQueue::Setup(const Config& config, TaskIdSet* failureSet, TransBuffer backend_ = config.storeBackend; deviceId_ = config.deviceId; tensorSizes_ = config.tensorSizes; + shardBytes_ = 0; + for (const auto size : tensorSizes_) { shardBytes_ += size; } streamNumber_ = config.streamNumber; useGdr_ = config.useGdr; cpuAffinityCores_ = config.cpuAffinityCores; 
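Review note on the `cache_d2h_bandwidth_gbps` update in `DumpOneTask` above: the value is built from the copied shard count, the per-shard byte size summed in `Setup`, and the synchronize duration in milliseconds. As written it is decimal gigabytes per second (bytes divided by `1e9`), not gigabits, despite the `gbps` suffix. A minimal Python sketch of the same arithmetic follows; the function names, tensor sizes, and timings are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def shard_bytes(tensor_sizes: Sequence[int]) -> int:
    # Mirrors the loop in Setup: one shard carries every tensor once.
    return sum(tensor_sizes)


def bandwidth_gb_per_s(
    copied_shards: int, bytes_per_shard: int, duration_ms: float
) -> Optional[float]:
    # The C++ code skips the update when nothing was copied or the
    # measured duration is not positive; return None in that case.
    if copied_shards <= 0 or duration_ms <= 0.0:
        return None
    copied_bytes = float(copied_shards) * float(bytes_per_shard)
    # ms -> s, then bytes/s -> GB/s (decimal).
    return copied_bytes / (duration_ms * 1e-3) / 1e9


if __name__ == "__main__":
    per_shard = shard_bytes([65536, 65536])  # hypothetical K and V tensors
    print(per_shard, bandwidth_gb_per_s(64, per_shard, 2.0))
```

Because zero-duration samples are dropped rather than recorded, the histogram only reflects dumps where the synchronize call took measurable time.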
@@ -119,7 +121,7 @@ void LoadQueue::DispatchOneTask(TaskPair&& pair) (tpWait - tp) * 1e3, (tpDispatch - tpWait) * 1e3); UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_load_queue_wait_duration_ms"), (tpWait - tp) * 1e3); - UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_load_dispatch_duration_ms"), + UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_load_backend_submit_duration_ms"), (tpDispatch - tpWait) * 1e3); UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_load_backend_shards_total"), static_cast<double>(backendSubmitCount)); @@ -159,17 +161,27 @@ void LoadQueue::TransferOneTask(CopyStream& stream, ShardTask&& task) UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_h2d_errors_total"), 1.0); break; } + if (holder_.empty()) { h2dBatchStartTp_ = tpBackendReady; } auto tpH2dSubmitted = NowTime::Now(); UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_shard_backend_wait_ms"), (tpBackendReady - tpBackendWait) * 1e3); - UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_shard_h2d_ms"), + UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_h2d_submit_ms"), (tpH2dSubmitted - tpBackendReady) * 1e3); if (!task.waiter) { holder_.push_back(std::move(task)); return; } + const auto copiedShards = holder_.size() + 1; s = stream.Synchronize(); + auto h2dSyncMs = (NowTime::Now() - h2dBatchStartTp_) * 1e3; + UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_h2d_sync_ms"), h2dSyncMs); + if (copiedShards > 0 && h2dSyncMs > 0.0) { + auto copiedBytes = static_cast<double>(copiedShards) * static_cast<double>(shardBytes_); + UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_h2d_bandwidth_gbps"), + copiedBytes / (h2dSyncMs * 1e-3) / 1e9); + } holder_.clear(); + h2dBatchStartTp_ = 0.0; if (s.Failure()) [[unlikely]] { UC_ERROR("Failed({}) to sync on stream for task({}).", s, task.taskHandle); UC::Metrics::UpdateStats(NAME_TO_METRIC_ID("cache_h2d_errors_total"), 1.0); diff --git a/ucm/store/cache/cc/load_queue.h b/ucm/store/cache/cc/load_queue.h index 78aa00fd7..9d64b4b3d 100644 ---
a/ucm/store/cache/cc/load_queue.h +++ b/ucm/store/cache/cc/load_queue.h @@ -56,6 +56,7 @@ class LoadQueue { StoreV1* backend_{nullptr}; int32_t deviceId_{-1}; std::vector tensorSizes_{}; + size_t shardBytes_{0}; size_t streamNumber_{1}; bool useGdr_{false}; std::vector cpuAffinityCores_{}; @@ -64,6 +65,7 @@ class LoadQueue { std::thread dispatcher_; std::thread transfer_; std::vector holder_; + double h2dBatchStartTp_{0.0}; public: ~LoadQueue();
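Review note on `h2dBatchStartTp_` in `TransferOneTask` above: the H2D window seems to open at the backend-ready time of the first shard held for a batch and close after `Synchronize()` on the shard that carries the waiter. That would mean `cache_h2d_sync_ms` also covers the backend waits of the later shards in the same batch, so `cache_h2d_bandwidth_gbps` may understate raw copy throughput. The Python model below is a sketch of that bookkeeping only; the class name, the `is_last` flag standing in for `task.waiter`, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class H2dBatchWindow:
    bytes_per_shard: int
    held: int = 0             # stands in for holder_.size()
    batch_start: float = 0.0  # stands in for h2dBatchStartTp_ (seconds)

    def on_shard(
        self, backend_ready: float, is_last: bool, sync_done: float = 0.0
    ) -> Optional[dict]:
        # First shard of a batch opens the measurement window.
        if self.held == 0:
            self.batch_start = backend_ready
        if not is_last:
            self.held += 1
            return None
        # Last shard: the window closes once the stream sync returns.
        copied_shards = self.held + 1
        sync_ms = (sync_done - self.batch_start) * 1e3
        stats = {"cache_h2d_sync_ms": sync_ms}
        if sync_ms > 0.0:
            copied_bytes = float(copied_shards) * float(self.bytes_per_shard)
            stats["cache_h2d_bandwidth_gbps"] = copied_bytes / (sync_ms * 1e-3) / 1e9
        # Reset for the next batch, as the C++ code does after the sync.
        self.held = 0
        self.batch_start = 0.0
        return stats


if __name__ == "__main__":
    window = H2dBatchWindow(bytes_per_shard=1_000_000)
    window.on_shard(backend_ready=10.000, is_last=False)
    window.on_shard(backend_ready=10.001, is_last=False)
    print(window.on_shard(backend_ready=10.002, is_last=True, sync_done=10.004))
```

If a pure copy-time figure is wanted, one option would be to start the window at the first H2D submit instead, but that is a design choice for the authors rather than something this diff requires.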