
Commit 6f90147

[Feat] Support C++/Python to use same metrics singleton within a process (#654)
# Purpose

- This PR resolves the problem where Python-side and C++-side components within the same process fail to share the same metrics collection component.
- It also refactors the original structure of the Metrics component to simplify the implementation logic and improve maintainability.

# Modifications

Users can now add new metrics more easily, and a new doc shows how to add them.

# Test

Tested with H20-QwQ-32B-TP2, All Save and All Load:

<img width="845" height="279" alt="image" src="https://github.com/user-attachments/assets/ec27a761-ff49-4c7a-a4a0-6e4df0e8ceb5" />

The doc on adding new metrics:

<img width="1440" height="863" alt="image" src="https://github.com/user-attachments/assets/95c3e6b1-3173-436e-bf87-dd97dd18f0e5" />

Test with cmake: `UpdateStats` costs only about 1 us. `GetStatsAndClear` takes longer when there are many stats, so it runs on a separate thread and does not affect the main process.

<img width="963" height="165" alt="image" src="https://github.com/user-attachments/assets/636550f0-007d-4812-89ad-f9eff23c69ad" />

Performance tests: the cost stays below 1% in most cases. Larger block sizes lead to a smaller impact, which is even lower than the system's inherent fluctuations.

<img width="1058" height="731" alt="image" src="https://github.com/user-attachments/assets/84c48128-bc81-4235-81c5-769e7a589e1b" />
1 parent b1e95c6 commit 6f90147

21 files changed

Lines changed: 885 additions & 869 deletions


Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
# How to Add A New Metric

UCM lets developers add new metrics for monitoring service health. This doc describes how to add one.

## Getting Started

### Step 1: Define New Metrics in YAML

UCM wraps three Prometheus metric types: Counter, Gauge, and Histogram. Once a new metric is defined in YAML, the function below registers it with Prometheus automatically:
```python
def _register_metrics_by_type(self, metric_type):
    """
    Register metrics by different metric types.
    """
    metric_cls, default_kwargs = self.metric_type_config[metric_type]
    cfg_list = self.config.get(metric_type, [])

    for cfg in cfg_list:
        name = cfg.get("name")
        doc = cfg.get("documentation", "")
        # Prometheus metric name with prefix
        prometheus_name = f"{self.metric_prefix}{name}"
        ucmmetrics.create_stats(name, metric_type)

        metric_kwargs = {
            "name": prometheus_name,
            "documentation": doc,
            "labelnames": self.labelnames,
            **default_kwargs,
            **{k: v for k, v in cfg.items() if k in default_kwargs},
        }

        self.metric_mappings[name] = metric_cls(**metric_kwargs)
```
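
The keyword merge in the registration loop can be tried in isolation. The sketch below is not UCM code: `build_metric_kwargs`, the `histogram_defaults` values, and the `worker_id` label are hypothetical names chosen for illustration. It only reproduces the merge order shown above, where per-type defaults come first and a YAML entry may override only the keys that already exist in those defaults.

```python
from typing import Any, Dict, List


def build_metric_kwargs(
    cfg: Dict[str, Any],
    default_kwargs: Dict[str, Any],
    prefix: str,
    labelnames: List[str],
) -> Dict[str, Any]:
    """Mirror the merge order used by the registration loop (illustrative helper)."""
    return {
        "name": f"{prefix}{cfg['name']}",
        "documentation": cfg.get("documentation", ""),
        "labelnames": labelnames,
        **default_kwargs,
        # Only keys already present in default_kwargs can be overridden.
        **{k: v for k, v in cfg.items() if k in default_kwargs},
    }


# Hypothetical defaults for the histogram type.
histogram_defaults = {"buckets": [1, 10, 100]}

cfg = {
    "name": "load_requests_num",
    "documentation": "Number of requests loaded from ucm",
    "buckets": [1, 5, 10, 20, 50, 100, 200, 500, 1000],
    "unknown_option": True,  # dropped: not a key of the defaults
}

kwargs = build_metric_kwargs(cfg, histogram_defaults, "ucm:", ["worker_id"])
print(kwargs["name"])              # ucm:load_requests_num
print(kwargs["buckets"])           # the buckets from cfg, not the defaults
print("unknown_option" in kwargs)  # False
```

One consequence of this merge order is that a misspelled option in the YAML entry is silently ignored rather than rejected.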

Example YAML:

```yaml
# Prometheus Metrics Configuration
# This file defines which metrics should be enabled and their configurations
log_interval: 5 # Interval in seconds for logging metrics

multiproc_dir: "/vllm-workspace" # Directory for Prometheus multiprocess mode

metric_prefix: "ucm:"

histogram_max_length: 10000 # Maximum length of the vector for each histogram metric

# Counter metrics configuration
# counter:
#   - name: "received_requests"
#     documentation: "Total number of requests sent to ucm"

# Gauge metrics configuration
# gauge:
#   - name: "lookup_hit_rate"
#     documentation: "Hit rate of ucm lookup requests since last log"
#     multiprocess_mode: "livemostrecent"

# Histogram metrics configuration
histogram:
  - name: "load_requests_num"
    documentation: "Number of requests loaded from ucm"
    buckets: [1, 5, 10, 20, 50, 100, 200, 500, 1000]
  - name: "d2s_bandwidth"
    documentation: "Band width of uc store task d2s, copy tensors from device to storage"
    buckets: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
  - name: "s2d_bandwidth"
    documentation: "Band width of uc store task s2d, copy tensors from storage to device"
    buckets: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```
See the [example YAML](https://github.com/ModelEngine-Group/unified-cache-management/blob/develop/examples/metrics/metrics_configs.yaml) for more detail.
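
Before starting the service, it can help to sanity-check a new entry. The following is a minimal sketch and not part of UCM: `validate_metrics_config` is a hypothetical helper, and the config is written as a plain dict that mirrors the YAML layout above so the example needs no YAML parser. It flags entries without a name, duplicate names, and histogram buckets that are not strictly increasing.

```python
from typing import Any, Dict, List

METRIC_TYPES = ("counter", "gauge", "histogram")


def validate_metrics_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    seen = set()
    for metric_type in METRIC_TYPES:
        # A missing or empty section is treated as "no metrics of this type".
        for cfg in config.get(metric_type) or []:
            name = cfg.get("name")
            if not name:
                problems.append(f"{metric_type}: entry without a name")
                continue
            if name in seen:
                problems.append(f"{name}: duplicate metric name")
            seen.add(name)
            buckets = cfg.get("buckets")
            if metric_type == "histogram" and buckets is not None:
                if any(a >= b for a, b in zip(buckets, buckets[1:])):
                    problems.append(f"{name}: buckets must be strictly increasing")
    return problems


good = {
    "metric_prefix": "ucm:",
    "histogram": [
        {"name": "load_requests_num", "buckets": [1, 5, 10, 20, 50, 100]},
        {"name": "d2s_bandwidth", "buckets": [1, 2, 3, 4, 5]},
    ],
}
bad = {
    "histogram": [
        {"name": "d2s_bandwidth", "buckets": [1, 3, 2]},
        {"name": "d2s_bandwidth", "buckets": [1, 2, 3]},
    ],
}

print(validate_metrics_config(good))  # []
for problem in validate_metrics_config(bad):
    print(problem)
```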

### Step 2: Use Metrics APIs to Update Stats

After defining metrics in YAML, users only need to link the `metrics` library (C++) or import `ucmmetrics` (Python) and update the stats at a suitable point in their code. The `observability` component is responsible for fetching the stats.

:::::{tab-set}
:sync-group: install

::::{tab-item} Python side interfaces
:selected:
:sync: py

**Example:** Import `ucmmetrics`, then call `update_stats` to update the new metrics.
```python
# 1. Import ucmmetrics
from ucm.shared.metrics import ucmmetrics

# 2. Update a single stat
ucmmetrics.update_stats(
    {"interval_lookup_hit_rates": external_hit_blocks / len(ucm_block_ids)},
)

# 3. Update several stats in one call
ucmmetrics.update_stats(
    {
        "load_requests_total": num_loaded_request,
        "load_blocks_total": num_loaded_block,
        "load_duration": load_end_time - load_start_time,
        "load_speed": load_speed,
    }
)
```
See the [test cases](https://github.com/ModelEngine-Group/unified-cache-management/tree/develop/ucm/shared/test/example) for a more detailed example.
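
To make the update-then-fetch pattern concrete, here is a toy stand-in written in plain Python. It is not the UCM implementation: `StatsBuffer` and its methods are hypothetical, and the choice to keep every reported value in a list is an assumption. It only illustrates the idea of one process-wide collector that callers update cheaply and that a separate reader drains periodically.

```python
import threading
from collections import defaultdict
from typing import Dict, List


class StatsBuffer:
    """Toy process-wide stats collector (illustrative, not UCM code)."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[str, List[float]] = defaultdict(list)

    @classmethod
    def instance(cls) -> "StatsBuffer":
        # Every caller in the process receives the same object.
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def update_stats(self, stats: Dict[str, float]) -> None:
        # Cheap path: append under a short-lived lock.
        with self._lock:
            for name, value in stats.items():
                self._values[name].append(float(value))

    def get_stats_and_clear(self) -> Dict[str, List[float]]:
        # Drain path: hand back everything collected so far and start fresh.
        with self._lock:
            drained = dict(self._values)
            self._values = defaultdict(list)
            return drained


buf = StatsBuffer.instance()
buf.update_stats({"load_duration": 12.5, "load_speed": 3.2})
buf.update_stats({"load_duration": 8.0})

snapshot = StatsBuffer.instance().get_stats_and_clear()
print(snapshot)  # {'load_duration': [12.5, 8.0], 'load_speed': [3.2]}
print(StatsBuffer.instance().get_stats_and_clear())  # {}
```

In this sketch the drain swaps in a fresh container instead of copying values one by one, which keeps the time spent holding the lock short for the threads that call `update_stats`.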

::::

::::{tab-item} C++ side interfaces
:sync: cc

**Example:** UCM supports custom metrics through the following steps:
- Step 1: Link the static library `metrics`
  ```cmake
  target_link_libraries(xxxstore PUBLIC storeinfra metrics)
  ```
- Step 2: Update stats with the **UpdateStats** function
  ```c++
  // 1. Include the metrics API header file
  #include "metrics_api.h"

  // 2. Update metrics defined in yaml
  auto Epilog(const size_t ioSize) const noexcept
  {
      auto total = ioSize * number_;
      auto costs = NowTp() - startTp;
      auto bw = double(total) / costs / 1e9;
      switch (type)
      {
          case Type::DUMP:
              UC::Metrics::UpdateStats("d2s_bandwidth", bw);
              break;
          case Type::LOAD:
              UC::Metrics::UpdateStats("s2d_bandwidth", bw);
              break;
          default:
              break;
      }
      return fmt::format("Task({},{},{},{}) finished, costs={:.06f}s, bw={:.06f}GB/s.", id,
                         brief_, number_, total, costs, bw);
  }
  ```
See the [test cases](https://github.com/ModelEngine-Group/unified-cache-management/tree/develop/ucm/shared/test/case) for a more detailed example.
::::
:::::
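
As a worked example of the bandwidth value reported in the C++ snippet, the sketch below repeats the same arithmetic (bytes divided by seconds, divided by 1e9 to get GB/s) and then looks up the smallest configured histogram bucket that contains the result. The I/O size, I/O count, and duration are made-up inputs, and `bandwidth_gb_per_s` and `bucket_upper_bound` are hypothetical helper names.

```python
from bisect import bisect_left
from typing import List


def bandwidth_gb_per_s(io_size_bytes: int, num_ios: int, seconds: float) -> float:
    """Same formula as the C++ snippet: total bytes / elapsed seconds / 1e9."""
    return io_size_bytes * num_ios / seconds / 1e9


def bucket_upper_bound(value: float, buckets: List[float]) -> float:
    """Smallest bucket bound that is >= value, or +inf if none is."""
    i = bisect_left(buckets, value)
    return buckets[i] if i < len(buckets) else float("inf")


# Buckets taken from the d2s_bandwidth / s2d_bandwidth entries in the example YAML.
buckets = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

# Made-up task: 512 I/Os of 4 MiB each, finished in 0.25 s.
bw = bandwidth_gb_per_s(4 * 1024 * 1024, 512, 0.25)
print(f"{bw:.2f} GB/s")                 # 8.59 GB/s
print(bucket_upper_bound(bw, buckets))  # 9
```

A value above the largest configured bound would land only in the implicit `+Inf` bucket, so the bucket list should cover the bandwidth range expected on the target hardware.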

## How to See New Metrics

After completing the two steps above, developers can view the newly added metrics at the `/metrics` endpoint.

Developers can also add a new panel in grafana.json to display the newly added metrics. Refer to the [grafana example](https://github.com/ModelEngine-Group/unified-cache-management/tree/main/examples/metrics) for more information.
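
A quick way to confirm that a new metric is exposed is to fetch the endpoint and keep only the lines that carry the configured prefix. The sketch below assumes the endpoint serves the Prometheus text format; `fetch_metrics` and `filter_metric_lines` are hypothetical helpers, the URL passed to `fetch_metrics` depends on the deployment, and the sample text is made up for illustration.

```python
from typing import List
from urllib.request import urlopen


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the raw text served by a /metrics endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def filter_metric_lines(exposition_text: str, prefix: str = "ucm:") -> List[str]:
    """Keep sample lines whose metric name starts with the prefix."""
    kept = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are HELP/TYPE comments, not samples.
        if line and not line.startswith("#") and line.startswith(prefix):
            kept.append(line)
    return kept


# Made-up sample in Prometheus text format, used instead of a live endpoint.
sample = """\
# HELP ucm:load_requests_num Number of requests loaded from ucm
# TYPE ucm:load_requests_num histogram
ucm:load_requests_num_bucket{le="1.0"} 2.0
ucm:load_requests_num_count 5.0
process_cpu_seconds_total 1.5
"""

for line in filter_metric_lines(sample):
    print(line)
```

Against a running service, the same filter can be applied to the result of `fetch_metrics("http://<host>:<port>/metrics")`, with host and port filled in for the deployment.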

docs/source/index.md

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ user-guide/rerope/rerope
6565
:caption: Developer Guide
6666
:maxdepth: 1
6767
developer-guide/contribute
68+
developer-guide/add_metrics
6869
:::
6970

7071
:::{toctree}

docs/source/user-guide/metrics/metrics.md

Lines changed: 11 additions & 16 deletions
@@ -14,7 +14,9 @@ First, set the `PROMETHEUS_MULTIPROC_DIR` environment variable.
1414
export PROMETHEUS_MULTIPROC_DIR=/vllm-workspace
1515
```
1616

17-
Then, start the UCM service.
17+
Then, uncomment `metrics_config_path` in ucm's config.yaml. This path specifies which metrics need to be collected.
18+
19+
After completing the two steps above, you can start the service to collect metrics.
1820

1921
```bash
2022
export CUDA_VISIBLE_DEVICES=0
@@ -40,7 +42,6 @@ vllm serve /home/models/Qwen2.5-14B-Instruct \
4042
}
4143
}'
4244
```
43-
**Note**: You can refer to the `ucm_config.yaml` file at https://github.com/ModelEngine-Group/unified-cache-management/tree/develop/examples to configure the `metrics_config_path` parameter.
4445

4546
You can use the `vllm bench serve` command to run benchmarks:
4647

@@ -173,20 +174,14 @@ Metrics configuration is defined in the `unified-cache-management/examples/metri
173174
```yaml
174175
log_interval: 5 # Interval in seconds for logging metrics
175176
176-
prometheus:
177-
multiproc_dir: "/vllm-workspace" # Prometheus directory
178-
metric_prefix: "ucm:" # Metric name prefix
179-
180-
enabled_metrics:
181-
counters: true
182-
gauges: true
183-
histograms: true
184-
185-
histograms:
186-
- name: "load_requests_num"
187-
documentation: "Number of requests loaded from ucm"
188-
buckets: [1, 5, 10, 20, 50, 100, 200, 500, 1000]
189-
# ... other metric configurations
177+
multiproc_dir: "/vllm-workspace" # Prometheus directory
178+
metric_prefix: "ucm:" # Metric name prefix
179+
180+
histograms:
181+
- name: "load_requests_num"
182+
documentation: "Number of requests loaded from ucm"
183+
buckets: [1, 5, 10, 20, 50, 100, 200, 500, 1000]
184+
# ... other metric configurations
190185
```
191186

192187
---

examples/metrics/metrics_configs.yaml

Lines changed: 44 additions & 50 deletions
@@ -2,55 +2,49 @@
22
# This file defines which metrics should be enabled and their configurations
33
log_interval: 5 # Interval in seconds for logging metrics
44

5-
prometheus:
6-
multiproc_dir: "/vllm-workspace" # Directory for Prometheus multiprocess mode
5+
multiproc_dir: "/vllm-workspace" # Directory for Prometheus multiprocess mode
76

8-
metric_prefix: "ucm:"
9-
10-
# Enable/disable metrics by category
11-
enabled_metrics:
12-
counters: true
13-
gauges: true
14-
histograms: true
15-
16-
# Counter metrics configuration
17-
# counters:
18-
# - name: "received_requests"
19-
# documentation: "Total number of requests sent to ucm"
20-
21-
# Gauge metrics configuration
22-
# gauges:
23-
# - name: "lookup_hit_rate"
24-
# documentation: "Hit rate of ucm lookup requests since last log"
25-
# multiprocess_mode: "livemostrecent"
26-
27-
# Histogram metrics configuration
28-
histograms:
29-
- name: "load_requests_num"
30-
documentation: "Number of requests loaded from ucm"
31-
buckets: [1, 5, 10, 20, 50, 100, 200, 500, 1000]
32-
- name: "load_blocks_num"
33-
documentation: "Number of blocks loaded from ucm"
34-
buckets: [0, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000]
35-
- name: "load_duration"
36-
documentation: "Time to load from ucm (ms)"
37-
buckets: [0, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000]
38-
- name: "load_speed"
39-
documentation: "Speed of loading from ucm (GB/s)"
40-
buckets: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100]
41-
- name: "save_requests_num"
42-
documentation: "Number of requests saved to ucm"
43-
buckets: [1, 5, 10, 20, 50, 100, 200, 500, 1000]
44-
- name: "save_blocks_num"
45-
documentation: "Number of blocks saved to ucm"
46-
buckets: [0, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000]
47-
- name: "save_duration"
48-
documentation: "Time to save to ucm (ms)"
49-
buckets: [0, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000]
50-
- name: "save_speed"
51-
documentation: "Speed of saving to ucm (GB/s)"
52-
buckets: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100]
53-
- name: "interval_lookup_hit_rates"
54-
documentation: "Hit rates of ucm lookup requests"
55-
buckets: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
7+
metric_prefix: "ucm:"
568

9+
histogram_max_length: 10000 # Maximum length of the vector for each histogram metric
10+
11+
# Counter metrics configuration
12+
# counter:
13+
# - name: "received_requests"
14+
# documentation: "Total number of requests sent to ucm"
15+
16+
# Gauge metrics configuration
17+
# gauge:
18+
# - name: "lookup_hit_rate"
19+
# documentation: "Hit rate of ucm lookup requests since last log"
20+
# multiprocess_mode: "livemostrecent"
21+
22+
# Histogram metrics configuration
23+
histogram:
24+
- name: "load_requests_num"
25+
documentation: "Number of requests loaded from ucm"
26+
buckets: [1, 5, 10, 20, 50, 100, 200, 500, 1000]
27+
- name: "load_blocks_num"
28+
documentation: "Number of blocks loaded from ucm"
29+
buckets: [0, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000]
30+
- name: "load_duration"
31+
documentation: "Time to load from ucm (ms)"
32+
buckets: [0, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000]
33+
- name: "load_speed"
34+
documentation: "Speed of loading from ucm (GB/s)"
35+
buckets: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100]
36+
- name: "save_requests_num"
37+
documentation: "Number of requests saved to ucm"
38+
buckets: [1, 5, 10, 20, 50, 100, 200, 500, 1000]
39+
- name: "save_blocks_num"
40+
documentation: "Number of blocks saved to ucm"
41+
buckets: [0, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000]
42+
- name: "save_duration"
43+
documentation: "Time to save to ucm (ms)"
44+
buckets: [0, 50, 100, 150, 200, 250, 300, 350, 400, 550, 600, 750, 800, 850, 900, 950, 1000]
45+
- name: "save_speed"
46+
documentation: "Speed of saving to ucm (GB/s)"
47+
buckets: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100]
48+
- name: "interval_lookup_hit_rates"
49+
documentation: "Hit rates of ucm lookup requests"
50+
buckets: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
