docs/specs/om/open_metrics_spec_2_0.md
41 additions & 33 deletions
Original file line number
Diff line number
Diff line change
@@ -1,33 +1,35 @@
1
1
---
2
-
title: OpenMetrics 2.0
3
-
sort_rank: 3
2
+
title: "OpenMetrics 2.0"
4
3
nav_title: "2.0"
4
+
sort_rank: 3
5
+
5
6
hide_in_nav: true
7
+
6
8
author:
7
-
- email: arthursens2005@gmail.com
8
-
ins: A. Silva Sens
9
-
name: Arthur Silva Sens
10
-
organization: Grafana Labs
11
-
- email: bwplotka@gmail.com
12
-
ins: B. Płotka
13
-
name: Bartłomiej Płotka
14
-
organization: Google
15
-
- email: dashpole@google.com
16
-
ins: D. Ashpole
17
-
name: David Ashpole
18
-
organization: Google
19
-
- email: krajo@prometheus.io
20
-
ins: G. Krajcsovits
21
-
name: György Krajcsovits
22
-
organization: Grafana Labs
23
-
- email: owen.williams@grafana.com
24
-
ins: O. Williams
25
-
name: Owen Williams
26
-
organization: Grafana Labs
27
-
- email: richih@richih.org
28
-
ins: R. Hartmann
29
-
name: Richard Hartmann
30
-
organization: Grafana Labs
9
+
- ins: A. Silva Sens
10
+
name: Arthur Silva Sens
11
+
organization: Grafana Labs
12
+
email: arthursens2005@gmail.com
13
+
- ins: B. Płotka
14
+
name: Bartłomiej Płotka
15
+
organization: Google
16
+
email: bwplotka@gmail.com
17
+
- ins: D. Ashpole
18
+
name: David Ashpole
19
+
organization: Google
20
+
email: dashpole@google.com
21
+
- ins: G. Krajcsovits
22
+
name: György Krajcsovits
23
+
organization: Grafana Labs
24
+
email: krajo@prometheus.io
25
+
- ins: O. Williams
26
+
name: Owen Williams
27
+
organization: Grafana Labs
28
+
email: owen.williams@grafana.com
29
+
- ins: R. Hartmann
30
+
name: Richard Hartmann
31
+
organization: Grafana Labs
32
+
email: richih@richih.org
31
33
---
32
34
33
35
- Version: 2.0.0-rc0
@@ -66,7 +68,7 @@ Common examples of metric time series would be network interface counters, devic
66
68
67
69
## Data Model
68
70
69
-
This section MUST be read together with the ABNF section. In case of disagreements between the two, the ABNF's restrictions MUST take precedence. This reduces repetition as the text wire format MUST be supported.
71
+
This section MUST be read together with the ABNF section. In case of disagreements between the two, the ABNF's restrictions MUST take precedence. This reduces repetition as the text wire format MUST be supported.
70
72
71
73
### Data Types
72
74
@@ -157,7 +159,7 @@ MetricFamily name:
157
159
* MUST be the same as every MetricPoint's MetricName in the family.
> for MetricName and matching MetricFamily names without such suffixes. To improve parser reliability (i.e. matching
162
+
> for MetricName and matching MetricFamily names without such suffixes. To improve parser reliability (i.e. matching
161
163
> [MetricFamily metadata](#metricfamily-metadata)) and future compatibility, this specification requires MetricFamily name to strictly match MetricNames
162
164
> in the same family.
163
165
@@ -676,11 +678,12 @@ Timestamps SHOULD NOT use exponential float rendering for timestamps if nanoseco
676
678
677
679
There MUST NOT be an explicit separator between MetricFamilies. The next MetricFamily MUST be signalled with either metadata or a new sample metric name which cannot be part of the previous MetricFamily.
678
680
681
+
679
682
MetricFamilies MUST NOT be interleaved.
680
683
681
684
#### MetricFamily metadata
682
685
683
-
There are four pieces of metadata: The MetricFamily name, TYPE, UNIT and HELP. An example of the metadata for a counter Metric called foo is:
686
+
There are four pieces of metadata: The MetricFamily name, TYPE, UNIT and HELP. An example of the metadata for a counter Metric called foo is:
684
687
685
688
```openmetrics-add-eof
686
689
# TYPE foo counter
@@ -833,7 +836,7 @@ An example of a MetricFamily with no Metrics:
833
836
# TYPE foo gauge
834
837
```
835
838
836
-
An example with a Metric with a label and a MetricPoint with a timestamp:
839
+
An example with a Metric with a label and a MetricPoint with a timestamp:
837
840
838
841
```openmetrics-add-eof
839
842
# TYPE foo gauge
@@ -1245,13 +1248,16 @@ After namespacing by company or organisation, namespacing and naming should cont
1245
1248
1246
1249
For a common very well known existing piece of software, the name of the software itself may be sufficiently distinguishing. For example bind_ is probably sufficient for the DNS software, even though isc_bind_ would be the more usual naming.
1247
1250
1251
+
1248
1252
Metric names prefixed by scrape_ are used by ingestors to attach information related to individual expositions, so should not be exposed by applications directly. Metrics that have already been consumed and passed through a general purpose monitoring system may include such metric names on subsequent expositions.
1249
1253
If an exposer wishes to provide information about an individual exposition, a metric prefix such as myexposer_scrape_ may be used. A common example is a gauge myexposer_scrape_duration_seconds for how long that exposition took from the exposer's standpoint.
1250
1254
1251
1255
Within the Prometheus ecosystem a set of per-process metrics has emerged that are consistent across all implementations, prefixed with process_. For example for open file ulimits the MetricFamilies process_open_fds and process_max_fds gauges provide both the current and maximum value. (These names are legacy; if such metrics were defined today they would more likely be called process_fds_open and process_fds_limit.) In general it is very challenging to get names with identical semantics like this, which is why different instrumentation should use different names.
1252
1256
1257
+
1253
1258
Avoid redundancy in metric names. Avoid substrings like "metric", "timer", "stats", "counter", "total", "float64" and so on - by virtue of being a metric with a given type (and possibly unit) exposed via OpenMetrics information like this is already implied so should not be included explicitly. You should not include label names of a metric in the metric name for the same reasons, and in addition subsequent aggregation of the metric by a monitoring system could make such information incorrect.
1254
1259
1260
+
1255
1261
Avoid including implementation details from other layers of your monitoring system in the metric names contained in your instrumentation. For example a MetricFamily name should not contain the string "openmetrics" merely because it happens to be currently exposed via OpenMetrics somewhere, or "prometheus" merely because your current monitoring system is Prometheus.
1256
1262
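The naming guidance above could be checked mechanically. The following is a minimal Python sketch of such a check; the function `naming_warnings`, the word sets, and the warning strings are hypothetical and not part of OpenMetrics. It is a heuristic only: it inspects underscore-separated words and cannot tell whether a name such as prometheus_ is a legitimate namespace.

```python
from typing import Iterable, List

# Words the guidance above calls out as redundant in a metric name.
REDUNDANT_WORDS = {"metric", "timer", "stats", "counter", "total", "float64"}
# Names of monitoring layers; flagged only as a hint, since software such as
# Prometheus itself may legitimately use its own name as a namespace.
LAYER_WORDS = {"openmetrics", "prometheus"}


def naming_warnings(family_name: str, label_names: Iterable[str] = ()) -> List[str]:
    """Return human-readable hints for a MetricFamily name (heuristic only)."""
    words = family_name.lower().split("_")
    warnings = []
    for word in words:
        if word in REDUNDANT_WORDS:
            warnings.append(f"redundant word: {word}")
        elif word in LAYER_WORDS:
            warnings.append(f"monitoring layer in name: {word}")
    # A label name repeated in the metric name can become incorrect once the
    # metric is aggregated over that label.
    for label in label_names:
        if label.lower() in words:
            warnings.append(f"label name repeated in metric name: {label}")
    return warnings
```

For instance, `naming_warnings("http_requests_counter", ["method"])` reports the word `counter`, while `naming_warnings("process_open_fds")` reports nothing.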
1257
1263
### Label Namespacing
@@ -1291,7 +1297,7 @@ OpenMetrics builds on the existing widely adopted Prometheus text exposition for
1291
1297
1292
1298
Metadata can come from different sources. Over the years, two main sources have emerged. While they are often functionally the same, it helps in understanding to talk about their conceptual differences.
1293
1299
1294
-
"Target metadata" is metadata commonly external to an exposer. Common examples would be data coming from service discovery, a CMDB, or similar, like information about a datacenter region, if a service is part of a particular deployment, or production or testing. This can be achieved by either the exposer or the ingestor adding labels to all Metrics that capture this metadata. Doing this through the ingestor is preferred as it is more flexible and carries less overhead. On flexibility, the hardware maintenance team might care about which server rack a machine is located in, whereas the database team using that same machine might care that it contains replica number 2 of the production database. On overhead, hardcoding or configuring this information needs an additional distribution path.
1300
+
"Target metadata" is metadata commonly external to an exposer. Common examples would be data coming from service discovery, a CMDB, or similar, like information about a datacenter region, if a service is part of a particular deployment, or production or testing. This can be achieved by either the exposer or the ingestor adding labels to all Metrics that capture this metadata. Doing this through the ingestor is preferred as it is more flexible and carries less overhead. On flexibility, the hardware maintenance team might care about which server rack a machine is located in, whereas the database team using that same machine might care that it contains replica number 2 of the production database. On overhead, hardcoding or configuring this information needs an additional distribution path.
1295
1301
1296
1302
"Exposer metadata" is coming from within an exposer. Common examples would be software version, compiler version, or Git commit SHA.
1297
1303
@@ -1322,7 +1328,7 @@ The above discussion is in the context of individual exposers. An exposition fro
1322
1328
1323
1329
### Client Calculations and Derived Metrics
1324
1330
1325
-
Exposers should leave any math or calculation up to ingestors. A notable exception is the Summary quantile which is unfortunately required for backwards compatibility. Exposition should be of raw values which are useful over arbitrary time periods.
1331
+
Exposers should leave any math or calculation up to ingestors. A notable exception is the Summary quantile which is unfortunately required for backwards compatibility. Exposition should be of raw values which are useful over arbitrary time periods.
1326
1332
1327
1333
As an example, you should not expose a gauge with the average rate of increase of a counter over the last 5 minutes. Letting the ingestor calculate the increase over the data points they have consumed across expositions has better mathematical properties and is more resilient to scrape failures.
1328
1334
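To illustrate why raw counter values are preferable, here is a minimal Python sketch of an ingestor-side calculation over scraped samples. The names `counter_increase` and `average_rate` and the `(timestamp, value)` sample layout are assumptions made for illustration, not part of OpenMetrics. The reset handling is a simplified heuristic that treats any decrease as a counter reset.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value); hypothetical layout


def counter_increase(samples: Sequence[Sample]) -> float:
    """Sum the increase across consecutive counter samples.

    A decrease between two samples is treated as a counter reset, in which
    case the new value itself is counted as the increase since the reset.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase


def average_rate(samples: Sequence[Sample]) -> float:
    """Average per-second rate over the window covered by the samples."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0
```

Because the calculation works on whatever samples were actually ingested, a missed scrape only widens the gap between two samples; the total increase is still accounted for. A precomputed 5-minute average exposed as a gauge would lose that property.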
@@ -1369,6 +1375,7 @@ As per the parent section, ingestors should be free to attach their own timestam
1369
1375
my_counter_total 1 123
1370
1376
```
1371
1377
1378
+
1372
1379
In case the specific time of the last change of a counter matters, this would be the correct way:
By putting the timestamp of last change into its own Gauge as a value, ingestors are free to attach their own timestamp to both Metrics.
1385
1392
1393
+
1386
1394
Experience has shown that exposing absolute timestamps (epoch is considered absolute here) is more robust than time elapsed, seconds since, or similar. In either case, they would be gauges. For example:
1387
1395
1388
1396
```
@@ -1430,7 +1438,7 @@ Specific limits run the risk of preventing reasonable use cases, for example whi
1430
1438
1431
1439
On the other hand, an exposition which is too large in some dimension could cause significant performance problems compared to the benefit of the metrics exposed. Thus some guidelines on the size of any single exposition would be useful.
1432
1440
1433
-
Ingestors may choose to impose limits themselves, in particular to prevent attacks or outages. Still, ingestors need to consider reasonable use cases and try not to disproportionately impact them. If any single value/metric/exposition exceeds such limits then the whole exposition must be rejected.
1441
+
Ingestors may choose to impose limits themselves, in particular to prevent attacks or outages. Still, ingestors need to consider reasonable use cases and try not to disproportionately impact them. If any single value/metric/exposition exceeds such limits then the whole exposition must be rejected.
1434
1442
1435
1443
In general there are three things which impact the performance of a general purpose monitoring system ingesting time series data: the number of unique time series, the number of samples over time in those series, and the number of unique strings such as metric names, label names, label values, and HELP. Ingestors can control how often they ingest, so that aspect does not need further consideration.
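As a rough illustration of ingestor-side limits, the following Python sketch rejects a whole exposition as soon as any single element exceeds a limit. The `Limits` values, the `Series` shape, and the names `check_exposition` and `ExpositionRejected` are assumptions for this sketch. The specification does not prescribe specific limits, and a real ingestor would apply such checks while parsing.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Tuple

# One parsed sample: (metric name, labels). Hypothetical shape for this sketch.
Series = Tuple[str, Mapping[str, str]]


@dataclass(frozen=True)
class Limits:
    max_series: int = 10_000        # assumed value, not from the specification
    max_string_length: int = 1_024  # assumed value, not from the specification


class ExpositionRejected(Exception):
    """Raised when any single element of an exposition exceeds a limit."""


def check_exposition(series: Iterable[Series], limits: Limits) -> int:
    """Return the number of unique series, or raise to reject the whole exposition."""
    seen = set()
    for name, labels in series:
        strings = [name, *labels.keys(), *labels.values()]
        if any(len(s) > limits.max_string_length for s in strings):
            raise ExpositionRejected("string exceeds max_string_length")
        # Label order does not matter for series identity, so sort the pairs.
        seen.add((name, tuple(sorted(labels.items()))))
        if len(seen) > limits.max_series:
            raise ExpositionRejected("too many unique series")
    return len(seen)
```

Raising an exception instead of dropping the offending series mirrors the requirement that the whole exposition be rejected, so a partially ingested exposition is never stored.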