| title | Integrate TiDB Cloud with Prometheus and Grafana (Preview) |
|---|---|
| summary | Learn how to monitor your TiDB Cloud instances with the Prometheus and Grafana integration. |
TiDB Cloud provides a Prometheus API endpoint. If you have a Prometheus service, you can monitor key metrics of TiDB Cloud from the endpoint easily.
This document describes how to configure your Prometheus service to read key metrics from the {{{ .essential }}} or {{{ .premium }}} endpoint and how to view the metrics using Grafana.
- To integrate TiDB Cloud with Prometheus, you must have a self-hosted or managed Prometheus service.
- To set up third-party metrics integration for TiDB Cloud, you must have `Organization Owner` or `Instance Manager` access in TiDB Cloud. To view the integration page, you need at least the `Project Viewer` or `Instance Viewer` role to access the target {{{ .essential }}} or {{{ .premium }}} instance under your organization in TiDB Cloud.
- Prometheus and Grafana integrations are not available for TiDB Cloud Starter instances.
- Prometheus and Grafana integrations are not available when the status of your {{{ .essential }}} or {{{ .premium }}} instance is CREATING, RESTORING, PAUSED, or RESUMING.
Before configuring your Prometheus service to read metrics of TiDB Cloud, you need to generate a `scrape_config` YAML file in TiDB Cloud first. The `scrape_config` file contains a unique bearer token that allows the Prometheus service to monitor your target {{{ .essential }}} or {{{ .premium }}} instance.
For {{{ .essential }}} instances:

- In the TiDB Cloud console, navigate to the My TiDB page, and then click the name of your target {{{ .essential }}} instance to go to its overview page.
- In the left navigation pane, click Integrations > Integration to Prometheus(Preview).
- Click Add File to generate and show the `scrape_config` file for the current {{{ .essential }}} instance.
- Make a copy of the `scrape_config` file content for later use.

For {{{ .premium }}} instances:

- In the TiDB Cloud console, navigate to the My TiDB page, and then click the name of your target {{{ .premium }}} instance to go to its overview page.
- In the left navigation pane, click Settings > Integrations > Integration to Prometheus(Preview).
- Click Add File to generate and show the `scrape_config` file for the current {{{ .premium }}} instance.
- Make a copy of the `scrape_config` file content for later use.
Note:

- For security reasons, TiDB Cloud only shows a newly generated `scrape_config` file once. Ensure that you copy the content before closing the file window.
- If you forget, delete the `scrape_config` file in TiDB Cloud and generate a new one. To delete a `scrape_config` file, select the file, click ..., and then click Delete.
- In the monitoring directory specified by your Prometheus service, locate the Prometheus configuration file.

  For example, `/etc/prometheus/prometheus.yml`.
- In the Prometheus configuration file, locate the `scrape_configs` section, and then copy the `scrape_config` file content obtained from TiDB Cloud to the section.
- In your Prometheus service, check Status > Targets to verify that the new `scrape_config` file has been read. If not, you might need to restart the Prometheus service.
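
The target check in the last step can also be scripted. The following is a minimal sketch that reads the Prometheus HTTP API endpoint `/api/v1/targets` and reports the health of matching scrape jobs. The server address `http://localhost:9090` and the job name prefix `tidbcloud` are assumptions for illustration; use the address of your own Prometheus service and the `job_name` that appears in your `scrape_config` file.

```python
import json
from urllib.request import urlopen


def fetch_targets(prometheus_url: str = "http://localhost:9090") -> dict:
    """Fetch the current scrape targets from the Prometheus HTTP API."""
    with urlopen(f"{prometheus_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def summarize_targets(payload: dict, job_prefix: str = "tidbcloud") -> dict:
    """Return {job name: health} for active targets whose job name starts with job_prefix."""
    summary = {}
    for target in payload.get("data", {}).get("activeTargets", []):
        job = target.get("labels", {}).get("job", "")
        if job.startswith(job_prefix):
            summary[job] = target.get("health", "unknown")
    return summary


# Offline sample shaped like a /api/v1/targets response; all values are made up.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "tidbcloud-example"}, "health": "up"},
            {"labels": {"job": "node"}, "health": "down"},
        ]
    },
}
print(summarize_targets(sample))
# Against a live server: summarize_targets(fetch_targets())
```

If the job is missing from the result or its health is not `up`, the new `scrape_config` content has probably not been loaded yet.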
After your Prometheus service reads metrics from TiDB Cloud, you can use Grafana GUI dashboards to visualize the metrics as follows:
- Download the Grafana dashboard JSON file for {{{ .essential }}} or {{{ .premium }}} from the following link:
- Import this JSON to your own Grafana GUI to visualize the metrics.

  Note:

  If you are already using Prometheus and Grafana to monitor {{{ .essential }}} or {{{ .premium }}} instances and want to incorporate the newly available metrics, it is recommended that you create a new dashboard instead of directly updating the JSON of the existing one.
- (Optional) Customize the dashboard as needed by adding or removing panels, changing data sources, and modifying display options.
For more information about how to use Grafana, see Grafana documentation.
To improve data security, periodically rotate the bearer tokens of your `scrape_config` files.

- Generate a new `scrape_config` file for Prometheus in TiDB Cloud, as described earlier in this document.
- Add the content of the new file to your Prometheus configuration file.
- Once you confirm that your Prometheus service can read from TiDB Cloud using the new file, remove the content of the old `scrape_config` file from your Prometheus configuration file.
- On the Integrations page of your {{{ .essential }}} or {{{ .premium }}} instance, delete the corresponding old `scrape_config` file to block anyone else from using it to read from the TiDB Cloud Prometheus endpoint.
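
The ordering of the rotation steps matters: the old entry is removed only after the new one is confirmed to work. The following sketch illustrates that ordering on a Prometheus configuration that has already been parsed into a Python dictionary. The function name `remove_old_job` and the job names `tidbcloud-old` and `tidbcloud-new` are hypothetical, and each job is reduced to its `job_name` field for brevity.

```python
import copy


def remove_old_job(config: dict, old_job: str, new_job: str, healthy_jobs: set) -> dict:
    """Return a copy of config without old_job, refusing if new_job is not healthy yet."""
    if new_job not in healthy_jobs:
        raise RuntimeError(f"{new_job} is not healthy yet; keep {old_job} for now")
    updated = copy.deepcopy(config)
    updated["scrape_configs"] = [
        job for job in updated.get("scrape_configs", [])
        if job.get("job_name") != old_job
    ]
    return updated


# Hypothetical configuration that holds both the old and the new job during rotation.
config = {"scrape_configs": [{"job_name": "tidbcloud-old"}, {"job_name": "tidbcloud-new"}]}

# healthy_jobs would come from a target health check in your Prometheus service.
rotated = remove_old_job(config, "tidbcloud-old", "tidbcloud-new", {"tidbcloud-new"})
print([job["job_name"] for job in rotated["scrape_configs"]])
```

The function returns a modified copy and leaves the input untouched, so the original configuration stays available if the rotation needs to be rolled back.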
Prometheus tracks the following metric data for your {{{ .essential }}} or {{{ .premium }}} instance.

Note:

{{{ .essential }}} does not support TiCDC components, so the `tidbcloud_changefeed_*` metrics are currently not available.

For {{{ .essential }}} instances:

| Metric name | Metric type | Labels | Description |
|---|---|---|---|
| `tidbcloud_db_total_connection` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of current connections in your TiDB server |
| `tidbcloud_db_active_connections` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of active connections |
| `tidbcloud_db_disconnections` | gauge | `result: Error\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of clients disconnected by connection result |
| `tidbcloud_db_database_time` | gauge | `sql_type: Select\|Insert\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | A time model statistic that represents the sum of all processes' CPU consumption plus the sum of non-idle wait time |
| `tidbcloud_db_query_per_second` | gauge | `type: Select\|Insert\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of SQL statements executed per second, counted according to statement types |
| `tidbcloud_db_failed_queries` | gauge | `type: planner:xxx\|executor:2345\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The statistics of error types (for example, syntax errors and primary key conflicts) that occur per second when executing SQL statements |
| `tidbcloud_db_command_per_second` | gauge | `type: Query\|Ping\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of commands processed by TiDB per second |
| `tidbcloud_db_queries_using_plan_cache_ops` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The statistics of queries hitting the Execution Plan Cache per second |
| `tidbcloud_db_average_query_duration` | gauge | `sql_type: Select\|Insert\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The duration between the time a network request is sent to TiDB and returned to the client |
| `tidbcloud_db_transaction_per_second` | gauge | `type: Commit\|Rollback\|...`<br/>`txn_mode: optimistic\|pessimistic`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of transactions executed per second |
| `tidbcloud_db_row_storage_used_bytes` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The row-based storage size of the {{{ .essential }}} instance in bytes |
| `tidbcloud_db_columnar_storage_used_bytes` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The columnar storage size of the {{{ .essential }}} instance in bytes. Returns 0 if TiFlash is not enabled. |
| `tidbcloud_resource_manager_resource_request_unit_total` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The total Request Units (RU) consumed |
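
The labels in the table can be used to filter and group metrics in PromQL. The following sketch builds an instant-query URL for the Prometheus HTTP API endpoint `/api/v1/query`. The server address `http://localhost:9090` and the instance name `my-instance` are placeholders, and the helper names `build_promql` and `build_query_url` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def build_promql(metric: str, instance_name: str, group_by: Optional[str] = None) -> str:
    """Build a PromQL expression that filters by instance_name and optionally sums by one label."""
    selector = f'{metric}{{instance_name="{instance_name}"}}'
    return f"sum by ({group_by}) ({selector})" if group_by else selector


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


# Statements per second for one instance, grouped by the `type` label.
expr = build_promql("tidbcloud_db_query_per_second", "my-instance", group_by="type")
url = build_query_url("http://localhost:9090", expr)
print(expr)
print(url)
```

The same expression can be pasted into a Grafana panel that uses your Prometheus service as its data source.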

For {{{ .premium }}} instances:

| Metric name | Metric type | Labels | Description |
|---|---|---|---|
| `tidbcloud_db_total_connection` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of current connections in your TiDB server |
| `tidbcloud_db_active_connections` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of active connections |
| `tidbcloud_db_disconnections` | gauge | `result: Error\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of clients disconnected by connection result |
| `tidbcloud_db_database_time` | gauge | `sql_type: Select\|Insert\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | A time model statistic that represents the sum of all processes' CPU consumption plus the sum of non-idle wait time |
| `tidbcloud_db_query_per_second` | gauge | `type: Select\|Insert\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of SQL statements executed per second, counted according to statement types |
| `tidbcloud_db_failed_queries` | gauge | `type: planner:xxx\|executor:2345\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The statistics of error types (for example, syntax errors and primary key conflicts) that occur per second when executing SQL statements |
| `tidbcloud_db_command_per_second` | gauge | `type: Query\|Ping\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of commands processed by TiDB per second |
| `tidbcloud_db_queries_using_plan_cache_ops` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The statistics of queries hitting the Execution Plan Cache per second |
| `tidbcloud_db_average_query_duration` | gauge | `sql_type: Select\|Insert\|...`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The duration between the time a network request is sent to TiDB and returned to the client |
| `tidbcloud_db_transaction_per_second` | gauge | `type: Commit\|Rollback\|...`<br/>`txn_mode: optimistic\|pessimistic`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The number of transactions executed per second |
| `tidbcloud_db_row_storage_used_bytes` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The row-based storage size of the {{{ .premium }}} instance in bytes |
| `tidbcloud_db_columnar_storage_used_bytes` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The columnar storage size of the {{{ .premium }}} instance in bytes |
| `tidbcloud_resource_manager_resource_request_unit_total` | gauge | `instance_id: <instance id>`<br/>`instance_name: <instance name>` | The total Request Units (RU) consumed |
| `tidbcloud_changefeed_latency` | gauge | `changefeed: <changefeed-id>`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | The data replication latency between the upstream and the downstream of a changefeed |
| `tidbcloud_changefeed_status` | gauge | `changefeed: <changefeed-id>`<br/>`instance_id: <instance id>`<br/>`instance_name: <instance name>` | Changefeed status:<br/>`-1`: Unknown<br/>`0`: Normal<br/>`1`: Warning<br/>`2`: Failed<br/>`3`: Stopped<br/>`4`: Finished<br/>`6`: Warning<br/>`7`: Other |
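
Because `tidbcloud_changefeed_status` reports a numeric code, it can help to translate the value before showing it in an alert or a report. The following sketch uses the codes listed in the table. The function name `describe_changefeed_status` and the fallback text for unlisted codes are illustrative choices, not part of TiDB Cloud.

```python
# Status codes as listed in the table for tidbcloud_changefeed_status.
CHANGEFEED_STATUS = {
    -1: "Unknown",
    0: "Normal",
    1: "Warning",
    2: "Failed",
    3: "Stopped",
    4: "Finished",
    6: "Warning",
    7: "Other",
}


def describe_changefeed_status(value: float) -> str:
    """Translate a sampled status value into its name; unlisted codes get a fallback text."""
    code = int(value)
    return CHANGEFEED_STATUS.get(code, f"Unrecognized code {code}")


# Prometheus returns sample values as numbers that may carry a decimal part.
for sample_value in (0.0, 2.0, 5.0):
    print(sample_value, "->", describe_changefeed_status(sample_value))
```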
- Why does the same metric have different values on Grafana and the TiDB Cloud console at the same time?

  Grafana and TiDB Cloud use different aggregation calculation logic, so the displayed aggregated values might differ. You can adjust the `Min step` configuration in Grafana to get more fine-grained metric values.
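
To see why the aggregation granularity changes the displayed value, consider the following simplified sketch. It averages the same made-up series over windows of different sizes. This is only an analogy for a coarser or finer step, not the actual aggregation logic of Grafana or TiDB Cloud.

```python
from statistics import mean


def downsample(samples: list, step: int) -> list:
    """Average each consecutive group of `step` samples into one point."""
    return [mean(samples[i:i + step]) for i in range(0, len(samples), step)]


# Made-up per-interval values with one short spike.
qps = [100, 100, 400, 100, 100, 100]

fine = downsample(qps, 1)    # one point per sample, so the spike is fully visible
coarse = downsample(qps, 3)  # one point per three samples, so the spike is smoothed

print("peak with fine step:", max(fine))
print("peak with coarse step:", max(coarse))
```

The underlying data is identical in both cases. Only the window over which it is summarized differs, which is why two tools can show different numbers for the same moment.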