Prometheus exporter enhancement#4438

Merged
DaanHoogland merged 17 commits into apache:main from soreana:prometheus-exporter-enhancement
Sep 30, 2022

Conversation

@soreana
Member

@soreana soreana commented Oct 30, 2020

Description

In this pull request, I added new functionality to the CloudStack Prometheus exporter. To see the differences, please check the testing section.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

How Has This Been Tested?

This pull request contains seven commits. Except for commit dfb35e5, they all add new functionality to the Prometheus exporter. In the subsequent sections, I describe the functionality of each commit. I tested them in my test environment with three management servers, one DB node (MySQL), and two KVM hypervisors.

1. Export count of total/up/down hosts by tags 0dbe9e7

  1. Enable Prometheus.
  2. Add a tag to the host.
  3. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_hosts_total

Output Before Changes:

cloudstack_hosts_total{zone="mgt122-60",filter="online"} 2
cloudstack_hosts_total{zone="mgt122-60",filter="offline"} 0
cloudstack_hosts_total{zone="mgt122-60",filter="total"} 2

Output After Changes:

cloudstack_hosts_total{zone="mgt122-60",filter="online"} 2
cloudstack_hosts_total{zone="mgt122-60",filter="offline"} 0
cloudstack_hosts_total{zone="mgt122-60",filter="total"} 2
cloudstack_hosts_total{zone="mgt122-60",filter="total",tags="tage1"} 1
cloudstack_hosts_total{zone="mgt122-60",filter="online",tags="tage1"} 1
cloudstack_hosts_total{zone="mgt122-60",filter="offline",tags="tage1"} 0 
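The new per-tag series can be picked out of the scrape on the client side. A minimal sketch (my own example, not the exporter's code) that parses exposition lines like the ones above and keeps only the series carrying the new `tags` label — the label parsing is deliberately naive and assumes no commas or escaped quotes inside label values:

```python
import re

# Matches lines like: name{label="v",label2="v2"} 42
LINE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)\s*$')

def parse_metric(line):
    """Return (name, labels, value) for one exposition line, else None."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    name, raw, value = m.groups()
    labels = {k: v.strip('"') for k, v in
              (kv.split('=', 1) for kv in raw.split(','))}
    return name, labels, float(value)

sample = [
    'cloudstack_hosts_total{zone="mgt122-60",filter="total"} 2',
    'cloudstack_hosts_total{zone="mgt122-60",filter="total",tags="tage1"} 1',
]
# Keep only the series that carry the new tags label.
tagged = [m for m in map(parse_metric, sample) if m and 'tags' in m[1]]
print(tagged)
```

This filters the sample down to the single per-tag series added by this commit.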

2. Export count of vms by state and host tag e6a81d1

  1. Enable Prometheus.
  2. Add a tag to the host.
  3. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_vms_total_by_tag

After the changes, the following lines are added to the Prometheus output:

cloudstack_vms_total_by_tag{zone="mgt122-60",filter="starting",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="running",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="stopping",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="stopped",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="destroyed",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="expunging",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="migrating",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="error",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="unknown",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="shutdown",tags="tage1"} 0
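A quick sanity check on this output (my own sketch, not part of the PR): the new metric should emit one series per VM state for each tag, even when the count is zero, so every expected state should appear exactly once:

```python
import re

# The ten VM states visible in the exporter output above.
EXPECTED_STATES = {"starting", "running", "stopping", "stopped",
                   "destroyed", "expunging", "migrating", "error",
                   "unknown", "shutdown"}

output = """\
cloudstack_vms_total_by_tag{zone="z",filter="starting",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="z",filter="running",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="z",filter="stopping",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="z",filter="stopped",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="z",filter="destroyed",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="z",filter="expunging",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="z",filter="migrating",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="z",filter="error",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="z",filter="unknown",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="z",filter="shutdown",tags="tage1"} 0
"""
# Collect every state the scrape actually reported.
found = set(re.findall(r'filter="([^"]+)"', output))
print(found == EXPECTED_STATES)  # True when every state is present
```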

3. Add host tags to host cpu/cores/memory usage in Prometheus exporter eefd9f1

  1. Enable Prometheus.
  2. Add a tag to the host.
  3. Run the following command and verify the output against the expected results: curl http://127.0.0.1:9595/metrics | grep cloudstack_host_vms_cores_total
  4. Repeat step three for cloudstack_host_cpu_usage_mhz_total and cloudstack_host_memory_usage_mibs_total.

Output Before Changes:

cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="used",dedicated="0"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="total",dedicated="0"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="used",dedicated="0"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="total",dedicated="0"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",filter="allocated"} 4
cloudstack_host_vms_cores_total_by_tag{zone="mgt122-60",filter="allocated",tags="tage1"} 0

Output After Changes:

cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="used",dedicated="0",tags="tage1"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="total",dedicated="0",tags="tage1"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="used",dedicated="0",tags=""} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="total",dedicated="0",tags=""} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",filter="allocated"} 4
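The new per-host `tags` label makes roll-ups by tag possible on the scrape side. A small sketch of that (my example, not the exporter's code) — sum "total" cores per tag value, where untagged hosts show up under tags="":

```python
from collections import defaultdict

series = [
    # (hostname, filter, tags, value) pulled from the output above
    ("node75", "total", "tage1", 4),
    ("node75", "used",  "tage1", 2),
    ("node74", "total", "",      4),
    ("node74", "used",  "",      2),
]
# Aggregate total cores per tags label value.
cores_by_tag = defaultdict(float)
for host, flt, tags, value in series:
    if flt == "total":
        cores_by_tag[tags] += value
print(dict(cores_by_tag))  # {'tage1': 4.0, '': 4.0}
```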

4. Cloudstack Prometheus exporter: Add allocated capacity group by host tag. a489e3c

  1. Enable Prometheus.
  2. Add a tag to the host.
  3. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_host_vms_cores_total

Output Before Changes:

cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="used",dedicated="0"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="total",dedicated="0"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="used",dedicated="0"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="total",dedicated="0"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",filter="allocated"} 4
cloudstack_host_vms_cores_total_by_tag{zone="mgt122-60",filter="allocated",tags="tage1"} 0

Output After Changes:

cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="used",dedicated="0",tags="tage1"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="total",dedicated="0",tags="tage1"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="used",dedicated="0",tags=""} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="total",dedicated="0",tags=""} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",filter="allocated"} 4
cloudstack_host_vms_cores_total_by_tag{zone="mgt122-60",filter="allocated",tags="tage1"} 0
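One consistency property worth checking here (an assumption on my side, not something the PR guarantees): every non-empty tag that appears on a per-host series should also get an aggregate `_by_tag` allocated series. A tiny sketch:

```python
# tags label values observed on per-host series in the output above
host_tags = {"tage1", ""}
# tags label values observed on *_by_tag allocated series above
by_tag_series = {"tage1"}

# Every non-empty host tag should be covered by an aggregate series.
missing = {t for t in host_tags if t} - by_tag_series
print(missing)  # empty set means every tagged host is covered
```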

5. Show count of Active domains on grafana de08479

============== Scenario One ==============

  1. Enable Prometheus.
  2. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_active_domains_total. Output is:
cloudstack_active_domains_total{zone="mgt122-60"} 1
  3. Create a new domain.
  4. Repeat step two. The output will not change.
  5. Add a new account to the domain created in step three.
  6. Repeat step two. The output will change to:
cloudstack_active_domains_total{zone="mgt122-60"} 2

============== Scenario Two ==============

  1. Use previous environment
  2. Disable all accounts in the domain created in step three of Scenario One.
  3. Repeat step two of Scenario one. The output will change to:
cloudstack_active_domains_total{zone="mgt122-60"} 1
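The rule the two scenarios exercise can be modeled in a few lines. This is my reading of the behavior, not CloudStack's actual query: a domain counts as active only while it has at least one enabled account.

```python
# domain name -> {account name: enabled?}
domains = {"ROOT": {"admin": True}, "newdomain": {}}

def active_domains(domains):
    """Count domains that have at least one enabled account."""
    return sum(1 for accounts in domains.values()
               if any(accounts.values()))

print(active_domains(domains))         # 1: the new domain has no account yet
domains["newdomain"]["alice"] = True   # add an account to the new domain
print(active_domains(domains))         # 2
domains["newdomain"]["alice"] = False  # scenario two: disable all accounts
print(active_domains(domains))         # back to 1
```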

6. Show count of Active accounts and vms by size on grafana d7aa19f

============== Scenario One ==============

  1. Enable Prometheus.
  2. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_active_accounts_total. Output is:
cloudstack_active_accounts_total{zone="mgt122-60"} 1
  3. Create a new account.
  4. Repeat step two. The output will change to:
cloudstack_active_accounts_total{zone="mgt122-60"} 2

============== Scenario Two ==============

  1. Enable Prometheus.
  2. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_vms_total_by_size. Output is:
cloudstack_vms_total_by_size{zone="mgt122-60",cpu="1",memory="512"} 2
  3. Add a new instance with a different offering.
  4. Repeat step two. The output will change to:
cloudstack_vms_total_by_size{zone="mgt122-60",cpu="1",memory="512"} 2
cloudstack_vms_total_by_size{zone="mgt122-60",cpu="1",memory="1024"} 1
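The grouping behind this metric can be sketched as a simple count of VMs by (cpu, memory) offering size. This is my illustration (the zone name and offerings are made up), not the exporter's implementation:

```python
from collections import Counter

# (cpu cores, memory MB) per VM, mirroring the scenario above:
# two VMs on a 1-core/512MB offering, one on 1-core/1024MB.
vms = [(1, 512), (1, 512), (1, 1024)]
by_size = Counter(vms)

# Emit exposition-style lines like the exporter output above.
for (cpu, mem), n in sorted(by_size.items()):
    print(f'cloudstack_vms_total_by_size{{zone="mgt122-60",'
          f'cpu="{cpu}",memory="{mem}"}} {n}')
```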

@nvazquez
Contributor

nvazquez commented Jul 1, 2021

Hi @soreana is this PR ready for review?
@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖️ centos7 ✔️ centos8 ✖️ debian. SL-JID 444

@soreana
Member Author

soreana commented Jul 1, 2021

Hey @nvazquez
Yes, it is ready for review.

@nvazquez
Contributor

nvazquez commented Jul 1, 2021

@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 452

@nvazquez
Contributor

nvazquez commented Jul 1, 2021

@blueorangutan test

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1191)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 38732 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4438-t1191-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 88 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@nvazquez
Contributor

nvazquez commented Aug 9, 2021

@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 814

@nvazquez
Contributor

nvazquez commented Aug 9, 2021

@blueorangutan test

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@yadvr
Member

yadvr commented Sep 8, 2021

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖️ el7 ✔️ el8 ✖️ debian ✔️ suse15. SL-JID 1165

@yadvr
Member

yadvr commented Sep 14, 2021

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1237

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@soreana
Member Author

soreana commented Sep 28, 2022

@DaanHoogland, I noticed that the values for CPU allocated and CPU used were incorrect in my CloudStack test environment. When I dug deeper, I found that those values are wrong in CloudStack itself, so it is an issue with my test environment rather than with this PR.

Did you encounter that issue recently?

@codecov

codecov bot commented Sep 28, 2022

Codecov Report

Merging #4438 (6aab64b) into main (d9dd4c1) will decrease coverage by 0.00%.
The diff coverage is 2.89%.

@@             Coverage Diff              @@
##               main    #4438      +/-   ##
============================================
- Coverage     10.52%   10.52%   -0.01%     
  Complexity     6784     6784              
============================================
  Files          2464     2464              
  Lines        243988   244167     +179     
  Branches      38185    38204      +19     
============================================
+ Hits          25690    25696       +6     
- Misses       215065   215238     +173     
  Partials       3233     3233              
Impacted Files Coverage Δ
...n/java/com/cloud/capacity/dao/CapacityDaoImpl.java 3.24% <0.00%> (-0.13%) ⬇️
.../src/main/java/com/cloud/vm/dao/UserVmDaoImpl.java 0.77% <0.00%> (-0.07%) ⬇️
.../main/java/com/cloud/vm/dao/VMInstanceDaoImpl.java 27.34% <0.00%> (-0.74%) ⬇️
...che/cloudstack/metrics/PrometheusExporterImpl.java 0.00% <0.00%> (ø)
...e/cloudstack/metrics/PrometheusExporterServer.java 0.00% <0.00%> (ø)
...oudstack/metrics/PrometheusExporterServerImpl.java 0.00% <ø> (ø)
...c/main/java/com/cloud/user/dao/AccountDaoImpl.java 30.12% <66.66%> (+2.23%) ⬆️


@blueorangutan

Packaging result: ✖️ el7 ✔️ el8 ✔️ debian ✖️ suse15. SL-JID 4306

@sonarqubecloud

SonarCloud Quality Gate failed.

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 10 Code Smells

0.0% Coverage
0.0% Duplication

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 4313

@DaanHoogland
Contributor

@blueorangutan test matrix

@blueorangutan

@DaanHoogland a Trillian-Jenkins matrix job (centos7 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-5048)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41179 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4438-t5048-kvm-centos7.zip
Smoke tests completed. 103 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@blueorangutan

Trillian test result (tid-5047)
Environment: xenserver-71 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42024 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4438-t5047-xenserver-71.zip
Smoke tests completed. 102 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_08_upgrade_kubernetes_ha_cluster Failure 882.91 test_kubernetes_clusters.py

@blueorangutan

Trillian test result (tid-5049)
Environment: vmware-65u2 (x2), Advanced Networking with Mgmt server 7
Total time taken: 46641 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4438-t5049-vmware-65u2.zip
Smoke tests completed. 102 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_create_pvlan_network Error 0.06 test_pvlan.py

@DaanHoogland
Contributor

@DaanHoogland, I noticed that the values for CPU allocated and CPU used were incorrect in my CloudStack test environment. When I dug deeper, I found that those values are wrong in CloudStack itself, so it is an issue with my test environment rather than with this PR.

Did you encounter that issue recently?

No I haven't @soreana , but I remember seeing an issue about something similar. I'll keep an eye out.

@DaanHoogland
Contributor

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-5057)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 44156 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4438-t5057-vmware-67u3.zip
Smoke tests completed. 102 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_02_upgrade_kubernetes_cluster Failure 568.13 test_kubernetes_clusters.py
