Skip to content

chore: better prometheus metrics#39475

Closed
Copilot wants to merge 13 commits into
developfrom
copilot/improve-metric-naming
Closed

chore: better prometheus metrics#39475
Copilot wants to merge 13 commits into
developfrom
copilot/improve-metric-naming

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 9, 2026

Motivation

The existing Prometheus metrics in Rocket.Chat had several issues:

  • Summaries are not aggregatable across instances; Histograms are preferred for latency measurements in multi-instance deployments.
  • Several Counter metrics were missing the _total suffix required by the OpenMetrics/Prometheus naming convention.
  • The REST API layer lacked observability for response sizes and concurrent in-flight requests.
  • We were not exposing the Event Loop Utilization metric

Changes

New metrics:

  • Histogram counterparts (_seconds) for all existing latency Summaries: rocketchat_meteor_methods_seconds, rocketchat_callbacks_seconds, rocketchat_hooks_seconds, rocketchat_rest_api_seconds, rocketchat_meteor_subscriptions_seconds, rocketchat_apps_bridge_methods_seconds, rocketchat_queue_wait_duration_seconds
  • _total suffixed duplicates for counters that lacked them: rocketchat_deprecations_total, rocketchat_metrics_requests_total, rocketchat_messages_sent_total, rocketchat_notifications_sent_total, rocketchat_ddp_rate_limit_exceeded_total, rocketchat_livechat_webhooks_success_total, rocketchat_livechat_webhooks_failures_total
  • rocketchat_rest_api_response_size_bytes – histogram of REST API response payload sizes in bytes
  • rocketchat_rest_api_active_requests – gauge tracking the number of concurrent in-flight REST API requests
  • nodejs_event_loop_utilization_ratio - gauge tracking the ratio of the event loop utilization reported by NodeJS

Metrics middleware improvements:

  • Renamed summary/histogram params to endpointTimeSummary/endpointTimeHistogram to clearly indicate what is being measured
  • Made endpointTimeHistogram a required parameter
  • Removed high-cardinality labels (user_agent, version) from the histogram to reduce cardinality while keeping them on the summary
  • Reordered middleware chains to metrics → tracer → logger in all three API entry points
Original prompt

You are working on PR #39095 in RocketChat/Rocket.Chat titled "add eventLoopUtilization to metrics".

Task: Improve metric naming in apps/meteor/app/metrics/server/lib/metrics.ts to follow Prometheus best practices, without breaking existing metrics.

Requirements:

  1. Counters: any new counter metrics must end with _total or _count.
  2. Summaries: do not introduce new Summaries; duplicate any existing Summary metric with a new metric that is a Histogram instead.
    • Histogram names must include a unit suffix representing the measured unit (e.g. _seconds, _bytes).
  3. Backward compatibility: Do not remove or rename existing metrics.
    • If an existing metric name does not conform, create a duplicate metric with a corrected name/type.
  4. Dual reporting: For every duplicated metric, find all usages across the repo and ensure the code records/updates both the old and the new metric, so they observe the same events.
    • This includes increments/dec/inc/observe/startTimer/etc.
  5. Keep changes focused and consistent with existing patterns in the repo.

Implementation guidance:

  • Update apps/meteor/app/metrics/server/lib/metrics.ts to export the new corrected metrics alongside existing ones.
  • Replace Summary usage by adding a Histogram duplicate (keeping Summary in place) and update call sites to observe both.
  • If adding histograms, choose sensible bucket boundaries consistent with existing Rocket.Chat metrics (if any).

Deliverables:

  • Code changes across metrics definition + all affected call sites.
  • Ensure TypeScript compilation passes.
  • Update or add tests if the repo has metrics-related tests; otherwise ensure no lint/type errors.

Context:

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: The metrics in the @RocketChat/Rocket.Chat/files/apps/meteor/app/metrics/server/lib/metrics.ts don't follow best practices. Add new ones to the file following these topics:

  • Counters should end with a _total or _count suffix
  • Summaries should be converted to Histograms, with a unit suffix that represents the unit that is being measured, e.g. _seconds or _bytes
  • Do not remove existing metrics - if a metric does not conform to best practices, duplicate it, adapt the name if needed, and make it adhere to the standards
  • When duplicating an existing metric, make sure to look for files that use that metric and add a call to the new metric as well, so both original and duplicates observe the same event

Assistant: We’re waiting for you to approve the pending change request (so I can access the repository and update apps/meteor/app/metrics/server/lib/metrics.ts plus all call sites) before I can continue.

This pull request was created from Copilot chat.


🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

@dionisio-bot
Copy link
Copy Markdown
Contributor

dionisio-bot Bot commented Mar 9, 2026

Looks like this PR is not ready to merge, because of the following issues:

  • This PR is missing the 'stat: QA assured' label
  • This PR is missing the required milestone or project

Please fix the issues and try again

If you have any trouble, please check the PR guidelines

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Mar 9, 2026

⚠️ No Changeset found

Latest commit: 371c277

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copilot AI changed the title [WIP] Improve metric naming to follow Prometheus best practices Add Prometheus best-practices metric duplicates (Histograms + _total counters) Mar 9, 2026
@d-gubert d-gubert changed the base branch from develop to test-new-metrics March 9, 2026 15:08
@d-gubert d-gubert force-pushed the copilot/improve-metric-naming branch from 27f3984 to 1697a76 Compare March 9, 2026 15:09
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 89.36170% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.95%. Comparing base (b570388) to head (371c277).
⚠️ Report is 77 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop   #39475      +/-   ##
===========================================
- Coverage    70.96%   70.95%   -0.01%     
===========================================
  Files         3198     3198              
  Lines       113449   113487      +38     
  Branches     20558    20541      -17     
===========================================
+ Hits         80507    80525      +18     
- Misses       30892    30912      +20     
  Partials      2050     2050              
Flag Coverage Δ
e2e 60.46% <ø> (+0.05%) ⬆️
e2e-api 48.29% <92.85%> (+0.10%) ⬆️
unit 71.61% <88.88%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Member

@d-gubert d-gubert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply each of the comments in this change request as a separate commit.

Additionally, add the following metrics:

  • A histogram to track response sizes in bytes
  • A gauge to measure active requests

Both can be observed in the metrics middleware

Comment thread apps/meteor/app/api/server/middlewares/metrics.ts Outdated
Comment thread apps/meteor/app/api/server/api.ts Outdated
Comment thread apps/meteor/app/integrations/server/api/api.ts Outdated
Comment thread apps/meteor/app/lib/server/startup/rateLimiter.js Outdated
Comment thread apps/meteor/app/metrics/server/lib/metrics.ts Outdated
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 9, 2026

@copilot apply each of the comments in this change request as a separate commit.

Additionally, add the following metrics:...

All comments applied as separate commits:

  1. 0e50faf – Remove user_agent/version labels from rocketchatRestApiSeconds histogram
  2. 1314857 – Make histogram non-optional in metricsMiddleware
  3. b616be8 – Extract rate-limit labels to variable in rateLimiter.js
  4. 0f3b27f – Move metrics middleware first, tracer second in api.ts
  5. 012a46a – Move metrics middleware to end of chain in integrations api.ts
  6. 1ff5b24 – Add rocketchat_rest_api_response_size_bytes (Histogram, buckets in bytes) and rocketchat_rest_api_active_requests (Gauge) to metrics.ts, with both observed in metricsMiddleware — the gauge is incremented before next() and decremented after, and the response-size histogram reads content-length from the response headers.

@d-gubert
Copy link
Copy Markdown
Member

d-gubert commented Mar 9, 2026

@copilot fix the errors in this CI run https://github.com/RocketChat/Rocket.Chat/actions/runs/22868898741?pr=39475

Each fix gets its own commit. Do not fix lint errors from files you did not previously edit

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 9, 2026

@copilot fix the errors in this CI run https://github.com/RocketChat/Rocket.Chat/actions/runs/22868898741?pr=39475

Each fix gets its own commit. **Do not fix lint errors from file...

Fixed in commit e5598b7:

  • TypeScript + unit test errors (metrics.spec.ts): added the required histogram, responseSizeHistogram, and activeRequestsGauge mock objects to each of the three test cases.
  • Lint (prettier) errors (api.ts, integrations/api.ts, rest.ts): broke the long metricsMiddleware({...}) calls onto multiple lines per prettier's expected format.

Copilot AI changed the title Add Prometheus best-practices metric duplicates (Histograms + _total counters) Rename summary/histogram params in metricsMiddleware to clarify intent Mar 9, 2026
@d-gubert d-gubert changed the title Rename summary/histogram params in metricsMiddleware to clarify intent chore: better prometheus metrics Mar 10, 2026
@d-gubert d-gubert force-pushed the copilot/improve-metric-naming branch from a552243 to 982ca27 Compare March 10, 2026 23:57
@d-gubert d-gubert changed the base branch from test-new-metrics to develop March 12, 2026 14:26
@d-gubert d-gubert force-pushed the copilot/improve-metric-naming branch from e5d5c1a to dd83528 Compare March 12, 2026 14:37
Copy link
Copy Markdown
Member

@d-gubert d-gubert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot address the change requests

Comment thread apps/meteor/app/metrics/server/lib/metrics.ts Outdated
Comment thread apps/meteor/app/metrics/server/lib/metrics.ts Outdated
d-gubert and others added 13 commits March 12, 2026 15:25
…counters)"

Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
…stogram

Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
…chain

Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
…leware

Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
…ctiveRequestsGauge mocks

Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
…stogram in metricsMiddleware

Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
Co-authored-by: d-gubert <1810309+d-gubert@users.noreply.github.com>
@d-gubert d-gubert force-pushed the copilot/improve-metric-naming branch from d8be014 to 371c277 Compare March 12, 2026 18:25
@d-gubert d-gubert marked this pull request as ready for review March 19, 2026 22:14
@d-gubert d-gubert requested review from a team as code owners March 19, 2026 22:14
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 22 files

@d-gubert
Copy link
Copy Markdown
Member

Closing in favor of #39857

@d-gubert d-gubert closed this Mar 25, 2026
@d-gubert d-gubert deleted the copilot/improve-metric-naming branch March 25, 2026 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants