Skip to content

feat(metrics): add block transaction count and SR set change monitoring#6624

Merged
kuny0707 merged 18 commits into
tronprotocol:developfrom
warku123:metric_modify_clean
Apr 29, 2026
Merged

feat(metrics): add block transaction count and SR set change monitoring#6624
kuny0707 merged 18 commits into
tronprotocol:developfrom
warku123:metric_modify_clean

Conversation

@warku123
Copy link
Copy Markdown
Contributor

@warku123 warku123 commented Apr 1, 2026

What does this PR do?

Add two new metrics enhancements for better network monitoring:

  1. Block Transaction Count Histogram

    • Replace tron:block_empty_total counter with tron:block_transaction_count histogram
    • Track transaction count distribution per block with buckets: [0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
    • Empty blocks can be monitored via le=0.0 bucket
  2. SR Set Change Monitoring

    • Add tron:sr_set_change counter to track SR membership changes
    • Detect SR additions and removals at each maintenance time interval
    • Labels: action (add/remove), witness (SR address)

Changes:

  • Add overloaded init() method in MetricsHistogram for custom buckets
  • Add BLOCK_TRANSACTION_COUNT histogram metric
  • Add SR_SET_CHANGE counter with SR_ADD/SR_REMOVE labels
  • Implement SR set change detection logic in BlockChainMetricManager
  • Update PrometheusApiServiceTest with comprehensive test coverage

Why are these changes required?

  • Transaction histogram provides richer insights than simple counter for network analysis while maintaining empty block detection capability
  • SR set change monitoring enables tracking of network consensus participant changes for governance analysis

Compared to #6602, this PR provides a cleaner implementation focused solely on metrics enhancement without additional complexity.

This PR has been tested by:

  • Unit Tests (updated PrometheusApiServiceTest)
  • Manual Testing

Follow up

N/A

Extra details

…monitoring

Replace the dedicated tron:block_empty_total counter with a more comprehensive
tron:block_transaction_count histogram that tracks the distribution of
transaction counts per block.

Changes:
- Add overloaded init() method in MetricsHistogram to support custom buckets
- Add BLOCK_TRANSACTION_COUNT histogram with buckets [0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
- Record transaction count for all blocks (including empty blocks with txCount=0)
- Empty blocks can be queried via bucket le=0.0
- Remove unused BLOCK_EMPTY counter

This provides richer insights for network analysis while still supporting
empty block monitoring via histogram bucket queries.

Closes tronprotocol#6590
@warku123 warku123 changed the title feat(metrics): add block transaction count histogram for empty block monitoring feat(metrics): add block transaction count and SR set change monitoring Apr 1, 2026
warku123 added a commit to warku123/java-tron that referenced this pull request Apr 2, 2026
Replace duplicated miner string literal with MINER_LABEL constant
to fix SonarQube code smell.

Relates tronprotocol#6624
warku123 added a commit to warku123/java-tron that referenced this pull request Apr 2, 2026
- Replace duplicated miner string literal with MINER_LABEL constant
  in PrometheusApiServiceTest to fix SonarQube code smell
- Add //NOSONAR comment for new BLOCK_TRANSACTION_COUNT histogram
  label name in MetricsHistogram

Relates tronprotocol#6624
@warku123 warku123 force-pushed the metric_modify_clean branch from 4433c52 to 93d0083 Compare April 2, 2026 03:44
warku123 added a commit to warku123/java-tron that referenced this pull request Apr 2, 2026
Extract repeated 'miner' string literal into MINER_LABEL constant
in MetricsHistogram to follow DRY principle and fix SonarQube warning.

Also update PrometheusApiServiceTest to use MINER_LABEL constant
for test assertions.

Relates tronprotocol#6624
@warku123 warku123 force-pushed the metric_modify_clean branch from 93d0083 to 6315ffc Compare April 2, 2026 03:54
Extract repeated 'miner' string literal into MINER_LABEL constant
in MetricsHistogram to follow DRY principle and fix SonarQube warning.

Also update PrometheusApiServiceTest to use MINER_LABEL constant
for test assertions.

Relates tronprotocol#6624
Comment thread common/src/main/java/org/tron/common/prometheus/MetricKeys.java
init(MetricKeys.Counter.TXS, "tron txs info .", "type", "detail");
init(MetricKeys.Counter.MINER, "tron miner info .", "miner", "type");
init(MetricKeys.Counter.BLOCK_FORK, "tron block fork info .", "type");
init(MetricKeys.Counter.SR_SET_CHANGE, "tron sr set change .", "action", "witness");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The witness label uses addresses. Although the Active SR pool is relatively limited (27 slots), as SRs rotate in and out over time, label values will accumulate monotonically — they only grow, never shrink. This is a known Prometheus anti-pattern.

Practical impact assessment:

  • TRON's SR candidate pool is not infinite, and rotation frequency is low, so it won't explode in the short term

  • However, as a long-running node metric, months or years of operation could still accumulate a significant number of time series

Possible approaches:

  • If the address dimension is truly needed, keep the current design, but document this characteristic

  • Or remove the witness label entirely, keeping only action (add/remove), and output address details via logs instead. Follow the principle that Prometheus should be used for alerting and awareness, while logs are for precise investigation and root-cause analysis.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising the high-cardinality concern — it's a valid Prometheus best practice to watch for.

However, I believe the witness label is acceptable here given TRON's specific constraints:

  1. Bounded candidate pool: The SR candidate pool is finite. Even over years of operation, the total number of distinct addresses that have ever held an SR slot has a practical ceiling — far from the
    unbounded growth typical of high-cardinality anti-patterns.
  2. Low rotation frequency: SR set changes only occur at maintenance intervals (every 6 hours), so new label values are introduced very infrequently.
  3. Diagnostic value: The primary purpose of this metric is to identify which SR was added or removed. Dropping the witness label would reduce the metric to just "something changed", requiring operators to
    cross-reference logs — losing the at-a-glance visibility in Grafana dashboards that makes this metric useful in the first place.

In summary, the worst-case time series count is roughly candidate_pool_size × 2 (add/remove), which stays well within Prometheus's comfortable operating range.

That said, I agree it's worth documenting this characteristic. Could you suggest what form you'd prefer — a code comment at the metric registration site, a note in the PR description, or something else?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to a code comment at the metric registration site.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@halibobo1205 Added a comment at the SR_SET_CHANGE constant definition in MetricKeys.java (commit 96bf315) explaining the cardinality rationale:

witness label: bounded cardinality -- SR candidate pool is finite, rotation is infrequent (at most once per maintenance interval); kept for at-a-glance SR identification in dashboards rather than requiring log cross-referencing.

- Extract MINER_LABEL to MetricLabels.Histogram.MINER as shared constant
- Use int instead of double for block count in tests
- Store witness addresses as base58 directly, removing unnecessary hex round-trip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@warku123 warku123 requested a review from halibobo1205 April 9, 2026 02:46
@warku123 warku123 requested a review from Little-Peony April 27, 2026 03:03
The SR set change detection used to live in BlockChainMetricManager.applyBlock
and inferred SR set transitions from nextMaintenanceTime changes plus a
diff against a cached HashSet. That approach was indirect (re-derived
state already known at the source) and required a fork-rollback guard.

Hoist the recording into the consensus layer where the transition is
explicit: MaintenanceManager.doMaintenance already computes currentWits
and newWits when the active witness set changes, so it can call
SrSetChangeMetric.record(currentWits, newWits) directly. The new helper
lives in the common module so both consensus and framework can depend on
it without inverting the existing module hierarchy.

Tests: SrSetChangeMetricTest exercises SR_ADD via the real
doMaintenance() flow, plus the no-vote / metrics-disabled / SR_REMOVE
branches. PrometheusApiServiceTest drops the obsolete SR assertions.
@Sunny6889 Sunny6889 requested a review from halibobo1205 April 27, 2026 08:07
@warku123 warku123 requested a review from Sunny6889 April 27, 2026 08:51
- move BLOCK_TRANSACTION_COUNT histogram recording from
  BlockChainMetricManager.applyBlock to Manager.applyBlock so that
  witness-produced blocks are also sampled (previously only peer-received
  blocks went through BlockChainMetricManager)
- rename SRMetrics.record() to recordSrSetChange() to leave room for
  additional SR-related metrics
Record the histogram at the start of pushBlock (inside the synchronized
block, before any early returns) so abnormal pushes — duplicate, stale,
fork-switched, or merkle-failed blocks — are also observable.
Copy link
Copy Markdown
Collaborator

@halibobo1205 halibobo1205 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

  1. Adding a METRICS_CHANGELOG.md to the repo to track metric additions, modifications, and deprecations — future PRs can maintain it as well
  2. Including a metrics change entry in release notes
  3. Filing a follow-up issue/TODO to update the official metrics documentation and the Grafana dashboard template

warku123 added a commit to warku123/tron-docker that referenced this pull request Apr 28, 2026
Add two new panel sections to the java-tron-server Grafana dashboard
to support metrics from tronprotocol/java-tron#6624:

- Block Metrics: empty block rate, per-SR empty block count,
  avg transactions per block, transaction count distribution
- SR Set Change: event table and cumulative totals by action
@kuny0707 kuny0707 merged commit 1a97a81 into tronprotocol:develop Apr 29, 2026
12 checks passed
warku123 added a commit to warku123/java-tron that referenced this pull request Apr 29, 2026
Add a single discovery point for Prometheus metric additions, changes,
and removals so node operators and dashboard maintainers do not need to
scan individual pull requests. Sectioning mirrors java-tron GitHub
Releases (Notice / New Features / ... grouped by module) for
consistency with the rest of the repository.

The 4.8.2 section records the two metrics introduced in tronprotocol#6624
(`tron:block_transaction_count`, `tron:sr_set_change`) and a
Pre-4.8.2 baseline snapshot of all metrics emitted prior to this file,
so the document doubles as a quick reference catalog. Per-version
provenance for baseline entries is not tracked; the file points at
git log on common/src/main/java/org/tron/common/prometheus/ for that.
warku123 added a commit to warku123/java-tron that referenced this pull request May 6, 2026
Add a single discovery point for Prometheus metric additions, changes,
and removals so node operators and dashboard maintainers do not need to
scan individual pull requests. Sectioning mirrors java-tron GitHub
Releases (Notice / New Features / ... grouped by module) for
consistency with the rest of the repository.

The 4.8.2 section records the two metrics introduced in tronprotocol#6624
(`tron:block_transaction_count`, `tron:sr_set_change`) and a
Pre-4.8.2 baseline snapshot of all metrics emitted prior to this file,
so the document doubles as a quick reference catalog. Per-version
provenance for baseline entries is not tracked; the file points at
git log on common/src/main/java/org/tron/common/prometheus/ for that.
@warku123 warku123 deleted the metric_modify_clean branch May 12, 2026 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Feature] Add Prometheus metrics for empty blocks and SR set changes

9 participants