feat(metrics): add block transaction count and SR set change monitoring#6624
Conversation
…monitoring Replace the dedicated tron:block_empty_total counter with a more comprehensive tron:block_transaction_count histogram that tracks the distribution of transaction counts per block. Changes: - Add overloaded init() method in MetricsHistogram to support custom buckets - Add BLOCK_TRANSACTION_COUNT histogram with buckets [0, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000] - Record transaction count for all blocks (including empty blocks with txCount=0) - Empty blocks can be queried via bucket le=0.0 - Remove unused BLOCK_EMPTY counter This provides richer insights for network analysis while still supporting empty block monitoring via histogram bucket queries. Closes tronprotocol#6590
Replace duplicated miner string literal with MINER_LABEL constant to fix SonarQube code smell. Relates tronprotocol#6624
- Replace duplicated miner string literal with MINER_LABEL constant in PrometheusApiServiceTest to fix SonarQube code smell - Add //NOSONAR comment for new BLOCK_TRANSACTION_COUNT histogram label name in MetricsHistogram Relates tronprotocol#6624
4433c52 to
93d0083
Compare
Extract repeated 'miner' string literal into MINER_LABEL constant in MetricsHistogram to follow DRY principle and fix SonarQube warning. Also update PrometheusApiServiceTest to use MINER_LABEL constant for test assertions. Relates tronprotocol#6624
93d0083 to
6315ffc
Compare
Extract repeated 'miner' string literal into MINER_LABEL constant in MetricsHistogram to follow DRY principle and fix SonarQube warning. Also update PrometheusApiServiceTest to use MINER_LABEL constant for test assertions. Relates tronprotocol#6624
6315ffc to
e92b8e2
Compare
| init(MetricKeys.Counter.TXS, "tron txs info .", "type", "detail"); | ||
| init(MetricKeys.Counter.MINER, "tron miner info .", "miner", "type"); | ||
| init(MetricKeys.Counter.BLOCK_FORK, "tron block fork info .", "type"); | ||
| init(MetricKeys.Counter.SR_SET_CHANGE, "tron sr set change .", "action", "witness"); |
There was a problem hiding this comment.
The witness label uses addresses. Although the Active SR pool is relatively limited (27 slots), as SRs rotate in and out over time, label values will accumulate monotonically — they only grow, never shrink. This is a known Prometheus anti-pattern.
Practical impact assessment:
-
TRON's SR candidate pool is not infinite, and rotation frequency is low, so it won't explode in the short term
-
However, as a
long-running node metric, months or years of operation could still accumulate a significant number of time series
Possible approaches:
-
If the address dimension is truly needed, keep the current design, but document this characteristic
-
Or remove the
witnesslabel entirely, keeping onlyaction(add/remove), and output address details via logs instead. Follow the principle thatPrometheus should be used for alerting and awareness, while logs are for precise investigation and root-cause analysis.
There was a problem hiding this comment.
Thanks for raising the high-cardinality concern — it's a valid Prometheus best practice to watch for.
However, I believe the witness label is acceptable here given TRON's specific constraints:
- Bounded candidate pool: The SR candidate pool is finite. Even over years of operation, the total number of distinct addresses that have ever held an SR slot has a practical ceiling — far from the
unbounded growth typical of high-cardinality anti-patterns. - Low rotation frequency: SR set changes only occur at maintenance intervals (every 6 hours), so new label values are introduced very infrequently.
- Diagnostic value: The primary purpose of this metric is to identify which SR was added or removed. Dropping the witness label would reduce the metric to just "something changed", requiring operators to
cross-reference logs — losing the at-a-glance visibility in Grafana dashboards that makes this metric useful in the first place.
In summary, the worst-case time series count is roughly candidate_pool_size × 2 (add/remove), which stays well within Prometheus's comfortable operating range.
That said, I agree it's worth documenting this characteristic. Could you suggest what form you'd prefer — a code comment at the metric registration site, a note in the PR description, or something else?
There was a problem hiding this comment.
prefer to a code comment at the metric registration site.
There was a problem hiding this comment.
@halibobo1205 Added a comment at the SR_SET_CHANGE constant definition in MetricKeys.java (commit 96bf315) explaining the cardinality rationale:
witness label: bounded cardinality -- SR candidate pool is finite, rotation is infrequent (at most once per maintenance interval); kept for at-a-glance SR identification in dashboards rather than requiring log cross-referencing.
- Extract MINER_LABEL to MetricLabels.Histogram.MINER as shared constant - Use int instead of double for block count in tests - Store witness addresses as base58 directly, removing unnecessary hex round-trip Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The SR set change detection used to live in BlockChainMetricManager.applyBlock and inferred SR set transitions from nextMaintenanceTime changes plus a diff against a cached HashSet. That approach was indirect (re-derived state already known at the source) and required a fork-rollback guard. Hoist the recording into the consensus layer where the transition is explicit: MaintenanceManager.doMaintenance already computes currentWits and newWits when the active witness set changes, so it can call SrSetChangeMetric.record(currentWits, newWits) directly. The new helper lives in the common module so both consensus and framework can depend on it without inverting the existing module hierarchy. Tests: SrSetChangeMetricTest exercises SR_ADD via the real doMaintenance() flow, plus the no-vote / metrics-disabled / SR_REMOVE branches. PrometheusApiServiceTest drops the obsolete SR assertions.
- move BLOCK_TRANSACTION_COUNT histogram recording from BlockChainMetricManager.applyBlock to Manager.applyBlock so that witness-produced blocks are also sampled (previously only peer-received blocks went through BlockChainMetricManager) - rename SRMetrics.record() to recordSrSetChange() to leave room for additional SR-related metrics
Record the histogram at the start of pushBlock (inside the synchronized block, before any early returns) so abnormal pushes — duplicate, stale, fork-switched, or merkle-failed blocks — are also observable.
halibobo1205
left a comment
There was a problem hiding this comment.
TODO
- Adding a METRICS_CHANGELOG.md to the repo to track metric additions, modifications, and deprecations — future PRs can maintain it as well
- Including a metrics change entry in release notes
- Filing a follow-up issue/TODO to update the official metrics documentation and the Grafana dashboard template
Add two new panel sections to the java-tron-server Grafana dashboard to support metrics from tronprotocol/java-tron#6624: - Block Metrics: empty block rate, per-SR empty block count, avg transactions per block, transaction count distribution - SR Set Change: event table and cumulative totals by action
Add a single discovery point for Prometheus metric additions, changes, and removals so node operators and dashboard maintainers do not need to scan individual pull requests. Sectioning mirrors java-tron GitHub Releases (Notice / New Features / ... grouped by module) for consistency with the rest of the repository. The 4.8.2 section records the two metrics introduced in tronprotocol#6624 (`tron:block_transaction_count`, `tron:sr_set_change`) and a Pre-4.8.2 baseline snapshot of all metrics emitted prior to this file, so the document doubles as a quick reference catalog. Per-version provenance for baseline entries is not tracked; the file points at git log on common/src/main/java/org/tron/common/prometheus/ for that.
Add a single discovery point for Prometheus metric additions, changes, and removals so node operators and dashboard maintainers do not need to scan individual pull requests. Sectioning mirrors java-tron GitHub Releases (Notice / New Features / ... grouped by module) for consistency with the rest of the repository. The 4.8.2 section records the two metrics introduced in tronprotocol#6624 (`tron:block_transaction_count`, `tron:sr_set_change`) and a Pre-4.8.2 baseline snapshot of all metrics emitted prior to this file, so the document doubles as a quick reference catalog. Per-version provenance for baseline entries is not tracked; the file points at git log on common/src/main/java/org/tron/common/prometheus/ for that.
What does this PR do?
Add two new metrics enhancements for better network monitoring:
Block Transaction Count Histogram
tron:block_empty_totalcounter withtron:block_transaction_counthistogramle=0.0bucketSR Set Change Monitoring
tron:sr_set_changecounter to track SR membership changesaction(add/remove),witness(SR address)Changes:
init()method inMetricsHistogramfor custom bucketsBLOCK_TRANSACTION_COUNThistogram metricSR_SET_CHANGEcounter withSR_ADD/SR_REMOVElabelsBlockChainMetricManagerPrometheusApiServiceTestwith comprehensive test coverageWhy are these changes required?
Compared to #6602, this PR provides a cleaner implementation focused solely on metrics enhancement without additional complexity.
This PR has been tested by:
PrometheusApiServiceTest)Follow up
N/A
Extra details