Enable self-adjusting (native) histogram buckets for all duration metrics#1660

Merged
adecaro merged 2 commits into
hyperledger-labs:mainfrom
Soumya8898:Soumya8898/self-adjusting-histogram-buckets
May 8, 2026

@Soumya8898 Soumya8898 commented May 5, 2026

Summary

Closes #1274

This PR enables Prometheus native histogram collection on all duration-based histogram metrics across the Token SDK. Native histograms use exponentially spaced buckets that are created on demand as observations arrive, rather than being fixed up front. This gives significantly better percentile accuracy (especially p99/p999) without requiring operators to hand-tune bucket boundaries.

What was done

  • Defined a defaultNativeHistogramBucketFactor constant (1.1), which corresponds to schema=3 exponential bucketing (bucket boundaries grow by a factor of 2^(1/8), about 9%).
  • Set NativeHistogramBucketFactor: 1.1 and NativeHistogramMaxBucketNumber: 100 on every duration-based HistogramOpts across all SDK services:
    • TTX service (endorsement, audit approval, ordering durations)
    • TTX finality (on-status duration)
    • Auditor (audit duration, append duration)
    • Certifier (certification request duration)
    • Sherdlock selector (selection duration)
    • Fabricx finality queue (processing duration)
    • ZKATDLog (ZK issue and transfer proof durations)
  • Intentionally excluded the ImmediateRetries histogram, since it tracks a discrete integer distribution (0–5) where native histograms provide no benefit.
  • The configuration runs in dual mode: both classic fixed buckets and native exponential buckets are emitted simultaneously, so existing dashboards and alerts continue working without any changes.
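
The list above can be sketched in Go as follows. This is only an illustration: the `HistogramOpts` struct here is a local stand-in, not the FSC type, and only the two `NativeHistogram*` field names and the values 1.1 and 100 come from this PR. The metric names, the helper `withNativeBuckets`, and the max-bucket constant name are assumptions.

```go
package main

import "fmt"

// HistogramOpts is a hypothetical stand-in for the SDK's histogram options.
// Only the two NativeHistogram* field names come from the PR description;
// the rest of the struct is assumed for illustration.
type HistogramOpts struct {
	Name                           string
	Help                           string
	Buckets                        []float64 // classic fixed bucket upper bounds
	NativeHistogramBucketFactor    float64   // growth factor between native bucket boundaries
	NativeHistogramMaxBucketNumber uint32    // cap on the number of native buckets
}

const (
	defaultNativeHistogramBucketFactor    = 1.1
	defaultNativeHistogramMaxBucketNumber = 100 // constant name is an assumption
)

// withNativeBuckets returns a copy of opts with the native settings applied.
// The classic Buckets are left untouched, so both representations stay configured.
func withNativeBuckets(opts HistogramOpts) HistogramOpts {
	opts.NativeHistogramBucketFactor = defaultNativeHistogramBucketFactor
	opts.NativeHistogramMaxBucketNumber = defaultNativeHistogramMaxBucketNumber
	return opts
}

func main() {
	// A duration metric (hypothetical name): classic and native buckets together.
	endorse := withNativeBuckets(HistogramOpts{
		Name:    "endorsement_duration_seconds",
		Help:    "Time spent collecting endorsements",
		Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10},
	})

	// A small discrete distribution (0-5): classic buckets only, native left at zero.
	retries := HistogramOpts{
		Name:    "immediate_retries",
		Help:    "Number of immediate retries",
		Buckets: []float64{0, 1, 2, 3, 4, 5},
	}

	fmt.Printf("%s: classic=%d factor=%.1f max=%d\n",
		endorse.Name, len(endorse.Buckets),
		endorse.NativeHistogramBucketFactor, endorse.NativeHistogramMaxBucketNumber)
	fmt.Printf("%s: classic=%d factor=%.1f (native not configured)\n",
		retries.Name, len(retries.Buckets), retries.NativeHistogramBucketFactor)
}
```

Keeping the defaults in one helper is a design choice of this sketch, not something the PR states; the PR sets the two fields on each `HistogramOpts` literal.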

Why native histograms

Fixed buckets (e.g., [0.005, 0.01, 0.025, ...]) create blind spots. If a real p99 sits at 7.3s but the nearest bucket boundaries are 5s and 10s, Prometheus can only interpolate a value somewhere between 5s and 10s. Native histograms narrow this gap by using fine-grained exponential boundaries (about 9% apart at a factor of 1.1) that cover the full value range without configuration.
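
The 7.3s example can be worked through numerically. The sketch below assumes the standard exponential schemas, where schema n has boundaries at powers of 2^(2^-n), and picks the coarsest schema whose growth factor does not exceed the requested factor. The function names are illustrative and are not taken from the SDK or from the Prometheus client.

```go
package main

import (
	"fmt"
	"math"
)

// pickSchema returns the coarsest standard schema whose growth factor
// 2^(2^-schema) is less than or equal to bucketFactor, clamped to [-4, 8].
// It assumes bucketFactor > 1.
func pickSchema(bucketFactor float64) int {
	floor := math.Floor(math.Log2(math.Log2(bucketFactor)))
	switch {
	case floor <= -8:
		return 8
	case floor >= 4:
		return -4
	default:
		return -int(floor)
	}
}

// nativeBucket returns the (lower, upper] exponential bucket containing v
// for the given schema. It assumes v > 0.
func nativeBucket(v float64, schema int) (lower, upper float64) {
	scale := math.Exp2(float64(schema)) // number of buckets per power of two
	idx := math.Ceil(math.Log2(v) * scale)
	return math.Exp2((idx - 1) / scale), math.Exp2(idx / scale)
}

// classicBucket returns the (lower, upper] fixed bucket containing v,
// given sorted upper bounds. Values above the last bound land in +Inf.
func classicBucket(v float64, bounds []float64) (lower, upper float64) {
	for _, b := range bounds {
		if v <= b {
			return lower, b
		}
		lower = b
	}
	return lower, math.Inf(1)
}

func main() {
	schema := pickSchema(1.1)
	cl, cu := classicBucket(7.3, []float64{1, 2.5, 5, 10})
	nl, nu := nativeBucket(7.3, schema)

	fmt.Printf("schema=%d\n", schema)
	fmt.Printf("classic bucket: (%.3f, %.3f]\n", cl, cu)
	fmt.Printf("native bucket:  (%.3f, %.3f]\n", nl, nu)
}
```

With a factor of 1.1 this selects schema 3, and the observation at 7.3s falls in a native bucket spanning roughly 6.73s to 7.34s, compared with the 5s to 10s classic bucket.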

@Soumya8898 Soumya8898 force-pushed the Soumya8898/self-adjusting-histogram-buckets branch from 2251399 to 51f3f90 Compare May 5, 2026 21:26
@Soumya8898
Contributor Author

Hi @adecaro
I’ve raised the PR with the changes, but it depends on a PR raised in FSC. Once that is reviewed and released in the next version, we’ll be able to use it.

For local testing with the FSC changes, I made some temporary modifications, which is why there are merge conflicts. Should we wait for the FSC PR to be released before proceeding?

I’ve also added steps outlining what to follow after the FSC release. In the meantime, would you prefer that we remove the local changes and keep only the required updates, then merge once the release is available? Or is there another approach you’d suggest?

adecaro commented May 6, 2026

Hi @Soumya8898, thanks for the effort 🙏
I have reviewed and approved the PR on the smart client. Let's wait for @mbrandenburger to do a second check, and then we should be good to go 😄

adecaro commented May 6, 2026

Hi @Soumya8898, hyperledger-labs/fabric-smart-client#1398 has been merged.
Please update the PR. Many thanks 🙏

Soumya8898 commented May 6, 2026

Yes, it's merged. I'll update the PR. Thanks!

@Soumya8898 Soumya8898 force-pushed the Soumya8898/self-adjusting-histogram-buckets branch 3 times, most recently from ff70dc1 to f2fabc4 Compare May 6, 2026 18:33
Soumya8898 commented May 6, 2026

Hi @adecaro, I have a small question about how FSC and the token-sdk work together. Since there is no tagged FSC release that includes the merged PR hyperledger-labs/fabric-smart-client#1398, I pinned the FSC dependency to a pseudo-version that targets the exact commit. Is this the correct approach, or do I need to change it?

adecaro commented May 7, 2026

> Hi @adecaro, I have a small question about how FSC and the token-sdk work together. Since there is no tagged FSC release that includes the merged PR hyperledger-labs/fabric-smart-client#1398, I pinned the FSC dependency to a pseudo-version that targets the exact commit. Is this the correct approach, or do I need to change it?

Targeting the exact commit is fine. When a new tag appears in FSC, we will update the token-sdk again and cut a tag too 😄

@adecaro adecaro self-requested a review May 7, 2026 05:37
@adecaro adecaro self-assigned this May 7, 2026
@adecaro adecaro added the metrics label May 7, 2026
@adecaro adecaro added this to the Q2/26 milestone May 7, 2026
@adecaro adecaro left a comment

Hi @Soumya8898 , great work. Thanks a lot for the effort 🙏

I'm wondering if, in another PR, we could offer the developer a way to customize these numbers. Just wondering 😅

adecaro commented May 7, 2026

I think there are other packages we can update:

[Screenshot 2026-05-07 at 07 40 00]

Please have a look 🙏

@adecaro adecaro self-requested a review May 7, 2026 05:40
@adecaro adecaro force-pushed the Soumya8898/self-adjusting-histogram-buckets branch from f2fabc4 to a6105fd Compare May 7, 2026 06:14
@Soumya8898
Contributor Author

> Hi @Soumya8898 , great work. Thanks a lot for the effort 🙏
>
> I'm wondering if, in another PR, we could offer the developer a way to customize these numbers. Just wondering 😅

@adecaro That's a nice idea 😊 Let me work on that. One thing I'm thinking is that developers need to have some understanding of how things work in the Native Histograms case. So I'm considering adding a small amount of context around it, along with customization options for choosing the variables. I'll create another PR after implementing that.

@Soumya8898 Soumya8898 force-pushed the Soumya8898/self-adjusting-histogram-buckets branch from a6105fd to 60d635d Compare May 7, 2026 09:00
Soumya8898 commented May 7, 2026

> I think there are other packages we can update:
>
> [Screenshot 2026-05-07 at 07 40 00]
>
> Please have a look 🙏

token/core/common/metrics - This is just a type alias file. It re-exports HistogramOpts from FSC. No actual histogram definitions/instances here. Nothing to change.

token/services/selector/sherdlock/mocks - These are auto-generated mock/fake files (counterfeiter). They reference HistogramOpts type in function signatures but don't define any histograms. The mocks will automatically pick up the new fields since HistogramOpts is a type alias. Nothing to change.

I updated all the other packages.
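
The point about the type alias can be illustrated with a single-file sketch. Here `upstreamHistogramOpts` stands in for the struct defined in FSC, and `describe` stands in for a generated mock signature; both names are hypothetical. Because an alias declares no new type, any field added upstream is visible through the alias without touching the aliasing package or the mocks.

```go
package main

import "fmt"

// upstreamHistogramOpts is a hypothetical stand-in for the options struct
// that lives in the upstream (FSC) package.
type upstreamHistogramOpts struct {
	Name                        string
	NativeHistogramBucketFactor float64 // field added upstream
}

// HistogramOpts is an alias, not a new type: it is identical to
// upstreamHistogramOpts, including any fields added later.
type HistogramOpts = upstreamHistogramOpts

// describe takes the alias type, the way a generated mock signature would.
func describe(o HistogramOpts) string {
	return fmt.Sprintf("%s factor=%.1f", o.Name, o.NativeHistogramBucketFactor)
}

func main() {
	up := upstreamHistogramOpts{Name: "selection_duration", NativeHistogramBucketFactor: 1.1}
	// No conversion is needed: the alias and the upstream type are the same type.
	fmt.Println(describe(up)) // prints "selection_duration factor=1.1"
}
```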

@adecaro adecaro force-pushed the Soumya8898/self-adjusting-histogram-buckets branch from ecb99b9 to 93f85d8 Compare May 7, 2026 14:20
@adecaro adecaro left a comment

LGTM

adecaro commented May 7, 2026

Many thanks, @Soumya8898, for double-checking. I'll merge as soon as the PR checks give the okay 🙏

Soumya Mohapatra added 2 commits May 7, 2026 16:59
Enable Prometheus native histogram collection (dual mode) by setting
NativeHistogramBucketFactor: 1.1 on all duration-based histogram
definitions across the SDK services.

This provides exponentially-spaced buckets (schema=3, ~9% growth per
bucket) alongside the existing fixed buckets, improving percentile
accuracy without breaking existing monitoring setups.

Affected services:
- token/services/ttx (endorsement, audit approval, ordering durations)
- token/services/ttx/finality (on-status duration)
- token/services/auditor (audit, append durations)
- token/services/certifier/interactive (certification request duration)
- token/services/selector/sherdlock (selection duration)
- token/services/network/fabricx/finality/queue (processing duration)
- token/core/zkatdlog/nogh/v1 (ZK issue and transfer proof durations)

The ImmediateRetries histogram (discrete integer distribution 0-5)
is intentionally excluded as native histograms provide no benefit
for small discrete distributions.

Depends-On: hyperledger-labs/fabric-smart-client#XXXX

Signed-off-by: Soumya8898 <soumyaranjanmohapatra784@gmail.com>
Signed-off-by: Soumya Mohapatra <mohapatras@microsoft.com>

Remove the temporary local replace directive and update the dependency
to v0.10.2-0.20260506093942-1274969d717d which includes native histogram
support in HistogramOpts (hyperledger-labs/fabric-smart-client#1398).

Signed-off-by: Soumya Mohapatra <mohapatras@microsoft.com>
@adecaro adecaro force-pushed the Soumya8898/self-adjusting-histogram-buckets branch from 93f85d8 to 95489bc Compare May 7, 2026 14:59
@adecaro adecaro merged commit 4fc4ef2 into hyperledger-labs:main May 8, 2026
94 checks passed
@Soumya8898 Soumya8898 deleted the Soumya8898/self-adjusting-histogram-buckets branch May 8, 2026 05:50
SurbhiAgarwal1 pushed a commit to SurbhiAgarwal1/fabric-token-sdk that referenced this pull request May 13, 2026
…rics (hyperledger-labs#1660)

Signed-off-by: Soumya8898 <soumyaranjanmohapatra784@gmail.com>
Signed-off-by: Soumya Mohapatra <mohapatras@microsoft.com>
Co-authored-by: Soumya Mohapatra <mohapatras@microsoft.com>
HayimShaul pushed a commit that referenced this pull request May 14, 2026
…rics (#1660)

Signed-off-by: Soumya8898 <soumyaranjanmohapatra784@gmail.com>
Signed-off-by: Soumya Mohapatra <mohapatras@microsoft.com>
Co-authored-by: Soumya Mohapatra <mohapatras@microsoft.com>
HayimShaul pushed a commit that referenced this pull request May 14, 2026
…rics (#1660)

Signed-off-by: Soumya8898 <soumyaranjanmohapatra784@gmail.com>
Signed-off-by: Soumya Mohapatra <mohapatras@microsoft.com>
Co-authored-by: Soumya Mohapatra <mohapatras@microsoft.com>
Signed-off-by: Hayim.Shaul@ibm.com <hayimsha@fhe03.vpc.cloud9.ibm.com>
Linked issue: histogram: investigate the use of self adjusting buckets