Enable self-adjusting (native) histogram buckets for all duration metrics #1660
Hi @adecaro, for local testing with the FSC changes I made some temporary modifications, which is why there are merge conflicts. Should we wait for the FSC PR to be released before proceeding? I've also added steps outlining what to do after the FSC release. In the meantime, would you prefer that we remove the local changes and keep only the required updates, then merge once the release is available? Or is there another approach you'd suggest?
Hi @Soumya8898, thanks for the effort 🙏
Hi @Soumya8898, hyperledger-labs/fabric-smart-client#1398 has been merged.
Yes, it's merged. I will update the PR. Thanks
Hi @adecaro, I have a small question about how things work between FSC and token-sdk. Since there is no tagged FSC release that includes the merged PR hyperledger-labs/fabric-smart-client#1398, I pinned the FSC dependency to a pseudo-version that targets the exact commit. Is this the correct approach, or should I change it?
Targeting the exact commit is fine. When a new tag appears in FSC, we will update the token-sdk again and cut a tag too 😄
adecaro
left a comment
Hi @Soumya8898 , great work. Thanks a lot for the effort 🙏
I'm wondering if, in another PR, we could offer the developer a way to customize these numbers. Just wondering 😅
@adecaro That's a nice idea 😊 Let me work on that. One consideration is that developers need some understanding of how native histograms work, so I'm thinking of adding a small amount of context around it, along with customization options for these parameters. I'll create another PR after implementing that.
Thanks a lot @Soumya8898 for double-checking. I'll merge as soon as the PR gets the okay 🙏
Enable Prometheus native histogram collection (dual mode) by setting NativeHistogramBucketFactor: 1.1 on all duration-based histogram definitions across the SDK services. This provides exponentially-spaced buckets (schema=3, ~9% growth per bucket) alongside the existing fixed buckets, improving percentile accuracy without breaking existing monitoring setups.

Affected services:
- token/services/ttx (endorsement, audit approval, ordering durations)
- token/services/ttx/finality (on-status duration)
- token/services/auditor (audit, append durations)
- token/services/certifier/interactive (certification request duration)
- token/services/selector/sherdlock (selection duration)
- token/services/network/fabricx/finality/queue (processing duration)
- token/core/zkatdlog/nogh/v1 (ZK issue and transfer proof durations)

The ImmediateRetries histogram (discrete integer distribution 0-5) is intentionally excluded as native histograms provide no benefit for small discrete distributions.

Depends-On: hyperledger-labs/fabric-smart-client#XXXX
Signed-off-by: Soumya8898 <soumyaranjanmohapatra784@gmail.com>
Signed-off-by: Soumya Mohapatra <mohapatras@microsoft.com>
Remove the temporary local replace directive and update the dependency to v0.10.2-0.20260506093942-1274969d717d which includes native histogram support in HistogramOpts (hyperledger-labs/fabric-smart-client#1398). Signed-off-by: Soumya Mohapatra <mohapatras@microsoft.com>
…rics (hyperledger-labs#1660) Signed-off-by: Soumya8898 <soumyaranjanmohapatra784@gmail.com> Signed-off-by: Soumya Mohapatra <mohapatras@microsoft.com> Co-authored-by: Soumya Mohapatra <mohapatras@microsoft.com>
…rics (#1660) Signed-off-by: Soumya8898 <soumyaranjanmohapatra784@gmail.com> Signed-off-by: Soumya Mohapatra <mohapatras@microsoft.com> Co-authored-by: Soumya Mohapatra <mohapatras@microsoft.com>
…rics (#1660) Signed-off-by: Soumya8898 <soumyaranjanmohapatra784@gmail.com> Signed-off-by: Soumya Mohapatra <mohapatras@microsoft.com> Co-authored-by: Soumya Mohapatra <mohapatras@microsoft.com> Signed-off-by: Hayim.Shaul@ibm.com <hayimsha@fhe03.vpc.cloud9.ibm.com>


Summary
Closes #1274
This PR enables Prometheus native histogram collection on all duration-based histogram metrics across the Token SDK. Native histograms use exponentially-spaced buckets that self-adjust to the observed values, providing significantly better percentile accuracy (especially p99/p999) without requiring operators to hand-tune bucket boundaries.
What was done
- Added a defaultNativeHistogramBucketFactor constant (1.1) representing schema=3 exponential bucketing (~9% growth between bucket boundaries).
- Set NativeHistogramBucketFactor: 1.1 and NativeHistogramMaxBucketNumber: 100 on every duration-based HistogramOpts across all SDK services.
- Excluded the ImmediateRetries histogram since it tracks a discrete integer distribution (0–5) where native histograms provide no benefit.
Why native histograms
Fixed buckets (e.g., [0.005, 0.01, 0.025, ...]) create blind spots. If a real p99 sits at 7.3s but the nearest bucket boundaries are 5s and 10s, Prometheus can only report an estimate somewhere between 5s and 10s. Native histograms largely remove this problem by using fine-grained exponential boundaries that cover the full value range without hand-tuned configuration.