feat(metrics): add bound instruments behind experimental feature flag#3421
feat(metrics): add bound instruments behind experimental feature flag#3421cijothomas merged 12 commits intoopen-telemetry:mainfrom
Conversation
db9517e to
fb725ae
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3421 +/- ##
=======================================
+ Coverage 83.7% 84.1% +0.3%
=======================================
Files 126 126
Lines 25386 26592 +1206
=======================================
+ Hits 21255 22364 +1109
- Misses 4131 4228 +97 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f8a03dc to
08649e8
Compare
|
#3420 is merged. @bryantbiggs can you rebase? The spec is progressing in parallel (open-telemetry/opentelemetry-specification#5050), so this is perfect time to get this merged! |
264009b to
e8cac00
Compare
91cb61a to
b0f4777
Compare
|
Pushed a round of fixes addressing all five comments; reply inline on the Two extras worth flagging that aren't tied to a comment: |
Add BoundCounter and BoundHistogram types that cache resolved aggregator references for a fixed attribute set. Created via Counter::bind() and Histogram::bind(), bound instruments bypass per-call attribute lookup for significant performance improvements (~28x for counters, ~9x for histograms). Architecture: - TrackerEntry.bound_count tracks live handles, preventing eviction - Direct/Fallback enum handles cardinality overflow gracefully - Unsupported aggregators (ExpoHistogram, LastValue, PrecomputedSum) fall back to unbound path instead of panicking All public API, traits, and internal plumbing are gated behind the experimental_metrics_bound_instruments feature flag. Includes 15 tests covering cumulative/delta temporality, overflow fallback, recovery after eviction, bound+unbound sharing, idle cycles, drop semantics, and empty attributes. Splits from open-telemetry#3392. Refs open-telemetry#1374.
cfg-gated imports must sort before non-gated imports of the same module per rustfmt's stable import ordering rules.
Add bound_count field to TrackerEntry to track live bound instrument handles. Entries with bound_count > 0 are never evicted during delta collection, ensuring bound handles always point to a live tracker. Moved from open-telemetry#3420 per review feedback — bound_count belongs with the bound instruments feature, not the delta collection refactor.
- Make no_attribute_tracker an Arc<TrackerEntry> so bind(&[]) returns a handle to the same tracker that measure(&[]) uses, preventing duplicate data points for empty attribute sets - Re-check bound_count under write lock in collect_and_reset() to prevent a TOCTOU race where bind() increments bound_count after collect() reads it as 0 under a concurrent read lock - Use Ordering::Release for has_been_updated stores in bound handles, matching the unbound measure() path
CI exercises bound instruments via --all-features, so the testing feature does not need to enable experimental_metrics_bound_instruments or pull in the tokio runtime additions.
Foundational OpenTelemetry guarantee is that telemetry must never crash the host application. A custom SyncInstrument impl that does not override bind() now returns a NoopBoundSyncInstrument and emits an otel_debug log, matching the silent-drop behavior of other default impls.
When bind() creates a new tracker, insert it under both the original and sorted attribute orderings, mirroring measure()'s insert pattern. Without this, an unbound measure() call with the same attrs in non-sorted order would miss the fast path and re-sort on every call when the bound side created the tracker first.
Switch from Box<dyn ...> to Arc<dyn ...> and derive Clone so a single bound state can be shared across threads or modules without re-binding. Drop semantics are unchanged: bound_count is decremented when the last clone is dropped, matching Rust's RAII expectations for handle types. Mirrors the existing Clone behavior on Counter and Histogram.
Tests:
* bound_histogram_empty_attributes_shares_with_unbound mirrors the
existing counter version for histograms.
* bound_counter_view_filters_attributes_at_bind_time and a histogram
analog confirm a view filters attributes at bind() time, with bound
and unbound recordings collapsing onto the same filtered data point.
Bench:
* Counter_Bound_With_View_Delta confirms view filtering happens at bind
time, leaving the hot path at unfiltered-bound speed. Updated bench
required-features and added a note that criterion does not enforce
regression on its own.
CHANGELOG entries for both crates updated to mention Clone, the noop
trait default, and the converged bound/unbound semantics.
a6ff02d to
de9da26
Compare
The previous Fallback path re-routed every bound.add() through the unbound measure() path, creating a ~25x perf cliff at the cardinality limit (~50ns vs ~1.8ns) and letting attribution drift across delta eviction cycles as space freed and refilled. ValueMap::bind() now looks up or lazily creates the overflow TrackerEntry at the limit and returns a direct handle to it. The bound handle writes directly to overflow via a single atomic update for its lifetime — perf parity with a normal bind. To recover after delta eviction frees space, drop the handle and rebind. Spec-aligned with open-telemetry/opentelemetry-specification#5050: overflow data lands in the otel.metric.overflow=true bucket (MUST) and per-call attribute lookup is bypassed (SHOULD). Bench (Apple M4 Max): Counter_Bound_AtOverflow_Delta: 1.82 ns Histogram_Bound_AtOverflow_Delta: 6.58 ns
Removes BoundFallbackHandle and the fallback parameter on Measure::bind, leaving every aggregator on the same direct-bind shape: bind() returns a BoundXxxHandle holding an Arc<TrackerEntry> that writes directly to the aggregator without per-call attribute lookup. LastValue, PrecomputedSum, and ExpoHistogram now match the Sum / Histogram pattern. PrecomputedSum's bind() is unreachable from user code today (async-only) but the impl exists so the trait is uniform and future Gauge / Observable bind() extensions are mechanical. The rare RwLock-poisoned case now yields a NoopBoundMeasure that drops measurements silently — mirroring measure()'s own poison handling. Test coverage added: - Bound ExponentialHistogram via View: delta with NaN/inf filter, bind-at-overflow attribution, persistence across delta eviction, view filter applied at bind time, and cumulative accumulation. - Cumulative bind-at-overflow tests for Counter and Histogram. - LastValue::bind and PrecomputedSum::bind unit tests exercising Measure / BoundMeasure traits and bound_count Drop semantics. - ValueMap::bind under a poisoned trackers RwLock returns None so the caller produces a NoopBoundMeasure rather than panicking.
de9da26 to
1f02cd8
Compare
cijothomas
left a comment
There was a problem hiding this comment.
LGTM!
Thanks for working on this. Spec PR is still pending, but give this is gated under feature-flag, okay to merge. I
ts aligned with current spec PR.
Splits from #3392, per maintainer feedback. Built on top of #3420 (in-place delta collection, now merged). Refs #1374.
Summary
Adds
BoundCounterandBoundHistogramto the public API behind theexperimental_metrics_bound_instrumentsfeature flag. These types cache the resolved aggregator reference for a fixed attribute set, allowing subsequent measurements to bypass per-call sort, dedup, hash, and HashMap lookup entirely.Architecture
TrackerEntry bound_count
bound_count(introduced in #3420) tracks how many liveBoundCounter/BoundHistogramhandles reference an entry. Entries withbound_count > 0are never evicted during delta collection.Cardinality overflow handling
ValueMap::bind()returnsOption>—Nonewhen at the cardinality limit. Bound handle types use aDirect/Fallbackenum:Measure::call()pathFeature flag
All public API, traits, noop impls, and internal plumbing are gated behind
experimental_metrics_bound_instruments.Benchmark Results
Apple M4 Max, 16 cores (12 performance + 4 efficiency), macOS 15.4:
EC2 Counter Results (from #3392)
EC2 Histogram Results (from #3392)
Test Coverage
15 tests covering:
Test plan