[Metrics] Resolve accumulated TODOs in SDK core#7277
Conversation
Wrap ExemplarReservoir.Offer in try/catch so a buggy custom reservoir can't silently crash the measurement path. Surfaces the error as a new Warning event (open-telemetry#57). Log a Verbose diagnostic in MeasurementsCompleted when the state object isn't a MetricState, consistent with every other callback in the file. Log unknown instrument types that fall through to the default case in MonotonicDeltaTemporalityPreferenceFunc. Call TrimExcess() after releasing the creation lock in AddMetricWithViews to reclaim wasted list capacity
There was a problem hiding this comment.
Pull request overview
Resolves several long-standing TODOs in the metrics SDK core by improving exception safety around exemplars, adding missing verbose diagnostics, and reducing unnecessary allocations from over-provisioned metric lists.
Changes:
- Wraps user-supplied
ExemplarReservoir.Offercalls with exception handling and introduces a new Warning-levelEventSourceevent for failures. - Adds verbose diagnostics for unexpected state in
MeasurementsCompletedand for unrecognized instrument types in temporality preference selection. - Calls
TrimExcess()on the metrics list created inAddMetricWithViewsafter releasing the lock to reclaim unused capacity.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/OpenTelemetry/Metrics/Reader/MetricReaderExt.cs | Trims preallocated metrics list capacity after view processing completes. |
| src/OpenTelemetry/Metrics/Reader/MetricReader.cs | Adds verbose log when temporality preference sees an unexpected instrument type. |
| src/OpenTelemetry/Metrics/MetricPoint/MetricPoint.cs | Adds exception-safe exemplar offering via helper methods. |
| src/OpenTelemetry/Metrics/MeterProviderSdk.cs | Adds verbose log when MeasurementsCompleted receives an unexpected state type. |
| src/OpenTelemetry/Internal/OpenTelemetrySdkEventSource.cs | Adds new EventId 57 warning event for exemplar reservoir offer exceptions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7277 +/- ##
==========================================
- Coverage 89.94% 89.93% -0.02%
==========================================
Files 276 276
Lines 14007 14034 +27
==========================================
+ Hits 12599 12621 +22
- Misses 1408 1413 +5
Flags with carried forward coverage won't be shown. Click here to find out more.
|
martincostello
left a comment
There was a problem hiding this comment.
Also please review the Copilot review comments.
|
copilot feedback Code Review — PR #7277 [Metrics] Resolve accumulated TODOs in SDK core Consensus findings (both Opus 4.7 + Opus 4.6 agreed) — real issues
Where the models disagreed — I verified by reading the code
Unique to Opus 4.7
Plus useful verifications: event id 57 is unique; [MethodImpl(NoInlining)] is present (not relying on JIT); MonotonicDeltaTemporalityPreferenceFunc outer lambda remains static with no closure allocation; OpenTelemetrySdkEventSource is internal (no .publicApi/ update needed); the PR does include unit tests with ThrowingExemplarReservoir (correcting a wrong premise in my prompt). Model comparison notes
Recommended actions for the PR
|
…and logging - Drop ExemplarReservoirException event from Warning to Verbose — a throwing reservoir fires on every measurement, making Warning level impractical on hot paths where exemplar capture is best-effort anyway - Remove TrimExcess() from AddMetricWithViews — the caller immediately calls .ToArray() on the returned list, so the trim just allocates a backing array that is discarded on the next line - Align MeasurementsCompleted error logging with its siblings MeasurementRecordedLong/Double, which use MeasurementDropped at Warning for the same state mismatch condition - Add CHANGELOG entry for the exception-safety behaviour introduced by OfferExemplarSafe, which is a user-observable contract change
|
Thanks for the thorough review @rajkumar-rangaraj ExemplarReservoirException log level TrimExcess() MeasurementsCompleted severity CHANGELOG I'm really excited to see this merged and looking forward to contributing more to the project in the future. |
Hey everyone! 👋 I was going through the codebase and spotted a few TODOs that looked straightforward to fix, so I went ahead and took care of them. Thanks for maintaining such a great project and for everything you do for the open source community 🙏
Changes
Clears four long-standing
TODOcomments in the metrics SDK. Each change is small and self-contained; they are grouped together because they all touch the same area and share the same low risk profile.Exception safety for custom
ExemplarReservoirMetricPoint.UpdateExemplarcalledExemplarReservoir.Offerwith no protection. Any exception from a user-supplied reservoir would propagate up and corrupt the measurement path. The fix wraps the call in a try/catch and emits a newWarning-levelEventSourceevent (#57,ExemplarReservoirException). To preserveAggressiveInliningon the hot path, the guard (if (offerExemplar)) stays inlined inUpdateExemplarwhile the offer + exception handling moves into a non-inlinedOfferExemplarSafehelper.Diagnostic log in
MeasurementsCompletedWhen
MeasurementsCompletedreceives a state that isn't aMetricState, it now logs aVerbose-levelMeterProviderSdkEventbefore returning — consistent withMeasurementRecordedLongandMeasurementRecordedDoublein the same file.Log unknown instrument type in temporality preference
MonotonicDeltaTemporalityPreferenceFunchad a silent_ => AggregationTemporality.Deltadefault. It is now a block lambda that emits aVerboselog when an unrecognized instrument type hits that arm. Return behavior is unchanged.TrimExcess()afterAddMetricWithViewsThe
metricslist is pre-allocated to the maximum view count, but views can be skipped for several reasons (invalid name, existing metric,Dropconfig, limit exceeded).TrimExcess()is now called after the lock is released to reclaim unused capacity without extending lock hold time.Merge requirement checklist
CHANGELOG.mdfiles updated for non-trivial changes