[Metrics] Implement cumulative aggregation for async Long and Double counters in MetricPoint#7227
[Metrics] Implement cumulative aggregation for async Long and Double counters in MetricPoint#7227nabutabu wants to merge 12 commits into
Conversation
…counters in MetricPoint
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7227 +/- ##
==========================================
- Coverage 89.86% 89.85% -0.01%
==========================================
Files 276 276
Lines 14007 14023 +16
==========================================
+ Hits 12588 12601 +13
- Misses 1419 1422 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
|
|
|
||
| case AggregationType.LongSumIncomingCumulative: | ||
| { | ||
| Interlocked.Add(ref this.runningValue.AsLong, number); |
There was a problem hiding this comment.
This is not correct. If incoming is cumulative, then it is already the sum. We (i.e SDK) shouldn't add it.
cijothomas
left a comment
There was a problem hiding this comment.
See
https://github.com/open-telemetry/opentelemetry-dotnet/pull/7227/changes#r3168921622
Let me know if I misunderstood the fix.
… Double counters in MetricPoint" This reverts commit 76d6a88.
|
Fix was incorrect and the issue was that I did not take into account a couple of things which the failing tests also pointed out. Changing Basically if a MetricPoint is in state CollectPending and is IsAsynchronous then the value sent by |
| ref var metricPoint = ref this.metricPoints[index]; | ||
| if (metricPoint.MetricPointStatus == MetricPointStatus.CollectPending) | ||
| { | ||
| value += metricPoint.GetRunningValueLong(); |
There was a problem hiding this comment.
The idea of modifying the incoming value to achieve "spatial aggregation" is good - but it does create incorrect Exemplar values.
It is not that bad - as Exemplars have limited utility in Observable instruments anyway and Exemplars are disabled also by-default for Observable. We should see if the fix can keep Exemplars correct as well. (Exemplar should get the original unmodified value, but Metricpoint should get the updated one..)
There was a problem hiding this comment.
Read more about Exemplars and this does complicate this issue a bit. Blurting out my understanding of what's happening for future me:
Exemplars are defined as: "access to the raw measurement value, time stamp when measurement was made, and trace context", which means each exemplar recording should be the raw reading instead of the cumulative value that the (non-exemplar) async cumulative instrument uses.
This is important because each exemplar metric recording is also associated with a trace. If exemplars stored aggregated values, we'd lose the connection to individual traces for each recording. The exemplar's purpose is to preserve the context of specific measurements that contribute to the aggregate, which this current change loses.
I'll come back when I have more on how we could fix this.
- First thought is to add a parameter to
UpdateWithExemplarinMetricPoint-> "number" would be the cumulative value and the new parameter is the measured value. - MetricPoint does already have access to its own running value, maybe we could just subtract that when calling UpdateExemplar, this would need to happen before
Interlocked.Addis called on the running value. Also would this work for every kind of instrument?
| } | ||
|
|
||
| [Theory(Skip = "https://github.com/open-telemetry/opentelemetry-specification/issues/1874")] | ||
| [Theory] |
There was a problem hiding this comment.
I added just one test to prove that we don't do spatial-aggregation in this SDK. We need much more coverage when fixing the issue. Covering delta/cumulative, multiple cycles, various view combinations.
cijothomas
left a comment
There was a problem hiding this comment.
The current iteration looks very promising (thanks! This was indeed a hard problem!).
Requesting changes to address below comment
https://github.com/open-telemetry/opentelemetry-dotnet/pull/7227/changes#r3183554865
Also, we need to vastly improve test coverage when doing this fix- lot of potential edge cases. (I spotted only one about Exemplar value, but could be more)
…ng and double metrics
|
@cijothomas I added as many tests as I could think of, but I am not sure if this is all that needs coverage. Would appreciate any guidance on any other nuances of the measurements that could potentially be affected! |
Fixes #7215
Design discussion issue NA
Changes
Use
InterlockedHelper.Addin Update calls and reset aggregation values inTakeSnapshotusingInterlocked.Exchange(ref this.runningValue.AsLong, 0);Merge requirement checklist
CHANGELOG.mdfiles updated for non-trivial changes