Skip to content

[ibm-mq-metrics] move to produce metrics via MetricProducer#2836

Merged
breedx-splk merged 15 commits into
open-telemetry:mainfrom
atoulme:producer
May 22, 2026
Merged

[ibm-mq-metrics] move to produce metrics via MetricProducer#2836
breedx-splk merged 15 commits into
open-telemetry:mainfrom
atoulme:producer

Conversation

@atoulme
Copy link
Copy Markdown
Contributor

@atoulme atoulme commented May 15, 2026

Description:

Move away from the meter, and use the SDK API to create a MetricProducer to produce data points to be directly sent to the backend.

Copilot AI review requested due to automatic review settings May 15, 2026 07:02
@atoulme atoulme requested a review from a team as a code owner May 15, 2026 07:02
@github-actions github-actions Bot requested a review from breedx-splk May 15, 2026 07:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the OpenTelemetry Meter-based instrumentation in the ibm-mq-metrics module with a custom MetricProducer implementation that builds MetricData directly and registers itself with the SDK's MeterProvider via registerMetricProducer. The Metrics.java weaver template (which generated Meter-based factory methods) is replaced by a MetricProducer.java template that generates one record* method per metric, each appending a fully-formed MetricData to an internal list that the SDK drains on each collection cycle.

Changes:

  • New MetricProducer and (auto-value-backed) MetricData classes, plus updated weaver template/config; Metrics.java.j2 removed.
  • All collectors (ChannelMetricsCollector, QueueManagerMetricsCollector, QueueCollectionBuddy, etc.), WmqMonitor, and Main are reworked to take a MetricProducer instead of a Meter and call producer.record* directly.
  • All unit/integration tests are migrated off OpenTelemetryExtension/Meter to construct a MetricProducer and assert on producer.produce(Resource.empty()).

Reviewed changes

Copilot reviewed 28 out of 29 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
ibm-mq-metrics/templates/registry/java/weaver.yaml Switches generation to the new MetricProducer.java.j2 template.
ibm-mq-metrics/templates/registry/java/Metrics.java.j2 Deleted (replaced by MetricProducer.java.j2).
ibm-mq-metrics/templates/registry/java/MetricProducer.java.j2 New template generating one record* method per metric.
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java Generated producer class implementing the SDK MetricProducer SPI.
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricData.java AutoValue-backed MetricData with a createMetricData factory.
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/WmqMonitor.java Drops Meter/Long* fields, uses producer directly for heartbeat & connection-error metrics.
ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/opentelemetry/Main.java Wires the producer into the auto-configured SDK via registerMetricProducer.
ibm-mq-metrics/src/main/java/.../metricscollector/ChannelMetricsCollector.java Replaces gauge fields with producer recorder calls.
ibm-mq-metrics/src/main/java/.../metricscollector/InquireChannelCmdCollector.java Replaces gauge fields with producer recorder calls.
ibm-mq-metrics/src/main/java/.../metricscollector/InquireQueueManagerCmdCollector.java Replaces gauge field with producer recorder call.
ibm-mq-metrics/src/main/java/.../metricscollector/InquireTStatusCmdCollector.java Replaces gauge fields with producer recorder calls.
ibm-mq-metrics/src/main/java/.../metricscollector/ListenerMetricsCollector.java Replaces gauge field with producer recorder call.
ibm-mq-metrics/src/main/java/.../metricscollector/PerformanceEventQueueCollector.java Replaces counter fields with producer recorder calls.
ibm-mq-metrics/src/main/java/.../metricscollector/QueueCollectionBuddy.java Switches AllowedGauge registry from LongGauge to BiConsumer<Long,Attributes> recorder.
ibm-mq-metrics/src/main/java/.../metricscollector/QueueManagerEventCollector.java Replaces counter field with producer recorder call.
ibm-mq-metrics/src/main/java/.../metricscollector/QueueManagerMetricsCollector.java Replaces gauge fields with producer recorder calls.
ibm-mq-metrics/src/main/java/.../metricscollector/QueueMetricsCollector.java Constructor now takes a MetricProducer.
ibm-mq-metrics/src/main/java/.../metricscollector/ReadConfigurationEventQueueCollector.java Replaces gauge field with producer recorder call.
ibm-mq-metrics/src/main/java/.../metricscollector/TopicMetricsCollector.java Constructor now takes a MetricProducer.
ibm-mq-metrics/src/test/java/.../*CollectorTest.java (5 files) Migrated tests to construct a MetricProducer directly and assert on its produce() output.
ibm-mq-metrics/src/integrationTest/java/.../TestWMQMonitor.java Holds a MetricProducer instead of a Meter.
ibm-mq-metrics/src/integrationTest/java/.../WMQMonitorIntegrationTest.java Replaces OpenTelemetryExtension with a directly-instantiated MetricProducer.
ibm-mq-metrics/build.gradle.kts Adds AutoValue annotations + processor for MetricData.
ibm-mq-metrics/config.yml Trailing newline added.
Comments suppressed due to low confidence (4)

ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java:902

  • The produce() method copies metricData into a new list and then clears it as two separate operations. Although metricData is a synchronizedList, the copy-then-clear sequence is not atomic: any record* invocation that occurs between the new ArrayList<>(this.metricData) (which internally synchronizes on the list) and this.metricData.clear() will be silently dropped. Additionally, iterating the synchronized list inside the ArrayList copy constructor without an explicit synchronized(this.metricData) block can throw ConcurrentModificationException if a recorder thread mutates the list during iteration. Wrap both operations in a single synchronized(this.metricData) block (or use a different concurrent structure such as draining via ConcurrentLinkedQueue).
  public List<MetricData> produce(Resource resource) {
    List<MetricData> collectedPoints = new ArrayList<>(this.metricData);
    this.metricData.clear();
    this.currentEpochNanos = Clock.getDefault().nanoTime();
    return collectedPoints;
  }

ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java:902

  • The Resource parameter passed by the SDK to produce(Resource resource) is ignored — the producer always exports metrics tagged with the Resource provided in the constructor. Per the MetricProducer SPI contract this argument is the resource the SDK wants the producer to associate metrics with (typically the SDK's own resource), so ignoring it can cause exported metrics to carry a different/empty resource than other metrics from the same SDK pipeline. Either use the supplied resource when building the MetricData, or document why it's deliberately ignored.
  @Override
  public List<MetricData> produce(Resource resource) {
    List<MetricData> collectedPoints = new ArrayList<>(this.metricData);
    this.metricData.clear();
    this.currentEpochNanos = Clock.getDefault().nanoTime();
    return collectedPoints;
  }

ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java:56

  • Each record* call allocates a singleton List, a LongPointData, a GaugeData/SumData, and a MetricData object, plus a synchronized-list add (and a Clock.getDefault() lookup). Previously, LongGauge.set / LongCounter.add cost essentially nothing on the hot path, with aggregation done at collection time. For collectors that fire frequently or with many attribute combinations this is a meaningful regression in CPU and GC pressure. Consider aggregating points per (name, attributes) so that only one point per metric stream is emitted per collection cycle, similar to what the SDK aggregator did before.
  public void recordIbmMqMessageRetryCount(long value, Attributes attributes) {
    this.metricData.add(
        createMetricData(
            this.resource,
            this.instrumentationScopeInfo,
            "ibm.mq.message.retry.count",
            "Number of message retries",
            "{message}",
            MetricDataType.LONG_GAUGE,
            GaugeData.createLongGaugeData(
                Collections.singletonList(
                    LongPointData.create(
                        this.currentEpochNanos,
                        Clock.getDefault().nanoTime(),
                        attributes,
                        value)))));
  }

ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java:292

  • Counter-style metrics (recordIbmMq*Event, recordIbmMqUnauthorizedEvent, recordIbmMqConnectionErrors) emit one LongPointData per call carrying only the per-call delta value (e.g. 1), but they are declared as AggregationTemporality.CUMULATIVE and isMonotonic=true. A cumulative sum point is supposed to carry the total since startEpochNanos, not a delta; with this implementation each export interval will emit several independent "cumulative" points all of value 1 with the same start/end (or with monotonic-time values), which downstream cumulative-temporality consumers will interpret incorrectly (e.g., as resets to 1 each interval). Either pre-aggregate per (name, attributes) before emission, or declare these as AggregationTemporality.DELTA.
  public void recordIbmMqQueueDepthFullEvent(long value, Attributes attributes) {
    this.metricData.add(
        createMetricData(
            this.resource,
            this.instrumentationScopeInfo,
            "ibm.mq.queue.depth.full.event",
            "The number of full queue events",
            "{event}",
            MetricDataType.LONG_SUM,
            SumData.createLongSumData(
                /* isMonotonic= */ true,
                AggregationTemporality.CUMULATIVE,
                Collections.singletonList(
                    LongPointData.create(
                        this.currentEpochNanos,
                        Clock.getDefault().nanoTime(),
                        attributes,
                        value)))));
  }

Comment thread ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java Outdated
Comment thread ibm-mq-metrics/templates/registry/java/MetricProducer.java.j2 Outdated
Comment thread ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java Outdated
Comment thread ibm-mq-metrics/templates/registry/java/MetricProducer.java.j2 Outdated
@atoulme atoulme changed the title move to produce metrics via MetricProducer [ibm-mq-metrics] move to produce metrics via MetricProducer May 15, 2026
Comment thread ibm-mq-metrics/src/main/java/io/opentelemetry/ibm/mq/metrics/MetricProducer.java Outdated
…etricProducer.java

Co-authored-by: jason plumb <75337021+breedx-splk@users.noreply.github.com>
Comment thread ibm-mq-metrics/templates/registry/java/MetricProducer.java.j2 Outdated
Comment on lines +757 to +758
List<MetricData> collectedPoints = new ArrayList<>(this.metricData);
this.metricData.clear();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is not thread safe.

It's possible for other methods to add data points to the metricData as the new array is being copied but before the clear(). There is real risk of those data points being dropped.

Two possible ways that I can of solving this:

  1. ditch the Collections.synchronizedList for a plain ArrayList, create a mutex object, and explicitly protect all accesses thru the mutex. It's a little cumbersome.
  2. change metricData from List to AtomicReference<List<...>> and instead of clearing it here use metricDataReference.getAndSet(new ArrayList<>()) to atomically swap out the reference and get the underlying (old) list for returning.

Of those, I think 2 is a bit cleaner.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do 2. Thanks for the feedback.

@breedx-splk
Copy link
Copy Markdown
Contributor

I don't fully understand this, but I'll share in case this makes sense to you @atoulme:

The new producer changes counters from “accumulate per series” to “append
one MetricData object per call”, which breaks cumulative sum semantics. For
example, recordIbmMqQueueDepthFullEvent, recordIbmMqUnauthorizedEvent, and
recordIbmMqConnectionErrors each emit a standalone LONG_SUM point with value
1 (MetricProducer.java:235-249, 706-720, 738-752), and the collectors call
them once per event/error (PerformanceEventQueueCollector.java:85-99,
QueueManagerEventCollector.java:54-67, WmqMonitor.java:109-118). If two
matching events happen before the next export, this no longer produces one
cumulative point with value 2; it produces multiple cumulative points for
the same series, each still at 1. The updated tests are already asserting
this append-only shape for gauges too (TopicMetricsCollectorTest.java:59-75,
ListenerMetricsCollectorTest.java:57-65), which suggests the implementation
drift is intentional rather than caught accidentally.

it feels like a type of aggregation is being lost there, but I'm not sure.

Other than that, this looks good. I'm honestly surprised at how much cleaner it makes things! 🥳

Co-authored-by: jason plumb <75337021+breedx-splk@users.noreply.github.com>
@atoulme
Copy link
Copy Markdown
Contributor Author

atoulme commented May 21, 2026

I don't fully understand this, but I'll share in case this makes sense to you @atoulme:

The new producer changes counters from “accumulate per series” to “append
one MetricData object per call”, which breaks cumulative sum semantics. For
example, recordIbmMqQueueDepthFullEvent, recordIbmMqUnauthorizedEvent, and
recordIbmMqConnectionErrors each emit a standalone LONG_SUM point with value
1 (MetricProducer.java:235-249, 706-720, 738-752), and the collectors call
them once per event/error (PerformanceEventQueueCollector.java:85-99,
QueueManagerEventCollector.java:54-67, WmqMonitor.java:109-118). If two
matching events happen before the next export, this no longer produces one
cumulative point with value 2; it produces multiple cumulative points for
the same series, each still at 1. The updated tests are already asserting
this append-only shape for gauges too (TopicMetricsCollectorTest.java:59-75,
ListenerMetricsCollectorTest.java:57-65), which suggests the implementation
drift is intentional rather than caught accidentally.

it feels like a type of aggregation is being lost there, but I'm not sure.

Other than that, this looks good. I'm honestly surprised at how much cleaner it makes things! 🥳

I will have to check this. If MQ reports the cumulative value itself, then we shouldn't keep score.

@atoulme
Copy link
Copy Markdown
Contributor Author

atoulme commented May 22, 2026

I don't fully understand this, but I'll share in case this makes sense to you @atoulme:

I think the AI is right to complain here. I don't do a good job of keeping track of increments of counter values. I added code (making this change uglier, mind you) to keep track of instruments and the counter values.

Copy link
Copy Markdown
Contributor

@breedx-splk breedx-splk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well shucks. Now that I'm looking at this again, I'm pretty sure my guidance about the AtomicReference isn't correct and still leaves a concurrency gap. I'll try to explain the problematic sequence:

  • recordWhatever() method gets the list from the atomic reference.
  • produce() swaps in the new list and returns the old one
  • the caller of produce() starts iterating over the list
  • the same recordWhatever() as above mutates that list while it is being iterated.

So that's either a ConcurrentModificationException waiting to happen and/or lost data.

I think we can use a concurrent BlockingQueue implementation and leverage the fact that drainTo is blocking in order to resolve this. I have PR'd into this PR here atoulme#1. Please have a look! ❤️

@breedx-splk breedx-splk enabled auto-merge May 22, 2026 18:46
Update list type based on errorprone
auto-merge was automatically disabled May 22, 2026 18:59

Head branch was pushed to by a user without write access

@breedx-splk breedx-splk added this pull request to the merge queue May 22, 2026
Merged via the queue into open-telemetry:main with commit b7b1c61 May 22, 2026
22 checks passed
@atoulme atoulme deleted the producer branch May 22, 2026 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants