Skip to content

chore: Implement Default Client Side Metrics Otel #12718

Draft
lqiu96 wants to merge 19 commits intomainfrom
datastore-csm-impl-2
Draft

chore: Implement Default Client Side Metrics Otel #12718
lqiu96 wants to merge 19 commits intomainfrom
datastore-csm-impl-2

Conversation

@lqiu96
Copy link
Copy Markdown
Member

@lqiu96 lqiu96 commented Apr 8, 2026

No description provided.

lqiu96 added 19 commits March 31, 2026 23:41
Update DatastoreAdminStubSettings and DatastoreStubSettings to include
the library version via Version.VERSION. Add the two new Version.java
files that hold the library version constant. Update native image
reflect-config.json for both the admin and core stub packages.
…gRPC transport coverage

Replace the old MetricsRecorder / OpenTelemetryMetricsRecorder /
NoOpMetricsRecorder types with the new DatastoreMetricsRecorder family,
which extends GAX's MetricsRecorder interface for a unified recording
contract.

Key changes:
- Delete MetricsRecorder.java, OpenTelemetryMetricsRecorder.java,
  NoOpMetricsRecorder.java and their tests
- Add DatastoreMetricsRecorder interface (with simple getInstance() that
  returns an OTel recorder when metrics are enabled, NoOp otherwise)
- Add NoOpDatastoreMetricsRecorder, OpenTelemetryDatastoreMetricsRecorder,
  and DatastoreMetricsRecorderTest
- Remove the !GRPC transport guard from TelemetryUtils.recordOperationMetrics()
  and attemptMetricsCallable() so all transports record metrics uniformly
- Remove the isHttpTransport field from RetryAndTraceDatastoreRpcDecorator
  and DatastoreImpl; remove buildMetricsTracerFactory() from GrpcDatastoreRpc
- Update TelemetryConstants with the new METRIC_PREFIX, DATASTORE_METER_NAME,
  and typed AttributeKey constants needed by the new recorder classes
- Update DatastoreOptions to pass the full DatastoreOptions to getInstance()
  so the recorder factory can inspect credentials and project at creation time
Add a default-on, private OpenTelemetry SDK pipeline that automatically
exports Datastore client-side metrics to Google Cloud Monitoring without
requiring any user configuration.

Key additions:
- DatastoreBuiltInMetricsView: registers OTel views for metric renaming
  (GAX prefix → custom.googleapis.com/internal/client), custom histogram
  bucket boundaries, and an attribute allowlist that prevents unexpected
  labels from reaching Cloud Monitoring
- DatastoreCloudMonitoringExporter: MetricExporter impl that batches
  TimeSeries and calls MetricServiceClient.createTimeSeries(); graceful
  degradation (logs warnings, never crashes); best-effort flush()
- DatastoreCloudMonitoringExporterUtils: converts OTel MetricData to
  Cloud Monitoring TimeSeries protos
- BuiltInDatastoreMetricsProvider: creates a private OpenTelemetrySdk
  per Datastore client; computes stable client_uid and client_name
  attributes; registers a shutdown hook as a last-resort safety net
- CompositeDatastoreMetricsRecorder: fans out all record*() calls to
  both the built-in and any user-provided backends simultaneously
- DatastoreMetricsRecorder.getInstance(): updated to composite factory
  that wires up the built-in path (unless emulator or env-var disabled)
  alongside any custom OTel backend; adds default close() for lifecycle
- OpenTelemetryDatastoreMetricsRecorder: add ownsOpenTelemetry flag and
  close() so the built-in SDK is flushed when DatastoreImpl.close() runs
- DatastoreOpenTelemetryOptions: add exportBuiltinMetricsToGoogleCloudMonitoring
  flag (default true) with @BetaApi getter and setter; clarify isEnabled()
  Javadoc to note it does not cover the built-in path
- DatastoreImpl.close(): call getMetricsRecorder().close() after RPC
  channel teardown so the built-in SDK flushes buffered metrics
- pom.xml: promote opentelemetry-sdk, opentelemetry-sdk-common, and
  opentelemetry-sdk-metrics from test→compile scope; add
  google-cloud-monitoring and proto-google-cloud-monitoring-v3 compile deps
- New tests: DatastoreCloudMonitoringExporterTest, BuiltInDatastoreMetricsProviderTest,
  DatastoreMetricsRecorderTest (updated), DatastoreBuiltInAndCustomMetricsIT
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces built-in metrics support for the Datastore SDK, allowing client-side metrics to be exported to Google Cloud Monitoring. It adds a new BuiltInDatastoreMetricsProvider, a DatastoreCloudMonitoringExporter, and updates DatastoreOpenTelemetryOptions to manage this feature. My review identified several critical issues: creating a new MetricServiceClient per client instance is inefficient, registering individual shutdown hooks for each client causes a memory leak, the Javadoc for the default metrics export state is contradictory, and the CompositeDatastoreMetricsRecorder.close() method lacks proper exception handling and LIFO ordering for resource cleanup.

SdkMeterProviderBuilder sdkMeterProviderBuilder = SdkMeterProvider.builder();
// Register Datastore-specific views and the PeriodicMetricReader.
DatastoreBuiltInMetricsView.registerBuiltinMetrics(
DatastoreCloudMonitoringExporter.create(projectId, credentials), sdkMeterProviderBuilder);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Creating a new DatastoreCloudMonitoringExporter (and consequently a new MetricServiceClient with its own gRPC channel) for every Datastore client instance is resource-intensive. Each gRPC channel consumes memory, threads, and connection overhead. Consider caching and sharing the MetricServiceClient or the exporter instance, perhaps keyed by projectId and credentials, to improve efficiency.

Resource.create(createResourceAttributes(projectId, databaseId)));
SdkMeterProvider sdkMeterProvider = sdkMeterProviderBuilder.build();
// Ensure cleanup on shutdown.
Runtime.getRuntime().addShutdownHook(new Thread(sdkMeterProvider::close));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Registering a new shutdown hook for every Datastore client instance created will lead to a memory leak. Runtime.addShutdownHook stores these Thread objects in a static list that is only cleared when the JVM exits. Since DatastoreImpl.close() now explicitly closes the metrics recorder (which shuts down the OTel SDK), these hooks are redundant for well-behaved applications. If a 'last-resort' safety net is required, consider a single shared shutdown hook that manages all active providers, or removing the hook entirely and relying on the AutoCloseable contract.

/**
* Sets whether built-in metrics should be exported to Google Cloud Monitoring.
*
* <p>When enabled (the default), client-side metrics are automatically exported to Google Cloud
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a contradiction between the Javadoc and the implementation. The Javadoc states that built-in metrics export is enabled by default, but the Builder constructor (line 132) initializes it to false. Based on the TODO comment at line 131, it seems false is the intended default for now. Please update the Javadoc to reflect this.

Suggested change
* <p>When enabled (the default), client-side metrics are automatically exported to Google Cloud
* <p>When enabled, client-side metrics are automatically exported to Google Cloud

Comment on lines +88 to +92
public void close() {
for (DatastoreMetricsRecorder recorder : recorders) {
recorder.close();
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The close() implementation should be exception-safe and follow LIFO order to ensure that an error closing one recorder does not prevent others from being closed. This aligns with the general rule for managing collections of closeable resources.

  @Override
  public void close() {
    for (int i = recorders.size() - 1; i >= 0; i--) {
      try {
        recorders.get(i).close();
      } catch (Exception e) {
        logger.log(java.util.logging.Level.WARNING, "Failed to close metrics recorder", e);
      }
    }
  }
References
  1. When managing a collection of closeable resources (e.g., scopes), ensure they are closed in the reverse order of their creation (LIFO). The implementation must be exception-safe to prevent resource leaks, meaning all opened resources should be closed even if exceptions occur during their creation or closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant