chore: Implement Default Client Side Metrics Otel #12718
chore: Implement Default Client Side Metrics Otel #12718
Conversation
Update DatastoreAdminStubSettings and DatastoreStubSettings to include the library version via Version.VERSION. Add the two new Version.java files that hold the library version constant. Update native image reflect-config.json for both the admin and core stub packages.
…gRPC transport coverage Replace the old MetricsRecorder / OpenTelemetryMetricsRecorder / NoOpMetricsRecorder types with the new DatastoreMetricsRecorder family, which extends GAX's MetricsRecorder interface for a unified recording contract. Key changes: - Delete MetricsRecorder.java, OpenTelemetryMetricsRecorder.java, NoOpMetricsRecorder.java and their tests - Add DatastoreMetricsRecorder interface (with simple getInstance() that returns an OTel recorder when metrics are enabled, NoOp otherwise) - Add NoOpDatastoreMetricsRecorder, OpenTelemetryDatastoreMetricsRecorder, and DatastoreMetricsRecorderTest - Remove the !GRPC transport guard from TelemetryUtils.recordOperationMetrics() and attemptMetricsCallable() so all transports record metrics uniformly - Remove the isHttpTransport field from RetryAndTraceDatastoreRpcDecorator and DatastoreImpl; remove buildMetricsTracerFactory() from GrpcDatastoreRpc - Update TelemetryConstants with the new METRIC_PREFIX, DATASTORE_METER_NAME, and typed AttributeKey constants needed by the new recorder classes - Update DatastoreOptions to pass the full DatastoreOptions to getInstance() so the recorder factory can inspect credentials and project at creation time
…atastore-csm-impl-1
Add a default-on, private OpenTelemetry SDK pipeline that automatically exports Datastore client-side metrics to Google Cloud Monitoring without requiring any user configuration. Key additions: - DatastoreBuiltInMetricsView: registers OTel views for metric renaming (GAX prefix → custom.googleapis.com/internal/client), custom histogram bucket boundaries, and an attribute allowlist that prevents unexpected labels from reaching Cloud Monitoring - DatastoreCloudMonitoringExporter: MetricExporter impl that batches TimeSeries and calls MetricServiceClient.createTimeSeries(); graceful degradation (logs warnings, never crashes); best-effort flush() - DatastoreCloudMonitoringExporterUtils: converts OTel MetricData to Cloud Monitoring TimeSeries protos - BuiltInDatastoreMetricsProvider: creates a private OpenTelemetrySdk per Datastore client; computes stable client_uid and client_name attributes; registers a shutdown hook as a last-resort safety net - CompositeDatastoreMetricsRecorder: fans out all record*() calls to both the built-in and any user-provided backends simultaneously - DatastoreMetricsRecorder.getInstance(): updated to composite factory that wires up the built-in path (unless emulator or env-var disabled) alongside any custom OTel backend; adds default close() for lifecycle - OpenTelemetryDatastoreMetricsRecorder: add ownsOpenTelemetry flag and close() so the built-in SDK is flushed when DatastoreImpl.close() runs - DatastoreOpenTelemetryOptions: add exportBuiltinMetricsToGoogleCloudMonitoring flag (default true) with @BetaApi getter and setter; clarify isEnabled() Javadoc to note it does not cover the built-in path - DatastoreImpl.close(): call getMetricsRecorder().close() after RPC channel teardown so the built-in SDK flushes buffered metrics - pom.xml: promote opentelemetry-sdk, opentelemetry-sdk-common, and opentelemetry-sdk-metrics from test→compile scope; add google-cloud-monitoring and proto-google-cloud-monitoring-v3 compile deps - New tests: DatastoreCloudMonitoringExporterTest, BuiltInDatastoreMetricsProviderTest, DatastoreMetricsRecorderTest (updated), DatastoreBuiltInAndCustomMetricsIT
…ne after impl-1 refactor
There was a problem hiding this comment.
Code Review
This pull request introduces built-in metrics support for the Datastore SDK, allowing client-side metrics to be exported to Google Cloud Monitoring. It adds a new BuiltInDatastoreMetricsProvider, a DatastoreCloudMonitoringExporter, and updates DatastoreOpenTelemetryOptions to manage this feature. My review identified several critical issues: creating a new MetricServiceClient per client instance is inefficient, registering individual shutdown hooks for each client causes a memory leak, the Javadoc for the default metrics export state is contradictory, and the CompositeDatastoreMetricsRecorder.close() method lacks proper exception handling and LIFO ordering for resource cleanup.
| SdkMeterProviderBuilder sdkMeterProviderBuilder = SdkMeterProvider.builder(); | ||
| // Register Datastore-specific views and the PeriodicMetricReader. | ||
| DatastoreBuiltInMetricsView.registerBuiltinMetrics( | ||
| DatastoreCloudMonitoringExporter.create(projectId, credentials), sdkMeterProviderBuilder); |
There was a problem hiding this comment.
Creating a new DatastoreCloudMonitoringExporter (and consequently a new MetricServiceClient with its own gRPC channel) for every Datastore client instance is resource-intensive. Each gRPC channel consumes memory, threads, and connection overhead. Consider caching and sharing the MetricServiceClient or the exporter instance, perhaps keyed by projectId and credentials, to improve efficiency.
| Resource.create(createResourceAttributes(projectId, databaseId))); | ||
| SdkMeterProvider sdkMeterProvider = sdkMeterProviderBuilder.build(); | ||
| // Ensure cleanup on shutdown. | ||
| Runtime.getRuntime().addShutdownHook(new Thread(sdkMeterProvider::close)); |
There was a problem hiding this comment.
Registering a new shutdown hook for every Datastore client instance created will lead to a memory leak. Runtime.addShutdownHook stores these Thread objects in a static list that is only cleared when the JVM exits. Since DatastoreImpl.close() now explicitly closes the metrics recorder (which shuts down the OTel SDK), these hooks are redundant for well-behaved applications. If a 'last-resort' safety net is required, consider a single shared shutdown hook that manages all active providers, or removing the hook entirely and relying on the AutoCloseable contract.
| /** | ||
| * Sets whether built-in metrics should be exported to Google Cloud Monitoring. | ||
| * | ||
| * <p>When enabled (the default), client-side metrics are automatically exported to Google Cloud |
There was a problem hiding this comment.
There is a contradiction between the Javadoc and the implementation. The Javadoc states that built-in metrics export is enabled by default, but the Builder constructor (line 132) initializes it to false. Based on the TODO comment at line 131, it seems false is the intended default for now. Please update the Javadoc to reflect this.
| * <p>When enabled (the default), client-side metrics are automatically exported to Google Cloud | |
| * <p>When enabled, client-side metrics are automatically exported to Google Cloud |
| public void close() { | ||
| for (DatastoreMetricsRecorder recorder : recorders) { | ||
| recorder.close(); | ||
| } | ||
| } |
There was a problem hiding this comment.
The close() implementation should be exception-safe and follow LIFO order to ensure that an error closing one recorder does not prevent others from being closed. This aligns with the general rule for managing collections of closeable resources.
@Override
public void close() {
for (int i = recorders.size() - 1; i >= 0; i--) {
try {
recorders.get(i).close();
} catch (Exception e) {
logger.log(java.util.logging.Level.WARNING, "Failed to close metrics recorder", e);
}
}
}References
- When managing a collection of closeable resources (e.g., scopes), ensure they are closed in the reverse order of their creation (LIFO). The implementation must be exception-safe to prevent resource leaks, meaning all opened resources should be closed even if exceptions occur during their creation or closing.
No description provided.