Skip to content

Metric Centralization through Dependency Injection#6354

Merged
san81 merged 1 commit into
opensearch-project:mainfrom
chrisale000:centralize-remaining-metrics
Dec 18, 2025
Merged

Metric Centralization through Dependency Injection#6354
san81 merged 1 commit into
opensearch-project:mainfrom
chrisale000:centralize-remaining-metrics

Conversation

@chrisale000

@chrisale000 chrisale000 commented Dec 12, 2025

Copy link
Copy Markdown
Contributor

Description

  • Migrated M365 metrics to use dependency injection pattern instead of MetricHelper static methods
  • Pros of using the dependency injection framework:
    • Testability: 100% mockable dependencies vs untestable static methods
    • Performance: Zero runtime metric lookups vs repeated expensive counter searches
    • Type Safety: Compile-time error prevention vs runtime string-based failures
    • Short-term: Faster development with fluent APIs and IDE autocomplete
    • Medium-term: Reduced maintenance costs and technical debt elimination
    • Code Quality: SOLID principles compliance vs brittle utility classes
    • Team Productivity: Self-documenting interfaces vs scattered implementation patterns
    • Production Reliability: Context-aware metrics enable rapid issue identification
  • Standardized metric naming and definitions across Office365 source plugin
  • Updated the MetricsHelperTest for better variable management.
  • Fixed intermittent failures in AwsCloudMapPeerListProviderTest by doubling the wait time.

Metrics Available in VendorAPIMetricsRecorder

Search Metrics

searchRequestsSuccess - Counter for successful search requests
searchRequestsFailed - Counter for failed search requests
searchRequestLatency - Timer for search request latency
searchResponseSizeBytes - Summary for search response size in bytes

Get Metrics

getRequestsSuccess - Counter for successful get requests
getRequestsFailed - Counter for failed get requests
getRequestLatency - Timer for get request latency
getResponseSizeBytes - Summary for get response size in bytes

Authentication Metrics

authenticationRequestsSuccess - Counter for successful authentication requests
authenticationRequestsFailed - Counter for failed authentication requests
authenticationRequestLatency - Timer for authentication request latency

Shared Metrics

totalDataApiRequests - Counter for total data API requests
logsRequested - Counter for logs requested

Error Metrics

requestAccessDenied - Counter for access denied errors
requestThrottled - Counter for throttled requests
resourceNotFound - Counter for resource not found errors

Future Work:

  • Migrate other existing 3pE plugins to this new framework.

Testing:

  • Spun up local M365 pipeline and confirmed that metrics were generated.
image
alchrisk_m365_pipeline.microsoft_office365.getRequestLatency.sum
0
alchrisk_m365_pipeline.microsoft_office365.getRequestsSuccess.count
0
alchrisk_m365_pipeline.microsoft_office365.getRequestLatency.count
0
alchrisk_m365_pipeline.microsoft_office365.getResponseSizeBytes.count
0
alchrisk_m365_pipeline.microsoft_office365.getResponseSizeBytes.sum
0
alchrisk_m365_pipeline.microsoft_office365.getRequestsFailed.count
0
alchrisk_m365_pipeline.microsoft_office365.searchResultsFound.count
0
alchrisk_m365_pipeline.microsoft_office365.searchRequestsFailed.count
0
alchrisk_m365_pipeline.microsoft_office365.searchCallLatency.count
5
alchrisk_m365_pipeline.microsoft_office365.searchCallLatency.sum
1.86K
alchrisk_m365_pipeline.microsoft_office365.searchRequestsSuccess.count
5
alchrisk_m365_pipeline.microsoft_office365.searchResponseSizeBytes.count
5
alchrisk_m365_pipeline.microsoft_office365.searchResponseSizeBytes.sum
10
alchrisk_m365_pipeline.microsoft_office365.logsRequested.count
0
alchrisk_m365_pipeline.microsoft_office365.authenticationSuccess.count
0
alchrisk_m365_pipeline.microsoft_office365.authenticationRequests.count
0
alchrisk_m365_pipeline.microsoft_office365.authenticationFailed.count
0
alchrisk_m365_pipeline.microsoft_office365.authenticationLatency.count
0
alchrisk_m365_pipeline.microsoft_office365.authenticationLatency.sum
0
alchrisk_m365_pipeline.microsoft_office365.totalDataApiRequests.count
5
alchrisk_m365_pipeline.microsoft_office365.requestAccessDenied.count
0
alchrisk_m365_pipeline.microsoft_office365.requestThrottled.count
0
alchrisk_m365_pipeline.microsoft_office365.resourceNotFound.count
0
➜  GitHubOsiDataPrepper git:(centralize-remaining-metrics) ✗ ./gradlew :data-prepper-plugins:saas-source-plugins:build

Deprecated Gradle features were used in this build, making it incompatible with Gradle 9.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

For more on this, please refer to https://docs.gradle.org/8.8/userguide/command_line_interface.html#sec:command_line_warnings in the Gradle documentation.

BUILD SUCCESSFUL in 1s
70 actionable tasks: 70 up-to-date
➜  GitHubOsiDataPrepper git:(centralize-remaining-metrics) 

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

*/
private void waitUntilPeerListPopulated(final AwsCloudMapPeerListProvider objectUnderTest) {
await().atMost(5, TimeUnit.SECONDS)
await().atMost(30, TimeUnit.SECONDS)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change this value?

@chrisale000 chrisale000 Dec 12, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unit tests fails intermittently.

AwsCloudMapPeerListProviderTest > WithSeveralFailedAttempts > listener_gets_list_after_several_failed_attempts() FAILED
java.lang.AssertionError:
Expected: a value greater than <0>
but: <0> was equal to <0>

AwsCloudMapPeerListProviderTest > WithSeveralFailedAttempts > getPeerList_returns_value_after_several_failed_attempts() FAILED
java.lang.AssertionError:
Expected: a value greater than <0>
but: <0> was equal to <0>

Increasing the wait time appeared to fix the issue. So I am assuming there is some scheduling issue with the tests that doesn't allow the full test loop to complete before the 5 second wait period.

*/
private void waitUntilPeerListPopulated(final AwsCloudMapPeerListProvider objectUnderTest) {
await().atMost(5, TimeUnit.SECONDS)
await().atMost(30, TimeUnit.SECONDS)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too long. Do you really need this big of a wait time? It increased overall build time, which is already not in a good state.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure what is a good wait time but I can try reducing it to 10.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 is definitely better than 30, but I would first understand why 5 is not enough.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can revert if need be, I am not sure why the peer list is sometimes not populated within the 5 second time frame but seems to be more consistent with the 10 second time frame and I don't have the bandwidth to take a deeper look at the moment.

@chrisale000 chrisale000 Dec 16, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that it still occasionally fails to populate the list when this is 10 seconds, removed this change.

try {
headers.setBearerAuth(authConfig.getAccessToken());
apiCallsCounter.increment();
provideApiRequestsCounter(pluginMetrics).increment();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not sure this kind of MetricHeper static method based seggregation is really helping with anything. I would argue that having pluginMetrics.counter(TOTAL_API_REQUESTS).increment being here is more readable than jumping to a method to get that reference. What are we trying to solve with additional method hop?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to unify metrics for consistency across plugins—addressing naming inconsistencies, creating a single source of truth for metric definitions, and reducing redundant test burden by centralizing metric logic in one place rather than duplicating it across each plugin. Once this change is merged in, I'll work on migrating the other crawlers over to use this same pattern.

@chrisale000 chrisale000 force-pushed the centralize-remaining-metrics branch 5 times, most recently from 2aad063 to 9da6959 Compare December 15, 2025 23:12
@chrisale000 chrisale000 changed the title Centralize Remaining Metrics and Fix Unstable Unit Test Metric Centralization through Dependency Injection and Fix Unstable Unit Test Dec 15, 2025
@chrisale000 chrisale000 force-pushed the centralize-remaining-metrics branch 5 times, most recently from 796fc85 to 8a55dda Compare December 16, 2025 19:05
public Office365RestClient(final Office365AuthenticationInterface authConfig,
final PluginMetrics pluginMetrics) {
// TODO: Abstract into a Office365PluginMetrics
final SearchApiMetricsRecorder searchApiMetrics,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a good reason to have 3 record. This constructor would need to be updated if we add additional metric.

We should combine all of them to one Recorder called VendorAPIMetricsRecorder

The usage is like
VendorAPIMetricsRecorder.searchAPICounter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, combine the recorders into one recorder.

@chrisale000 chrisale000 force-pushed the centralize-remaining-metrics branch from 8a55dda to 7b4ab5f Compare December 16, 2025 23:37
@chrisale000 chrisale000 changed the title Metric Centralization through Dependency Injection and Fix Unstable Unit Test Metric Centralization through Dependency Injection Dec 16, 2025
@chrisale000 chrisale000 force-pushed the centralize-remaining-metrics branch from 7b4ab5f to e714a37 Compare December 16, 2025 23:48

public VendorAPIMetricsRecorder error(Exception exception) {
// Record specific error types
if (exception instanceof HttpClientErrorException) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we still emit these metrics like previously emit here ? These metrics were agreed upon with ATP team.

    private static final String REQUEST_ACCESS_DENIED  = "requestAccessDenied";
    private static final String REQUEST_THROTTLED = "requestThrottled";
    private static final String RESOURCE_NOT_FOUND = "resourceNotFound";

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I will update this.

san81
san81 previously approved these changes Dec 17, 2025
graytaylor0
graytaylor0 previously approved these changes Dec 17, 2025
}, authConfig::renewCredentials);
}, authConfig::renewCredentials, metricsRecorder::recordAuthFailure);
}
return null; // Void return for Supplier

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: returning null is fine and works, but to be cleaner, we can have an overload of recordAuthLatency as a runnable operation in VendorAPIMetricsRecorder this way its a little more explicit for devs later and we can get rid of the redundant return null. Like this:

public void recordAuthLatency(Runnable operation) {
    authLatencyTimer.record(operation);
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


public void recordGetResponseSize(ResponseEntity<?> response) {
if (response != null && response.getBody() != null) {
getResponseSizeSummary.record(response.getHeaders().getContentLength());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does getContentLength return -1 if not found? could record -1 as a metric for those cases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this. We should not record when the value is negative

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick offline discussion summary:

The current implementation already emits -1 following the getContentLength() behavior. However, this could cause a small downward skew when calculating the average of this metric. To avoid this, we'll emit 0 instead of -1 when no body is present.

@chrisale000 chrisale000 dismissed stale reviews from graytaylor0 and san81 via 8adf278 December 17, 2025 21:13
@chrisale000 chrisale000 force-pushed the centralize-remaining-metrics branch 2 times, most recently from 8adf278 to fe03411 Compare December 17, 2025 21:22
HttpClientErrorException httpE = (HttpClientErrorException) exception;
HttpStatus status = httpE.getStatusCode();

if (status == HttpStatus.FORBIDDEN || status == HttpStatus.UNAUTHORIZED) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super-nit: We should use HttpStatus.FORBIDDEN == status instead. Same comment for the other checks here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can update that really quick.

@chrisale000 chrisale000 force-pushed the centralize-remaining-metrics branch from fe03411 to 93df770 Compare December 17, 2025 21:50
String.class
);
log.debug("Started subscription for {}: {}", contentType, response.getBody());
metricsRecorder.recordAuthSuccess();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think m365 subscription is an auth action. We should not emit metric here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renewCredentials function is where auth metric should be emitted.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the change, will work on Auth Metrics and API specific metrics in another PR.

@chrisale000 chrisale000 force-pushed the centralize-remaining-metrics branch from 2520f76 to 23b6494 Compare December 18, 2025 00:20
This change centralizes metrics creation and management by implementing
dependency injection for metrics in the Office365 source plugin and
other remaining components. This ensures consistent metrics handling
across the codebase.

Signed-off-by: Alexander Christensen <alchrisk@amazon.com>
@chrisale000 chrisale000 force-pushed the centralize-remaining-metrics branch from 23b6494 to 8844f10 Compare December 18, 2025 19:08
@san81 san81 merged commit 6ef4e48 into opensearch-project:main Dec 18, 2025
47 checks passed
wandna-amazon pushed a commit to wandna-amazon/data-prepper that referenced this pull request Jan 8, 2026
…t#6354)

This change centralizes metrics creation and management by implementing
dependency injection for metrics in the Office365 source plugin and
other remaining components. This ensures consistent metrics handling
across the codebase.

Signed-off-by: Alexander Christensen <alchrisk@amazon.com>
Co-authored-by: Alexander Christensen <alchrisk@amazon.com>
Signed-off-by: Nathan Wand <wandna@amazon.com>
simonelbaz pushed a commit to simonelbaz/data-prepper that referenced this pull request Jan 31, 2026
…t#6354)

This change centralizes metrics creation and management by implementing
dependency injection for metrics in the Office365 source plugin and
other remaining components. This ensures consistent metrics handling
across the codebase.

Signed-off-by: Alexander Christensen <alchrisk@amazon.com>
Co-authored-by: Alexander Christensen <alchrisk@amazon.com>
Signed-off-by: Simon ELBAZ <elbazsimon9@gmail.com>
simonelbaz pushed a commit to simonelbaz/data-prepper that referenced this pull request Jan 31, 2026
…t#6354)

This change centralizes metrics creation and management by implementing
dependency injection for metrics in the Office365 source plugin and
other remaining components. This ensures consistent metrics handling
across the codebase.

Signed-off-by: Alexander Christensen <alchrisk@amazon.com>
Co-authored-by: Alexander Christensen <alchrisk@amazon.com>
simonelbaz pushed a commit to simonelbaz/data-prepper that referenced this pull request Jan 31, 2026
…t#6354)

This change centralizes metrics creation and management by implementing
dependency injection for metrics in the Office365 source plugin and
other remaining components. This ensures consistent metrics handling
across the codebase.

Signed-off-by: Alexander Christensen <alchrisk@amazon.com>
Co-authored-by: Alexander Christensen <alchrisk@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants