Skip to content

M365 Crawler Metric, Buffer, and Unit Test Updates#6142

Merged
graytaylor0 merged 1 commit into
opensearch-project:mainfrom
chrisale000:load-test-ais
Oct 29, 2025
Merged

M365 Crawler Metric, Buffer, and Unit Test Updates#6142
graytaylor0 merged 1 commit into
opensearch-project:mainfrom
chrisale000:load-test-ais

Conversation

@chrisale000

@chrisale000 chrisale000 commented Oct 6, 2025

Copy link
Copy Markdown
Contributor

Description

This pull request implements comprehensive monitoring enhancements and performance optimizations:

Core Functionality:

  • Implemented TokenPaginationCrawler metrics integration for enhanced monitoring.
  • Integrated comprehensive M365 telemetry including API payload size, latency, and success rate measurements.
  • Updated M365 Partition Execution Logic to write records to the buffer for each page processed in searchAuditsLogs.
  • Updated the acknowledgementSet to complete in partition execution function rather than buffer write function.

Testing and Validation:

  • Verified metrics emission to CloudWatch through local testing
  • Expanded unit test coverage for new metrics implementation
  • Updated crawler client test suite to validate buffer writing logic
  • Resolved logging-related build configuration issues in build.gradle

The changes have been thoroughly tested and documented, with all tests passing successfully. The implementation follows established coding standards and includes appropriate unit test coverage.

Issues Resolved

N/A

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
  • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

new ParameterizedTypeReference<>() {}
);
// Record search request size.
searchRequestSizeSummary.record(response.getHeaders().getContentLength());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we recording the response header length for the search request size?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method returns the size of the body in terms of bytes: https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/http/HttpHeaders.html#getContentLength()

I could match the implementation with getAuditLog or vice versa. The main difference here was one is the size of bytes from the string and the other is from responseEntity itself.

private static final String AUDIT_LOG_FETCH_LATENCY = "auditLogFetchLatency";
private static final String SEARCH_CALL_LATENCY = "searchCallLatency";
private static final String AUDIT_LOGS_REQUESTED = "auditLogsRequested";
private static final String AUDIT_LOG_REQUEST_SIZE = "auditLogRequestSizeBytes";

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't naming these RequestSize a little misleading? It looks like this is measuring the response size / the size of the audit logs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

if (configuration.isAcknowledgments()) {
acknowledgementSet.add(record.getData());
buffer.write(record, (int) Duration.ofSeconds(BUFFER_TIMEOUT_IN_SECONDS).toMillis());
acknowledgementSet.complete();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is correct. The "acknowledgementSet.complete();" should be done outside of the loop.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or create an acknowledgementset inside the loop

@chrisale000 chrisale000 Oct 17, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback. You are correct - I will move the acknowledgementSet.Complete() to avoid premature acknowledgementSet completion.

@chrisale000 chrisale000 Oct 17, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed acknowledgementSet.complete() to execute partition logic where all events are being processed.

@chrisale000 chrisale000 force-pushed the load-test-ais branch 2 times, most recently from ddc117c to ad597c2 Compare October 17, 2025 23:43
if (record != null) {
records.add(record);
// Write each record individually as it's processed
writeRecordWithRetry(record, buffer, acknowledgementSet);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should write records for each api instead of writing per event. Writing per event has a bad performance.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is per API call for each getAuditLog API call. This is to reduce the potential of buffer throttling for too many events. The AI is to remove batch processing the events.

san81
san81 previously approved these changes Oct 28, 2025

@san81 san81 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. Just Make sure to resolve the merge conflicts

@chrisale000 chrisale000 force-pushed the load-test-ais branch 3 times, most recently from 047aaa8 to 5c07ead Compare October 28, 2025 20:25
@chrisale000 chrisale000 force-pushed the load-test-ais branch 5 times, most recently from 50d535c to 2566176 Compare October 28, 2025 21:49
graytaylor0
graytaylor0 previously approved these changes Oct 28, 2025
san81
san81 previously approved these changes Oct 28, 2025
@chrisale000 chrisale000 dismissed stale reviews from san81 and graytaylor0 via 6a23949 October 29, 2025 17:09
@chrisale000 chrisale000 force-pushed the load-test-ais branch 2 times, most recently from 6a23949 to 48f1213 Compare October 29, 2025 17:10
Signed-off-by: Alex Christensen <alchrisk@amazon.com>
@graytaylor0 graytaylor0 merged commit ddd0ce0 into opensearch-project:main Oct 29, 2025
46 of 47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants