Skip to content

CloudWatchLogs Sink: Update max event size and drop error events if DLQ is not configured#6154

Merged
kkondaka merged 3 commits into
opensearch-project:mainfrom
kkondaka:cloudwatch-sink-fixes
Oct 17, 2025
Merged

CloudWatchLogs Sink: Update max event size and drop error events if DLQ is not configured#6154
kkondaka merged 3 commits into
opensearch-project:mainfrom
kkondaka:cloudwatch-sink-fixes

Conversation

@kkondaka

@kkondaka kkondaka commented Oct 9, 2025

Copy link
Copy Markdown
Collaborator

Description

CloudWatchLogs Sink

  1. Update max event size threshold
  2. drop error events if DLQ is not configured

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [X ] New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • [ X] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Size(min = 1, max = 256, message = "max_event_size amount should be between 1 to 256 kilobytes")
private String maxEventSize = DEFAULT_EVENT_SIZE;
@JsonProperty(value = "max_event_size", defaultValue="1mb")
private String maxEventSize = DEFAULT_MAX_EVENT_SIZE;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will allow you to simplify your validations below since you will already have a ByteCount to use.


@Test
void testWithLargeSingleMessagesWhenDLQNotConfigured() {
s3Client = S3Client.builder()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be unused. Why would we have an S3Client in a CloudWatch Logs test anyway?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For DLQ sending to s3, it is needed in some tests

GetLogEventsResponse response = cloudWatchLogsClient.getLogEvents(getRequest);
List<OutputLogEvent> events = response.events();
assertThat(events.size(), equalTo(NUM_RECORDS));
for (int i = 0; i < events.size(); i++) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to move this part to outside of the await() block. This way we are timing out on reaching the expected records.

long maxEventSizeBytes = ByteCount.parse(maxEventSize).getBytes();
long maxRequestSizeBytes = ByteCount.parse(maxRequestSize).getBytes();

return maxEventSizeBytes > 0 &&

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code block inspired me to create a PR to make ByteCount Comparable in #6156.

What that PR and a change to the fields, you can simplify this:

@AssertTrue(message = "Both the maximum event size and maximum request size must be configured with a value greater than zero (0) and less than 1 megabyte (MB)")
boolean isValidConfig() {
  return maxEventSize > ByteCount.zeroBytes() &&
    maxEventSize < ByteCount.parse("1mb") &&
    maxRequestSize < ByteCount.zeroBytes() &&
    maxRequestSize < ByteCount.parse("1mb");
}

I'm also going to look into making a custom validator. this would be nice!

…LQ is not configured

Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
@kkondaka kkondaka force-pushed the cloudwatch-sink-fixes branch from 25db334 to 19a6170 Compare October 13, 2025 22:39
dlvenable
dlvenable previously approved these changes Oct 14, 2025

@dlvenable dlvenable left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvements, thanks!

Signed-off-by: Kondaka <krishkdk@amazon.com>
public static final String CLOUDWATCH_LOGS_EVENTS_SUCCEEDED = "cloudWatchLogsEventsSucceeded";
public static final String CLOUDWATCH_LOGS_EVENTS_FAILED = "cloudWatchLogsEventsFailed";
public static final String CLOUDWATCH_LOGS_REQUESTS_FAILED = "cloudWatchLogsRequestsFailed";
public static final String CLOUDWATCH_LOGS_LARGE_EVENTS_DROPPED = "cloudWatchLogsLargeEventsDropped";

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have a metric specifically for this. There could be other reasons to drop events. Then we'd need a metric for each one.

It seems to me that we could solve this with a generic cloudWatchLogsEventsDropped metric. This could alert Data Prepper administrators that events are being dropped. Then other metrics and/or logs can clue them into the exact issue. In this case, we can look at the distribution summary.

I think as it is, Data Prepper has a lot of metrics. I think we should consider ways to avoid continuing this growth.

@kkondaka kkondaka merged commit 5ad4885 into opensearch-project:main Oct 17, 2025
48 of 51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants