Skip to content

Add read failure metric to S3 source#6258

Merged
dinujoh merged 1 commit into
opensearch-project:mainfrom
mzurita-amz:s3-object-read-failed-metric
Nov 12, 2025
Merged

Add read failure metric to S3 source#6258
dinujoh merged 1 commit into
opensearch-project:mainfrom
mzurita-amz:s3-object-read-failed-metric

Conversation

@mzurita-amz

Copy link
Copy Markdown
Contributor

Description

Adding read failure metric to S3 source for SQS notification and Select scan modes. This metric attempts to catch codec, compression and stream failures that act against the S3 payload retrieved from GetObject.

Issues Resolved

N/A

Check List

  • New functionality includes testing.
  • [-] New functionality has a documentation issue. Please link to it in this PR.
    • [-] New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@viquer viquer left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some changes on S3SelectObjectWorkerIT to make sure we prove it's correctly flagging only one ReadFailed when it's really a problem on the read phase?

assertThrows(IOException.class,
() -> createObjectUnderTest(s3ObjectPluginMetrics).processS3Object(s3ObjectReference, S3DataSelection.DATA_AND_METADATA, acknowledgementSet, null, null));

verify(s3ObjectReadFailedCounter).increment();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we verify that it's only called once on all the different paths? My concern is that somehow we are double counting.

}
bufferAccumulator.add(eventRecord);
} catch (final Exception ex) {
s3ObjectPluginMetrics.getS3ObjectReadFailedCounter().increment();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could fail because of writing to the buffer, can't it?

Also the actual transformation to JSON happens on line 272 and 274 (I assume if 272 worked, 274 will work.

try {
bufferAccumulator.flush();
} catch (Exception e) {
s3ObjectPluginMetrics.getS3ObjectReadFailedCounter().increment();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a read failure or more an internal failure? Maybe I am not getting this right, but here seems like we are trying to flush the buffer (either in memory or persistent buffer) and that fails. But I would not categorise this as a READ_FAILURE.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this is a problem with writing to the buffer. The read has already happened. So this would be more of an internal exception and not an issue with the Data Prepper to S3 read.

((ObjectNode) node).set(S3_BUCKET_REFERENCE_NAME, getS3BucketObjectReference(s3ObjectReference, mapper));
return Optional.of(node);
} catch (final Exception e) {
s3ObjectPluginMetrics.getS3ObjectReadFailedCounter().increment();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to happen AFTER we have read the JSON object, so I would neither categorise this as a READ_FAILURE either.

return inputFile.getLength();
} catch (final Exception e) {
s3ObjectPluginMetrics.getS3ObjectReadFailedCounter().increment();
LOG.error("Error reading from S3 object: s3ObjectReference={}. {}", s3ObjectReference, e.getMessage());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you are throwing this exception, it will get logged again. Maybe a good solution here would be to create an exception for S3ReadFailedException and then throw new S3ReadFailedException(e) on line 166.

try {
bufferAccumulator.flush();
} catch (Exception e) {
s3ObjectPluginMetrics.getS3ObjectReadFailedCounter().increment();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this is a problem with writing to the buffer. The read has already happened. So this would be more of an internal exception and not an issue with the Data Prepper to S3 read.

finalJson.put(generateColumnHeader(column+1),jsonObject.get("_"+(column+1)));
}
} catch (final Exception e) {
s3ObjectPluginMetrics.getS3ObjectReadFailedCounter().increment();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the S3 Select worker should count this metric unless there is an IO issue to S3. For S3 Select, S3 is responsible for reading the object. And then it gives a well-defined response format. So any error here would be related to how we read the S3 content, right?

@mzurita-amz mzurita-amz force-pushed the s3-object-read-failed-metric branch from 38f4336 to 56a9e08 Compare November 12, 2025 17:34
@@ -0,0 +1,16 @@
/*

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mzurita-amz mzurita-amz force-pushed the s3-object-read-failed-metric branch from 7947cad to 01d69ee Compare November 12, 2025 18:55
Signed-off-by: Manuel Mangas Zurita <mzurita@amazon.com>
@mzurita-amz mzurita-amz force-pushed the s3-object-read-failed-metric branch from 01d69ee to c5ecaae Compare November 12, 2025 18:56

@dlvenable dlvenable left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mzurita-amz !

@dinujoh dinujoh merged commit 7ad502b into opensearch-project:main Nov 12, 2025
46 of 47 checks passed
@mzurita-amz mzurita-amz deleted the s3-object-read-failed-metric branch November 13, 2025 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants