Add metric tracking total number of open shards, stop setting ending sequence number once open shards are discovered #6260
Merged
…ust because there is no record at the ending sequence number of that shard Signed-off-by: Taylor Gray <tylgry@amazon.com>
cf553ec to dd1f2cb
san81
approved these changes
Nov 12, 2025
dinujoh
approved these changes
Nov 12, 2025
Description
Adds a metric `totalOpenShards` as a DistributionSummary that is reported roughly every 10 minutes. It shows how many shards in the DynamoDB streams have no `endingSequenceNumber`, which is useful to track because each Data Prepper instance can only process 150 open shards in parallel.
This change also fixes a data loss scenario in which shards could be skipped. Previously, the shard's `endingSequenceNumber` was used with an AT_SEQUENCE_NUMBER shard iterator to check whether the shard contained any records, and a GetRecords call that returned no records was incorrectly taken to mean that the shard was completely empty. Now, instead of assuming the shard is empty, we paginate fully through it to look for records and filter out records from before the export or start time. This can result in some extra pagination through shards during and right after the export, but it will not result in duplicate processing because of the timestamp-based filtering.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
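As a rough illustration of the two behaviours covered in the Description, the sketch below counts shards that have no `endingSequenceNumber` and scans every page of a shard while dropping records older than a start time. It is not the code from this change: `ShardScanSketch`, `StreamRecord`, `Shard`, `countOpenShards` and `recordsSince` are hypothetical names, a shard is modelled as an in-memory list of pages rather than GetRecords calls, and treating the start time as an inclusive boundary is an assumption.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: every type here is a hypothetical stand-in,
// not a Data Prepper class or an AWS SDK model.
final class ShardScanSketch {

    /** A stream record reduced to the one field the filter needs. */
    static final class StreamRecord {
        final String key;
        final Instant createdAt;

        StreamRecord(String key, Instant createdAt) {
            this.key = key;
            this.createdAt = createdAt;
        }
    }

    /** A shard modelled as the pages a reader would receive, in order. */
    static final class Shard {
        final String endingSequenceNumber; // null while the shard is still open
        final List<List<StreamRecord>> pages;

        Shard(String endingSequenceNumber, List<List<StreamRecord>> pages) {
            this.endingSequenceNumber = endingSequenceNumber;
            this.pages = pages;
        }

        boolean isOpen() {
            return endingSequenceNumber == null;
        }
    }

    /** Counts shards without an ending sequence number. */
    static long countOpenShards(List<Shard> shards) {
        return shards.stream().filter(Shard::isOpen).count();
    }

    /**
     * Walks every page of the shard; an empty page does not end the scan.
     * Records created before startTime are dropped (inclusive boundary assumed).
     */
    static List<StreamRecord> recordsSince(Shard shard, Instant startTime) {
        List<StreamRecord> kept = new ArrayList<>();
        for (List<StreamRecord> page : shard.pages) {
            for (StreamRecord record : page) {
                if (!record.createdAt.isBefore(startTime)) {
                    kept.add(record);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2025-11-12T00:00:00Z");
        Shard closed = new Shard("300", List.of(
                List.of(new StreamRecord("old", start.minusSeconds(60))),
                List.of(), // an empty page in the middle of the shard
                List.of(new StreamRecord("new", start.plusSeconds(60)))));
        Shard open = new Shard(null, List.of());
        System.out.println("open shards: " + countOpenShards(List.of(closed, open)));
        System.out.println("records kept: " + recordsSince(closed, start).size());
    }
}
```

The point of the sketch is that the decision to skip a shard is never made from a single empty response: the scan continues through all pages, and the timestamp filter is what keeps pre-export records from being processed twice.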