Fix file source infinite re-read in non-tail mode with codec (#6934) #6937
Merged
dlvenable merged 2 commits on Jun 30, 2026
…ch-project#6934)

FileReaderPool.onReaderComplete inferred the "should I reschedule?" decision from the completed reader's RotationType, which has no terminal value. In non-tail mode any path that did not result in DELETED or CREATE_RENAME (notably NO_ROTATION and the codec one-shot path that never updates lastRotationType) was rescheduled every 500 ms, producing duplicate events.

Make tail mode the single source of truth for rescheduling: when the reader completes in non-tail mode, mark the checkpoint completed, promote pending files, and exit. This restores the documented "non-tail = read once, stop" contract for the modern path and matches the behavior of the legacy ClassicFileStrategy.

Resolves opensearch-project#6934

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
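The decision change described in this commit can be sketched roughly as follows. This is a simplified model, not the actual FileReaderPool code: the class ReaderCompletionSketch, the Action enum, and both method names are hypothetical, the RotationType enum lists only the values named in the commit message, and a null rotation type is assumed to stand in for the codec one-shot path that never updates lastRotationType.

```java
/** Hypothetical, simplified model of the reschedule decision; not the actual Data Prepper classes. */
final class ReaderCompletionSketch {
    // Only the rotation values named in the commit message; the real enum may have more.
    enum RotationType { NO_ROTATION, DELETED, CREATE_RENAME }

    enum Action { RESCHEDULE, STOP }

    /** Pre-fix behavior as described: the decision is inferred from the rotation type alone. */
    static Action decideBeforeFix(RotationType lastRotationType) {
        // A null value models the codec one-shot path that never sets lastRotationType (assumption).
        if (lastRotationType == RotationType.DELETED
                || lastRotationType == RotationType.CREATE_RENAME) {
            return Action.STOP;
        }
        // NO_ROTATION and unset values fall through and get rescheduled again and again.
        return Action.RESCHEDULE;
    }

    /** Post-fix behavior: tail mode alone decides whether rescheduling is considered at all. */
    static Action decideAfterFix(boolean tailMode, RotationType lastRotationType) {
        if (!tailMode) {
            // Real code would mark the checkpoint completed and promote pending files here.
            return Action.STOP;
        }
        // Tail-mode logic is unchanged by the fix.
        return decideBeforeFix(lastRotationType);
    }
}
```

In this model, a non-tail reader that finishes with NO_ROTATION is rescheduled by decideBeforeFix but stopped by decideAfterFix, which is the "read once, stop" contract the commit restores.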
…oject#6934)

readFileWithCodecOneShot returned without updating the checkpoint entry, so a successful one-shot read advanced no offset. After the pool-side fix (no reschedule in non-tail mode), an in-process loop no longer occurs, but a restart would still re-read the file from offset 0 and produce duplicate events.

After parseWithCodec returns true, advance readOffset, the checkpoint's read offset, and the committed offset to the file size. On parse failure the readErrors counter is incremented and the offsets stay at zero, matching the pre-fix semantics for the error path.

Resolves opensearch-project#6934

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
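A minimal sketch of the offset handling this commit describes, under stated assumptions: OneShotOffsetSketch and CheckpointEntry are hypothetical stand-ins, the parser is modeled as a BooleanSupplier rather than the real parseWithCodec signature, and the field names only mirror the names used in the commit message.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for the reader; it models only the offset bookkeeping. */
final class OneShotOffsetSketch {
    /** Hypothetical stand-in for a checkpoint entry with the two persisted offsets. */
    static final class CheckpointEntry {
        long readOffset;
        long committedOffset;
    }

    long readOffset;
    int readErrors;

    /** Runs the one-shot parse and advances all three offsets only when it succeeds. */
    void readFileWithCodecOneShot(BooleanSupplier parseWithCodec,
                                  long fileSize,
                                  CheckpointEntry checkpoint) {
        if (parseWithCodec.getAsBoolean()) {
            // Success: the whole file was consumed, so every offset moves to the file size.
            readOffset = fileSize;
            checkpoint.readOffset = fileSize;
            checkpoint.committedOffset = fileSize;
        } else {
            // Failure: count the error and leave the offsets where they were (zero).
            readErrors++;
        }
    }
}
```

With the offsets left at zero after a successful read, as before the fix, a restarted process has no record that the file was consumed, which is why the commit message expects a re-read from offset 0.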
Hello @yavmanis @kkondaka @dlvenable, can you please review this PR?
dlvenable approved these changes on Jun 30, 2026
Thanks @srikanthpadakanti for fixing this!
kkondaka pushed a commit to kkondaka/kk-data-prepper-f2 that referenced this pull request on Jul 1, 2026
…rch-project#6934) (opensearch-project#6937)

* Stop file source from rescheduling readers in non-tail mode (opensearch-project#6934)
* Persist read offset after non-tail codec one-shot read (opensearch-project#6934)

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Description
Two changes:
- FileReaderPool.onReaderComplete no longer infers "should I reschedule?" from RotationType. In non-tail mode it marks the checkpoint completed and exits. Tail-mode logic is unchanged.
- FileReader.readFileWithCodecOneShot now persists the read and committed offsets to the checkpoint after a successful parse, so a restart does not re-read the file.

Issues Resolved
Resolves #6934
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.