Fix file source infinite re-read in non-tail mode with codec (#6934) #6937

Merged
dlvenable merged 2 commits into
opensearch-project:main from
srikanthpadakanti:fix-file-source-non-tail-codec-loop-6934
Jun 30, 2026

@srikanthpadakanti

Description

Two changes:

  1. FileReaderPool.onReaderComplete no longer infers "should I reschedule?" from RotationType. In non-tail mode it marks the checkpoint completed and exits. Tail-mode logic is unchanged.
  2. FileReader.readFileWithCodecOneShot now persists the read and committed offsets to the checkpoint after a successful parse, so a restart does not re-read the file.

Issues Resolved

Resolves #6934

Check List

  • [x] New functionality includes testing.
  • [ ] New functionality has a documentation issue. Please link to it in this PR.
  • [x] New functionality has javadoc added
  • [x] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ch-project#6934)

FileReaderPool.onReaderComplete inferred the "should I reschedule?"
decision from the completed reader's RotationType, which has no
terminal value. In non-tail mode any path that did not result in
DELETED or CREATE_RENAME (notably NO_ROTATION and the codec one-shot
path that never updates lastRotationType) was rescheduled every
500 ms, producing duplicate events.

Make tail mode the single source of truth for rescheduling: when the
reader completes in non-tail mode, mark the checkpoint completed,
promote pending files, and exit. This restores the documented
"non-tail = read once, stop" contract for the modern path and matches
the behavior of the legacy ClassicFileStrategy.
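
The decision change described above can be sketched roughly as follows. This is a simplified, hypothetical model, not the actual FileReaderPool code: the `Action` enum, the method names, and the `tailMode` flag are assumptions made for illustration, and the tail-mode branch simply reuses the rotation-based rule as a stand-in for the unchanged logic.

```java
// Hypothetical, simplified model of the reschedule decision.
// None of these types are the real Data Prepper classes.
final class ReaderCompletionSketch {

    // Rotation outcomes named in the commit message. None of them means "finished".
    enum RotationType { NO_ROTATION, CREATE_RENAME, DELETED }

    // Assumed outcome type, introduced only for this sketch.
    enum Action { RESCHEDULE, COMPLETE }

    // Pre-fix shape as described: the decision is inferred from the rotation type,
    // so NO_ROTATION (and a never-updated lastRotationType) always reschedules.
    static Action decideFromRotation(RotationType lastRotationType) {
        if (lastRotationType == RotationType.DELETED
                || lastRotationType == RotationType.CREATE_RENAME) {
            return Action.COMPLETE;
        }
        return Action.RESCHEDULE;
    }

    // Post-fix shape: tail mode alone decides whether rescheduling is possible.
    static Action decideAfterFix(boolean tailMode, RotationType lastRotationType) {
        if (!tailMode) {
            // Non-tail = read once, stop: mark completed, promote pending files, exit.
            return Action.COMPLETE;
        }
        // Assumption: tail mode keeps its existing rotation-based behavior.
        return decideFromRotation(lastRotationType);
    }
}
```

With this shape, a non-tail reader that finishes with NO_ROTATION completes instead of being rescheduled, regardless of what the rotation type says.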

Resolves opensearch-project#6934

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
…oject#6934)

readFileWithCodecOneShot returned without updating the checkpoint
entry, so a successful one-shot read advanced no offset. After the
pool-side fix (no reschedule in non-tail mode), an in-process loop
no longer occurs, but a restart would still re-read the file from
offset 0 and produce duplicate events.

After parseWithCodec returns true, advance readOffset, the
checkpoint's read offset, and the committed offset to the file
size. On parse failure the readErrors counter is incremented and
the offsets stay at zero, matching the pre-fix semantics for the
error path.
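
A minimal sketch of the offset handling described above, under stated assumptions: `CheckpointEntry`, its fields, the `BooleanSupplier` standing in for parseWithCodec, and the `needsReadOnRestart` helper are all hypothetical and exist only to illustrate the success and failure paths. The real FileReader signatures may differ.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of the one-shot codec read; not the real FileReader.
final class CodecOneShotSketch {

    // Assumed checkpoint shape with the two offsets named in the commit message.
    static final class CheckpointEntry {
        long readOffset;
        long committedOffset;
    }

    long readOffset;
    long readErrors;
    final CheckpointEntry checkpoint = new CheckpointEntry();

    // parseWithCodec is modeled as a supplier that returns true on a successful parse.
    void readFileWithCodecOneShot(long fileSize, BooleanSupplier parseWithCodec) {
        if (parseWithCodec.getAsBoolean()) {
            // Success: advance every offset to the file size so a restart skips the file.
            readOffset = fileSize;
            checkpoint.readOffset = fileSize;
            checkpoint.committedOffset = fileSize;
        } else {
            // Failure: count the error and leave the offsets at zero.
            readErrors++;
        }
    }

    // Hypothetical restart check: the file is read again only if uncommitted bytes remain.
    boolean needsReadOnRestart(long fileSize) {
        return checkpoint.committedOffset < fileSize;
    }
}
```

In this model a successful read of a 1024-byte file leaves the committed offset at 1024, so the restart check reports nothing left to read, while a failed parse leaves the offsets at zero and the file eligible for another attempt.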

Resolves opensearch-project#6934

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
@srikanthpadakanti

Hello @yavmanis @kkondaka @dlvenable, can you please review this PR?

@dlvenable

Thanks @srikanthpadakanti for fixing this!

@dlvenable dlvenable merged commit e96c877 into opensearch-project:main Jun 30, 2026
76 of 80 checks passed
kkondaka pushed a commit to kkondaka/kk-data-prepper-f2 that referenced this pull request Jul 1, 2026
…rch-project#6934)  (opensearch-project#6937)

* Stop file source from rescheduling readers in non-tail mode (opensearch-project#6934)

* Persist read offset after non-tail codec one-shot read (opensearch-project#6934)


Development

Successfully merging this pull request may close these issues.

[BUG] File source re-reads file indefinitely when using codec in non-tail mode
