Skip to content

Do not clear offsets after failure to commit offsets due to rebalance exception#6346

Merged
graytaylor0 merged 1 commit into
opensearch-project:mainfrom
graytaylor0:KafkabufferImprovements
Dec 15, 2025
Merged

Do not clear offsets after failure to commit offsets due to rebalance exception#6346
graytaylor0 merged 1 commit into
opensearch-project:mainfrom
graytaylor0:KafkabufferImprovements

Conversation

@graytaylor0

@graytaylor0 graytaylor0 commented Dec 9, 2025

Copy link
Copy Markdown
Member

Description

This change handles RebalanceInProgressException by not clearing offsets on this exception and retrying, rather than completely clearing offsets and not committing them, which can lead to duplicate processing unnecessarily (in the case that retry would succeed after rebalance)

Tested by modifying the integration test locally to kill one of the consumers and verifying that the data is still processed.

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

} catch (final RebalanceInProgressException ex) {
LOG.error("Failed to commit offsets in topic {} due to rebalance in progress", topicName, ex);
try {
Thread.sleep(100);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we sleep? Is it related to retrying? If so, can we add some check that we are not still rebalancing?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sleep is meant to add some backoff to let the rebalance complete without just spamming commit offsets. Supposedly we have to call poll again to complete the rebalance though, so maybe sleep is not that useful.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It looks like sleep() is not needed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a revision removing the sleep

throw new RuntimeException("Interrupted while waiting to retry after failing to commit offsets");
}
return;
} catch (Exception e) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this exception set the lastCommitTime. But, now we don't. Is that ok?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it was incorrectly saying we committed offsets when we didn't if we get an exception

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlvenable it is correct because the last commit time should be set only if commit really happens.

… exception

Signed-off-by: Taylor Gray <tylgry@amazon.com>
@graytaylor0 graytaylor0 force-pushed the KafkabufferImprovements branch from 5f72d6c to fd976fe Compare December 10, 2025 16:44
@graytaylor0 graytaylor0 merged commit 1125cd1 into opensearch-project:main Dec 15, 2025
47 of 58 checks passed
wandna-amazon pushed a commit to wandna-amazon/data-prepper that referenced this pull request Jan 8, 2026
… exception (opensearch-project#6346)

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Nathan Wand <wandna@amazon.com>
simonelbaz pushed a commit to simonelbaz/data-prepper that referenced this pull request Jan 31, 2026
… exception (opensearch-project#6346)

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Simon ELBAZ <elbazsimon9@gmail.com>
simonelbaz pushed a commit to simonelbaz/data-prepper that referenced this pull request Jan 31, 2026
simonelbaz pushed a commit to simonelbaz/data-prepper that referenced this pull request Jan 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants