-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][ml] Prevent terminated managed ledger from transitioning back to writable state #25795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
void-ptr974
wants to merge
2
commits into
apache:master
Choose a base branch
from
void-ptr974:fix-managed-ledger-terminate-state-race
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this branch still leaves one terminate-vs-rollover race unresolved. By the time this callback reaches
operationComplete,updateLedgersListAfterRolloverhas already successfully written metadata that includesnewLedger. Ifterminate()is concurrently writing the terminated metadata, the twostore.asyncUpdateLedgerIds(...)calls can still race becauseasyncTerminateis not serialized with thismetadataMutexpath.There are two problematic outcomes: if the rollover metadata update wins first, this branch only closes
lhand relies on a later terminate metadata update to remove the new ledger from metadata, but the unused BookKeeper ledger is not deleted; if terminate wins the metadata version race first, this rollover callback can go throughoperationFailed(BadVersionException)and callhandleBadVersion, fencing a ledger that should remain terminated. The TODO here is therefore part of the correctness fix, not just cleanup.Can we make the in-flight rollover metadata update terminal-state aware as well, e.g. serialize terminate with the same metadata update path, or handle
state == Terminatedin both the success and BadVersion failure callbacks as a stale rollover completion, while ensuring the unused ledger is removed from metadata and deleted if it was already written?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation. I’ve also identified this remaining
terminate-vs-rollover metadata race.
The ManagedLedger code path is quite complex, so I think it would be clearer
and safer to split the related fixes into smaller PRs instead of carrying all
of them in this one. This PR focuses on the terminated-state transition and
pending-add handling.
After this PR is merged, I’ll submit a follow-up PR to address the metadata
update race, including the stale rollover success path, the BadVersion path,
and cleanup of any unused ledger that may have been written to metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unresolving this comment since there's very useful observations and analysis so far and it's useful to keep the thread visible until we agree on how to address the problems in short term and long term.
I agree that the current ManagedLedgerImpl code path is quite complex. It's hard to reason about the correctness since it's a mixture of locks, synchronization and different executors.
Just wondering if terminate should be implemented in a way where the state is set to "Terminating" which would be used to prevent any new operations starting, but existing inflight operations would first be completed before the ManagedLedgerImpl could be transitioned into "Terminated" state? The transitioning from Terminating to Terminated would be handled after the metadata update has been successfully completed. If there's a crash in the metadata update, the ledger will continue to be operational. The client requesting the termination can retry if it doesn't receive a successful response to the termination request.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed review. I agree the terminate-vs-rollover metadata race is important.
For this PR, I would like to keep the scope focused on the immediate terminated-state and pending-add behavior: once termination has taken effect, late callbacks should not move the managed ledger back to a writable state, and pending add callbacks should complete consistently.
The metadata race is a different path and the fix is more involved. A focused follow-up PR would need to handle the terminate metadata update together with the
metadataMutex-serialized metadata update path, including:BadVersionExceptionshould not fence a ledger that is already terminating/terminated.Mixing that into this PR would make the review much harder because it combines state/callback behavior with metadata serialization and ledger cleanup. My preference is to keep this PR focused, then send a separate PR for the
metadataMutex/ stale rollover metadata path.I agree that a separate
Terminatingstate would be a cleaner long-term model.The distinction I have in mind is:
Terminating: termination has started on this broker. New writes and new rollovers should be rejected, late callbacks should not move the ledger back to writable, and existing in-flight adds should either drain or fail deterministically.Terminated: the terminate metadata update has succeeded andterminatedPositionis durable.With this model,
Terminatingdoes not need to be the durable state. If the broker crashes before the metadata update succeeds, recovery can treat the ledger as non-terminated and the caller can retry termination. If the metadata update succeeds, recovery observesterminatedPositionand restoresTerminated.This would make the state machine easier to reason about, but it still needs a clear design for how it interacts with rollover callbacks, pending add entries, metadata updates, and
BadVersionExceptionhandling. It also would not replace the need to fix themetadataMutexrace; it mainly gives us a clearer lifecycle boundary so those cases can be handled consistently.I think this is worth doing as a follow-up design/change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just feels that this approach isn't correct. The problem itself is real.
Yes, it most likely makes sense to start with the analysis and design. In Pulsar, it's most natural to document the problem, analysis and design into a PIP before implementation (prototypes and experimentation are obviously recommended and necessary when coming up with the design).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that makes sense to me.
I’ll take a deeper look at the overall ManagedLedger termination flow and how these issues should be handled together, including the state transition model, the metadata update/locking path, and stale rollover callbacks.
After that, I’ll try to prepare a proper PIP or design proposal so we can discuss the problem, analysis, and proposed approach more clearly with the community.