[fix][ml] Prevent terminated managed ledger from transitioning back to writable state#25795
Open
void-ptr974 wants to merge 2 commits into
Open
[fix][ml] Prevent terminated managed ledger from transitioning back to writable state#25795void-ptr974 wants to merge 2 commits into
void-ptr974 wants to merge 2 commits into
Conversation
When terminate() races with ledger rollover, a delayed rollover completion can overwrite the Terminated state and move the ManagedLedger back to LedgerOpened. The concrete race is: an add fills the current ledger and starts creating the next ledger, terminate() marks the ManagedLedger as Terminated, then the delayed createComplete/updateLedgersIdsComplete callback resumes the old rollover path and reopens the ledger for writes. This commit makes the rollover completion path respect Terminated as a final write state. Late createComplete callbacks close the newly created ledger handle and fail queued adds with ManagedLedgerTerminatedException. Late updateLedgersIdsComplete callbacks return without setting LedgerOpened, and the close-ledger path no longer creates a replacement ledger after termination.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
ManagedLedger.terminate()seals the managed ledger at the current BookKeeper committed boundary. After termination, no new entries should be accepted, and the managed ledger must notbecome writable again.
The key invariant is:
terminate()does not wait for every submitted add to succeed. It closes the current BookKeeper ledger and uses the ledger's final LAC as theterminatedPosition.Therefore:
terminate();ManagedLedgerTerminatedException;There is a race between
terminate()and ledger rollover:ClosingLedger/CreatingLedger.terminate()runs before the rollover create/switch callback finishes and marks the managed ledger asTerminated.createComplete()orupdateLedgersIdsComplete()callback resumes the old rollover flow.LedgerOpened, making a terminated managed ledger writable again.This breaks the terminate semantics. It can also leave pending writes handled as normal rollover writes instead of terminated writes, or leave in-flight add callbacks hanging when
BookKeeper close drains writes that were not included in the final LAC.
Modifications
This change keeps
Terminatedas the final write state afterterminate()takes ownership.The implementation follows one invariant:
To keep that invariant, this PR handles each race path explicitly:
Queued adds that have not been sent to BookKeeper
terminate()takes ownership, these adds are failed withManagedLedgerTerminatedException.In-flight adds already sent to BookKeeper
terminatedPosition.Failed add callbacks after terminate
failAddIfTerminated()for failed add callbacks that arrive after terminate has taken ownership.Terminatedstate,ledgerClosed()returns without replaying or failing the add, leaving the client callbackhanging.
Late ledger create callbacks
createComplete()now completes the ledger-create future before terminal-state checks.Late ledger switch callbacks
updateLedgersIdsComplete()now returns when the managed ledger is alreadyTerminated.LedgerOpened.Replacement ledger creation after close
Terminated.This does not change the public API, protocol, schema, or metric names. It only fixes the behavior of existing state transitions and balances the existing pending ledger-create metric in
the late-callback path.
Verifying this change
This change added tests and can be verified as follows:
terminateDuringLedgerSwitchKeepsTerminatedStateterminate().Terminated.ManagedLedgerTerminatedException.terminatePositionIncludesAddAlreadyAckedByBookKeeperterminate()returns aterminatedPositionthat includes the acknowledged add.terminateFailsInflightAddDrainedByLedgerCloseManagedLedgerTerminatedException.ledgerSwitchCompletionDoesNotReopenTerminatedLedgerTerminatedback toLedgerOpened.Local verification:
./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.ManagedLedgerTerminationTest
./gradlew :managed-ledger:checkstyleMain :managed-ledger:checkstyleTest
Does this pull request potentially affect one of the following parts: