[fix][ml] Prevent terminated managed ledger from transitioning back to writable state#25795

Open
void-ptr974 wants to merge 2 commits into apache:master from void-ptr974:fix-managed-ledger-terminate-state-race

void-ptr974 commented May 16, 2026

Motivation

ManagedLedger.terminate() seals the managed ledger at the current BookKeeper committed boundary. After termination, no new entries should be accepted, and the managed ledger must not become writable again.

The key invariant is:

Any add operation that is acknowledged successfully to the caller must have a position less than or equal to the final terminatedPosition.

terminate() does not wait for every submitted add to succeed. It closes the current BookKeeper ledger and uses the ledger's final LAC as the terminatedPosition.

Therefore:

  • adds already acknowledged by BookKeeper and included in LAC may still complete successfully, even if the ManagedLedger callback is delivered after terminate();
  • adds that have not reached LAC, including queued writes waiting for a future ledger, must fail with ManagedLedgerTerminatedException;
  • termination must not create or switch to another ledger to replay pending writes.
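
A minimal sketch of how this invariant partitions adds at termination time, assuming a single current ledger whose final LAC becomes terminatedPosition. The class, enum, and method names are hypothetical and are not part of ManagedLedgerImpl; an empty entry ID stands for an add that is still queued and was never sent to BookKeeper.

```java
import java.util.OptionalLong;

// Hypothetical model of the invariant; not Pulsar code.
final class TerminationInvariantSketch {

    enum Outcome { SUCCESS, FAILED_TERMINATED }

    /**
     * Decides the fate of one add once the current ledger has been closed.
     *
     * @param entryId  entry ID assigned by BookKeeper, or empty if the add
     *                 was only queued and waiting for a future ledger
     * @param finalLac last-add-confirmed entry ID of the closed ledger,
     *                 which stands in for terminatedPosition here
     */
    static Outcome resolve(OptionalLong entryId, long finalLac) {
        // Only adds at or below the final LAC are covered by terminatedPosition.
        if (entryId.isPresent() && entryId.getAsLong() <= finalLac) {
            return Outcome.SUCCESS;
        }
        // Queued adds and adds beyond the LAC must fail as terminated.
        return Outcome.FAILED_TERMINATED;
    }
}
```

With a final LAC of 5, an add at entry 5 may still succeed, while an add at entry 6 or a queued add with no entry ID is failed.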

There is a race between terminate() and ledger rollover:

  1. An add fills the current ledger and triggers rollover.
  2. The managed ledger moves into ClosingLedger / CreatingLedger.
  3. terminate() runs before the rollover create/switch callback finishes and marks the managed ledger as Terminated.
  4. The delayed createComplete() or updateLedgersIdsComplete() callback resumes the old rollover flow.
  5. The callback can set the state back to LedgerOpened, making a terminated managed ledger writable again.
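
The interleaving above can be replayed deterministically in a toy state machine. This is a simplified, single-threaded sketch: the state names follow the PR text, but the class and its methods are hypothetical stand-ins for the real callbacks. The guarded variant mirrors the kind of Terminated check this PR describes for updateLedgersIdsComplete().

```java
// Hypothetical single-threaded replay of the race; not Pulsar code.
final class RolloverRaceSketch {

    enum State { LedgerOpened, ClosingLedger, CreatingLedger, Terminated }

    private State state = State.LedgerOpened;

    /** Steps 1-2: a full ledger triggers rollover. */
    void startRollover() {
        state = State.ClosingLedger;
        state = State.CreatingLedger;
    }

    /** Step 3: terminate() runs before the rollover callback finishes. */
    void terminate() {
        state = State.Terminated;
    }

    /** Steps 4-5 without a guard: the late callback reopens the ledger. */
    void lateSwitchCompleteUnguarded() {
        state = State.LedgerOpened;
    }

    /** Steps 4-5 with a guard: Terminated is treated as final. */
    void lateSwitchCompleteGuarded() {
        if (state == State.Terminated) {
            return;
        }
        state = State.LedgerOpened;
    }

    /** Replays the interleaving and returns the resulting state. */
    static State replay(boolean guarded) {
        RolloverRaceSketch ml = new RolloverRaceSketch();
        ml.startRollover();
        ml.terminate();
        if (guarded) {
            ml.lateSwitchCompleteGuarded();
        } else {
            ml.lateSwitchCompleteUnguarded();
        }
        return ml.state;
    }
}
```

In this model, replay(false) ends in LedgerOpened, which is the bug, and replay(true) stays in Terminated.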

This breaks the terminate semantics. It can also leave pending writes handled as normal rollover writes instead of terminated writes, or leave in-flight add callbacks hanging when BookKeeper close drains writes that were not included in the final LAC.

Modifications

This change keeps Terminated as the final write state after terminate() takes ownership.

The implementation follows one invariant:

Any add that completes successfully to the caller must be included in the final terminatedPosition.

To keep that invariant, this PR handles each race path explicitly:

  • Queued adds that have not been sent to BookKeeper

    • When terminate() takes ownership, these adds are failed with ManagedLedgerTerminatedException.
    • They are only waiting for a future ledger, and terminate must not create or switch to another ledger to replay them.
  • In-flight adds already sent to BookKeeper

    • These adds are not failed immediately.
    • BookKeeper decides whether they are inside the final LAC:
      • if BookKeeper has acked the add and advanced LAC, the add may still complete successfully;
      • if BookKeeper close drains the add before it reaches LAC, the add is failed as terminated.
    • This preserves the rule that successful add positions are covered by terminatedPosition.
  • Failed add callbacks after terminate

    • Added failAddIfTerminated() for failed add callbacks that arrive after terminate has taken ownership.
    • Without this, a drained add would enter the normal write-failure path. In Terminated state, ledgerClosed() returns without replaying or failing the add, leaving the client callback
      hanging.
  • Late ledger create callbacks

    • createComplete() now completes the ledger-create future before terminal-state checks.
    • This keeps the timeout checker and pending create-op metric balanced even if the create callback arrives after terminate.
    • If the late callback created a ledger after termination, that ledger is closed and deleted, because a terminated managed ledger cannot use a future ledger for pending writes.
  • Late ledger switch callbacks

    • updateLedgersIdsComplete() now returns when the managed ledger is already Terminated.
    • This prevents a delayed rollover callback from setting the state back to LedgerOpened.
  • Replacement ledger creation after close

    • The close path no longer creates a replacement ledger when the state is already Terminated.
    • Terminate closes the current ledger to seal the committed prefix; it must not continue the rollover flow by creating another writable ledger.
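
The ordering described for late ledger-create callbacks (complete the create future first, then check for Terminated, then discard the new ledger) might look roughly like the following. This is a sketch under stated assumptions, not the actual ManagedLedgerImpl code: the fields, the counter standing in for the pending create-op metric, and the nested TerminatedException (a stand-in for ManagedLedgerTerminatedException) are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of the late create-callback path; not Pulsar code.
final class LateCreateCallbackSketch {

    /** Stand-in for ManagedLedgerTerminatedException. */
    static final class TerminatedException extends RuntimeException {
        TerminatedException() {
            super("managed ledger terminated");
        }
    }

    enum State { CreatingLedger, LedgerOpened, Terminated }

    State state = State.CreatingLedger;
    int pendingCreateOps = 1; // stand-in for the pending create-op metric
    final CompletableFuture<Long> createFuture = new CompletableFuture<>();
    final Queue<CompletableFuture<Long>> queuedAdds = new ArrayDeque<>();
    final List<Long> deletedLedgers = new ArrayList<>();

    /** Queues an add that is waiting for the next ledger. */
    CompletableFuture<Long> enqueueAdd() {
        CompletableFuture<Long> add = new CompletableFuture<>();
        queuedAdds.add(add);
        return add;
    }

    /** Terminate takes ownership and fails adds that were never sent. */
    void terminate() {
        state = State.Terminated;
        CompletableFuture<Long> add;
        while ((add = queuedAdds.poll()) != null) {
            add.completeExceptionally(new TerminatedException());
        }
    }

    /** Late create callback: bookkeeping first, terminal check second. */
    void createComplete(long newLedgerId) {
        // Complete the future before any state check so the timeout
        // checker and the pending-op counter stay balanced.
        if (createFuture.complete(newLedgerId)) {
            pendingCreateOps--;
        }
        if (state == State.Terminated) {
            // The new ledger must not become writable; discard it.
            deletedLedgers.add(newLedgerId);
            return;
        }
        state = State.LedgerOpened;
    }
}
```

Doing the bookkeeping before the terminal check means the early return on Terminated cannot leave the create future or the counter dangling.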

This does not change the public API, protocol, schema, or metric names. It only fixes the behavior of existing state transitions and balances the existing pending ledger-create metric in the late-callback path.

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • terminateDuringLedgerSwitchKeepsTerminatedState

    • Simulates a delayed BookKeeper ledger create callback returning after terminate().
    • Verifies the managed ledger stays Terminated.
    • Verifies queued pending writes fail with ManagedLedgerTerminatedException.
    • Verifies the late-created ledger is deleted.
    • Verifies the ledger-create future is completed and the pending create-op metric is balanced.
    • Verifies reopening the managed ledger still observes the terminated state.
  • terminatePositionIncludesAddAlreadyAckedByBookKeeper

    • Simulates BookKeeper already acking an add and advancing LAC, while the ManagedLedger client callback is delayed.
    • Verifies terminate() returns a terminatedPosition that includes the acknowledged add.
    • Verifies the delayed client callback can still complete successfully with a position inside the terminated range.
  • terminateFailsInflightAddDrainedByLedgerClose

    • Simulates BookKeeper close draining an outstanding add that has not reached LAC.
    • Verifies the add fails with ManagedLedgerTerminatedException.
    • Verifies the pending add queue is cleared.
    • Verifies the callback does not hang.
  • ledgerSwitchCompletionDoesNotReopenTerminatedLedger

    • Verifies a late ledger-switch completion cannot move the state from Terminated back to LedgerOpened.
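
The scenario behind terminateFailsInflightAddDrainedByLedgerClose can be modeled in isolation. The sketch below is not the test from this PR; it is a hypothetical, simplified model in which a failed add callback arriving after termination is failed as terminated instead of being re-queued. All names are illustrative, and TerminatedException stands in for ManagedLedgerTerminatedException.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of a drained in-flight add; not Pulsar code.
final class DrainedAddSketch {

    /** Stand-in for ManagedLedgerTerminatedException. */
    static final class TerminatedException extends RuntimeException {
        TerminatedException() {
            super("managed ledger terminated");
        }
    }

    enum State { LedgerOpened, Terminated }

    State state = State.LedgerOpened;
    final Queue<CompletableFuture<Long>> pendingAdds = new ArrayDeque<>();

    /** Registers an add that has been sent to BookKeeper. */
    CompletableFuture<Long> submitAdd() {
        CompletableFuture<Long> add = new CompletableFuture<>();
        pendingAdds.add(add);
        return add;
    }

    void terminate() {
        state = State.Terminated;
    }

    /** Called when BookKeeper reports the add as failed, e.g. drained by close. */
    void addFailed(CompletableFuture<Long> add) {
        pendingAdds.remove(add);
        if (state == State.Terminated) {
            // After termination there is no future ledger to retry on,
            // so the caller is notified instead of being left hanging.
            add.completeExceptionally(new TerminatedException());
            return;
        }
        // Simplified normal path: keep the add pending for a later retry.
        pendingAdds.add(add);
    }
}
```

In the terminated case the future completes exceptionally and the pending queue is empty, which corresponds to the "fails", "queue is cleared", and "does not hang" checks listed for that test.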

Local verification:

./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.ManagedLedgerTerminationTest
./gradlew :managed-ledger:checkstyleMain :managed-ledger:checkstyleTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

  When terminate() races with ledger rollover, a delayed rollover completion
  can overwrite the Terminated state and move the ManagedLedger back to
  LedgerOpened. The concrete race is: an add fills the current ledger and starts
  creating the next ledger, terminate() marks the ManagedLedger as Terminated,
  then the delayed createComplete/updateLedgersIdsComplete callback resumes the
  old rollover path and reopens the ledger for writes.

  This commit makes the rollover completion path respect Terminated as a final
  write state. Late createComplete callbacks close the newly created ledger
  handle and fail queued adds with ManagedLedgerTerminatedException. Late
  updateLedgersIdsComplete callbacks return without setting LedgerOpened, and
  the close-ledger path no longer creates a replacement ledger after
  termination.