Skip to content

HDDS-15578. Prevent throwing InvalidStateTransitionException from updateContainerStateWithSequenceId#10525

Merged
ivandika3 merged 6 commits into
apache:masterfrom
ivandika3:HDDS-15578
Jun 26, 2026
Merged

HDDS-15578. Prevent throwing InvalidStateTransitionException from updateContainerStateWithSequenceId#10525
ivandika3 merged 6 commits into
apache:masterfrom
ivandika3:HDDS-15578

Conversation

@ivandika3

@ivandika3 ivandika3 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

There are methods annotated with @Replicate that can throw InvalidStateTransitionException like ContainerStateManager#updateContainerStateWithSequenceId.

When the method is applied by SCM Ratis, an exception from the StateMachineUpdater path can terminate SCM although it is not really a critical error (e.g. if there are duplicate events, we can simply ignore one).

Example risk (Edit: Due to the makeStateTransitionIdempotent, this is not really valid):

  • Leader submits FINALIZE for OPEN.
  • Before/apply ordering or duplicate report causes the current state to already be CLOSING.
  • Applying FINALIZE at CLOSING is invalid.
  • Exception escapes from replicated apply path.

The chance is very low since most of there is a check of the container state before in updateContainerStateWithSequenceId the caller will check that the current container state is expected, but it's there.

Note that the makeStateTransitionIdempotent also adds a guard to make sure that transition is idempotent.

We can try to fix it by

  • Inside the replicated implementation, catch InvalidStateTransitionException.
  • Log and return without mutation.
  • Treat it as a stale/duplicate lifecycle event, not a fatal replicated-state-machine error.

Or we can simply catch InvalidStateTransitionException SCMStateMachine#applyTransaction, but this means that all future InvalidStateTransitionException are non-fatal.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15578

How was this patch tested?

CI (https://github.com/ivandika3/ozone/actions/runs/27731733234)

@ivandika3 ivandika3 self-assigned this Jun 18, 2026
@ivandika3 ivandika3 marked this pull request as ready for review June 18, 2026 05:20
@adoroszlai adoroszlai requested a review from ashishkumar50 June 18, 2026 14:39

@ashishkumar50 ashishkumar50 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivandika3 Thanks for the patch, LGTM. Need below one clarification.

if (newState.getNumber() > oldState.getNumber()) {
ExecutionUtil.create(() -> {
containers.updateState(id, oldState, newState);
transactionBuffer.addToBuffer(containerStore, id,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above containerInfo.updateSequenceId needs to move here?

@ivandika3 ivandika3 Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's needed since we simply want to catch the InvalidStateTransitionException.

@ivandika3

ivandika3 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

@ashishkumar50 Note that this is technically a no-op patch since the SCM transition container lifecycle uses markStateTransitionIdempotent to prevent this invalid transition. However, I think the default behavior of crashing SCM is quite harsh and the exception InvalidStateTransitionException in the method signature gives a wrong signal that we can simply throw InvalidStateTransitionException. I was working on our internal feature that introduces a separate lifecycle and encountered the SCM crash issue which was unexpected.

@ivandika3 ivandika3 requested a review from nandakumar131 June 22, 2026 10:33
@adoroszlai adoroszlai requested a review from szetszwo June 23, 2026 19:30

@szetszwo szetszwo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivandika3 , thanks for working on this! Then change looks good. Just a minor comment inlined.

@szetszwo szetszwo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 the change looks good.

@ivandika3 ivandika3 merged commit 7543cfd into apache:master Jun 26, 2026
47 checks passed
@ivandika3

Copy link
Copy Markdown
Contributor Author

Thanks @ashishkumar50 @szetszwo for the reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants