[FLINK-38077] Make sure jobmanager is accessible when trying to cancel for suspend/upgrade #1050

Merged
gyfora merged 1 commit into apache:main from gyfora:FLINK-38077
Jan 14, 2026
@gyfora commented Jan 14, 2026

What is the purpose of the change

Cancel-based job upgrades (last-state without HA, or when falling back from savepoint) can currently get stuck indefinitely if the JobManager deployment is missing or the REST API is otherwise inaccessible.
This is caused by a bug in the "isCancellable" logic, which decides whether cancel should be used for the upgrade.
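
A minimal sketch of the kind of guard described above, not the actual operator code. The class name, method signature, and the three boolean inputs are hypothetical and chosen only for illustration; the real check lives in the operator and may look quite different. The idea is that cancel is only considered usable when the JobManager can actually be reached, so the upgrade does not wait forever on a REST call that can never succeed.

```java
// Hypothetical sketch; names and inputs are assumptions, not the operator's API.
public class CancelUpgradeSketch {

    /**
     * Returns true only if cancelling through the REST API can plausibly succeed.
     *
     * @param jobRunning           the job is in a state where cancel makes sense
     * @param jmDeploymentPresent  the JobManager deployment exists in the cluster
     * @param restApiReachable     the JobManager REST endpoint answered a probe
     */
    public static boolean isCancellable(
            boolean jobRunning, boolean jmDeploymentPresent, boolean restApiReachable) {
        // Without a JobManager deployment or a reachable REST API there is
        // nothing to send the cancel request to, so cancel must not be chosen.
        return jobRunning && jmDeploymentPresent && restApiReachable;
    }

    public static void main(String[] args) {
        // Healthy cluster: cancel can be used for the upgrade.
        System.out.println(isCancellable(true, true, true));
        // JobManager deployment missing: the caller has to pick another path.
        System.out.println(isCancellable(true, false, false));
    }
}
```

With a guard of this shape, the caller decides up front that cancel is not an option when the JobManager is gone, instead of retrying a cancel request indefinitely.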

Verifying this change

Unit tests added, tested in various environments

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., are there any changes to the CustomResourceDescriptors: no
  • Core observer or reconciler logic that is regularly executed: yes

Documentation

  • Does this pull request introduce a new feature? no

@gyfora requested review from gaborgsomogyi and mxm January 14, 2026 12:58
@gyfora merged commit 824d96f into apache:main Jan 14, 2026
118 checks passed