[FLINK-39507] Terminal jobs should never be restarted by cluster/job …#1097

Merged
gyfora merged 1 commit into apache:main from gyfora:FLINK-39507 on Apr 27, 2026

gyfora commented Apr 21, 2026

What is the purpose of the change

Currently, the cluster / job health check logic is sometimes executed on terminal/failed jobs. This can cause the operator to try restarting them from HA metadata, which inevitably leads to an unrecoverable failure.
We should simply exclude these deployments based on the job status.

This PR adds the required checks on job status and HA metadata to avoid unrecoverable errors.
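
For illustration only, the guard described above could be sketched roughly as follows. This is not the code from this PR: `RestartGuard`, `JobState`, and `mayRestart` are hypothetical names, the state list is a simplified stand-in for the real job status enum, and the way the HA metadata check combines with the status check is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch: decides whether a health-check-triggered restart is allowed. */
final class RestartGuard {

    /** Simplified, assumed job states; not the operator's actual enum. */
    public enum JobState { RUNNING, RESTARTING, RECONCILING, FAILED, FINISHED, CANCELED }

    /** States treated as terminal in this sketch. */
    private static final Set<JobState> TERMINAL =
            EnumSet.of(JobState.FAILED, JobState.FINISHED, JobState.CANCELED);

    private RestartGuard() {}

    static boolean isTerminal(JobState state) {
        return TERMINAL.contains(state);
    }

    /**
     * Returns true only if the job is non-terminal, HA metadata is available
     * to restore from, and the health check reported the deployment unhealthy.
     */
    static boolean mayRestart(JobState state, boolean unhealthy, boolean haMetadataAvailable) {
        if (isTerminal(state)) {
            // Terminal jobs must never be restarted by the health check.
            return false;
        }
        if (!haMetadataAvailable) {
            // Assumption: without HA metadata a restart cannot recover the job.
            return false;
        }
        return unhealthy;
    }
}
```

With this shape, `RestartGuard.mayRestart(RestartGuard.JobState.FAILED, true, true)` returns `false` even though the deployment looks unhealthy and HA metadata exists, which is the behavior the description asks for.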

Verifying this change

Unit tests extended

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., are there any changes to the CustomResourceDescriptors: no
  • Core observer or reconciler logic that is regularly executed: yes

Documentation

  • Does this pull request introduce a new feature? no

@gyfora gyfora merged commit 3246630 into apache:main Apr 27, 2026
216 of 239 checks passed