Is this a new feature, an enhancement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Medium
Please provide a clear description of problem this feature solves
An ingested node was stuck in state HostInitializing/PollingBiosSetup for 17+ hours (SLA is 30 min). I reset the BMC with Manager.Reset GracefulRestart, which did not help. The eventual fix was clicking Machine Setup in the UI and doing a Force Restart of the node, which allowed the machine to move to the next state. There should be automatic retries that do exactly this instead of requiring manual intervention.
Feature Description
From an operator perspective, I would like automatic retries for when machines are stuck in state HostInitializing/PollingBiosSetup so that manual intervention is not needed.
Describe your ideal solution
I would not need to intervene when the "fix" consists only of staging settings and restarting. Automatic retries, triggered once a machine has been out of SLA for a certain amount of time, could take both of these actions.
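As a rough illustration of the retry policy I have in mind, here is a minimal Python sketch. Everything in it is hypothetical: the Machine record, the machine_setup and force_restart callbacks (stand-ins for whatever the UI's Machine Setup and Force Restart actions call internally), and the retry cap of 3 are assumptions, not existing APIs. Only the state name and the 30 minute SLA come from the report above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

STUCK_STATE = "HostInitializing/PollingBiosSetup"
SLA = timedelta(minutes=30)   # SLA quoted in this issue
MAX_RETRIES = 3               # assumed cap, so a broken node is not restarted forever


@dataclass
class Machine:
    """Hypothetical view of a machine's state-machine record."""
    machine_id: str
    state: str
    entered_state_at: datetime
    retries: int = 0
    last_retry_at: Optional[datetime] = None


def should_retry(machine: Machine, now: datetime) -> bool:
    """True when the machine is in the stuck state, out of SLA, and under the cap."""
    if machine.state != STUCK_STATE:
        return False
    if machine.retries >= MAX_RETRIES:
        return False  # give up and leave it for an operator
    # Measure from the last retry so each attempt gets a full SLA window.
    reference = machine.last_retry_at or machine.entered_state_at
    return now - reference > SLA


def retry_if_stuck(
    machine: Machine,
    now: datetime,
    machine_setup: Callable[[str], None],   # hypothetical: re-stage BIOS settings
    force_restart: Callable[[str], None],   # hypothetical: force restart the host
) -> bool:
    """Re-stage settings and force restart the node if it is stuck; report whether it acted."""
    if not should_retry(machine, now):
        return False
    machine_setup(machine.machine_id)
    force_restart(machine.machine_id)
    machine.retries += 1
    machine.last_retry_at = now
    return True
```

A periodic controller loop could call retry_if_stuck for each machine. Once the retry cap is reached, the machine would still need to be surfaced to an operator.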
Describe any alternatives you have considered
No response
Additional context
No response
Code of Conduct