Hi,
We have observed this error several times across multiple customers: only "sometimes", the user-cluster MachineDeployment (MD) does not create new Nodes, and the machine-controller pod logs the error below.
{"level":"error","time":"2026-02-27T06:03:38.291Z","logger":"machine-controller","caller":"machine/controller.go:388","msg":"Reconciling failed","machine":"kube-system/xxxxx-nngcb","error":"cloud-init configuration: cloud config \"bootstrap\" is not ready yet"}
On investigation, the error occurs because the OSM controller fails to update the bootstrap secret. Restarting the machine-controller pods or the OSM pods has no effect. The only way to resolve the situation is to manually delete the <machine-deployment>-kube-system-bootstrap-config secret in the user cluster's cloud-init-settings namespace. If we restart the OSM pod after deleting the secret, it recreates the secret properly and machine-controller can then create the nodes.
We don't have reproducible conditions for this. It mostly happens when a long time (a month, for example) passes between an OSP change and node rotation.
We have seen this at two customers, between Akash and me. The Demo env also seems to face this issue. So my request is to take a "general" look at the code to see why OSC fails to update the secret even when it should.
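For anyone who needs the workaround in the meantime, here is a rough sketch of the two steps as kubectl commands, assembled in Python. The secret name pattern and the cloud-init-settings namespace come from the description above; the OSM deployment name and namespace in the example are placeholders (assumptions, not verified), and the two commands may need different kubeconfigs if OSM does not run in the user cluster. The script only prints the commands, it does not run them.

```python
import shlex

# Namespace in the user cluster that holds the bootstrap secrets (from the report).
BOOTSTRAP_NAMESPACE = "cloud-init-settings"


def bootstrap_secret_name(machine_deployment: str) -> str:
    """Build the bootstrap secret name for a MachineDeployment, per the report."""
    return f"{machine_deployment}-kube-system-bootstrap-config"


def workaround_commands(machine_deployment: str, osm_deployment: str, osm_namespace: str):
    """Return the two workaround commands as argument lists.

    osm_deployment and osm_namespace are supplied by the caller; the values
    used in the example below are hypothetical and must be checked per setup.
    """
    secret = bootstrap_secret_name(machine_deployment)
    return [
        # Step 1: delete the stale bootstrap secret in the user cluster.
        ["kubectl", "delete", "secret", secret, "-n", BOOTSTRAP_NAMESPACE],
        # Step 2: restart OSM so that it recreates the secret.
        ["kubectl", "rollout", "restart", f"deployment/{osm_deployment}", "-n", osm_namespace],
    ]


if __name__ == "__main__":
    # "my-md", "operating-system-manager" and "cluster-xxxxx" are placeholders.
    for cmd in workaround_commands("my-md", "operating-system-manager", "cluster-xxxxx"):
        print(shlex.join(cmd))
```

This only mitigates the symptom; it does not explain why the secret is not updated in the first place.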