Onboarding: Cleanup all test servers#134
Conversation
| }).AllPages(ctx) | ||
|
|
||
| if err != nil && !gophercloud.ResponseCodeIs(err, http.StatusNotFound) { | ||
| return ctrl.Result{}, err |
There was a problem hiding this comment.
Nova errors are not reconciliation errors, should just reschedule and maybe print a log message, optionally add a condition.
There was a problem hiding this comment.
That is how we handle the errors all over the code. If I just retry, I won't get exponential backoff.
There was a problem hiding this comment.
throwing errors don't do exponential backoff, but instant retry - which causes an infinite loop that starves the other controllers.
There was a problem hiding this comment.
so according the code: https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/internal/controller/controller.go#L470
throwing an error has pretty much the same effect as returning an immediate Requeue. The rate limiter should have taken care either way of an exponential backoff, but in our setup I've seen that the operator was looping for infinity, without any backoff - so not sure what went wrong here.
Anyway, we should agree on one way how and when we throw errors inside an reconciliation loop, since we have now two patterns.
I prefer handling Openstack errors (or external errors) graciously and expose them via condition for better debugging, this is way easier to debug than to investigate the logs. Also you don't want to hammer the openstack API with the operator, neither via backoff nor without.
There was a problem hiding this comment.
kubernetes-sigs/controller-runtime#731 maybe needs some love
There was a problem hiding this comment.
Anyway, we should agree on one way how and when we throw errors inside an reconciliation loop, since we have now two patterns.
I'd argue, that it is out of scope of this ticket to decide that, and I haven't changed how the code returns error.
There was a problem hiding this comment.
ing an error has pretty much the same effect as returning an immediate
Requeue. The rate limiter should have taken care either way of an exponential backoff, but in our setup I've seen that the operator was looping for infinity, without any backoff - so not sure what went wrong here.
valid. ok then let's get this change merged first.
607ccc3 to
36f7df7
Compare
The prior logic only works, if we run through the cycle completely, but leaks resources, if the node uid changes. So, we delete now all instances which match the prefix excluding the uid.
The prior logic only works, if we run through the cycle completely, but leaks resources, if the node uid changes.
So, we delete now all instances which match the prefix excluding the uid.