Leader election lease tunables#1442
Conversation
|
@hjensas Resurected and this PR works for me. I made a few adjustments. k8s env's like the values to be strings. I also split the envs into individual variables for the templates as this just seemed cleaner. Also the managers.yaml template is a bit special since it was nested loops. I just reread the openshift guidance, and I would somewhat like to adjust our defaults a bit lower. As is it appears to cause an initial election of the controller-operator to immediately wait over a minute for election. And then it settles out. If you want to make a change with this PR you just edit the CSV for openstack-operator and it applies them to all our operators accordingly. Increasing the settings would have the impact of making things get applied slower I think so with these settings it takes over a minute to apply that config change :/ |
| value: quay.io/openstack-k8s-operators/openstack-operator:latest | ||
| - name: ENABLE_WEBHOOKS | ||
| value: false | ||
| - name: LEASE_DURATION |
There was a problem hiding this comment.
@hjensas I'm inclined to decrease these back towards the defaults. Perhaps 30/20/5 would be a reasonable default for us. And then folks who want to try out OpenShift cluster high availability mode have that option which we could document. The defaults seem to be https://pkg.go.dev/k8s.io/client-go/tools/leaderelection 15/10/2 I think.
There was a problem hiding this comment.
ok, yeah my eyes when blink blink a couple of times when I saw how much difference there was the openshift/library-go compared to https://pkg.go.dev/k8s.io/client-go/tools/leaderelection.
I also noticed in my testing that if the operators where already running - the "old" operator would terminate and then the new one would wait for over a minute to get the lock. It made me think, it would be good if controlled termination of a leader released/deleted the lease ...
Let's try it a bit lower, 30/20/5 sounds good. I can test that a bit, and if I need to patch the csv in labs to bump it a bit more that is ok.
There was a problem hiding this comment.
ok, yeah my eyes when blink blink a couple of times when I saw how much difference there was the openshift/library-go compared to https://pkg.go.dev/k8s.io/client-go/tools/leaderelection.
thank you for that link. I was wondering how lose values were chosen
hjensas
left a comment
There was a problem hiding this comment.
In this job I see: https://softwarefactory-project.io/zuul/t/rdoproject.org/build/2f399e8aa8da40cfa514ec52a9dda98a
openstack-operator-controller-operator
2025-05-14T20:50:08.193Z INFO setup manager configured with lease duration {"seconds": 137}
2025-05-14T20:50:08.193Z INFO setup manager configured with renew deadline {"seconds": 107}
2025-05-14T20:50:08.193Z INFO setup manager configured with retry period {"seconds": 26}
And it propagates to the other operators.
ironic-operator-controller-manager
2025-05-14T20:50:54Z INFO setup manager configured with lease duration {"seconds": 137}
2025-05-14T20:50:54Z INFO setup manager configured with renew deadline {"seconds": 107}
2025-05-14T20:50:54Z INFO setup manager configured with retry period {"seconds": 26}
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dprince, hjensas The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test openstack-operator-build-deploy-kuttl-4-18 |
Set leader election LeaseDuration, RenewDealine and RetryPeriod via env variables. Doubling the default values from here: https://pkg.go.dev/k8s.io/client-go/tools/leaderelection See also: https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#high-availability Co-authored-by: Dan Prince <dprince@redhat.com> Jira: OSPRH-16335
|
New changes are detected. LGTM label has been removed. |
8e6d567
into
openstack-k8s-operators:main
Set leader election LeaseDuration, RenewDealine and RetryPeriod via env variables.
The values used are the ones mentioned as recommended in:
https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#high-availability
Co-authored-by: Dan Prince dprince@redhat.com
Jira: OSPRH-16335