Conversation
mayankshah1607
left a comment
There was a problem hiding this comment.
LGTM, lets address @egegunes comment
There was a problem hiding this comment.
Pull request overview
This PR adds configuration knobs for controller-runtime leader election timing to the operator’s manifests and wires those env vars into the operator’s manager initialization.
Changes:
- Expose leader election env vars (
PGO_CONTROLLER_LEASE_*) in deployment manifests (including generated bundle YAMLs). - Parse the duration env vars in
initManagerand set controller-runtime leader election options (LeaseDuration,RenewDeadline,RetryPeriod). - Extend
initManagerunit test to assert the new default duration values.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| deploy/operator.yaml | Adds leader election-related env vars to the main deployment manifest. |
| deploy/cw-operator.yaml | Adds the same env vars to the CW deployment manifest. |
| deploy/cw-bundle.yaml | Propagates env vars into the CW generated bundle. |
| deploy/bundle.yaml | Propagates env vars into the generated bundle. |
| config/manager/default/manager.yaml | Adds env vars to the controller manager default deployment. |
| cmd/postgres-operator/main.go | Reads duration env vars and sets controller-runtime leader election timing options. |
| cmd/postgres-operator/main_test.go | Asserts default leader election timing values and keeps zero-value checks working. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| // K8SPG-915 | ||
| getEnvDuration := func(name string, defaultDur time.Duration) (time.Duration, error) { |
There was a problem hiding this comment.
Can we try to use a library like https://github.com/kelseyhightower/envconfig?tab=readme-ov-file#struct-tag-support ? It will save us a lot of this boilerplate because it supports time.Duration type as well. Example
type LeaderElectionConfig struct {
LeaderElectionEnabled bool `default:"true" envconfig:"PGO_LEADER_ELECTION_ENABLED"`
LeaderElectionID string `default:"everest-leader-election" envconfig:"PGO_LEADER_ELECTION_ID"`
LeaseDuration time.Duration `default:"15s" envconfig:"PGO_CONRTOLLER_LEASE_DURATION"`
RenewDeadline time.Duration `default:"10s" envconfig:"PGO_CONRTOLLER_RENEW_DEADLINE"`
RetryPeriod time.Duration `default:"2s" envconfig:"PGO_CONRTOLLER_RETRY_PERIOD"`
}
func main() {
leaderElectionConfig := &LeaderElectionConfig{}
err := envconfig.Process("", leaderElectionConfig)
if err != nil {
// handle err
}
leaderElectionConfig.IntoOptions(options) // parse into controller runtime
}There was a problem hiding this comment.
I don't like an extra dependency a small feature, but I like this approach, so I implemented it without the dependency: 518e892
There was a problem hiding this comment.
I'd argue that this is a small dependency (~400 LOC) meant for exactly such use-cases - avoiding boilerplate when parsing envvars. I think we should use it rather than re-inventing it ourselves, I don't see the benefit in avoiding it
There was a problem hiding this comment.
Having a small dependency is not the worst thing. On the other hand, the boilerplate Andrii needed to add is not huge either ¯_(ツ)_/¯
Only advantage of using the dependency is the ability to pass durations as env vars I guess. @pooknull if you're not fundamentally against using it, let's use this thing
There was a problem hiding this comment.
The only "issue" is that the library seems unmaintained. The last release happened in 2019 and there is not answer in the open Github issues or PRs from anyone. We should keep that in mind.
There was a problem hiding this comment.
it's a simple library with no external dependencies. I am not sure if we need to worry about its last release date.
There was a problem hiding this comment.
I dont fully agree here. The risk is low, but it’s not zero: newer Go versions can introduce subtle reflect or type-system changes, and if something breaks we fully own the fix. Given the lack of maintainer activity, even small bugs become our responsibility, and some orgs do flag unmaintained dependencies regardless of size.
There was a problem hiding this comment.
anyway, let's move on for now and we can evaluate this again if needed
egegunes
left a comment
There was a problem hiding this comment.
LGTM but consider @mayankshah1607's suggestion
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| type envConfig struct { | ||
| LeaderElection bool `default:"true" envconfig:"PGO_CONTROLLER_LEADER_ELECTION_ENABLED"` | ||
| LeaderElectionID string `envconfig:"PGO_CONTROLLER_LEASE_NAME"` | ||
| LeaderElectionNamespace string `envconfig:"PGO_NAMESPACE"` | ||
|
|
There was a problem hiding this comment.
- Problem:
envConfiguses unexported (lowercase) struct fields, whichgithub.com/kelseyhightower/envconfigcannot populate via reflection, so defaults/env vars may not be applied. - Why it matters: This can silently disable leader election and set zero durations, potentially breaking operator behavior at runtime.
- Fix: Export the fields (e.g.,
LeaderElection,LeaseDuration, etc.) or switch to a parsing approach that does not rely on setting unexported fields.
| // K8SPG-915 | ||
| envs := new(envConfig) | ||
| if err := envconfig.Process("", envs); err != nil { | ||
| return options, errors.Wrap(err, "parse env vars") | ||
| } |
There was a problem hiding this comment.
- Problem:
envconfig.Processnow makesinitManagerfail whenPGO_WORKERSis set to a non-integer value (e.g.,3.14), whereas previously invalid values were logged and ignored. - Why it matters: This is a behavior/compatibility change that can prevent the operator from starting if a bad value is present in an existing deployment.
- Fix: If fail-fast isn’t intended, parse
PGO_WORKERSseparately (string -> int) and keep the previous log-and-ignore behavior for invalid values, while still using envconfig for the new leader-election durations.
| options, err := initManager(ctx) | ||
| assert.NilError(t, err) | ||
| if v == "3.14" { | ||
| assert.ErrorContains(t, err, "parse env vars: envconfig.Process: assigning PGO_WORKERS to Workers: converting '3.14' to type int. details: strconv.ParseInt: parsing \"3.14\": invalid syntax") |
There was a problem hiding this comment.
- Problem: The test asserts an overly specific substring for the envconfig parse error, including library-internal wording.
- Why it matters: Minor changes in
envconfig(or Go’s strconv error text) can break the test even when behavior is still correct. - Fix: Assert on a few stable substrings (e.g., "parse env vars", "PGO_WORKERS", and "invalid syntax") instead of the full error text.
| assert.ErrorContains(t, err, "parse env vars: envconfig.Process: assigning PGO_WORKERS to Workers: converting '3.14' to type int. details: strconv.ParseInt: parsing \"3.14\": invalid syntax") | |
| assert.ErrorContains(t, err, "parse env vars") | |
| assert.ErrorContains(t, err, "PGO_WORKERS") | |
| assert.ErrorContains(t, err, "invalid syntax") |
mayankshah1607
left a comment
There was a problem hiding this comment.
LGTM, thanks for making all the changes
9d46b97
| value: 60s | ||
| - name: PGO_CONTROLLER_RENEW_DEADLINE | ||
| value: 40s | ||
| - name: PGO_CONTROLLER_RETRY_PERIOD | ||
| value: 10s |
There was a problem hiding this comment.
The duration values in this YAML file are inconsistent with config/manager/default/manager.yaml. Here they are unquoted (e.g., 60s), while in config/manager/default/manager.yaml they are quoted (e.g., "60s"). While both formats are technically valid YAML, consistency across deployment files is important for maintainability. All other similar YAML files in this PR (deploy/operator.yaml, deploy/cw-bundle.yaml, deploy/bundle.yaml) use unquoted values. Consider quoting these values to match config/manager/default/manager.yaml, or vice versa, to maintain consistency across all deployment files.
| value: 60s | |
| - name: PGO_CONTROLLER_RENEW_DEADLINE | |
| value: 40s | |
| - name: PGO_CONTROLLER_RETRY_PERIOD | |
| value: 10s | |
| value: "60s" | |
| - name: PGO_CONTROLLER_RENEW_DEADLINE | |
| value: "40s" | |
| - name: PGO_CONTROLLER_RETRY_PERIOD | |
| value: "10s" |
| value: 60s | ||
| - name: PGO_CONTROLLER_RENEW_DEADLINE | ||
| value: 40s | ||
| - name: PGO_CONTROLLER_RETRY_PERIOD | ||
| value: 10s |
There was a problem hiding this comment.
The duration values in this YAML file are inconsistent with config/manager/default/manager.yaml. Here they are unquoted (e.g., 60s), while in config/manager/default/manager.yaml they are quoted (e.g., "60s"). While both formats are technically valid YAML, consistency across deployment files is important for maintainability. Consider quoting these values to match config/manager/default/manager.yaml, or vice versa, to maintain consistency across all deployment files.
| value: 60s | |
| - name: PGO_CONTROLLER_RENEW_DEADLINE | |
| value: 40s | |
| - name: PGO_CONTROLLER_RETRY_PERIOD | |
| value: 10s | |
| value: "60s" | |
| - name: PGO_CONTROLLER_RENEW_DEADLINE | |
| value: "40s" | |
| - name: PGO_CONTROLLER_RETRY_PERIOD | |
| value: "10s" |
| value: 60s | ||
| - name: PGO_CONTROLLER_RENEW_DEADLINE | ||
| value: 40s | ||
| - name: PGO_CONTROLLER_RETRY_PERIOD | ||
| value: 10s |
There was a problem hiding this comment.
The duration values in this YAML file are inconsistent with config/manager/default/manager.yaml. Here they are unquoted (e.g., 60s), while in config/manager/default/manager.yaml they are quoted (e.g., "60s"). While both formats are technically valid YAML, consistency across deployment files is important for maintainability. Consider quoting these values to match config/manager/default/manager.yaml, or vice versa, to maintain consistency across all deployment files.
| value: 60s | ||
| - name: PGO_CONTROLLER_RENEW_DEADLINE | ||
| value: 40s | ||
| - name: PGO_CONTROLLER_RETRY_PERIOD | ||
| value: 10s |
There was a problem hiding this comment.
The duration values in this YAML file are inconsistent with config/manager/default/manager.yaml. Here they are unquoted (e.g., 60s), while in config/manager/default/manager.yaml they are quoted (e.g., "60s"). While both formats are technically valid YAML, consistency across deployment files is important for maintainability. Consider quoting these values to match config/manager/default/manager.yaml, or vice versa, to maintain consistency across all deployment files.
| assert.Assert(t, options.LeaseDuration.Seconds() == 60) | ||
| assert.Assert(t, options.RenewDeadline.Seconds() == 40) | ||
| assert.Assert(t, options.RetryPeriod.Seconds() == 10) | ||
|
|
There was a problem hiding this comment.
The test directly dereferences options.LeaseDuration without first checking if it's nil. While the code currently sets this to a pointer to a non-zero duration, if envconfig.Process were to fail to parse the environment variable or set it to nil for any reason, this test would panic. Consider adding a nil check before dereferencing, similar to how options.Cache.SyncPeriod is checked on line 22-24.
| assert.Assert(t, options.LeaseDuration.Seconds() == 60) | |
| assert.Assert(t, options.RenewDeadline.Seconds() == 40) | |
| assert.Assert(t, options.RetryPeriod.Seconds() == 10) | |
| if assert.Check(t, options.LeaseDuration != nil) { | |
| assert.Assert(t, options.LeaseDuration.Seconds() == 60) | |
| } | |
| if assert.Check(t, options.RenewDeadline != nil) { | |
| assert.Assert(t, options.RenewDeadline.Seconds() == 40) | |
| } | |
| if assert.Check(t, options.RetryPeriod != nil) { | |
| assert.Assert(t, options.RetryPeriod.Seconds() == 10) | |
| } |
| t.Setenv("PGO_CONTROLLER_RETRY_PERIOD", "3s") | ||
| options, err := initManager(ctx) | ||
| assert.NilError(t, err) | ||
|
|
There was a problem hiding this comment.
The test directly dereferences options.LeaseDuration, options.RenewDeadline, and options.RetryPeriod without first checking if they're nil. While the code currently sets these to pointers to non-zero durations, if envconfig.Process were to fail to parse the environment variables or set them to nil for any reason, this test would panic. Consider adding nil checks before dereferencing, similar to how options.Cache.SyncPeriod is checked on line 22-24.
| if options.LeaseDuration == nil { | |
| t.Fatalf("LeaseDuration is nil; expected non-nil *time.Duration") | |
| } | |
| if options.RenewDeadline == nil { | |
| t.Fatalf("RenewDeadline is nil; expected non-nil *time.Duration") | |
| } | |
| if options.RetryPeriod == nil { | |
| t.Fatalf("RetryPeriod is nil; expected non-nil *time.Duration") | |
| } |
gkech
left a comment
There was a problem hiding this comment.
approving but my main concern here still stands: #1433 (comment)
commit: 49fedb1 |
Due to the high volume of requests, we're unable to provide free service for this account. To continue using the service, please upgarde to a paid plan.
https://perconadev.atlassian.net/browse/K8SPG-915
DESCRIPTION
This PR exposes the following operator options through env vars:
PGO_CONTROLLER_LEASE_NAME- determines the name of the resource that leader election will use for holding the leader lockPGO_CONTROLLER_LEASE_DURATION- duration that non-leader candidates will wait to force acquire leadership. This is measured against time of last observed ackPGO_CONTROLLER_RENEW_DEADLINE- duration that the acting controlplane will retry refreshing leadership before giving upPGO_CONTROLLER_RETRY_PERIOD- duration the LeaderElector clients should wait between tries of actionsPGO_CONTROLLER_LEADER_ELECTION_ENABLED- disable/enable leader electionCHECKLIST
Jira
Needs Doc) and QA (Needs QA)?Tests
Config/Logging/Testability