docs: recreate without leader election (cherry-pick argoproj#16034 for 4.0) (argoproj#16046)

argo-cd-cherry-pick-bot[bot] · Joibel · web-flow · commit 9be7e269736e · 2026-04-27T11:36:14.000+01:00
Signed-off-by: Alan Clucas <alan@clucas.org>
Co-authored-by: Alan Clucas <alan@clucas.org>
diff --git a/.spelling b/.spelling
@@ -231,6 +231,8 @@ rc2
 repo
 retryStrategy
 roadmap
+rollout
+rollouts
 runtime
 runtimes
 s3
diff --git a/docs/environment-variables.md b/docs/environment-variables.md
@@ -36,7 +36,7 @@ This document outlines environment variables that can be used to customize behav
 | `HEALTHZ_AGE`                            | `time.Duration`     | `5m`                                                                                        | How old a un-reconciled workflow is to report unhealthy.                                                                                                                                                                                                                 |
 | `INDEX_WORKFLOW_SEMAPHORE_KEYS`          | `bool`              | `true`                                                                                      | Whether or not to index semaphores.                                                                                                                                                                                                                                      |
 | `LEADER_ELECTION_IDENTITY`               | `string`            | Controller's `metadata.name`                                                                | The ID used for workflow controllers to elect a leader.                                                                                                                                                                                                                  |
-| `LEADER_ELECTION_DISABLE`                | `bool`              | `false`                                                                                     | Whether leader election should be disabled.                                                                                                                                                                                                                              |
+| `LEADER_ELECTION_DISABLE`                | `bool`              | `false`                                                                                     | Whether leader election should be disabled. When set to `true`, also set the Deployment's rollout strategy to `Recreate` to prevent two controllers running concurrently during rollouts — see [High Availability](high-availability.md#deployment-rollout-strategy).    |
 | `LEADER_ELECTION_LEASE_DURATION`         | `time.Duration`     | `15s`                                                                                       | The duration that non-leader candidates will wait to force acquire leadership.                                                                                                                                                                                           |
 | `LEADER_ELECTION_RENEW_DEADLINE`         | `time.Duration`     | `10s`                                                                                       | The duration that the acting master will retry refreshing leadership before giving up.                                                                                                                                                                                   |
 | `LEADER_ELECTION_RETRY_PERIOD`           | `time.Duration`     | `5s`                                                                                        | The duration that the leader election clients should wait between tries of actions.                                                                                                                                                                                      |
diff --git a/docs/high-availability.md b/docs/high-availability.md
@@ -19,6 +19,23 @@ By disabling the leader election process, you can avoid unnecessary communicatio
 
 By using the `PriorityClass`, you can ensure that the Workflow Controller Pod is scheduled before other Pods in the cluster.
 
+### Deployment rollout strategy
+
+When leader election is disabled, the Deployment's rollout strategy must not surge a second Pod.
+The default `RollingUpdate` strategy with `maxSurge: 25%` rounds up to `maxSurge: 1` for a single-replica Deployment, so on every rollout (image bump, ConfigMap change, resource edit) the new Pod becomes Ready before the old Pod is terminated.
+Without a leader lease, both Pods reconcile the same Workflows during that window, which can duplicate Pod creations, clobber Workflow status updates, and cause their informer caches to diverge.
+
+Set the Deployment's `spec.strategy` to `Recreate` so the old Pod is terminated before the new Pod starts:
+
+```yaml
+spec:
+  strategy:
+    type: Recreate
+```
+
+This produces a few seconds of controller downtime during each rollout.
+Running Workflows keep executing; reconciliation resumes when the new Pod is Ready.
+
 ### Multiple Workflow Controller Replicas
 
 It is possible to run multiple replicas of the Workflow Controller to provide high-availability.
@@ -28,6 +45,8 @@ Only one replica of the Workflow Controller will actively manage Workflows at an
 The other replicas will be on standby, ready to take over if the active replica fails.
 This means that you are guaranteeing resource allocations for replicas that are not actively contributing to the running of Workflows.
 
+With leader election enabled, the default `RollingUpdate` Deployment strategy is safe: only the replica holding the lease reconciles Workflows, so a surging replica simply waits to acquire the lease when the previous leader steps down.
+
 The leader election process requires frequent communication with the Kubernetes API.
 When running Workflows at scale, the Kubernetes API may become unresponsive, causing the leader election to take longer than 10 seconds (`LEADER_ELECTION_RENEW_DEADLINE`) to respond, which will disrupt the controller.
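
Note on the `Deployment rollout strategy` hunk above: the documented `spec.strategy` fragment and the `LEADER_ELECTION_DISABLE` variable are meant to be set together, so a combined manifest fragment may make the pairing easier to see. The following is a minimal sketch, not part of this commit. The Deployment name `workflow-controller`, the namespace `argo`, and the container name `workflow-controller` are assumptions based on a typical install and should be adjusted to match the actual manifests.

```yaml
# Sketch only: names and namespace below are assumptions, not taken from this commit.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workflow-controller   # assumed Deployment name
  namespace: argo             # assumed namespace
spec:
  replicas: 1                 # single controller, since no leader lease is held
  strategy:
    type: Recreate            # old Pod is terminated before the new Pod starts
  template:
    spec:
      containers:
        - name: workflow-controller   # assumed container name
          env:
            - name: LEADER_ELECTION_DISABLE
              value: "true"   # env values must be strings, hence the quotes
```

If the Deployment currently sets `spec.strategy.rollingUpdate` explicitly, that field likely needs to be removed in the same change, because the Kubernetes API rejects `rollingUpdate` settings when `type` is `Recreate`.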