@@ -103,6 +103,49 @@ expected outage.
 Enabling a new configuration option to delay failover provides a mechanism to
 prevent premature failover for short-lived network or node instability.

+## Detection of node-level failures
+
+When the node hosting the primary becomes unreachable (for example, due to a
+kubelet crash or a network partition between the node and the Kubernetes API
+server), the operator relies on the pod's `Ready` condition to decide that the
+primary is no longer serviceable. While the node is healthy, the kubelet keeps
+that condition up to date from the readiness probe; once the node stops
+reporting, the Kubernetes node lifecycle controller flips the condition to
+`False` as soon as it declares the node `Unknown`.
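
The check described above can be illustrated with a minimal sketch that reads the `Ready` condition from a pod object in its JSON form (`status.conditions[]` entries with `type` and `status` fields). The function name `pod_is_ready` and the sample pods are hypothetical; this is not the operator's actual code, only an illustration of what "the condition flips to `False`" means for an observer.

```python
def pod_is_ready(pod: dict) -> bool:
    """Return True only if the pod reports a Ready condition with status "True".

    `pod` is assumed to be a Pod object decoded from JSON. A missing
    condition is treated as not ready.
    """
    conditions = pod.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


# Hypothetical sample: the primary while its node is healthy.
healthy_primary = {
    "status": {"conditions": [{"type": "Ready", "status": "True"}]}
}

# Hypothetical sample: the same pod after the node lifecycle controller
# has flipped the condition because the node stopped reporting.
unreachable_primary = {
    "status": {"conditions": [{"type": "Ready", "status": "False"}]}
}

print(pod_is_ready(healthy_primary))
print(pod_is_ready(unreachable_primary))
```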
+
+With stock kube-controller-manager settings, the transition is governed by
+`--node-monitor-grace-period` (default `40s` on Kubernetes 1.29-1.31, raised
+to `50s` in 1.32 and later): after that window the controller marks the node
+`Unknown` and, in the same monitoring pass, issues a patch per pod on that
+node to flip the `Ready` condition. In practice the operator observes the
+primary as unready about **40 to 55 seconds** after the node becomes
+unreachable (the grace period plus up to one `--node-monitor-period` poll,
+default `5s`; that is, 40 to 45 seconds with a `40s` grace period and 50 to
+55 seconds with a `50s` one). Managed Kubernetes distributions (GKE, EKS,
+AKS) may tune these values; consult the provider's documentation if the
+observed timing does not match. After that, the failover procedure starts
+(further gated by `.spec.failoverDelay`).
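
The arithmetic behind that window can be sketched as follows. The names `NodeMonitorSettings` and `detection_window` are hypothetical, and treating `.spec.failoverDelay` as a simple additive delay on top of the detection time is an assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeMonitorSettings:
    """kube-controller-manager timing flags, expressed in seconds."""

    grace_period: float           # --node-monitor-grace-period
    monitor_period: float = 5.0   # --node-monitor-period (default 5s)


def detection_window(settings: NodeMonitorSettings,
                     failover_delay: float = 0.0) -> tuple[float, float]:
    """Return (earliest, latest) seconds until failover can start.

    Earliest: the grace period expires exactly on a monitoring pass.
    Latest: the grace period expires just after a pass, so one more
    full poll interval elapses before the controller acts.
    Assumption: failover_delay (.spec.failoverDelay) is simply added.
    """
    earliest = settings.grace_period + failover_delay
    latest = earliest + settings.monitor_period
    return earliest, latest


K8S_1_29_TO_1_31 = NodeMonitorSettings(grace_period=40.0)
K8S_1_32_PLUS = NodeMonitorSettings(grace_period=50.0)

print(detection_window(K8S_1_29_TO_1_31))  # (40.0, 45.0)
print(detection_window(K8S_1_32_PLUS))     # (50.0, 55.0)
```

On a managed distribution with tuned flags, substituting the provider's values into `NodeMonitorSettings` gives the corresponding window.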
+
+The `Ready` condition flip is not subject to the rate limiters that throttle
+pod *eviction* during partial-zonal or large-cluster disruptions
+(`--node-eviction-rate`, `--secondary-node-eviction-rate`,
+`--unhealthy-zone-threshold`). The operator reacts to the condition flip as
+soon as the controller emits the patch, regardless of the zone or cluster-wide
+health state.
+
+Pod *eviction* (actual deletion from the unreachable node) is a separate
+mechanism, driven by `tolerationSeconds` on the
+`node.kubernetes.io/unreachable` `NoExecute` taint (`300s` by default). That
+timer does not hold up the operator's failover decision; CloudNativePG
+promotes a new primary as soon as the `Ready` condition flips. By that point
+the kubelet on the isolated node has already stopped the old PostgreSQL
+container locally: with the default
+`.spec.probes.liveness.isolationCheck.enabled: true`, the instance manager
+fails its own liveness probe once it can reach neither the API server nor
+the rest of the cluster, and the kubelet kills the container within
+approximately three probe periods (`~30s`). Full high availability
+(recreation of the old primary on a healthy node by the operator) is still
+gated on the taint-based eviction actually deleting the pod.
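
The ordering of these three events can be laid out as a rough timeline. This is a sketch under stated assumptions: the function name `isolation_timeline` is hypothetical, the defaults use the `40s` grace period, the up-to-`5s` polling jitter is ignored, and the `tolerationSeconds` timer is assumed to start when the node is marked `Unknown` and the taint is applied.

```python
def isolation_timeline(grace_period: float = 40.0,
                       liveness_kill: float = 30.0,
                       toleration_seconds: float = 300.0) -> list[tuple[float, str]]:
    """Return (seconds after isolation, event) pairs in chronological order.

    Assumptions: polling jitter is ignored, and the eviction timer
    starts at the moment the node is marked Unknown.
    """
    events = [
        (liveness_kill,
         "kubelet kills the isolated PostgreSQL container"),
        (grace_period,
         "Ready condition flips; failover may start"),
        (grace_period + toleration_seconds,
         "pod evicted; old primary can be recreated elsewhere"),
    ]
    return sorted(events)


for seconds, event in isolation_timeline():
    print(f"t+{seconds:.0f}s: {event}")
```

With these defaults the old container is stopped before the promotion, and the promotion happens several minutes before the eviction completes, which matches the ordering described above.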
+
 ## Failover Quorum (Quorum-based Failover)

 Failover quorum is a mechanism that enhances data durability and safety during