Commit dc043b7

K8SPXC-1828: Make operator aware of last recovered seqno for auto recovery
For auto recovery from a full cluster crash, the operator selects the PXC pod with the highest seqno (wsrep_last_applied) and rebootstraps the cluster from there. This logic is sound, but there's a problem: the operator immediately forgets the position (uuid:seqno) it used to recover the cluster.

Imagine the scenario:

Crash happens, pods report their positions:
pod-0 -> uuid:100
pod-1 -> uuid:97
pod-2 -> uuid:102
Operator picks pod-2 to recover.

Another crash happens, pods report their positions:
pod-0 -> uuid:91
pod-1 -> uuid:88
pod-2 -> uuid:89
Operator picks pod-0 to recover. But the position actually regressed, and recovering from this regressed position will result in data loss.

(Why wsrep_last_applied would regress is a question I don't know the answer to, but we've seen it in highly unstable environments where the operator needed to perform recovery repeatedly for prolonged periods.)

With these changes, we are making the operator aware of the last recovered position and adding a guardrail to the auto recovery logic. The operator is going to store the last recovery information in `.status.recovery`:

RecoveryStatus{
    clusterUUID       // Galera cluster UUID reported by the pod.
    lastRecoveryTime  // the time when the operator triggered the most recent recovery.
    lastRecoveryPod   // the pod the operator picked to bootstrap from (the one with the highest reported seqno).
    lastRecoverySeqNo // wsrep sequence number of the pod that was used to bootstrap.
}

This information will be used in subsequent recoveries to ensure the recovery position doesn't regress. If it does, the operator will refuse to do the recovery itself. In this case, a human needs to step in and do the recovery manually.

The anti-regression guardrail depends on the fact that wsrep_last_applied (seqno) is monotonic within the same cluster UUID. The operator always recovers the cluster with the same UUID, so the UUID stays the same for the whole lifecycle of a PXC cluster on K8s. But users might do something manually that changes the cluster UUID. In that case, they will need to update or clean up the last recovery info in the PerconaXtraDBCluster object's status.
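The guardrail described above can be sketched in a few lines of Go. This is a minimal illustration, not the operator's code: the `Position` and `LastRecovery` types, the error values, and the functions `pickCandidate` and `checkGuardrail` are all hypothetical names. Comparing UUIDs only when both sides are non-empty is likewise an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Position is what a crashed pod reports (hypothetical type for illustration).
type Position struct {
	Pod   string
	UUID  string
	SeqNo int64
}

// LastRecovery mirrors the information kept under .status.recovery
// (hypothetical type; the real struct is RecoveryStatus).
type LastRecovery struct {
	ClusterUUID string
	Pod         string
	SeqNo       int64
}

var (
	ErrNoPositions = errors.New("no positions reported")
	ErrUUIDChanged = errors.New("cluster UUID differs from last recovery")
	ErrRegression  = errors.New("seqno is lower than last recovered seqno")
)

// pickCandidate returns the position with the highest seqno.
func pickCandidate(positions []Position) (Position, error) {
	if len(positions) == 0 {
		return Position{}, ErrNoPositions
	}
	best := positions[0]
	for _, p := range positions[1:] {
		if p.SeqNo > best.SeqNo {
			best = p
		}
	}
	return best, nil
}

// checkGuardrail rejects a candidate whose position regressed relative to
// the last recorded recovery. A nil last means no recovery was recorded yet.
func checkGuardrail(last *LastRecovery, cand Position) error {
	if last == nil {
		return nil
	}
	// Assumption: UUIDs are compared only when both are known.
	if last.ClusterUUID != "" && cand.UUID != "" && last.ClusterUUID != cand.UUID {
		return ErrUUIDChanged
	}
	if cand.SeqNo < last.SeqNo {
		return ErrRegression
	}
	return nil
}

func main() {
	// The two crashes from the scenario in the commit message.
	crashes := [][]Position{
		{{"pod-0", "uuid", 100}, {"pod-1", "uuid", 97}, {"pod-2", "uuid", 102}},
		{{"pod-0", "uuid", 91}, {"pod-1", "uuid", 88}, {"pod-2", "uuid", 89}},
	}
	var last *LastRecovery
	for i, positions := range crashes {
		cand, err := pickCandidate(positions)
		if err != nil {
			fmt.Printf("crash %d: %v\n", i+1, err)
			continue
		}
		if err := checkGuardrail(last, cand); err != nil {
			fmt.Printf("crash %d: refusing auto recovery from %s (seqno %d): %v\n",
				i+1, cand.Pod, cand.SeqNo, err)
			continue
		}
		fmt.Printf("crash %d: recovering from %s (seqno %d)\n", i+1, cand.Pod, cand.SeqNo)
		last = &LastRecovery{ClusterUUID: cand.UUID, Pod: cand.Pod, SeqNo: cand.SeqNo}
	}
}
```

In this sketch the first crash recovers from pod-2 and records seqno 102; the second crash picks pod-0 at seqno 91 and is refused, which is the point where a human would need to step in.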
1 parent 98c52d2 commit dc043b7

11 files changed

Lines changed: 473 additions & 36 deletions

build/pxc-entrypoint.sh

Lines changed: 6 additions & 1 deletion
@@ -700,6 +700,7 @@ if [ "$1" = 'mysqld' ] && [ -z "$wantHelp" ]; then
 | sed 's/^[ \t]*//'
 )"
 wsrep_start_position_opt="--wsrep_start_position=$start_pos"
+uuid=$(echo "$start_pos" | awk -F':' '{print $1}' || :)
 seqno=$(echo "$start_pos" | awk -F':' '{print $NF}' || :)
 else
 # The server prints "..skipping position recovery.." if started without wsrep.
@@ -755,6 +756,9 @@ if [ "$1" = 'mysqld' ] && [ -z "$wantHelp" ]; then
 || [[ -z $is_primary_exists && -f $grastate_loc && $safe_to_bootstrap == 1 && -n ${CLUSTER_JOIN} ]]; then
 trap '{ node_recovery "$@" ; }' USR1
 touch /tmp/recovery-case
+if [[ -z ${uuid} ]]; then
+	uuid="00000000-0000-0000-0000-000000000000"
+fi
 if [[ -z ${seqno} ]]; then
 	seqno="-1"
 fi
@@ -765,12 +769,13 @@ if [ "$1" = 'mysqld' ] && [ -z "$wantHelp" ]; then
 echo "#####################################################FULL_PXC_CLUSTER_CRASH:$NODE_NAME#####################################################"
 echo 'You have the situation of a full PXC cluster crash. In order to restore your PXC cluster, please check the log'
 echo 'from all pods/nodes to find the node with the most recent data (the one with the highest sequence number (seqno).'
+echo "Cluster UUID: $uuid"
 echo "It is $NODE_NAME node with sequence number (seqno): $seqno"
 echo 'Cluster will recover automatically from the crash now.'
 echo 'If you have set spec.pxc.autoRecovery to false, run the following command to recover manually from this node:'
 echo "kubectl -n $POD_NAMESPACE exec $(hostname) -c pxc -- sh -c 'kill -s USR1 1'"
 #DO NOT CHANGE THE LINE BELOW. OUR AUTO-RECOVERY IS USING IT TO DETECT SEQNO OF CURRENT NODE. See K8SPXC-564
-echo "#####################################################LAST_LINE:$NODE_NAME:$seqno:#####################################################"
+echo "#####################################################LAST_LINE:$NODE_NAME:$uuid:$seqno:#####################################################"

 for (( ; ; )); do
 	is_primary_exists=$(get_primary)
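The marker line now carries three colon-separated fields (node, UUID, seqno) where it used to carry two (node, seqno), so whatever reads it has to accept both shapes. The Go sketch below shows one way to parse either form. It is an illustration under assumptions, not the operator's parser: `parseLastLine` is a hypothetical name, and the code assumes node names never contain a colon.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLastLine extracts node, uuid, and seqno from a LAST_LINE marker.
// For the older two-field format the returned uuid is empty.
// Hypothetical helper; assumes the node name contains no ':' characters.
func parseLastLine(line string) (node, uuid string, seqno int64, err error) {
	const marker = "LAST_LINE:"
	i := strings.Index(line, marker)
	if i < 0 {
		return "", "", 0, fmt.Errorf("marker %q not found", marker)
	}
	// Drop the trailing '#' padding and the final ':' separator.
	rest := strings.TrimRight(strings.TrimSpace(line[i+len(marker):]), "#")
	rest = strings.TrimSuffix(rest, ":")

	parts := strings.Split(rest, ":")
	var seqStr string
	switch len(parts) {
	case 2: // old format: node:seqno
		node, seqStr = parts[0], parts[1]
	case 3: // new format: node:uuid:seqno
		node, uuid, seqStr = parts[0], parts[1], parts[2]
	default:
		return "", "", 0, fmt.Errorf("unexpected field count %d", len(parts))
	}
	seqno, err = strconv.ParseInt(seqStr, 10, 64)
	if err != nil {
		return "", "", 0, fmt.Errorf("invalid seqno %q: %w", seqStr, err)
	}
	return node, uuid, seqno, nil
}

func main() {
	pad := strings.Repeat("#", 53)
	lines := []string{
		pad + "LAST_LINE:cluster1-pxc-2:1b4e28ba-2fa1-11d2-883f-0016d3cca427:102:" + pad,
		pad + "LAST_LINE:cluster1-pxc-0:-1:" + pad,
	}
	for _, l := range lines {
		node, uuid, seqno, err := parseLastLine(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("node=%s uuid=%q seqno=%d\n", node, uuid, seqno)
	}
}
```

The pod name and UUID in `main` are made-up sample values. Returning an empty UUID for the old format matches the `ClusterUUID` comment in this commit, which treats an empty value as "the log line did not include a UUID".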

config/crd/bases/pxc.percona.com_perconaxtradbclusters.yaml

Lines changed: 13 additions & 0 deletions
@@ -11033,6 +11033,19 @@ spec:
 ready:
   format: int32
   type: integer
+recovery:
+  properties:
+    clusterUUID:
+      type: string
+    lastRecoveryPod:
+      type: string
+    lastRecoverySeqNo:
+      format: int64
+      type: integer
+    lastRecoveryTime:
+      format: date-time
+      type: string
+  type: object
 size:
   format: int32
   type: integer

deploy/bundle.yaml

Lines changed: 13 additions & 0 deletions
@@ -12363,6 +12363,19 @@ spec:
 ready:
   format: int32
   type: integer
+recovery:
+  properties:
+    clusterUUID:
+      type: string
+    lastRecoveryPod:
+      type: string
+    lastRecoverySeqNo:
+      format: int64
+      type: integer
+    lastRecoveryTime:
+      format: date-time
+      type: string
+  type: object
 size:
   format: int32
   type: integer

deploy/crd.yaml

Lines changed: 13 additions & 0 deletions
@@ -12363,6 +12363,19 @@ spec:
 ready:
   format: int32
   type: integer
+recovery:
+  properties:
+    clusterUUID:
+      type: string
+    lastRecoveryPod:
+      type: string
+    lastRecoverySeqNo:
+      format: int64
+      type: integer
+    lastRecoveryTime:
+      format: date-time
+      type: string
+  type: object
 size:
   format: int32
   type: integer

deploy/cw-bundle.yaml

Lines changed: 13 additions & 0 deletions
@@ -12363,6 +12363,19 @@ spec:
 ready:
   format: int32
   type: integer
+recovery:
+  properties:
+    clusterUUID:
+      type: string
+    lastRecoveryPod:
+      type: string
+    lastRecoverySeqNo:
+      format: int64
+      type: integer
+    lastRecoveryTime:
+      format: date-time
+      type: string
+  type: object
 size:
   format: int32
   type: integer

e2e-tests/tls-issue-cert-manager/run

Lines changed: 0 additions & 2 deletions
@@ -149,8 +149,6 @@ main() {
 	kubectl_bin delete pods -l app.kubernetes.io/instance=$cluster,app.kubernetes.io/managed-by=percona-xtradb-cluster-operator --force --grace-period=0

 	desc 'wait for cluster to recover after full restart'
-	wait_for_running "$cluster-haproxy" 1
-	wait_for_running "$cluster-pxc" 3
 	wait_cluster_consistency "$cluster" 3 2

 	desc 'check ssl-internal certificate using PXC after CA rotation'

pkg/apis/pxc/v1/pxc_types.go

Lines changed: 24 additions & 0 deletions
@@ -327,6 +327,7 @@ type PerconaXtraDBClusterStatus struct {
 	Backup ComponentStatus `json:"backup,omitempty"`
 	PMM ComponentStatus `json:"pmm,omitempty"`
 	LogCollector ComponentStatus `json:"logcollector,omitempty"`
+	Recovery *RecoveryStatus `json:"recovery,omitempty"`
 	Host string `json:"host,omitempty"`
 	Messages []string `json:"message,omitempty"`
 	Status AppState `json:"state,omitempty"`
@@ -375,6 +376,29 @@ type AppStatus struct {
 	Ready int32 `json:"ready,omitempty"`
 }

+// RecoveryStatus records the outcome of the most recent full-cluster-crash
+// recovery. It is consulted on subsequent crashes to decide whether automatic
+// recovery is safe: a UUID change or seqno regression indicates the operator
+// would be bootstrapping from a node with stale or unrelated data, so manual
+// intervention is required.
+type RecoveryStatus struct {
+	// ClusterUUID is the Galera cluster UUID reported by the pod the operator
+	// recovered from. The all-zeros UUID means the pod's grastate.dat had no
+	// recoverable UUID (uninitialized or reset). An empty value means the log
+	// line did not include a UUID (PXC entrypoint <1.20.0).
+	ClusterUUID string `json:"clusterUUID,omitempty"`
+	// LastRecoveryTime is when the operator triggered the most recent
+	// full-cluster-crash recovery.
+	LastRecoveryTime metav1.Time `json:"lastRecoveryTime,omitempty"`
+	// LastRecoveryPod is the pod the operator picked to bootstrap from
+	// (the one with the highest reported seqno).
+	LastRecoveryPod string `json:"lastRecoveryPod,omitempty"`
+	// LastRecoverySeqNo is the wsrep sequence number of the pod that was
+	// used to bootstrap. A subsequent recovery with a lower seqno is refused
+	// automatically, since proceeding would discard committed transactions.
+	LastRecoverySeqNo int64 `json:"lastRecoverySeqNo,omitempty"`
+}
+
 // +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

 // PerconaXtraDBCluster is the Schema for the perconaxtradbclusters API
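The `ClusterUUID` comment distinguishes three cases: a real UUID, the all-zeros UUID, and an empty string. A small Go sketch of that distinction follows. The `UUIDState` type and `classifyClusterUUID` function are hypothetical names introduced for illustration; how the operator actually reacts to each case is not visible in the rendered part of this diff.

```go
package main

import "fmt"

// zeroUUID is the placeholder the entrypoint writes when no UUID was recovered.
const zeroUUID = "00000000-0000-0000-0000-000000000000"

// UUIDState describes what a ClusterUUID value tells us (hypothetical type).
type UUIDState int

const (
	UUIDKnown         UUIDState = iota // a real Galera cluster UUID
	UUIDUnrecoverable                  // all zeros: grastate.dat had no recoverable UUID
	UUIDNotReported                    // empty: log line carried no UUID (older entrypoint)
)

func (s UUIDState) String() string {
	switch s {
	case UUIDKnown:
		return "known"
	case UUIDUnrecoverable:
		return "unrecoverable"
	case UUIDNotReported:
		return "not reported"
	default:
		return "invalid"
	}
}

// classifyClusterUUID maps a ClusterUUID string to one of the three states
// described in the RecoveryStatus field comment.
func classifyClusterUUID(uuid string) UUIDState {
	switch uuid {
	case "":
		return UUIDNotReported
	case zeroUUID:
		return UUIDUnrecoverable
	default:
		return UUIDKnown
	}
}

func main() {
	// The third value is a made-up sample UUID.
	for _, u := range []string{"", zeroUUID, "1b4e28ba-2fa1-11d2-883f-0016d3cca427"} {
		fmt.Printf("%q -> %s\n", u, classifyClusterUUID(u))
	}
}
```

Keeping the three cases explicit makes it harder to treat the all-zeros placeholder as if it were a genuine cluster UUID when two recoveries are compared.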

pkg/apis/pxc/v1/zz_generated.deepcopy.go

Lines changed: 21 additions & 0 deletions
