Commit bc24466 (1 parent: a94d125)

PXC-5113 [DOCS] - add troubleshooting steps for a stalled SST 8.4
modified: docs/restarting-nodes.md

14 files changed: 273 additions & 38 deletions

Binary image files under docs/_static/ (diff not rendered):

* crash-scenario-1.jpeg (185 KB)
* scenario-1.png (4.4 MB)
* scenario-2.png (5.27 MB)
* scenario-3.png (5.97 MB)
* scenario-4.png (5.48 MB)
* scenario-5.png (5.52 MB)
* scenario-6.png (5.49 MB)
* scenario-7.png (6.95 MB)

docs/crash-recovery.md

Lines changed: 60 additions & 24 deletions
@@ -6,6 +6,8 @@ However, there are scenarios where the database service can stop with no node be
 
 ## Scenario 1: Node A is gracefully stopped
 
+![Scenario 1: Node A gracefully stopped in a three-node cluster](_static/scenario-1.png)
+
 In a three-node cluster (node A, node B, node C), one node (node A, for example) is gracefully stopped: for the purpose of maintenance, a configuration change, etc.
 
 In this case, the other nodes receive a “good bye” message from the stopped node and the cluster size is reduced; some properties like quorum calculation or auto increment are automatically changed. As soon as node A is started again, it joins the cluster based on its [`wsrep_cluster_address`](wsrep-system-index.md#wsrep_cluster_address) variable in `my.cnf`.
@@ -15,6 +17,8 @@ If the writeset cache ([`gcache.size`](wsrep-provider-index.md#gcachesize)) on n
 
 ## Scenario 2: Two nodes are gracefully stopped
 
+![Scenario 2: Two nodes gracefully stopped; one node remains in the cluster](_static/scenario-2.png)
+
 Similar to [Scenario 1: Node A is gracefully stopped](#scenario-1-node-a-is-gracefully-stopped), the cluster size is reduced to one; even the single remaining node C forms the primary component and is able to serve client requests. To get the nodes back into the cluster, you just need to start them.
 
 However, when a new node joins the cluster, node C is switched to the “Donor/Desynced” state, as it has to provide the state transfer at least to the first joining node. It is still possible to read from and write to it during that process, but it may be much slower, depending on how much data must be sent during the state transfer. Also, some load balancers may consider the donor node not operational and remove it from the pool. So, it is best to avoid the situation when only one node is up.
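The donor caveat above can be turned into a simple routing check. A minimal sketch, assuming POSIX shell; the `sample` variable is a hypothetical stand-in for the output of `SHOW STATUS LIKE 'wsrep_local_state_comment'` from a live node:

```shell
# Sketch: classify a node by its Galera state before routing traffic to it.
# The sample line stands in for live output such as:
#   mysql -N -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"
sample="wsrep_local_state_comment Donor/Desynced"

state=$(printf '%s\n' "$sample" | awk '{print $2}')
case "$state" in
  Synced)         verdict="serve traffic normally" ;;
  Donor/Desynced) verdict="keep in pool but expect slower responses" ;;
  *)              verdict="do not route traffic here" ;;
esac
echo "$state: $verdict"
```

A load balancer health check built on this logic would decide, per the text above, whether a donor node stays in the pool.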
@@ -31,6 +35,8 @@ systemctl start mysql
 
 ## Scenario 3: All three nodes are gracefully stopped
 
+![Scenario 3: All three cluster nodes gracefully stopped](_static/scenario-3.png)
+
 The cluster is completely stopped and the problem is to initialize it again. It is important to know that a PXC node writes its last executed position to the `grastate.dat` file.
 
 By comparing the seqno numbers in this file, you can see which node is the most advanced one (most likely the last stopped). The cluster must be bootstrapped using this node; otherwise, nodes that had a more advanced position will have to perform a full [SST](glossary.md#sst) to join the cluster initialized from the less advanced one, and some transactions will be lost. To bootstrap the first node, invoke the startup script like this:
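The seqno comparison can be scripted. A minimal sketch, assuming POSIX shell; the here-document is a hypothetical stand-in for the real `/var/lib/mysql/grastate.dat`:

```shell
# Sketch: extract the seqno from grastate.dat so that positions reported by
# several nodes can be compared. Run the equivalent on each node.
cat > grastate.sample <<'EOF'
# GALERA saved state
version: 2.1
uuid:    220dcdcb-1629-11e4-add3-aec059ad3734
seqno:   1122
safe_to_bootstrap: 0
EOF

seqno=$(awk '/^seqno:/ {print $2}' grastate.sample)
echo "seqno=$seqno"
# Bootstrap from the node that reports the highest seqno.
```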
@@ -47,16 +53,22 @@ systemctl start mysql@bootstrap.service
 
 ## Scenario 4: One node disappears from the cluster
 
-This is the case when one node becomes unavailable due to power outage, hardware failure, kernel panic, mysqld crash, **kill -9** on mysqld pid, etc.
+![Scenario 4: One node disappears from the cluster](_static/scenario-4.png)
+
+This is the case when one node becomes unavailable due to a power outage, hardware failure, kernel panic, mysqld crash, `kill -9` on the mysqld pid, etc.
 
 Two remaining nodes notice the connection to node A is down and start trying to re-connect to it. After several timeouts, node A is removed from the cluster. The quorum is saved (two out of three nodes are up), so no service disruption happens. After it is restarted, node A joins automatically (as described in [Scenario 1: Node A is gracefully stopped](#scenario-1-node-a-is-gracefully-stopped)).
 
 ## Scenario 5: Two nodes disappear from the cluster
 
-Two nodes are not available and the remaining node (node C) is not able to form the quorum alone. The cluster has to switch to a non-primary mode, where MySQL refuses to serve any SQL queries. In this state, the **mysqld** process on node C is still running and can be connected to but any statement related to data fails with an error.
+![Scenario 5: Two nodes disappear from the cluster](_static/scenario-5.png)
+
+Two nodes are not available and the remaining node (node C) is not able to form the quorum alone. The cluster has to switch to a non-primary mode. While node C is still deciding whether it can reach the other nodes, reads may still work and new writes are usually refused. Once node C gives up and the component is non-primary, `wsrep_ready` is `OFF` and normal client queries, including trivial selects, fail.
+
+For example:
 
 ```sql
-SELECT * FROM test.sbtest1;
+SELECT 1 FROM DUAL;
 ```
 
 ??? example "The error message"
@@ -65,7 +77,7 @@ SELECT * FROM test.sbtest1;
     ERROR 1047 (08S01): WSREP has not yet prepared node for application use
     ```
 
-Reads are possible until node C decides that it cannot access node A and node B. New writes are forbidden.
+The SQLSTATE is `08S01`; some builds or code paths may show `ERROR 1047 (08S01): Unknown Command` instead of the longer WSREP text. Both indicate the same class of failure: the node is not prepared for application use.
 
 As soon as the other nodes become available, the cluster is formed again automatically. If node B and node C were just network-severed from node A, but they can still reach each other, they will keep functioning as they still form the quorum.
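The non-primary condition can also be detected from status variables before any application query fails. A minimal sketch, assuming POSIX shell; the `sample` lines are hypothetical stand-ins for live `SHOW STATUS` output:

```shell
# Sketch: detect the non-primary condition from wsrep status variables.
# The sample lines stand in for output such as:
#   mysql -N -e "SHOW STATUS WHERE Variable_name IN ('wsrep_ready','wsrep_cluster_status')"
sample="wsrep_cluster_status non-Primary
wsrep_ready OFF"

cluster_status=$(printf '%s\n' "$sample" | awk '$1=="wsrep_cluster_status" {print $2}')
ready=$(printf '%s\n' "$sample" | awk '$1=="wsrep_ready" {print $2}')

if [ "$cluster_status" != "Primary" ] || [ "$ready" != "ON" ]; then
  echo "node is not part of a primary component; client queries will fail"
fi
```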

@@ -83,6 +95,8 @@ This approach only works if the other nodes are down before doing that! Otherwis
 
 ## Scenario 6: All nodes went down without a proper shutdown procedure
 
+![Scenario 6: All nodes went down without a proper shutdown](_static/scenario-6.png)
+
 This scenario is possible in the following cases:
 
 * Data center power failure
@@ -109,37 +123,58 @@ cat /var/lib/mysql/grastate.dat
 safe_to_bootstrap: 0
 ```
 
-In this case, you cannot be sure that all nodes are consistent with each other. We cannot use `safe_to_bootstrap` variable to determine the node that has the last transaction committed as this variable is set to **0** for each node.
+In this case, you cannot be sure that all nodes are consistent with each other. The `safe_to_bootstrap` variable is set to 0 on every node and cannot be used to identify which node has the last transaction committed.
 
-An attempt to bootstrap from such a node will fail unless you start `mysqld` with the `--wsrep-recover` option:
+!!! warning "Risk of split-brain"
+
+    Setting `safe_to_bootstrap: 1` on a node without first confirming that this node has the highest recovered position can cause split-brain and data loss. Always run the validation step below on every node and bootstrap only from the node with the highest seqno.
+
+### Validation step: recover and record the position on every node
+
+On each node that was part of the cluster, run `mysqld` with the `--wsrep-recover` option so that the server prints the recovered position and exits (the server does not stay running):
 
 ```shell
 mysqld --wsrep-recover
 ```
 
-Search the output for the line that reports the recovered position after the node UUID (**1122** in this case):
+In the output, find the line that reports the recovered position in the form `UUID:seqno`:
 
-??? example "Expected output"
+??? example "Example output"
 
     ```{.text .no-copy}
     ...
     ... [Note] WSREP: Recovered position: 220dcdcb-1629-11e4-add3-aec059ad3734:1122
    ...
     ```
 
-The node where the recovered position is marked by the greatest number is the best bootstrap candidate. In its `grastate.dat` file, set the safe_to_bootstrap variable to **1**. Then, bootstrap from this node:
+Run the command on every node and record the UUID and the seqno from each. Use a table like the following so that you can compare the positions and choose the correct bootstrap candidate:
+
+| Node (hostname or label) | UUID | seqno |
+|--------------------------|------|-------|
+| node1                    |      |       |
+| node2                    |      |       |
+| node3                    |      |       |
+
+!!! warning "When the highest seqno is not safe to use"
+
+    The procedure below assumes that you have access to every node that was in the cluster and that the recovered positions are trustworthy. If either assumption is false, bootstrapping from the node with the highest seqno can permanently destroy data.
+
+    * Access to all nodes: if a node is unreachable (for example, in another datacenter or still down), you cannot assume that the highest seqno you see is the true cluster state. The missing node may have had a higher seqno. Bootstrap only after you have run `mysqld --wsrep-recover` on every member and recorded the result.
+
+    * Trustworthiness of the "highest" node: a node can report a higher seqno but have corrupt or incomplete data, for example after a partition (it was in a minority and applied writes that were never committed cluster-wide), after a write-ahead or disk failure (it reported a seqno that was not fully persisted), or after an unclean shutdown. Bootstrapping from that node forces the rest of the cluster to sync to that state, and the cluster then permanently drops or overwrites the transactions that existed only on the other nodes. If you suspect that the "highest" node was partitioned, had storage or write-ahead issues, or you cannot verify its history, do not bootstrap from it without expert guidance or a verified backup strategy. Prefer [Get help from Percona](get-help.md) or your support channel when in doubt.
+
+If you have verified all nodes and trust the node with the greatest seqno, that node is the intended bootstrap candidate. If two nodes show the same UUID and seqno, either can be used.
+
+### Bootstrap step: set safe_to_bootstrap and start the first node
+
+Only on the node that has the highest seqno from the validation step (and only after the caveats above are satisfied), set `safe_to_bootstrap` to 1 in that node’s `grastate.dat` file, then bootstrap from that node:
 
 ```shell
+# On the chosen node only: edit grastate.dat and set safe_to_bootstrap: 1, then:
 systemctl start mysql@bootstrap.service
 ```
 
-After a shutdown, you can bootstrap from the node which is marked as safe in the `grastate.dat` file.
-
-```{.text .no-copy}
-...
-safe_to_bootstrap: 1
-...
-```
+After a clean shutdown in the future, you can bootstrap from the node that is marked as safe in the `grastate.dat` file (where `safe_to_bootstrap: 1`).
 
 In recent Galera versions, the option [`pc.recovery`](wsrep-provider-index.md#pcrecovery) (enabled by default) saves the cluster state into a file named `gvwstate.dat` on each member node. As the name of this option suggests (pc – primary component), it saves only a cluster being in the PRIMARY state. An example content of the file may look like this:
@@ -168,23 +203,24 @@ The log file shows recovery completion:
 
 ## Scenario 7: The cluster loses its primary state due to split brain
 
-For the purpose of this example, let’s assume we have a cluster that consists of an even number of nodes: six, for example. Three of them are in one location while the other three are in another location and they lose network connectivity. It is best practice to avoid such topology: if you cannot have an odd number of real nodes, you can use an additional arbitrator (garbd) node or set a higher pc.weight to some nodes. But when the split brain happens any way, none of the separated groups can maintain the quorum: all nodes must stop serving requests and both parts of the cluster will be continuously trying to re-connect.
+![Scenario 7: Split brain; cluster loses primary state](_static/scenario-7.png)
 
-If you want to restore the service even before the network link is restored, you can make one of the groups primary again using the same command as described in [Scenario 5: Two nodes disappear from the cluster](#scenario-5-two-nodes-disappear-from-the-cluster)
+We have a six-node cluster. Three of the nodes are in one location while the other three are in another location, and the two locations lose network connectivity.
+
+Best practice is to avoid this topology. If you cannot run an odd number of data nodes, add an arbitrator (`garbd`) or increase `pc.weight` on selected nodes so that one side can keep quorum. With an even number of equally weighted nodes, when the split brain happens, neither location can maintain a quorum: both groups must stop serving requests and keep trying to reconnect.
+
+To restore the service before the network link is restored, you can make one of the groups primary again using the same command as described in [Scenario 5: Two nodes disappear from the cluster](#scenario-5-two-nodes-disappear-from-the-cluster):
 
 ```sql
 SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
 ```
 
-After this, you are able to work on the manually restored part of the cluster, and the other half should be able to automatically re-join using [IST](glossary.md#ist) as soon as the network link is restored.
+After this command, you can work on the manually restored part of the cluster, and the other half should be able to automatically re-join using [IST](glossary.md#ist) when the network link is restored.
 
 !!! warning
 
-    If you set the bootstrap option on both the separated parts, you will end up with two living cluster instances, with data likely diverging away from each other. Restoring a network link in this case will not make them re-join until the nodes are restarted and members specified in configuration file are connected again.
-
-    Then, as the Galera replication model truly cares about data consistency: once the inconsistency is detected, nodes that cannot execute row change statement due to a data difference – an emergency shutdown will be performed and the only way to bring the nodes back to the cluster is via the full [SST](glossary.md#sst)
+    If you set the bootstrap option on both of the separated parts, you will have two independent clusters and diverging data. Restoring the network link does not merge them automatically; you must restart nodes and ensure `wsrep_cluster_address` points at a single primary component before the cluster can reunite.
 
-**Based on material from Percona Database Performance Blog**
+Galera enforces consistency: when nodes detect conflicting row data, affected nodes may perform an emergency shutdown. Bringing them back into a single cluster usually requires a full [SST](glossary.md#sst) (or another supported state transfer) so that every node shares the same dataset again.
 
-This article is based on the blog post [Galera replication - how to recover a PXC cluster by *Przemysław Malkowski* :octicons-link-external-16:]: https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/

docs/emergency-quorum-recovery.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# Emergency quorum recovery (when nodes are up but traffic is blocked)
+
+Nodes are running and accept connections, but MySQL refuses writes (and often reads) with errors such as `WSREP has not yet prepared node for application use`. The cluster has lost quorum or the *primary component*.
+
+Why this happens: each node has one vote, the cluster needs a majority to form a *primary component*, and only a primary component accepts SQL. If a node leaves cleanly, the expected vote count decreases and the remaining nodes can re-form a primary. If a node crashes instead (power loss, kill, kernel panic), the others still expect its vote until they time out; while they wait, or whenever the surviving nodes hold half of the expected votes or fewer (for example, the single survivor of a three-node cluster after two crashes, or two remaining nodes separated by a network flicker in a 50/50 split), they drop to non-primary. The cluster is then "online" but refuses every query.
+
+Recovery options:
+
+1. Restore connectivity so the remaining nodes can see each other and re-form a primary.
+2. Bring the missing node(s) back so the cluster can form quorum again.
+3. Emergency override (when you have confirmed the other nodes are really down): force one node to form a new primary so it can serve traffic; then start the other nodes so they rejoin.
+
+## Emergency override: force a primary when traffic is blocked
+
+Run the following on one node that is still up (connected as a user with sufficient privileges):
+
+```sql
+SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
+```
+
+The [`pc.bootstrap`](wsrep-provider-index.md#pcbootstrap) option makes that node form a new primary component. After the command runs, the node accepts writes; the cluster is effectively that one node until the others are started and rejoin (via IST or SST). Then start the other nodes so they join this primary.
+
+!!! warning "Only when the other nodes are down"
+
+    Run this override only when you have confirmed that the other nodes are actually down or unreachable. If another node is still primary elsewhere (for example, in another datacenter after a split), setting `pc.bootstrap=YES` on a second node creates two separate clusters with diverging data (split-brain). See [Scenario 5: Two nodes disappear from the cluster](crash-recovery.md#scenario-5-two-nodes-disappear-from-the-cluster) and [Scenario 7: Split brain](crash-recovery.md#scenario-7-the-cluster-loses-its-primary-state-due-to-split-brain) in Crash recovery.
+
+For more support options, see [Get help from Percona](get-help.md).
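The vote arithmetic behind the primary component can be illustrated directly. A minimal sketch, assuming POSIX shell and one vote per node (the node counts are hypothetical):

```shell
# Sketch: primary-component arithmetic. A partition stays primary only while
# it holds a strict majority of the expected votes.
expected_votes=3

can_form_primary() {
  alive=$1
  # strict majority: alive > expected_votes / 2, done in integers
  [ $(( alive * 2 )) -gt "$expected_votes" ] && echo yes || echo no
}

echo "2 of 3 alive: primary=$(can_form_primary 2)"
echo "1 of 3 alive: primary=$(can_form_primary 1)"
```

With `expected_votes=3`, two survivors keep a majority and one does not, which matches the three-node behavior described above; a 50/50 split of an even-sized cluster fails the strict-majority test on both sides.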
