
To restart a cluster node, shut down MySQL and restart the service. The node leaves the cluster, reducing the total vote count for quorum.

The quorum refers to the minimum number of votes required for the cluster to operate effectively and make decisions. Each node in the cluster typically represents one vote. When a node leaves the cluster, the total number of votes decreases, affecting the cluster's ability to achieve quorum. If the cluster does not maintain quorum, the cluster may become unable to process transactions or make changes, potentially leading to a split-brain scenario where different parts of the cluster operate independently.
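
As a sketch of the majority rule (assuming the default of one vote per node; Galera can weight votes with the `pc.weight` provider option), a partition keeps quorum only while it holds a strict majority of the known votes:

```shell
# Illustrative only: with one vote per node, a partition stays primary
# only while its live votes form a strict majority of the total vote count.
has_quorum() {
    alive=$1
    total=$2
    if [ $((alive * 2)) -gt "$total" ]; then
        echo "primary"
    else
        echo "non-primary"
    fi
}

has_quorum 2 3   # two of three nodes reachable
has_quorum 1 3   # one of three nodes reachable
```

With three nodes, losing one node still leaves a majority (2 of 3), but an even split (2 of 4) does not, which is why odd cluster sizes are recommended.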

Upon rejoining, the node synchronizes using IST (Incremental State Transfer). IST allows the node to catch up with the current state of the cluster by transferring only the changes that occurred while the node was offline. If the necessary changes for IST do not exist in the `gcache` file on any other node within the cluster, the process will perform SST (State Snapshot Transfer) instead. SST involves transferring a complete database snapshot to the node, which can be more time-consuming but ensures that the node receives all data. This behavior makes restarting cluster nodes for rolling configuration changes or software upgrades straightforward from the cluster’s perspective.

If a node restarts with an invalid configuration change that prevents MySQL from loading, Galera drops the node’s state and forces an SST for that node.

In the event of a MySQL failure, the system does not remove the PID file, because it deletes the PID file only during a clean shutdown. As a result, the server does not restart if an existing PID file is present. When MySQL encounters a failure, check the log records for details. You must remove the PID file manually.

Use the `rm` command in a Unix/Linux shell to remove the PID file:

```shell
rm /path/to/mysql.pid
```

Replace `/path/to/mysql.pid` with the actual path to your MySQL PID file. The default location is often `/var/run/mysqld/mysqld.pid` or `/var/lib/mysql/mysql.pid`, but the location can vary based on your configuration. Before executing the command, ensure that MySQL is not running, as removing the PID file while the server is active can lead to issues.
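
As a safeguard, you can wrap the removal in a check that refuses to delete the PID file while a `mysqld` process is still running (a sketch; the helper name is illustrative):

```shell
# Illustrative helper: delete a PID file only when no mysqld process exists.
remove_stale_pid() {
    if pgrep -x mysqld > /dev/null; then
        echo "mysqld is still running; not removing $1" >&2
        return 1
    fi
    rm -f "$1"
}

remove_stale_pid /var/run/mysqld/mysqld.pid
```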

## Troubleshooting: node refuses to join after restart

When a node restarts and refuses to join, the usual cause is that the [State Snapshot Transfer](state-snapshot-transfer.md) (SST)—the process where the donor node sends data to the joiner—is failing silently or being blocked by security features in the environment. Check firewall rules, [SELinux](selinux.md), and [AppArmor](apparmor.md) to ensure that the SST port (default 4444) and related processes are allowed between nodes.
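
To rule out the network before touching security profiles, you can probe the SST port from the joiner (a sketch using bash's `/dev/tcp` redirection; `donor-node` is a placeholder, and 4444 is the default `xtrabackup-v2` SST port):

```shell
# Illustrative probe: reports whether a TCP port on the donor is reachable.
check_sst_port() {
    host=$1
    port=${2:-4444}
    if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "open"
    else
        echo "blocked"
    fi
}

check_sst_port donor-node 4444   # replace donor-node with the donor's address
```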

### Troubleshooting cheat sheet: grep the error log

Use these patterns on the server error log to quickly find the cause instead of scrolling. Replace `/var/log/mysql/error.log` with your log path if it differs (for example, `/var/log/mysqld.log` on RHEL, or `log_error` in your config).

| Pattern | What it points to |
|---------|-------------------|
| `grep "conflict" /var/log/mysql/error.log` | Certification or replication conflicts; possible split-brain or divergent data. |
| `grep "SST script failed" /var/log/mysql/error.log` | State Snapshot Transfer failed (script, permissions, or donor/joiner communication). |
| `grep -i "evicted" /var/log/mysql/error.log` | Node was evicted from the cluster (for example, a timeout or cluster decision). |

### Security context (AppArmor, SELinux)

Percona XtraDB Cluster runs inside an OS-level security context: AppArmor on Debian and Ubuntu, SELinux on RHEL and derivatives. If that context blocks the [SST](state-snapshot-transfer.md) method or the paths the SST script uses, the node cannot join and the failure is often silent. On Ubuntu 22.04, a node that refuses to join after restart is most often caused by AppArmor blocking the [`wsrep_sst_method`](wsrep-system-index.md#wsrep_sst_method) script or the binaries and paths the script needs—for example, the xtrabackup binary or the data directory.

AppArmor (Ubuntu / Debian): The profiles that matter are the one for the server binary and the one for the SST script. To confirm that AppArmor is blocking SST, put both profiles in complain mode and restart MySQL on the joiner. If the node joins, the cause is AppArmor; fix the profiles and put them back in enforce mode instead of leaving them in complain mode.

Run as root or with `sudo`:

```shell
aa-complain /usr/sbin/mysqld
aa-complain /usr/bin/wsrep_sst_xtrabackup-v2
systemctl restart mysql
```

If the node joins after the above, reload the profiles with the correct permissions (see [Enable AppArmor](apparmor.md) and [modifying the mysqld profile :octicons-link-external-16:](https://www.percona.com/doc/percona-server/8.0/apparmor.html#modify-mysqld)), then put the profiles back in enforce mode:

```shell
aa-enforce /usr/sbin/mysqld
aa-enforce /usr/bin/wsrep_sst_xtrabackup-v2
```

SELinux (RHEL / Rocky / Alma): Ensure the SST script and ports are allowed; see [SELinux](selinux.md).

### Diagnose whether the joiner will use IST or SST

When the sequence-number gap between the cluster and the joiner is larger than what the donor’s [gcache](wsrep-provider-index.md#gcachesize) holds, the joiner always attempts a full [SST](state-snapshot-transfer.md). Full SST can severely impact cluster performance because the donor must stream a full copy of the data. Use the following steps to determine whether the joiner will get [IST](glossary.md#ist) or SST before you start the node.

1. On a donor node (any node that is already in `Synced` state), run:

    ```sql
    SHOW STATUS LIKE 'wsrep_last_committed';
    SHOW STATUS LIKE 'wsrep_local_cached_downto';
    ```

    Note the values: `wsrep_last_committed` is the cluster’s latest sequence number; [`wsrep_local_cached_downto`](wsrep-status-index.md#wsrep_local_cached_downto) is the lowest sequence number still in that donor’s gcache.

2. On the joiner (with the node stopped), read the joiner’s last position:

    ```shell
    cat /var/lib/mysql/grastate.dat
    ```

    Find the `seqno` line. If `seqno` is `-1` or `0`, or the file is missing, the joiner will need a full SST when started.

3. Compare the values. Compute the gap: *donor `wsrep_last_committed`* minus *joiner `seqno`*. If the joiner’s `seqno` is less than the donor’s `wsrep_local_cached_downto`, no donor has the required range in gcache and the joiner will perform a full SST when started. A large gap (donor far ahead of joiner) usually means the same.

4. If SST is inevitable, reduce impact before starting the joiner:

    * Prefer [Clone SST](clone-sst.md): set `wsrep_sst_method=clone` on the joiner (and meet the [Clone SST prerequisites](clone-sst.md#prerequisites)) so that when the node starts, the cluster uses Clone instead of the default `xtrabackup-v2`. Clone SST is often faster and less taxing on the donor.

    * Or schedule starting the joiner during a maintenance window when donor load is acceptable.

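The comparison in step 3 can be sketched as a small helper (illustrative only; the real decision is made on the donor, and IST additionally requires the joiner's cluster UUID to match):

```shell
# Illustrative decision: given the donor's wsrep_last_committed and
# wsrep_local_cached_downto plus the joiner's seqno from grastate.dat,
# predict whether the joiner can use IST or must fall back to SST.
ist_or_sst() {
    last_committed=$1   # informational: last_committed - joiner_seqno is the IST replay size
    cached_downto=$2
    joiner_seqno=$3
    if [ "$joiner_seqno" -le 0 ]; then
        echo "SST"   # seqno of -1 or 0: no usable local state
    elif [ "$joiner_seqno" -lt "$cached_downto" ]; then
        echo "SST"   # required writesets already rotated out of gcache
    else
        echo "IST"   # donor can replay the missing range from gcache
    fi
}

ist_or_sst 500000 480000 495000   # joiner still inside the gcache window
ist_or_sst 500000 480000 120000   # joiner fell behind the gcache window
```
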
If the node still does not join after you start it, check the server `error.log` for SST-related errors and work through [Security context (AppArmor, SELinux)](#security-context-apparmor-selinux) above; on Ubuntu 22.04, security profiles blocking the SST method are the most common cause.

### Alternative: Clone plugin for SST

Percona XtraDB Cluster 8.4 supports [SST using the Clone plugin](clone-sst.md), which can be more stable than the xtrabackup-based flow. With Clone SST, the joiner wipes its own data directory and receives a full copy from the donor. To use the Clone plugin, set `wsrep_sst_method=clone` (and meet the [Clone SST prerequisites](clone-sst.md#prerequisites)) on all nodes.
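
Assuming the prerequisites are met, the switch is a one-line change in each node's configuration file (a fragment; restart the node for it to take effect):

```ini
[mysqld]
# Use the Clone plugin instead of the default xtrabackup-v2 for SST
wsrep_sst_method = clone
```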

### Other common causes when a node cannot join

* `grastate.dat` and bootstrap order: After a crash or unclean shutdown, `safe_to_bootstrap` in `grastate.dat` is set to `0`. If you start nodes in the wrong order, the cluster may not form quorum. Compare `seqno` and bootstrap from the most advanced node; see [Crash recovery](crash-recovery.md) and [Bootstrap the first node](bootstrap.md).

* Network and bind address: Set [`wsrep_node_address`](wsrep-system-index.md#wsrep_node_address) explicitly to the node’s IP address that other nodes can reach. If the node binds to `127.0.0.1` or an interface that others cannot use, the Galera handshake fails and the node will not join.

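When checking bootstrap order across nodes, the two relevant fields can be extracted from each node's `grastate.dat` with a short helper (illustrative; compare `seqno` across all nodes and bootstrap from the highest):

```shell
# Illustrative extractors for the grastate.dat fields that drive bootstrap order.
grastate_seqno() {
    awk -F': *' '$1 == "seqno" { print $2 }' "$1"
}

grastate_safe_to_bootstrap() {
    awk -F': *' '$1 == "safe_to_bootstrap" { print $2 }' "$1"
}
```

For example, `grastate_seqno /var/lib/mysql/grastate.dat` prints the node's last recorded sequence number.
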
### Emergency override: last node up but refusing traffic (non-primary)

When two of three nodes are down (or more in a larger cluster), the remaining node loses quorum and switches to non-primary state. The node stays up and accepts connections, but MySQL refuses to run data-changing statements and may refuse reads; applications often see errors such as `WSREP has not yet prepared node for application use`. This is the situation the emergency override fixes: the single remaining node can be forced to form a primary component and serve traffic again so that the cluster can recover.

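Before forcing a new primary component, confirm that the surviving node really is non-primary (a sketch; these status variables are read-only diagnostics):

```sql
SHOW STATUS LIKE 'wsrep_cluster_status';  -- reports non-Primary on a stranded node
SHOW STATUS LIKE 'wsrep_cluster_size';    -- number of nodes this node can see
```
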
Run the following on the node that is still up (connected as a user with sufficient privileges):

```sql
SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
```

The [`pc.bootstrap`](wsrep-provider-index.md#pcbootstrap) option tells the node to form a new primary component. After the command runs, the node accepts writes, and the cluster is effectively that one node. Then start the other nodes so that they rejoin this primary component (via IST or SST as usual).

!!! warning "Only when the other nodes are down"

    Run this override only when you have confirmed that the other nodes are actually down or unreachable. If another node is still primary elsewhere (for example, in another datacenter after a split), setting `pc.bootstrap=YES` on a second node creates two separate clusters with diverging data (split-brain). See [Scenario 5: Two nodes disappear from the cluster](crash-recovery.md#scenario-5-two-nodes-disappear-from-the-cluster) and [Scenario 7: Split brain](crash-recovery.md#scenario-7-the-cluster-loses-its-primary-state-due-to-split-brain) in Crash recovery.

For more support options, see [Get help from Percona](get-help.md).