
Commit 58f0b4b

Merge pull request #127 from dusk-network/docs/operator-recovery-guides
Add operator rollback and slashing recovery guides
2 parents 00b2dec + 4502f46 commit 58f0b4b

7 files changed

Lines changed: 228 additions & 1 deletion


src/content/docs/learn/guides/staking-basics.md

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ For reward sources and distribution details, see:
 
 Dusk uses **soft slashing**: stake is not burned, but repeated faults or long downtime can suspend rewards and reduce effective stake.
 
-See: [Slashing](/learn/tokenomics#slashing).
+See: [Slashing](/learn/tokenomics#slashing). If you run a provisioner, also read [Slashing prevention and recovery](/operator/guides/slashing-recovery).
 
 ## Adding to an existing stake

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
---
title: Roll back a node update
description: Learn how to reinstall a previous Dusk node installer release when an update needs to be rolled back.
---

Rolling back means reinstalling a previous `node-installer` release. This can be useful if a newly installed node release has a problem and the network has not yet activated a protocol change that requires it.

:::caution
Do not roll back across a network upgrade or activation unless the Dusk team explicitly instructs operators to do so. If the chain already requires the newer node version, an older node may stop following the network.
:::

## Before you roll back

Check the version currently installed:

```sh
ruskquery version
```

Check whether the node is syncing:

```sh
ruskquery block-height
tail -n 50 /var/log/rusk.log
```

If the node is only stuck or behind, try [fast-sync](/operator/guides/fast-sync) or [manual resync](/operator/guides/manual-resync) before rolling back.

## Roll back mainnet

Replace `vX.Y.Z` with the installer release you want to roll back to. For example, to roll back to installer release `v1.2.3`, set:

```sh
INSTALLER_VERSION="v1.2.3"
```

Then stop the node, reinstall the pinned release, and start the node again:

```sh
INSTALLER_VERSION="vX.Y.Z"

sudo service rusk stop

curl --proto '=https' --tlsv1.2 -sSfL \
  "https://github.com/dusk-network/node-installer/releases/download/${INSTALLER_VERSION}/node-installer.sh" \
  | sudo bash

sudo service rusk start
```
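
The rule from the Notes section (use a pinned release, never `latest`) can be checked before the service is stopped. The following is a minimal sketch, not part of `node-installer`: the function name `installer_url` is hypothetical, and it assumes release tags follow the `vMAJOR.MINOR.PATCH` shape of the `v1.2.3` example. It prints the pinned download URL, or refuses an empty value, `latest`, or the unreplaced `vX.Y.Z` placeholder.

```sh
#!/bin/sh
# Hypothetical helper (not part of node-installer): print the pinned
# installer URL for a release tag, or fail for unpinned/placeholder values.
# Assumes tags look like vMAJOR.MINOR.PATCH, e.g. v1.2.3.
installer_url() {
  tag=$1
  case $tag in
    "" | latest)
      echo "refusing '$tag': use a pinned release tag, not latest" >&2
      return 1
      ;;
  esac
  # Reject anything that is not v<number>.<number>.<number>,
  # including the unreplaced vX.Y.Z placeholder.
  if ! printf '%s\n' "$tag" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "unexpected tag format: '$tag' (expected vX.Y.Z)" >&2
    return 1
  fi
  printf 'https://github.com/dusk-network/node-installer/releases/download/%s/node-installer.sh\n' "$tag"
}
```

In a script, `url=$(installer_url "$INSTALLER_VERSION") || exit 1` stops before `sudo service rusk stop` runs if the tag is wrong.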

## Roll back testnet

Use the same pinned installer release, but pass the testnet flag:

```sh
INSTALLER_VERSION="vX.Y.Z"

sudo service rusk stop

curl --proto '=https' --tlsv1.2 -sSfL \
  "https://github.com/dusk-network/node-installer/releases/download/${INSTALLER_VERSION}/node-installer.sh" \
  | sudo bash -s -- --network testnet

sudo service rusk start
```
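
Because the mainnet and testnet procedures differ only in the arguments passed to `bash`, one way to avoid mixing them up is to print the exact command sequence and review it before running anything. A minimal sketch; `rollback_plan` is a hypothetical helper that only echoes the commands from this guide for a given tag and network, and does not validate the tag itself:

```sh
#!/bin/sh
# Hypothetical helper: print (but do not run) the rollback commands
# for a pinned installer tag and a network ("mainnet" or "testnet").
rollback_plan() {
  tag=$1
  network=${2:-mainnet}
  url="https://github.com/dusk-network/node-installer/releases/download/${tag}/node-installer.sh"
  # The only difference between networks is the argument list given to bash.
  case $network in
    mainnet) installer="sudo bash" ;;
    testnet) installer="sudo bash -s -- --network testnet" ;;
    *) echo "unknown network: $network" >&2; return 1 ;;
  esac
  printf '%s\n' "sudo service rusk stop"
  printf '%s\n' "curl --proto '=https' --tlsv1.2 -sSfL \"$url\" | $installer"
  printf '%s\n' "sudo service rusk start"
}
```

For example, `rollback_plan v1.2.3 testnet` prints the three testnet commands; run them by hand once they look right.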

## Verify the rollback

Confirm the installed version:

```sh
ruskquery version
```

Check the service:

```sh
service rusk status
```

Check whether the node is progressing (press Ctrl+C to stop following the log):

```sh
ruskquery block-height
tail -F /var/log/rusk.log
```

If the block height does not progress, compare your height with the explorer and consider [fast-syncing the node](/operator/guides/fast-sync).
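
The progress check can be scripted by sampling the height twice. A minimal sketch, assuming `ruskquery block-height` prints the height as a bare integer (not verified here; adjust the parsing if it prints extra text). `height_progresses` is a hypothetical name, and the 30-second default delay is an arbitrary choice:

```sh
#!/bin/sh
# Hypothetical helper: succeed if the block height increases between two samples.
# $1: command that prints the height as a bare integer (default: ruskquery block-height)
# $2: seconds to wait between samples (default: 30, an arbitrary choice)
height_progresses() {
  cmd=${1:-"ruskquery block-height"}
  delay=${2:-30}
  # $cmd is intentionally unquoted so "ruskquery block-height" splits into command + argument.
  first=$($cmd) || return 2
  sleep "$delay"
  second=$($cmd) || return 2
  echo "height: $first -> $second"
  # Non-integer output makes the test fail instead of passing silently.
  [ "$second" -gt "$first" ] 2>/dev/null
}
```

Run with no arguments, it samples `ruskquery block-height` twice and exits non-zero if the height did not increase, which is the cue to compare with the explorer.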

## Notes

- Use a pinned installer release URL. Do not use `latest` for a rollback.
- A rollback changes the installed node software and service configuration. It does not automatically restore an older chain state.
- Archive nodes may have database migrations that are not safe to downgrade. If an archive node fails after a rollback, update back to the supported version or resync the archive state.
- If you operate a provisioner, monitor the node after the rollback. Downtime or running an incompatible version can affect consensus participation. See [Slashing prevention and recovery](/operator/guides/slashing-recovery).
Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
---
title: Slashing prevention and recovery
description: Learn how Dusk soft slashing affects provisioners and what to check after a slashing event.
---

Dusk uses **soft slashing** to discourage repeated faults and long downtime. Stake is not burned, but a provisioner can lose eligibility or effective participation, which reduces rewards.

Soft slashing can happen when a provisioner repeatedly fails to participate correctly in consensus. Common operational causes include:

- Running an outdated or incompatible node version.
- Being offline for too long.
- Falling behind the network tip.
- Network or firewall problems that prevent consensus messages from being sent or received.
- Consensus key or node configuration issues.

## If your provisioner was slashed

Start by restoring healthy node operation. Slashing is a symptom; the first priority is to make sure the node is on the right chain, on the right version, and progressing.

### 1. Check the installed version

```sh
ruskquery version
```

If the network requires a newer release than the one installed, upgrade with the latest installer release:

```sh
curl --proto '=https' --tlsv1.2 -sSfL https://github.com/dusk-network/node-installer/releases/latest/download/node-installer.sh | sudo bash
sudo service rusk start
```

For testnet:

```sh
curl --proto '=https' --tlsv1.2 -sSfL https://github.com/dusk-network/node-installer/releases/latest/download/node-installer.sh | sudo bash -s -- --network testnet
sudo service rusk start
```

If a newly installed release is known to be problematic and the network can still run the previous version, follow [Roll back a node update](/operator/guides/rollback-node-update).
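
When an announcement names a minimum required version, comparing it with the installed one by eye is error-prone (`1.10.0` is newer than `1.9.9`). A minimal sketch of a numeric comparison; `version_ge` is a hypothetical helper, and it assumes you have reduced the output of `ruskquery version` to a plain `MAJOR.MINOR.PATCH` string, since the exact output format is not covered in this guide:

```sh
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# Handles plain MAJOR.MINOR.PATCH with an optional leading "v";
# pre-release suffixes such as "-rc.1" are rejected, not compared.
version_ge() {
  a=${1#v}
  b=${2#v}
  case $a$b in
    *[!0-9.]*) echo "unsupported version format: '$1' vs '$2'" >&2; return 2 ;;
  esac
  # Split each version on dots; missing components default to 0.
  old_ifs=$IFS
  IFS=.
  set -- $a
  a1=${1:-0} a2=${2:-0} a3=${3:-0}
  set -- $b
  b1=${1:-0} b2=${2:-0} b3=${3:-0}
  IFS=$old_ifs
  # Compare component by component, most significant first.
  for pair in "$a1 $b1" "$a2 $b2" "$a3 $b3"; do
    set -- $pair
    [ "$1" -gt "$2" ] && return 0
    [ "$1" -lt "$2" ] && return 1
  done
  return 0
}
```

For example, `version_ge 1.2.3 1.2.0 && echo "up to date"`; substitute the installed version and the announced minimum.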

### 2. Check sync status

```sh
ruskquery block-height
```

Compare the height with the explorer. If the node is stuck or far behind, use [fast-sync](/operator/guides/fast-sync):

```sh
download_state
sudo service rusk start
```
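
Comparing your height with the explorer can be reduced to a lag check. A minimal sketch; `check_lag` is a hypothetical helper, the explorer height is read manually, and the default tolerance of 10 blocks is an arbitrary assumption, not a Dusk recommendation:

```sh
#!/bin/sh
# Hypothetical helper: report how far the local node is behind the tip.
# $1: local height (e.g. from `ruskquery block-height`)
# $2: tip height read from the explorer
# $3: allowed lag in blocks (default: 10, an arbitrary assumption)
check_lag() {
  local_h=$1
  tip_h=$2
  max_lag=${3:-10}
  for h in "$local_h" "$tip_h" "$max_lag"; do
    case $h in
      '' | *[!0-9]*) echo "heights and lag must be non-negative integers" >&2; return 2 ;;
    esac
  done
  lag=$((tip_h - local_h))
  echo "local=$local_h tip=$tip_h lag=$lag"
  # Success means the node is within the allowed lag.
  [ "$lag" -le "$max_lag" ]
}
```

For example, `check_lag "$(ruskquery block-height)" <explorer-height>` exits non-zero when the node is further behind than the tolerance.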

### 3. Check service status and logs

```sh
service rusk status
tail -n 100 /var/log/rusk.log
```

Look for errors related to:

- wrong network or chain mismatch
- consensus key loading
- peer discovery
- Kadcast address or UDP connectivity
- state/database errors
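
To pull candidate problems out of the log quickly, you can filter for warning and error lines. A minimal sketch; `scan_log` is hypothetical, and it assumes Rusk log lines contain the level as an uppercase `ERROR` or `WARN` word. Check a few lines of your own log and adjust the pattern if the format differs:

```sh
#!/bin/sh
# Hypothetical helper: show recent warning/error lines from the Rusk log.
# Assumes the log level appears as a whole uppercase word (ERROR or WARN).
# $1: log file (default: /var/log/rusk.log)
# $2: number of trailing lines to inspect (default: 1000)
# Returns 0 if nothing matched, 1 if matches were printed, 2 if unreadable.
scan_log() {
  file=${1:-/var/log/rusk.log}
  lines=${2:-1000}
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # -w matches ERROR/WARN only as whole words; "|| true" keeps "no match" from failing.
  matches=$(tail -n "$lines" "$file" | grep -wE 'ERROR|WARN' || true)
  if [ -z "$matches" ]; then
    echo "no ERROR/WARN lines in the last $lines lines of $file"
    return 0
  fi
  printf '%s\n' "$matches"
  return 1
}
```

Run `scan_log` with no arguments to inspect the end of `/var/log/rusk.log`, then read the matches against the categories listed above.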

### 4. Check staking status

```sh
rusk-wallet stake-info
```

Confirm that the expected stake is still present and inspect the reported stake state. If the wallet cannot connect, fix node connectivity first.

### 5. Monitor after recovery

After the node is updated, synced, and running, check the block height repeatedly and keep the logs open. `tail -F` keeps following the log until you press Ctrl+C, so check the height first, or run `ruskquery block-height` from a second terminal:

```sh
ruskquery block-height
tail -F /var/log/rusk.log
```

The height should continue to progress. If the node falls behind again, treat it as an unresolved infrastructure or networking issue.
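
For ongoing monitoring, the same height check can run on a schedule instead of by hand. A minimal cron-style sketch: `check_progress` is a hypothetical helper that stores the height seen on the previous run in a state file of your choosing and fails when the height has not increased since then. It assumes `ruskquery block-height` prints a bare integer:

```sh
#!/bin/sh
# Hypothetical cron-style check: compare the current height with the one
# recorded on the previous run and fail if it has not increased.
# $1: state file that stores the last seen height
# $2: current height (e.g. "$(ruskquery block-height)")
check_progress() {
  state=$1
  current=$2
  case $current in
    '' | *[!0-9]*) echo "invalid height: '$current'" >&2; return 2 ;;
  esac
  previous=$(cat "$state" 2>/dev/null || true)
  case $previous in
    '' | *[!0-9]*) previous=0 ;;   # first run or unreadable state file
  esac
  echo "$current" > "$state"
  if [ "$current" -le "$previous" ]; then
    echo "height stalled at $current (previous run: $previous)"
    return 1
  fi
  echo "height progressed: $previous -> $current"
}
```

Called every few minutes, for example as `check_progress /var/tmp/rusk-height "$(ruskquery block-height)"`, its non-zero exit status can feed whatever alerting you already use. Pick an interval long enough for new blocks to arrive, or a quiet period will be reported as a stall.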

### 6. Unstake and restake if the node is healthy

If the node is on the right version, fully synced, and operating normally, unstake and restake to restore normal provisioner participation.

Slashing can be caused by temporary operational issues, such as cloud provider downtime, network disruption, or downtime during a live node update. Once the underlying issue is resolved, restaking is the recovery step.

```sh
rusk-wallet unstake
rusk-wallet stake --amt <amount>
```

Replace `<amount>` with the amount you want to stake. The new stake must mature before it starts participating again.
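
Because the new stake has to mature before it participates, it is worth reviewing the exact commands before running them. A minimal sketch; `restake_plan` is a hypothetical helper that checks the amount looks like a positive number (the accepted format is an assumption, not a `rusk-wallet` rule) and prints the two commands from this step without running them:

```sh
#!/bin/sh
# Hypothetical helper: validate the amount, then print (not run) the
# unstake/restake commands so they can be reviewed first.
restake_plan() {
  amount=$1
  # Assumed format: a positive decimal such as 1000 or 1000.5.
  if ! printf '%s\n' "$amount" | grep -Eq '^[0-9]+(\.[0-9]+)?$' ||
     ! awk -v a="$amount" 'BEGIN { exit !(a + 0 > 0) }'; then
    echo "invalid amount: '$amount'" >&2
    return 1
  fi
  # Unstake first, then stake the chosen amount again.
  echo "rusk-wallet unstake"
  echo "rusk-wallet stake --amt $amount"
}
```

For example, `restake_plan 1000` prints the unstake command followed by the matching stake command.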

## Prevention checklist

- Keep the node updated during announced network upgrades.
- Monitor `ruskquery block-height` against the explorer.
- Alert on service downtime and repeated restart loops.
- Keep UDP `9000` reachable for Kadcast.
- Keep consensus keys backed up and readable by the Rusk service.
- Avoid running experimental or mismatched binaries on a staked provisioner.
- Use [fast-sync](/operator/guides/fast-sync) when a node falls behind instead of waiting for a long resync from genesis.
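
For the UDP `9000` item, a quick local check is whether anything is bound to that port. A minimal sketch; `udp_listening` is hypothetical and parses `ss -uln` output on standard input, assuming the local address is the fourth column (as in a UDP-only listing; `ss -tuln` adds a `Netid` column and shifts it). It only shows that the socket is open locally; it does not prove the port is reachable through your firewall or cloud security group:

```sh
#!/bin/sh
# Hypothetical helper: succeed if a local UDP socket is bound to the port.
# Reads `ss -uln` output on stdin; column 4 is assumed to be "address:port".
# $1: port (default: 9000, the Kadcast port from the checklist)
udp_listening() {
  port=${1:-9000}
  awk -v p=":$port" '
    # Skip the header; match when the local address ends exactly in :PORT.
    NR > 1 && length($4) >= length(p) &&
      substr($4, length($4) - length(p) + 1) == p { found = 1 }
    END { exit !found }
  '
}
```

For example, `ss -uln | udp_listening 9000 && echo "UDP 9000 is bound locally"`.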

## Related guides

- [Upgrade a node](/operator/guides/upgrade-node)
- [Roll back a node update](/operator/guides/rollback-node-update)
- [Fast-sync a node](/operator/guides/fast-sync)
- [Troubleshooting](/operator/troubleshooting)

src/content/docs/operator/guides/upgrade-node.mdx

Lines changed: 8 additions & 0 deletions
@@ -65,6 +65,14 @@ Unable to figure it out yourself? Visit our [Node Runner Troubleshooting](https:
 
 If everything else fails, check out the [manual resync](/operator/guides/manual-resync) instructions.
 
+## Roll back an update
+
+If a newly installed release causes problems and the network has not activated changes that require it, you can reinstall a previous `node-installer` release.
+
+Use a pinned installer release instead of `latest`, and only roll back across a network upgrade if the Dusk team instructs operators to do so.
+
+See: [Roll back a node update](/operator/guides/rollback-node-update).
+
 ## Nocturne Reset
 
 :::note[Info]

src/content/docs/operator/maintenance-monitoring.md

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@ The recommended setup for network participants looking to stake and use the netw
 
 Effective monitoring and alerting systems are crucial to avoid slashing events. There are several tools available for real-time monitoring and alerting, which are particularly important for provisioners participating in consensus. Implementing these systems helps ensure continuous performance and timely responses to potential issues.
 
+At minimum, monitor whether the Rusk service is running, whether `ruskquery block-height` is progressing, and whether your node is close to the network tip. If your provisioner was slashed or is at risk of being slashed, follow [Slashing prevention and recovery](/operator/guides/slashing-recovery).
+
 ## Keys Management
 
 Proper management of your cryptographic keys is essential to ensure the security of your node.

src/content/docs/operator/troubleshooting.md

Lines changed: 5 additions & 0 deletions
@@ -16,6 +16,11 @@ This means your node is receiving consensus messages for rounds much higher than
 
 Check your block height (`ruskquery block-height`) against the explorer. If you are stuck, consider fast-syncing (`download_state`) or a manual resync.
 
+#### My provisioner was slashed
+First, restore healthy node operation: check the installed version, sync height, service status, logs, and staking status. Then monitor the node to make sure it keeps progressing.
+
+Follow: [Slashing prevention and recovery](/operator/guides/slashing-recovery).
+
 #### Unable to resolve domain: invalid socket address
 Such errors usually indicate DNS problems. Check your DNS settings.
 
src/sidebars/siteSidebar.js

Lines changed: 2 additions & 0 deletions
@@ -106,6 +106,8 @@ const siteSidebar = [
 { label: "Fast-Sync a Node", link: "/operator/guides/fast-sync" },
 { label: "Manually Re-Sync a Node", link: "/operator/guides/manual-resync" },
 { label: "Upgrade a Node", link: "/operator/guides/upgrade-node" },
+{ label: "Roll Back a Node Update", link: "/operator/guides/rollback-node-update" },
+{ label: "Slashing Recovery", link: "/operator/guides/slashing-recovery" },
 { label: "Choose a Network", link: "/operator/networks" },
 { label: "Maintenance & Monitoring", link: "/operator/maintenance-monitoring" },
 { label: "FAQ", link: "/operator/faq" },
