Commit 2042a47
fix race condition between forget and failover (valkey-io#105)
## Summary
Fix a race condition between `forgetStaleNodes` and Valkey's
auto-failover that can permanently prevent a replica from being promoted
after its primary dies. See valkey-io#103 for more context.
### The bug
When a primary's deployment is deleted, the controller's
`forgetStaleNodes` issues `CLUSTER FORGET` for the dead node from every
surviving node. If this runs before Valkey's auto-failover election
completes, it removes the dead primary from the other masters' node
tables. Those masters can then no longer validate the replica's
`FAILOVER_AUTH_REQUEST` (they don't recognize the dead node), so they
never vote. The replica is permanently stuck as a slave,
`findShardPrimary` never finds a primary for the shard, and the cluster
enters an infinite loop of:
```
ERROR command failed: CLUSTER FORGET {"error": "Can't forget my master!"}
DEBUG skipping replica; primary not ready yet
DEBUG missing replicas, requeue..
```
This is a timing-dependent race. The window is roughly 0.5–1 second
between the `fail` flag being set and the failover election completing.
It was reported by a user who hit it when deleting a primary deployment.
### The fix
Before issuing `CLUSTER FORGET`, check whether any live node in the
cluster still reports the failing node as its master (`HasReplicaOf`).
If so, skip the FORGET: the replica needs the dead node to remain in the
other masters' node tables to complete the failover election. Once the
failover completes and the replica is promoted, it no longer reports
itself as a slave of the dead node, so the next reconcile will proceed
with FORGET normally.
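
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `liveNode` type, `splitStale`, and the node IDs are hypothetical stand-ins, and the real check operates on `ClusterState`.

```go
package main

import "fmt"

// liveNode is a hypothetical, simplified view of one reachable node's
// self-report: its own ID and the master ID it claims ("" for a primary).
type liveNode struct {
	ID       string
	MasterID string
}

// hasReplicaOf reports whether any reachable node still names nodeID as its
// master, i.e. a failover for that node may still be pending.
func hasReplicaOf(live []liveNode, nodeID string) bool {
	for _, n := range live {
		if n.MasterID == nodeID {
			return true
		}
	}
	return false
}

// splitStale partitions stale node IDs into those that are safe to forget
// now and those deferred until a later reconcile.
func splitStale(live []liveNode, stale []string) (forget, deferred []string) {
	for _, id := range stale {
		if hasReplicaOf(live, id) {
			// Failover pending: keep the dead node known to the cluster.
			deferred = append(deferred, id)
			continue
		}
		forget = append(forget, id)
	}
	return forget, deferred
}

func main() {
	stale := []string{"dead-primary"}

	// Reconcile 1: the replica r3 has not been promoted yet.
	before := []liveNode{{ID: "p1"}, {ID: "p2"}, {ID: "r3", MasterID: "dead-primary"}}
	forget, deferred := splitStale(before, stale)
	fmt.Println("forget:", forget, "deferred:", deferred)

	// Reconcile 2: the election finished and r3 now reports itself as a primary.
	after := []liveNode{{ID: "p1"}, {ID: "p2"}, {ID: "r3"}}
	forget, deferred = splitStale(after, stale)
	fmt.Println("forget:", forget, "deferred:", deferred)
}
```

In the first reconcile the dead primary lands in the deferred list; in the second, after the promotion, it moves to the forget list. Deferring rather than erroring keeps the reconcile loop idempotent, since the next pass simply re-evaluates the same condition.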
### Changes
- **`internal/valkey/clusterstate.go`**: Add `HasReplicaOf(nodeId)`
method on `ClusterState` that checks if any node's `CLUSTER NODES`
self-report shows it as a replica of the given node ID. Add
`MasterIdFromSelf()` helper on `NodeState` that extracts `fields[3]`
(master ID) from the `myself` line.
- **`internal/controller/valkeycluster_controller.go`**: Guard
`forgetStaleNodes` with the `HasReplicaOf` check. When skipped, log
`"skipping forget; failover pending for node"` at V(1).
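
As a rough sketch of what the two helpers do, the snippet below parses raw `CLUSTER NODES` text, where the fourth whitespace-separated field of each line is the master ID (or `-` for a primary). The function names, the plain-string input, and the shortened node IDs are assumptions for illustration; the actual methods live on `NodeState` and `ClusterState` and may differ in shape.

```go
package main

import (
	"fmt"
	"strings"
)

// masterIDFromSelf returns the master ID from the "myself" line of raw
// CLUSTER NODES output. It returns "" when the node is a primary (field
// is "-") or when no "myself" line is found.
func masterIDFromSelf(clusterNodes string) string {
	for _, line := range strings.Split(clusterNodes, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 4 {
			continue // blank or malformed line
		}
		// fields[2] is a comma-separated flag list such as "myself,slave".
		for _, flag := range strings.Split(fields[2], ",") {
			if flag != "myself" {
				continue
			}
			if fields[3] == "-" {
				return ""
			}
			return fields[3]
		}
	}
	return ""
}

// hasReplicaOf checks each reachable node's own report for nodeID.
func hasReplicaOf(reports []string, nodeID string) bool {
	if nodeID == "" {
		return false
	}
	for _, r := range reports {
		if masterIDFromSelf(r) == nodeID {
			return true
		}
	}
	return false
}

func main() {
	// Shortened, made-up node IDs; real IDs are 40 hex characters.
	replicaView := "aaa 10.0.0.1:6379@16379 master,fail - 0 0 1 disconnected 0-5460\n" +
		"bbb 10.0.0.2:6379@16379 myself,slave aaa 0 0 1 connected\n"
	primaryView := "ccc 10.0.0.3:6379@16379 myself,master - 0 0 2 connected 5461-10922\n"

	fmt.Println(masterIDFromSelf(replicaView))
	fmt.Println(hasReplicaOf([]string{replicaView, primaryView}, "aaa"))
	fmt.Println(hasReplicaOf([]string{primaryView}, "aaa"))
}
```

Only each node's own `myself` line is consulted, so a node that cannot be reached contributes no report and cannot hold back the FORGET.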
### Why this is safe
- **Dead replica (not a master):** No node claims a replica as its
master → `HasReplicaOf` returns false → FORGET proceeds immediately. No
behavior change.
- **Both master and replica are dead:** The dead replica isn't in
`state.Shards` (connection failed) → `HasReplicaOf` returns false →
FORGET proceeds. Correct — no failover is possible anyway.
- **Scale-down stale nodes:** Drained masters have no replicas left →
`HasReplicaOf` returns false → FORGET proceeds. No behavior change.
- **Failover permanently blocked for other reasons** (e.g., replica too
far behind): `HasReplicaOf` returns true, so FORGET is deferred. This is no
worse than today, where FORGET runs but the failover is still permanently
blocked. With this fix, the failover at least has a chance if the
blocking condition resolves.
---------
Signed-off-by: yang.qiu <yang.qiu@reddit.com>
Signed-off-by: Joseph Heyburn <jdheyburn@gmail.com>
Co-authored-by: yang.qiu <yang.qiu@reddit.com>
Co-authored-by: Joseph Heyburn <jdheyburn@gmail.com>

1 parent 54ad6dd commit 2042a47
2 files changed
Lines changed: 51 additions & 8 deletions