[Bug] LogSet shard pending indefinitely after rapid InPlace restarts: Raft config change deadlock #24203

@xzxiong

Description

Bug Description

After a LogSet Pod goes through multiple InPlace restarts (image changes) within a short window, the Raft membership change for one shard deadlocks and the shard stays permanently pending, never self-healing.

Environment

  • Version: v3.0.0-b79773d55-2026-04-25
  • Cluster: dev freetier-01, 3-replica LogSet
  • Trigger: two InPlace pod restarts (image changes) within ~13 minutes

Steps to Reproduce

  1. 3-replica LogSet running normally (log-0, log-1, log-2)
  2. Change LogSet image (InPlace update) → log-2 restarts (container kill + restart)
  3. While log-2 shard 1 is still recovering, change image again → log-2 restarts again
  4. After second restart, shard 1 on log-2 enters permanent pending state

Observed Behavior

log-2 continuously outputs (every second, indefinitely):

shard 1 is pending, not included into the heartbeat

log-0 (shard 1 leader) fails every L/Add attempt:

ERROR logservice/service_commands.go:80 failed to add replica
error: request rejected
  dragonboat/v4@.../request.go:79

HAKeeper keeps retrying with an incrementing replicaID (5420635 → 5420781+) every ~18 seconds; every attempt is rejected.

Deleting and recreating the log-2 Pod does not fix the issue, because the pending config change is persisted in the log-0/log-1 Raft logs.

Expected Behavior

  • shard 1 should eventually recover after a Pod restart
  • If a config change is stuck, there should be a timeout/cleanup mechanism
  • At minimum, HAKeeper should detect the deadlock and take corrective action (e.g., remove the stuck pending config change before retrying L/Add)

Root Cause Analysis

dragonboat rejects AddReplica requests while the Raft group has an ongoing (pending) config change. The sequence (a minimal sketch of the rejection path follows this list):

  1. log-2 restarts → HAKeeper detects shard 1 replica down → sends L/Add with new replicaID
  2. log-0 starts executing the config change (AddReplica)
  3. Before config change completes, log-2 restarts again
  4. HAKeeper sends another L/Add with a newer replicaID
  5. dragonboat rejects it because the previous config change is still pending
  6. The old config change never completes (the target node restarted with different state)
  7. Deadlock: the old config change blocks new ones, yet can itself never complete
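
For illustration, here is a minimal sketch of how this rejection surfaces to the caller, assuming dragonboat v4's NodeHost API (SyncGetShardMembership, SyncRequestAddReplica, ErrRejected); the function name and IDs are placeholders, not the actual logservice code:

package example

import (
    "context"
    "errors"
    "fmt"
    "time"

    "github.com/lni/dragonboat/v4"
)

// tryAddReplica mirrors what the L/Add path does conceptually: read the
// current membership, then propose AddReplica with its ConfigChangeID.
func tryAddReplica(nh *dragonboat.NodeHost, shardID, replicaID uint64, target string) error {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    m, err := nh.SyncGetShardMembership(ctx, shardID)
    if err != nil {
        return err
    }
    // Step 5 above: while the earlier config change is still pending,
    // dragonboat answers with ErrRejected ("request rejected"), the same
    // error log-0 keeps printing.
    err = nh.SyncRequestAddReplica(ctx, shardID, replicaID, target, m.ConfigChangeID)
    if errors.Is(err, dragonboat.ErrRejected) {
        return fmt.Errorf("shard %d: add replica %d rejected, config change pending: %w",
            shardID, replicaID, err)
    }
    return err
}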

Impact

  • shard 1 runs with only 2/3 replicas (degraded, no redundancy)
  • LogSet CR status still shows Ready=True, 3/3 Up, which is misleading since it only checks the store-level heartbeat (shard 0 / the HAKeeper shard is fine)
  • If one more log node fails, shard 1 loses majority → data unavailable

Suggested Fix Areas

  1. dragonboat: Add a timeout for pending config changes, auto-aborting any that do not complete within the threshold
  2. logservice: Before retrying L/Add, check whether a pending config change exists and wait for or abort it first
  3. HAKeeper: Detect repeated L/Add failures and escalate (e.g., L/Remove the old replica first, then L/Add); see the sketch after this list
  4. LogSet status: Report per-shard health, not just store-level heartbeat
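
As a hedged illustration of items 2 and 3 combined: if the shard's ConfigChangeID stops advancing across rejected retries, treat the config change as stuck and escalate by removing the dead replica before re-adding. ensureReplica, maxStuckRetries, and the escalation policy are hypothetical, not the current logservice/HAKeeper code; the dragonboat calls again assume the v4 NodeHost API:

package example

import (
    "context"
    "errors"
    "time"

    "github.com/lni/dragonboat/v4"
)

const maxStuckRetries = 3 // illustrative threshold

func ensureReplica(nh *dragonboat.NodeHost, shardID, deadReplicaID, newReplicaID uint64, target string) error {
    var lastCCID uint64
    stuck := 0
    for {
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        m, err := nh.SyncGetShardMembership(ctx, shardID)
        if err != nil {
            cancel()
            return err
        }
        // Item 2: notice when the membership stops changing between
        // retries instead of blindly re-sending L/Add.
        if m.ConfigChangeID == lastCCID {
            stuck++
        } else {
            stuck, lastCCID = 0, m.ConfigChangeID
        }
        if stuck >= maxStuckRetries {
            // Item 3: escalate with L/Remove of the dead replica, then
            // loop back and retry L/Add against the fresh membership.
            err = nh.SyncRequestDeleteReplica(ctx, shardID, deadReplicaID, m.ConfigChangeID)
            cancel()
            if err != nil && !errors.Is(err, dragonboat.ErrRejected) {
                return err
            }
            time.Sleep(2 * time.Second)
            continue
        }
        err = nh.SyncRequestAddReplica(ctx, shardID, newReplicaID, target, m.ConfigChangeID)
        cancel()
        if err == nil {
            return nil
        }
        if !errors.Is(err, dragonboat.ErrRejected) {
            return err
        }
        time.Sleep(2 * time.Second) // backoff before the next attempt
    }
}

Escalating via L/Remove is a blunt workaround; a first-class abort for a pending config change in dragonboat (item 1) would make the wait/abort branch of item 2 possible without it.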

Logs

Full analysis with timeline: internal doc handbooks:docs/analysis/20260426-dev-freetier01-logset-shard1-raft-deadlock.md

Labels

kind/bug (Something isn't working)