feat: support prechecking down peers before restarting tikv pod#6877
ti-chi-bot[bot] merged 4 commits into pingcap:main
Signed-off-by: liubo02 <liubo02@pingcap.com>
Codecov Report
@@ Coverage Diff @@
## main #6877 +/- ##
==========================================
+ Coverage 37.44% 37.61% +0.17%
==========================================
Files 392 392
Lines 22432 22483 +51
==========================================
+ Hits 8399 8458 +59
+ Misses 14033 14025 -8
Pull request overview
This PR enhances TiKV pod restart safety by introducing PD-based prechecks (down-peer regions and leader eviction) before allowing TiKV pod recreation, and refactors leader-eviction condition syncing into the eviction task flow.
Changes:
- Add a PD API client method and types for querying regions with down peers (`/pd/api/v1/regions/check/down-peer`).
- Gate TiKV pod recreation on (a) zero non-self down peers and (b) leaders being evicted, and trigger leader-eviction scheduling when needed.
- Refactor syncing of `TiKVCondLeadersEvicted` from the status task into the leader-eviction task, with updated/added unit tests.
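
As a rough illustration of the first change, a down-peer check response could be decoded along these lines. This is only a sketch: the struct names and the JSON field names (`count`, `regions`, `down_peers`, `peer`, `store_id`, `down_seconds`) are assumptions made for the example and are not copied from `pkg/pdapi/v1/types.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// The types below are hypothetical stand-ins for the PD response types
// added in this PR; field names are assumed for illustration only.
type peer struct {
	ID      uint64 `json:"id"`
	StoreID uint64 `json:"store_id"`
}

type downPeer struct {
	Peer        peer   `json:"peer"`
	DownSeconds uint64 `json:"down_seconds"`
}

type region struct {
	ID        uint64     `json:"id"`
	DownPeers []downPeer `json:"down_peers"`
}

type regionsCheckInfo struct {
	Count   int      `json:"count"`
	Regions []region `json:"regions"`
}

// decodeDownPeerRegions parses a JSON body into regionsCheckInfo.
func decodeDownPeerRegions(body string) (*regionsCheckInfo, error) {
	var info regionsCheckInfo
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&info); err != nil {
		return nil, fmt.Errorf("decode down-peer regions: %w", err)
	}
	return &info, nil
}

func main() {
	// Hand-written sample payload, not captured from a real PD server.
	sample := `{"count":1,"regions":[{"id":7,"down_peers":[{"peer":{"id":9,"store_id":3},"down_seconds":42}]}]}`
	info, err := decodeDownPeerRegions(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(info.Count, info.Regions[0].DownPeers[0].Peer.StoreID)
}
```

The real client method (`GetDownPeerRegions`) presumably also handles the HTTP request and error mapping, which this sketch leaves out.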
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/pdapi/v1/types.go | Adds PD response types for down-peer region checks. |
| pkg/pdapi/v1/client.go | Adds GetDownPeerRegions PD client call and endpoint constant. |
| pkg/pdapi/v1/mock_generated.go | Updates PD client mock to include GetDownPeerRegions. |
| pkg/pdapi/v1/client_test.go | Adds unit test coverage for GetDownPeerRegions. |
| pkg/controllers/tikv/tasks/util.go | Adds helper checks for leader-eviction status/timeout; fixes VolumeName import aliasing. |
| pkg/controllers/tikv/tasks/pod.go | Adds restart prechecks (down peers + leaders evicted) and wires PD client usage into restart flow. |
| pkg/controllers/tikv/tasks/pod_test.go | Extends pod task tests to cover down-peer filtering and leader-eviction gating behavior. |
| pkg/controllers/tikv/tasks/evict_leader.go | Changes eviction scheduler management based on ShouldEvictLeader and syncs LeadersEvicted condition here. |
| pkg/controllers/tikv/tasks/evict_leader_test.go | Adds tests for starting/stopping leader eviction scheduler behavior. |
| pkg/controllers/tikv/tasks/offline.go | Switches offline flow to use the new leader-eviction check helper and ShouldEvictLeader. |
| pkg/controllers/tikv/tasks/status.go | Removes leader-eviction condition syncing and related wait behavior from status task. |
| pkg/controllers/tikv/tasks/status_test.go | Updates expectations after removing leader-eviction condition management from status task. |
| pkg/controllers/tikv/tasks/ctx.go | Minor formatting/structure adjustments; no functional change observed. |
| pkg/controllers/tikv/builder.go | Updates runner wiring to pass PD client manager into TaskPod. |
| api/core/v1alpha1/tikv_types.go | Adds ReasonStoreNotExist and deprecates ReasonStoreIsRemoved. |
        return task.Wait().With("cannot recreate pod, check down peer: %v", err)
    }

    if err := CheckTiKVLeadersEvicted(state.TiKV()); err != nil {
        return task.Wait().With("cannot recreate pod, check leader count: %v", err)

    func countNonSelfDownPeers(downPeerInfo *pdapi.RegionsCheckInfo, store *pdv1.Store) int {
        if store == nil || store.ID == "" {
            return downPeerInfo.Count
        }
        if downPeerInfo.Count == 0 {
            return 0
        }

        nonSelfDownPeerCount := 0
        for _, region := range downPeerInfo.Regions {

    case !state.PDSynced:
        return task.Wait().With("pd is unsynced")
    case state.Store == nil:
    if state.Store == nil {

    pc, ok := state.GetPDClient(cm)
    if !ok {
        return task.Wait().With("wait if pd client is not registered")
    }

Signed-off-by: liubo02 <liubo02@pingcap.com>
/cherry-pick release-2.1

@liubog2008: once the present PR merges, I will cherry-pick it on top of release-2.1 in the new PR and assign it to you.

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fgksgf

@liubog2008: new pull request created to branch