fix: handle TiKV metrics fetch errors (#6897) #6898

Open
ti-chi-bot wants to merge 1 commit into pingcap:release-2.1 from ti-chi-bot:cherry-pick-6897-to-release-2.1

@ti-chi-bot
Member

This is an automated cherry-pick of #6897

What

GetLeaderCount could report that the leader metric was missing when the /metrics fetch itself had failed. The fetch runs in a goroutine, but its error was written to a shared variable without synchronization, so the caller had no guarantee of seeing it and could miss the failure. The fix is small, but the old error was misleading.

The fix waits for the fetch goroutine's result on a buffered error channel and returns that error, if there is one, before falling back to the "metric not found" error.

Related: #4281 touches the same code path but addresses a different, older goroutine-leak issue.

Repro

The added test reproduces it on the old code:

go test ./pkg/tikvapi/v1 -run TestTiKVClient_GetLeaderCountFetchError -count=1

It serves /metrics with HTTP 500. The old code returns "metric not found"; the new code returns the real fetch error. This can happen in practice whenever the TiKV metrics endpoint is unavailable, times out, hits TLS or transport errors, or returns a non-200 status, so no unusual cloud-quota edge case is needed to trigger it.

Tests

go test -race ./pkg/tikvapi/v1 -count=1
go test ./pkg/tikvapi/v1 -run TestTiKVClient_GetLeaderCountFetchError -count=100
go test ./pkg/pdapi/... ./pkg/tikvapi/... ./pkg/resourcemanagerapi/... ./pkg/ticdcapi/...
git diff --check
make unit

@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign z2665 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 37.63%. Comparing base (e5ddd8b) to head (5788e5c).

Additional details and impacted files
@@               Coverage Diff               @@
##           release-2.1    #6898      +/-   ##
===============================================
+ Coverage        37.61%   37.63%   +0.01%     
===============================================
  Files              392      392              
  Lines            22483    22485       +2     
===============================================
+ Hits              8458     8463       +5     
+ Misses           14025    14022       -3     
Flag Coverage Δ
unittest 37.63% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.


4 participants