Skip to content

perf(loadbalance): avoid redundant Store in consistBalancer.GetPicker#1948

Open
Wizard-Guido wants to merge 1 commit intocloudwego:mainfrom
Wizard-Guido:perf/consist-balancer-dedupe-store
Open

perf(loadbalance): avoid redundant Store in consistBalancer.GetPicker#1948
Wizard-Guido wants to merge 1 commit intocloudwego:mainfrom
Wizard-Guido:perf/consist-balancer-dedupe-store

Conversation

@Wizard-Guido
Copy link
Copy Markdown

What type of PR is this?

  • perf

Check the PR title.

  • This PR title match the format: <type>(optional scope): <description>
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

(Optional) More detailed description for this PR(en: English/zh: Chinese).

en:
When multiple goroutines concurrently miss the cache for the same CacheKey, singleflight correctly ensures newConsistInfo runs only once — but every waiting goroutine then independently calls cachedConsistInfo.Store(key, cii) with the same value. For N waiters this produces N-1 redundant sync.Map writes.

Moving the Store inside the singleflight.Do callback guarantees it runs exactly once, regardless of how many goroutines were waiting. This also aligns consistBalancer.GetPicker with the pre-existing pattern in weightedBalancer.GetPicker (same package), where the cache write already lives inside sfg.Do.

As a side effect, the race window where a stale singleflight result can overwrite a fresher entry written by a concurrent Rebalance is narrowed (the store now completes before any waiter observes the result, instead of N-1 stores racing afterward). The broader Delete / Rebalance race noted in the existing FIXME on line 369 is not addressed by this PR.

// Before — Store called by every waiter
cii, _, _ = cb.sfg.Do(e.CacheKey, func() (interface{}, error) {
    return cb.newConsistInfo(e), nil
})
cb.cachedConsistInfo.Store(e.CacheKey, cii)

// After — Store called exactly once, matches weightedBalancer.GetPicker
cii, _, _ = cb.sfg.Do(e.CacheKey, func() (interface{}, error) {
    ci := cb.newConsistInfo(e)
    cb.cachedConsistInfo.Store(e.CacheKey, ci)
    return ci, nil
})

This fix aligns with the existing pattern in the same package:
weightedBalancer.GetPicker in this same package already places the cache write inside the singleflight.Do callback — this is the established best practice within pkg/loadbalance:

// pkg/loadbalance/weighted_balancer.go — existing correct pattern
picker, _, _ = wb.sfg.Do(e.CacheKey, func() (interface{}, error) {
    p := wb.createPicker(e)
    wb.pickerCache.Store(e.CacheKey, p)   // Store inside sfg.Do
    return p, nil
})

Also fixes a typo in a test function name: TestConsistPicker_ReblanceTestConsistPicker_Rebalance.

zh(optional):

consistBalancer.GetPicker 在缓存未命中时使用 singleflight 构建一致性哈希环。当多个 goroutine 并发请求同一个 CacheKey 时,singleflight 正确保证了 newConsistInfo 只执行一次 —— 但所有等待的 goroutine 拿到结果后各自独立调用了一次 cachedConsistInfo.Store(key, cii)。N 个等待者就会产生 N-1 次完全冗余的 sync.Map 写入,每次写入内部都会走锁竞争、原子操作,甚至可能触发 dirty map 晋升。

此外还存在一个小的竞态窗口:因为 Store 发生在 sfg.Do 返回之后,如果这期间有并发的 Rebalance 写入了更新的数据,这些迟到的 Store 会把新数据覆盖回旧的

修复:对齐本包内已有的最佳实践
同包中的 weightedBalancer.GetPicker 早已把缓存写入放在 singleflight.Do 的回调内部,这是 pkg/loadbalance 里已经建立的惯例:

// pkg/loadbalance/weighted_balancer.go — 已有的正确写法
picker, _, _ = wb.sfg.Do(e.CacheKey, func() (interface{}, error) {
    p := wb.createPicker(e)
    wb.pickerCache.Store(e.CacheKey, p)   // Store 在 sfg.Do 里面
    return p, nil
})

本 PR 让 consistBalancer.GetPicker 遵循同样的模式:

// 修改前 —— 每个等待者都会调用 Store(N 次)
cii, _, _ = cb.sfg.Do(e.CacheKey, func() (interface{}, error) {
    return cb.newConsistInfo(e), nil
})
cb.cachedConsistInfo.Store(e.CacheKey, cii)

// 修改后 —— Store 只执行一次,与 weightedBalancer.GetPicker 保持一致
cii, _, _ = cb.sfg.Do(e.CacheKey, func() (interface{}, error) {
    ci := cb.newConsistInfo(e)
    cb.cachedConsistInfo.Store(e.CacheKey, ci)
    return ci, nil
})

顺便修复了一个测试函数名拼写错误:TestConsistPicker_ReblanceTestConsistPicker_Rebalance
不在本 PR 范围内:consist.go:369 处 FIXME 标注的 Delete / Rebalance 竞态问题本 PR 不处理。

(Optional) Which issue(s) this PR fixes:

(optional) The PR that updates user documentation:

@Wizard-Guido Wizard-Guido requested review from a team as code owners April 14, 2026 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant