Skip to content

Improve host_source locking and ring refresh concurrency#1944

Open
nanassito wants to merge 2 commits intoapache:trunkfrom
nanassito:feature/host-source-lock-perf
Open

Improve host_source locking and ring refresh concurrency#1944
nanassito wants to merge 2 commits intoapache:trunkfrom
nanassito:feature/host-source-lock-perf

Conversation

@nanassito
Copy link
Copy Markdown

We discovered during a load test of one of our service that there is a lot of locking going on in host_source.go. It turns out that a few can be optimized.

Here are the set of changes in this PR:

1. ringDescriber / GetHosts

The mutex surrounded all of getLocalHostInfo and getClusterPeerInfo (control-connection I/O). prevHosts / prevPartitioner were never written anywhere, so error returns were always nil / "". The mutex only serialized unrelated ring refreshes for no benefit. Removed mu, prevHosts, and prevPartitioner, and return nil, "", err on failure so concurrent GetHosts no longer block each other on network work.

2. HostInfo.ConnectAddressAndPort

It only reads fields; it used Lock and blocked writers. Switched to RLock so it matches other read-only accessors.

This is the most costly one exposed by our load test.

3. HostInfo.HostnameAndPort

Uses a read-optimized path: RLock when hostname is already set (common case); Lock only when lazily filling hostname.

4. isValidPeer

Previously mixed RPCAddress() (under lock) with direct reads of hostId, dataCenter, etc. (racy). One RLock covers all fields and avoids extra lock cycles from a separate RPCAddress call.

5. errorBroadcaster.broadcast

Stopped holding the mutex while sending on listener channels (could stall other newListener / broadcast callers). Snapshot and clear listeners under the lock, then unlock and notify.

Remove ringDescriber mutex around GetHosts I/O; prevHosts/prevPartitioner
were never written. Use read locks for ConnectAddressAndPort and a
read-fast path for HostnameAndPort. Read peer validity under a single
RLock in isValidPeer. Release errorBroadcaster mutex before sending to
listeners.

Made-with: Cursor
@jon-whit
Copy link
Copy Markdown

jon-whit commented Apr 16, 2026

We also have this same issue in our organization. Our application is being quite limited in its concurrency burstability because of the lock contention. Here's a sample mutex blocking profile taken from a real production runtime under high load.

Screenshot 2026-04-16 at 3 06 16 PM

Copy link
Copy Markdown
Contributor

@worryg0d worryg0d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I reviewed this patch - jfyi, I'm not an apache committer, so I'm just writing comments.

HostInfo methods might be on a hot path, so it is fine to narrow write-lock to read-lock where it is possible.

Having a benchmark would be useful to see the performance impact.

Comment thread host_source.go Outdated
Comment thread host_source.go
Comment on lines -488 to -490
mu sync.Mutex
prevHosts []*HostInfo
prevPartitioner string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized they are never used

@nanassito nanassito requested a review from worryg0d April 27, 2026 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants