fix: round-robin tiebreak in pickLeastBusyChannel to prevent concurrent-bind collapse #234

Open
akash329d wants to merge 1 commit into GoogleCloudPlatform:master from akash329d:fix-pick-least-busy-tiebreak

@akash329d

pickLeastBusyChannel seeds with channelRefs.get(0) and only switches on a strictly-smaller activeStreamsCount. The count is incremented later in GcpClientCall.start(), so when many new affinity keys arrive concurrently they all observe equal (zero) counts, all tie-break to channel 0, and the bindings stick — funnelling subsequent traffic onto one channel and queuing at HTTP/2 MAX_CONCURRENT_STREAMS.
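
To make the failure mode concrete, here is a minimal, self-contained sketch of the selection logic as described above. It is not the library's code: the `ChannelRef` class, the `bindBeforeAnyStart` helper, and the single-threaded loop are hypothetical simplifications. Only the names `pickLeastBusyChannel`, `channelRefs`, and `activeStreamsCount` come from the description. The loop stands in for concurrent binds by picking channels for several new keys before any count is incremented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the channel pool (not the real classes).
final class TieBreakCollapseDemo {
    static final class ChannelRef {
        final int id;
        int activeStreamsCount; // in the real code this is bumped later, when the call starts
        ChannelRef(int id) { this.id = id; }
    }

    // Seeds with the first channel and switches only on a strictly smaller count,
    // so every tie resolves to index 0.
    static ChannelRef pickLeastBusyChannel(List<ChannelRef> channelRefs) {
        ChannelRef best = channelRefs.get(0);
        for (ChannelRef ref : channelRefs) {
            if (ref.activeStreamsCount < best.activeStreamsCount) {
                best = ref;
            }
        }
        return best;
    }

    // Picks a channel for `keys` new affinity keys before any call has started,
    // so every pick observes all-zero counts. Returns bindings per channel.
    static int[] bindBeforeAnyStart(int poolSize, int keys) {
        List<ChannelRef> pool = new ArrayList<>();
        for (int i = 0; i < poolSize; i++) {
            pool.add(new ChannelRef(i));
        }
        int[] bound = new int[poolSize];
        for (int k = 0; k < keys; k++) {
            bound[pickLeastBusyChannel(pool).id]++;
        }
        return bound;
    }

    public static void main(String[] args) {
        // prints [8, 0, 0, 0]: every key is bound to channel 0
        System.out.println(Arrays.toString(bindBeforeAnyStart(4, 8)));
    }
}
```

Because affinity bindings are sticky, the skew in this model would persist after the initial burst, which matches the collapse described above.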

We hit this via java-spanner with multiplexed sessions (which sets a small bounded affinity-key set per RPC); see googleapis/google-cloud-java#12725 for the throughput numbers and repro.

This change rotates the iteration start so that ties spread round-robin across the pool while still preferring a strictly less-busy channel when one exists. Adds a regression test alongside the existing testGetChannelRefPickUpSmallest.
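
For readers skimming the thread, the idea can be sketched roughly as follows. This is an illustration of a rotating start index under assumed names, not the actual diff: `RotatingTieBreakSketch`, `nextStart`, `poolWithCounts`, and the simplified `ChannelRef` are hypothetical, and the real change may track the rotation differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a rotating tie-break (not the code in this PR).
final class RotatingTieBreakSketch {
    static final class ChannelRef {
        final int id;
        final int activeStreamsCount;
        ChannelRef(int id, int activeStreamsCount) {
            this.id = id;
            this.activeStreamsCount = activeStreamsCount;
        }
    }

    // Assumed rotation counter; advances once per pick.
    private final AtomicInteger nextStart = new AtomicInteger();

    ChannelRef pickLeastBusyChannel(List<ChannelRef> channelRefs) {
        int size = channelRefs.size();
        // floorMod keeps the index non-negative even after the counter overflows.
        int start = Math.floorMod(nextStart.getAndIncrement(), size);
        ChannelRef best = channelRefs.get(start);
        for (int i = 1; i < size; i++) {
            ChannelRef ref = channelRefs.get((start + i) % size);
            // Still switch only on a strictly smaller count; ties keep the
            // rotated seed, so equal counts spread across the pool.
            if (ref.activeStreamsCount < best.activeStreamsCount) {
                best = ref;
            }
        }
        return best;
    }

    // Test helper: builds a pool whose channel i has counts[i] active streams.
    static List<ChannelRef> poolWithCounts(int... counts) {
        List<ChannelRef> pool = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            pool.add(new ChannelRef(i, counts[i]));
        }
        return pool;
    }

    public static void main(String[] args) {
        RotatingTieBreakSketch picker = new RotatingTieBreakSketch();
        List<ChannelRef> idle = poolWithCounts(0, 0, 0, 0);
        StringBuilder ids = new StringBuilder();
        for (int k = 0; k < 8; k++) {
            ids.append(picker.pickLeastBusyChannel(idle).id);
        }
        System.out.println(ids); // prints 01230123
    }
}
```

In this sketch a strictly less-busy channel still wins regardless of where the scan starts: with counts of 3, 3, 0, 3, every pick returns channel 2.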

@google-cla

google-cla bot commented Apr 9, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.
