Skip to content

rls: only reset backoff on recovery from TRANSIENT_FAILURE#9137

Merged
easwars merged 21 commits into
grpc:masterfrom
mswierq:fix/8693-rls-control-channel-state-monitoring
May 27, 2026
Merged

rls: only reset backoff on recovery from TRANSIENT_FAILURE#9137
easwars merged 21 commits into
grpc:masterfrom
mswierq:fix/8693-rls-control-channel-state-monitoring

Conversation

@mswierq
Copy link
Copy Markdown
Contributor

@mswierq mswierq commented May 20, 2026

Fixes #8693

This PR is a continuation and finalization of the stale/closed PR #8720.

It addresses the remaining feedback from the maintainers and the code review tools to resolve control channel state monitoring issues in the RLS balancer.

RELEASE NOTES: N/A

ulascansenturk and others added 18 commits November 20, 2025 23:39
Fix control channel connectivity monitoring to track TRANSIENT_FAILURE
state explicitly. Only reset backoff timers when transitioning from
TRANSIENT_FAILURE to READY, not for benign state changes like
READY → IDLE → READY.

Fixes grpc#8693
- Add testOnlyInitialReadyDone channel for proper test synchronization
- Signal when monitoring goroutine processes initial READY state
- Tests wait for this signal instead of using time.Sleep
- All synchronization now uses channels/callbacks - no arbitrary sleeps
- Tests pass consistently with race detector

Addresses review feedback about removing time.Sleep for state transitions.
…rt timeouts

- Replace goto/label patterns with labeled for loops and break
- Replace default cases with time.After(10ms) for proper timing
- Remove impossible TRANSIENT_FAILURE handling in IDLE test
Replace 5 identical labeled-for-select loops with a shared helper
function that waits for a specific connectivity state on the channel.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.09%. Comparing base (bb023f8) to head (a712535).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
balancer/rls/control_channel.go 73.91% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9137      +/-   ##
==========================================
- Coverage   83.21%   83.09%   -0.12%     
==========================================
  Files         414      417       +3     
  Lines       33489    33643     +154     
==========================================
+ Hits        27868    27956      +88     
- Misses       4207     4261      +54     
- Partials     1414     1426      +12     
Files with missing lines Coverage Δ
balancer/rls/control_channel.go 83.13% <73.91%> (-3.83%) ⬇️

... and 36 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mswierq mswierq marked this pull request as ready for review May 20, 2026 17:56
@easwars easwars added Area: Testing Includes tests and testing utilities that we have for unit and e2e tests within our repo. Type: Internal Cleanup Refactors, etc Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. and removed Area: Testing Includes tests and testing utilities that we have for unit and e2e tests within our repo. labels May 20, 2026
@easwars easwars added this to the 1.82 Release milestone May 20, 2026
@easwars easwars requested a review from eshitachandwani May 20, 2026 18:40
@easwars
Copy link
Copy Markdown
Contributor

easwars commented May 20, 2026

@eshitachandwani : Since you were the first reviewer on the original PR, I'm assigning this to you again for review. Please do take a look my open comments from the original PR and ensure that they are handled here. Thanks.

Comment thread balancer/rls/control_channel.go Outdated
Comment thread balancer/rls/balancer_test.go Outdated
Comment thread balancer/rls/balancer_test.go Outdated
Comment thread balancer/rls/balancer_test.go Outdated
Copy link
Copy Markdown
Member

@eshitachandwani eshitachandwani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo some minor nits. Adding @easwars as a second reviewer.

Comment thread balancer/rls/control_channel.go Outdated
Comment thread balancer/rls/balancer_test.go Outdated
Comment thread balancer/rls/balancer_test.go Outdated
@easwars
Copy link
Copy Markdown
Contributor

easwars commented May 27, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the RLS control channel connectivity state monitoring by removing the background monitoring goroutine and handling state transitions directly within the OnMessage callback. This change ensures that backoff resets are only triggered when recovering from a TRANSIENT_FAILURE state, avoiding unnecessary resets during benign transitions like READY -> IDLE -> READY or the initial connection. The test suite has been updated and expanded to verify these scenarios. The review comments identify a potential panic in a test helper when a channel is closed, and potential race conditions in the test subscriber setup where a shared instance's delegate is overwritten.

Comment thread balancer/rls/balancer_test.go
Comment thread balancer/rls/balancer_test.go
Comment thread balancer/rls/balancer_test.go
@easwars easwars changed the title Fix/8693 rls control channel state monitoring rls: only reset backoff on recovery from TRANSIENT_FAILURE May 27, 2026
Copy link
Copy Markdown
Contributor

@easwars easwars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo minor nits

Comment thread balancer/rls/control_channel.go
Comment thread balancer/rls/control_channel.go
Comment thread balancer/rls/balancer_test.go
@easwars easwars removed their assignment May 27, 2026
@easwars easwars merged commit bcaa6f4 into grpc:master May 27, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. Type: Internal Cleanup Refactors, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rls: Update logic in the control channel connectivity state monitoring goroutine

4 participants