outlier-detection: Unconditionally eject always failing endpoints #12537
incubos wants to merge 4 commits into grpc:master
|
Is the grpc-java implementation not following gRFC A50? If it is a bug in the design, then the gRFC needs updating, as the other languages would need fixing too. |
|
Unfortunately, it looks like a bug in the design of the gRFC A50 Success Rate Algorithm. I would suggest replacing list item 3-iii:
with
Could you please tell me how a gRFC update should be initiated? |
|
@incubos, it'd be a PR to the proposal repository. Prefix the PR title with "A50 update:". @murgatroid99, I assume you'd be the one to take a look. |
|
Thanks a lot! |
|
I could not convince the maintainer of the A50 design to deviate from the Envoy implementation, which has the same issue. |
If the success rate standard deviation and/or `stdevFactor` are big enough, then `SuccessRateOutlierEjectionAlgorithm` calculates a negative `requiredSuccessRate` threshold and doesn't eject (or, even worse, eventually stops ejecting) always failing endpoints (those with a zero `successCount` metric).

We fix the issue by unconditionally ejecting endpoints with zero `successCount` (ignoring the `stdevFactor`-based threshold). A separate unit test is added.

This change of behaviour might affect production installations with a high standard deviation of success rates by ejecting completely unhealthy endpoints, but it is expected to work out for the best.
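
To make the failure mode concrete, here is a minimal standalone sketch, not the actual grpc-java code. It assumes the threshold is computed as `mean - stdev * (stdevFactor / 1000)` with a population standard deviation, and it leaves out request-volume, enforcement-percentage and max-ejection-percent checks. The class name, method names, the sample success rates and the `stdevFactor` value of 1900 are all hypothetical, chosen only for illustration.

```java
import java.util.List;

// Illustrative sketch only: names and numbers are assumptions, not grpc-java internals.
public class SuccessRateThresholdSketch {

    // Arithmetic mean of per-endpoint success rates, each in [0, 1].
    public static double mean(List<Double> rates) {
        double sum = 0.0;
        for (double r : rates) {
            sum += r;
        }
        return sum / rates.size();
    }

    // Population standard deviation (assumed here; divides by N, not N - 1).
    public static double stdev(List<Double> rates, double mean) {
        double squaredDiffSum = 0.0;
        for (double r : rates) {
            squaredDiffSum += (r - mean) * (r - mean);
        }
        return Math.sqrt(squaredDiffSum / rates.size());
    }

    // Threshold below which an endpoint is considered an outlier.
    public static double requiredSuccessRate(List<Double> rates, int stdevFactor) {
        double m = mean(rates);
        return m - stdev(rates, m) * (stdevFactor / 1000.0);
    }

    // Threshold-only rule: a success rate of 0 can never be below a negative threshold.
    public static boolean ejectByThreshold(double successRate, double threshold) {
        return successRate < threshold;
    }

    // Rule described in this PR: zero successes means eject, regardless of the threshold.
    public static boolean ejectWithZeroSuccessRule(
            long successCount, double successRate, double threshold) {
        return successCount == 0 || successRate < threshold;
    }

    public static void main(String[] args) {
        // Hypothetical fleet: two always failing endpoints and two healthy ones.
        List<Double> rates = List.of(0.0, 0.0, 1.0, 1.0);
        double threshold = requiredSuccessRate(rates, 1900);

        // mean = 0.5 and stdev = 0.5, so the threshold is about -0.45 (negative).
        System.out.println("requiredSuccessRate = " + threshold);
        System.out.println("threshold-only rule ejects failing endpoint: "
                + ejectByThreshold(0.0, threshold));
        System.out.println("zero-success rule ejects failing endpoint: "
                + ejectWithZeroSuccessRule(0, 0.0, threshold));
    }
}
```

With these sample numbers the spread between healthy and failing endpoints is what pushes the threshold below zero, so the threshold-only rule leaves the failing endpoints in rotation, while the extra `successCount == 0` check ejects them and leaves the healthy endpoints untouched.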