LeastRequestLoadBalancer does not account for pending requests

*Title*: *LeastRequestLoadBalancer does not account for pending requests*

*Description*:
We run a service in which a single Envoy fronts a small cluster of request-serving hosts and balances traffic across them using the LeastRequest LB in full-scan mode. In this setup we occasionally see `rq_active` for a single host spike to ~14-15 while the p50 across the cluster stays around 1-2. Our monitoring is not very granular, but these spikes correlate with spikes in `upstream_rq_pending_active` and with a step increase in `upstream_cx_active` equal to the pending count. Our current theory is that connection establishment to one host is occasionally slow. While those connections are pending, additional requests are assigned to the same host, because its `rq_active` does not reflect the queued work. Once the connections are established, the queue drains, producing a spike in concurrent requests at that host.

I can see from the code itself that this sequence of events is possible:
1. LeastRequestLoadBalancer chooses a host by reading `rq_active` [1](https://github.com/envoyproxy/envoy/blob/1e9d667f49f8b4593de89a9d0b878e18fd00314c/source/extensions/load_balancing_policies/least_request/least_request_lb.cc#L132)
2. If the chosen host has no ready connections, the request is enqueued into `pending_streams_` and `upstream_rq_pending_active_` is incremented, while `rq_active` is not [2](https://github.com/envoyproxy/envoy/blob/1e9d667f49f8b4593de89a9d0b878e18fd00314c/source/common/conn_pool/conn_pool_base.cc#L699)
3. `rq_active` is only incremented later, when a connection finishes establishing and the stream is popped off the queue [3](https://github.com/envoyproxy/envoy/blob/1e9d667f49f8b4593de89a9d0b878e18fd00314c/source/common/conn_pool/conn_pool_base.cc#L407) and attached to the client [4](https://github.com/envoyproxy/envoy/blob/1e9d667f49f8b4593de89a9d0b878e18fd00314c/source/common/conn_pool/conn_pool_base.cc#L275)
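
To make the sequence above concrete, here is a toy model in Python. It is not Envoy code: the `Host` class, the `connecting` flag, and the helper functions are all hypothetical, no request completes during the burst, and ties are broken by list order rather than by whatever Envoy actually does. It only illustrates how selecting on `rq_active` alone can pile requests onto a host whose connections are still being established.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    connecting: bool = False  # True while connections to this host are still being set up
    rq_active: int = 0        # requests attached to a ready connection
    rq_pending: int = 0       # requests queued while waiting for a connection


def pick_least_request(hosts):
    # Full scan: choose the host with the fewest active requests.
    # Pending requests are invisible to this comparison.
    return min(hosts, key=lambda h: h.rq_active)


def dispatch(host):
    # Without a ready connection the request is queued, and rq_active is unchanged.
    if host.connecting:
        host.rq_pending += 1
    else:
        host.rq_active += 1


def finish_connecting(host):
    # Once connections are ready, every queued request becomes active at once.
    host.connecting = False
    host.rq_active += host.rq_pending
    host.rq_pending = 0


def simulate(burst):
    hosts = [Host("b"), Host("c"), Host("a", connecting=True)]
    for _ in range(burst):
        dispatch(pick_least_request(hosts))
    before = {h.name: (h.rq_active, h.rq_pending) for h in hosts}
    finish_connecting(hosts[2])
    after = {h.name: h.rq_active for h in hosts}
    return before, after


if __name__ == "__main__":
    before, after = simulate(10)
    print("during slow connect (active, pending):", before)
    print("after connections are ready (active):", after)
```

In this model, once the two healthy hosts each hold one request, the slow host keeps winning the comparison because its `rq_active` stays at 0, so the rest of the burst queues behind it and then lands all at once.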

With bursty workloads or slow connection establishment, this behavior can undermine LeastRequestLoadBalancer: it can assign load unevenly and may drive additional load to a host that is already struggling, as indicated by its slow connection establishment.

Is this an intentional design decision? If so, we'd be interested in the reasoning. If not, would you be open to a fix? A fix doesn't look simple given the current code structure. Simply incrementing `rq_active` earlier would make that metric incorrect, and including pending requests would require iterating across connection pools. Perhaps a new per-host counter for pending requests?
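
As a rough illustration of the per-host counter idea, the Python sketch below compares selection on `rq_active` alone with selection on `rq_active` plus a pending count. The `HostLoad` class, its `rq_pending` field, and both picker functions are hypothetical names used only for this example; nothing here reflects existing Envoy APIs or a worked-out design.

```python
from dataclasses import dataclass


@dataclass
class HostLoad:
    name: str
    rq_active: int = 0
    rq_pending: int = 0  # hypothetical per-host counter of queued requests


def pick_by_active(hosts):
    # Current behavior as described in this issue: only active requests count.
    return min(hosts, key=lambda h: h.rq_active)


def pick_by_active_plus_pending(hosts):
    # Possible alternative: treat queued requests as load on the host too.
    return min(hosts, key=lambda h: h.rq_active + h.rq_pending)


if __name__ == "__main__":
    hosts = [
        HostLoad("a", rq_active=0, rq_pending=8),  # slow to connect, long queue
        HostLoad("b", rq_active=1),
        HostLoad("c", rq_active=2),
    ]
    print(pick_by_active(hosts).name)
    print(pick_by_active_plus_pending(hosts).name)
```

With these example numbers, the first picker still chooses host `a` despite its queue, while the second chooses host `b`. This leaves `rq_active` itself untouched, so the existing metric keeps its meaning.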


*Observed Behavior*:
Here is an example of the observed behavior in our service. Within a single 1-minute data point, `upstream_rq_pending_total` spikes to 17 (we emit deltas of `total` to catch spikes that our sampling rate misses in `active`), requests in flight (emitted from our application) spike to 16, and `upstream_cx_active` steps from 81 to 98, an increase of 17 that matches the pending count.

<img width="798" height="393" alt="Image" src="https://github.com/user-attachments/assets/f5746bb8-575b-4612-ae72-a1bf8d54711d" />


I haven't created a repro of this issue because I felt the code links and description of the chain of events were sufficient, but let me know if any more information is needed. 

