LeastRequestLoadBalancer does not account for pending requests #44989

@benjawo

Title: LeastRequestLoadBalancer does not account for pending requests

Description:
We run a service where a single Envoy fronts a small cluster of request-serving hosts and balances traffic across them with full-scan LeastRequest LB. In this setup we occasionally see rq_active for a single host spike to ~14-15 while p50 for the cluster stays around 1-2. Our monitoring is not very granular, but we noticed that these spikes correlate with spikes in upstream_rq_pending_active and with a step increase in upstream_cx_active equal to the pending count. Our current theory is that connection establishment to one host is occasionally slow. While those connections are pending, additional requests get assigned to the same host, because its rq_active doesn't reflect the queued work. Once the connections finish establishing, the queue drains, producing a spike in concurrent requests at that host.

I can see from the code itself that this sequence of events is possible:

  1. LeastRequestLoadBalancer selects a host by referencing rq_active [1]
  2. If the chosen host has no ready connections, the request is enqueued into pending_streams_ and upstream_rq_pending_active_ is incremented, while rq_active is not [2]
  3. rq_active is only incremented later, when a connection finishes establishing and the stream is popped off the queue [3] and attached to the client [4]
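The sequence above can be illustrated with a toy model. This is a minimal sketch, not Envoy code: `Host`, `pick_least_request`, `assign`, `drain_pending` and `simulate` are hypothetical names, requests never complete, and ties go to the first host in the list.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    rq_active: int = 0              # requests attached to a connection
    rq_pending: int = 0             # requests queued, waiting for a connection
    connecting_slowly: bool = False  # True while the host has no ready connection


def pick_least_request(hosts):
    # Full scan over all hosts; only rq_active is consulted (step 1).
    return min(hosts, key=lambda h: h.rq_active)


def assign(host):
    # Step 2: without a ready connection the request is queued and
    # rq_active stays unchanged.
    if host.connecting_slowly:
        host.rq_pending += 1
    else:
        host.rq_active += 1


def drain_pending(host):
    # Step 3: connections finish establishing and every queued request
    # becomes active at once.
    host.connecting_slowly = False
    host.rq_active += host.rq_pending
    host.rq_pending = 0


def simulate(num_requests=15):
    hosts = [Host("a", connecting_slowly=True),
             Host("b", rq_active=1),
             Host("c", rq_active=1)]
    for _ in range(num_requests):
        assign(pick_least_request(hosts))
    before = {h.name: (h.rq_active, h.rq_pending) for h in hosts}
    drain_pending(hosts[0])
    after = {h.name: h.rq_active for h in hosts}
    return before, after
```

In this model host `a` keeps winning the selection because its rq_active never moves while its connections are pending, so the whole burst queues behind it and becomes active together once the queue drains.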

With bursty workloads or slow connection establishment, it seems this behavior can compromise the performance of LeastRequestLoadBalancer: load is assigned unevenly, and more load can be driven to a host that is already struggling, as indicated by its slow connection establishment.

Is this an intentional design decision? If so, we'd be interested in the reasoning. If not, would you be open to a fix? A fix doesn't look trivial given the current code structure: simply incrementing rq_active earlier would make that metric incorrect, and including pending requests would require iterating across connection pools. Perhaps a new per-host counter for pending requests?
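To make the last idea concrete, here is a rough sketch of selecting on active plus pending with a separate per-host counter, so that rq_active itself keeps its current meaning. It is a toy model only: `Host`, `load`, `pick_least_loaded` and `simulate` are hypothetical names and do not correspond to anything in the Envoy codebase.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    rq_active: int = 0              # unchanged meaning: attached to a connection
    rq_pending: int = 0             # hypothetical new per-host pending counter
    connecting_slowly: bool = False


def load(host):
    # Selection key: work already running plus work queued for this host.
    return host.rq_active + host.rq_pending


def pick_least_loaded(hosts):
    # Full scan; ties go to the first host in the list.
    return min(hosts, key=load)


def assign(host):
    if host.connecting_slowly:
        host.rq_pending += 1
    else:
        host.rq_active += 1


def simulate(num_requests=15):
    hosts = [Host("a", connecting_slowly=True),
             Host("b", rq_active=1),
             Host("c", rq_active=1)]
    for _ in range(num_requests):
        assign(pick_least_loaded(hosts))
    return {h.name: (h.rq_active, h.rq_pending) for h in hosts}
```

With the same burst of 15 requests, the slow host in this model ends up with 6 queued requests instead of 15, and the rest are spread across the other two hosts.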

Observed Behavior:
Here is an example of the observed behavior in our service. Within a single 1-minute data point, upstream_rq_pending_total spikes to 17 (we emit deltas of the total counter to catch spikes that our sampling rate misses in the active gauge), requests in flight (emitted from our application) spike to 16, and upstream_cx_active steps from 81 to 98.

[Image: graph of the metrics described above]
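For anyone reading the numbers, this is roughly how the delta-of-total approach surfaces a spike that the gauge misses. The sample series are made up for illustration; only the values 17, 81 and 98 come from the data point above, and `deltas` is a hypothetical helper.

```python
def deltas(samples):
    """Difference between each pair of consecutive samples."""
    return [curr - prev for prev, curr in zip(samples, samples[1:])]


# Hypothetical once-per-minute samples.
pending_total = [100, 100, 117, 117]  # counter: keeps the spike after it is over
pending_active = [0, 0, 0, 0]         # gauge: already back to 0 at each sample time
cx_active = [81, 81, 98, 98]          # step matches the number of pending requests

pending_deltas = deltas(pending_total)
cx_deltas = deltas(cx_active)
```

The gauge never shows the queue, while the counter delta and the connection-count step both show the same 17.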

I haven't created a repro of this issue because I felt the code links and description of the chain of events were sufficient, but let me know if any more information is needed.

Labels: area/load balancing, question