---
title: "Backend Utilization Load Balancing"
---

BackendUtilization load balancing uses [Open Request Cost Aggregation (ORCA)][ORCA] load metrics reported by the backend to dynamically weight endpoints. Under the hood it is implemented as [Envoy's client-side weighted round-robin][client-side-wrr] policy: each endpoint's weight is derived from the utilization metrics it emits, so instances running hot receive proportionally less traffic than those with headroom.

If no ORCA metrics are received from an endpoint, that endpoint is treated as evenly weighted.

See the [Load Balancing concepts page][concepts-lb] for a deeper explanation of ORCA metric formats.

## Prerequisites

* Your backend (or a sidecar in front of it) must emit ORCA load metrics as response headers or trailers. See [Backend instrumentation](#backend-instrumentation) below.
* {{< boilerplate prerequisites >}}

## Build and Deploy the Example Backend

The Envoy Gateway repository includes a small HTTP server under `examples/backend-utilization/` that emits a fixed ORCA `cpu_utilization` value (set via the `ORCA_CPU_UTILIZATION` environment variable) on every response. The example manifest deploys two sets of pods — one reporting `0.1` (idle) and one reporting `0.9` (hot) — behind a single Service. This lets you observe the weighting effect without wiring real load into a backend.
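
The server's behavior can be sketched in a few lines of Python. This is a hypothetical stand-in, not the actual code under `examples/backend-utilization/`; the TEXT reporting format and the JSON response body are assumptions for readability:

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def orca_header(cpu: float) -> str:
    """Build the ORCA TEXT payload for the endpoint-load-metrics header."""
    return f"TEXT cpu_utilization={cpu}"


class Handler(BaseHTTPRequestHandler):
    # Fixed utilization, read once at startup, as in the example backend.
    cpu = float(os.environ.get("ORCA_CPU_UTILIZATION", "0.0"))

    def do_GET(self):
        body = json.dumps({"pod": os.environ.get("HOSTNAME", "unknown")}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Report the same fixed utilization on every response.
        self.send_header("endpoint-load-metrics", orca_header(self.cpu))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve: HTTPServer(("", 3000), Handler).serve_forever()
```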

**Note:** The `envoyproxy/gateway-backend-utilization` image is not published to a public registry — you need to build it locally from a checkout of the Envoy Gateway repository.

* Build the example backend image

  ```shell
  make -C examples/backend-utilization docker-buildx
  ```

* Make the image available to your cluster

  {{< tabpane text=true >}}
  {{% tab header="local kind cluster" %}}

  ```shell
  kind load docker-image --name envoy-gateway envoyproxy/gateway-backend-utilization:latest
  ```

  {{% /tab %}}
  {{% tab header="other Kubernetes cluster" %}}

  ```shell
  docker tag envoyproxy/gateway-backend-utilization:latest $YOUR_DOCKER_REPO/gateway-backend-utilization:latest
  docker push $YOUR_DOCKER_REPO/gateway-backend-utilization:latest
  ```

  If you push to your own registry, update the `image:` field in `examples/kubernetes/backend-utilization.yaml` to match before applying.

  {{% /tab %}}
  {{< /tabpane >}}

* Apply the example manifest (Service, two Deployments, HTTPRoute)

  ```shell
  kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/backend-utilization.yaml -n default
  ```

Verify the two Deployments are ready:

```shell
kubectl get deployment/backend-utilization-low deployment/backend-utilization-high -n default
```

## Configure BackendUtilization

Apply a [BackendTrafficPolicy][BackendTrafficPolicy] with `loadBalancer.type: BackendUtilization`:

{{< tabpane text=true >}}
{{% tab header="Apply from stdin" %}}
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: backend-utilization
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: backend-utilization
  loadBalancer:
    type: BackendUtilization
    backendUtilization:
      blackoutPeriod: 1s # shorten so the demo shifts traffic quickly
      weightUpdatePeriod: 500ms
EOF
```
{{% /tab %}}
{{% tab header="Apply from file" %}}
Save the following resource to a file and apply it with `kubectl apply -f`:

```yaml
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: backend-utilization
  namespace: default
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: backend-utilization
  loadBalancer:
    type: BackendUtilization
    backendUtilization:
      blackoutPeriod: 1s # shorten so the demo shifts traffic quickly
      weightUpdatePeriod: 500ms
```
{{% /tab %}}
{{< /tabpane >}}

Leaving `backendUtilization: {}` empty accepts the defaults, but the default `blackoutPeriod` of `10s` means traffic will appear evenly split for the first 10 seconds of the test. The shorter values above make the weighting visible immediately. Note that the `backendUtilization` field itself is required when `type` is `BackendUtilization`; omitting the field entirely fails CEL validation.

## Configuration Fields

All fields on `backendUtilization` are optional.

| Field | Default | Purpose |
|---|---|---|
| `blackoutPeriod` | `10s` | How long an endpoint must report metrics before its reported weight is trusted. Prevents traffic from shifting based on a single noisy sample. |
| `weightExpirationPeriod` | `3m` | If an endpoint stops reporting for this long, its reported weight is discarded and it reverts to the default weight. |
| `weightUpdatePeriod` | `1s` | How often Envoy recomputes the weight table. Values below `100ms` are clamped up to `100ms`. |
| `errorUtilizationPenaltyPercent` | `0` | Error-rate penalty expressed as a multiplier × 100: the endpoint's error rate (errors per second ÷ queries per second) is multiplied by this value ÷ 100 and added to its utilization. `100` = 1.0×, `150` = 1.5×, `200` = 2.0×. Higher values push errant endpoints out of rotation faster. |
| `metricNamesForComputingUtilization` | _unset_ | Custom ORCA metric keys to feed into the weight formula when `application_utilization` isn't reported. Use `named_metrics.<key>` for keys inside the ORCA proto's `named_metrics` map. |
| `keepResponseHeaders` | `false` | By default Envoy strips the ORCA headers/trailers before forwarding the response. Set to `true` to let downstream clients see them (useful for chained load balancers or debugging). |
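
These knobs feed Envoy's client-side WRR weight formula, which per the Envoy documentation is approximately `qps / (utilization + (eps / qps) × penalty)`, where `penalty` is `errorUtilizationPenaltyPercent ÷ 100`. A sketch of the arithmetic:

```python
def endpoint_weight(qps: float, utilization: float,
                    eps: float = 0.0, penalty: float = 0.0) -> float:
    """Approximation of Envoy's client-side WRR weight:
    weight = qps / (utilization + (eps / qps) * penalty),
    where penalty is errorUtilizationPenaltyPercent / 100."""
    return qps / (utilization + (eps / qps) * penalty)


# With no error penalty, an idle endpoint (0.1) gets ~9x the
# weight of a hot one (0.9) at equal request rates:
low = endpoint_weight(qps=100, utilization=0.1)
high = endpoint_weight(qps=100, utilization=0.9)
print(round(low / high, 1))  # 9.0
```

Note how `errorUtilizationPenaltyPercent` only changes the weight of endpoints that actually return errors (`eps > 0`); error-free endpoints are unaffected regardless of the penalty value.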

### Example: Tuned for a Bursty Backend

```yaml
loadBalancer:
  type: BackendUtilization
  backendUtilization:
    blackoutPeriod: 30s # ignore reports during slow-start
    weightExpirationPeriod: 1m # shorter memory — react faster to silent endpoints
    weightUpdatePeriod: 500ms # faster reweighting
    errorUtilizationPenaltyPercent: 150 # 1.5× penalty for errant endpoints
```

### Example: Application-Defined Utilization

If your backend reports a custom metric (for example, queue depth) instead of CPU utilization, wire it in through `metricNamesForComputingUtilization`:

```yaml
loadBalancer:
  type: BackendUtilization
  backendUtilization:
    metricNamesForComputingUtilization:
      - named_metrics.queue_depth
```

The backend would then emit:

```http
endpoint-load-metrics: TEXT named_metrics.queue_depth=0.42
```

## Backend Instrumentation

Your backend must emit ORCA load metrics. Envoy accepts metrics in three formats on response **headers or trailers**:

| Format | Header | Payload |
|---|---|---|
| Binary | `endpoint-load-metrics-bin` | Base64-encoded serialized [`OrcaLoadReport`][orca-proto] proto |
| JSON | `endpoint-load-metrics` | `JSON {"cpu_utilization": 0.3, "mem_utilization": 0.8}` |
| TEXT | `endpoint-load-metrics` | `TEXT cpu_utilization=0.3,mem_utilization=0.8,named_metrics.queue_depth=0.42` |

For gRPC backends, libraries implementing the [custom backend metrics][grpc-orca] proposal can attach an `OrcaLoadReport` to each response automatically. For HTTP backends, add a response middleware that measures and serializes your CPU/memory/custom metrics on each response.
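
As an illustration of such a middleware, here is a minimal WSGI wrapper (hypothetical, not from the Envoy Gateway repository) that appends the TEXT-format header from the table above to every response; `get_utilization` is a caller-supplied sampling function:

```python
import json


def orca_middleware(app, get_utilization):
    """WSGI middleware that appends an ORCA TEXT load report
    (endpoint-load-metrics header) to every response.
    get_utilization() returns the current utilization in [0, 1]."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            headers = list(headers) + [
                ("endpoint-load-metrics",
                 f"TEXT cpu_utilization={get_utilization()}")
            ]
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped


# Exercise it without a server: a toy app plus a fixed utilization source.
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps({"ok": True}).encode()]


wrapped = orca_middleware(app, lambda: 0.42)

captured = {}
def fake_start(status, headers, exc_info=None):
    captured["headers"] = headers

body = wrapped({}, fake_start)
print(dict(captured["headers"])["endpoint-load-metrics"])  # TEXT cpu_utilization=0.42
```

In production the sampling function would read real signals (e.g. CPU time deltas or queue depth) rather than a constant, and a gauge emitted in `named_metrics.<key>` form would pair with `metricNamesForComputingUtilization` as shown earlier.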

## Combining With Zone-Aware Routing

`BackendUtilization` composes with `weightedZones` to produce locality-aware weighted round-robin (Envoy's `wrr_locality` policy). See the [WeightedZones example][zone-aware-weighted] on the zone-aware routing page.

`preferLocal` is **not** supported with `BackendUtilization`.

## Testing

Ensure the `GATEWAY_HOST` environment variable from the [Quickstart](../../quickstart) is set. If not, follow the Quickstart instructions to set the variable.

Give Envoy a few seconds after applying the policy to collect ORCA samples and compute endpoint weights — until then, traffic will appear roughly even. Then send 200 requests and tally which deployment handled each. Because `backend-utilization-low` reports `cpu_utilization=0.1` and `backend-utilization-high` reports `0.9`, Envoy should weight the `low` pods roughly 9× more heavily.

```shell
for i in $(seq 1 200); do
  curl -s -H "Host: www.example.com" "http://${GATEWAY_HOST}/backend-utilization" | jq -r '.pod'
done | sort | uniq -c
```

Expected output (exact counts will vary, but `low` should dominate roughly 9:1):

```console
  90 backend-utilization-low-6b9cf46b59-l7df7
  87 backend-utilization-low-6b9cf46b59-xxrw2
  12 backend-utilization-high-5fdb65cb87-mctlp
  11 backend-utilization-high-5fdb65cb87-rrdvq
```

If you instead see a roughly even split, the weights may not have stabilized yet — wait a few seconds and retry. You can verify the per-endpoint weights directly through the Envoy admin interface:

```shell
ENVOY_POD=$(kubectl get pods -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gateway-name=eg -o jsonpath='{.items[0].metadata.name}')
kubectl -n envoy-gateway-system port-forward pod/${ENVOY_POD} 19000:19000 &
curl -s localhost:19000/clusters | grep "backend-utilization" | grep weight
```

You should see weights of roughly `10000` for the `low` pods and `1111` for the `high` pods, proportional to the inverse of the reported utilization.
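
Those admin-interface numbers follow from simple inverse-utilization arithmetic; a sketch, with the scaling chosen so the `low` pods land at `10000` (the exact scale Envoy applies internally is not specified here):

```python
# Reported ORCA cpu_utilization per deployment.
utilization = {"low": 0.1, "high": 0.9}

# Weight proportional to the inverse of utilization,
# scaled so that the "low" endpoints get weight 10000.
scale = 10000 * utilization["low"]  # = 1000
weights = {name: round(scale / u) for name, u in utilization.items()}
print(weights)  # {'low': 10000, 'high': 1111}
```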

## Clean-Up

```shell
kubectl delete backendtrafficpolicy/backend-utilization -n default
kubectl delete -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/backend-utilization.yaml -n default
```

[ORCA]: https://docs.google.com/document/d/1NSnK3346BkBo1JUU3I9I5NYYnaJZQPt8_Z_XCBCI3uA
[orca-proto]: https://www.envoyproxy.io/docs/envoy/latest/xds/data/orca/v3/orca_load_report.proto
[client-side-wrr]: https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/load_balancing_policies/client_side_weighted_round_robin/v3/client_side_weighted_round_robin.proto
[grpc-orca]: https://github.com/grpc/proposal/blob/master/A51-custom-backend-metrics.md
[concepts-lb]: ../../../concepts/load-balancing#backend-utilization-orca
[zone-aware-weighted]: ../zone-aware-routing#weightedzones
[BackendTrafficPolicy]: ../../../api/extension_types#backendtrafficpolicy