---
title: "Kubernetes Resource Limits: The Right Way to Keep Workloads Stable"
slug: "kubernetes-resource-limits"
date: "2024-12-03"
excerpt: "Missing or misconfigured resource requests and limits are the leading cause of noisy-neighbour problems and OOMKilled containers. Here's how to set them correctly."
tags:
  - Kubernetes
  - DevOps
  - Platform Engineering
  - Reliability
readTime: 8
metaDescription: "A hands-on guide to Kubernetes resource requests, limits, QoS classes, VPA, and LimitRange policies — with real-world recommendations for production clusters."
---

## The Hidden Cost of Not Setting Limits

Every Kubernetes cluster without explicit resource requests and limits eventually ends up with the same symptoms: intermittent `OOMKilled` pods, nodes that mysteriously max out CPU at 3 AM, and an HPA that refuses to scale because `metrics-server` can't get meaningful numbers.

Resource configuration isn't optional for production clusters. This guide walks through the concepts and practical patterns I use across client engagements.

---

## Requests vs Limits: The Mental Model

Two separate levers control how Kubernetes allocates resources:

| Field | What it does | When it matters |
|---|---|---|
| `requests` | Minimum guaranteed — scheduling decision | At pod scheduling time |
| `limits` | Hard ceiling — enforced by cgroups | At runtime |

The scheduler places a pod on a node that has **at least** the sum of all containers' `requests` available. The `limits` then cap what each container can actually consume at any point in time.
| 33 | + |
| 34 | +```yaml |
| 35 | +resources: |
| 36 | + requests: |
| 37 | + cpu: "250m" # 0.25 vCPU guaranteed |
| 38 | + memory: "256Mi" # 256 MiB guaranteed |
| 39 | + limits: |
| 40 | + cpu: "1000m" # Max 1 vCPU |
| 41 | + memory: "512Mi" # Max 512 MiB — exceeding this → OOMKilled |
| 42 | +``` |
| 43 | +
|
| 44 | +--- |
| 45 | +
|
## QoS Classes and Why They Matter for Eviction

Kubernetes assigns each pod a **Quality of Service (QoS) class** based on its resource configuration. This class determines eviction priority when a node is under memory pressure:

| QoS Class | Condition | Eviction Priority |
|---|---|---|
| `Guaranteed` | requests == limits for all containers | Evicted last |
| `Burstable` | At least one container has requests < limits | Evicted in the middle |
| `BestEffort` | No requests or limits set | Evicted first |

**Recommendation:** Production workloads should set requests == limits for memory and leave CPU room to burst. Memory is incompressible — when a container exceeds its limit, it's killed. Going over the CPU limit just means throttling, which is recoverable. (Strictly, a pod configured this way is classed `Burstable`, but its memory behaves as if `Guaranteed`.)

```yaml
# Guaranteed-style memory, burstable CPU — common production pattern
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "2000m"      # Allow CPU burst
    memory: "512Mi"   # Lock memory — no surprises
```

---

## Choosing the Right Values

Guessing at limits is dangerous. Set them too low and you get OOMKilled pods or CPU-throttled processes; too high and you waste capacity. The right approach:

### Step 1: Run without limits under load

Deploy to staging without limits and observe actual consumption:

```bash
kubectl top pods -n my-app --containers
```

### Step 2: Review historical metrics

In Prometheus + Grafana:

```promql
# 95th-percentile CPU usage over 7 days
# (rate() first — the raw metric is a monotonically increasing counter)
quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{namespace="my-app"}[5m])[7d:5m])

# Peak memory usage per container
max_over_time(container_memory_working_set_bytes{namespace="my-app"}[7d])
```

### Step 3: Apply headroom

- **CPU requests:** p50 actual + 20% headroom
- **CPU limits:** 2–4x the request (allow bursty processing)
- **Memory requests:** p95 actual + 25% headroom
- **Memory limits:** equal to requests (or p99 actual if you're confident in the ceiling)
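
The headroom rules above are easy to wire into a script. A minimal sketch, assuming you have already pulled the p50 CPU (millicores) and p95 memory (MiB) figures out of Prometheus; the function name and defaults are illustrative, not a real tool:

```python
# Hypothetical helper: turn observed usage percentiles into request/limit
# values following the headroom rules above.

def suggest_resources(cpu_p50_millicores, mem_p95_mib,
                      cpu_burst_factor=3, mem_headroom=0.25):
    """CPU request = p50 + 20%; CPU limit = burst factor x request;
    memory request = p95 + headroom; memory limit = memory request (locked)."""
    cpu_request = round(cpu_p50_millicores * 1.20)
    cpu_limit = cpu_request * cpu_burst_factor
    mem_request = round(mem_p95_mib * (1 + mem_headroom))
    return {
        "requests": {"cpu": f"{cpu_request}m", "memory": f"{mem_request}Mi"},
        "limits": {"cpu": f"{cpu_limit}m", "memory": f"{mem_request}Mi"},
    }

# Example: 200m at p50 CPU, 400 MiB at p95 memory
print(suggest_resources(cpu_p50_millicores=200, mem_p95_mib=400))
# → {'requests': {'cpu': '240m', 'memory': '500Mi'},
#    'limits': {'cpu': '720m', 'memory': '500Mi'}}
```

Paste the resulting values into the `resources` block of your manifests, or feed them into your Helm values.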
| 101 | + |
| 102 | +--- |
| 103 | + |
| 104 | +## Cluster-Wide Safety Nets: LimitRange and ResourceQuota |
| 105 | + |
| 106 | +Don't rely on every developer configuring resources correctly. Enforce defaults at the namespace level. |
| 107 | + |
| 108 | +### LimitRange — per-pod defaults |
| 109 | + |
| 110 | +```yaml |
| 111 | +apiVersion: v1 |
| 112 | +kind: LimitRange |
| 113 | +metadata: |
| 114 | + name: default-limits |
| 115 | + namespace: my-app |
| 116 | +spec: |
| 117 | + limits: |
| 118 | + - type: Container |
| 119 | + default: # Applied when limits are missing |
| 120 | + cpu: "500m" |
| 121 | + memory: "256Mi" |
| 122 | + defaultRequest: # Applied when requests are missing |
| 123 | + cpu: "100m" |
| 124 | + memory: "128Mi" |
| 125 | + max: |
| 126 | + cpu: "4" |
| 127 | + memory: "4Gi" |
| 128 | + min: |
| 129 | + cpu: "50m" |
| 130 | + memory: "64Mi" |
| 131 | +``` |
| 132 | + |
| 133 | +A pod deployed without resource fields will inherit `defaultRequest` and `default`. Any pod that exceeds `max` is rejected at admission. |
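
To sanity-check the inheritance, a minimal sketch: a pod with no `resources` block at all (the name and image here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limits-demo       # hypothetical name
  namespace: my-app
spec:
  containers:
    - name: demo
      image: nginx:1.27   # any image works for this check
      # no resources block — the LimitRange above should fill in:
      #   requests: cpu 100m, memory 128Mi
      #   limits:   cpu 500m, memory 256Mi
```

Inspect the result with `kubectl get pod limits-demo -n my-app -o jsonpath='{.spec.containers[0].resources}'` and confirm the defaults were injected.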

### ResourceQuota — namespace total cap

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-quota
  namespace: my-app
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
```

This prevents a single misconfigured deployment from consuming all cluster capacity.

---

## Vertical Pod Autoscaler (VPA): Let Kubernetes Learn

For workloads with variable resource needs, VPA observes historical usage and recommends — or automatically sets — requests.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"     # "Off" = recommendations only; "Auto" = live updates
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

Start with `updateMode: "Off"` and review recommendations with:

```bash
kubectl describe vpa my-app-vpa
```

Avoid running VPA in `Auto` mode alongside an HPA that scales on CPU/memory — they conflict. Use HPA on custom metrics (e.g., RPS via KEDA) and VPA for request/limit tuning.

---

## Diagnosing OOMKilled

```bash
# Find recent OOM-kill events across all namespaces, newest last
kubectl get events --field-selector=reason=OOMKilling -A --sort-by='.lastTimestamp'

# Check restart count and last termination reason
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Last State"

# Full container restart history
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].restartCount}'
```

When you see OOMKilled:
1. Check whether `limits.memory` is too low versus the actual working set
2. Check for memory leaks (steady upward trend in `container_memory_working_set_bytes`)
3. Increase the limit, fix the leak, or both
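
For step 2, a quick way to separate a leak from normal variance is to fit a line through the working-set samples and look at the slope. A minimal sketch; the sample series are made up, and in practice you would feed in values scraped from `container_memory_working_set_bytes`:

```python
def leak_slope(samples):
    """Least-squares slope of memory samples (units per sample interval).
    A persistently positive slope over hours suggests a leak."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Illustrative working-set series in MiB
healthy = [512, 510, 515, 509, 513, 511]   # flat: slope near 0
leaking = [512, 513, 515, 516, 518, 519]   # climbing: slope ≈ +1.46 MiB/interval
print(leak_slope(healthy))
print(leak_slope(leaking))
```

A flat series with noise hovers around zero; a genuine leak shows a steady positive slope no matter which window you sample.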

---

## Production Checklist

- [ ] All production containers have explicit `requests` and `limits`
- [ ] `LimitRange` enforced in every namespace (reject missing-limit pods)
- [ ] `ResourceQuota` set per namespace to cap blast radius
- [ ] Memory requests == memory limits for stateful or latency-sensitive workloads
- [ ] CPU limits >= 2× CPU requests to accommodate burst
- [ ] VPA installed in `Off` mode; recommendations reviewed monthly
- [ ] Prometheus recording rules for p95/p99 CPU and memory per container
- [ ] Alerting on `OOMKilled` and sustained CPU throttle ratio > 25%
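
The throttle-ratio alert in the last item can be derived from cAdvisor's CFS counters. A sketch of the query; the label matchers and window are assumptions to adapt to your setup:

```promql
# Fraction of CFS periods in which each container was throttled
sum by (namespace, pod, container) (
  rate(container_cpu_cfs_throttled_periods_total[5m])
)
/
sum by (namespace, pod, container) (
  rate(container_cpu_cfs_periods_total[5m])
)
```

Alert when this ratio stays above 0.25 for a sustained period; brief spikes during bursts are expected and harmless.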

Resource configuration is unglamorous tuning work, but it's the difference between a cluster that "usually works" and one that handles traffic spikes, node failures, and 3 AM incidents with composure.