
Commit 53c2be6

feat: rebrand to RelientOps + blog system + interactive diagrams + SEO
Rebrand:
- Update SITE_NAME/SITE_URL to RelientOps in env.ts, .env.local, .env.example, and ci.yml (both build and deploy steps)
- Dynamic brand interpolation in about/contact layout.tsx

Phase 1 – Interactive diagram viewer:
- src/components/ui/ZoomableDiagram.tsx: pan/zoom/fullscreen wrapper with mouse wheel, drag, touch pan, keyboard (Esc) and control buttons
- Integrate ZoomableDiagram around MermaidDiagram in case study pages

Phase 2 – Case study SEO:
- Title format: '[Title] | RelientOps – DevOps & Cloud Consultant'
- Added authors, keywords, twitter card metadata
- JSON-LD Article + BreadcrumbList structured data on every slug page

Phase 3 – Blog system:
- src/types/blog.ts: BlogMeta + BlogPost interfaces
- src/lib/blog.ts: getBlogPaths/getBlogPost/getAllBlogPosts/getAllBlogTags
- content/blog/terraform-state-management.md: 7-min post
- content/blog/kubernetes-resource-limits.md: 8-min post
- src/app/blog/page.tsx: SSG listing page
- src/app/blog/[slug]/page.tsx: SSG post page
- src/app/blog/[slug]/layout.tsx: generateMetadata + JSON-LD BlogPosting
- src/components/sections/BlogCard.tsx: card component
- sitemap.ts: /blog + all blog slugs added
- siteConfig.nav: /blog added between Case Studies and About

Build: 19 static/SSG pages, zero type errors
1 parent e4de305 commit 53c2be6

17 files changed: +1179 −22 lines changed

.env.example

3 additions & 3 deletions

```diff
@@ -5,18 +5,18 @@
 # ─────────────────────────────────────────────────────────────────────────────
 
 # ── Brand identity ────────────────────────────────────────────────────────────
-NEXT_PUBLIC_SITE_NAME=CloudForgeOps
+NEXT_PUBLIC_SITE_NAME=RelientOps
 NEXT_PUBLIC_SITE_TAGLINE=Freelance DevOps & Cloud Engineering
 NEXT_PUBLIC_SITE_DESCRIPTION=I design, build, and operate cloud-native infrastructure. From Kubernetes to CI/CD pipelines, I help startups ship faster and stay reliable.
-NEXT_PUBLIC_SITE_URL=https://cloudforgeops.com
+NEXT_PUBLIC_SITE_URL=https://relientops.io
 NEXT_PUBLIC_SITE_OG_IMAGE=/images/og-default.png
 
 # ── Contact ───────────────────────────────────────────────────────────────────
 NEXT_PUBLIC_CONTACT_EMAIL=sagardeepak2002@gmail.com
 
 # ── Social links ──────────────────────────────────────────────────────────────
 NEXT_PUBLIC_SOCIAL_GITHUB=https://github.com/sagarDeepakDevOps
-NEXT_PUBLIC_SOCIAL_LINKEDIN=https://linkedin.com/company/cloudforgeops
+NEXT_PUBLIC_SOCIAL_LINKEDIN=https://linkedin.com/in/sagardeepak2002
 
 # ── Owner / personal identity ─────────────────────────────────────────────────
 NEXT_PUBLIC_OWNER_NAME=Deepak Sagar
```

.github/workflows/ci.yml

4 additions & 4 deletions

```diff
@@ -35,8 +35,8 @@ jobs:
       - name: Build
         run: npm run build
         env:
-          NEXT_PUBLIC_SITE_NAME: CloudForgeOps
-          NEXT_PUBLIC_SITE_URL: https://cloudforgeops.com
+          NEXT_PUBLIC_SITE_NAME: RelientOps
+          NEXT_PUBLIC_SITE_URL: https://relientops.io
           NEXT_PUBLIC_SITE_TAGLINE: "Freelance DevOps & Cloud Consulting"
           NEXT_PUBLIC_SITE_DESCRIPTION: "Production-grade cloud infrastructure, Kubernetes, and CI/CD — built by a senior DevOps engineer."
           NEXT_PUBLIC_SITE_OG_IMAGE: /og-image.png
@@ -74,10 +74,10 @@ jobs:
           working-directory: ./
           vercel-args: >-
             --prod
-            --build-env NEXT_PUBLIC_SITE_NAME="CloudForgeOps"
+            --build-env NEXT_PUBLIC_SITE_NAME="RelientOps"
             --build-env NEXT_PUBLIC_SITE_TAGLINE="Freelance DevOps & Cloud Engineering"
             --build-env NEXT_PUBLIC_SITE_DESCRIPTION="I design, build, and operate cloud-native infrastructure. From Kubernetes to CI/CD pipelines, I help startups ship faster and stay reliable."
-            --build-env NEXT_PUBLIC_SITE_URL="https://cloudforgeops.vercel.app"
+            --build-env NEXT_PUBLIC_SITE_URL="https://relientops.io"
             --build-env NEXT_PUBLIC_SITE_OG_IMAGE="/images/og-default.png"
             --build-env NEXT_PUBLIC_CONTACT_EMAIL="sagardeepak2002@gmail.com"
             --build-env NEXT_PUBLIC_SOCIAL_GITHUB="https://github.com/sagarDeepakDevOps"
```
content/blog/kubernetes-resource-limits.md

224 additions & 0 deletions
---
title: "Kubernetes Resource Limits: The Right Way to Keep Workloads Stable"
slug: "kubernetes-resource-limits"
date: "2024-12-03"
excerpt: "Missing or misconfigured resource requests and limits are the leading cause of noisy-neighbour problems and OOMKilled containers. Here's how to set them correctly."
tags:
  - Kubernetes
  - DevOps
  - Platform Engineering
  - Reliability
readTime: 8
metaDescription: "A hands-on guide to Kubernetes resource requests, limits, QoS classes, VPA, and LimitRange policies — with real-world recommendations for production clusters."
---

## The Hidden Cost of Not Setting Limits

Every Kubernetes cluster without explicit resource requests and limits eventually ends up with the same symptoms: intermittent `OOMKilled` pods, nodes that mysteriously max out CPU at 3 AM, and an HPA that refuses to scale because `metrics-server` can't get meaningful numbers.

Resource configuration isn't optional for production clusters. This guide walks through the concepts and practical patterns I use across client engagements.

---

## Requests vs Limits: The Mental Model

Two separate levers control how Kubernetes allocates resources:

| Field | What it does | When it matters |
|---|---|---|
| `requests` | Minimum guaranteed — scheduling decision | At pod scheduling time |
| `limits` | Hard ceiling — enforced by cgroups | At runtime |

The scheduler places a pod on a node that has **at least** the sum of all containers' `requests` available. The `limits` then cap what each container can actually consume at any point in time.

```yaml
resources:
  requests:
    cpu: "250m"      # 0.25 vCPU guaranteed
    memory: "256Mi"  # 256 MiB guaranteed
  limits:
    cpu: "1000m"     # Max 1 vCPU
    memory: "512Mi"  # Max 512 MiB — exceeding this → OOMKilled
```

---

## QoS Classes and Why They Matter for Eviction

Kubernetes assigns each pod a **Quality of Service (QoS) class** based on its resource configuration. This class determines eviction priority when a node is under memory pressure:

| QoS Class | Condition | Eviction Priority |
|---|---|---|
| `Guaranteed` | requests == limits for all containers | Evicted last |
| `Burstable` | At least one container has requests < limits | Evicted in the middle |
| `BestEffort` | No requests or limits set | Evicted first |

**Recommendation:** Production workloads should be `Guaranteed` (requests == limits) for memory and `Burstable` for CPU. Memory is incompressible — when a container exceeds its limit, it's killed. Exceeding a CPU limit just means throttling, which is recoverable.

```yaml
# Guaranteed memory, Burstable CPU — common production pattern
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "2000m"     # Allow CPU burst
    memory: "512Mi"  # Lock memory — no surprises
```

---

## Choosing the Right Values

Guessing at limits is dangerous. Set them too low and you get OOMKilled or CPU-throttled processes; too high and you waste capacity. The right approach:

### Step 1: Run without limits under load

Deploy to staging without limits and observe actual consumption:

```bash
kubectl top pods -n my-app --containers
```

### Step 2: Review historical metrics

In Prometheus + Grafana:

```promql
# 95th-percentile CPU usage (cores) over 7 days — the counter must be
# converted to a rate first, hence the subquery
quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{namespace="my-app"}[5m])[7d:5m])

# Peak memory usage per container
max_over_time(container_memory_working_set_bytes{namespace="my-app"}[7d])
```
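The memory query returns raw bytes, while `limits.memory` wants a value like `410Mi`. A throwaway shell sketch of the conversion (the `peak_bytes` figure is an illustrative stand-in for the query result, not from the post):

```shell
# Round a peak working-set figure (bytes) up to whole MiB for limits.memory.
# peak_bytes is a made-up example value; substitute your Prometheus result.
peak_bytes=429496729
mib=$(( (peak_bytes + 1048576 - 1) / 1048576 ))   # ceil-divide by 1 MiB
echo "memory: ${mib}Mi"
```

Rounding up rather than truncating matters here: a limit even one byte below the true peak is an `OOMKilled` waiting to happen.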
### Step 3: Apply headroom

- **CPU requests:** p50 actual + 20% headroom
- **CPU limits:** 2–4x the request (allow bursty processing)
- **Memory requests:** p95 actual + 25% headroom
- **Memory limits:** equal to requests (or p99 actual if you're confident in the ceiling)
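Applied to concrete numbers, the headroom rules work out like this — a back-of-envelope sketch; the observed p50/p95 figures are made up for illustration:

```shell
# Illustrative observed usage (from Step 2); substitute your own numbers.
p50_cpu_m=210    # p50 CPU, millicores
p95_mem_mi=410   # p95 memory, MiB

cpu_request=$(( p50_cpu_m * 120 / 100 ))    # p50 + 20% headroom
cpu_limit=$(( cpu_request * 3 ))            # 3x request, within the 2–4x band
mem_request=$(( p95_mem_mi * 125 / 100 ))   # p95 + 25% headroom
mem_limit=$mem_request                      # limits == requests for memory

echo "requests: cpu=${cpu_request}m memory=${mem_request}Mi"
echo "limits:   cpu=${cpu_limit}m  memory=${mem_limit}Mi"
```

For these inputs that yields a `252m`/`756m` CPU pair and `512Mi` for both memory fields — i.e. Burstable CPU, Guaranteed memory, matching the QoS recommendation earlier.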
---

## Cluster-Wide Safety Nets: LimitRange and ResourceQuota

Don't rely on every developer configuring resources correctly. Enforce defaults at the namespace level.

### LimitRange — per-pod defaults

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: my-app
spec:
  limits:
    - type: Container
      default:           # Applied when limits are missing
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:    # Applied when requests are missing
        cpu: "100m"
        memory: "128Mi"
      max:
        cpu: "4"
        memory: "4Gi"
      min:
        cpu: "50m"
        memory: "64Mi"
```

A pod deployed without resource fields will inherit `defaultRequest` and `default`. Any pod that exceeds `max` is rejected at admission.

### ResourceQuota — namespace total cap

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-quota
  namespace: my-app
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
```

This prevents a single misconfigured deployment from consuming all cluster capacity.
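A quota-exceeded deployment fails silently from the user's perspective (pods simply never schedule), so it's worth sanity-checking before scaling up. A sketch with made-up per-pod numbers against a `requests.memory: "20Gi"` cap like the one above:

```shell
# Does scaling to N replicas stay within the namespace memory-request quota?
# All values are illustrative.
replicas=40
per_pod_mem_mi=512              # each pod requests 512Mi
quota_mem_mi=$(( 20 * 1024 ))   # requests.memory quota: 20Gi → 20480Mi

needed_mi=$(( replicas * per_pod_mem_mi ))
if [ "$needed_mi" -le "$quota_mem_mi" ]; then
  echo "fits: ${needed_mi}Mi of ${quota_mem_mi}Mi"
else
  echo "exceeds quota: ${needed_mi}Mi > ${quota_mem_mi}Mi"
fi
```

At 40 replicas × 512Mi this lands exactly on the 20Gi cap — the 41st replica would be rejected by the quota admission controller.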
---

## Vertical Pod Autoscaler (VPA): Let Kubernetes Learn

For workloads with variable resource needs, VPA observes historical usage and recommends — or automatically sets — requests.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # "Off" = recommendations only; "Auto" = live updates
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```

Start with `updateMode: "Off"` and review recommendations with:

```bash
kubectl describe vpa my-app-vpa
```

Avoid running VPA in `Auto` mode alongside HPA on CPU/memory — they conflict. Use HPA on custom metrics (e.g., RPS from KEDA) and VPA for request/limit tuning.

---

## Diagnosing OOMKilled

```bash
# Find OOMKilled containers
kubectl get events --field-selector=reason=OOMKilling -A --sort-by='.lastTimestamp'

# Check restart count and last termination reason
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Last State"

# Full container restart history
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[*].restartCount}'
```

When you see OOMKilled:

1. Check whether `limits.memory` is too low vs the actual working set
2. Check for memory leaks (a steady upward trend in `container_memory_working_set_bytes`)
3. Increase the limit, fix the leak, or both

---

## Production Checklist

- [ ] All production containers have explicit `requests` and `limits`
- [ ] `LimitRange` enforced in every namespace (reject missing-limit pods)
- [ ] `ResourceQuota` set per namespace to cap blast radius
- [ ] Memory requests == memory limits for stateful or latency-sensitive workloads
- [ ] CPU limits >= 2× CPU requests to accommodate burst
- [ ] VPA installed in `Off` mode; recommendations reviewed monthly
- [ ] Prometheus recording rules for p95/p99 CPU and memory per container
- [ ] Alerting on `OOMKilled` and sustained CPU throttle ratio > 25%

Resource configuration is unglamorous tuning work, but it's the difference between a cluster that "usually works" and one that handles traffic spikes, node failures, and 3 AM incidents with composure.
