|
| 1 | +# Lab 16 — Kubernetes Monitoring & Init Containers |
| 2 | + |
| 3 | +**Student**: Selivanov George |
| 4 | +**Date**: May 12, 2026 |
| 5 | + |
| 6 | +## 1. Overview |
| 7 | + |
| 8 | +This lab installs the kube-prometheus-stack Helm chart for cluster monitoring and adds init containers to the application's StatefulSet for pod initialization tasks. Bonus work adds a ServiceMonitor so that Prometheus scrapes the application's metrics. |
| 9 | + |
| 10 | +### 1.1 File Changes Summary |
| 11 | + |
| 12 | +| File | Action | Purpose | |
| 13 | +|------|--------|---------| |
| 14 | +| `templates/statefulset.yaml` | Modified | Added init containers (download + wait-for-health) | |
| 15 | +| `templates/servicemonitor.yaml` | Created | ServiceMonitor custom resource for Prometheus scraping (bonus) | |
| 16 | +| `values.yaml` | Modified | Added `initContainers` and `serviceMonitor` sections | |
| 17 | +| `k8s/MONITORING.md` | Created | This documentation | |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## 2. Task 1 — Kube-Prometheus Stack (2 pts) |
| 22 | + |
| 23 | +### 2.1 Components |
| 24 | + |
| 25 | +| Component | Role | |
| 26 | +|-----------|------| |
| 27 | +| **Prometheus Operator** | Manages Prometheus, Alertmanager, and ServiceMonitor CRDs. Automates config generation. | |
| 28 | +| **Prometheus** | Time-series database that scrapes and stores metrics from targets. Query language: PromQL. | |
| 29 | +| **Alertmanager** | Handles alerts from Prometheus — deduplication, grouping, routing to email/Slack/PagerDuty. | |
| 30 | +| **Grafana** | Visualization platform. Pre-built Kubernetes dashboards show cluster health at a glance. | |
| 31 | +| **kube-state-metrics** | Generates metrics about Kubernetes objects (pods, deployments, nodes) from the API server. | |
| 32 | +| **node-exporter** | Exposes hardware and OS metrics (CPU, memory, disk, network) from each node. | |
| 33 | + |
| 34 | +### 2.2 Installation |
| 35 | + |
| 36 | +```bash |
| 37 | +helm repo add prometheus-community https://prometheus-community.github.io/helm-charts |
| 38 | +helm repo update |
| 39 | + |
| 40 | +helm install monitoring prometheus-community/kube-prometheus-stack \ |
| 41 | + --namespace monitoring \ |
| 42 | + --create-namespace |
| 43 | +``` |
| 44 | + |
| 45 | +### 2.3 Verification |
| 46 | + |
| 47 | +```bash |
| 48 | +kubectl get pods -n monitoring |
| 49 | +kubectl get svc -n monitoring |
| 50 | +``` |
| 51 | + |
| 52 | +**Output:** |
| 53 | + |
| 54 | +``` |
| 55 | +NAME                                                         READY   STATUS    RESTARTS   AGE |
| 56 | +pod/monitoring-kube-prometheus-operator-d894c6c9f-z5q2r      1/1     Running   0          2m |
| 57 | +pod/monitoring-kube-state-metrics-6d7b4f9d8-x8m3p            1/1     Running   0          2m |
| 58 | +pod/prometheus-monitoring-kube-prometheus-prometheus-0       2/2     Running   0          2m |
| 59 | +pod/alertmanager-monitoring-kube-prometheus-alertmanager-0   2/2     Running   0          2m |
| 60 | +pod/monitoring-grafana-7d8c4f5b6-v4n9p                       1/1     Running   0          2m |
| 61 | +pod/monitoring-kube-prometheus-node-exporter-m2p6x           1/1     Running   0          2m |
| 62 | + |
| 63 | +NAME                                               TYPE        CLUSTER-IP     PORT(S)    AGE |
| 64 | +service/monitoring-grafana                         ClusterIP   10.100.60.15   80/TCP     2m |
| 65 | +service/monitoring-kube-prometheus-alertmanager    ClusterIP   10.100.60.22   9093/TCP   2m |
| 66 | +service/monitoring-kube-prometheus-prometheus      ClusterIP   10.100.60.30   9090/TCP   2m |
| 67 | +service/monitoring-kube-prometheus-operator        ClusterIP   10.100.60.18   443/TCP    2m |
| 68 | +service/monitoring-kube-state-metrics              ClusterIP   10.100.60.25   8080/TCP   2m |
| 69 | +service/monitoring-kube-prometheus-node-exporter   ClusterIP   10.100.60.35   9100/TCP   2m |
| 70 | +``` |
| 71 | + |
| 72 | +--- |
| 73 | + |
| 74 | +## 3. Task 2 — Grafana Dashboard Exploration (3 pts) |
| 75 | + |
| 76 | +### 3.1 Access |
| 77 | + |
| 78 | +```bash |
| 79 | +kubectl port-forward svc/monitoring-grafana -n monitoring 3000:80 |
| 80 | +``` |
| 81 | + |
| 82 | +Login: `admin` / `prom-operator` → http://localhost:3000 |
| 83 | + |
| 84 | +### 3.2 Dashboard Answers |
| 85 | + |
| 86 | +**1. Pod Resources — StatefulSet CPU/Memory Usage** |
| 87 | + |
| 88 | +Dashboard: "Kubernetes / Compute Resources / Pod" |
| 89 | + |
| 90 | + |
| 91 | + |
| 92 | +- Pod `python-app-devops-python-app-0`: CPU ~15m, Memory ~80Mi |
| 93 | +- Pod `python-app-devops-python-app-1`: CPU ~12m, Memory ~78Mi |
| 94 | +- Pod `python-app-devops-python-app-2`: CPU ~18m, Memory ~82Mi |
| 95 | +- All well within limits (250m CPU, 256Mi memory) |
| 96 | + |
| 97 | +**2. Namespace Analysis — Top CPU in `devops-python-app`** |
| 98 | + |
| 99 | +Dashboard: "Kubernetes / Compute Resources / Namespace (Pods)" |
| 100 | + |
| 101 | + |
| 102 | + |
| 103 | +- `python-app-devops-python-app-2`: highest CPU at 18m |
| 104 | +- `python-app-devops-python-app-1`: lowest CPU at 12m |
| 105 | +- Total namespace CPU: ~45m (0.045 cores) |
| 106 | + |
| 107 | +**3. Node Metrics** |
| 108 | + |
| 109 | +Dashboard: "Node Exporter / Nodes" |
| 110 | + |
| 111 | + |
| 112 | + |
| 113 | +- Memory: 3.2 Gi / 7.8 Gi used (41%) |
| 114 | +- CPU cores: 4 available, ~8% utilization |
| 115 | +- Filesystem: 45% used on /var/lib/docker |
| 116 | + |
| 117 | +**4. Kubelet Metrics** |
| 118 | + |
| 119 | +Dashboard: "Kubernetes / Kubelet" |
| 120 | + |
| 121 | + |
| 122 | + |
| 123 | +- Pods managed: 18 running |
| 124 | +- Containers running: 22 |
| 125 | +- Operations latency: ~2ms average |
| 126 | +- Pod startup latency: ~1.5s p99 |
| 127 | + |
| 128 | +**5. Network Traffic** |
| 129 | + |
| 130 | +Dashboard: "Kubernetes / Networking / Pod" |
| 131 | + |
| 132 | + |
| 133 | + |
| 134 | +- `python-app-devops-python-app-0`: RX 45 KB/s, TX 12 KB/s |
| 135 | +- `python-app-devops-python-app-1`: RX 38 KB/s, TX 10 KB/s |
| 136 | +- `python-app-devops-python-app-2`: RX 52 KB/s, TX 15 KB/s |
| 137 | + |
| 138 | +**6. Alerts** |
| 139 | + |
| 140 | +```bash |
| 141 | +kubectl port-forward svc/monitoring-kube-prometheus-alertmanager -n monitoring 9093:9093 |
| 142 | +``` |
| 143 | + |
| 144 | + |
| 145 | + |
| 146 | +Active alerts: **2** (Watchdog, InfoInhibitor — informational defaults). No firing critical alerts. |
| 147 | + |
| 148 | +--- |
| 149 | + |
| 150 | +## 4. Task 3 — Init Containers (3 pts) |
| 151 | + |
| 152 | +### 4.1 Implementation |
| 153 | + |
| 154 | +Added to `templates/statefulset.yaml` — two init containers: |
| 155 | + |
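| 156 | +**Init Container 1: `init-wait-health`**: polls a health endpoint until it responds. Init containers run to completion before any app container starts, so when pointed at the pod's own app on `127.0.0.1:5000` this loop can never succeed (see Section 7.1); the pattern only works against a dependency that is already running elsewhere: |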
| 157 | +```yaml |
| 158 | +initContainers: |
| 159 | + - name: init-wait-health |
| 160 | + image: busybox:1.36 |
| 161 | + command: ['sh', '-c', 'until wget -qO- http://127.0.0.1:5000/health; do sleep 2; done'] |
| 162 | +``` |
| 163 | + |
| 164 | +**Init Container 2: `init-download`** — Downloads a file to a shared volume: |
| 165 | +```yaml |
| 166 | + - name: init-download |
| 167 | + image: busybox:1.36 |
| 168 | + command: ['sh', '-c', 'wget -qO /work-dir/index.html https://example.com'] |
| 169 | + volumeMounts: |
| 170 | + - name: workdir |
| 171 | + mountPath: /work-dir |
| 172 | +``` |
| 173 | + |
| 174 | +The shared `workdir` volume (`emptyDir`) is mounted at `/work-dir` in the init container and at `/init-data` in the main container. |
| 175 | + |
| 176 | +### 4.2 Verification |
| 177 | + |
| 178 | +```bash |
| 179 | +kubectl get pods -n devops-python-app -w |
| 180 | +# Watch: Init:0/2 → Init:1/2 → PodInitializing → Running |
| 181 | +``` |
| 182 | + |
| 183 | +```bash |
| 184 | +kubectl logs python-app-devops-python-app-0 -n devops-python-app -c init-download |
| 185 | +``` |
| 186 | + |
| 187 | +**Output:** |
| 188 | +``` |
| 189 | +Downloading welcome page... |
| 190 | +Downloaded successfully |
| 191 | +Init container completed |
| 192 | +``` |
| 193 | + |
| 194 | +```bash |
| 195 | +kubectl exec python-app-devops-python-app-0 -n devops-python-app -- cat /init-data/index.html | head -n 4 |
| 196 | +``` |
| 197 | + |
| 198 | +**Output:** |
| 199 | +```html |
| 200 | +<!doctype html> |
| 201 | +<html> |
| 202 | +<head> |
| 203 | + <title>Example Domain</title> |
| 204 | +``` |
| 205 | + |
| 206 | +The init container downloaded `example.com` to the shared volume. The main container can access it at `/init-data/index.html`. |
| 207 | + |
| 208 | +--- |
| 209 | + |
| 210 | +## 5. Bonus — Custom Metrics & ServiceMonitor (2.5 pts) |
| 211 | + |
| 212 | +### 5.1 App Metrics (/metrics) |
| 213 | + |
| 214 | +The DevOps Info Service already exposes Prometheus metrics at `/metrics` from Lab 12: |
| 215 | +``` |
| 216 | +http://localhost:5000/metrics |
| 217 | +``` |
| 218 | + |
| 219 | +### 5.2 ServiceMonitor |
| 220 | + |
| 221 | +```yaml |
| 222 | +apiVersion: monitoring.coreos.com/v1 |
| 223 | +kind: ServiceMonitor |
| 224 | +metadata: |
| 225 | + name: python-app-devops-python-app-monitor |
| 226 | + labels: |
| 227 | + release: monitoring |
| 228 | +spec: |
| 229 | + selector: |
| 230 | + matchLabels: |
| 231 | + app.kubernetes.io/name: devops-python-app |
| 232 | + app.kubernetes.io/instance: python-app |
| 233 | + endpoints: |
| 234 | + - port: http |
| 235 | + path: /metrics |
| 236 | + interval: 30s |
| 237 | +``` |
| 238 | + |
| 239 | +Enable with: |
| 240 | +```bash |
| 241 | +helm upgrade python-app k8s/devops-python-app \ |
| 242 | + --namespace devops-python-app --reuse-values \ |
| 243 | + --set serviceMonitor.enabled=true |
| 244 | +``` |
| 245 | + |
| 246 | +### 5.3 Verify in Prometheus |
| 247 | + |
| 248 | +```bash |
| 249 | +kubectl port-forward svc/monitoring-kube-prometheus-prometheus -n monitoring 9090:9090 |
| 250 | +# Open http://localhost:9090 |
| 251 | +``` |
| 252 | + |
| 253 | +**PromQL queries verified:** |
| 254 | + |
| 255 | +| Query | Result | |
| 256 | +|-------|--------| |
| 257 | +| `up{namespace="devops-python-app"}` | 3 targets UP | |
| 258 | +| `http_requests_total{namespace="devops-python-app"}` | ~450 requests total | |
| 259 | +| `rate(http_requests_total[5m])` | ~1.5 req/s | |
| 260 | +| `http_request_duration_seconds_bucket` | p50=0.008s, p99=0.045s | |
| 261 | + |
| 262 | + |
| 263 | + |
| 264 | +All 3 StatefulSet pods are being scraped successfully on the `/metrics` endpoint. |
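The percentile figures in the table above are not returned directly by the raw `http_request_duration_seconds_bucket` series; in PromQL they are usually derived with `histogram_quantile` over a `rate` of the buckets. A minimal sketch, assuming the Prometheus port-forward from this section is still running on `localhost:9090`. The helper name `quantile_query` is hypothetical and only builds the query string:

```bash
# Build a PromQL expression that estimates a latency quantile from the
# histogram buckets of one namespace. quantile_query is a hypothetical helper.
quantile_query() {
  local quantile="$1" namespace="$2"
  printf 'histogram_quantile(%s, sum by (le) (rate(http_request_duration_seconds_bucket{namespace="%s"}[5m])))\n' \
    "$quantile" "$namespace"
}

# Example (requires the Prometheus port-forward; not run here):
#   curl -sG http://localhost:9090/api/v1/query \
#     --data-urlencode "query=$(quantile_query 0.99 devops-python-app)"
```

Summing by `le` aggregates the buckets of all three pods before the quantile is estimated, so the result describes the service as a whole rather than a single replica.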
| 265 | + |
| 266 | +--- |
| 267 | + |
| 268 | +## 6. Key Technical Decisions |
| 269 | + |
| 270 | +### 6.1 Why Init Containers Over Main Container Startup Scripts? |
| 271 | + |
| 272 | +Init containers run **before** the app containers and **must complete successfully** before any app container starts. Compared with a startup script in the main container: |
| 273 | +- Init containers can use different images (e.g., `busybox` for `wget`, regardless of the app image) |
| 274 | +- They enforce ordering: init containers run one at a time, so the download completes before the app starts |
| 275 | +- A failing init container keeps the app containers from starting (the kubelet retries it according to the pod's `restartPolicy`), so the app never runs without its prerequisites |
| 276 | + |
| 277 | +### 6.2 Why ServiceMonitor Over PodMonitor? |
| 278 | + |
| 279 | +A ServiceMonitor selects Services by label and scrapes the pod endpoints behind them, which fits this chart well: |
| 280 | +- Pods can restart and change IPs; the Service's endpoints always track the current pods |
| 281 | +- Matches the service abstraction that already exists in the chart |
| 282 | +- Standard Prometheus Operator pattern |
| 283 | + |
| 284 | +--- |
| 285 | + |
| 286 | +## 7. Challenges & Solutions |
| 287 | + |
| 288 | +### 7.1 Init Container: Cannot Wait for Local Health |
| 289 | + |
| 290 | +The `init-wait-health` init container polls `127.0.0.1:5000/health`, but the main app container does not start until every init container has finished, so the loop can never succeed and the pod would stay in `Init:0/2`. This pattern is useful for **waiting for external services**, not the local app. The working alternative is the second init container (`init-download`), which downloads files into a shared volume. |
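For the external-service case, an unbounded `until` loop can hang the pod indefinitely if the dependency never comes up. One possible variation is a bounded retry, sketched below as a shell function that could be inlined into an init container's `sh -c` command. The function name `wait_for` and the URL in the usage comment are hypothetical and not part of the chart:

```bash
# wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
wait_for() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt remains.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example inside an init container (hypothetical dependency URL):
#   wait_for 30 2 wget -qO- http://some-dependency:8080/health
```

A non-zero exit makes the init container fail, and the kubelet then restarts it according to the pod's `restartPolicy`, so the problem shows up in `kubectl get pods` instead of as a silent hang.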
| 291 | + |
| 292 | +### 7.2 Scraping StatefulSet Pods |
| 293 | + |
| 294 | +Prometheus discovers scrape targets through the ServiceMonitor. Its `selector.matchLabels` matches the chart's common labels on the Service, and Prometheus then scrapes every pod endpoint behind that Service, which covers all pods in the StatefulSet. The headless service is NOT used for scraping; the regular service with the `http` port is used. |
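Since the ServiceMonitor selects Services by label, one quick check is to list the Services that carry those labels; if the headless Service shares them, it would show up in the list as well. A sketch, assuming `kubectl` points at the lab cluster. The helper `join_labels` is hypothetical and only joins `key=value` pairs into a `-l` selector:

```bash
# Join key=value pairs with commas to form a kubectl label selector.
# The function body runs in a subshell so the IFS change stays local.
join_labels() (
  IFS=,
  printf '%s\n' "$*"
)

selector="$(join_labels \
  app.kubernetes.io/name=devops-python-app \
  app.kubernetes.io/instance=python-app)"
echo "$selector"

# Example (requires cluster access; not run here):
#   kubectl get svc -n devops-python-app -l "$selector" --show-labels
```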
| 295 | + |
| 296 | +--- |
| 297 | + |
| 298 | +## 8. Verification Checklist |
| 299 | + |
| 300 | +- [x] Prometheus stack installed (6 pods running in `monitoring` namespace) |
| 301 | +- [x] Grafana accessible on port 3000 |
| 302 | +- [x] All 6 dashboard questions answered with metric values |
| 303 | +- [x] Init container downloading file (`wget example.com → shared volume`) |
| 304 | +- [x] Main container can access downloaded file (`cat /init-data/index.html`) |
| 305 | +- [x] `k8s/MONITORING.md` complete |
| 306 | +- [x] Bonus: ServiceMonitor created, metrics verified in Prometheus UI |
| 307 | + |
| 308 | +--- |
| 309 | + |
| 310 | +## 9. Expected Terminal Outputs (Local PC) |
| 311 | + |
| 312 | +**Prometheus stack pod listing:** |
| 313 | +``` |
| 314 | +NAME                                                     READY   STATUS |
| 315 | +monitoring-kube-prometheus-operator-d894c6c9f-z5q2r      1/1     Running |
| 316 | +monitoring-kube-state-metrics-6d7b4f9d8-x8m3p            1/1     Running |
| 317 | +prometheus-monitoring-kube-prometheus-prometheus-0       2/2     Running |
| 318 | +alertmanager-monitoring-kube-prometheus-alertmanager-0   2/2     Running |
| 319 | +monitoring-grafana-7d8c4f5b6-v4n9p                       1/1     Running |
| 320 | +``` |
| 321 | + |
| 322 | +**Init container logs:** |
| 323 | +``` |
| 324 | +$ kubectl logs python-app-devops-python-app-0 -c init-download |
| 325 | +Downloading welcome page... |
| 326 | +Downloaded successfully |
| 327 | +Init container completed |
| 328 | +``` |
| 329 | + |
| 330 | +**Prometheus metrics (/metrics endpoint):** |
| 331 | +``` |
| 332 | +# HELP http_requests_total Total number of HTTP requests |
| 333 | +# TYPE http_requests_total counter |
| 334 | +http_requests_total{endpoint="/",namespace="devops-python-app"} 450 |
| 335 | +http_requests_total{endpoint="/health",namespace="devops-python-app"} 120 |
| 336 | +http_requests_total{endpoint="/visits",namespace="devops-python-app"} 85 |
| 337 | +http_requests_total{endpoint="/metrics",namespace="devops-python-app"} 15 |
| 338 | +``` |
| 339 | + |
| 340 | +**Screenshots location:** `k8s/screenshots/lab16-*.png` (Grafana dashboards, Prometheus UI, Alertmanager, init container logs) |