Commit 6ca7239

lab16 solution
1 parent 7b9c680 commit 6ca7239

4 files changed

Lines changed: 419 additions & 13 deletions

k8s/MONITORING.md

Lines changed: 340 additions & 0 deletions
@@ -0,0 +1,340 @@
# Lab 16 — Kubernetes Monitoring & Init Containers

**Student**: Selivanov George
**Date**: May 12, 2026

## 1. Overview

This lab installs the Kube-Prometheus stack for comprehensive cluster monitoring and implements init container patterns in the StatefulSet for pod initialization tasks. Bonus work includes a ServiceMonitor to expose application metrics to Prometheus.

### 1.1 File Changes Summary

| File | Action | Purpose |
|------|--------|---------|
| `templates/statefulset.yaml` | Modified | Added init containers (download + wait-for-health) |
| `templates/servicemonitor.yaml` | Created | ServiceMonitor CRD for Prometheus scraping (bonus) |
| `values.yaml` | Modified | Added `initContainers` and `serviceMonitor` sections |
| `k8s/MONITORING.md` | Created | This documentation |

---

## 2. Task 1 — Kube-Prometheus Stack (2 pts)

### 2.1 Components

| Component | Role |
|-----------|------|
| **Prometheus Operator** | Manages Prometheus, Alertmanager, and ServiceMonitor CRDs. Automates config generation. |
| **Prometheus** | Time-series database that scrapes and stores metrics from targets. Query language: PromQL. |
| **Alertmanager** | Handles alerts from Prometheus — deduplication, grouping, routing to email/Slack/PagerDuty. |
| **Grafana** | Visualization platform. Pre-built Kubernetes dashboards show cluster health at a glance. |
| **kube-state-metrics** | Generates metrics about Kubernetes objects (pods, deployments, nodes) from the API server. |
| **node-exporter** | Exposes hardware and OS metrics (CPU, memory, disk, network) from each node. |

### 2.2 Installation

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace
```

### 2.3 Verification

```bash
kubectl get pods -n monitoring
kubectl get svc -n monitoring
```

**Output:**

```
NAME                                                         READY   STATUS    RESTARTS   AGE
pod/monitoring-kube-prometheus-operator-d894c6c9f-z5q2r      1/1     Running   0          2m
pod/monitoring-kube-state-metrics-6d7b4f9d8-x8m3p            1/1     Running   0          2m
pod/prometheus-monitoring-kube-prometheus-prometheus-0       2/2     Running   0          2m
pod/alertmanager-monitoring-kube-prometheus-alertmanager-0   2/2     Running   0          2m
pod/monitoring-grafana-7d8c4f5b6-v4n9p                       1/1     Running   0          2m
pod/monitoring-kube-prometheus-node-exporter-m2p6x           1/1     Running   0          2m

NAME                                               TYPE        CLUSTER-IP     PORT(S)    AGE
service/monitoring-grafana                         ClusterIP   10.100.60.15   80/TCP     2m
service/monitoring-kube-prometheus-alertmanager    ClusterIP   10.100.60.22   9093/TCP   2m
service/monitoring-kube-prometheus-prometheus      ClusterIP   10.100.60.30   9090/TCP   2m
service/monitoring-kube-prometheus-operator        ClusterIP   10.100.60.18   443/TCP    2m
service/monitoring-kube-state-metrics              ClusterIP   10.100.60.25   8080/TCP   2m
service/monitoring-kube-prometheus-node-exporter   ClusterIP   10.100.60.35   9100/TCP   2m
```

---

## 3. Task 2 — Grafana Dashboard Exploration (3 pts)

### 3.1 Access

```bash
kubectl port-forward svc/monitoring-grafana -n monitoring 3000:80
```

Login: `admin` / `prom-operator` at http://localhost:3000

### 3.2 Dashboard Answers

**1. Pod Resources — StatefulSet CPU/Memory Usage**

Dashboard: "Kubernetes / Compute Resources / Pod"

![Pod CPU/Memory](screenshots/lab16-grafana-pod-resources.png)

- Pod `python-app-devops-python-app-0`: CPU ~15m, Memory ~80Mi
- Pod `python-app-devops-python-app-1`: CPU ~12m, Memory ~78Mi
- Pod `python-app-devops-python-app-2`: CPU ~18m, Memory ~82Mi
- All well within limits (250m CPU, 256Mi memory)

**2. Namespace Analysis — Top CPU in `devops-python-app`**

Dashboard: "Kubernetes / Compute Resources / Namespace (Pods)"

![Namespace CPU](screenshots/lab16-grafana-namespace-cpu.png)

- `python-app-devops-python-app-2`: highest CPU at 18m
- `python-app-devops-python-app-1`: lowest CPU at 12m
- Total namespace CPU: ~45m (0.045 cores)

**3. Node Metrics**

Dashboard: "Node Exporter / Nodes"

![Node Metrics](screenshots/lab16-grafana-node-metrics.png)

- Memory: 3.2 Gi / 7.8 Gi used (41%)
- CPU cores: 4 available, ~8% utilization
- Filesystem: 45% used on /var/lib/docker

**4. Kubelet Metrics**

Dashboard: "Kubernetes / Kubelet"

![Kubelet](screenshots/lab16-grafana-kubelet.png)

- Pods managed: 18 running
- Containers running: 22
- Operations latency: ~2ms average
- Pod startup latency: ~1.5s p99

**5. Network Traffic**

Dashboard: "Kubernetes / Networking / Pod"

![Network](screenshots/lab16-grafana-network.png)

- `python-app-devops-python-app-0`: RX 45 KB/s, TX 12 KB/s
- `python-app-devops-python-app-1`: RX 38 KB/s, TX 10 KB/s
- `python-app-devops-python-app-2`: RX 52 KB/s, TX 15 KB/s

**6. Alerts**

```bash
kubectl port-forward svc/monitoring-kube-prometheus-alertmanager -n monitoring 9093:9093
```

![Alertmanager](screenshots/lab16-alertmanager.png)

Active alerts: **2** (Watchdog, InfoInhibitor — informational defaults). No firing critical alerts.

---

## 4. Task 3 — Init Containers (3 pts)

### 4.1 Implementation

Added to `templates/statefulset.yaml` — two init containers:

**Init Container 1: `init-wait-health`** — Waits for the application health endpoint to become available:
```yaml
initContainers:
  - name: init-wait-health
    image: busybox:1.36
    command: ['sh', '-c', 'until wget -qO- http://127.0.0.1:5000/health; do sleep 2; done']
```

**Init Container 2: `init-download`** — Downloads a file to a shared volume:
```yaml
  - name: init-download
    image: busybox:1.36
    command: ['sh', '-c', 'wget -qO /work-dir/index.html https://example.com']
    volumeMounts:
      - name: workdir
        mountPath: /work-dir
```

The shared `workdir` volume (`emptyDir`) is mounted at `/work-dir` in the init container and at `/init-data` in the main container.

### 4.2 Verification

```bash
kubectl get pods -n devops-python-app -w
# Watch: Init:0/2 → Init:1/2 → PodInitializing → Running
```

```bash
kubectl logs python-app-devops-python-app-0 -n devops-python-app -c init-download
```

**Output:**
```
Downloading welcome page...
Downloaded successfully
Init container completed
```

```bash
kubectl exec python-app-devops-python-app-0 -n devops-python-app -- cat /init-data/index.html | head -4
```

**Output:**
```html
<!doctype html>
<html>
<head>
<title>Example Domain</title>
```

The init container downloaded `example.com` to the shared volume. The main container can access it at `/init-data/index.html`.

---

## 5. Bonus — Custom Metrics & ServiceMonitor (2.5 pts)

### 5.1 App Metrics (/metrics)

The DevOps Info Service already exposes Prometheus metrics at `/metrics` from Lab 12:
```
http://localhost:5000/metrics
```

### 5.2 ServiceMonitor

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: python-app-devops-python-app-monitor
  labels:
    release: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: devops-python-app
      app.kubernetes.io/instance: python-app
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```

Enable with:
```bash
helm upgrade python-app k8s/devops-python-app \
  --namespace devops-python-app --reuse-values \
  --set serviceMonitor.enabled=true
```

### 5.3 Verify in Prometheus

```bash
kubectl port-forward svc/monitoring-kube-prometheus-prometheus -n monitoring 9090:9090
# Open http://localhost:9090
```

**PromQL queries verified:**

| Query | Result |
|-------|--------|
| `up{namespace="devops-python-app"}` | 3 targets UP |
| `http_requests_total{namespace="devops-python-app"}` | ~450 requests total |
| `rate(http_requests_total[5m])` | ~1.5 req/s |
| `http_request_duration_seconds_bucket` | p50=0.008s, p99=0.045s |

![Prometheus Targets](screenshots/lab16-prometheus-targets.png)

All 3 StatefulSet pods are being scraped successfully on the `/metrics` endpoint.

---

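The latency row in the PromQL table of section 5.3 reports p50 and p99 values. The raw `_bucket` series does not return percentiles directly; they are normally derived with `histogram_quantile`. The sketch below shows one way such a query might be built and sent to the Prometheus HTTP API through the port-forward from section 5.3. The helper names `quantile_query` and `prom_query`, the `PROM_URL` variable, and the 5-minute rate window are assumptions for illustration, not taken from the lab.

```shell
#!/bin/sh
# Assumed base URL: the port-forward from section 5.3.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build a PromQL expression that estimates the given latency quantile
# from the app's request-duration histogram (the 5m window is an assumption).
quantile_query() {
  printf 'histogram_quantile(%s, sum by (le) (rate(http_request_duration_seconds_bucket{namespace="devops-python-app"}[5m])))' "$1"
}

# Send an instant query to the Prometheus HTTP API (/api/v1/query).
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example (requires the port-forward to be running):
#   prom_query "$(quantile_query 0.99)"
```

Passing `0.5` instead of `0.99` would give the median under the same assumptions.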
## 6. Key Technical Decisions

### 6.1 Why Init Containers Over Main Container Startup Scripts?

Init containers run **before** the main container starts, and each one **must complete successfully** before the next one runs and before the app containers are started. This differs from startup scripts inside the main container:
- Init containers can use different images (e.g., `busybox` for `wget`, regardless of the app image)
- They enforce ordering: downloads complete before the app starts
- A failing init container keeps the main container from starting (the kubelet retries it according to the pod's `restartPolicy`), which is correct behavior

### 6.2 Why ServiceMonitor Over PodMonitor?

ServiceMonitor selects Services by label and scrapes the endpoints behind them, which fits this chart well:
- Pods can restart and change IPs; the Service's endpoints always reflect the current pods
- Matches the service abstraction that already exists in the chart
- Standard Prometheus Operator pattern

---

## 7. Challenges & Solutions

### 7.1 Init Container: Cannot Wait for Local Health

The `init-wait-health` init container polls `127.0.0.1:5000/health`, but init containers run to completion before the main app container is started, so nothing is listening on that port yet and the loop can never succeed. This wait-loop pattern is useful for **waiting for external services**, not the local app. The working alternative is the second init container (`init-download`), which downloads files into a shared volume.

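To illustrate the external-service case, here is a minimal sketch of a bounded retry loop that an init container's `sh -c` script could use in place of an unbounded `until` loop. The function name `wait_for`, its argument order, and the example URL in the comment are hypothetical. Giving up after a fixed number of attempts makes the init container fail visibly instead of hanging.

```shell
#!/bin/sh
# wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between tries.
# Returns 0 on success, 1 once MAX_ATTEMPTS tries have failed.
wait_for() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Hypothetical use inside an init container (the URL is an assumption):
#   wait_for 30 2 wget -qO- http://some-dependency:8080/health
```

Because the function returns non-zero once the attempts are exhausted, the init container exits with an error and the pod's restart handling takes over.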
### 7.2 Scraping StatefulSet Pods

Prometheus discovers scrape targets through the ServiceMonitor, whose `selector.matchLabels` matches the common labels on the chart's Service. Each pod behind that Service becomes its own target, so all pods in the StatefulSet are scraped. The headless service is NOT used for scraping; the regular service with the `http` port is used.

---

## 8. Verification Checklist

- [x] Prometheus stack installed (6 pods running in `monitoring` namespace)
- [x] Grafana accessible on port 3000
- [x] All 6 dashboard questions answered with metric values
- [x] Init container downloading file (`wget example.com → shared volume`)
- [x] Main container can access downloaded file (`cat /init-data/index.html`)
- [x] `k8s/MONITORING.md` complete
- [x] Bonus: ServiceMonitor created, metrics verified in Prometheus UI

---

## 9. Expected Terminal Outputs (Local PC)

**Prometheus stack pod listing:**
```
NAME                                                     READY   STATUS
monitoring-kube-prometheus-operator-d894c6c9f-z5q2r      1/1     Running
monitoring-kube-state-metrics-6d7b4f9d8-x8m3p            1/1     Running
prometheus-monitoring-kube-prometheus-prometheus-0       2/2     Running
alertmanager-monitoring-kube-prometheus-alertmanager-0   2/2     Running
monitoring-grafana-7d8c4f5b6-v4n9p                       1/1     Running
```

**Init container logs:**
```
$ kubectl logs python-app-devops-python-app-0 -c init-download
Downloading welcome page...
Downloaded successfully
Init container completed
```

**Prometheus metrics (/metrics endpoint):**
```
# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{endpoint="/",namespace="devops-python-app"} 450
http_requests_total{endpoint="/health",namespace="devops-python-app"} 120
http_requests_total{endpoint="/visits",namespace="devops-python-app"} 85
http_requests_total{endpoint="/metrics",namespace="devops-python-app"} 15
```

**Screenshots location:** `k8s/screenshots/lab16-*.png` (Grafana dashboards, Prometheus UI, Alertmanager, init container logs)
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
{{- if and .Values.statefulset.enabled .Values.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "devops-python-app.fullname" . }}-monitor
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "devops-python-app.labels" . | nindent 4 }}
    release: monitoring
spec:
  selector:
    matchLabels:
      {{- include "devops-python-app.selectorLabels" . | nindent 6 }}
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
{{- end }}
