By the end of this lab, you'll be able to:
- Navigate Container Insights — Cluster, Nodes, Controllers, and Containers perspectives
- Interpret cluster performance metrics — CPU, memory, node count, and pod count charts
- Write KQL queries — to search container logs and diagnose issues from Log Analytics
- Set up log-based alerts — to trigger on error patterns in application logs
⏱️ Estimated Time: ~20 minutes
Before starting, ensure you have:
- Azure Portal access
- `kubectl` (for validation commands)
- Completed Lab 6.2: Availability Tests
- AKS 1.33 cluster with Container Insights enabled (provisioned by Terraform in Lab 2)
- Log Analytics workspace connected to both the AKS cluster and Application Insights
AKS Cluster (1.33)
└── Container Insights add-on
    ├── Collects: container stdout/stderr logs
    ├── Collects: CPU/memory metrics per pod and node
    └── Sends to: Log Analytics Workspace (devopsjourneyapr2026-law)
        ├── Table: ContainerLog / ContainerLogV2
        ├── Table: KubePodInventory
        ├── Table: KubeNodeInventory
        └── Table: Perf (CPU/memory metrics)
💡 Container Insights is enabled by the `oms_agent` add-on in your Terraform AKS configuration (`oms_agent { log_analytics_workspace_id = ... }`). The same workspace also receives data from Application Insights, enabling cross-resource KQL queries.
- 🌐 Open the Azure Portal
  Go to portal.azure.com → Kubernetes services → `devopsjourneyapr2026aks`.
- 📊 Open Insights
  In the left pane → Monitoring → Insights.
  Alternatively: Azure Monitor → Containers → select your cluster from the multi-cluster view.
- 🔍 Verify data is flowing
  The default view shows four line charts. If they contain data, Container Insights is working correctly.
The Cluster tab shows aggregated performance metrics for the entire cluster.
The four performance charts:
| Chart | Description | Percentile Filters |
|---|---|---|
| Node CPU utilization % | Aggregate CPU across all nodes | Avg, Min, 50th, 90th, 95th, Max |
| Node memory utilization % | Aggregate memory across all nodes | Avg, Min, 50th, 90th, 95th, Max |
| Node count | Total, Ready, Not Ready node counts | Total, Ready, Not Ready |
| Active pod count | Pod count by status | Total, Pending, Running, Unknown, Succeeded, Failed |
🔍 What to look for:
- CPU > 80% sustained → consider scaling up node pools or adding nodes
- Memory > 85% sustained → memory pressure, risk of OOM evictions
- Not Ready nodes > 0 → node health issue, investigate immediately
- Pending pods > 0 for extended time → scheduling failure (insufficient resources)
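The CPU and memory thresholds above can also be spot-checked from the command line. The following is a minimal sketch, not part of the lab: `flag_hot_nodes` is a hypothetical helper that reads lines in the format printed by `kubectl top nodes --no-headers`, and `CPU_MAX` / `MEM_MAX` are illustrative variables defaulting to the 80% and 85% figures above. It evaluates a single snapshot, so it cannot tell you whether the load is sustained; the portal charts remain the better source for that.

```shell
# Flag nodes whose CPU% or MEMORY% exceeds a threshold.
# Input (stdin): lines in `kubectl top nodes --no-headers` format:
#   NAME  CPU(cores)  CPU%  MEMORY(bytes)  MEMORY%
# CPU_MAX and MEM_MAX are illustrative defaults, not Azure settings.
flag_hot_nodes() {
  awk -v cpu_max="${CPU_MAX:-80}" -v mem_max="${MEM_MAX:-85}" '
    {
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem)   # strip the trailing percent sign
      if (cpu + 0 > cpu_max) printf "%s: CPU %d%% exceeds %d%%\n", $1, cpu, cpu_max
      if (mem + 0 > mem_max) printf "%s: memory %d%% exceeds %d%%\n", $1, mem, mem_max
    }'
}

# Against a live cluster (requires metrics-server):
#   kubectl top nodes --no-headers | flag_hot_nodes
```

No output means every node is below both thresholds at the moment the command ran.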
- 🖥️ Switch to Nodes tab
  Click the Nodes tab at the top.
- 🔍 What to review:
  - Expand any node to see the pods running on it
  - Click a pod to see its CPU/memory usage over time
  - Identify which pods are consuming the most resources on each node
- 📋 Useful insight: If your Flask app pod shows high memory growth over time, this may indicate a memory leak. Correlate with Application Insights to find the root cause.
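To cross-check what the Nodes tab shows, the heaviest pods can be ranked from `kubectl top pods` output. This is a rough sketch under stated assumptions: `top_memory_pods` is a hypothetical helper name, and it assumes memory is reported in `Mi`, which is how `kubectl top` usually prints it.

```shell
# Print the N pods with the highest memory usage (default: 3).
# Input (stdin): lines in `kubectl top pods --no-headers` format:
#   NAME  CPU(cores)  MEMORY(bytes)
# Assumes the memory column uses the Mi suffix.
top_memory_pods() {
  n="${1:-3}"
  awk '{ mem = $3; sub(/Mi$/, "", mem); print mem + 0, $1 }' |
    sort -rn |
    head -n "$n" |
    awk '{ printf "%s %sMi\n", $2, $1 }'
}

# Against a live cluster (requires metrics-server):
#   kubectl top pods -n thomasthorntoncloud --no-headers | top_memory_pods 5
```

Running this a few times over an hour gives a crude view of memory growth; the per-pod chart in Container Insights shows the same trend with history.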
- 📦 Switch to Containers tab
  Click the Containers tab.
- 🔍 Filter by namespace:
  - Set Namespace filter to `thomasthorntoncloud`
  - Find your Flask application container
- 📋 Click the container name to open:
  - Live container logs (last 1000 lines)
  - CPU and memory charts for that specific container
💡 Live logs from this view use the Container Insights log stream, which is useful for quick debugging without `kubectl logs`.
Container Insights stores all data in Log Analytics — you can query it with KQL.
- 🔍 Open Log Analytics
  Azure Portal → Log Analytics workspaces → `devopsjourneyapr2026-law` → Logs.
- 📋 Useful KQL queries:
Pod status over the last hour:

    KubePodInventory
    | where TimeGenerated > ago(1h)
    | where Namespace == "thomasthorntoncloud"
    | summarize count() by PodStatus, bin(TimeGenerated, 5m)
    | render timechart

Application container logs (last 30 minutes):

    ContainerLogV2
    | where TimeGenerated > ago(30m)
    | where PodNamespace == "thomasthorntoncloud"
    | where ContainerName contains "devopsjourney"
    | project TimeGenerated, LogMessage, PodName
    | order by TimeGenerated desc

Pod restart count (identify crash-looping pods). Note that in KubePodInventory the pod name is stored in the Name column:

    KubePodInventory
    | where TimeGenerated > ago(1h)
    | where Namespace == "thomasthorntoncloud"
    | where PodRestartCount > 0
    | summarize max(PodRestartCount) by Name, ContainerName
    | order by max_PodRestartCount desc

Node CPU usage over time (in nanocores, not percent):

    Perf
    | where TimeGenerated > ago(1h)
    | where ObjectName == "K8SNode"
    | where CounterName == "cpuUsageNanoCores"
    | summarize avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
    | render timechart

Find OOMKilled containers:

    KubePodInventory
    | where TimeGenerated > ago(24h)
    | where ContainerStatusReason == "OOMKilled"
    | project TimeGenerated, PodName = Name, ContainerName, ContainerStatusReason
    | order by TimeGenerated desc
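The same queries can be run outside the portal. The sketch below is an illustration rather than part of the lab: `kql_pod_restarts` is a hypothetical helper that prints a pod-restart query for a given namespace and look-back window, using the `Name` column that holds the pod name in `KubePodInventory`. The commented invocation assumes the `az monitor log-analytics query` command is available in your Azure CLI and that `WORKSPACE_GUID` holds the workspace's customer ID, which you would need to look up yourself.

```shell
# Print a KQL query listing pods with restarts in a namespace.
#   $1 = Kubernetes namespace
#   $2 = look-back window in KQL timespan syntax (default: 1h)
# kql_pod_restarts is a hypothetical helper name.
kql_pod_restarts() {
  ns="$1"
  window="${2:-1h}"
  cat <<EOF
KubePodInventory
| where TimeGenerated > ago(${window})
| where Namespace == "${ns}"
| where PodRestartCount > 0
| summarize max(PodRestartCount) by Name, ContainerName
| order by max_PodRestartCount desc
EOF
}

# Possible invocation (requires `az login`; WORKSPACE_GUID is an assumption):
#   az monitor log-analytics query \
#     --workspace "$WORKSPACE_GUID" \
#     --analytics-query "$(kql_pod_restarts thomasthorntoncloud 1h)" \
#     -o table
```

Generating the query text from a function keeps the namespace and time window in one place, which helps if the same check is later reused in a pipeline step.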
Alert when your application logs contain ERROR messages.
- 🔔 Create alert from Log Analytics
  In Log Analytics → Logs → paste this query:

      ContainerLogV2
      | where TimeGenerated > ago(5m)
      | where PodNamespace == "thomasthorntoncloud"
      | where ContainerName contains "devopsjourney"
      | where LogMessage contains "ERROR"
      | count

- ➕ Click New alert rule

  | Setting | Value |
  |---|---|
  | Condition | Greater than 0 |
  | Frequency | Every 5 minutes |
  | Severity | Sev 2 (Warning) |
  | Action | Email notification via Action Group |

- 💾 Save the alert rule
Container Insights checklist:
- Cluster tab shows CPU, memory, node count, and pod count charts with data
- Containers tab shows pods in `thomasthorntoncloud` namespace
- Nodes tab shows all nodes in Ready state
Technical validation:
# Verify Container Insights add-on is running
kubectl get pods -n kube-system | grep -i "omsagent\|ama-"
# Expected: 1-2 omsagent or ama-logs pods in Running state
# Verify all nodes are Ready
kubectl get nodes
# Expected: All nodes STATUS = Ready
# Verify Flask app pods are Running in the correct namespace
kubectl get pods -n thomasthorntoncloud
# Expected: pods with STATUS = Running
# Check pod resource usage (requires metrics-server)
kubectl top pods -n thomasthorntoncloud
kubectl top nodes
✅ Expected Output:
NAME STATUS ROLES AGE VERSION
aks-nodepool1-... Ready <none> 5d v1.33.x
NAME READY STATUS RESTARTS AGE
devopsjourney-app-<hash> 1/1 Running 0 2d
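The manual checks above can be condensed into two small filters. This is a sketch under stated assumptions, not the lab's own tooling: `count_not_ready_nodes` and `list_unhealthy_pods` are hypothetical helper names that parse the default column layout of `kubectl get nodes --no-headers` and `kubectl get pods --no-headers`. A node with a status such as `Ready,SchedulingDisabled` is counted as not ready, and completed Job pods would also be listed as unhealthy, so treat the output as a prompt for investigation rather than a verdict.

```shell
# Count nodes whose STATUS column is not exactly "Ready".
# Input (stdin): `kubectl get nodes --no-headers` lines
#   NAME  STATUS  ROLES  AGE  VERSION
count_not_ready_nodes() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}

# List pods that are not Running or that have restarted at least once.
# Input (stdin): `kubectl get pods --no-headers` lines
#   NAME  READY  STATUS  RESTARTS  AGE
list_unhealthy_pods() {
  awk '$3 != "Running" || $4 + 0 > 0 { print $1, $3, "restarts=" $4 }'
}

# Against a live cluster:
#   kubectl get nodes --no-headers | count_not_ready_nodes
#   kubectl get pods -n thomasthorntoncloud --no-headers | list_unhealthy_pods
```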
🔧 Troubleshooting
Common issues:
# Problem: Container Insights shows no data / "No data for selected time range"
# Solution 1: Check the OMS Agent / AMA add-on pods
kubectl get pods -n kube-system | grep -E "omsagent|ama-logs"
# Expected: Running pods; if not, the add-on may not be enabled
# Solution 2: Verify the AKS cluster has Container Insights enabled
az aks show --resource-group devopsjourneyapr2026 \
--name devopsjourneyapr2026aks \
--query "addonProfiles.omsAgent" -o json
# Expected: { "enabled": true, "config": { "logAnalyticsWorkspaceResourceID": "..." } }
# Problem: KQL query returns no results for ContainerLogV2
# Solution: Older clusters use ContainerLog (V1); try this table name instead
# ContainerLogV2 is the default for clusters created/upgraded after Dec 2022
# Problem: KubePodInventory table is empty
# Solution: Check if diagnostic settings are enabled on the AKS cluster
az monitor diagnostic-settings list \
--resource $(az aks show -g devopsjourneyapr2026 -n devopsjourneyapr2026aks --query id -o tsv) \
-o table
# Problem: Pod shows high memory but application seems normal
# Solution: Check if memory limit is set in the Kubernetes manifest
kubectl describe pod -n thomasthorntoncloud <pod-name> | grep -A5 "Limits\|Requests"
- Container Insights is for infrastructure; Application Insights is for application code. Together they give you the full picture. Container Insights shows CPU/memory/pod health; Application Insights shows HTTP traces, exceptions, and user behaviour. The shared Log Analytics workspace lets you join both datasets in a single KQL query.
- `KubePodInventory` is historical; `kubectl get pods` only shows current state. If a pod crashed and restarted hours ago, `kubectl` shows Running; `KubePodInventory` shows the full lifecycle including crash timestamps and `ContainerStatusReason`. Essential for post-incident analysis.
- `OOMKilled` means the container exceeded its memory limit and was forcibly terminated by the OS. Fix by increasing the memory limit in the deployment manifest, profiling the app for leaks, or scaling horizontally.
- Sharing a Log Analytics workspace enables cross-resource KQL queries: you can correlate application errors (from Application Insights: `requests`/`exceptions`) with pod crashes (from Container Insights: `KubePodInventory`) in a single query. This is a key advantage of workspace-based Application Insights over classic.
Congratulations 🎉 — you've completed the full DevOps Journey Using Azure DevOps lab series!
You now have a production-grade pipeline that provisions infrastructure with Terraform, builds and pushes container images to ACR, deploys to AKS with zero-touch CI/CD, and monitors everything with Application Insights and Container Insights.
Lab series summary:
- Set up Azure DevOps with Workload Identity Federation
- Provisioned AKS infrastructure with Terraform
- Created AD admin group for AKS RBAC
- Built and pushed the Flask app to ACR
- Configured Key Vault with App Insights connection string
- Deployed the app to AKS with the full pipeline
- Implemented CI/CD with branch triggers
- Monitored the app with Application Insights
- Configured availability tests
- Analysed container metrics with Container Insights and KQL