🧪 Lab 24: Simulated Incident Drill

📝 Lab Summary

This lab focused on simulating operational incidents and responding with repeatable troubleshooting, documentation, and recovery workflows. It combined service validation, failure injection, runbook usage, and post-incident reporting in a realistic Linux environment.

🎯 Objectives

Practice structured incident response procedures
Trigger and diagnose common system failures
Follow runbook procedures for incident resolution
Document incident response activities professionally
Develop muscle memory for troubleshooting workflows

📌 Prerequisites

Basic Linux command line proficiency
Understanding of system services and processes
Familiarity with log file locations
Basic scripting knowledge (Bash or Python)
Understanding of web servers and databases

🖥️ Lab Environment

Platform: Ubuntu 24.04 LTS cloud lab environment
User: toor
Host: ip-172-31-10-184
Shell: Bash

🛠️ Task Overview

Task 1: Set Up Baseline Services and Incident Scenarios

Set Up Baseline Services
Create a simple web application
Create Incident Trigger Scripts
Create script to simulate disk space issue
Create script to simulate service crash
Create script to simulate high CPU load
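The disk space trigger above can be sketched as a bounded, reversible operation. This is a minimal illustration in Python, not the lab's `trigger-disk-full.sh`; the file name `drill-filler.bin`, the function names, and the 512 MiB cap are all assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path

FILLER_NAME = "drill-filler.bin"  # hypothetical name, not taken from the lab


def trigger_disk_fill(target_dir: Path, size_mb: int, max_mb: int = 512) -> Path:
    """Write a filler file of size_mb MiB into target_dir and return its path."""
    # Refuse unbounded fills so a drill cannot take down the host by accident.
    if not 0 < size_mb <= max_mb:
        raise ValueError(f"size_mb must be between 1 and {max_mb}")
    filler = target_dir / FILLER_NAME
    chunk = b"\0" * (1024 * 1024)  # one MiB of zero bytes
    with open(filler, "wb") as fh:
        for _ in range(size_mb):
            fh.write(chunk)
        fh.flush()
        os.fsync(fh.fileno())  # push the data to disk before returning
    return filler


def cleanup_disk_fill(target_dir: Path) -> bool:
    """Remove the filler file; return True if something was deleted."""
    filler = target_dir / FILLER_NAME
    if filler.exists():
        filler.unlink()
        return True
    return False


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = trigger_disk_fill(Path(tmp), size_mb=5)
        print(f"created {path} ({path.stat().st_size} bytes)")
        print("cleaned up:", cleanup_disk_fill(Path(tmp)))
```

Pairing every trigger with an explicit cleanup function keeps the drill repeatable: the responder can always return to the baseline state, even if the runbook step that frees space is skipped.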

Task 2: Create Incident Scenarios and Runbooks

Create Runbook Templates
Create master runbook
Create specific runbook for disk space
Create runbook for service failures
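A disk space runbook usually opens with a detection step that turns raw usage into a severity. The sketch below shows one way to do that in Python; the 80% and 90% thresholds and the names `classify_usage` and `check_path` are assumptions for illustration and are not taken from `runbook-disk-space.md`.

```python
import shutil
from typing import NamedTuple

WARN_PERCENT = 80.0  # assumed warning threshold
CRIT_PERCENT = 90.0  # assumed critical threshold


class DiskStatus(NamedTuple):
    percent_used: float
    severity: str


def classify_usage(used: int, total: int,
                   warn: float = WARN_PERCENT,
                   crit: float = CRIT_PERCENT) -> DiskStatus:
    """Map used/total bytes to 'ok', 'warning', or 'critical'."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = round(used / total * 100, 1)
    if percent >= crit:
        severity = "critical"
    elif percent >= warn:
        severity = "warning"
    else:
        severity = "ok"
    return DiskStatus(percent, severity)


def check_path(path: str = "/") -> DiskStatus:
    """Classify the filesystem that contains path."""
    # Note: this ratio can differ slightly from the Use% column of df,
    # which accounts for blocks reserved for root.
    usage = shutil.disk_usage(path)
    return classify_usage(usage.used, usage.total)


if __name__ == "__main__":
    status = check_path("/")
    print(f"/ is {status.percent_used}% used: {status.severity}")
```

Keeping the classification separate from the measurement makes the thresholds easy to review and lets the same function be reused in the post-incident verification step.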

Task 3: Execute Incident Response Drill

Create Incident Documentation System
Create Drill Execution Script
Practice Full Incident Response
Follow runbook
Resolve incident
Post-incident verification
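The documentation system in this task amounts to an append-only timeline of what happened and when. The following is a minimal sketch of that idea, not the contents of the lab's `incident_logger.py`; the class name `IncidentLog`, the phase names, and the JSON Lines format are assumptions chosen for the example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional

# Assumed phase vocabulary; adjust to match the runbook in use.
PHASES = ("detected", "investigating", "mitigated", "resolved", "verified")


class IncidentLog:
    """Append-only incident timeline stored as one JSON object per line."""

    def __init__(self, incident_id: str, log_path: Path):
        self.incident_id = incident_id
        self.log_path = Path(log_path)

    def record(self, phase: str, note: str,
               when: Optional[datetime] = None) -> Dict[str, str]:
        """Append one timestamped entry and return it."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        stamp = when or datetime.now(timezone.utc)
        entry = {
            "incident_id": self.incident_id,
            "timestamp": stamp.isoformat(),
            "phase": phase,
            "note": note,
        }
        with open(self.log_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
        return entry

    def entries(self) -> List[Dict[str, str]]:
        """Read back every entry recorded so far, in order."""
        if not self.log_path.exists():
            return []
        with open(self.log_path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def time_to_resolve(self) -> Optional[float]:
        """Seconds from the first 'detected' to the first 'resolved' entry."""
        first: Dict[str, datetime] = {}
        for entry in self.entries():
            first.setdefault(entry["phase"],
                             datetime.fromisoformat(entry["timestamp"]))
        if "detected" in first and "resolved" in first:
            return (first["resolved"] - first["detected"]).total_seconds()
        return None
```

A drill script could call `record("detected", ...)` when the trigger fires and `record("resolved", ...)` once the runbook is complete, then report `time_to_resolve()` in the post-incident review.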

📁 Repository Structure

lab24-simulated-incident-drill/
├── README.md
├── artifacts/
│   └── drill-checklist.txt
├── commands.sh
├── configs/
│   └── nginx/
│       └── incident-app.conf
├── interview_qna.md
├── output.txt
├── runbooks/
│   ├── post-incident-template.md
│   ├── runbook-disk-space.md
│   ├── runbook-service-failure.md
│   └── runbook-template.md
├── scripts/
│   ├── execute-drill.sh
│   ├── incident_logger.py
│   ├── trigger-cpu-spike.sh
│   ├── trigger-disk-full.sh
│   └── trigger-service-crash.sh
├── troubleshooting.md
└── web/
    └── incident-app/
        └── index.html

✅ Verification & Validation

Confirmed the environment and toolchain were installed correctly
Validated the core workflow with command execution and captured outputs
Preserved scripts, configuration files, and supporting artifacts used during the lab
Documented common failure paths and remediation steps in the troubleshooting guide

📚 What I Learned

How to prepare incident runbooks before failures happen
How to document baseline, detection, remediation, and verification clearly
How to restore service safely after intentional failure injection
Why post-incident validation matters as much as the initial fix

🌍 Why This Matters

Structured incident drills build operational confidence before real outages happen and reduce response friction when services fail unexpectedly.

🚀 Real-World Applications

Incident response drills
Production service recovery
Ops runbook design
Post-incident review workflows

🔎 Real-World Relevance

The workflow in this lab maps well to practical cloud, DevOps, software assurance, and security operations responsibilities where repeatable procedures and evidence-backed validation matter.

✅ Result

A complete simulated incident workflow was documented, executed, and verified, and the affected service was restored to a healthy state.

🏁 Conclusion

You have successfully created and executed a simulated incident response drill. In this walkthrough, you practiced:

setting up realistic incident scenarios
creating and following structured runbooks
documenting incident response activities
executing investigation and resolution procedures
performing post-incident analysis

These are the core DevOps and operations skills the lab is intended to build: structured troubleshooting, disciplined documentation, and recovery validation.
