Flask REST API — End-to-End DevOps Project

Production-style DevOps reference architecture built around a Flask + PostgreSQL REST API. It covers the full lifecycle, from local development to Kubernetes orchestration on a multi-node Minikube cluster, with GitOps and observability.

Audience: This documentation is structured for engineers with 3-5 years of experience preparing for DevOps / SRE interviews. Each module covers the implementation, the concepts behind it, troubleshooting drawn from real issues we hit, interview Q&A, STAR stories, and how it maps to managed cloud services.

High-Level Architecture

                                ┌──────────────────────┐
                                │   Developer pushes   │
                                │   code to GitHub     │
                                └──────────┬───────────┘
                                           │
                                           ▼
   ┌────────────────────── CI Pipeline (GitHub Actions) ────────────────────────┐
   │                                                                            │
   │  build job:                                                                │
   │   • Run unit tests (pytest)                                                │
   │   • Build Docker image                                                     │
   │   • Push to DockerHub (tagged with commit SHA)                             │
   │                                                                            │
   │  update-helm job:                                                          │
   │   • sed updates helm/application/values.yaml with new image tag            │
   │   • Commits & pushes to main branch                                        │
   │                                                                            │
   └────────────────────────────────────┬───────────────────────────────────────┘
                                        │ git push main
                                        ▼
   ┌─────────────────── ArgoCD (GitOps Controller in K8s) ──────────────────────┐
   │                                                                            │
   │  Detects values.yaml diff → renders Helm chart → applies new manifests     │
   │  → Kubernetes does rolling update                                          │
   │                                                                            │
   └────────────────────────────────────┬───────────────────────────────────────┘
                                        │
                                        ▼
   ┌──────────────────── 3-Node Minikube Cluster (Production-like) ───────────┐
   │                                                                          │
   │  ┌─────────────────┐  ┌─────────────────┐  ┌──────────────────────────┐  │
   │  │ App Tier        │  │ Database Tier   │  │ Dependent Services Tier  │  │
   │  │ (minikube)      │  │ (minikube-m02)  │  │ (minikube-m03)           │  │
   │  │                 │  │                 │  │                          │  │
   │  │ • Flask API ×3  │  │ • Postgres      │  │ • Vault                  │  │
   │  │                 │  │                 │  │ • External Secrets Op    │  │
   │  │                 │  │                 │  │ • Prometheus + AM        │  │
   │  │                 │  │                 │  │ • Grafana                │  │
   │  │                 │  │                 │  │ • Loki                   │  │
   │  │                 │  │                 │  │ • Promtail (DS)          │  │
   │  │                 │  │                 │  │ • Postgres exporter      │  │
   │  │                 │  │                 │  │ • Blackbox exporter      │  │
   │  └─────────────────┘  └─────────────────┘  └──────────────────────────┘  │
   │                                                                          │
   └────────────────────────────────────┬─────────────────────────────────────┘
                                        │ Slack alerts
                                        ▼
                              ┌───────────────────┐
                              │   #alerts channel │
                              └───────────────────┘

Tech Stack

Layer	Tools
Application	Flask 3 + SQLAlchemy + Flask-Migrate + Gunicorn + PostgreSQL 15
Testing	pytest (unit) + Locust (load)
Containerization	Docker (multi-stage build) + Docker Compose (local stack) + nginx (reverse proxy)
CI	GitHub Actions on a self-hosted runner; SHA-based image tagging; auto-update Helm values
IaC	Terraform (AWS VPC, EC2, ALB) + Ansible (system bootstrapping) — written, not deployed
Orchestration	Kubernetes via Minikube (3-node cluster mimicking multi-AZ)
Secrets	HashiCorp Vault + External Secrets Operator (ESO)
Packaging	Helm 3 charts for every component
GitOps	ArgoCD with App-of-Apps pattern + multi-source pattern for upstream charts
Observability	Prometheus + Grafana + Loki + Promtail + Alertmanager + exporters
Alerting	Alertmanager → Slack via Incoming Webhooks

Module Index

The documentation is structured as a curriculum. Read in order for the full picture, or jump to the topic you need.

Module 1 — Local Application Setup

Goal: Get the Flask API running locally with venv + Postgres + migrations + seed data.

Tech stack & architecture
Step-by-step walkthrough with the why for each step
12 interview Q&A on Python venvs, WSGI, migrations, secrets, connection pooling
2 STAR stories — moving the project broke the venv, AirPlay port conflict
Production hardening + AWS mapping

Module 1B — Application Testing

Goal: Unit tests with pytest + in-memory SQLite, load tests with Locust.

The test pyramid + why in-memory SQLite for unit tests
pytest fixture pattern + setup/teardown
Locust scenarios + headless CI mode
14 interview Q&A on test pyramid, RED method, contract testing, load test interpretation
2 STAR stories — duplicated Prometheus registry breaking tests, finding the throughput limit

Module 2 — Containerization (Docker + Compose)

Goal: Package the app as a Docker image; orchestrate the multi-service stack with Compose.

Multi-stage Dockerfile (build vs main; image size 80 MB vs 400 MB)
Layer caching, EXPOSE vs port mapping, CMD vs ENTRYPOINT
Compose deep dive: networking, healthchecks, depends_on, volumes
14 troubleshooting issues — including the Gunicorn 127.0.0.1 binding bug (a server bound to loopback is unreachable through the container's published port)
20 interview Q&A — containers vs VMs, layers, distroless, signal handling
3 STAR stories — debugging container networking, image optimization, port conflicts

Module 3 — CI Pipeline with GitHub Actions

Goal: On every push, run tests → build image → push to DockerHub → update Helm values in main.

Self-hosted vs GitHub-hosted runners (when to use which)
Pipeline walkthrough — build and update-helm jobs
Cross-platform sed, GH_PAT scopes, secret management
The CI → GitOps handoff
14 troubleshooting issues — setup-python permission errors, push protection, branch confusion
20 interview Q&A — CI vs CD, OIDC, matrix builds, blue/green
3 STAR stories — setup-python mac issue, Slack webhook leak, CI pushing to wrong branch
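The cross-platform sed step in the update-helm job can be sketched as follows. This is a minimal illustration, not the project's actual workflow: the image.tag layout of values.yaml, the update_image_tag function name, and the sample file contents are assumptions. It relies on sed -i.bak, which both GNU sed (Linux runners) and BSD sed (macOS self-hosted runners) accept; a bare -i is parsed differently by the two.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the CI "update-helm" step: rewrite the image tag in a
# Helm values file. Assumes the file contains a line like '  tag: "<sha>"'.
# Note: the pattern rewrites every line whose key is "tag:".
set -euo pipefail

update_image_tag() {
  local values_file="$1" new_tag="$2"
  # -i.bak works on both GNU sed and BSD sed; remove the backup afterwards.
  sed -i.bak -E "s/^([[:space:]]*tag:[[:space:]]*).*/\1\"${new_tag}\"/" "$values_file"
  rm -f "${values_file}.bak"
}

# Demo on a throwaway file standing in for helm/application/values.yaml.
tmp_values="$(mktemp)"
cat > "$tmp_values" <<'EOF'
image:
  repository: example/flask-api
  tag: "old-sha"
replicaCount: 3
EOF

# In CI the tag would typically come from the commit SHA, e.g. "${GITHUB_SHA}"
# or "$(git rev-parse --short HEAD)"; a fixed value keeps the demo standalone.
update_image_tag "$tmp_values" "abc1234"
grep 'tag:' "$tmp_values"
rm -f "$tmp_values"
```

Anchoring the pattern on the key rather than on the old tag value means the job does not need to know which SHA is currently deployed.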

Module 4 — Infrastructure as Code (Terraform & Ansible)

Goal: Provision AWS infra (VPC, subnets, NAT, ALB, EC2) with Terraform; configure machines with Ansible.

for_each vs count (with the index-shifting trap)
State management — local vs S3 + DynamoDB locking
Drift detection (apply -refresh-only vs apply)
Modules, workspaces, backends
24 deep Terraform troubleshooting scenarios — state lock recovery, drift, RDS replacement traps, EIP costs, rate limits
4 production scenario deep-dives — manually deleted IAM role, CloudFormation migration, leaked tfstate, concurrent applies
32 interview Q&A across Terraform + Ansible
3 STAR stories — state recovery via S3 versioning, RDS rename trap, $4K/mo cost cleanup
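The count index-shifting trap can be illustrated without running Terraform. The sketch below is a plain-shell simulation, and the subnet names and the aws_subnet.this address are made up for illustration. With count, each resource address is its list position, so removing a middle element renumbers everything after it. With for_each, the address is the key, so untouched resources keep their addresses.

```shell
#!/usr/bin/env bash
# Shell simulation (not Terraform) of resource addresses before and after an
# element is removed from the input list. Names are illustrative only.
set -euo pipefail

# Addresses as count would assign them: by list position.
count_addresses() {
  local i=0 item
  for item in "$@"; do
    echo "aws_subnet.this[${i}] => ${item}"
    i=$((i + 1))
  done
}

# Addresses as for_each would assign them: by key (here, the value itself).
for_each_addresses() {
  local item
  for item in "$@"; do
    echo "aws_subnet.this[\"${item}\"] => ${item}"
  done
}

before=(subnet-a subnet-b subnet-c)
after=(subnet-a subnet-c)   # subnet-b removed from the middle

echo "count, before:";   count_addresses "${before[@]}"
echo "count, after:";    count_addresses "${after[@]}"
# [1] now maps to subnet-c instead of subnet-b and [2] disappears, so Terraform
# would plan to update or replace [1] and destroy [2]: churn on subnet-c even
# though only subnet-b was removed.

echo "for_each, after:"; for_each_addresses "${after[@]}"
# Only the "subnet-b" address disappears; subnet-a and subnet-c are untouched.
```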

Module 5 — Kubernetes Orchestration

Goal: Deploy Vault, ESO, Postgres, Flask onto a 3-node minikube cluster.

3-node architecture with workload-to-node placement (type=application/database/dependent_services)
Vault deployment, init/unseal flow, KV-v2
ESO architecture + setup + force-sync pattern
Deep concepts (the bulk of the doc):
- Networking & CoreDNS — full query flow, ndots:5, Service types, kube-proxy modes
- Storage — PV/PVC/StorageClass, access modes, reclaim policies
- Workloads — Deployment vs StatefulSet vs DaemonSet
- Probes — liveness vs readiness vs startup
- Rollouts & rollbacks — RollingUpdate vs Recreate, maxSurge math
- Autoscaling — HPA + VPA + Cluster Autoscaler + KEDA with full YAMLs
- NetworkPolicies (with DNS gotcha)
- RBAC — Role vs ClusterRole
- Operators & CRDs — ESO walkthrough as the canonical example
~50 interview Q&A across architecture / networking / storage / workloads / probes / autoscaling / secrets / operators / scenarios
4 STAR stories — pod-to-pod debug, stuck namespace, PVC permissions, HPA implementation
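The maxSurge math can be checked by hand. Per the Kubernetes Deployment documentation, a percentage maxSurge is rounded up, a percentage maxUnavailable is rounded down, and both default to 25%. The helper below is a sketch that applies those rounding rules with integer arithmetic; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of RollingUpdate bounds for percentage-based maxSurge/maxUnavailable.
# Kubernetes rounds maxSurge UP and maxUnavailable DOWN.
set -euo pipefail

ceil_pct()  { echo $(( ($1 * $2 + 99) / 100 )); }  # ceil(replicas * pct / 100)
floor_pct() { echo $(( $1 * $2 / 100 )); }          # floor(replicas * pct / 100)

rollout_bounds() {
  local replicas="$1" surge_pct="$2" unavail_pct="$3"
  local surge unavail
  surge="$(ceil_pct "$replicas" "$surge_pct")"
  unavail="$(floor_pct "$replicas" "$unavail_pct")"
  echo "maxSurge=${surge} maxUnavailable=${unavail} maxPods=$((replicas + surge)) minAvailable=$((replicas - unavail))"
}

# The 3-replica Flask API with the default 25% / 25% strategy:
rollout_bounds 3 25 25
# prints: maxSurge=1 maxUnavailable=0 maxPods=4 minAvailable=3
```

With 3 replicas, 25% is 0.75 pods: surge rounds up to 1 and unavailability rounds down to 0, so the rollout adds one new pod at a time and never drops below 3 available pods.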

Module 6 — GitOps with Helm + ArgoCD

Goal: Package K8s manifests as Helm charts; deploy via ArgoCD using the App-of-Apps pattern.

Why GitOps (push vs pull)
Helm deep dive — Chart.yaml, templates, hooks, helpers, sub-charts
ArgoCD deep dive — Application CRD, sync policies, App-of-Apps, multi-source pattern, sync waves
The full CI → GitOps → Deploy loop
14 troubleshooting issues — CRD version mismatches, ConfigMap changes not restarting pods, sync errors
35 interview Q&A — GitOps principles, Helm internals, ArgoCD architecture, AppProjects, ApplicationSet
4 STAR stories — adopting GitOps, ConfigMap checksum trick, CRD version mismatch, selfHeal saving the day
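The ConfigMap checksum trick rests on one fact: Kubernetes only rolls pods when the pod template changes, and editing a ConfigMap does not change the template. Helm charts commonly add a pod-template annotation such as checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}, so any config change alters the annotation and triggers a rolling update. Helm hashes the rendered template; the shell sketch below only demonstrates the hashing idea on a temporary file, and the LOG_LEVEL content is made up.

```shell
#!/usr/bin/env bash
# Why a content hash works as a restart trigger: identical config gives an
# identical checksum, and any edit gives a different one.
set -euo pipefail

# sha256 of a file, using sha256sum (GNU coreutils) or shasum (macOS).
config_checksum() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

cfg="$(mktemp)"
printf 'LOG_LEVEL: info\n' > "$cfg"
before="$(config_checksum "$cfg")"

printf 'LOG_LEVEL: debug\n' > "$cfg"   # simulate editing the ConfigMap data
after="$(config_checksum "$cfg")"

if [ "$before" != "$after" ]; then
  echo "checksum changed -> pod template annotation changes -> rolling update"
else
  echo "checksum unchanged -> pods keep running with the old config"
fi
rm -f "$cfg"
```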

Module 7 — Observability (Prometheus + Grafana + Loki + Alertmanager)

Goal: Build a full observability layer with metrics, logs, dashboards, and Slack alerts.

Three pillars (metrics, logs, traces); USE & RED methods
Component-by-component setup
Application instrumentation (prometheus-flask-exporter)
10 alert rules with severity + USE/RED classification
Pre-loaded Grafana dashboards (5 community dashboards)
Slack integration — using slack_api_url_file to keep the webhook URL out of Git
18 troubleshooting issues — PVC permission fixes, schema mismatches, cardinality issues
30+ interview Q&A across SLI/SLO/SLA, USE/RED, Prometheus internals, Loki vs ELK, Grafana, real scenarios
4 STAR stories — Prometheus permission debug, Slack webhook leak, true-positive alert, observability from scratch
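Cardinality problems come from multiplication: every unique label combination is its own time series, and a histogram multiplies that again by its bucket count plus the _sum and _count series. The sketch below estimates series counts for one histogram metric. The label value counts and the bucket count are illustrative assumptions, not measurements from this project. The point is the contrast between a templated route label and a raw path label such as /students/1, /students/2, and so on.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope series count for one histogram metric.
# All numbers below are illustrative assumptions.
set -euo pipefail

# usage: histogram_series <buckets> <label1_values> [<label2_values> ...]
# <buckets> includes the +Inf bucket; the "+2" accounts for _sum and _count.
histogram_series() {
  local buckets="$1"; shift
  local combos=1 n
  for n in "$@"; do
    combos=$((combos * n))
  done
  echo $(( (buckets + 2) * combos ))
}

# 4 methods x 5 templated routes x 6 status codes x 3 pods, 15 buckets:
histogram_series 15 4 5 6 3        # prints 6120
# Same metric if the path label held raw IDs for 10000 students:
histogram_series 15 4 10000 6 3    # prints 12240000
```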

Why This Order Matters

1. Local setup           → understand the app
   ↓
2. Containerize          → make it portable
   ↓
3. CI pipeline           → automate build + test + push
   ↓
4. IaC                   → provision infra reproducibly
   ↓
5. Kubernetes            → run it at scale
   ↓
6. GitOps                → declarative, audited deployments
   ↓
7. Observability         → see what's happening in production

Each layer builds on the previous one. The CI pipeline (3) is possible because the app (1) can be built into a container (2). Kubernetes (5) has something to run because CI produces an image artifact (3). GitOps (6) governs what runs on Kubernetes (5). Observability (7) closes the loop: you can finally see what the fully automated, orchestrated system is doing in real time. IaC (4) is written but not deployed in this project; the Kubernetes stack runs on local Minikube instead.

Quick Start

Prerequisites: macOS / Linux, Python 3.10+, Docker Desktop, kubectl, helm, minikube, and Homebrew (optional, for installing the tools).

1. Local app

git clone https://github.com/akhil27051999/Flask-REST-API.git
cd Flask-REST-API
python3 -m venv venv && source venv/bin/activate
pip install -r app/requirements.txt
# Configure .env (see Module 1) and run:
flask db upgrade
python app/seed.py
flask run

2. Containerized stack (Docker Compose)

export ENV_FILE=.env
docker compose up -d --build
docker exec flask-app-container flask db upgrade --directory app/migrations
docker exec -e PYTHONPATH=/api flask-app-container python /api/app/seed.py
curl http://localhost/students/3

3. Kubernetes stack

# Cluster
minikube start --nodes=3 --driver=docker --cpus=2 --memory=2048
kubectl label node minikube       type=application --overwrite
kubectl label node minikube-m02   type=database --overwrite
kubectl label node minikube-m03   type=dependent_services --overwrite

# Install ArgoCD
helm repo add argo https://argoproj.github.io/argo-helm
helm install argocd argo/argo-cd -n argocd --create-namespace

# Bootstrap everything via App-of-Apps
kubectl apply -f argocd/root-app.yaml

# Manual bootstrap steps (Vault unseal, vault-token secret) — see Module 5

4. Trigger the GitOps loop

Edit helm/application/values.yaml (e.g., bump replicas), commit, push:

git add helm/application/values.yaml
git commit -m "scale flask-api to 3"
git push origin main
# ArgoCD polls Git about every 3 min by default; to sync immediately instead:
kubectl patch application flask-api -n argocd --type merge \
  -p '{"operation":{"sync":{"revision":"main"}}}'
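Instead of waiting on the poll interval, a script can block until the Application settles after a push or a forced sync. The helper below is a hypothetical sketch: it reads .status.sync.status and .status.health.status from the ArgoCD Application resource through kubectl's jsonpath output, and assumes the Application is named flask-api in the argocd namespace, matching the patch command above. The wait_for_sync name, timeout, and interval are illustrative choices.

```shell
#!/usr/bin/env bash
# Poll an ArgoCD Application until it reports Synced + Healthy, or time out.
# Assumes kubectl points at the minikube cluster where ArgoCD runs.
set -euo pipefail

wait_for_sync() {
  local app="$1" ns="$2" timeout_s="${3:-300}" interval_s="${4:-5}"
  local waited=0 sync health
  while [ "$waited" -lt "$timeout_s" ]; do
    # "|| true" keeps set -e from aborting if the Application is not found yet.
    sync="$(kubectl get application "$app" -n "$ns" -o jsonpath='{.status.sync.status}' 2>/dev/null || true)"
    health="$(kubectl get application "$app" -n "$ns" -o jsonpath='{.status.health.status}' 2>/dev/null || true)"
    echo "sync=${sync:-unknown} health=${health:-unknown}"
    if [ "$sync" = "Synced" ] && [ "$health" = "Healthy" ]; then
      return 0
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  echo "timed out after ${timeout_s}s waiting for ${app}" >&2
  return 1
}

# Usage against a live cluster:
#   wait_for_sync flask-api argocd 300 5
```

Checking health as well as sync status matters: an Application can be Synced (manifests applied) while the new pods are still Progressing or failing readiness probes.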

Repository Layout

Flask-REST-API/
├── app/                    # Flask source code + Dockerfile + requirements.txt + migrations
├── tests/                  # pytest unit tests + Locust load tests
├── nginx/                  # nginx reverse proxy config + Dockerfile
├── docker-compose.yaml     # Local multi-service stack
├── .github/workflows/      # CI pipeline
├── terraform/              # AWS infrastructure (VPC, EC2, ALB, etc.)
├── ansible/                # Configuration management for VMs
├── k8s/                    # Raw K8s manifests (legacy/reference; see helm/ for current)
├── helm/                   # Helm charts for every component
│   ├── application/        # Flask app
│   ├── vault/              # HashiCorp Vault
│   ├── external-secrets/   # ESO + custom resources
│   ├── database/           # PostgreSQL
│   ├── prometheus/         # Prometheus + Alertmanager
│   ├── grafana/            # Grafana
│   ├── loki/               # Loki
│   ├── promtail/           # Promtail
│   ├── postgres-exporter/  # Postgres metrics exporter
│   └── blackbox-exporter/  # HTTP probe exporter
├── argocd/                 # ArgoCD Applications
│   ├── root-app.yaml       # The App-of-Apps that manages everything
│   ├── vault.yaml
│   ├── external-secrets.yaml
│   ├── database.yaml
│   ├── application.yaml
│   └── observability-*.yaml
└── docs/                   # This documentation (modules + images)

Skills Demonstrated

By building this project end-to-end, you practice the core tooling that a DevOps/SRE role at the 3-5 year level typically expects:

✅ Python web app with proper structure, migrations, testing
✅ Multi-stage Dockerfile with layer caching, alpine base, non-root patterns
✅ Docker Compose for local multi-service development
✅ CI/CD with GitHub Actions (self-hosted runner) → DockerHub → automatic Helm updates
✅ Terraform for AWS provisioning (VPC, subnets, NAT, SGs, ALB, EC2) with state, modules, lifecycle
✅ Ansible for configuration management (idempotent, role-based pattern)
✅ Kubernetes — multi-node cluster, node labels, all major workload types, networking, storage, RBAC, autoscaling
✅ HashiCorp Vault — initialization, unsealing, KV secrets engine
✅ External Secrets Operator — bridging Vault and K8s native Secrets
✅ Helm — chart structure, templating, hooks, releases
✅ ArgoCD — Applications, App-of-Apps, multi-source, sync policies, selfHeal
✅ Observability stack — Prometheus, Grafana, Loki, Promtail, Alertmanager, exporters
✅ PromQL + LogQL for queries
✅ Slack alerting with proper secret handling
✅ GitOps workflows — pull-based deploys, drift detection, rollbacks via git revert
✅ Real production troubleshooting — Gunicorn binding, fsGroup permissions, push protection, sync errors, state locking

Interview Prep Checklist

For each module, you should be able to:

Explain the architecture — what it does and why it's structured that way
Walk through one debugging story (use the STAR stories as templates)
Answer 5+ deep questions on the topic from memory
Sketch the data flow on a whiteboard
Discuss production hardening — what would change at scale
Map to cloud equivalents (AWS / GCP / Azure)

Contributing / Extending

This project is a learning + interview prep artifact. Suggested extensions to deepen further:

Add distributed tracing (Jaeger / Tempo + OpenTelemetry) for the third pillar
Add service mesh (Istio / Linkerd) for mTLS + traffic policies
Add chaos engineering (Litmus / Chaos Mesh) — kill pods during load tests
Migrate Postgres from Deployment → StatefulSet with HA replication
Add policy as code with OPA Gatekeeper / Kyverno
Add cert-manager + Ingress with TLS
Implement canary deployments with Argo Rollouts

License

MIT (or your license of choice).

Author

Akhil Thyadi — built as a hands-on portfolio project for DevOps / SRE roles.
