HackYourFuture · lassebenni · May 30, 2026 · May 21, 2026 · May 27, 2026
diff --git a/data-track/week-5/README.md b/data-track/week-5/README.md
@@ -0,0 +1,94 @@
+# HYF Data Track — Week 5 Practice Exercises
+
+Seven exercises that consolidate Week 5 (containers & CI/CD): writing Dockerfiles, managing dependencies for reproducible builds, and automating checks with GitHub Actions.
+
+Work through them in order. Exercises 1–4 build on each other (pipeline → caching → uv → comparison). Exercises 5–7 are standalone.
+
+## Layout
+
+| Folder | Topic | Concepts |
+|---|---|---|
+| [`exercise_1/`](exercise_1/) | Minimal Pipeline to Container | Dockerfile basics, `CMD`, runtime env vars (`docker run -e`) |
+| [`exercise_2/`](exercise_2/) | Cache-Friendly Dockerfile | Layer ordering, `requirements.txt` |
+| [`exercise_3/`](exercise_3/) | Cache-Friendly Dockerfile with uv | `uv sync --frozen`, `pyproject.toml`, `uv.lock` |
+| [`exercise_4/`](exercise_4/) | Compare Both Docker Approaches | `requirements.txt` vs `uv`, trade-offs |
+| [`exercise_5/`](exercise_5/) | CI Smoke Test | GitHub Actions, `pytest`, breaking CI intentionally |
+| [`exercise_6/`](exercise_6/) | Environment Variable Patterns | `-e`, `--env-file`, `ARG` vs `ENV` |
+| [`exercise_7/`](exercise_7/) | Image Tagging Strategy | `docker tag`, commit SHA, multi-environment tags |
+
+```text
+week-5/
+├── exercise_1/
+│   ├── src/pipeline.py       # starter pipeline script
+│   ├── Dockerfile            # student fills the TODOs
+│   ├── README.md
+│   └── solutions/
+│       └── Dockerfile        # reference answer with # WHY comments
+├── exercise_2/
+│   ├── src/pipeline.py
+│   ├── requirements.txt
+│   ├── Dockerfile            # BAD ordering — student fixes it
+│   ├── README.md
+│   └── solutions/
+│       └── Dockerfile
+├── exercise_3/
+│   ├── src/pipeline.py
+│   ├── pyproject.toml
+│   ├── uv.lock
+│   ├── Dockerfile            # student fills the TODOs
+│   ├── README.md
+│   └── solutions/
+│       └── Dockerfile
+├── exercise_4/
+│   ├── README.md             # written comparison task
+│   └── solutions/
+│       └── answers.md
+├── exercise_5/
+│   ├── tests/
+│   │   └── test_smoke.py     # student creates this
+│   ├── .github/
+│   │   └── workflows/
+│   │       └── ci.yml        # student creates this
+│   ├── README.md
+│   └── solutions/
+│       ├── test_smoke.py
+│       └── ci.yml
+├── exercise_6/
+│   ├── src/pipeline.py
+│   ├── .env.example
+│   ├── Dockerfile
+│   ├── README.md
+│   └── solutions/
+│       └── Dockerfile
+└── exercise_7/
+    ├── README.md
+    └── solutions/
+        └── answers.md
+```
+
+## Open in GitHub Codespaces
+
+> 💻 [Open in GitHub Codespaces](https://github.com/codespaces/new/HackYourFuture/Learning-Resources?devcontainer_path=.devcontainer%2Fdata-track%2Fdevcontainer.json)
+
+One Codespace covers all seven exercises. From the Explorer, navigate into `data-track/week-5/exercise_N/`.
+
+**Note:** Exercises 1–3, 6–7 require Docker. The Codespace devcontainer includes Docker-in-Docker. If you work locally, make sure Docker Desktop is running.
+
+## Clone locally
+
+```bash
+git clone https://github.com/HackYourFuture/Learning-Resources.git
+cd Learning-Resources/data-track/week-5
+```
+
+## Reference solutions (peek only after attempting)
+
+Each `exercise_N/solutions/` folder holds the reference answer. The original `# TODO` comments are preserved, and `# WHY ...:` notes explain the non-obvious choices.
+
+**Read the WHY notes, not just the code.** The reasoning is what carries into real projects.
+
+Time-box yourself: 15–30 minutes of honest attempt before opening `solutions/`. You can diff your work against the reference (the `# WHY` comment lines will show up as differences too):
+
+```bash
+diff exercise_1/Dockerfile exercise_1/solutions/Dockerfile
+```
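
The solution files carry `# WHY` comments, so a plain `diff` flags every comment line. If you only want to compare the instructions themselves, a small Python helper can strip comments and blank lines first. This is a sketch, not part of the repo: the script name `diff_instructions.py` and its functions are hypothetical, and it only ignores full-line `#` comments.

```python
import difflib
import sys
from pathlib import Path


def strip_comments(text: str) -> list[str]:
    """Drop blank lines and full-line '#' comments so only instructions remain."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            kept.append(stripped)
    return kept


def instruction_diff(mine: str, reference: str) -> list[str]:
    """Unified diff of two Dockerfiles, ignoring comments and blank lines."""
    return list(
        difflib.unified_diff(
            strip_comments(mine),
            strip_comments(reference),
            fromfile="mine",
            tofile="reference",
            lineterm="",
        )
    )


# Only act as a CLI when given exactly two paths.
if __name__ == "__main__" and len(sys.argv) == 3:
    diff = instruction_diff(Path(sys.argv[1]).read_text(), Path(sys.argv[2]).read_text())
    print("\n".join(diff) if diff else "No instruction differences.")
```

Usage (hypothetical file name): `python diff_instructions.py exercise_1/Dockerfile exercise_1/solutions/Dockerfile`.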
diff --git a/data-track/week-5/exercise_1/Dockerfile b/data-track/week-5/exercise_1/Dockerfile
@@ -0,0 +1,15 @@
+# TODO 1: Choose a base image. Use python:3.11-slim.
+FROM ???
+
+# TODO 2: Set the working directory inside the container to /app.
+WORKDIR ???
+
+# TODO 3: Copy requirements.txt into the container.
+#         (This exercise has no requirements.txt — skip this step.)
+
+# TODO 4: Copy the src/ folder into the container.
+COPY ??? ???
+
+# TODO 5: Set the default command to run the pipeline module.
+#         Use: python src/pipeline.py
+CMD ???
diff --git a/data-track/week-5/exercise_1/README.md b/data-track/week-5/exercise_1/README.md
@@ -0,0 +1,38 @@
+# Exercise 1: Minimal Pipeline to Container
+
+Package a small Python script into a Docker image and run it with an environment variable.
+
+## Setup
+
+No extra dependencies. `src/pipeline.py` uses only the standard library.
+
+## Task
+
+1. Open `Dockerfile` and fill in the five TODOs.
+2. Build the image:
+   ```bash
+   docker build -t pipeline-practice:1.0 .
+   ```
+3. Run the container **without** `API_KEY` (`docker run --rm pipeline-practice:1.0`) and confirm the output:
+   ```
+   API key present: False
+   ```
+4. Run it **with** `API_KEY` set:
+   ```bash
+   docker run --rm -e API_KEY=demo pipeline-practice:1.0
+   ```
+   Expected output:
+   ```
+   API key present: True
+   ```
+
+## Success criteria
+
+- `docker build` completes without errors.
+- Running without `-e API_KEY` prints `API key present: False`.
+- Running with `-e API_KEY=demo` prints `API key present: True`.
+
+## Stretch
+
+- Change the `CMD` to use the exec form (`["python", "src/pipeline.py"]`) if you used the shell form. What is the difference?
+- Add a `LABEL maintainer="yourname"` instruction. Run `docker inspect pipeline-practice:1.0` and find it.
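
For the first stretch question, it helps to see what Docker actually starts in each case. A minimal sketch, assuming a Linux image with the default `SHELL` of `/bin/sh -c`: the exec form (a JSON array of strings) is used as the argument list as-is, while anything else is treated as shell form and wrapped in `/bin/sh -c`. The function name `effective_argv` is hypothetical, and this is a simplified model of Docker's parsing, not its real implementation.

```python
import json


def effective_argv(cmd: str) -> list[str]:
    """Toy model: the argument list a Linux container starts for a CMD value."""
    text = cmd.strip()
    try:
        parsed = json.loads(text)
    except ValueError:
        parsed = None
    if isinstance(parsed, list) and all(isinstance(x, str) for x in parsed):
        # Exec form: the JSON array is the argument list; no shell is involved.
        return parsed
    # Shell form (or an invalid JSON array): run the string through the default shell.
    return ["/bin/sh", "-c", text]


for cmd in ('["python", "src/pipeline.py"]', "python src/pipeline.py"):
    print(f"CMD {cmd}  ->  {effective_argv(cmd)}")
```

With the shell form, the process Docker starts is `/bin/sh`, so signals such as the SIGTERM sent by `docker stop` go to the shell rather than directly to Python (unless the shell happens to replace itself with the command).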
diff --git a/data-track/week-5/exercise_1/solutions/Dockerfile b/data-track/week-5/exercise_1/solutions/Dockerfile
@@ -0,0 +1,21 @@
+# TODO 1: Choose a base image. Use python:3.11-slim.
+# WHY python:3.11-slim: the full python:3.11 image is ~900MB because it bundles compilers,
+# build headers, and many extra system libraries. The slim variant keeps only what is
+# needed to run Python, bringing it to ~130MB. For a pipeline that only needs the
+# standard library, slim is the right default.
+FROM python:3.11-slim
+
+# TODO 2: Set the working directory inside the container to /app.
+# WHY /app: a dedicated working directory keeps container paths predictable and avoids
+# accidentally writing files into system directories like /usr or /.
+WORKDIR /app
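+
+# TODO 3: Copy requirements.txt into the container.
+#         (This exercise has no requirements.txt, so skip this step.)
+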
+# TODO 4: Copy the src/ folder into the container.
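+# WHY COPY src/ src/: copies only the source directory into the image, not the whole
+# project. The image holds fewer files, and local files like .env never end up in a
+# layer. (The build context sent to Docker is still the whole folder; a .dockerignore
+# file is what shrinks that.)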
+COPY src/ src/
+
+# TODO 5: Set the default command to run the pipeline module.
+# WHY CMD vs RUN: RUN executes at build time; CMD sets the default at run time.
+# Using a JSON array ("exec form") avoids spawning a shell, so signals like SIGTERM
+# reach the Python process directly instead of being swallowed by a shell wrapper.
+CMD ["python", "src/pipeline.py"]
diff --git a/data-track/week-5/exercise_1/src/pipeline.py b/data-track/week-5/exercise_1/src/pipeline.py
@@ -0,0 +1,4 @@
+import os
+
+api_key = os.environ.get("API_KEY")
+print(f"API key present: {api_key is not None}")
diff --git a/data-track/week-5/exercise_2/Dockerfile b/data-track/week-5/exercise_2/Dockerfile
@@ -0,0 +1,16 @@
+# BAD: This Dockerfile copies all source code before installing dependencies.
+# Every code change — even a single blank line in pipeline.py — invalidates
+# the pip install cache layer, forcing a full reinstall on each build.
+
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# TODO 1: Identify which COPY + RUN pair below causes slow rebuilds.
+#         Then reorder the instructions so dependency installs are cached
+#         separately from source code changes.
+
+COPY . .
+RUN pip install -r requirements.txt
+
+CMD ["python", "src/pipeline.py"]
diff --git a/data-track/week-5/exercise_2/README.md b/data-track/week-5/exercise_2/README.md
@@ -0,0 +1,27 @@
+# Exercise 2: Cache-Friendly Dockerfile
+
+Reorder a Dockerfile so that `pip install` is cached separately from source code changes.
+
+## Setup
+
+No extra setup — `src/pipeline.py` uses only `os` from the standard library. `requests` is in `requirements.txt` but not imported yet; it represents a real project dependency.
+
+## Task
+
+1. Build the image as-is and note how long the install step takes:
+   ```bash
+   docker build -t pipeline-practice:2.0 .
+   ```
+2. Add a blank line to `src/pipeline.py` and build again. Observe that `pip install` runs again from scratch.
+3. Open `Dockerfile` and fix the `TODO 1`: reorder the `COPY` and `RUN` instructions so dependency installs are cached.
+4. Build again after the fix, then add another blank line to `src/pipeline.py` and build a fourth time. Confirm that `pip install` is now served from cache (BuildKit, the default builder, marks the step `CACHED`; the legacy builder prints `---> Using cache`).
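
Why does the order matter? The toy model below mimics the rule described in the solution notes: each layer's cache key depends on its parent layer, the instruction text, and (for `COPY`) the contents of the copied files, so one cache miss invalidates every layer after it. Note that `RUN` is keyed on its command string, not on what it installs. This is a simplified sketch that ignores many details of Docker's real build cache; the names `layer_keys`, `reused_layers`, `BAD`, and `GOOD` are hypothetical.

```python
import hashlib

Instruction = tuple[str, list[str]]  # (instruction text, files it copies)


def layer_keys(instructions: list[Instruction], files: dict[str, str]) -> list[str]:
    """Toy cache key per layer: hash of parent key + instruction + copied file contents."""
    keys, parent = [], ""
    for text, copied in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(text.encode())
        for name in copied:  # only COPY reads files from the build context
            h.update(name.encode())
            h.update(files[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def reused_layers(instructions, before, after) -> list[str]:
    """Instructions whose cache key is unchanged between two builds."""
    old, new = layer_keys(instructions, before), layer_keys(instructions, after)
    return [text for (text, _), a, b in zip(instructions, old, new) if a == b]


BAD = [
    ("FROM python:3.11-slim", []),
    ("WORKDIR /app", []),
    ("COPY . .", ["requirements.txt", "src/pipeline.py"]),
    ("RUN pip install -r requirements.txt", []),
]
GOOD = [
    ("FROM python:3.11-slim", []),
    ("WORKDIR /app", []),
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY src/ src/", ["src/pipeline.py"]),
]
v1 = {"requirements.txt": "requests==2.31.0\n", "src/pipeline.py": "import os\n"}
v2 = {**v1, "src/pipeline.py": "import os\n\n"}  # one blank line added

print("bad :", reused_layers(BAD, v1, v2))
print("good:", reused_layers(GOOD, v1, v2))
```

With the original ordering only `FROM` and `WORKDIR` survive a one-line edit to `pipeline.py`; the reordered version also keeps `pip install` cached.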
+
+## Success criteria
+
+- After fixing the Dockerfile, editing `src/pipeline.py` does **not** trigger a pip reinstall.
+- `docker build` output marks the pip install step as cached (`CACHED` with BuildKit, `---> Using cache` with the legacy builder) after a code-only change.
+
+## Stretch
+
+- Add a second package to `requirements.txt` (e.g. `pydantic==2.6.1`) and rebuild. Is the install layer invalidated? Why?
+- Add `requirements.txt` to a `.dockerignore` file and rebuild the fixed Dockerfile. Why does the `COPY requirements.txt .` step fail?
diff --git a/data-track/week-5/exercise_2/requirements.txt b/data-track/week-5/exercise_2/requirements.txt
@@ -0,0 +1 @@
+requests==2.31.0
diff --git a/data-track/week-5/exercise_2/solutions/Dockerfile b/data-track/week-5/exercise_2/solutions/Dockerfile
@@ -0,0 +1,24 @@
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# TODO 1: Identify which COPY + RUN pair below causes slow rebuilds.
+#         Then reorder the instructions so dependency installs are cached
+#         separately from source code changes.
+
+# WHY copy requirements.txt first: Docker builds images as a stack of layers.
+# Each instruction produces a layer. If a layer's inputs have not changed,
+# Docker reuses the cached layer and skips re-executing it.
+#
+# By copying requirements.txt before any source code, the pip install layer
+# only reruns when requirements.txt itself changes — not when pipeline.py changes.
+# For a project with many dependencies, this can save 30–120 seconds per build.
+COPY requirements.txt .
+RUN pip install -r requirements.txt
+
+# WHY copy source code after pip install: source code changes on every feature commit.
+# Placing it after the dependency layer means code edits only invalidate this final
+# COPY layer, not the expensive install step above.
+COPY src/ src/
+
+CMD ["python", "src/pipeline.py"]
diff --git a/data-track/week-5/exercise_2/src/pipeline.py b/data-track/week-5/exercise_2/src/pipeline.py
@@ -0,0 +1,4 @@
+import os
+
+api_key = os.environ.get("API_KEY")
+print(f"API key present: {api_key is not None}")
diff --git a/data-track/week-5/exercise_3/Dockerfile b/data-track/week-5/exercise_3/Dockerfile
@@ -0,0 +1,24 @@
+# TODO 1: Start from the official Python 3.11-slim base image.
+FROM ???
+
+WORKDIR /app
+
+# TODO 2: Copy the uv binary from the official uv image.
+#         Use: COPY --from=ghcr.io/astral-sh/uv:0.6 /uv /usr/local/bin/uv
+COPY --from=??? ??? ???
+
+# TODO 3: Copy pyproject.toml and uv.lock into the container.
+#         These must be copied BEFORE your source code so the install
+#         layer is cached separately from code changes.
+COPY ??? ???
+
+# TODO 4: Install dependencies using uv with the --frozen and --no-dev flags.
+#         --frozen: respect the lock file exactly, do not re-resolve.
+#         --no-dev: skip development dependencies (linters, test runners).
+RUN uv sync ???
+
+# TODO 5: Copy the rest of the source code.
+COPY ??? ???
+
+# TODO 6: Set the default run command using uv run.
+CMD ???
diff --git a/data-track/week-5/exercise_3/README.md b/data-track/week-5/exercise_3/README.md
@@ -0,0 +1,48 @@
+# Exercise 3: Cache-Friendly Dockerfile with uv
+
+Build a Docker image that uses `uv` for locked dependency installs.
+
+## Setup
+
+You need `uv` installed locally to regenerate `uv.lock`. Install it once:
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+
+The `uv.lock` is already committed in this exercise folder. If you want to see how it is generated from scratch:
+
+```bash
+uv lock
+```
+
+## Task
+
+1. Fill in the six TODOs in `Dockerfile`.
+2. Build the image:
+   ```bash
+   docker build -t pipeline-practice:3.0 .
+   ```
+3. Run it and confirm the output:
+   ```bash
+   docker run --rm -e API_KEY=demo pipeline-practice:3.0
+   ```
+   Expected:
+   ```
+   API key present: True
+   ```
+4. Edit only `src/pipeline.py` (add a comment) and build again. Confirm the `uv sync` layer stays cached.
+5. Add a second dependency to `pyproject.toml`, run `uv lock` to update `uv.lock`, and build again. Confirm `uv sync` now reruns.
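+6. **Intentional failure:** change the `requests` pin in `pyproject.toml` (for example to `requests==2.32.3`) without running `uv lock`. Run `uv sync --locked` locally and read the error: this is exactly the check CI should run to catch a stale lock file. (Note that `uv sync --frozen` does *not* fail here: it skips the staleness check and installs whatever `uv.lock` records.)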
+
+## Success criteria
+
+- Image builds and runs with the expected output.
+- Editing source code does not invalidate the `uv sync` layer.
+- Changing `pyproject.toml` + updating `uv.lock` does invalidate the layer.
+- Running `uv sync --locked` after editing `pyproject.toml` (without `uv lock`) produces an error.
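
The three ways of running `uv sync` are easy to mix up. The toy model below summarizes the behavior described in the uv documentation: `--frozen` trusts `uv.lock` without checking it, `--locked` refuses to continue when the lock is stale, and a plain `uv sync` re-locks silently. It is a sketch under simplifying assumptions (direct, exactly pinned dependencies only, no real resolution); `sync_plan` is a hypothetical name and the error text is not uv's actual message.

```python
def sync_plan(declared: dict[str, str], locked: dict[str, str], mode: str) -> dict[str, str]:
    """Toy model: versions `uv sync` would install for a given flag.

    declared: pins from pyproject.toml; locked: pins recorded in uv.lock.
    """
    stale = declared != locked
    if mode == "--frozen":
        return dict(locked)  # lock is the source of truth; staleness is never checked
    if mode == "--locked":
        if stale:
            raise RuntimeError("stale lock file (toy message)")
        return dict(locked)
    if mode == "default":
        return dict(declared) if stale else dict(locked)  # silently re-locks
    raise ValueError(f"unknown mode: {mode}")


declared = {"requests": "2.32.3"}  # pin edited in pyproject.toml
locked = {"requests": "2.31.0"}    # uv lock was not re-run
for mode in ("--frozen", "default", "--locked"):
    try:
        print(mode, "->", sync_plan(declared, locked, mode))
    except RuntimeError as err:
        print(mode, "-> error:", err)
```

This is why the Dockerfile uses `--frozen` (reproducible installs from a lock that CI has already validated) while CI itself is the place for the stricter `--locked` check.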
+
+## Stretch
+
+- Compare the image size of Exercise 2 (pip) vs Exercise 3 (uv): `docker images pipeline-practice`.
+- What would happen if you forgot `--frozen` in the `RUN` step?
diff --git a/data-track/week-5/exercise_3/pyproject.toml b/data-track/week-5/exercise_3/pyproject.toml
@@ -0,0 +1,7 @@
+[project]
+name = "weather-pipeline"
+version = "0.1.0"
+requires-python = ">=3.11"
+dependencies = [
+  "requests==2.31.0",
+]
diff --git a/data-track/week-5/exercise_3/solutions/Dockerfile b/data-track/week-5/exercise_3/solutions/Dockerfile
@@ -0,0 +1,38 @@
+# TODO 1: Start from the official Python 3.11-slim base image.
+# WHY slim: saves ~770MB vs the full image, which bundles compilers, build headers,
+# and extra system libraries that a production container does not need.
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# TODO 2: Copy the uv binary from the official uv image.
+# WHY COPY --from: COPY --from can copy files out of any image, not only out of an
+# earlier build stage (it is the same mechanism multi-stage builds use). Instead of
+# running `pip install uv` as an extra build step, we copy just the compiled uv
+# binary from the official uv image: one small layer and no pip step involved.
+COPY --from=ghcr.io/astral-sh/uv:0.6 /uv /usr/local/bin/uv
+
+# TODO 3: Copy pyproject.toml and uv.lock into the container.
+# WHY before source code: same caching logic as Exercise 2. uv.lock changes
+# only when dependencies change; source code changes on every commit. Separating
+# them means the install layer stays cached across code edits.
+COPY pyproject.toml uv.lock ./
+
+# TODO 4: Install dependencies using uv with the --frozen and --no-dev flags.
+# WHY --frozen: uv installs exactly the versions recorded in uv.lock and never
+# updates the lock file. Without it, uv sync checks uv.lock against pyproject.toml
+# and re-locks if they disagree, so the image could end up with versions nobody
+# committed. (--locked is the stricter variant: it fails instead of re-locking.)
+# WHY --no-dev: pytest, ruff, and other dev tools are not needed at runtime.
+# Including them inflates the image and widens the attack surface.
+RUN uv sync --frozen --no-dev
+
+# TODO 5: Copy the rest of the source code.
+# WHY copy src/ separately: keeps source code changes from busting the uv sync layer.
+COPY src/ src/
+
+# TODO 6: Set the default run command using uv run.
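+# WHY uv run: uv run executes the command inside the project's virtual environment
+# (/app/.venv), so there is no need to source an activate script. By default uv run
+# also re-checks the lock and syncs the environment (dev dependencies included)
+# before starting; --no-sync skips that, so the container runs exactly what the
+# uv sync layer above installed. Adding /app/.venv/bin to PATH is a common alternative.
+CMD ["uv", "run", "--no-sync", "python", "src/pipeline.py"]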
diff --git a/data-track/week-5/exercise_3/src/pipeline.py b/data-track/week-5/exercise_3/src/pipeline.py
@@ -0,0 +1,4 @@
+import os
+
+api_key = os.environ.get("API_KEY")
+print(f"API key present: {api_key is not None}")