94 changes: 94 additions & 0 deletions data-track/week-5/README.md
@@ -0,0 +1,94 @@
# HYF Data Track — Week 5 Practice Exercises

Seven exercises that consolidate Week 5 (containers & CI/CD): writing Dockerfiles, managing dependencies for reproducible builds, and automating checks with GitHub Actions.

Work through them in order. Exercises 1–4 build on each other (pipeline → caching → uv → comparison). Exercises 5–7 are standalone.

## Layout

| Folder | Topic | Concepts |
|---|---|---|
| [`exercise_1/`](exercise_1/) | Minimal Pipeline to Container | Dockerfile basics, `ENV`, `CMD` |
| [`exercise_2/`](exercise_2/) | Cache-Friendly Dockerfile | Layer ordering, `requirements.txt` |
| [`exercise_3/`](exercise_3/) | Cache-Friendly Dockerfile with uv | `uv sync --frozen`, `pyproject.toml`, `uv.lock` |
| [`exercise_4/`](exercise_4/) | Compare Both Docker Approaches | `requirements.txt` vs `uv`, trade-offs |
| [`exercise_5/`](exercise_5/) | CI Smoke Test | GitHub Actions, `pytest`, breaking CI intentionally |
| [`exercise_6/`](exercise_6/) | Environment Variable Patterns | `-e`, `--env-file`, `ARG` vs `ENV` |
| [`exercise_7/`](exercise_7/) | Image Tagging Strategy | `docker tag`, commit SHA, multi-environment tags |

```text
week-5/
├── exercise_1/
│   ├── src/pipeline.py      # starter pipeline script
│   ├── Dockerfile           # student fills the TODOs
│   ├── README.md
│   └── solutions/
│       └── Dockerfile       # reference answer with # WHY comments
├── exercise_2/
│   ├── src/pipeline.py
│   ├── requirements.txt
│   ├── Dockerfile           # BAD ordering: student fixes it
│   ├── README.md
│   └── solutions/
│       └── Dockerfile
├── exercise_3/
│   ├── src/pipeline.py
│   ├── pyproject.toml
│   ├── uv.lock
│   ├── Dockerfile           # student fills the TODOs
│   ├── README.md
│   └── solutions/
│       └── Dockerfile
├── exercise_4/
│   ├── README.md            # written comparison task
│   └── solutions/
│       └── answers.md
├── exercise_5/
│   ├── tests/
│   │   └── test_smoke.py    # student creates this
│   ├── .github/
│   │   └── workflows/
│   │       └── ci.yml       # student creates this
│   ├── README.md
│   └── solutions/
│       ├── test_smoke.py
│       └── ci.yml
├── exercise_6/
│   ├── src/pipeline.py
│   ├── .env.example
│   ├── Dockerfile
│   ├── README.md
│   └── solutions/
│       └── Dockerfile
└── exercise_7/
    ├── README.md
    └── solutions/
        └── answers.md
```

## Open in GitHub Codespaces

> 💻 [Open in GitHub Codespaces](https://github.com/codespaces/new/HackYourFuture/Learning-Resources?devcontainer_path=.devcontainer%2Fdata-track%2Fdevcontainer.json)

One Codespace covers all seven exercises. From the Explorer, navigate into `data-track/week-5/exercise_N/`.

**Note:** Exercises 1–3, 6, and 7 require Docker. The Codespace devcontainer includes Docker-in-Docker. If you work locally, make sure Docker (for example, Docker Desktop) is running.

## Clone locally

```bash
git clone https://github.com/HackYourFuture/Learning-Resources.git
cd Learning-Resources/data-track/week-5
```

## Reference solutions (peek only after attempting)

Each `exercise_N/solutions/` folder holds the reference answer. The original `# TODO` comments are preserved, and `# WHY ...:` notes explain the non-obvious choices.

**Read the WHY notes, not just the code.** The reasoning is what carries into real projects.

Time-box yourself: 15–30 minutes of honest attempt before opening `solutions/`. You can diff your work against the reference:

```bash
diff exercise_1/Dockerfile exercise_1/solutions/Dockerfile
```
15 changes: 15 additions & 0 deletions data-track/week-5/exercise_1/Dockerfile
@@ -0,0 +1,15 @@
# TODO 1: Choose a base image. Use python:3.11-slim.
FROM ???

# TODO 2: Set the working directory inside the container to /app.
WORKDIR ???

# TODO 3: Copy requirements.txt into the container.
# (This exercise has no requirements.txt — skip this step.)

# TODO 4: Copy the src/ folder into the container.
COPY ??? ???

# TODO 5: Set the default command to run the pipeline module.
# Use: python src/pipeline.py
CMD ???
38 changes: 38 additions & 0 deletions data-track/week-5/exercise_1/README.md
@@ -0,0 +1,38 @@
# Exercise 1: Minimal Pipeline to Container

Package a small Python script into a Docker image and run it with an environment variable.

## Setup

No extra dependencies. `src/pipeline.py` uses only the standard library.

## Task

1. Open `Dockerfile` and fill in the TODOs (TODO 3 needs no instruction here, because this exercise has no `requirements.txt`).
2. Build the image:
```bash
docker build -t pipeline-practice:1.0 .
```
3. Run the container **without** `API_KEY` (`docker run --rm pipeline-practice:1.0`) and confirm the output:
```
API key present: False
```
4. Run it **with** `API_KEY` set:
```bash
docker run --rm -e API_KEY=demo pipeline-practice:1.0
```
Expected output:
```
API key present: True
```
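If you want to test the pipeline's logic without building an image at all, one option (a sketch, not the exercise's starter code) is to put the check in a function that takes the environment as a parameter; `api_key_present` is a hypothetical name. `docker run -e API_KEY=demo` simply adds `API_KEY` to the container process's environment, which Python exposes as `os.environ`, so passing a plain dict in a test mimics it.

```python
import os
from collections.abc import Mapping


def api_key_present(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when API_KEY is set in `env` (the process environment by default).

    Comparing against None rather than a sentinel string means an API_KEY whose
    value happens to be "missing" is still reported as present.
    """
    return env.get("API_KEY") is not None


if __name__ == "__main__":
    print(f"API key present: {api_key_present()}")
```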

## Success criteria

- `docker build` completes without errors.
- Running without `-e API_KEY` prints `API key present: False`.
- Running with `-e API_KEY=demo` prints `API key present: True`.

## Stretch

- Change the `CMD` to use the exec form (`["python", "src/pipeline.py"]`) if you used the shell form. What is the difference?
- Add a `LABEL maintainer="yourname"` instruction and rebuild. Run `docker inspect pipeline-practice:1.0` and find the label under `Config.Labels`.
21 changes: 21 additions & 0 deletions data-track/week-5/exercise_1/solutions/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# TODO 1: Choose a base image. Use python:3.11-slim.
# WHY python:3.11-slim: the full python:3.11 image is ~900MB because it ships compilers,
# development headers, and many other Debian packages. The slim variant installs only
# the minimal packages needed to run Python, bringing it to ~130MB. For a pipeline
# that only needs the standard library, slim is the right default.
FROM python:3.11-slim

# TODO 2: Set the working directory inside the container to /app.
# WHY /app: a dedicated working directory keeps container paths predictable and avoids
# accidentally writing files into system directories like /usr or /.
WORKDIR /app

# TODO 4: Copy the src/ folder into the container.
# WHY COPY src/ src/: copies only the source directory, not the whole project, so the
# image stays small and local files such as .env never end up inside it. (Docker still
# sends the whole folder as build context; a .dockerignore file controls that.)
COPY src/ src/

# TODO 5: Set the default command to run the pipeline module.
# WHY CMD vs RUN: RUN executes at build time; CMD sets the default at run time.
# Using a JSON array ("exec form") avoids spawning a shell, so signals like SIGTERM
# reach the Python process directly instead of being swallowed by a shell wrapper.
CMD ["python", "src/pipeline.py"]
4 changes: 4 additions & 0 deletions data-track/week-5/exercise_1/src/pipeline.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
import os

api_key = os.environ.get("API_KEY", "missing")
print(f"API key present: {api_key != 'missing'}")
16 changes: 16 additions & 0 deletions data-track/week-5/exercise_2/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# BAD: This Dockerfile copies all source code before installing dependencies.
# Every code change — even a single blank line in pipeline.py — invalidates
# the pip install cache layer, forcing a full reinstall on each build.

FROM python:3.11-slim

WORKDIR /app

# TODO 1: Identify which COPY + RUN pair below causes slow rebuilds.
# Then reorder the instructions so dependency installs are cached
# separately from source code changes.

COPY . .
RUN pip install -r requirements.txt

CMD ["python", "src/pipeline.py"]
27 changes: 27 additions & 0 deletions data-track/week-5/exercise_2/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Exercise 2: Cache-Friendly Dockerfile

Reorder a Dockerfile so that `pip install` is cached separately from source code changes.

## Setup

No extra setup — `src/pipeline.py` uses only `os` from the standard library. `requests` is in `requirements.txt` but not imported yet; it represents a real project dependency.

## Task

1. Build the image as-is and note how long the install step takes:
```bash
docker build -t pipeline-practice:2.0 .
```
2. Add a blank line to `src/pipeline.py` and build again. Observe that `pip install` runs again from scratch.
3. Open `Dockerfile` and fix the `TODO 1`: reorder the `COPY` and `RUN` instructions so dependency installs are cached.
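4. Build again after the fix, then add another blank line to `src/pipeline.py` and build a fourth time. Confirm that the `pip install` step is now served from cache (BuildKit, the default builder in current Docker versions, marks it `CACHED`; the legacy builder prints `---> Using cache`).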
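The caching behaviour you observe can be sketched as a toy model: each layer's cache key is derived from the previous layer's key, the instruction text, and (for `COPY`) the contents of the copied files. This is a simplification for intuition, not Docker's actual implementation, and the helper names `layer_keys` and `rebuilt_steps` are hypothetical.

```python
import hashlib

# Toy model of Docker's layer cache (not Docker's real implementation):
# each instruction is (text, list of copied file paths).


def layer_keys(instructions, files):
    """Return one cache key per instruction.

    A key depends on the previous layer's key, the instruction text, and the
    contents of any files the instruction copies. Changing any of those gives
    a new key for that layer and, through the chain, for every layer after it.
    """
    keys = []
    parent = ""
    for text, paths in instructions:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(text.encode())
        for path in sorted(paths):
            digest.update(path.encode())
            digest.update(files[path].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(instructions, old_files, new_files):
    """Return the instructions whose cache key changed between two builds."""
    old_keys = layer_keys(instructions, old_files)
    new_keys = layer_keys(instructions, new_files)
    return [
        text
        for (text, _), old, new in zip(instructions, old_keys, new_keys)
        if old != new
    ]


FILES_V1 = {"requirements.txt": "requests==2.31.0\n", "src/pipeline.py": "import os\n"}
FILES_V2 = {**FILES_V1, "src/pipeline.py": "import os\n\n"}  # one blank line added

# CMD only sets image metadata, so it is left out of the model.
BAD_ORDER = [
    ("FROM python:3.11-slim", []),
    ("WORKDIR /app", []),
    ("COPY . .", ["requirements.txt", "src/pipeline.py"]),
    ("RUN pip install -r requirements.txt", []),
]
GOOD_ORDER = [
    ("FROM python:3.11-slim", []),
    ("WORKDIR /app", []),
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY src/ src/", ["src/pipeline.py"]),
]

print(rebuilt_steps(BAD_ORDER, FILES_V1, FILES_V2))
# ['COPY . .', 'RUN pip install -r requirements.txt']
print(rebuilt_steps(GOOD_ORDER, FILES_V1, FILES_V2))
# ['COPY src/ src/']
```

The same model also answers the first stretch question: changing `requirements.txt` changes the key of `COPY requirements.txt .`, and every layer after it is rebuilt.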

## Success criteria

- After fixing the Dockerfile, editing `src/pipeline.py` does **not** trigger a pip reinstall.
- `docker build` output shows the pip install step as cached (`CACHED` with BuildKit, `---> Using cache` with the legacy builder) after a code-only change.

## Stretch

- Add a second package to `requirements.txt` (e.g. `pydantic==2.6.1`) and rebuild. Is the install layer invalidated? Why?
- Create a `.dockerignore` file that lists `requirements.txt` and rebuild your fixed Dockerfile. What happens at the `COPY requirements.txt .` step, and why?
1 change: 1 addition & 0 deletions data-track/week-5/exercise_2/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
requests==2.31.0
24 changes: 24 additions & 0 deletions data-track/week-5/exercise_2/solutions/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
FROM python:3.11-slim

WORKDIR /app

# TODO 1: Identify which COPY + RUN pair below causes slow rebuilds.
# Then reorder the instructions so dependency installs are cached
# separately from source code changes.

# WHY copy requirements.txt first: Docker builds images as a stack of layers.
# Each instruction produces a layer. If a layer's inputs have not changed,
# Docker reuses the cached layer and skips re-executing it.
#
# By copying requirements.txt before any source code, the pip install layer
# only reruns when requirements.txt itself changes — not when pipeline.py changes.
# For a project with many dependencies, this can save tens of seconds or more on every
# code-only rebuild.
COPY requirements.txt .
RUN pip install -r requirements.txt

# WHY copy source code after pip install: source code changes on every feature commit.
# Placing it after the dependency layer means code edits only invalidate this final
# COPY layer, not the expensive install step above.
COPY src/ src/

CMD ["python", "src/pipeline.py"]
4 changes: 4 additions & 0 deletions data-track/week-5/exercise_2/src/pipeline.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
import os

api_key = os.environ.get("API_KEY", "missing")
print(f"API key present: {api_key != 'missing'}")
24 changes: 24 additions & 0 deletions data-track/week-5/exercise_3/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
# TODO 1: Start from the official Python 3.11-slim base image.
FROM ???

WORKDIR /app

# TODO 2: Copy the uv binary from the official uv image.
# Use: COPY --from=ghcr.io/astral-sh/uv:0.6 /uv /usr/local/bin/uv
COPY --from=??? ??? ???

# TODO 3: Copy pyproject.toml and uv.lock into the container.
# These must be copied BEFORE your source code so the install
# layer is cached separately from code changes.
COPY ??? ???

# TODO 4: Install dependencies using uv with the --frozen and --no-dev flags.
# --frozen: respect the lock file exactly, do not re-resolve.
# --no-dev: skip development dependencies (linters, test runners).
RUN uv sync ???

# TODO 5: Copy the rest of the source code.
COPY ??? ???

# TODO 6: Set the default run command using uv run.
CMD ???
48 changes: 48 additions & 0 deletions data-track/week-5/exercise_3/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# Exercise 3: Cache-Friendly Dockerfile with uv

Build a Docker image that uses `uv` for locked dependency installs.

## Setup

You need `uv` installed locally to regenerate `uv.lock`. Install it once:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

The `uv.lock` is already committed in this exercise folder. If you want to see how it is generated from scratch:

```bash
uv lock
```

## Task

1. Fill in the six TODOs in `Dockerfile`.
2. Build the image:
```bash
docker build -t pipeline-practice:3.0 .
```
3. Run it and confirm the output:
```bash
docker run --rm -e API_KEY=demo pipeline-practice:3.0
```
Expected:
```
API key present: True
```
4. Edit only `src/pipeline.py` (add a comment) and build again. Confirm the `uv sync` layer stays cached.
5. Add a second dependency to `pyproject.toml`, run `uv lock` to update `uv.lock`, and build again. Confirm `uv sync` now reruns.
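6. **Intentional failure:** change the pinned `requests` version in `pyproject.toml` without running `uv lock`. Run `uv sync --locked` locally and read the error: this is exactly what CI should report when a lock file is stale. Then try `uv sync --frozen`: it succeeds, because `--frozen` installs straight from `uv.lock` without checking it against `pyproject.toml`.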

## Success criteria

- Image builds and runs with the expected output.
- Editing source code does not invalidate the `uv sync` layer.
- Changing `pyproject.toml` + updating `uv.lock` does invalidate the layer.
- Running `uv sync --locked` after editing `pyproject.toml` (without `uv lock`) produces an error, while `uv sync --frozen` does not.

## Stretch

- Compare the image size of Exercise 2 (pip) vs Exercise 3 (uv): `docker images pipeline-practice`.
- What would happen if you forgot `--frozen` in the `RUN` step?
7 changes: 7 additions & 0 deletions data-track/week-5/exercise_3/pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
[project]
name = "weather-pipeline"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
"requests==2.31.0",
]
38 changes: 38 additions & 0 deletions data-track/week-5/exercise_3/solutions/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
# TODO 1: Start from the official Python 3.11-slim base image.
# WHY slim: roughly 770MB smaller than the full image, which bundles compilers and
# build tools that a production container with pure-Python dependencies does not need.
FROM python:3.11-slim

WORKDIR /app

# TODO 2: Copy the uv binary from the official uv image.
# WHY COPY --from: COPY --from can also copy files out of another image, the same
# mechanism multi-stage builds use. Instead of installing uv via pip, we pull only the
# compiled binary from the official uv image. The result: a single small COPY layer
# and no pip install step for uv.
COPY --from=ghcr.io/astral-sh/uv:0.6 /uv /usr/local/bin/uv

# TODO 3: Copy pyproject.toml and uv.lock into the container.
# WHY before source code: same caching logic as Exercise 2. uv.lock changes
# only when dependencies change; source code changes on every commit. Separating
# them means the install layer stays cached across code edits.
COPY pyproject.toml uv.lock ./

# TODO 4: Install dependencies using uv with the --frozen and --no-dev flags.
# WHY --frozen: without this flag, uv sync first checks uv.lock against
# pyproject.toml and, if they disagree, re-locks and installs the new resolution,
# so the image could contain versions nobody reviewed. With --frozen, uv installs
# exactly what uv.lock lists and never updates it. (--locked goes one step further
# and fails the build when uv.lock is stale.)
# WHY --no-dev: pytest, ruff, and other dev tools are not needed at runtime.
# Including them inflates the image and widens the attack surface.
RUN uv sync --frozen --no-dev

# TODO 5: Copy the rest of the source code.
# WHY copy src/ separately: keeps source code changes from busting the uv sync layer.
COPY src/ src/

# TODO 6: Set the default run command using uv run.
# WHY uv run: uv run executes the command inside the project's virtual environment
# (/app/.venv) without activating it manually, and first syncs that environment with
# the lock file. A common alternative is ENV PATH="/app/.venv/bin:$PATH" followed by
# CMD ["python", "src/pipeline.py"].
CMD ["uv", "run", "python", "src/pipeline.py"]
4 changes: 4 additions & 0 deletions data-track/week-5/exercise_3/src/pipeline.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
import os

api_key = os.environ.get("API_KEY", "missing")
print(f"API key present: {api_key != 'missing'}")