Commit 72bc4cf

travisjneuman authored and claude committed

feat: add step-by-step WALKTHROUGH.md for 9 key projects (levels 6-8)

Add pedagogical walkthroughs for first, mid, and capstone projects across levels 6-8: SQL Connection Simulator, Data Lineage Capture, Level 6 Mini Capstone, API Query Adapter, Ingestion Observability Kit, Level 7 Mini Capstone, Dashboard KPI Assembler, Fault Injection Harness, and Level 8 Mini Capstone. Each walkthrough guides the thinking process and provides 5-6 incremental build steps with predict prompts, a common-mistakes table, and a learning summary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent c6248c4 | commit 72bc4cf

9 files changed: +1855 -0 lines changed

This file: 172 additions & 0 deletions
# SQL Connection Simulator — Step-by-Step Walkthrough

[<- Back to Project README](./README.md) | [Solution](./SOLUTION.md)

## Before You Start

Read the [project README](./README.md) first. Try to solve it on your own before following this guide. The goal is to build a connection pool for SQLite that reuses connections instead of creating new ones for every query, with retry logic for transient failures. If you can write a class that manages a list of connections and hands them out on request, you are already most of the way there.
## Thinking Process
10+
11+
The core problem here is resource management. Every time you open a database connection, there is overhead: memory allocation, file handles, and (in real databases) network handshakes. If your application makes hundreds of queries per second, opening and closing a connection for each one wastes enormous time. The solution is a pool: keep a small number of connections alive and reuse them.
12+
13+
Think of it like a library lending desk. Instead of buying a new book every time someone wants to read, the library keeps books on shelves and lends them out. When a reader finishes, the book goes back on the shelf for the next person. Your `ConnectionPool` is that lending desk. `acquire()` lends a connection out, and `release()` puts it back on the shelf.
14+
15+
The second challenge is handling failures gracefully. Databases can be temporarily unavailable (locked, restarting, overloaded). Rather than crashing immediately, your code should retry a few times with increasing delays between attempts. This is called exponential backoff: wait 10ms, then 20ms, then 40ms. The increasing gaps give the database time to recover without hammering it with rapid-fire retry attempts.
16+
17+
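You can compute that backoff schedule concretely before writing any pool code. This is an illustrative sketch only; `backoff_delays` is a hypothetical helper, not part of the project:

```python
BASE_BACKOFF_SEC = 0.01  # same base delay the project uses

def backoff_delays(max_retries: int) -> list[float]:
    # Attempt 1 waits BASE, attempt 2 waits 2*BASE, attempt 3 waits 4*BASE, ...
    return [BASE_BACKOFF_SEC * (2 ** (attempt - 1)) for attempt in range(1, max_retries + 1)]

print(backoff_delays(3))  # [0.01, 0.02, 0.04]
```

Doubling the delay each attempt means total wait time stays small for brief glitches but grows quickly enough to stop hammering a struggling server.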
## Step 1: Define the Configuration

**What to do:** Create a `ConnectionConfig` dataclass that holds all the settings your pool needs: the database path, timeout, number of retries, and pool size.

**Why:** Centralizing configuration in a dataclass means you can pass one object around instead of four separate arguments. It also makes testing easier because you can create different configs for different scenarios.

```python
import json
import sqlite3
import time
from dataclasses import dataclass
from pathlib import Path

MAX_POOL_SIZE = 5
MAX_RETRIES = 3
BASE_BACKOFF_SEC = 0.01

@dataclass
class ConnectionConfig:
    db_path: str = ":memory:"
    timeout: float = 5.0
    max_retries: int = MAX_RETRIES
    pool_size: int = MAX_POOL_SIZE
```

**Predict:** What happens if you set `pool_size` to 0? Would connections still work, or would every connection be immediately closed after use?
## Step 2: Build the Connection Pool Class

**What to do:** Create a `ConnectionPool` class with an internal list `_pool` to hold idle connections and counters to track how many connections were created vs. reused.

**Why:** The pool is the heart of this project. The `_pool` list acts as a stack: connections are appended when released and popped when acquired. This last-in-first-out pattern keeps recently-used connections warm.

```python
class ConnectionPool:
    def __init__(self, config: ConnectionConfig) -> None:
        self.config = config
        self._pool: list[sqlite3.Connection] = []  # idle connections (LIFO stack)
        self._created = 0  # connections opened from scratch
        self._reused = 0   # connections handed back out of the pool
```

**Predict:** Why use a list as a stack (append/pop) rather than a queue (append/pop from front)? Think about which connection is most likely to still be valid.
## Step 3: Implement acquire() and release()

**What to do:** Write `acquire()` to check the pool first (reuse if possible, create if empty) and `release()` to return connections to the pool (or close them if the pool is full).

**Why:** This is where the performance gain comes from. The first call to `acquire()` creates a new connection. When you `release()` it, the connection goes into the pool. The next `acquire()` finds it there and reuses it — no creation overhead.

```python
    def acquire(self) -> sqlite3.Connection:
        if self._pool:
            self._reused += 1
            return self._pool.pop()  # most recently released connection
        conn = self._connect_with_retry()
        self._created += 1
        return conn

    def release(self, conn: sqlite3.Connection) -> None:
        if len(self._pool) < self.config.pool_size:
            self._pool.append(conn)  # back on the shelf for the next caller
        else:
            conn.close()  # pool is full: don't hoard open connections
```

**Predict:** What would happen if `release()` never checked the pool size and just always appended? How many open connections could you end up with?
## Step 4: Add Retry Logic with Exponential Backoff

**What to do:** Write `_connect_with_retry()` that attempts to connect up to `max_retries` times. On each failure, wait longer before trying again (exponential backoff). After the connection succeeds, run `SELECT 1` to verify it actually works.

**Why:** Databases can be temporarily locked or slow to start. A single failed attempt does not mean the database is permanently down. Exponential backoff avoids flooding a struggling server with retries while still recovering quickly from brief glitches.

```python
    def _connect_with_retry(self) -> sqlite3.Connection:
        last_err = None
        for attempt in range(1, self.config.max_retries + 1):
            try:
                conn = sqlite3.connect(self.config.db_path, timeout=self.config.timeout)
                conn.execute("SELECT 1")  # health-check ping
                return conn
            except sqlite3.OperationalError as exc:
                last_err = exc
                wait = BASE_BACKOFF_SEC * (2 ** (attempt - 1))
                time.sleep(wait)
        raise ConnectionError(f"Failed after {self.config.max_retries} retries: {last_err}")
```

**Predict:** If `BASE_BACKOFF_SEC` is 0.01, what are the wait times for attempts 1, 2, and 3? (Calculate: 0.01 * 2^0, 0.01 * 2^1, 0.01 * 2^2.)
## Step 5: Write the Demo Workload

**What to do:** Create a function that acquires a connection, creates a table, inserts rows, queries them back, and releases the connection. Then acquire a second connection to demonstrate that the pool reuses the first one.

**Why:** This proves the pool works end-to-end. The second `acquire()` call should show a reused connection (not a newly created one) in the pool stats.

```python
def run_demo_queries(pool: ConnectionPool, labels: list[str]) -> list[dict]:
    conn = pool.acquire()
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, label TEXT NOT NULL)"
        )
        for label in labels:
            conn.execute("INSERT INTO events (label) VALUES (?)", (label,))
        conn.commit()
        rows = conn.execute("SELECT id, label FROM events ORDER BY id").fetchall()
        return [{"id": r[0], "label": r[1]} for r in rows]
    finally:
        pool.release(conn)
```

**Predict:** Why is `pool.release(conn)` inside a `finally` block instead of after the return statement? What would happen if an exception occurred during the INSERT?
## Step 6: Wire Up the Orchestrator and CLI

**What to do:** Write a `run()` function that reads labels from an input file, runs the demo queries, performs a health check, and writes a JSON summary. Add `argparse` for CLI arguments.

**Why:** The orchestrator ties everything together and proves the system works as a whole. The JSON output gives you concrete evidence of what happened: how many rows were inserted, whether the health check passed, and how many connections were created vs. reused.

```python
def run(input_path: Path, output_path: Path, config: ConnectionConfig | None = None) -> dict:
    config = config or ConnectionConfig()
    pool = ConnectionPool(config)
    labels = [ln.strip() for ln in input_path.read_text().splitlines() if ln.strip()]
    rows = run_demo_queries(pool, labels)

    conn2 = pool.acquire()  # should be a pool REUSE
    hc = health_check(conn2)
    pool.release(conn2)
    pool.close_all()

    summary = {"rows_inserted": len(rows), "rows": rows, "health": hc, "pool_stats": pool.stats()}
    output_path.write_text(json.dumps(summary, indent=2))
    return summary
```

**Predict:** Look at the pool stats after running. `created` should be 1 and `reused` should be 1. Why not `created: 2`?
## Common Mistakes

| Mistake | Why It Happens | Fix |
|---------|---------------|-----|
| Forgetting to call `conn.commit()` after INSERTs | Python's `sqlite3` module opens an implicit transaction around DML statements, so changes are invisible to other connections and rolled back if the connection closes without committing | Always call `conn.commit()` after writes, or use the connection as a context manager (`with conn:` commits on success) |
| Not closing connections on error | If an exception occurs between `acquire()` and `release()`, the connection leaks | Wrap the usage in `try/finally` so `release()` always runs |
| Using `time.sleep()` with large values in retry logic | Copy-pasting production-style backoff values (1s, 2s, 4s) makes tests painfully slow | Use a small `BASE_BACKOFF_SEC` (e.g., 0.01) for demos and tests |
| Returning a closed connection to the pool | If you manually close a connection and then release it, the next acquire gets a broken connection | Check connection health before returning it to the pool (the "Fix it" exercise) |
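That last row is worth sketching. One way to guard the pool against broken connections, written here as a standalone function so it can be read in isolation (the "Fix it" exercise's actual solution may differ):

```python
import sqlite3

def safe_release(pool: list, pool_size: int, conn: sqlite3.Connection) -> None:
    """Only return a connection to the pool if it still answers a ping."""
    try:
        conn.execute("SELECT 1")  # health check before pooling
    except sqlite3.Error:
        return  # broken or already closed: drop it rather than pool it
    if len(pool) < pool_size:
        pool.append(conn)
    else:
        conn.close()
```

A closed SQLite connection raises `sqlite3.ProgrammingError` (a subclass of `sqlite3.Error`) on `execute()`, so the ping catches both broken and manually closed connections.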
## Testing Your Solution

```bash
pytest -q
```

You should see 8+ tests pass. The tests verify connection creation, pool reuse, retry behavior, health checks, and the full end-to-end pipeline.
## What You Learned

- **Connection pooling** reduces overhead by reusing database connections instead of creating new ones for every query. The pattern applies to any expensive resource: HTTP connections, thread pools, GPU memory.
- **Exponential backoff** is the standard approach for retrying transient failures. It prevents retry storms from overwhelming a recovering service.
- **Context managers and try/finally** ensure resources are always cleaned up, even when exceptions occur. Leaked connections are one of the most common causes of production database outages.
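The try/finally pattern from Step 5 can be packaged as a context manager so callers cannot forget to release. This helper is a sketch, not part of the project's reference solution:

```python
from contextlib import contextmanager

@contextmanager
def pooled_connection(pool):
    # Acquire on entry; always release on exit, even if the body raises.
    conn = pool.acquire()
    try:
        yield conn
    finally:
        pool.release(conn)
```

Usage then becomes `with pooled_connection(pool) as conn: conn.execute(...)`, and the release happens automatically at the end of the `with` block.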
