# SQL Connection Simulator — Step-by-Step Walkthrough

[<- Back to Project README](./README.md) | [Solution](./SOLUTION.md)

## Before You Start

Read the [project README](./README.md) first. Try to solve it on your own before following this guide. The goal is to build a connection pool for SQLite that reuses connections instead of creating new ones for every query, with retry logic for transient failures. If you can write a class that manages a list of connections and hands them out on request, you are already most of the way there.

## Thinking Process

The core problem here is resource management. Every time you open a database connection, there is overhead: memory allocation, file handles, and (in real databases) network handshakes. If your application makes hundreds of queries per second, opening and closing a connection for each one wastes enormous time. The solution is a pool: keep a small number of connections alive and reuse them.

Think of it like a library lending desk. Instead of buying a new book every time someone wants to read, the library keeps books on shelves and lends them out. When a reader finishes, the book goes back on the shelf for the next person. Your `ConnectionPool` is that lending desk. `acquire()` lends a connection out, and `release()` puts it back on the shelf.

The second challenge is handling failures gracefully. Databases can be temporarily unavailable (locked, restarting, overloaded). Rather than crashing immediately, your code should retry a few times with increasing delays between attempts. This is called exponential backoff: wait 10ms, then 20ms, then 40ms. The increasing gaps give the database time to recover without hammering it with rapid-fire retry attempts.
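
The backoff arithmetic is simple enough to sketch in a few lines (delays in seconds, using the 10 ms base this project works with):

```python
BASE_BACKOFF_SEC = 0.01  # 10 ms

def backoff_delays(base: float, attempts: int) -> list[float]:
    """Delay before each attempt doubles: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(attempts)]

# Rounded to avoid floating-point noise in the printout.
print([round(d, 3) for d in backoff_delays(BASE_BACKOFF_SEC, 3)])  # [0.01, 0.02, 0.04]
```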

## Step 1: Define the Configuration

**What to do:** Create a `ConnectionConfig` dataclass that holds all the settings your pool needs: the database path, timeout, number of retries, and pool size.

**Why:** Centralizing configuration in a dataclass means you can pass one object around instead of four separate arguments. It also makes testing easier because you can create different configs for different scenarios.

```python
import sqlite3  # used by the pool and retry code in later steps
import time     # used for backoff sleeps in Step 4
from dataclasses import dataclass

MAX_POOL_SIZE = 5
MAX_RETRIES = 3
BASE_BACKOFF_SEC = 0.01

@dataclass
class ConnectionConfig:
    db_path: str = ":memory:"
    timeout: float = 5.0
    max_retries: int = MAX_RETRIES
    pool_size: int = MAX_POOL_SIZE
```

**Predict:** What happens if you set `pool_size` to 0? Would connections still work, or would every connection be immediately closed after use?
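
To see why one config object beats four loose arguments, compare a default config with a scenario-specific one. This snippet restates the dataclass so it runs on its own:

```python
from dataclasses import dataclass

@dataclass
class ConnectionConfig:
    db_path: str = ":memory:"
    timeout: float = 5.0
    max_retries: int = 3
    pool_size: int = 5

default_cfg = ConnectionConfig()
test_cfg = ConnectionConfig(db_path="test.db", pool_size=1)  # override only what differs
print(default_cfg.pool_size, test_cfg.pool_size)  # 5 1
```

Every field you don't mention keeps its default, so each test scenario states only what makes it different.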

## Step 2: Build the Connection Pool Class

**What to do:** Create a `ConnectionPool` class with an internal list `_pool` to hold idle connections and counters to track how many connections were created vs. reused.

**Why:** The pool is the heart of this project. The `_pool` list acts as a stack: connections are appended when released and popped when acquired. This last-in-first-out pattern keeps recently-used connections warm.

```python
class ConnectionPool:
    def __init__(self, config: ConnectionConfig) -> None:
        self.config = config
        self._pool: list[sqlite3.Connection] = []
        self._created = 0
        self._reused = 0
```

**Predict:** Why use a list as a stack (append/pop) rather than a queue (append/pop from front)? Think about which connection is most likely to still be valid.
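
Step 6 will also call `pool.stats()` and `pool.close_all()`, which these steps never show. A minimal sketch of those helpers (the return shape is an assumption based on the JSON summary in Step 6) could be:

```python
def stats(self) -> dict:
    """Summarize pool activity: fresh connections vs. reuses, plus idle count."""
    return {"created": self._created, "reused": self._reused, "idle": len(self._pool)}

def close_all(self) -> None:
    """Close every idle connection and empty the pool (call at shutdown)."""
    while self._pool:
        self._pool.pop().close()
```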

## Step 3: Implement acquire() and release()

**What to do:** Write `acquire()` to check the pool first (reuse if possible, create if empty) and `release()` to return connections to the pool (or close them if the pool is full).

**Why:** This is where the performance gain comes from. The first call to `acquire()` creates a new connection. When you `release()` it, the connection goes into the pool. The next `acquire()` finds it there and reuses it — no creation overhead.

```python
def acquire(self) -> sqlite3.Connection:
    if self._pool:
        self._reused += 1
        return self._pool.pop()
    conn = self._connect_with_retry()
    self._created += 1
    return conn

def release(self, conn: sqlite3.Connection) -> None:
    if len(self._pool) < self.config.pool_size:
        self._pool.append(conn)
    else:
        conn.close()
```

**Predict:** What would happen if `release()` never checked the pool size and just always appended? How many open connections could you end up with?
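
To watch the reuse path in action, here is a condensed, self-contained copy of Steps 1–3 (retry logic replaced by a plain `sqlite3.connect` for brevity):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class ConnectionConfig:
    db_path: str = ":memory:"
    timeout: float = 5.0
    pool_size: int = 5

class ConnectionPool:
    def __init__(self, config: ConnectionConfig) -> None:
        self.config = config
        self._pool: list[sqlite3.Connection] = []
        self._created = 0
        self._reused = 0

    def acquire(self) -> sqlite3.Connection:
        if self._pool:
            self._reused += 1
            return self._pool.pop()
        conn = sqlite3.connect(self.config.db_path, timeout=self.config.timeout)
        self._created += 1
        return conn

    def release(self, conn: sqlite3.Connection) -> None:
        if len(self._pool) < self.config.pool_size:
            self._pool.append(conn)
        else:
            conn.close()

pool = ConnectionPool(ConnectionConfig())
c1 = pool.acquire()   # pool empty: a new connection is created
pool.release(c1)      # back on the shelf
c2 = pool.acquire()   # the same object comes back out
print(c1 is c2, pool._created, pool._reused)  # True 1 1
```

The `c1 is c2` identity check is the whole point of the pool: the second acquire returns the exact connection object the first one created.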

## Step 4: Add Retry Logic with Exponential Backoff

**What to do:** Write `_connect_with_retry()` that attempts to connect up to `max_retries` times. On each failure, wait longer before trying again (exponential backoff). After the connection succeeds, run `SELECT 1` to verify it actually works.

**Why:** Databases can be temporarily locked or slow to start. A single failed attempt does not mean the database is permanently down. Exponential backoff avoids flooding a struggling server with retries while still recovering quickly from brief glitches.

```python
def _connect_with_retry(self) -> sqlite3.Connection:
    last_err = None
    for attempt in range(1, self.config.max_retries + 1):
        try:
            conn = sqlite3.connect(self.config.db_path, timeout=self.config.timeout)
            conn.execute("SELECT 1")  # health-check ping
            return conn
        except sqlite3.OperationalError as exc:
            last_err = exc
            wait = BASE_BACKOFF_SEC * (2 ** (attempt - 1))  # 10ms, 20ms, 40ms, ...
            time.sleep(wait)
    raise ConnectionError(f"Failed after {self.config.max_retries} attempts: {last_err}")
```

**Predict:** If `BASE_BACKOFF_SEC` is 0.01, what are the wait times for attempts 1, 2, and 3? (Calculate: 0.01 * 2^0, 0.01 * 2^1, 0.01 * 2^2.)
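
You can exercise the retry loop without a genuinely flaky database by injecting a connect function that fails on purpose. The `flaky_connect` helper below is illustrative, not part of the project:

```python
import sqlite3
import time

BASE_BACKOFF_SEC = 0.01
attempts_log = []

def flaky_connect(fail_times: int):
    """Return a connect function that raises OperationalError for its first `fail_times` calls."""
    state = {"calls": 0}
    def _connect():
        state["calls"] += 1
        attempts_log.append(state["calls"])
        if state["calls"] <= fail_times:
            raise sqlite3.OperationalError("database is locked")
        return sqlite3.connect(":memory:")
    return _connect

def connect_with_retry(connect, max_retries: int = 3) -> sqlite3.Connection:
    """Standalone version of the Step 4 loop, with the connect call injected."""
    last_err = None
    for attempt in range(1, max_retries + 1):
        try:
            conn = connect()
            conn.execute("SELECT 1")  # health-check ping
            return conn
        except sqlite3.OperationalError as exc:
            last_err = exc
            time.sleep(BASE_BACKOFF_SEC * (2 ** (attempt - 1)))
    raise ConnectionError(f"Failed after {max_retries} attempts: {last_err}")

conn = connect_with_retry(flaky_connect(fail_times=2))
print(len(attempts_log))  # 3: two failures, then success on the third attempt
```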

## Step 5: Write the Demo Workload

**What to do:** Create a function that acquires a connection, creates a table, inserts rows, queries them back, and releases the connection. Then acquire a second connection to demonstrate that the pool reuses the first one.

**Why:** This proves the pool works end-to-end. The second `acquire()` call should show a reused connection (not a newly created one) in the pool stats.

```python
def run_demo_queries(pool: ConnectionPool, labels: list[str]) -> list[dict]:
    conn = pool.acquire()
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, label TEXT NOT NULL)")
        for label in labels:
            conn.execute("INSERT INTO events (label) VALUES (?)", (label,))
        conn.commit()
        rows = conn.execute("SELECT id, label FROM events ORDER BY id").fetchall()
        return [{"id": r[0], "label": r[1]} for r in rows]
    finally:
        pool.release(conn)
```

**Predict:** Why is `pool.release(conn)` inside a `finally` block instead of after the return statement? What would happen if an exception occurred during the INSERT?
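
The `finally` guarantee can be checked in isolation with a stub pool (a hypothetical stand-in, not part of the project):

```python
class StubPool:
    """Records whether release() was called, without touching a real database."""
    def __init__(self):
        self.released = False
    def acquire(self):
        return object()  # dummy "connection"
    def release(self, conn):
        self.released = True

def failing_workload(pool):
    conn = pool.acquire()
    try:
        raise RuntimeError("simulated INSERT failure")
    finally:
        pool.release(conn)  # runs even though the try block raised

pool = StubPool()
try:
    failing_workload(pool)
except RuntimeError:
    pass
print(pool.released)  # True — the connection went back despite the exception
```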

## Step 6: Wire Up the Orchestrator and CLI

**What to do:** Write a `run()` function that reads labels from an input file, runs the demo queries, performs a health check, and writes a JSON summary. Add `argparse` for CLI arguments.

**Why:** The orchestrator ties everything together and proves the system works as a whole. The JSON output gives you concrete evidence of what happened: how many rows were inserted, whether the health check passed, and how many connections were created vs. reused.

```python
def run(input_path: Path, output_path: Path, config: ConnectionConfig | None = None) -> dict:
    config = config or ConnectionConfig()
    pool = ConnectionPool(config)
    labels = [ln.strip() for ln in input_path.read_text().splitlines() if ln.strip()]
    rows = run_demo_queries(pool, labels)

    conn2 = pool.acquire()  # should be a pool REUSE
    hc = health_check(conn2)
    pool.release(conn2)
    pool.close_all()

    summary = {"rows_inserted": len(rows), "rows": rows, "health": hc, "pool_stats": pool.stats()}
    output_path.write_text(json.dumps(summary, indent=2))
    return summary
```

**Predict:** Look at the pool stats after running. `created` should be 1 and `reused` should be 1. Why not `created: 2`?
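
`run()` calls `health_check()`, which the earlier steps never define. One plausible sketch (the return shape here is an assumption based on the JSON summary, not the project's actual signature):

```python
import sqlite3

def health_check(conn: sqlite3.Connection) -> dict:
    """Run a trivial query; report success, or the error that prevented it."""
    try:
        ok = conn.execute("SELECT 1").fetchone() == (1,)
        return {"ok": ok}
    except sqlite3.Error as exc:
        return {"ok": False, "error": str(exc)}

print(health_check(sqlite3.connect(":memory:")))  # {'ok': True}
```

Catching `sqlite3.Error` (the base class) means a closed or broken connection reports `ok: False` instead of crashing the orchestrator.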

## Common Mistakes

| Mistake | Why It Happens | Fix |
|---------|---------------|-----|
| Forgetting to call `conn.commit()` after INSERTs | Python's `sqlite3` module opens an implicit transaction for DML statements, so writes stay pending (and invisible to other connections) until you commit | Always call `conn.commit()` after writes, or open the connection with `isolation_level=None` to enable autocommit |
| Not closing connections on error | If an exception occurs between `acquire()` and `release()`, the connection leaks | Wrap the usage in `try/finally` so `release()` always runs |
| Using `time.sleep()` with large values in retry logic | Copy-pasting production-style backoff values (1s, 2s, 4s) makes tests painfully slow | Use a small `BASE_BACKOFF_SEC` (e.g., 0.01) for demos and tests |
| Returning a closed connection to the pool | If you manually close a connection and then release it, the next acquire gets a broken connection | Check connection health before returning it to the pool (the "Fix it" exercise) |

## Testing Your Solution

```bash
pytest -q
```

You should see 8+ tests pass. The tests verify connection creation, pool reuse, retry behavior, health checks, and the full end-to-end pipeline.

## What You Learned

- **Connection pooling** reduces overhead by reusing database connections instead of creating new ones for every query. The pattern applies to any expensive resource: HTTP connections, thread pools, GPU memory.
- **Exponential backoff** is the standard approach for retrying transient failures. It prevents retry storms from overwhelming a recovering service.
- **Context managers and try/finally** ensure resources are always cleaned up, even when exceptions occur. Leaked connections are one of the most common causes of production database outages.