|
1 | | -# Solution: Level 2 / Project 01 - Dictionary Lookup Service |
| 1 | +# Dictionary Lookup Service — Annotated Solution |
2 | 2 |
|
3 | | -> **STOP** — Have you attempted this project yourself first? |
4 | | -> |
5 | | -> Learning happens in the struggle, not in reading answers. |
6 | | -> Spend at least 20 minutes trying before reading this solution. |
7 | | -> If you are stuck, try the [Walkthrough](./WALKTHROUGH.md) first — it guides |
8 | | -> your thinking without giving away the answer. |
| 3 | +> **STOP!** Try solving this yourself first. Use the [project README](./README.md) and [walkthrough](./WALKTHROUGH.md) before reading the solution. |
9 | 4 |
|
10 | 5 | --- |
11 | 6 |
|
12 | | - |
13 | | -## Complete solution |
| 7 | +## Complete Solution |
14 | 8 |
|
15 | 9 | ```python |
16 | | -# WHY load_dictionary: [explain the design reason] |
17 | | -# WHY lookup: [explain the design reason] |
18 | | -# WHY batch_lookup: [explain the design reason] |
19 | | -# WHY dictionary_stats: [explain the design reason] |
20 | | -# WHY parse_args: [explain the design reason] |
21 | | -# WHY main: [explain the design reason] |
22 | | - |
23 | | -# [paste the complete working solution here] |
24 | | -# Include WHY comments on every non-obvious line. |
| 10 | +"""Dictionary Lookup Service — complete annotated solution.""" |
| 11 | + |
| 12 | +from __future__ import annotations |
| 13 | + |
| 14 | +import argparse |
| 15 | +import json |
| 16 | +from pathlib import Path |
| 17 | + |
| 18 | +# WHY: difflib is a stdlib module that provides fuzzy string matching. |
| 19 | +# We use it to suggest corrections when a lookup misses, which is a |
| 20 | +# much better user experience than just "not found". |
| 21 | +import difflib |
| 22 | + |
| 23 | + |
| 24 | +def load_dictionary(path: Path) -> dict[str, str]: |
| 25 | +    """Load a dictionary from a file of 'key=value' lines.""" |
| 26 | +    if not path.exists(): |
| 27 | +        raise FileNotFoundError(f"Dictionary file not found: {path}") |
| 28 | + |
| 29 | +    raw = path.read_text(encoding="utf-8").splitlines() |
| 30 | + |
| 31 | +    # WHY: Dict comprehension with split("=", 1) — the maxsplit=1 argument |
| 32 | +    # ensures definitions containing '=' characters are preserved intact. |
| 33 | +    # Without it, "url=https://a.com/b=c" splits into 3 parts and parts[1] loses "=c". |
| 34 | +    entries = { |
| 35 | +        parts[0].strip().lower(): parts[1].strip() |
| 36 | +        for line in raw |
| 37 | +        if "=" in line |
| 38 | +        for parts in [line.split("=", 1)] |
| 39 | +    } |
| 40 | +    return entries |
| 41 | + |
| 42 | + |
| 43 | +def lookup(dictionary: dict[str, str], term: str) -> dict: |
| 44 | +    """Look up a term with fuzzy matching fallback.""" |
| 45 | +    # WHY: Normalise to lowercase so "Python", "PYTHON", and "python" |
| 46 | +    # all find the same entry — users should not need to know the exact case. |
| 47 | +    normalised = term.strip().lower() |
| 48 | + |
| 49 | +    try: |
| 50 | +        # WHY: Using dict[key] with try/except instead of dict.get() because |
| 51 | +        # we want different behaviour for hit vs miss — try/except makes |
| 52 | +        # the two paths explicit and easy to extend. |
| 53 | +        definition = dictionary[normalised] |
| 54 | +        return { |
| 55 | +            "found": True, |
| 56 | +            "term": normalised, |
| 57 | +            "definition": definition, |
| 58 | +            "suggestions": [], |
| 59 | +        } |
| 60 | +    except KeyError: |
| 61 | +        # WHY: get_close_matches uses SequenceMatcher internally. The cutoff |
| 62 | +        # of 0.6 means a word must be at least 60% similar to be suggested. |
| 63 | +        # n=3 limits suggestions to the 3 best matches. |
| 64 | +        suggestions = difflib.get_close_matches( |
| 65 | +            normalised, dictionary.keys(), n=3, cutoff=0.6 |
| 66 | +        ) |
| 67 | +        return { |
| 68 | +            "found": False, |
| 69 | +            "term": normalised, |
| 70 | +            "definition": None, |
| 71 | +            "suggestions": suggestions, |
| 72 | +        } |
| 73 | + |
| 74 | + |
| 75 | +def batch_lookup( |
| 76 | +    dictionary: dict[str, str], terms: list[str] |
| 77 | +) -> list[dict]: |
| 78 | +    """Look up many terms, tracking original order with enumerate.""" |
| 79 | +    results = [] |
| 80 | +    for idx, term in enumerate(terms): |
| 81 | +        result = lookup(dictionary, term) |
| 82 | +        # WHY: Attaching the original index lets callers correlate results |
| 83 | +        # back to input order, which matters for batch processing. |
| 84 | +        result["index"] = idx |
| 85 | +        results.append(result) |
| 86 | +    return results |
| 87 | + |
| 88 | + |
| 89 | +def dictionary_stats(dictionary: dict[str, str]) -> dict: |
| 90 | +    """Compute statistics about the dictionary.""" |
| 91 | +    # WHY: Set comprehension collects unique first letters in O(n) time. |
| 92 | +    # Sets automatically discard duplicates. |
| 93 | +    first_letters: set[str] = {k[0] for k in dictionary if k} |
| 94 | + |
| 95 | +    # WHY: sorted() with a key function lets us rank entries by definition |
| 96 | +    # length without modifying the original dict. |
| 97 | +    sorted_by_length = sorted( |
| 98 | +        dictionary.keys(), |
| 99 | +        key=lambda k: len(dictionary[k]), |
| 100 | +        reverse=True, |
| 101 | +    ) |
| 102 | + |
| 103 | +    return { |
| 104 | +        "total_entries": len(dictionary), |
| 105 | +        "unique_first_letters": sorted(first_letters), |
| 106 | +        "longest_definitions": sorted_by_length[:5], |
| 107 | +        "shortest_definitions": sorted_by_length[-5:], |
| 108 | +    } |
| 109 | + |
| 110 | + |
| 111 | +def parse_args() -> argparse.Namespace: |
| 112 | +    """Parse command-line arguments.""" |
| 113 | +    parser = argparse.ArgumentParser( |
| 114 | +        description="Dictionary lookup with fuzzy matching" |
| 115 | +    ) |
| 116 | +    parser.add_argument( |
| 117 | +        "--dict", |
| 118 | +        default="data/sample_input.txt", |
| 119 | +        help="Path to the dictionary file (key=value per line)", |
| 120 | +    ) |
| 121 | +    parser.add_argument( |
| 122 | +        "--lookup", |
| 123 | +        nargs="*", |
| 124 | +        default=[], |
| 125 | +        help="Terms to look up", |
| 126 | +    ) |
| 127 | +    parser.add_argument( |
| 128 | +        "--stats", |
| 129 | +        action="store_true", |
| 130 | +        help="Print dictionary statistics", |
| 131 | +    ) |
| 132 | +    return parser.parse_args() |
| 133 | + |
| 134 | + |
| 135 | +def main() -> None: |
| 136 | +    """Entry point: load dictionary, run lookups, print results.""" |
| 137 | +    args = parse_args() |
| 138 | +    dictionary = load_dictionary(Path(args.dict)) |
| 139 | + |
| 140 | +    if args.stats: |
| 141 | +        stats = dictionary_stats(dictionary) |
| 142 | +        print("=== Dictionary Statistics ===") |
| 143 | +        for key, value in stats.items(): |
| 144 | +            print(f" {key}: {value}") |
| 145 | +        return |
| 146 | + |
| 147 | +    if args.lookup: |
| 148 | +        results = batch_lookup(dictionary, args.lookup) |
| 149 | +    else: |
| 150 | +        samples = list(dictionary.keys())[:3] + ["nonexistent"] |
| 151 | +        results = batch_lookup(dictionary, samples) |
| 152 | + |
| 153 | +    print("=== Lookup Results ===\n") |
| 154 | +    print(f" {'Term':<20} {'Found':>6} {'Definition / Suggestions'}") |
| 155 | +    print(f" {'-'*20} {'-'*6} {'-'*30}") |
| 156 | +    for r in results: |
| 157 | +        term = r["term"] |
| 158 | +        if r["found"]: |
| 159 | +            print(f" {term:<20} {'yes':>6} {r['definition']}") |
| 160 | +        else: |
| 161 | +            suggestions = ", ".join(r.get("suggestions", [])) |
| 162 | +            hint = f"Did you mean: {suggestions}" if suggestions else "(no matches)" |
| 163 | +            print(f" {term:<20} {'no':>6} {hint}") |
| 164 | + |
| 165 | + |
| 166 | +if __name__ == "__main__": |
| 167 | +    main() |
25 | 168 | ``` |
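
To see what the `cutoff` and `n` arguments in `lookup` do in practice, here is a minimal sketch that calls `difflib.get_close_matches` directly. The `suggest` helper and the sample keys are illustrative assumptions, not part of the project or its data file.

```python
import difflib

# Hypothetical sample keys, not taken from the project's data file.
keys = ["ape", "apple", "peach", "puppy"]


def suggest(term: str, candidates: list[str], cutoff: float = 0.6) -> list[str]:
    """Return up to 3 candidates whose similarity ratio is at least `cutoff`."""
    return difflib.get_close_matches(
        term.strip().lower(), candidates, n=3, cutoff=cutoff
    )


close = suggest("appel", keys)                # typo that is close to two keys
nothing = suggest("zzz", keys)                # shares no characters with any key
strict = suggest("appel", keys, cutoff=0.95)  # a near-exact match is required

print(close, nothing, strict)
```

Raising the cutoff trades helpfulness for precision: at 0.95 even an obvious typo gets no suggestion.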
26 | 169 |
|
27 | | -## Design decisions |
| 170 | +## Design Decisions |
28 | 171 |
|
29 | | -| Decision | Why | Alternative considered | |
30 | | -|----------|-----|----------------------| |
31 | | -| load_dictionary function | [reason] | [alternative] | |
32 | | -| lookup function | [reason] | [alternative] | |
33 | | -| batch_lookup function | [reason] | [alternative] | |
| 172 | +| Decision | Why | |
| 173 | +|----------|-----| |
| 174 | +| `split("=", 1)` for parsing | Definitions may contain `=` characters (e.g. URLs). Splitting on only the first `=` preserves the full definition. | |
| 175 | +| `try/except KeyError` instead of `dict.get()` | The two code paths (found vs not found) are very different. Using try/except makes each path explicit and keeps the happy path clean. | |
| 176 | | Lowercase normalisation on load | Case-insensitive lookups are the expected default. Normalising the keys once at load time means each lookup only has to normalise the search term, not every key it is compared against. | |
| 177 | +| Fuzzy matching with `difflib` | Returning "did you mean?" suggestions transforms a dead-end miss into a helpful interaction, which is critical for user-facing tools. | |
| 178 | +| Returning structured dicts, not strings | Dicts are machine-readable. Callers can decide how to display results (table, JSON, GUI) without parsing strings. | |
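
As a rough illustration of the last row, the sketch below renders one result dict in two ways without any string parsing. The `result` value and the `as_text` and `as_json` helpers are hypothetical; only the dict shape mirrors what `lookup` returns.

```python
import json

# Hypothetical result, shaped like the dicts that lookup() returns.
result = {
    "found": False,
    "term": "pythn",
    "definition": None,
    "suggestions": ["python"],
}


def as_text(r: dict) -> str:
    """Render a result as a single human-readable line."""
    if r["found"]:
        return f"{r['term']}: {r['definition']}"
    if r["suggestions"]:
        return f"{r['term']}: did you mean {', '.join(r['suggestions'])}?"
    return f"{r['term']}: no matches"


def as_json(r: dict) -> str:
    """Render the same result for a machine consumer."""
    return json.dumps(r, sort_keys=True)


print(as_text(result))
print(as_json(result))
```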
34 | 179 |
|
35 | | -## Alternative approaches |
| 180 | +## Alternative Approaches |
36 | 181 |
|
37 | | -### Approach B: [Name] |
| 182 | +### Using `dict.get()` instead of `try/except` |
38 | 183 |
|
39 | 184 | ```python |
40 | | -# [Different valid approach with trade-offs explained] |
| 185 | +def lookup_with_get(dictionary, term): |
| 186 | +    normalised = term.strip().lower() |
| 187 | +    definition = dictionary.get(normalised) |
| 188 | +    if definition is not None: |
| 189 | +        return {"found": True, "term": normalised, "definition": definition} |
| 190 | +    suggestions = difflib.get_close_matches(normalised, dictionary.keys()) |
| 191 | +    return {"found": False, "term": normalised, "suggestions": suggestions} |
41 | 192 | ``` |
42 | 193 |
|
43 | | -**Trade-off:** [When you would prefer this approach vs the primary one] |
| 194 | +This is simpler and perfectly valid. The trade-off: `dict.get()` cannot distinguish between a key that maps to `None` and a missing key. In this project that does not matter (all values are strings), but the `try/except` pattern is more general and worth practicing. |
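
The `None` ambiguity can be shown with a small sketch. The `entries` dict and both helper functions are made up for illustration; the project's own dictionary never stores `None`.

```python
# Hypothetical data: one key deliberately maps to None.
entries = {"python": "A programming language", "todo": None}


def describe_with_get(d: dict, key: str) -> str:
    """Treats a stored None the same as a missing key."""
    return "missing" if d.get(key) is None else "present"


def describe_with_try(d: dict, key: str) -> str:
    """Only reports 'missing' when the key is really absent."""
    try:
        d[key]
    except KeyError:
        return "missing"
    return "present"


print(describe_with_get(entries, "todo"))  # the stored None is misreported
print(describe_with_try(entries, "todo"))
```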
| 195 | + |
| 196 | +### Using the `csv` module instead of manual parsing |
| 197 | + |
| 198 | +For more complex dictionary files (quoted values, multi-line entries), Python's `csv` module handles edge cases automatically. The manual approach here is chosen because the file format is simple and it teaches `str.split()` mechanics directly. |
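
A minimal sketch of that alternative, assuming a file format in which values containing `=` or line breaks are wrapped in double quotes. That quoting convention, the `sample` text, and the `load_with_csv` name are all assumptions; the project's actual file format is plain `key=value`.

```python
import csv
import io

# Hypothetical file content: quoted values may contain '=' or newlines.
sample = (
    "python=A programming language\n"
    'url="https://example.com/a=b"\n'
    'note="first line\nsecond line"\n'
)


def load_with_csv(text: str) -> dict[str, str]:
    """Parse 'key=value' rows with csv.reader, using '=' as the delimiter."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="=")
    return {
        row[0].strip().lower(): row[1].strip()
        for row in reader
        if len(row) >= 2
    }


entries = load_with_csv(sample)
print(entries)
```

An unquoted value that contains `=` would still be split into extra fields here, so this approach only pays off when the file format guarantees quoting.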
44 | 199 |
|
45 | | -## What could go wrong |
| 200 | +## Common Pitfalls |
46 | 201 |
|
47 | | -| Scenario | What happens | Prevention | |
48 | | -|----------|-------------|------------| |
49 | | -| [bad input] | [error/behavior] | [how to handle] | |
50 | | -| [edge case] | [behavior] | [how to handle] | |
| 202 | +1. **Splitting on every `=` sign** — Using `line.split("=")` without the `maxsplit=1` argument will break definitions containing `=`. For example, `url=https://example.com/a=b` would incorrectly split into three parts instead of two. |
51 | 203 |
|
52 | | -## Key takeaways |
| 204 | +2. **Forgetting to normalise input** — If you normalise keys to lowercase at load time but forget to lowercase the search term, "Python" will not match "python". Always normalise both sides of a comparison. |
53 | 205 |
|
54 | | -1. [Most important lesson from this project] |
55 | | -2. [Second lesson] |
56 | | -3. [Connection to future concepts] |
| 206 | +3. **Bare `except:` instead of `except KeyError`** — Catching all exceptions hides real bugs (like the `TypeError` raised when an unhashable key such as a list is passed) and also swallows `KeyboardInterrupt` and `SystemExit`. Always catch the most specific exception type you expect. |
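
The three pitfalls above can be reproduced in a few lines. This is an illustrative sketch: the sample line, the `entries` dict, and the `lookup_bare` and `lookup_specific` helpers are hypothetical and not part of the solution.

```python
# Pitfall 1: splitting on every '=' versus only the first one.
line = "url=https://example.com/a=b"
naive_parts = line.split("=")     # three parts, definition is broken up
safe_parts = line.split("=", 1)   # two parts, definition kept intact

# Pitfall 2: keys are lowercase, so the search term must be normalised too.
entries = {"python": "A programming language"}
raw_hit = "Python" in entries
normalised_hit = "Python".strip().lower() in entries


# Pitfall 3: a bare except hides errors that are not simple misses.
def lookup_bare(d: dict, key):
    try:
        return d[key]
    except:  # deliberately too broad, for demonstration only
        return None


def lookup_specific(d: dict, key):
    try:
        return d[key]
    except KeyError:
        return None


hidden = lookup_bare(entries, ["python"])  # TypeError is silently swallowed
try:
    lookup_specific(entries, ["python"])
    surfaced = False
except TypeError:
    surfaced = True  # the unhashable-key bug is visible to the caller

print(naive_parts, safe_parts, raw_hit, normalised_hit, hidden, surfaced)
```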